Deep Learning Approaches for Classifying Informal and Formal English Texts Using Linguistic Features

Karunarathna, KMGS; Rupasingha, RAHM; Kumara, BTGS

dc.contributor.author	Karunarathna, KMGS
dc.contributor.author	Rupasingha, RAHM
dc.contributor.author	Kumara, BTGS
dc.date.accessioned	2025-10-01T07:47:19Z
dc.date.available	2025-10-01T07:47:19Z
dc.date.issued	2025-01
dc.identifier.uri	https://ir.kdu.ac.lk/handle/345/8917
dc.identifier.uri	http://doi.org/10.64701/ijrc/345/8917
dc.description.abstract	The exponential growth of digital material makes effective techniques for automatic text classification increasingly necessary. Differentiating between formal and informal documents can help students identify appropriate resources for their assignments and improve the effectiveness of information retrieval systems. Although machine learning is widely used for text classification, little research has focused on differentiating formal and informal writing through linguistic features. This gap highlights the need for methods that improve classification accuracy and enhance the value of digital content in academic and retrieval systems. Our research addresses the problem by applying deep learning to a set of 13 linguistic attributes. Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM) networks were considered. A dataset including both formal (news articles, formal documents) and informal (personal letters, personal blogs) texts was gathered from several web sources. We used linguistic markers such as colloquialisms, contractions, modal verbs, slang, acronyms, pronouns, phrasal verbs, grammar complexity, vocabulary complexity, voice, and language type to generate the feature vectors. The feature vectors were used to train and assess the classification models under k-fold cross-validation with 3, 5, 7, and 10 folds. Model performance was evaluated using accuracy, precision, recall, and F-measure. The LSTM model outperformed the others, achieving the highest accuracy of 99.8% and robustly differentiating between formal and informal texts.
Future research will examine larger datasets, additional linguistic characteristics, more sophisticated deep learning models, and real-time and multilingual classification systems.	en_US
dc.language.iso	en	en_US
dc.subject	ANN; CNN; Document Classification; Formal Documents; Informal Documents; LSTM	en_US
dc.title	Deep Learning Approaches for Classifying Informal and Formal English Texts Using Linguistic Features	en_US
dc.type	Journal article	en_US
dc.identifier.faculty	FOC	en_US
dc.identifier.journal	IJRC	en_US
dc.identifier.issue	01	en_US
dc.identifier.volume	04	en_US
dc.identifier.pgnos	9-22	en_US
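
The abstract describes turning each text into a vector of linguistic features and evaluating models with k-fold cross-validation. The sketch below illustrates that general idea only; it is not the authors' implementation. The word lists, the three chosen features (contractions, modal verbs, pronouns), and the function names `feature_vector` and `k_fold_indices` are assumptions made for illustration, and the record does not specify how the paper's 13 attributes were actually computed.

```python
import re

# Hypothetical, deliberately tiny lexicons; the paper's actual lists are not given in this record.
MODAL_VERBS = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your"}
# Crude contraction pattern; it also matches possessives such as "john's".
CONTRACTION_RE = re.compile(r"[a-z]+'(?:t|s|re|ve|ll|d|m)")
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def feature_vector(text):
    """Return [contraction rate, modal-verb rate, pronoun rate], each per token."""
    tokens = TOKEN_RE.findall(text.lower())
    if not tokens:
        return [0.0, 0.0, 0.0]
    n = len(tokens)
    contractions = sum(1 for t in tokens if CONTRACTION_RE.fullmatch(t))
    modals = sum(1 for t in tokens if t in MODAL_VERBS)
    pronouns = sum(1 for t in tokens if t in PRONOUNS)
    return [contractions / n, modals / n, pronouns / n]


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds (no shuffling)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


informal = feature_vector("I can't go, and you shouldn't either.")
formal = feature_vector("The committee shall review the proposal.")
folds = list(k_fold_indices(10, 3))
```

In this toy setting the informal sentence scores higher on the contraction and pronoun rates, while the formal one scores only on the modal-verb rate. A real pipeline would feed such vectors, one per document, to the classifier inside each train/test split.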

Files in this item

Name: IJRC V 4 I (pages 9-22).pdf
Size: 495.4Kb
Format: PDF

This item appears in the following Collection(s)

Volume 04, Issue 01, 2025 [6]
IJRC