
dc.contributor.author: Karunarathna, KMGS
dc.contributor.author: Rupasingha, RAHM
dc.contributor.author: Kumara, BTGS
dc.date.accessioned: 2025-10-01T07:47:19Z
dc.date.available: 2025-10-01T07:47:19Z
dc.date.issued: 2025-01
dc.identifier.uri: https://ir.kdu.ac.lk/handle/345/8917
dc.description.abstract: Effective techniques for automatically classifying texts are becoming increasingly necessary due to the exponential growth of digital material. Differentiating between formal and informal documents can help students identify appropriate resources for their assignments and improve the effectiveness of information retrieval systems. Although machine learning is widely used in text classification, there is a lack of research focused on effectively differentiating formal and informal writing through linguistic features. This gap highlights the need for advanced methodologies that improve classification accuracy and enhance the value of digital content in academic and retrieval systems. Our research addresses the problem by using deep learning methodologies and a set of 13 linguistic attributes to improve text classification performance. Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory Networks (LSTM) were considered. A dataset including both formal (news articles, formal documents) and informal (personal letters, personal blogs) texts was gathered from several web sources. We considered linguistic markers such as colloquialisms, contractions, modal verbs, slang, acronyms, pronouns, phrasal verbs, grammar complexity, vocabulary complexity, voice, and language type to generate the feature vector. The feature vectors were used to train and assess the classification models using k-fold cross-validation with 3, 5, 7, and 10 folds. The models were evaluated using accuracy, precision, recall, and F-measure. The LSTM model outperformed the others, achieving the highest accuracy of 99.8% and robust differentiation between formal and informal texts. Future research will examine larger datasets, additional linguistic characteristics, more sophisticated deep learning models, and real-time and multilingual classification systems. [en_US]
dc.language.iso: en [en_US]
dc.subject: ANN; CNN; Document Classification; Formal Documents; Informal Documents; LSTM [en_US]
dc.title: Deep Learning Approaches for Classifying Informal and Formal English Texts Using Linguistic Features [en_US]
dc.type: Journal article [en_US]
dc.identifier.faculty: FOC [en_US]
dc.identifier.journal: IJRC [en_US]
dc.identifier.issue: 01 [en_US]
dc.identifier.volume: 04 [en_US]
dc.identifier.pgnos: 9-22 [en_US]
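
The pipeline summarized in the abstract (linguistic markers turned into a feature vector, k-fold cross-validation, and evaluation by precision, recall, and F-measure) can be illustrated with a minimal sketch. This is not the authors' implementation: the marker lists, the three features, and the function names `feature_vector`, `k_fold_indices`, and `precision_recall_f1` are all hypothetical, chosen only to show the shape of each step. The record does not specify the paper's 13 features or its lexicons, and the neural models themselves are omitted here.

```python
import re

# Hypothetical marker lists for illustration only; the paper's actual
# lexicons and its full set of 13 features are not given in the abstract.
CONTRACTION_RE = re.compile(r"\b\w+'(?:t|s|re|ve|ll|d|m)\b", re.IGNORECASE)
PRONOUNS = {"i", "you", "we", "me", "my", "your", "our"}
MODALS = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}


def feature_vector(text):
    """Return per-token rates of three markers: contractions, pronouns, modals."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    contractions = len(CONTRACTION_RE.findall(text))
    pronouns = sum(t in PRONOUNS for t in tokens)
    modals = sum(t in MODALS for t in tokens)
    return [contractions / n, pronouns / n, modals / n]


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]  # every k-th sample, offset by the fold number
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F-measure for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

In a fuller version, each text would be mapped through `feature_vector`, a classifier would be trained on the `train` indices of each fold from `k_fold_indices(n, k)` for k in 3, 5, 7, and 10, and the predictions on the `test` indices would be scored with `precision_recall_f1`.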

