Heart Disease Risk Identification Using Machine Learning Techniques for A Highly Imbalanced Dataset: A Comparative Study

Fernando, CD; Weerasinghe, PT; Walgampaya, CK

dc.contributor.author	Fernando, CD
dc.contributor.author	Weerasinghe, PT
dc.contributor.author	Walgampaya, CK
dc.date.accessioned	2023-03-23T03:42:54Z
dc.date.available	2023-03-23T03:42:54Z
dc.date.issued	2022-11
dc.identifier.uri	http://ir.kdu.ac.lk/handle/345/6270
dc.description.abstract	Heart disease has become one of the most prevalent diseases in the world today. An estimated 32% of all deaths worldwide are caused by heart disease. One major reason is that it is extremely difficult, even for medical practitioners, to predict heart conditions such as heart attacks, because prediction is a complex task that requires a great deal of knowledge and experience. The number of deaths caused by heart disease has increased sharply in the recent past. Machine learning has become one of the most popular areas of computer science, and many complex problems have been addressed successfully with it, especially in the field of medicine. In this study we trained multiple supervised classifiers, namely Naïve Bayes, LightGBM, Decision Trees, Random Forest, XGBoost, K-Nearest Neighbours and AdaBoost, compared their accuracies, and identified which models perform better for heart disease prediction. We used the Behavioral Risk Factor Surveillance System (BRFSS) 2015 Heart Disease Health Indicators Dataset, which is highly imbalanced. To address the class imbalance problem we used the Synthetic Minority Over-sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), Random Over Sampling, Random Under Sampling, Tomek Links, SMOTETomek, SMOTEENN and Cluster Centroids. From the results obtained, we conclude that hybrid sampling methods such as SMOTEENN and SMOTETomek performed better than the other sampling methods.	en_US
dc.language.iso	en	en_US
dc.subject	Heart Disease	en_US
dc.subject	Machine Learning	en_US
dc.subject	Class Imbalance	en_US
dc.subject	Sampling methods	en_US
dc.title	Heart Disease Risk Identification Using Machine Learning Techniques for A Highly Imbalanced Dataset: A Comparative Study	en_US
dc.type	Article Full Text	en_US
dc.identifier.journal	KDU JOURNAL OF MULTIDISCIPLINARY STUDIES	en_US
dc.identifier.issue	2	en_US
dc.identifier.volume	4	en_US
dc.identifier.pgnos	43-55	en_US
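
Illustrative note: the abstract lists Random Over Sampling and Random Under Sampling among the techniques used to balance the classes. The sketch below shows the idea behind those two methods in plain Python. The function names, the toy data and the seed are assumptions made for illustration only; this is not the authors' implementation, and it does not use the BRFSS dataset.

```python
import random
from collections import Counter


def random_over_sample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class is as large as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_under_sample(rows, labels, seed=0):
    """Keep a random subset of each class so that every class is as small
    as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Toy, made-up data: 8 rows of class 0 and 2 rows of class 1.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

over_rows, over_labels = random_over_sample(rows, labels)
under_rows, under_labels = random_under_sample(rows, labels)

print(Counter(over_labels))   # Counter({0: 8, 1: 8})
print(Counter(under_labels))  # Counter({0: 2, 1: 2})
```

Over-sampling keeps every original row but repeats minority rows, while under-sampling discards majority rows. Synthetic methods such as SMOTE differ in that they generate new minority points rather than copying existing ones. In practice, resampling is normally applied to the training split only, so that the test data keeps its original class balance.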

Files in this item

Name:: KDU-Journal-of-Multidisciplina ...
Size:: 325.4Kb
Format:: PDF

This item appears in the following Collection(s)

Volume 04, Issue 02, 2022 [11]