A Comparison of Classical Statistical & Machine Learning Techniques in Binary Classification

Perera, KVU; Viswakula, SD

dc.contributor.author	Perera, KVU
dc.contributor.author	Viswakula, SD
dc.date.accessioned	2018-06-08T09:47:50Z
dc.date.available	2018-06-08T09:47:50Z
dc.date.issued	2017
dc.identifier.uri	http://ir.kdu.ac.lk/handle/345/1681
dc.description	Article Full Text	en_US
dc.description.abstract	Predicting a precise response for previously unseen input variables is a vital and challenging task, because precise predictions support correct decisions and so reduce risk in many domains. The main objective of this study was to compare the performance of several classical statistical and machine learning techniques by treating the prediction task as binary classification. Logistic Regression (LR) and Linear Discriminant Analysis (LDA) were considered as classical statistical techniques, while Random Forest (RF), Naïve Bayes (NB), Boosting (BT) and Bagging (BA) were considered as machine learning techniques. The performance of these techniques was compared under two different aspects using five real datasets. In the first aspect, class imbalance was artificially introduced to the datasets by resampling. In the second, sampling approaches such as undersampling, oversampling and a hybrid approach (a mix of undersampling and oversampling) were applied to overcome class imbalance in the training set. Several evaluation measures such as accuracy, precision, F-measure, G-mean and Receiver Operating Characteristic Area Under the Curve (ROC AUC) were used to evaluate the classification techniques. The results indicated that Random Forest and Boosting performed better than the other techniques in both the resampling aspect and the overcoming-class-imbalance aspect. In many cases, when the training set was balanced, the statistical techniques as well as the machine learning techniques performed better.	en_US
dc.language.iso	en	en_US
dc.subject	Statistics	en_US
dc.subject	Machine Learning	en_US
dc.subject	Classification	en_US
dc.subject	Resampling	en_US
dc.subject	Class Imbalance	en_US
dc.title	A Comparison of Classical Statistical & Machine Learning Techniques in Binary Classification	en_US
dc.type	Article Full Text	en_US
dc.identifier.journal	KDU IRC	en_US
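
The abstract names several evaluation measures and sampling approaches without defining them. The following is a minimal sketch of how they are commonly computed, not the authors' implementation. It assumes the usual definitions: F-measure as the harmonic mean of precision and recall, and G-mean as the geometric mean of sensitivity and specificity (the paper may define them differently). The function names `evaluate`, `random_undersample` and `random_oversample` are hypothetical, and the hybrid approach and ROC AUC are left out.

```python
import math
import random


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, F-measure and G-mean for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(num, den):
        # Convention assumed here: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)        # sensitivity
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "g_mean": math.sqrt(recall * specificity),
    }


def _group_by_label(rows, labels):
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class has the minority-class count."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = min(len(g) for g in groups.values())
    out = [(r, lab) for lab, g in groups.items() for r in rng.sample(g, target)]
    rng.shuffle(out)
    return [r for r, _ in out], [lab for _, lab in out]


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows until every class has the majority-class count."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = max(len(g) for g in groups.values())
    out = []
    for lab, g in groups.items():
        extra = [rng.choice(g) for _ in range(target - len(g))]
        out.extend((r, lab) for r in g + extra)
    rng.shuffle(out)
    return [r for r, _ in out], [lab for _, lab in out]


if __name__ == "__main__":
    # A classifier that always predicts the majority class on a 2:8 split.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [0] * 10
    print(evaluate(y_true, y_pred))
```

In the usage example, a classifier that always predicts the majority class still reaches high accuracy while its G-mean and F-measure are zero, which is one reason studies of class imbalance report these measures alongside accuracy.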

Files in this item

Name: 006.pdf
Size: 488.2Kb
Format: PDF

This item appears in the following Collection(s)

Computing [28]
