| dc.description.abstract | Predicting a precise response for previously unseen input variables is a vital and challenging task, as precise predictions support correct decisions and can thereby minimize risks in different domains. The main objective of this study was to compare the performance of several classical statistical and machine learning techniques by treating the prediction task as a binary classification problem. Logistic Regression (LR) and Linear Discriminant Analysis (LDA) were considered as classical statistical techniques, while Random Forest (RF), Naïve Bayes (NB), Boosting (BT) and Bagging (BA) were considered as machine learning techniques. The performance of these techniques was compared under two different aspects using five real datasets. In one aspect, class imbalance was artificially introduced to the datasets by resampling. In the other aspect, sampling approaches such as undersampling, oversampling and a hybrid approach (a mix of undersampling and oversampling) were used to overcome class imbalance in the training set. Several evaluation measures, namely accuracy, precision, F-measure, G-mean and Receiver Operating Characteristic Area Under the Curve (ROC AUC), were used to evaluate the performance of the classification techniques. The results indicated that Random Forest and Boosting performed better than the other techniques under both aspects: when class imbalance was introduced by resampling and when it was addressed through sampling approaches. In many cases, both the machine learning techniques and the statistical techniques performed better when the training set was balanced. | en_US |