Data Mining Approach for Identifying High-Quality Journals in Computer Science
Abstract
In today’s scientific world, most authors of scientific literature seek effective ways to share their research findings with large peer groups, but finding a high-quality journal in which to publish is a major challenge. Most of the journals available today are predatory or of low quality, and most of them will publish almost anything sent to them without proper quality control. The main aim of this study is to help researchers identify the quality level of Computer Science journals by introducing a data mining approach based on six journal quality metrics: Journal Impact Factor (JIF), SCImago Journal Rank (SJR), Eigenfactor, H-index, Source Normalized Impact per Paper (SNIP) and Article Influence (AI). It further aims to identify which of the six metrics best measure the quality of those journals and to present a more accurate data mining approach based on them. A sample dataset of 200 journals was used for the study. Because there were no predefined groups for the journals, they had to be categorized according to the distribution of values of the quality attributes. The K-means clustering algorithm was therefore applied to the dataset using the WEKA tool, producing five clusters: excellent, good, fair, poor and very poor. To find the best quality metrics, Pearson’s and Spearman’s correlation coefficients between each attribute and JIF were calculated using IBM SPSS Statistics 20, and JIF, SJR and SNIP were found to be the best attributes for measuring the quality of those journals, based on their high coefficient values. A more effective clustering model was then developed using only those three attributes, with an accuracy of 0.9171, sensitivity of 1.0000, specificity of 0.9126, F-measure of 0.5556 and G-mean of 0.9553.
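The five evaluation measures quoted above are all functions of a binary confusion matrix. The sketch below shows the standard definitions, assuming a one-versus-rest reading in which one cluster is treated as the positive class. The function name `binary_metrics` and the counts in the usage lines are hypothetical: the abstract does not give the study's confusion matrix, and the counts were chosen only because they yield values matching the reported figures to four decimal places.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return standard evaluation measures from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # F-measure: harmonic mean of precision and sensitivity
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        # G-mean: geometric mean of sensitivity and specificity
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical counts for illustration only, not taken from the study.
metrics = binary_metrics(tp=10, fp=16, tn=167, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Two relationships are visible in this form. The G-mean is fully determined by sensitivity and specificity, so the reported 0.9553 follows from the reported 1.0000 and 0.9126. A sensitivity of 1.0000 alongside an F-measure of only 0.5556 implies low precision, that is, a comparatively large number of false positives for the class being evaluated.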