Robert Jankowski
Author(s):
- Robert Jankowski, Faculty of Physics, Warsaw University of Technology
- dr inż. Julian Sienkiewicz, Faculty of Physics, Warsaw University of Technology
Abstract title:
Crucial factors determining the popularity of scientific articles.
Abstract:
The single most important bibliometric criterion for judging the impact of a scientific paper is the number of citations it receives, commonly known as the "citation count". However, because citation practices depend on the discipline, this metric is unreliable. Sentiment analysis is the field of study that extracts people's opinions, attitudes, and emotions from written language. Not only the citation count but also the sentiment surrounding a citation may be an essential metric. The major objective of this work was to find the criteria that describe the popularity of scientific papers and to determine the popularity threshold.
Around 140k journal articles were gathered from the PLoS ONE database. Additionally, a dataset with the number of views of each article was created. Simple features, such as word counts in the title and abstract, and more complex ones, such as the Gunning fog index, valence, and arousal, were calculated to ascertain the crucial factors of popularity. These features were used to build machine learning models (Support Vector Machines, Random Forests) and to examine the concept of a popularity threshold. Furthermore, the main body of each paper was divided into sections, and the number of citations and the sentiment were computed for each section. Binary classifiers were built using these features. Model quality was measured with the Matthews correlation coefficient and the F1 score. Moreover, dimensionality reduction (PCA) was observed to have only a small influence on the results.
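For readers unfamiliar with the quantities named above, the following is a minimal Python sketch of the Gunning fog index and of the two quality measures (F1 score and Matthews correlation coefficient) for a binary classifier. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the syllable counter is a rough vowel-group heuristic, and "complex" words are taken simply as those with three or more syllables.

```python
import math
import re


def count_syllables(word):
    """Approximate syllables as runs of consecutive vowels (heuristic assumption)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text):
    """Gunning fog index: 0.4 * (words per sentence + 100 * share of complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    # Assumption: a word is "complex" if the heuristic finds >= 3 syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100.0 * len(complex_words) / len(words))


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is zero."""
    tp, _tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0.0 if undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In this sketch a "popular" article would be encoded as label 1 and any other article as 0. Unlike the F1 score, the Matthews correlation coefficient also accounts for true negatives, which makes it informative when the two classes are imbalanced.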