Correlation between Quantitative Citation Analysis and Opinion Mining of Citation Contexts

Document Type: Research Paper

Authors

1 MA Student in Knowledge and Information Science, Shiraz University, Shiraz, Iran.

2 Ph.D. in Knowledge and Information Science, Assistant Professor, Islamic World Science Citation Center (ISC).

3 Ph.D. in Knowledge and Information Science, Associate Professor, Shiraz University, Shiraz, Iran.

4 Ph.D. in Knowledge and Information Science, Associate Professor, Shiraz University, Shiraz, Iran.

Abstract

Purpose: Quantitative citation analysis fails to take into account the different motivations behind citations, which may be neutral, confirmative, or negational. New methods and techniques are therefore needed to evaluate cited documents based on the attitudes of their citing articles, and so to increase the accuracy of the results of the quantitative approach. Content analysis of citations, including citation opinion mining, is believed to partially answer this challenge. Citation opinion mining extracts and analyzes the sentiment words occurring in citations or citation contexts, i.e., the window of words surrounding a given citation within a citing paper. However, there is little evidence on the degree to which the results of the quantitative and content-based approaches diverge or converge. To provide further evidence, the present study investigated the correlation between the results of these two citation-analysis approaches.
Methodology: Using citation analysis with both the quantitative and the opinion-mining approaches, this study explored a sample of 524 medical papers. Their bibliographic information and citations were extracted from PubMed and CoLil, respectively. In total, 3663 citations were identified, of which 3639 had contexts available through CoLil. The citation contexts were processed using the KNIME data mining platform. The opinion scores of the words were extracted from SentiWords. The opinion carried by each citation was measured in terms of the polarity and strength of the average opinion score of its words. The data were then analyzed using Spearman correlation.
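The scoring step described above can be sketched roughly as follows. This is a minimal illustration, not the study's KNIME workflow: the lexicon `TOY_LEXICON`, its scores, and the function names are invented for the example, and it assumes that only words found in the lexicon contribute to the average, which the abstract does not specify.

```python
import re

# Hypothetical toy lexicon mapping words to opinion scores.
# These values are invented and do not reproduce SentiWords entries.
TOY_LEXICON = {"effective": 0.5, "robust": 0.25, "fail": -0.5, "poor": -0.75}


def average_opinion_score(context, lexicon):
    """Average the lexicon scores of the words in one citation context.

    Assumption: words absent from the lexicon are ignored, and a context
    with no lexicon words gets a score of 0.0.
    """
    tokens = re.findall(r"[a-z]+", context.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def polarity(score):
    """Map an average score to a polarity label by its sign."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def strength(score):
    """Use the magnitude of the average score as the opinion strength."""
    return abs(score)


example = "The method is effective and robust"
score = average_opinion_score(example, TOY_LEXICON)
print(polarity(score), strength(score))
```

In this sketch, polarity is the sign of the average score and strength is its absolute value; a real pipeline would also need lemmatization and part-of-speech handling to match lexicon entries.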
Findings: The citations were found to carry numerous sentiment words. They were mostly positive in polarity; however, the number of citations with negative polarity was also considerable. Citation counts were directly and strongly correlated with the absolute count of opinionated citations. However, they were inversely and weakly correlated with the relative count of opinionated citations, i.e., the number of opinionated citations normalized by the total number of citations. Furthermore, citation counts had an insignificant correlation with the relative frequency of positive citations, while displaying significant direct relationships with negative and neutral ones. Moreover, they were inversely associated with the average opinion scores.
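To make the distinction between absolute and relative counts concrete, the sketch below computes a Spearman rank correlation by hand. The numbers are invented toy data chosen only to show how the two correlations can have opposite signs; they are not the study's data, and the helper names (`relative_share`, `average_ranks`, `spearman_rho`) are hypothetical.

```python
def relative_share(opinionated, total):
    """Opinionated citations normalized by the total citation count."""
    return opinionated / total


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Assumption: neither input is constant (otherwise the denominator is 0).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Invented toy data: four cited papers.
citation_counts = [2, 5, 10, 20]
opinionated_counts = [2, 4, 6, 8]
relative_counts = [relative_share(o, c)
                   for o, c in zip(opinionated_counts, citation_counts)]

print(spearman_rho(citation_counts, opinionated_counts))  # direct relationship
print(spearman_rho(citation_counts, relative_counts))     # inverse relationship
```

In the toy data the absolute count of opinionated citations rises with the citation count while its share of all citations falls, which is the pattern the findings describe, though with a much weaker inverse relationship than this idealized example.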
Conclusion: In general, the findings of this research showed that as the number of citations increases, the relative number of opinionated contexts decreases. The contexts were dominated by positive polarity, which is in line with previous studies revealing a confirmative motivation in citation behavior. The predominance of positive opinions implies explicit and implicit confirmation motives of researchers, reflected in the citing works: the citer may explicitly express her opinion about the cited article and its features, or implicitly express her approval by citing it (e.g., by using its algorithms, methods, tools, or findings).
According to the findings, as citation counts increase, the cited papers see a reduction in the relative number of their opinionated contexts, while experiencing an increase in their negative and neutral ones. Consequently, content-based citation analysis with an opinion-mining approach may be able to adjust the results of the quantitative approach. However, this finding and its generalizability should be treated with caution, because the sample of the current research was not selected randomly. Given the differences between disciplines and scientific communities in their citation behavior, the research needs to be replicated in various contexts to support the results. Moreover, the main challenge of the dictionary-based opinion mining method applied in the present study is detecting negative opinions precisely. In negative citations, the negative opinions of citers may be mingled with their reports of negative objective findings. In other words, the method cannot precisely distinguish between these two types of opinion contexts, i.e., the negative attitudes of the citers and their narration of negative findings. Furthermore, citations are social in nature. Negative citations are therefore mostly hidden and indirect, which makes their lexical identification quite difficult. Thus, advanced methods using, for example, machine learning algorithms are required to detect and analyze implicit and indirect negative opinions that the direct natural language method may fail to capture.

Keywords

