Comparison of the Textual Similarity of Representation Elements (Title, Abstract, and Keyword) of Articles in the Citation Network of References with a Research Proposal

Document Type: Research Paper

Authors

1 PhD Candidate in Knowledge & Information Science, Shiraz University, Shiraz, Iran

2 Associate Professor, Department of Knowledge & Information Science, Faculty of Education & Psychology, Shiraz University, Shiraz, Iran.

3 Associate Professor, School of Information Studies, Charles Sturt University, Wagga Wagga, Australia.

Abstract

Purpose: This study compares the representative elements (title, abstract, and keywords) of articles in the citation network of research proposals' references with the elements of the proposals themselves. A second goal is to calculate a weighted average for each representative element (title, abstract, and keywords) in terms of textual similarity.
Methodology: This applied, quantitative study uses citation analysis and content analysis. The sample comprises 3019 articles extracted from the citation networks of 31 proposals written by graduate students (M.Sc. and Ph.D.) in Chemistry at Shiraz University. The titles of all English-language articles in the proposals' references were searched in the Web of Science database, and the record of each article, together with the records of all articles in its citation network, was saved in Excel format. The retrieved files were then merged into one file and sorted by citation count to obtain a single citation network for each proposal. Because some proposals had extensive citation networks of more than a thousand articles, only the 100 most-cited articles in each network were analyzed, to keep the networks uniform and balanced across proposals. Next, the textual similarity of the representative elements of these 100 most-cited articles was calculated against the proposal's title, the proposal's text, and the titles of the proposal's references. Textual similarity was measured as cosine similarity, using software developed in the Python programming language.
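The selection and similarity steps described in the Methodology can be sketched in Python. This is a minimal illustration and not the authors' software: the abstract does not specify the tokenization, the term weighting, or the data layout, so the regex tokenizer, the raw term-frequency vectors, and the record fields `citations`, `title`, `abstract`, and `keywords` used below are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between raw term-frequency vectors of two texts."""
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_cited(articles, n=100):
    """Keep the n articles with the highest citation counts."""
    return sorted(articles, key=lambda a: a["citations"], reverse=True)[:n]


def element_similarities(article, proposal_text):
    """Similarity of each representative element to one proposal element."""
    return {
        element: cosine_similarity(article[element], proposal_text)
        for element in ("title", "abstract", "keywords")
    }
```

Calling `element_similarities` once per proposal element (title, text, reference titles) for each article returned by `top_cited` would yield the per-element scores that the study compares. A TF-IDF weighting could be substituted for the raw counts without changing the structure.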
 
Findings: The Kruskal-Wallis test showed a significant difference among the articles' representative elements in their textual similarity to the proposals' title, text, and reference titles. In all three cases, the articles' abstracts were the most similar to the proposal elements, followed by the titles and then the keywords. In addition, the weighted average of each representative element was calculated: 0.62 for the abstract, 0.5 for the title, and 0.22 for the keywords.
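For readers unfamiliar with the test named in the Findings, the following Python sketch computes the tie-corrected Kruskal-Wallis H statistic for several groups of scores, for example the similarity scores of titles, abstracts, and keywords. It illustrates the standard formula only; the abstract does not state which statistical package the authors used, and the function names are hypothetical.

```python
import math
from collections import Counter


def kruskal_wallis_h(*groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for the groups."""
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)

    # Give tied values the average of the 1-based ranks they occupy.
    rank_of = {}
    start = 0
    while start < n_total:
        end = start
        while end + 1 < n_total and pooled[end + 1] == pooled[start]:
            end += 1
        rank_of[pooled[start]] = (start + end) / 2 + 1
        start = end + 1

    rank_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    # Standard correction for ties; it is zero only if all values are equal.
    tie_sum = sum(t**3 - t for t in Counter(pooled).values())
    correction = 1 - tie_sum / (n_total**3 - n_total)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom: exp(-h / 2)."""
    return math.exp(-h / 2)
```

The closed-form p-value applies only when exactly three groups are compared (two degrees of freedom) and relies on the usual chi-square approximation, which is more reliable for larger samples.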
Conclusion: Although various platforms measure the similarity between retrieved documents and the documents a user wants, they still fall short of the ideal. No previous research had used the representative elements of articles in the citation network of a proposal's references to measure textual similarity with the proposal's elements, or had evaluated their capability for this purpose. The confirmed textual similarity between these representative elements and the proposals' elements indicates that a student's proposal can serve as a basis for recommending related articles. Hence, designers of scientific recommender systems, scientific information retrieval systems, digital libraries, and scientific social networks such as LinkedIn, Academia, and ResearchGate can use the elements of articles' citation networks to recommend related articles. Treating the articles' representative elements as independent units is also important beyond similarity measurement: it supports keyword expansion and suggesting appropriate journals to authors for publishing their articles. Given the weights determined for the representative elements, designers who want to increase the efficiency of information systems are advised to measure similarity on the abstracts and titles of articles rather than on full texts treated as a single unit. This saves time, resources, and energy, yields better results, and lets users reach the information they want more easily and quickly. Likewise, when indexing articles in databases and search engines, abstracts and titles can be prioritized to save financial resources and energy.
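One way a system designer might act on this suggestion is sketched below in Python. The three weights are the values reported in the Findings; combining per-element similarities into a single weighted mean and ranking by it is an assumption made here for illustration, since the abstract does not describe how the weights would be applied. The function names and the `(article_id, similarities)` input shape are hypothetical.

```python
# Weights reported in the Findings; using them to rank articles is an assumption.
ELEMENT_WEIGHTS = {"abstract": 0.62, "title": 0.5, "keywords": 0.22}


def weighted_similarity(similarities, weights=ELEMENT_WEIGHTS):
    """Weighted mean of per-element similarity scores, each in [0, 1]."""
    used = [element for element in weights if element in similarities]
    total_weight = sum(weights[element] for element in used)
    if total_weight == 0:
        return 0.0  # no weighted element was supplied
    return sum(weights[e] * similarities[e] for e in used) / total_weight


def rank_articles(candidates, weights=ELEMENT_WEIGHTS):
    """Sort (article_id, similarities) pairs by descending weighted score."""
    return sorted(
        candidates,
        key=lambda pair: weighted_similarity(pair[1], weights),
        reverse=True,
    )
```

Because elements missing from `similarities` are skipped, the same function covers the abstract-and-title-only setup recommended above: passing only those two scores renormalizes over their weights.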
