Predicting the Impact of Scientific Research in Biotechnology Using Machine Learning Algorithms

Article type: Research article

Author

Assistant Professor, Knowledge and Information Science, Department of Policy Evaluation and Monitoring of Science, Technology, and Innovation, National Research Institute for Science Policy, Tehran, Iran.

Abstract

Purpose: This study examines how the various impact indicators of scientific outputs relate to one another and investigates which machine learning algorithms can predict the scientific, social, and economic impact of scientific outputs.
Methodology: This research is applied in purpose and descriptive in method, and it was conducted with a scientometric approach. The research population consists of Iran's biotechnology outputs indexed in the Scopus database over the period 2003-2024. The SciVal analytical database was used to extract the data. Pearson's correlation coefficient and the R software package were used to determine the relationships among the studied indicators, and multiple linear regression, nearest neighbors, decision trees, random forests, and gradient boosting were applied and evaluated as predictive models. The Python programming language was used to run the tests and algorithms.
Findings: Regarding the scientific impact of the studied outputs, citation counts had a positive and significant relationship with several indicators. In terms of economic impact, the number of patent citations, one of the indicators representing this type of impact, had a positive and significant relationship with several indicators. Regarding social impact, view counts also have a positive and significant relationship with many indicators. According to the results, multivariate linear regression, with a higher accuracy score and a lower standard deviation score, was better able to predict the scientific, technological, and social impact of Iran's scientific outputs in biotechnology.
Conclusion: International collaboration was the most important factor influencing the quality of articles, including citations, views, and applicability, and measures need to be devised in this regard. It is recommended that a diverse set of indicators be used when evaluating articles quantitatively and qualitatively, so that a clearer picture of research impact is obtained.

Article title [English]

Predicting Scientific Research Impacts in Biotechnology by Machine Learning Algorithms

Author [English]

  • Ghasem Azadi Ahmadabadi
Assistant Professor, Policy Evaluation and Monitoring of Science, Technology, and Innovation Department, National Research Institute for Science Policy, Tehran, Iran.
Abstract [English]

Purpose: Research impact is a key concern for stakeholders, as it reflects the beneficial and profitable applications of research across multiple dimensions, including society, economy, environment, culture, and health. This study aims to analyze the interrelationships among variables influencing scientific outputs and to identify the most effective machine learning algorithms for predicting their scientific, social, and economic impacts.
Methodology: The current research is applied in purpose and descriptive in method, utilizing a scientometric approach. This study aims to explore the relationship between the quantity of scientific publications and scientific cooperation, as well as the scientific, social, and economic impact of Iran's scientific contributions in the field of biotechnology. Additionally, it seeks to determine which machine learning algorithms are more effective in predicting the impact of scientific outputs across various dimensions. The research focuses on Iran's biotechnology scientific outputs indexed in the Scopus database from 2003 to 2024. The data, extracted on 21 January 2024, were analyzed using the SciVal analytical database. In this research, Pearson's correlation coefficient and the R software package were used to examine the relationships between the studied indicators. Machine learning algorithms, including multiple linear regression, nearest neighbors, decision trees, random forests, and gradient boosting, were applied and evaluated as predictive models. The Python programming language was employed to conduct tests and implement these algorithms.
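The correlation step described in the methodology can be illustrated with a minimal sketch. It computes Pearson's correlation coefficient between two indicators and the associated t statistic, from which a p-value would be read off a Student's t distribution with n - 2 degrees of freedom. The indicator names and values below are hypothetical and are not taken from the study, which used R for this step.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0; compare against Student's t with n - 2 df."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical yearly values for two indicators (illustration only).
international_collaboration = [1, 2, 3, 4, 5]
citation_count = [2, 4, 5, 4, 5]

r = pearson_r(international_collaboration, citation_count)
t = t_statistic(r, len(citation_count))
print(f"r = {r:.4f}, t = {t:.4f}")
```

A relationship is reported as positive and significant when r is greater than zero and the p-value derived from t falls below the chosen significance level.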
Findings: Iran's scientific outputs in this field increased 36-fold between 2003 and 2023, a very high rate of growth. A positive and significant relationship was observed between international collaboration and various impact indicators, including citation counts, Field-Weighted Citation Impact, Output in Top 10% Citation Percentiles, Patent-Citations Count, Patent-Citations per Scholarly Output, Scholarly Output cited by Patents, Patents Count, Views Count, Output in Top 10% Views Percentiles, Views per Publication, and Field-Weighted Views Impact. The academic collaboration index also shows a positive and significant relationship with several indicators, including citation counts, Field-Weighted Citation Impact, Output in the Top 10% Citation Percentiles, Publications in the Top 10% Journal Percentiles (by CiteScore Percentile), Patents Count, Scholarly Output cited by Patents, Views Count, and Field-Weighted Views Impact. Academic-government collaboration is also positively and significantly correlated with three indicators: Citations per Publication, Patent-Citations Count, and Patent-Citations per Scholarly Output. Regarding the scientific impact of the studied outputs, citation counts are positively and significantly associated with several indicators, including Scholarly Output cited by Patents, Patents Count, Views Count, Views per Publication, and Field-Weighted Views Impact in the field of biotechnology. In terms of economic impact, the results indicated that the number of patent citations is a key representative indicator.
This indicator showed positive and significant relationships with other metrics, including academic collaboration, international collaboration, citation counts, Citations per Publication, Field-Weighted Citation Impact, Views Count, Output in Top 10% Views Percentiles, Views per Publication, and Field-Weighted Views Impact in the field of biotechnology. Regarding social impact, the analysis revealed that Views Count is positively and significantly correlated with several indicators, including citation counts, Field-Weighted Citation Impact, the number of patent citations, Patent-Citations per Scholarly Output, Scholarly Output cited by Patents, and Patents Count in biotechnology. Based on the results, multivariate linear regression demonstrated higher accuracy and a lower standard deviation, making it the more effective model for predicting the scientific, technological, and social impact of Iran's scientific outputs in biotechnology.
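One way to read "higher accuracy and a lower standard deviation" is as the mean and spread of a score across cross-validation folds. The abstract does not state the evaluation protocol, so the sketch below is an assumption: it uses 5-fold cross-validation with R² as the score, and it compares only two simplified models (a single-predictor least-squares line and a 3-nearest-neighbour regressor) on synthetic data. The variable names and data are hypothetical, not the study's.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns a predict function."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regression: average the targets of the k closest points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean(y for _, y in nearest)

    return predict


def r_squared(ys, preds):
    """Coefficient of determination on held-out data."""
    my = statistics.fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def cross_validate(fit, xs, ys, folds=5):
    """Return (mean, standard deviation) of R^2 over interleaved folds."""
    scores = []
    for f in range(folds):
        test_idx = set(range(f, len(xs), folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        test_x = [xs[i] for i in sorted(test_idx)]
        test_y = [ys[i] for i in sorted(test_idx)]
        model = fit(train_x, train_y)
        scores.append(r_squared(test_y, [model(x) for x in test_x]))
    return statistics.fmean(scores), statistics.stdev(scores)


# Synthetic data (illustration only): a noisy linear relationship.
rng = random.Random(0)
collaboration = [rng.uniform(0, 50) for _ in range(60)]
citations = [3.0 * c + 10.0 + rng.gauss(0, 5) for c in collaboration]

results = {
    "linear regression": cross_validate(fit_line, collaboration, citations),
    "3-nearest neighbours": cross_validate(fit_knn, collaboration, citations),
}
for name, (mean_r2, sd_r2) in results.items():
    print(f"{name}: mean R^2 = {mean_r2:.3f}, SD = {sd_r2:.3f}")
```

Under this reading, the preferred model is the one with the higher mean score and the smaller standard deviation across folds. The same loop extends to decision trees, random forests, and gradient boosting when a library that implements them is available.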
Conclusion: International cooperation is the most significant factor influencing the quality of articles, including metrics such as citations, views, and applicability. Therefore, it is crucial to explore and implement strategies to enhance such collaborations. To better assess the effectiveness of research, it is recommended to employ a diverse set of indicators during both quantitative and qualitative evaluations of articles. For policymakers in science and technology, the primary focus should be on value creation and generating added value, particularly in the economic sector. This is essential despite the observed quantitative and qualitative growth in Iran's scientific outputs.

Keywords [English]

  • Predicting research impact
  • Machine learning algorithms
  • Scientific research impact
  • Economic research impact
  • Social research impact
 