Topic Modeling of Endocrinology and Metabolism Articles by Iranian Researchers in the Web of Science

Document Type: Research Paper

Authors

1 PhD student in Information and Knowledge Science, Department of Communication and Knowledge Sciences, Science and Research Branch, Islamic Azad University, Tehran, Iran.

2 Professor, Department of Communication and Knowledge Sciences, Islamic Azad University, Tehran, Iran.

3 Assistant Professor of Applied Mathematics, Islamic Azad University, South Tehran Branch.

4 Associate Professor, Islamic Azad University, Science and Research Branch.

Abstract

Purpose: Probabilistic topic modeling comprises a set of algorithms whose main purpose is to discover the latent topic structure in a large collection of documents. This study applies topic modeling to articles by Iranian researchers in the field of endocrinology and metabolism that are indexed in the Web of Science citation database.
Methodology: This is an applied study conducted using text mining and content analysis. All required data were retrieved from the Web of Science citation database using keywords drawn from the Medical Subject Headings (MeSH), with no lower time limit, up to November 6, 2018. The whole set of documents was then analyzed in MATLAB using the latent Dirichlet allocation (LDA) algorithm.
Findings: Ten subject categories (topics) were extracted, each consisting of a group of 20 words. Endocrinologists then named the subject categories according to their relationship to topics within endocrinology and metabolism, so that each category was assigned a subject title.
Conclusion: The results indicate that the latent Dirichlet allocation model performs acceptably in identifying subject categories in endocrinology and metabolism. The extracted subject categories are homogeneous and thematically related to one another.
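
To illustrate the kind of procedure the abstract describes (fitting a latent Dirichlet allocation model and listing the top words of each subject category), the following is a minimal sketch in Python rather than the MATLAB workflow used in the study. It assumes collapsed Gibbs sampling as the inference method, which the abstract does not specify. The function names `fit_lda` and `top_words`, the hyperparameters `alpha` and `beta`, and the four-document toy corpus are all hypothetical and chosen only for illustration. The study itself extracted 10 categories of 20 words each; the toy run uses 2 topics and 3 words because the corpus is tiny.

```python
import random


def fit_lda(docs, num_topics=10, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method).

    docs is a list of tokenized documents (lists of strings).
    alpha and beta are assumed symmetric Dirichlet hyperparameters.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # tokens per (document, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                old = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][old] -= 1
                topic_word[old][v] -= 1
                topic_total[old] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = new
                doc_topic[d][new] += 1
                topic_word[new][v] += 1
                topic_total[new] += 1

    return vocab, topic_word, doc_topic


def top_words(vocab, topic_word, n=20):
    """Return the n most frequently assigned words for each topic."""
    ranked = []
    for counts in topic_word:
        order = sorted(range(len(vocab)), key=lambda v: counts[v], reverse=True)
        ranked.append([vocab[v] for v in order[:n]])
    return ranked


# Hypothetical toy corpus; the study used article records retrieved from Web of Science.
docs = [
    "insulin glucose diabetes insulin glycemic".split(),
    "thyroid hormone tsh thyroid nodule".split(),
    "glucose diabetes insulin resistance".split(),
    "tsh thyroid hormone hypothyroidism".split(),
]
vocab, topic_word, doc_topic = fit_lda(docs, num_topics=2, iterations=100)
for k, words in enumerate(top_words(vocab, topic_word, n=3)):
    print(f"Topic {k}: {', '.join(words)}")
```

In a setting like the one reported here, the word lists printed for each topic would be what domain experts inspect in order to assign a subject title to each category.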
