Article Title [English]
This study proposes a framework for resolving the confusion and dispersion of authors' names in Persian articles, which fragment information retrieval and make it incomplete. The research is an applied scientometric study carried out by the documentary method, with the required data collected from the ISC. The initial statistical population comprises 913 records from the period 1395 to 1397. The proposed framework consists of three stages: search, matching and grouping. After initial pre-processing and feature extraction, the search stage finds records that are potentially identical. The matching stage, which is based on a random forest, then examines these candidates further to determine which records are in fact identical.
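As a rough illustration of how the search, matching and grouping stages could fit together, the following Python sketch runs them on a few toy records. It is not the study's implementation: the record fields, the blocking key (first three letters of the last name), and the rule-based `is_match` function are all assumptions made for the example, and `is_match` merely stands in for the trained random forest classifier.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records; the field names are assumptions, not the ISC schema.
records = [
    {"id": 1, "first": "Ali", "last": "Ahmadi", "email": "a.ahmadi@example.org"},
    {"id": 2, "first": "A.", "last": "Ahmadi", "email": "a.ahmadi@example.org"},
    {"id": 3, "first": "Ali", "last": "Ahmady", "email": ""},
    {"id": 4, "first": "Sara", "last": "Karimi", "email": "s.karimi@example.org"},
]


def block_key(rec):
    # Search stage: a cheap key that puts likely duplicates in the same block.
    return rec["last"].lower()[:3]


def candidate_pairs(recs):
    # Only records that share a block are compared with each other.
    blocks = defaultdict(list)
    for rec in recs:
        blocks[block_key(rec)].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


def is_match(a, b):
    # Matching stage: a hand-written rule standing in for a trained classifier.
    if a["email"] and a["email"] == b["email"]:
        return True
    same_last = a["last"].lower() == b["last"].lower()
    same_initial = a["first"][0].lower() == b["first"][0].lower()
    return same_last and same_initial


def group_records(recs):
    # Grouping stage: merge matched pairs into clusters with union-find.
    parent = {rec["id"]: rec["id"] for rec in recs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in candidate_pairs(recs):
        if is_match(a, b):
            parent[find(a["id"])] = find(b["id"])

    clusters = defaultdict(list)
    for rec in recs:
        clusters[find(rec["id"])].append(rec["id"])
    return sorted(sorted(ids) for ids in clusters.values())


groups = group_records(records)
print(groups)
```

In a setup closer to the one described, `is_match` would be replaced by a classifier's prediction on features extracted from each candidate pair, while the search and grouping steps could stay structurally the same.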
Finding: Email address, last name and first name are the most important features for resolving confusion in the writing of names. Using a random forest as the classifier in the matching stage, with an accuracy of over 99%, can resolve the confusion in the writing of authors' names. The results show that this method is highly efficient at unifying names, according to the criteria of accuracy, recall and F value, compared with the support vector machine, the nearest neighbor and the genetic algorithm.
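The evaluation criteria named above can be computed from pair-level decisions as in the short Python sketch below. The labels are invented for illustration and are not the study's data; the F value is taken here to be the harmonic mean of precision and recall (F1), which is an assumption about the exact definition used.

```python
def pair_metrics(y_true, y_pred):
    # Count outcomes over candidate pairs (1 = same author, 0 = different).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall (assumed definition of "F value").
    denom = precision + recall
    f_value = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "f_value": f_value}


# Hypothetical labels for eight candidate pairs.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = pair_metrics(y_true, y_pred)
print(metrics)
```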