Keywords: MEDLINE; author name completeness; author name disambiguation; author name search

MeSH: MEDLINE; Authorship; Periodicals as Topic; Names

Source: DOI:10.1093/jamia/ocae127

Abstract:
OBJECTIVE: Author name incompleteness, in which only the first initial is available instead of the full first name, is a long-standing problem in MEDLINE that negatively affects biomedical literature systems. The purpose of this study is to create an Enhanced Author Names (EAN) dataset for MEDLINE that maximizes the number of complete author names.
METHODS: The EAN dataset is built through large-scale name comparison and restoration, using author names collected from multiple literature databases such as MEDLINE, Microsoft Academic Graph, and Semantic Scholar. We assess the impact of EAN on biomedical literature systems by conducting comparative and statistical analyses between EAN and MEDLINE's author names dataset (MAN) on 2 important tasks: author name search and author name disambiguation.
RESULTS: Evaluation results show that EAN increases the number of full author names in MEDLINE from 69.73 million to 110.9 million. EAN not only restores a substantial number of abbreviated names from before 2002, when the NLM changed its author name indexing policy, but also improves the availability of full author names in articles published afterward. The evaluation of the author name search and author name disambiguation tasks reveals that EAN significantly enhances both tasks compared to MAN.
CONCLUSIONS: The extensive coverage of full names in EAN suggests that the name incompleteness issue can be largely mitigated. This has significant implications for the development of an improved biomedical literature system. EAN is available at https://zenodo.org/record/10251358, and an updated version is available at https://zenodo.org/records/10663234.
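
The name comparison and restoration idea in METHODS can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' procedure: the function names, the matching rule (same last name, compatible first initial, exactly one full-name candidate), and the input shape (names for the same article taken from other databases) are all hypothetical.

```python
from typing import List, Optional, Tuple


def is_initial_only(first_name: str) -> bool:
    """Return True when the first name is a single initial such as 'J' or 'J.'."""
    return len(first_name.replace(".", "").strip()) == 1


def restore_first_name(
    last: str,
    first: str,
    candidates: List[Tuple[str, str]],
) -> Optional[str]:
    """Try to restore a full first name from candidate (last, first) pairs.

    Hypothetical rule: a candidate is compatible when its last name matches
    (case-insensitive), its first name is not itself an initial, and that
    first name starts with the abbreviated initial. A name is restored only
    when exactly one distinct compatible full name exists.
    """
    if not is_initial_only(first):
        return first  # nothing to restore
    initial = first.replace(".", "").strip().lower()
    matches = {
        cand_first
        for cand_last, cand_first in candidates
        if cand_last.lower() == last.lower()
        and not is_initial_only(cand_first)
        and cand_first.lower().startswith(initial)
    }
    # Ambiguous or missing evidence: leave the name unrestored.
    return matches.pop() if len(matches) == 1 else None


# Hypothetical records for one article gathered from other databases.
other_sources = [("Smith", "John"), ("Lee", "Jane")]
restored = restore_first_name("Smith", "J.", other_sources)
```

Declining to restore when several full names are compatible keeps the sketch conservative; a real pipeline would need additional evidence, such as author position or affiliation, to resolve those cases.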