METHODS: The EAN dataset is built through large-scale name comparison and restoration, using author names collected from multiple literature databases such as MEDLINE, Microsoft Academic Graph, and Semantic Scholar. We assess the impact of EAN on biomedical literature systems by conducting comparative and statistical analyses between EAN and MEDLINE's author names dataset (MAN) on two important tasks: author name search and author name disambiguation.
RESULTS: Evaluation results show that EAN increases the number of full author names in MEDLINE from 69.73 million to 110.9 million. EAN not only restores a substantial number of abbreviated names from before 2002, when the NLM changed its author name indexing policy, but also improves the availability of full author names in articles published afterward. The evaluation of the author name search and author name disambiguation tasks reveals that EAN significantly enhances both tasks compared with MAN.
CONCLUSIONS: The extensive coverage of full names in EAN suggests that the name incompleteness issue can be largely mitigated. This has significant implications for the development of an improved biomedical literature system. EAN is available at https://zenodo.org/record/10251358, and an updated version is available at https://zenodo.org/records/10663234.