{Reference Type}: Journal Article {Title}: Bridging the gap in author names: building an enhanced author name dataset for biomedical literature system. {Author}: Zhang L;Song N;Gui S;Wu K;Lu W; {Journal}: J Am Med Inform Assoc {Volume}: 31 {Issue}: 8 {Year}: 2024 Aug 1 {Factor}: 7.942 {DOI}: 10.1093/jamia/ocae127 {Abstract}: OBJECTIVE: Author name incompleteness, in which only the first initial rather than the full first name is available, is a long-standing problem in MEDLINE and has a negative impact on biomedical literature systems. The purpose of this study is to create an Enhanced Author Names (EAN) dataset for MEDLINE that maximizes the number of complete author names.
METHODS: The EAN dataset is built through large-scale comparison and restoration of author names collected from multiple literature databases, including MEDLINE, Microsoft Academic Graph, and Semantic Scholar. We assess the impact of EAN on biomedical literature systems by conducting comparative and statistical analyses between EAN and MEDLINE's author names dataset (MAN) on two important tasks: author name search and author name disambiguation.
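The abstract does not spell out the comparison rules, so the following is only a minimal sketch of what restoring an abbreviated name from another database could look like. It assumes a hypothetical record-level match in which MEDLINE and a second source list the same authors in the same order, and restores a name only when the last name and first initial agree. The `AuthorName` type, the `compatible` rule, and the `restore_names` function are illustrative inventions, not the EAN pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AuthorName:
    last: str
    first: str  # may be a bare initial such as "J"

    def is_abbreviated(self) -> bool:
        # Simplified rule: a first-name field of at most one letter
        # (optionally followed by ".") is treated as an initial.
        return len(self.first.rstrip(".")) <= 1


def compatible(abbrev: AuthorName, full: AuthorName) -> bool:
    """Hypothetical rule: same last name (case-insensitive) and same first initial."""
    if abbrev.last.casefold() != full.last.casefold():
        return False
    if not abbrev.first or not full.first:
        return False
    return abbrev.first[0].casefold() == full.first[0].casefold()


def restore_names(medline: List[AuthorName], other: List[AuthorName]) -> List[AuthorName]:
    """Replace abbreviated MEDLINE names with compatible full names from another source.

    Assumes both author lists describe the same article in the same order
    (a hypothetical simplification); if the author counts differ, nothing is restored.
    """
    if len(medline) != len(other):
        return list(medline)
    restored = []
    for m, o in zip(medline, other):
        if m.is_abbreviated() and not o.is_abbreviated() and compatible(m, o):
            restored.append(o)  # full name recovered from the other source
        else:
            restored.append(m)  # keep the MEDLINE name unchanged
    return restored


# Usage with made-up names: only the abbreviated MEDLINE entry is replaced.
medline = [AuthorName("Smith", "J"), AuthorName("Garcia", "Maria")]
other = [AuthorName("Smith", "John"), AuthorName("Garcia", "M")]
print(restore_names(medline, other))
```

Requiring both the last name and the initial to agree keeps the sketch conservative: a mismatch leaves the original name in place rather than risking a wrong restoration.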
RESULTS: Evaluation results show that EAN increases the number of full author names in MEDLINE from 69.73 million to 110.9 million. EAN not only restores a substantial number of abbreviated names from before 2002, when the NLM changed its author name indexing policy, but also improves the availability of full author names in articles published afterward. Evaluations on the author name search and author name disambiguation tasks show that EAN significantly improves both tasks compared with MAN.
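For scale, the two reported counts imply the following absolute and relative gain in full author names (simple arithmetic on the figures above, not a number stated by the authors):

```latex
\Delta = 110.9 - 69.73 = 41.17 \ \text{million}, \qquad
\frac{\Delta}{69.73} = \frac{41.17}{69.73} \approx 0.590 \ (\approx 59\%)
```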
CONCLUSIONS: The extensive coverage of full names in EAN suggests that the name incompleteness issue can be largely mitigated. This has significant implications for the development of an improved biomedical literature system. EAN is available at https://zenodo.org/record/10251358, and an updated version is available at https://zenodo.org/records/10663234.