Keywords: COVID-19; coronavirus; deep learning; infodemic; infodemiology; information retrieval; literature; multistage retrieval; neural search; online information

MeSH terms: Algorithms; COVID-19; Humans; Information Storage and Retrieval; Language; SARS-CoV-2

Source: DOI: 10.2196/30161

Abstract:
The COVID-19 global health crisis has led to an exponential surge in published scientific literature. In an attempt to tackle the pandemic, extremely large COVID-19-related corpora are being created, sometimes containing inaccurate information, at a scale that is beyond human analysis.
In the context of searching for scientific evidence in the deluge of COVID-19-related literature, we present an information retrieval methodology for effectively identifying relevant sources to answer biomedical queries posed in natural language.
Our multistage retrieval methodology combines probabilistic weighting models and reranking algorithms based on deep neural architectures to boost the ranking of relevant documents. The similarity between each COVID-19 query and the documents is computed, and a series of postprocessing methods is applied to the initial ranking list to improve the match between the query and the biomedical information source and to boost the position of relevant documents.
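The two stages described above can be illustrated with a small Python sketch. It is a minimal, self-contained sketch, not the authors' implementation, and several choices in it are assumptions: a naive whitespace tokenizer, common BM25 parameter defaults (k1 = 1.2, b = 0.75) with a Lucene-style IDF, min-max score normalization, and a linear interpolation weight `alpha` for combining first-stage and reranker scores (an illustrative fusion rule). The deep neural reranker is replaced by a hypothetical `toy_neural_score` function based on token overlap; a real system would call a transformer-based model at that point.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer (assumption); real systems also stem and drop stopwords.
    return text.lower().split()


class BM25:
    """Okapi BM25 over an in-memory corpus, used here as the first retrieval stage."""

    def __init__(self, docs: Dict[str, str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.lengths = {doc_id: sum(tf.values()) for doc_id, tf in self.tfs.items()}
        self.avgdl = sum(self.lengths.values()) / len(self.lengths)
        df: Counter = Counter()
        for tf in self.tfs.values():
            df.update(tf.keys())  # count each term once per document
        n = len(self.tfs)
        # Lucene-style BM25 IDF; the +1 inside the log keeps it non-negative.
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def score(self, query: str, doc_id: str) -> float:
        tf, dl = self.tfs[doc_id], self.lengths[doc_id]
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            # Saturating term frequency with document-length normalization.
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / denom
        return total

    def search(self, query: str, k: int) -> List[Tuple[str, float]]:
        scored = [(d, self.score(query, d)) for d in self.tfs]
        scored.sort(key=lambda x: x[1], reverse=True)
        return scored[:k]


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    # Rescale scores to [0, 1] so BM25 and reranker scores are comparable.
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def rerank(query: str, first_stage: List[Tuple[str, float]],
           neural_score: Callable[[str, str], float],
           alpha: float = 0.5) -> List[Tuple[str, float]]:
    # Second stage: rescore only the first-stage candidates, then interpolate.
    bm25 = min_max(dict(first_stage))
    neural = min_max({d: neural_score(query, d) for d, _ in first_stage})
    fused = {d: alpha * bm25[d] + (1 - alpha) * neural[d] for d in bm25}
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)


# Usage with toy documents (hypothetical data).
docs = {
    "d1": "remdesivir treatment outcomes in hospitalized covid-19 patients",
    "d2": "mask wearing and transmission of sars-cov-2",
    "d3": "covid-19 vaccine efficacy in older adults",
}


def toy_neural_score(q: str, doc_id: str) -> float:
    # Hypothetical stand-in for a neural reranker: Jaccard overlap of token sets.
    qt, dt = set(tokenize(q)), set(tokenize(docs[doc_id]))
    return len(qt & dt) / len(qt | dt)


index = BM25(docs)
query = "covid-19 vaccine efficacy"
candidates = index.search(query, k=2)
reranked = rerank(query, candidates, toy_neural_score)
print(reranked)  # d3 ranks first
```

Restricting the expensive second stage to the top-k BM25 candidates is what keeps a neural reranker affordable on a large corpus; `alpha` controls how much the bag-of-words evidence is retained after reranking.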
The methodology was evaluated in the context of the TREC-COVID challenge, achieving results competitive with those of the top-ranking teams participating in the competition. In particular, the combination of bag-of-words and deep neural language models significantly outperformed an Okapi Best Match 25 (BM25)-based baseline, retrieving, on average, 83% of relevant documents in the top 20.
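The headline figure refers to a top-20 cutoff. One common way to score a ranking at such a cutoff is precision at k (P@k), the fraction of the top k retrieved documents judged relevant, averaged over topics. The sketch below assumes that reading and assumes relevance judgments have already been reduced to a set of relevant document IDs per topic; the topic and document IDs in the usage lines are toy data, not TREC-COVID results.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 20) -> float:
    """Fraction of the top-k ranked document IDs that are judged relevant.

    Following the usual TREC convention, the denominator is always k, so a
    ranking shorter than k is treated as if the missing slots were nonrelevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / k


def mean_precision_at_k(runs: Dict[str, Sequence[str]],
                        qrels: Dict[str, Set[str]], k: int = 20) -> float:
    """Average P@k over the topics that have relevance judgments."""
    topics = [t for t in runs if t in qrels]
    if not topics:
        raise ValueError("no judged topics in the run")
    return sum(precision_at_k(runs[t], qrels[t], k) for t in topics) / len(topics)


# Toy run and judgments (hypothetical IDs).
runs = {"t1": ["a", "b", "c", "d"], "t2": ["e", "f"]}
qrels = {"t1": {"a", "c"}, "t2": {"f", "x"}}
print(mean_precision_at_k(runs, qrels, k=4))  # 0.375
```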
These results indicate that multistage retrieval supported by deep learning could enhance the identification of literature for COVID-19-related questions posed in natural language.