Keywords: deep learning; health misinformation; infodemic; information retrieval; language model; transfer learning

Source: DOI:10.2196/42630

Abstract:
BACKGROUND: Widespread misinformation in web resources can have serious consequences for individuals seeking health advice. Despite this, information retrieval models often rank results using only the query-document relevance dimension.
OBJECTIVE: We investigate a multidimensional information quality retrieval model based on deep learning to enhance the effectiveness of online health care information search results.
METHODS: In this study, we simulated online health information search scenarios with a topic set of 32 different health-related inquiries and a corpus containing 1 billion web documents from the April 2019 snapshot of Common Crawl. Using state-of-the-art pretrained language models, we assessed the quality of the documents retrieved for a given search query along 3 dimensions (usefulness, supportiveness, and credibility), using 6030 human-annotated query-document pairs. We evaluated this approach using transfer learning and more specific domain adaptation techniques.
RESULTS: In the transfer learning setting, the usefulness model provided the largest distinction between help- and harm-compatible documents, with a difference of +5.6%, so that the majority of the top 10 retrieved documents were helpful. The supportiveness model achieved the best harm compatibility (+2.4%), while the combination of the usefulness, supportiveness, and credibility models achieved the largest distinction between help and harm compatibility on helpful topics (+16.9%). In the domain adaptation setting, the linear combination of different models showed robust performance, with help-harm compatibility above +4.4% for all dimensions and reaching +6.8%.
CONCLUSIONS: These results suggest that integrating automatic ranking models created for specific information quality dimensions can increase the effectiveness of health-related information retrieval. Thus, our approach could be used to enhance searches made by individuals seeking online health information.
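The linear combination of dimension-specific models mentioned in the results can be illustrated with a minimal sketch. It is not the authors' implementation: the weights, the min-max normalization step, the function names (`min_max`, `combine_and_rank`), and the example scores are all hypothetical, since the abstract reports none of these details. The sketch only shows how per-document usefulness, supportiveness, and credibility scores might be merged into a single ranking.

```python
from typing import Dict, List, Tuple

# Hypothetical weights per quality dimension; the abstract does not report the values used.
WEIGHTS: Dict[str, float] = {
    "usefulness": 0.5,
    "supportiveness": 0.3,
    "credibility": 0.2,
}


def min_max(values: List[float]) -> List[float]:
    """Rescale scores to [0, 1] so dimensions are comparable (an assumed preprocessing step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combine_and_rank(
    scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float] = WEIGHTS,
) -> List[Tuple[str, float]]:
    """Return (doc_id, combined_score) pairs sorted from best to worst."""
    doc_ids = list(scores)
    combined = {doc_id: 0.0 for doc_id in doc_ids}
    for dimension, weight in weights.items():
        normalized = min_max([scores[doc_id][dimension] for doc_id in doc_ids])
        for doc_id, value in zip(doc_ids, normalized):
            combined[doc_id] += weight * value
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Made-up model outputs for three documents retrieved for one query.
example_scores = {
    "doc_a": {"usefulness": 0.9, "supportiveness": 0.2, "credibility": 0.4},
    "doc_b": {"usefulness": 0.5, "supportiveness": 0.8, "credibility": 0.9},
    "doc_c": {"usefulness": 0.1, "supportiveness": 0.1, "credibility": 0.1},
}

ranking = combine_and_rank(example_scores)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In this toy example, `doc_b` ranks above `doc_a` even though `doc_a` has the highest usefulness score, because the supportiveness and credibility scores also contribute to the combined score.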