Keywords: SemMedDB; biomedical knowledge; confounders; directed acyclic graph (DAG); knowledge graph

MeSH: Data Mining / methods; Humans; Knowledge Bases; MEDLINE

Source: DOI:10.1111/jebm.12602

Abstract:
OBJECTIVE: Health researchers need a systematic understanding of third-party variables that influence both the exposure and the outcome under investigation, as represented in a directed acyclic graph (DAG). DAGs are traditionally constructed from literature review and expert knowledge, a process that is often neither systematic nor consistent and can therefore introduce bias. We introduce an automated approach to building a network that links variables of interest.
METHODS: Large-scale text mining of the medical literature was used to construct a conceptual network based on the Semantic MEDLINE Database (SemMedDB). SemMedDB is a PubMed-scale repository of triples in "concept-relation-concept" format. Relations between concepts are categorized as Excitatory, Inhibitory, or General.
RESULTS: To facilitate the use of the large-scale triple sets in SemMedDB, we developed a computable biomedical knowledge (CBK) system (https://cbk.bjmu.edu.cn/), a website that enables direct retrieval of related publications and their corresponding triples without writing SQL statements. Three case studies are presented to demonstrate applications of the CBK system.
CONCLUSIONS: The CBK system is openly available and user-friendly. It supports rapidly capturing a set of influencing factors for a phenotype and building candidate DAGs between exposure and outcome variables. It could be a valuable tool for reducing the time spent exploring relationships between variables and constructing a DAG. A reliable and standardized DAG could significantly improve the design and interpretation of observational health research.
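The idea of using "concept-relation-concept" triples to propose candidate confounders can be illustrated with a minimal Python sketch. This is not the CBK system's implementation, and it does not query SemMedDB. The triples are made-up placeholders, and the function names (`build_graph`, `candidate_confounders`, `is_acyclic`) and the optional filter on relation category are assumptions introduced purely for illustration. The sketch treats each triple as a directed edge, flags concepts that point to both the exposure and the outcome, and checks that the resulting graph contains no cycle.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, relation category, object).
# These are placeholders, not real SemMedDB records.
TRIPLES = [
    ("factor_a", "Excitatory", "exposure"),
    ("factor_a", "Inhibitory", "outcome"),
    ("factor_b", "Excitatory", "outcome"),
    ("exposure", "Excitatory", "mediator"),
    ("mediator", "Excitatory", "outcome"),
    ("factor_c", "General", "exposure"),
    ("factor_c", "General", "outcome"),
]


def build_graph(triples, relations=None):
    """Map each subject to the set of objects it points to.

    If `relations` is given, keep only triples whose category is in it
    (an illustrative filter, not something the source prescribes).
    """
    graph = defaultdict(set)
    for subject, relation, obj in triples:
        if relations is None or relation in relations:
            graph[subject].add(obj)
    return graph


def candidate_confounders(graph, exposure, outcome):
    """Return concepts with an edge to both the exposure and the outcome."""
    return sorted(
        node
        for node, targets in graph.items()
        if node not in (exposure, outcome)
        and exposure in targets
        and outcome in targets
    )


def is_acyclic(graph):
    """Check for cycles with Kahn's topological-sort algorithm."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    indegree = {node: 0 for node in nodes}
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
    queue = [node for node in nodes if indegree[node] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for target in graph.get(node, ()):
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)
    # Every node is visited only if the graph has no cycle.
    return visited == len(nodes)


if __name__ == "__main__":
    g = build_graph(TRIPLES)
    print(candidate_confounders(g, "exposure", "outcome"))
    print(is_acyclic(g))
```

In this toy graph, `mediator` is not flagged because it lies on the path from exposure to outcome rather than pointing to both. Candidates produced this way are only a starting point; deciding which ones belong in a DAG still requires domain judgment.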