knowledge representation

  • Article type: Journal Article
    BACKGROUND: Social determinants of health (SDoH) have been described by the World Health Organization as the conditions in which individuals are born, live, work, and age. These conditions can be grouped into 3 interrelated levels known as macrolevel (societal), mesolevel (community), and microlevel (individual) determinants. The scope of SDoH expands beyond the biomedical level, and there remains a need to connect other areas such as economics, public policy, and social factors.
    OBJECTIVE: Providing a computable artifact that can link health data to concepts involving the different levels of determinants may improve our understanding of the impact SDoH have on human populations. Modeling SDoH may help to reduce existing gaps in the literature through explicit links between the determinants and biological factors. This in turn can allow researchers and clinicians to make better sense of data and discover new knowledge through the use of semantic links.
    METHODS: An experimental ontology was developed to represent knowledge of the social and economic characteristics of SDoH. Information from 27 literature sources was analyzed to gather concepts and encoded using Web Ontology Language, version 2 (OWL2) and Protégé. Four evaluators independently reviewed the ontology axioms using natural language translation. The analyses from the evaluations and selected terminologies from the Basic Formal Ontology were used to create a revised ontology with a broad spectrum of knowledge concepts ranging from the macrolevel to the microlevel determinants.
    RESULTS: The literature search identified several topics of discussion for each determinant level. Publications for the macrolevel determinants centered around health policy, income inequality, welfare, and the environment. Articles relating to the mesolevel determinants discussed work, work conditions, psychosocial factors, socioeconomic position, outcomes, food, poverty, housing, and crime. Finally, sources found for the microlevel determinants examined gender, ethnicity, race, and behavior. Concepts were gathered from the literature and used to produce an ontology consisting of 383 classes, 109 object properties, and 748 logical axioms. A reasoning test revealed no inconsistent axioms.
    CONCLUSIONS: This ontology models heterogeneous social and economic concepts to represent aspects of SDoH. The scope of SDoH is expansive, and although the ontology is broad, it is still in its early stages. To our current understanding, this ontology represents the first attempt to concentrate on knowledge concepts that are currently not covered by existing ontologies. Future direction will include further expanding the ontology to link with other biomedical ontologies, including alignment for granular semantics.
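The three determinant levels and the idea of a consistency check can be illustrated with a toy sketch. The topic names come from the abstract. The dictionary layout and the assumption that the three levels are pairwise disjoint are illustrative only and do not reproduce the authors' OWL2 model or their reasoner.

```python
# Toy sketch: SDoH topics grouped by determinant level (names from the abstract).
# Treating the three levels as pairwise disjoint is an assumption for illustration.
LEVELS = {
    "macrolevel": {"health policy", "income inequality", "welfare", "environment"},
    "mesolevel": {"work", "work conditions", "psychosocial factors",
                  "socioeconomic position", "outcomes", "food", "poverty",
                  "housing", "crime"},
    "microlevel": {"gender", "ethnicity", "race", "behavior"},
}


def level_of(concept, levels=LEVELS):
    """Return the determinant level(s) under which a concept is asserted."""
    return sorted(name for name, members in levels.items() if concept in members)


def inconsistent_concepts(levels):
    """Concepts asserted under more than one level (a disjointness violation)."""
    seen, clashes = {}, set()
    for name, members in levels.items():
        for concept in members:
            if concept in seen and seen[concept] != name:
                clashes.add(concept)
            seen[concept] = name
    return clashes
```

A real OWL2 reasoner checks far more than disjointness; this sketch only shows the kind of contradiction such a test is meant to surface.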

  • Article type: Journal Article
    Precise semantic representation is important for allowing machines to truly comprehend the meaning of natural language text, especially biomedical literature. Although the semantic relations among words in a single sentence may be accurately represented with existing approaches, relations between two sentences cannot yet be accurately modeled, which leads to a lack of contextual information and difficulty in performing interpretable semantic inference. Additionally, it is challenging to merge semantic representations curated by different experts. These critical challenges are insufficiently addressed by existing methods. In this paper, we present a framework for structured semantic representation (FSSR) to address these issues. FSSR uses a double-layer structure Construct that combines Paradigm and Instance to represent the semantics of a word or a sentence. It uses six types of rules to represent the semantic relations between sentence Constructs and uses a Computational Model to represent an action. FSSR is a graph-based representation of semantics, in which a node represents a Construct or a Paradigm. Two nodes are connected by an edge (a rule). In addition, FSSR enables interpretable inference and active acquisition of new information, as illustrated in a case study. This case study models the semantics of a cancer prognostic analysis article and reproduces its text results and charts. We provide a website that visualizes the inference process (http://cragraph.synergylab.cn).
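The graph view described above (nodes are Constructs or Paradigms, edges are rules) can be sketched as a small data structure. This is a minimal sketch under stated assumptions: the class names `Node` and `SemanticGraph`, the rule label `"implies"`, and the example sentences are hypothetical, since the abstract does not name the six rule types.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # "Construct" or "Paradigm"


@dataclass
class SemanticGraph:
    # Adjacency map: node -> list of (rule label, target node).
    edges: dict = field(default_factory=dict)

    def add_rule(self, source, rule, target):
        self.edges.setdefault(source, []).append((rule, target))
        self.edges.setdefault(target, [])

    def explain(self, start, goal):
        """Breadth-first search returning the chain of rules from start to goal."""
        frontier = [(start, [])]
        visited = {start}
        while frontier:
            node, path = frontier.pop(0)
            if node == goal:
                return path
            for rule, nxt in self.edges.get(node, []):
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, path + [(node.name, rule, nxt.name)]))
        return None


# Hypothetical sentence-level Constructs linked by a hypothetical rule label.
s1 = Node("tumor stage is high", "Construct")
s2 = Node("prognosis is poor", "Construct")
s3 = Node("survival time is short", "Construct")
g = SemanticGraph()
g.add_rule(s1, "implies", s2)
g.add_rule(s2, "implies", s3)
```

Returning the full chain of traversed edges, rather than a yes/no answer, is what makes an inference of this kind interpretable: each step can be shown to the reader.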

  • Article type: Journal Article
    Anecdotally, 38.5% of clinical outcome descriptions in randomized controlled trial publications contain complex text. Existing terminologies are insufficient to standardize outcomes and their measures, temporal attributes, quantitative metrics, and other attributes. In this study, we analyzed the semantic patterns in the outcome text in a sample of COVID-19 trials and presented a data-driven method for modeling outcomes. We conclude that a data-driven knowledge representation can benefit natural language processing of outcome text from published clinical studies.

  • Article type: Journal Article
    BACKGROUND: Knowledge graphs are a common form of knowledge representation in biomedicine and many other fields. We developed an open biomedical knowledge graph-based system termed Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways (ROBOKOP). ROBOKOP consists of both a front-end user interface and a back-end knowledge graph. The ROBOKOP user interface allows users to posit questions and explore answer subgraphs. Users can also posit questions through direct Cypher query of the underlying knowledge graph, which currently contains roughly 6 million nodes or biomedical entities and 140 million edges or predicates describing the relationship between nodes, drawn from over 30 curated data sources.
    OBJECTIVE: We aimed to apply ROBOKOP to survey data on workplace exposures and immune-mediated diseases from the Environmental Polymorphisms Registry (EPR) within the National Institute of Environmental Health Sciences.
    METHODS: We analyzed EPR survey data and identified 45 associations between workplace chemical exposures and immune-mediated diseases, as self-reported by study participants (n=4574), with 20 associations significant at P<.05 after false discovery rate correction. We then used ROBOKOP to (1) validate the associations by determining whether plausible connections exist within the ROBOKOP knowledge graph and (2) propose biological mechanisms that might explain them and serve as hypotheses for subsequent testing. We highlight the following three exemplar associations: carbon monoxide-multiple sclerosis, ammonia-asthma, and isopropanol-allergic disease.
    RESULTS: ROBOKOP successfully returned answer sets for three queries that were posed in the context of the driving examples. The answer sets included potential intermediary genes, as well as supporting evidence that might explain the observed associations.
    CONCLUSIONS: We demonstrate real-world application of ROBOKOP to generate mechanistic hypotheses for associations between workplace chemical exposures and immune-mediated diseases. We expect that ROBOKOP will find broad application across many biomedical fields and other scientific disciplines due to its generalizability, speed to discovery and generation of mechanistic hypotheses, and open nature.
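The abstract reports significance "after false discovery rate correction" without naming the procedure. As a sketch only, the following assumes the Benjamini-Hochberg step-up procedure, a common choice for FDR control; the function name and the example P values in the usage note are hypothetical and are not data from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level alpha.

    Assumed procedure: Benjamini-Hochberg step-up. Find the largest rank k
    (1-based, in ascending P-value order) with p_(k) <= k / m * alpha and
    reject the k hypotheses with the smallest P values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

For example, with the invented P values `[0.001, 0.008, 0.039, 0.041]` and `alpha=0.05`, only the first two pass, because 0.039 exceeds its threshold of 3/4 × 0.05 = 0.0375 and 0.041 is not below 0.05 × 4/4 by rank either way it fails the earlier ranks' step-up bound only if no larger rank passes; here rank 4 does pass (0.041 ≤ 0.05), so all four are rejected. This step-up behavior is the reason the loop keeps the largest passing rank rather than stopping at the first failure.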

  • Article type: Journal Article
    Background: The outbreak of COVID-19 in 2019 rapidly swept the world, causing irreparable loss to human beings. The pandemic has shown that the early response to disease outbreaks is still delayed and that a method for detecting outbreaks of unknown diseases is needed. The study's objective is to establish a new medical knowledge representation and reasoning model and to use the model to explore the feasibility of unknown disease outbreak detection. Methods: The study defined abnormal values with diagnostic significance in clinical data as Features and used the Features as the antecedents of inference rules that are matched against knowledge bases, in order to detect known or emerging infectious disease outbreaks. Meanwhile, the study built a syndromic surveillance base to capture the target cases' Features and improve the reliability and fault tolerance of the system. Results: The study applied the method to Severe Acute Respiratory Syndrome (SARS), Middle East Respiratory Syndrome (MERS), and early COVID-19 outbreaks as empirical studies. The results showed that, with suitable surveillance guidelines, the proposed method was capable of detecting outbreaks of SARS, MERS, and early COVID-19. The quick matching accuracies for confirmed infection cases were 89.1%, 26.3-98%, and 82%, respectively, and the syndromic surveillance base would capture the Features of the remaining cases to ensure overall detection accuracy. Based on the early COVID-19 data in Wuhan, this study estimated that the median time from illness onset to local authorities' responses for early COVID-19 cases could be reduced to 7.0-10.0 days. Conclusions: This study offers a new solution for transferring traditional medical knowledge into structured data and forming diagnosis rules, and it enables the representation of doctors' logical thinking and the transmission of knowledge among different users. The results of the empirical studies demonstrate that, by constantly inputting medical knowledge into the system, the proposed method will be capable of detecting unknown diseases from existing ones and performing an early response to initial outbreaks.
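The matching step described in the Methods (Features as rule antecedents, with unmatched cases passed to syndromic surveillance) can be sketched as follows. Everything here is hypothetical: the disease labels, the Feature names, the `min_overlap` parameter, and the fallback behavior are invented for illustration and are not taken from the study's knowledge bases.

```python
# Hypothetical knowledge base: disease -> set of Features (rule antecedents).
KNOWLEDGE_BASE = {
    "disease_A": {"fever", "dry cough", "lymphopenia"},
    "disease_B": {"fever", "rash"},
}


def match_case(case_features, knowledge_base, min_overlap=1.0):
    """Return diseases whose antecedent Features are covered by the case.

    min_overlap is the required fraction of a rule's antecedents that must
    be present in the case (1.0 means every antecedent must be present).
    """
    matches = []
    for disease, antecedents in knowledge_base.items():
        covered = len(antecedents & case_features) / len(antecedents)
        if covered >= min_overlap:
            matches.append(disease)
    return sorted(matches)


def triage(case_features, knowledge_base):
    """Match against known rules; otherwise hand the case to surveillance."""
    matched = match_case(case_features, knowledge_base)
    if matched:
        return ("known", matched)
    return ("surveillance", sorted(case_features))
```

Cases that match no rule are not discarded: their Features are retained, which mirrors the role the abstract assigns to the syndromic surveillance base.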

  • Article type: Journal Article
    Interoperability issues are common in biomedical informatics. Reusing data generated from a system in another system, or integrating an existing clinical decision support system (CDSS) in a new organization is a complex task due to recurrent problems of concept mapping and alignment. The GL-DSS of the DESIREE project is a guideline-based CDSS to support the management of breast cancer patients. The knowledge base is formalized as an ontology and decision rules. OncoDoc is another CDSS applied to breast cancer management. The knowledge base is structured as a decision tree. OncoDoc has been routinely used by the multidisciplinary tumor board physicians of the Tenon Hospital (Paris, France) for three years leading to the resolution of 1,861 exploitable decisions. Because we were lacking patient data to assess the DESIREE GL-DSS, we investigated the option of reusing OncoDoc patient data. Taking into account that we have two CDSSs with two formalisms to represent clinical practice guidelines and two knowledge representation models, we had to face semantic and structural interoperability issues. This paper reports how we created 10,681 synthetic patients to solve these issues and make OncoDoc data re-usable by the GL-DSS of DESIREE.

  • Article type: Journal Article
    BACKGROUND: Displeasure with the functionality of clinical decision support systems (CDSSs) is considered the primary challenge in CDSS development. A major difficulty in CDSS design is matching the functionality to the desired and actual clinical workflow. Computer-interpretable guidelines (CIGs) are used to formalize medical knowledge in clinical practice guidelines (CPGs) in a computable language. However, existing CIG frameworks require a specific interpreter for each CIG language, hindering the ease of implementation and interoperability.
    OBJECTIVE: This paper aims to describe a different approach to the representation of clinical knowledge and data. We intended to change the clinician's perception of a CDSS with sufficient expressivity of the representation while maintaining a small communication and software footprint for both a web application and a mobile app. This approach was originally intended to create a readable and minimal syntax for a web CDSS and future mobile app for antenatal care guidelines with improved human-computer interaction and enhanced usability by aligning the system behavior with clinical workflow.
    METHODS: We designed and implemented an architecture for our CDSS, which uses the model-view-controller (MVC) architecture and a knowledge engine in the MVC architecture based on XML. The knowledge engine design also integrated the requirement of matching clinical care workflow that was desired in the CDSS. For this component of the design task, we used a work ontology analysis of the CPGs for antenatal care in our particular target clinical settings.
    RESULTS: In comparison to other common CIGs used for CDSSs, our XML approach can be used to take advantage of the flexible format of XML to facilitate the electronic sharing of structured data. More importantly, we can take advantage of its flexibility to standardize CIG structure design in a low-level specification language that is ubiquitous, universal, computationally efficient, integrable with web technologies, and human readable.
    CONCLUSIONS: Our knowledge representation framework incorporates fundamental elements of other CIGs used in CDSSs in medicine and proved adequate to encode a number of antenatal health care CPGs and their associated clinical workflows. The framework appears general enough to be used with other CPGs in medicine. XML proved to be a language expressive enough to describe planning problems in a computable form and restrictive and expressive enough to implement in a clinical system. It can also be effective for mobile apps, where intermittent communication requires a small footprint and an autonomous app. This approach can be used to incorporate overlapping capabilities of more specialized CIGs in medicine.
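To make the idea of an XML-encoded guideline concrete, here is a minimal sketch of a rule evaluator. The element names (`guideline`, `rule`, `condition`, `action`), the attribute names, and the threshold values are all invented for illustration; they are not the authors' schema and are not clinical guidance.

```python
import xml.etree.ElementTree as ET

# Hypothetical guideline fragment; schema and thresholds are illustrative only.
GUIDELINE_XML = """
<guideline name="antenatal-visit">
  <rule id="bp-check">
    <condition field="systolic_bp" op="ge" value="140"/>
    <action>Flag for hypertension follow-up</action>
  </rule>
  <rule id="hb-check">
    <condition field="hemoglobin" op="lt" value="11"/>
    <action>Recommend anemia work-up</action>
  </rule>
</guideline>
"""

# Comparison operators the hypothetical schema supports.
OPS = {"ge": lambda a, b: a >= b, "lt": lambda a, b: a < b}


def evaluate(xml_text, patient):
    """Return the actions of every rule whose condition holds for the patient."""
    root = ET.fromstring(xml_text)
    actions = []
    for rule in root.findall("rule"):
        cond = rule.find("condition")
        value = patient.get(cond.get("field"))
        if value is not None and OPS[cond.get("op")](value, float(cond.get("value"))):
            actions.append(rule.find("action").text)
    return actions
```

Because the rules live in data rather than code, a generic XML parser is the only interpreter needed, which is one way to read the abstract's point about avoiding a dedicated interpreter per CIG language.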

  • Article type: Journal Article
    Self-monitoring technologies produce patient-generated data that could be leveraged to personalize nutritional goal setting to improve population health; however, most computational approaches are limited when applied to individual-level personalization with sparse and irregular self-monitoring data. We applied informatics methods from expert suggestion systems to a challenging clinical problem: generating personalized nutrition goals from patient-generated diet and blood glucose data.
    We applied qualitative process coding and decision tree modeling to understand how registered dietitians translate patient-generated data into recommendations for dietary self-management of diabetes (i.e., knowledge model). We encoded this process in a set of functions that take diet and blood glucose data as an input and output diet recommendations (i.e., inference engine). Dietitians assessed face validity. Using four patient datasets, we compared our inference engine's output to clinical narratives and gold standards developed by expert clinicians.
    To dietitians, the knowledge model represented how recommendations from patient data are made. Inference engine recommendations were 63% consistent with the gold standard (range = 42%-75%) and 74% consistent with narrative clinical observations (range = 63%-83%).
    Qualitative modeling and automating how dietitians reason over patient data resulted in a knowledge model representing clinical knowledge. However, our knowledge model was less consistent with gold standard than narrative clinical recommendations, raising questions about how best to evaluate approaches that integrate patient-generated data with expert knowledge.
    New informatics approaches that integrate data-driven methods with expert decision making for personalized goal setting, such as the knowledge base and inference engine presented here, demonstrate the potential to extend the reach of patient-generated data by synthesizing it with clinical knowledge. However, important questions remain about the strengths and weaknesses of computer algorithms developed to discern signal from patient-generated data compared to human experts.
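An inference engine of the kind described (functions that take diet and blood glucose data and output recommendations) might be sketched as below, together with a simple agreement measure. The record layout, the 50 mg/dL threshold, the recommendation wording, and the definition of agreement are all assumptions made for illustration; the abstract does not specify how the authors encoded their rules or computed consistency.

```python
from statistics import mean


def recommend(records, rise_threshold=50):
    """Suggest a change for each meal type whose mean glucose rise is high.

    Each record is (meal type, carbohydrate grams, post-meal glucose rise in
    mg/dL). The threshold and the wording are hypothetical.
    """
    by_meal = {}
    for meal, carbs, rise in records:
        by_meal.setdefault(meal, []).append((carbs, rise))
    recs = []
    for meal, rows in sorted(by_meal.items()):
        if mean(rise for _, rise in rows) > rise_threshold:
            recs.append(f"reduce carbohydrates at {meal}")
    return recs


def agreement(engine_recs, gold_recs):
    """Fraction of gold-standard recommendations the engine also produced."""
    if not gold_recs:
        return 0.0
    return len(set(engine_recs) & set(gold_recs)) / len(gold_recs)
```

How agreement is defined (exact match, partial match, or expert rating) changes the resulting percentage, which is part of the evaluation question the abstract raises.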

  • Article type: Journal Article
    BACKGROUND: Evidence-based guidelines and recommendations can be transformed into "If-Then" Clinical Evidence Logic Statements (CELS). Imaging-related CELS were represented in standardized formats in the Harvard Medical School Library of Evidence (HLE).
    OBJECTIVE: We aimed to (1) describe the representation of CELS using established Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT), Clinical Quality Language (CQL), and Fast Healthcare Interoperability Resources (FHIR) standards and (2) assess the limitations of using these standards to represent imaging-related CELS.
    METHODS: This study was exempt from review by the Institutional Review Board as it involved no human subjects. Imaging-related clinical recommendations were extracted from evidence sources and translated into CELS. The clinical terminologies of CELS were represented using SNOMED CT and the condition-action logic was represented in CQL and FHIR. Numbers of fully and partially represented CELS were tallied.
    RESULTS: A total of 765 CELS were represented in the HLE as of December 2018. We were able to fully represent 137 of 765 (17.9%) CELS using SNOMED CT, CQL, and FHIR. We were able to represent terms using SNOMED CT in the temporal component for action ("Then") statements in CQL and FHIR in 755 of 765 (98.7%) CELS.
    CONCLUSIONS: CELS were represented as shareable clinical decision support (CDS) knowledge artifacts using existing standards-SNOMED CT, FHIR, and CQL-to promote and accelerate adoption of evidence-based practice. Limitations to standardization persist, which could be minimized with an add-on set of standard terms and value sets and by adding time frames to the CQL framework.
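The "If-Then" shape of a CELS and the reported proportions can be checked with a short sketch. The `CELS` dataclass and its field names are hypothetical and do not reflect the HLE data model or the CQL/FHIR encodings; only the counts (137, 755, and 765) come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class CELS:
    """An 'If-Then' Clinical Evidence Logic Statement (fields are illustrative)."""
    condition: str     # the "If" part
    action: str        # the "Then" part
    fully_coded: bool  # whether every term mapped to a standard code


def percent(part, whole):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * part / whole, 1)


# Hypothetical statement, for illustration only.
example = CELS(
    condition="condition described in the evidence source",
    action="imaging recommendation from the evidence source",
    fully_coded=False,
)

fully_represented = percent(137, 765)
then_terms_represented = percent(755, 765)
```

Both values reproduce the figures in the Results: 17.9% and 98.7%.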

  • Article type: Journal Article
    BACKGROUND: Ontologies are key enabling technologies for the Semantic Web. The Web Ontology Language (OWL) is a semantic markup language for publishing and sharing ontologies.
    OBJECTIVE: The supply of customizable, computable, and formally represented molecular genetics information and health information, via electronic health record (EHR) interfaces, can play a critical role in achieving precision medicine. In this study, we used cystic fibrosis as an example to build an Ontology-based Knowledge Base prototype on Cystic Fibrosis (OntoKBCF) to supply such information via an EHR prototype. In addition, we elaborate on the construction and representation principles, approaches, applications, and representation challenges that we faced in the construction of OntoKBCF. The principles and approaches can be referenced and applied in constructing other ontology-based domain knowledge bases.
    METHODS: First, we defined the scope of OntoKBCF according to possible clinical information needs about cystic fibrosis on both a molecular level and a clinical phenotype level. We then selected the knowledge sources to be represented in OntoKBCF. We utilized top-to-bottom content analysis and bottom-up construction to build OntoKBCF. Protégé-OWL was used to construct OntoKBCF. The construction principles included (1) to use existing basic terms as much as possible; (2) to use intersection and combination in representations; (3) to represent as many different types of facts as possible; and (4) to provide 2-5 examples for each type. HermiT 1.3.8.413 within Protégé-5.1.0 was used to check the consistency of OntoKBCF.
    RESULTS: OntoKBCF was constructed successfully, with the inclusion of 408 classes, 35 properties, and 113 equivalent classes. OntoKBCF includes both atomic concepts (such as amino acid) and complex concepts (such as "adolescent female cystic fibrosis patient") and their descriptions. We demonstrated that OntoKBCF could make customizable molecular and health information available automatically and usable via an EHR prototype. The main challenges include the provision of a more comprehensive account of different patient groups as well as the representation of uncertain knowledge, ambiguous concepts, and negative statements and more complicated and detailed molecular mechanisms or pathway information about cystic fibrosis.
    CONCLUSIONS: Although cystic fibrosis is just one example, based on the current structure of OntoKBCF, it should be relatively straightforward to extend the prototype to cover different topics. Moreover, the principles underpinning its development could be reused for building knowledge bases for other human monogenic diseases.
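The construction principle of building complex concepts by intersecting atomic ones can be sketched with plain sets. This is only an analogy to an OWL intersection class: the patient identifiers are hypothetical, and the concept names are split from the abstract's example "adolescent female cystic fibrosis patient".

```python
# Atomic concepts as sets of hypothetical patient identifiers.
ATOMIC = {
    "adolescent": {"p1", "p2", "p4"},
    "female": {"p2", "p3", "p4"},
    "cystic fibrosis patient": {"p2", "p4", "p5"},
}


def complex_concept(*atomic_names):
    """Members of the intersection of the named atomic concepts."""
    sets = [ATOMIC[name] for name in atomic_names]
    return set.intersection(*sets)
```

Calling `complex_concept("adolescent", "female", "cystic fibrosis patient")` returns only the identifiers present in all three sets, which is how a defined (equivalent) class narrows its members without enumerating them by hand.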