knowledge representation

  • Article type: Journal Article
    BACKGROUND: Social determinants of health (SDoH) have been described by the World Health Organization as the conditions in which individuals are born, live, work, and age. These conditions can be grouped into 3 interrelated levels known as macrolevel (societal), mesolevel (community), and microlevel (individual) determinants. The scope of SDoH expands beyond the biomedical level, and there remains a need to connect other areas such as economics, public policy, and social factors.
    OBJECTIVE: Providing a computable artifact that can link health data to concepts involving the different levels of determinants may improve our understanding of the impact SDoH have on human populations. Modeling SDoH may help to reduce existing gaps in the literature through explicit links between the determinants and biological factors. This in turn can allow researchers and clinicians to make better sense of data and discover new knowledge through the use of semantic links.
    METHODS: An experimental ontology was developed to represent knowledge of the social and economic characteristics of SDoH. Information from 27 literature sources was analyzed to gather concepts and encoded using Web Ontology Language, version 2 (OWL2) and Protégé. Four evaluators independently reviewed the ontology axioms using natural language translation. The analyses from the evaluations and selected terminologies from the Basic Formal Ontology were used to create a revised ontology with a broad spectrum of knowledge concepts ranging from the macrolevel to the microlevel determinants.
    RESULTS: The literature search identified several topics of discussion for each determinant level. Publications for the macrolevel determinants centered around health policy, income inequality, welfare, and the environment. Articles relating to the mesolevel determinants discussed work, work conditions, psychosocial factors, socioeconomic position, outcomes, food, poverty, housing, and crime. Finally, sources found for the microlevel determinants examined gender, ethnicity, race, and behavior. Concepts were gathered from the literature and used to produce an ontology consisting of 383 classes, 109 object properties, and 748 logical axioms. A reasoning test revealed no inconsistent axioms.
    CONCLUSIONS: This ontology models heterogeneous social and economic concepts to represent aspects of SDoH. The scope of SDoH is expansive, and although the ontology is broad, it is still in its early stages. To our current understanding, this ontology represents the first attempt to concentrate on knowledge concepts that are currently not covered by existing ontologies. Future direction will include further expanding the ontology to link with other biomedical ontologies, including alignment for granular semantics.
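    The review step described above, in which evaluators read ontology axioms through natural language translation, can be illustrated with a minimal sketch. It is not the authors' tooling: the two axiom types, the `verbalize` helper, and the example class and property names are assumptions chosen only to show the idea of rendering an axiom as a sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubClassOf:
    """Axiom of the form: sub is a subclass of sup."""
    sub: str
    sup: str


@dataclass(frozen=True)
class SomeValuesFrom:
    """Axiom of the form: subject is a subclass of (prop some filler)."""
    subject: str
    prop: str
    filler: str


def verbalize(axiom):
    """Render one axiom as an English sentence for human review."""
    if isinstance(axiom, SubClassOf):
        return f"Every {axiom.sub} is a {axiom.sup}."
    if isinstance(axiom, SomeValuesFrom):
        return f"Every {axiom.subject} {axiom.prop} some {axiom.filler}."
    raise TypeError(f"unsupported axiom type: {type(axiom).__name__}")


# Hypothetical axioms; the names are illustrative, not taken from the ontology.
axioms = [
    SubClassOf("income inequality", "macrolevel determinant"),
    SomeValuesFrom("housing condition", "is associated with", "health outcome"),
]

for axiom in axioms:
    print(verbalize(axiom))
```

    Keeping one small template per axiom type makes it easy for reviewers to trace each sentence back to the axiom that produced it.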






  • Article type: Journal Article
    Precise semantic representation is important for allowing machines to truly comprehend the meaning of natural language text, especially biomedical literature. Although the semantic relations among words in a single sentence may be accurately represented with existing approaches, relations between two sentences cannot yet be accurately modeled, which leads to a lack of contextual information and difficulty in performing interpretable semantic inference. Additionally, it is challenging to merge semantic representations curated by different experts. These critical challenges are insufficiently addressed by existing methods. In this paper, we present a framework for structured semantic representation (FSSR) to address these issues. FSSR uses a double-layer structure Construct that combines Paradigm and Instance to represent the semantics of a word or a sentence. It uses six types of rules to represent the semantic relations between sentence Constructs and uses a Computational Model to represent an action. FSSR is a graph-based representation of semantics, in which a node represents a Construct or a Paradigm. Two nodes are connected by an edge (a rule). In addition, FSSR enables interpretable inference and active acquisition of new information, as illustrated in a case study. This case study models the semantics of a cancer prognostic analysis article and reproduces its text results and charts. We provide a website that visualizes the inference process (






  • Article type: Journal Article
    Anecdotally, 38.5% of clinical outcome descriptions in randomized controlled trial publications contain complex text. Existing terminologies are insufficient to standardize outcomes and their measures, temporal attributes, quantitative metrics, and other attributes. In this study, we analyzed the semantic patterns in the outcome text in a sample of COVID-19 trials and presented a data-driven method for modeling outcomes. We conclude that a data-driven knowledge representation can benefit natural language processing of outcome text from published clinical studies.






  • Article type: Journal Article
    BACKGROUND: Knowledge graphs are a common form of knowledge representation in biomedicine and many other fields. We developed an open biomedical knowledge graph-based system termed Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways (ROBOKOP). ROBOKOP consists of both a front-end user interface and a back-end knowledge graph. The ROBOKOP user interface allows users to posit questions and explore answer subgraphs. Users can also posit questions through direct Cypher query of the underlying knowledge graph, which currently contains roughly 6 million nodes or biomedical entities and 140 million edges or predicates describing the relationship between nodes, drawn from over 30 curated data sources.
    OBJECTIVE: We aimed to apply ROBOKOP to survey data on workplace exposures and immune-mediated diseases from the Environmental Polymorphisms Registry (EPR) within the National Institute of Environmental Health Sciences.
    METHODS: We analyzed EPR survey data and identified 45 associations between workplace chemical exposures and immune-mediated diseases, as self-reported by study participants (n=4574), with 20 associations significant at P<.05 after false discovery rate correction. We then used ROBOKOP to (1) validate the associations by determining whether plausible connections exist within the ROBOKOP knowledge graph and (2) propose biological mechanisms that might explain them and serve as hypotheses for subsequent testing. We highlight the following three exemplar associations: carbon monoxide-multiple sclerosis, ammonia-asthma, and isopropanol-allergic disease.
    RESULTS: ROBOKOP successfully returned answer sets for three queries that were posed in the context of the driving examples. The answer sets included potential intermediary genes, as well as supporting evidence that might explain the observed associations.
    CONCLUSIONS: We demonstrate real-world application of ROBOKOP to generate mechanistic hypotheses for associations between workplace chemical exposures and immune-mediated diseases. We expect that ROBOKOP will find broad application across many biomedical fields and other scientific disciplines due to its generalizability, speed to discovery and generation of mechanistic hypotheses, and open nature.
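    The kind of question posed above (which intermediary nodes connect an exposure to a disease) can be sketched on a toy graph. This is a minimal illustration in plain Python, not the ROBOKOP interface or a Cypher query; the `intermediaries` function, the predicate labels, and all node names are placeholders assumed for the example.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each subject node to the set of nodes it points to."""
    adjacency = defaultdict(set)
    for subject, _predicate, obj in edges:
        adjacency[subject].add(obj)
    return adjacency


def intermediaries(edges, source, target):
    """Return nodes n such that source -> n and n -> target both exist."""
    adjacency = build_adjacency(edges)
    return sorted(
        node
        for node in adjacency.get(source, ())
        if target in adjacency.get(node, ())
    )


# Placeholder triples (subject, predicate, object); not real biomedical facts.
toy_edges = [
    ("chemical_X", "affects", "GENE_A"),
    ("chemical_X", "affects", "GENE_B"),
    ("GENE_A", "associated_with", "disease_Y"),
    ("GENE_C", "associated_with", "disease_Y"),
]

print(intermediaries(toy_edges, "chemical_X", "disease_Y"))
```

    Only `GENE_A` lies on a two-step path in this toy data, so it is the single candidate returned; in a real knowledge graph each candidate would still need supporting evidence before being treated as a mechanistic hypothesis.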






  • Article type: Journal Article
    BACKGROUND: The COVID-19 outbreak that began in 2019 rapidly swept the world, causing irreparable loss. The pandemic has shown that early responses to disease outbreaks are still delayed and that a method for detecting outbreaks of unknown diseases is needed. The study's objective is to establish a new medical knowledge representation and reasoning model and to use the model to explore the feasibility of detecting unknown disease outbreaks.
    METHODS: The study defined abnormal values with diagnostic significance in clinical data as Features and used the Features as the antecedents of inference rules that are matched against knowledge bases to detect outbreaks of known or emerging infectious diseases. The study also built a syndromic surveillance base that captures the Features of target cases to improve the reliability and fault tolerance of the system.
    RESULTS: The method was applied to Severe Acute Respiratory Syndrome (SARS), Middle East Respiratory Syndrome (MERS), and early COVID-19 outbreaks as empirical studies. The results showed that, with suitable surveillance guidelines, the proposed method was capable of detecting outbreaks of SARS, MERS, and early COVID-19. The quick matching accuracies of confirmed infection cases were 89.1, 26.3-98%, and 82%, and the syndromic surveillance base would capture the Features of the remaining cases to ensure the overall detection accuracy. Based on early COVID-19 data from Wuhan, this study estimated that the median time from illness onset to local authorities' responses for early COVID-19 cases could be reduced to 7.0-10.0 days.
    CONCLUSIONS: This study offers a new way to convert traditional medical knowledge into structured data and diagnosis rules, enabling the representation of doctors' logical thinking and the transfer of knowledge among different users. The empirical studies demonstrate that, by constantly inputting medical knowledge into the system, the proposed method will be capable of distinguishing unknown diseases from existing ones and supporting an early response to initial outbreaks.
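    The matching idea in this abstract (abnormal values become Features, Features serve as rule antecedents, and unmatched cases go to a surveillance base) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's model: the function names, the rule name, and the reference ranges are hypothetical placeholders and are not clinical reference values.

```python
def extract_features(observations, reference_ranges):
    """Turn out-of-range observations into Features such as 'temperature_c:high'."""
    features = set()
    for name, value in observations.items():
        if name not in reference_ranges:
            continue  # no range known for this measurement, so no Feature
        low, high = reference_ranges[name]
        if value < low:
            features.add(f"{name}:low")
        elif value > high:
            features.add(f"{name}:high")
    return features


def match_rules(features, rules):
    """A rule fires when all of its antecedent Features are present."""
    return sorted(name for name, antecedents in rules.items() if antecedents <= features)


def triage(observations, reference_ranges, rules):
    """Route a case to a rule match or, failing that, to syndromic surveillance."""
    features = extract_features(observations, reference_ranges)
    matched = match_rules(features, rules)
    route = "rule_match" if matched else "syndromic_surveillance"
    return {"route": route, "rules": matched, "features": sorted(features)}


# Placeholder ranges and rule (assumptions for illustration only).
RANGES = {"temperature_c": (36.0, 37.5), "lymphocyte_count": (1.0, 4.0)}
RULES = {"example_respiratory_rule": {"temperature_c:high", "lymphocyte_count:low"}}

print(triage({"temperature_c": 39.0, "lymphocyte_count": 0.6}, RANGES, RULES))
print(triage({"temperature_c": 39.0, "lymphocyte_count": 2.0}, RANGES, RULES))
```

    The second case does not satisfy any rule, so its Features are retained for surveillance rather than discarded, which mirrors the fallback role the abstract assigns to the syndromic surveillance base.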






  • Article type: Journal Article
    Interoperability issues are common in biomedical informatics. Reusing data generated from a system in another system, or integrating an existing clinical decision support system (CDSS) in a new organization is a complex task due to recurrent problems of concept mapping and alignment. The GL-DSS of the DESIREE project is a guideline-based CDSS to support the management of breast cancer patients. The knowledge base is formalized as an ontology and decision rules. OncoDoc is another CDSS applied to breast cancer management. The knowledge base is structured as a decision tree. OncoDoc has been routinely used by the multidisciplinary tumor board physicians of the Tenon Hospital (Paris, France) for three years leading to the resolution of 1,861 exploitable decisions. Because we were lacking patient data to assess the DESIREE GL-DSS, we investigated the option of reusing OncoDoc patient data. Taking into account that we have two CDSSs with two formalisms to represent clinical practice guidelines and two knowledge representation models, we had to face semantic and structural interoperability issues. This paper reports how we created 10,681 synthetic patients to solve these issues and make OncoDoc data re-usable by the GL-DSS of DESIREE.






  • Article type: Journal Article
    BACKGROUND: Displeasure with the functionality of clinical decision support systems (CDSSs) is considered the primary challenge in CDSS development. A major difficulty in CDSS design is matching the functionality to the desired and actual clinical workflow. Computer-interpretable guidelines (CIGs) are used to formalize medical knowledge in clinical practice guidelines (CPGs) in a computable language. However, existing CIG frameworks require a specific interpreter for each CIG language, hindering the ease of implementation and interoperability.
    OBJECTIVE: This paper aims to describe a different approach to the representation of clinical knowledge and data. We intended to change the clinician's perception of a CDSS with sufficient expressivity of the representation while maintaining a small communication and software footprint for both a web application and a mobile app. This approach was originally intended to create a readable and minimal syntax for a web CDSS and future mobile app for antenatal care guidelines with improved human-computer interaction and enhanced usability by aligning the system behavior with clinical workflow.
    METHODS: We designed and implemented an architecture for our CDSS that follows the model-view-controller (MVC) pattern and includes an XML-based knowledge engine. The knowledge engine design also integrated the requirement of matching the clinical care workflow desired in the CDSS. For this component of the design task, we used a work ontology analysis of the CPGs for antenatal care in our particular target clinical settings.
    RESULTS: In comparison to other common CIGs used for CDSSs, our XML approach can be used to take advantage of the flexible format of XML to facilitate the electronic sharing of structured data. More importantly, we can take advantage of its flexibility to standardize CIG structure design in a low-level specification language that is ubiquitous, universal, computationally efficient, integrable with web technologies, and human readable.
    CONCLUSIONS: Our knowledge representation framework incorporates fundamental elements of other CIGs used in CDSSs in medicine and proved adequate to encode a number of antenatal health care CPGs and their associated clinical workflows. The framework appears general enough to be used with other CPGs in medicine. XML proved to be a language expressive enough to describe planning problems in a computable form and restrictive and expressive enough to implement in a clinical system. It can also be effective for mobile apps, where intermittent communication requires a small footprint and an autonomous app. This approach can be used to incorporate overlapping capabilities of more specialized CIGs in medicine.
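    To make the idea of an XML-encoded guideline rule concrete, here is a minimal sketch that parses a small rule file and evaluates it against patient data. The XML element and attribute names, the `load_rules` and `evaluate` helpers, and the threshold value are all assumptions invented for illustration; they are not the schema described in the paper and are not clinical guidance.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical rule format; the real framework's schema is not given in the abstract.
GUIDELINE_XML = """
<guideline name="example-antenatal-check">
  <rule id="r1">
    <condition field="systolic_bp" op="ge" value="140"/>
    <action>Flag for clinician review</action>
  </rule>
</guideline>
"""

# Comparison operators that a condition element may name.
OPS = {"ge": operator.ge, "gt": operator.gt, "le": operator.le, "lt": operator.lt}


def load_rules(xml_text):
    """Parse rule elements into plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    rules = []
    for rule in root.findall("rule"):
        condition = rule.find("condition")
        rules.append({
            "id": rule.get("id"),
            "field": condition.get("field"),
            "op": condition.get("op"),
            "value": float(condition.get("value")),
            "action": rule.findtext("action").strip(),
        })
    return rules


def evaluate(rules, patient):
    """Return the actions of all rules whose condition holds for the patient."""
    return [
        rule["action"]
        for rule in rules
        if rule["field"] in patient
        and OPS[rule["op"]](patient[rule["field"]], rule["value"])
    ]


rules = load_rules(GUIDELINE_XML)
print(evaluate(rules, {"systolic_bp": 150}))
```

    Because the rules are data rather than code, the same small interpreter could in principle run in a web back end or inside a mobile app that only syncs the XML file occasionally.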






  • Article type: Journal Article
    Self-monitoring technologies produce patient-generated data that could be leveraged to personalize nutritional goal setting to improve population health; however, most computational approaches are limited when applied to individual-level personalization with sparse and irregular self-monitoring data. We applied informatics methods from expert suggestion systems to a challenging clinical problem: generating personalized nutrition goals from patient-generated diet and blood glucose data.
    We applied qualitative process coding and decision tree modeling to understand how registered dietitians translate patient-generated data into recommendations for dietary self-management of diabetes (i.e., knowledge model). We encoded this process in a set of functions that take diet and blood glucose data as an input and output diet recommendations (i.e., inference engine). Dietitians assessed face validity. Using four patient datasets, we compared our inference engine's output to clinical narratives and gold standards developed by expert clinicians.
    To dietitians, the knowledge model represented how recommendations from patient data are made. Inference engine recommendations were 63% consistent with the gold standard (range = 42%-75%) and 74% consistent with narrative clinical observations (range = 63%-83%).
    Qualitative modeling and automating how dietitians reason over patient data resulted in a knowledge model representing clinical knowledge. However, our knowledge model was less consistent with gold standard than narrative clinical recommendations, raising questions about how best to evaluate approaches that integrate patient-generated data with expert knowledge.
    New informatics approaches that integrate data-driven methods with expert decision making for personalized goal setting, such as the knowledge base and inference engine presented here, demonstrate the potential to extend the reach of patient-generated data by synthesizing it with clinical knowledge. However, important questions remain about the strengths and weaknesses of computer algorithms developed to discern signal from patient-generated data compared to human experts.







  • Article type: Journal Article
    BACKGROUND: Evidence-based guidelines and recommendations can be transformed into "If-Then" Clinical Evidence Logic Statements (CELS). Imaging-related CELS were represented in standardized formats in the Harvard Medical School Library of Evidence (HLE).
    OBJECTIVE: We aimed to (1) describe the representation of CELS using established Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT), Clinical Quality Language (CQL), and Fast Healthcare Interoperability Resources (FHIR) standards and (2) assess the limitations of using these standards to represent imaging-related CELS.
    METHODS: This study was exempt from review by the Institutional Review Board as it involved no human subjects. Imaging-related clinical recommendations were extracted from evidence sources and translated into CELS. The clinical terminologies of CELS were represented using SNOMED CT and the condition-action logic was represented in CQL and FHIR. Numbers of fully and partially represented CELS were tallied.
    RESULTS: A total of 765 CELS were represented in the HLE as of December 2018. We were able to fully represent 137 of 765 (17.9%) CELS using SNOMED CT, CQL, and FHIR. We were able to represent terms using SNOMED CT in the temporal component for action ("Then") statements in CQL and FHIR in 755 of 765 (98.7%) CELS.
    CONCLUSIONS: CELS were represented as shareable clinical decision support (CDS) knowledge artifacts using existing standards (SNOMED CT, FHIR, and CQL) to promote and accelerate adoption of evidence-based practice. Limitations to standardization persist, which could be minimized with an add-on set of standard terms and value sets and by adding time frames to the CQL framework.
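    The distinction between fully and partially represented statements, and the percentages reported in the results, can be sketched as below. This is only an illustrative data structure, not the HLE format, CQL, or FHIR: the `CELS` class, its fields, the example terms, and the placeholder code strings are assumptions, and no real SNOMED CT codes are used.

```python
from dataclasses import dataclass


@dataclass
class CELS:
    """An If-Then statement whose terms map to a standard code, or None if unmapped."""
    if_terms: dict
    then_terms: dict

    def is_fully_represented(self):
        """True only when every term in both parts has a code."""
        all_terms = {**self.if_terms, **self.then_terms}
        return all(code is not None for code in all_terms.values())


def percent(part, total):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * part / total, 1)


# Hypothetical statement; terms and codes are placeholders.
example = CELS(
    if_terms={"example finding": "CODE_PLACEHOLDER_1"},
    then_terms={"example imaging order": "CODE_PLACEHOLDER_2", "within 48 hours": None},
)

print(example.is_fully_represented())
print(percent(137, 765), percent(755, 765))
```

    In this sketch a single unmapped term, here a time frame, is enough to make the statement only partially represented.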






  • Article type: Journal Article
    BACKGROUND: Ontologies are key enabling technologies for the Semantic Web. The Web Ontology Language (OWL) is a semantic markup language for publishing and sharing ontologies.
    OBJECTIVE: The supply of customizable, computable, and formally represented molecular genetics information and health information, via electronic health record (EHR) interfaces, can play a critical role in achieving precision medicine. In this study, we used cystic fibrosis as an example to build an Ontology-based Knowledge Base prototype on Cystic Fibrosis (OntoKBCF) to supply such information via an EHR prototype. In addition, we elaborate on the construction and representation principles, approaches, applications, and representation challenges that we faced in the construction of OntoKBCF. The principles and approaches can be referenced and applied in constructing other ontology-based domain knowledge bases.
    METHODS: First, we defined the scope of OntoKBCF according to possible clinical information needs about cystic fibrosis on both a molecular level and a clinical phenotype level. We then selected the knowledge sources to be represented in OntoKBCF. We utilized top-to-bottom content analysis and bottom-up construction to build OntoKBCF. Protégé-OWL was used to construct OntoKBCF. The construction principles included (1) to use existing basic terms as much as possible; (2) to use intersection and combination in representations; (3) to represent as many different types of facts as possible; and (4) to provide 2-5 examples for each type. HermiT within Protégé-5.1.0 was used to check the consistency of OntoKBCF.
    RESULTS: OntoKBCF was constructed successfully, with the inclusion of 408 classes, 35 properties, and 113 equivalent classes. OntoKBCF includes both atomic concepts (such as amino acid) and complex concepts (such as "adolescent female cystic fibrosis patient") and their descriptions. We demonstrated that OntoKBCF could make customizable molecular and health information available automatically and usable via an EHR prototype. The main challenges include the provision of a more comprehensive account of different patient groups as well as the representation of uncertain knowledge, ambiguous concepts, and negative statements and more complicated and detailed molecular mechanisms or pathway information about cystic fibrosis.
    CONCLUSIONS: Although cystic fibrosis is just one example, based on the current structure of OntoKBCF, it should be relatively straightforward to extend the prototype to cover different topics. Moreover, the principles underpinning its development could be reused for building alternative human monogenetic diseases knowledge bases.
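    The principle of building complex concepts by intersecting existing basic terms can be sketched with plain Python sets. This is an illustration of the idea only, not OWL semantics or the OntoKBCF content: the decomposition of the example concept into three atomic terms, the second defined concept, and the `classify` and `is_subsumed_by` helpers are assumptions.

```python
# Each defined concept is the intersection of atomic concepts (assumed decomposition).
DEFINED_CONCEPTS = {
    "adolescent female cystic fibrosis patient": frozenset(
        {"adolescent", "female", "cystic fibrosis patient"}
    ),
    "female cystic fibrosis patient": frozenset({"female", "cystic fibrosis patient"}),
}


def classify(asserted_atoms, defined=DEFINED_CONCEPTS):
    """Return the defined concepts whose atomic parts are all asserted."""
    atoms = set(asserted_atoms)
    return sorted(name for name, parts in defined.items() if parts <= atoms)


def is_subsumed_by(narrow, broad, defined=DEFINED_CONCEPTS):
    """A concept is subsumed by another if it requires all of the other's parts."""
    return defined[broad] <= defined[narrow]


patient = {"adolescent", "female", "cystic fibrosis patient"}
print(classify(patient))
print(is_subsumed_by(
    "adolescent female cystic fibrosis patient",
    "female cystic fibrosis patient",
))
```

    Reusing atomic terms this way means a new complex concept needs only a new combination, and subsumption between complex concepts follows from comparing their parts; a description logic reasoner such as HermiT performs a far more general version of this check.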





