MeSH: Gene Ontology; Molecular Sequence Annotation / methods; Databases, Genetic; Computational Biology / methods; Semantics; Humans

Source: DOI:10.1093/bioinformatics/btae246

Abstract:
BACKGROUND: Biological background knowledge plays an important role in the manual quality assurance (QA) of biological database records. One such QA task is the detection of inconsistencies in literature-based Gene Ontology Annotation (GOA). This manual verification ensures the accuracy of the GO annotations based on a comprehensive review of the literature used as evidence, Gene Ontology (GO) terms, and annotated genes in GOA records. While automatic approaches for the detection of semantic inconsistencies in GOA have been developed, they operate within predetermined contexts, lacking the ability to leverage broader evidence, especially relevant domain-specific background knowledge. This paper investigates various types of background knowledge that could improve the detection of prevalent inconsistencies in GOA. In addition, the paper proposes several approaches to integrate background knowledge into the automatic GOA inconsistency detection process.
RESULTS: We extended a previously developed GOA inconsistency dataset with several kinds of GOA-related background knowledge, including GeneRIF statements, biological concepts mentioned within evidence texts, the GO hierarchy, and existing GO annotations of the specific gene. We proposed several effective approaches to integrate this background knowledge into the automatic GOA inconsistency detection process. The proposed approaches can improve automatic detection of self-consistency and of several of the most prevalent types of inconsistencies.
This is the first study to explore the advantages of utilizing background knowledge and to propose a practical approach to incorporate knowledge in automatic GOA inconsistency detection. We establish a new benchmark for performance on this task. Our methods may be applicable to various tasks that involve incorporating biological background knowledge.
AVAILABILITY: https://github.com/jiyuc/de-inconsistency