Keywords: crop; cross-species; knowledge graph; polyphenotype gene; traits regulating-genes

Source: DOI:10.3389/fpls.2024.1361716

Abstract:
Identifying polyphenotype genes that simultaneously regulate important agronomic traits (e.g., plant height, yield, and disease resistance) is critical for developing novel high-quality crop varieties. Predicting the associations between genes and traits requires the organization and analysis of multi-dimensional scientific data. The existing methods for establishing the relationships between genomic data and phenotypic data can only elucidate the associations between genes and individual traits. However, there are relatively few methods for detecting elite polyphenotype genes. In this study, a knowledge graph for traits regulating-genes was constructed by collecting data from the PubMed database and eight other databases related to the staple food crops rice, maize, and wheat as well as the model plant Arabidopsis thaliana. On the basis of the knowledge graph, a model for predicting traits regulating-genes was constructed by combining the data attributes of the gene nodes and the topological relationship attributes of the gene nodes. Additionally, a scoring method for predicting the genes regulating specific traits was developed to screen for elite polyphenotype genes. A total of 125,591 nodes and 547,224 semantic relationships were included in the knowledge graph. The accuracy of the knowledge graph-based model for predicting traits regulating-genes was 0.89, the precision rate was 0.91, the recall rate was 0.96, and the F1 value was 0.94. Moreover, 4,447 polyphenotype genes for 31 trait combinations were identified, among which the rice polyphenotype gene IPA1 and the A. thaliana polyphenotype gene CUC2 were verified via a literature search. Furthermore, the wheat gene TraesCS5A02G275900 was revealed as a potential polyphenotype gene that will need to be further characterized. 
Meanwhile, a Venn diagram analysis comparing the polyphenotype gene datasets (genes predicted by our model) with the transcriptome gene datasets (genes differentially expressed in response to disease, drought, or salt) showed that approximately 70% and 54% of the polyphenotype genes were also present in the transcriptome datasets of Arabidopsis and rice, respectively. The application of the knowledge graph-driven model for predicting traits regulating-genes represents a novel method for detecting elite polyphenotype genes.
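To make the reported figures easier to interpret, the following minimal sketch shows how an F1 value follows from precision and recall, and how an overlap percentage of the kind obtained from a Venn diagram analysis can be computed. It is an illustration only, not the authors' code: the function names and the gene identifiers `g1` to `g9` are hypothetical, and the overlap is assumed to be expressed as a fraction of the predicted polyphenotype genes.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def overlap_fraction(predicted: set, expressed: set) -> float:
    """Fraction of predicted genes that also appear in the expression set."""
    if not predicted:
        return 0.0
    return len(predicted & expressed) / len(predicted)


# Rounded precision and recall values taken from the abstract.
f1 = f1_score(0.91, 0.96)
print(f"F1 from rounded inputs: {f1:.3f}")

# Hypothetical gene identifiers, used purely for illustration.
predicted_genes = {"g1", "g2", "g3", "g4"}
differentially_expressed = {"g2", "g3", "g9"}
share = overlap_fraction(predicted_genes, differentially_expressed)
print(f"Share of predicted genes found in the transcriptome set: {share:.0%}")
```

With the rounded inputs the F1 value comes out near 0.934. The reported 0.94 is still compatible with this if the unrounded precision and recall were slightly higher than the two-decimal figures quoted.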