Keywords: Genotype and phenotype relationship; Model optimization; Normalized pointwise mutual information; Pancreatitis; Text mining

MeSH: Humans; Genotype; Phenotype; Databases, Factual; Pancreatitis / genetics; Data Mining / methods

Source: DOI:10.1016/j.compbiomed.2023.106868

Abstract:
Pancreatitis is a relatively serious disease caused by trypsin-mediated self-digestion of the pancreas. The development of a disease is closely related to gene and phenotype information. Gene-phenotype relations are generally obtained through clinical experiments, which are very costly. The published biomedical literature, which is growing exponentially, carries a wealth of disease-related gene and phenotype information, and this study provides an effective way to obtain it. To the best of our knowledge, this work is the first attempt to explore the relationships between genotype and phenotype in pancreatitis from a computational perspective. Using text mining, it extracted 6152 genes and 76,753 genotype-phenotype pairs from the biomedical literature on pancreatitis. Based on these 76,753 pairs, the study proposed an improved normalized pointwise mutual information (REL-NPMI) model to optimize the gene-phenotype relations related to pancreatitis, and obtained 12,562 gene-phenotype pairs that may be related to pancreatitis. The top 20 extracted results were validated and evaluated. The experimental results show that the method is promising for exploring the molecular mechanism of pancreatitis and thus provides a computational way to study its pathogenesis. Data resources and the Pancreatitis Gene-Phenotype Association Database are available at http://114.116.4.45:8081/, and resources are also available at https://github.com/polipoptbe8023/REL-NPMI.git.
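The abstract does not spell out how REL-NPMI modifies the baseline measure, so the following is only a minimal sketch of standard normalized pointwise mutual information, NPMI(g, p) = log(P(g, p) / (P(g) P(p))) / (-log P(g, p)), applied to gene-phenotype co-occurrence at the document level. The function name `npmi_scores`, the input layout, and the toy gene and phenotype labels are assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math
from collections import Counter


def npmi_scores(docs):
    """Score gene-phenotype pairs with standard NPMI (not the paper's REL-NPMI).

    docs: list of (genes, phenotypes) tuples, one per document, where each
    element is a set of names mentioned in that document (assumed layout).
    Returns a dict mapping (gene, phenotype) to a score in [-1, 1].
    """
    n_docs = len(docs)
    gene_count = Counter()
    pheno_count = Counter()
    pair_count = Counter()
    for genes, phenos in docs:
        gene_count.update(genes)
        pheno_count.update(phenos)
        pair_count.update((g, p) for g in genes for p in phenos)

    scores = {}
    for (g, p), c in pair_count.items():
        p_xy = c / n_docs
        p_x = gene_count[g] / n_docs
        p_y = pheno_count[p] / n_docs
        if p_xy == 1.0:
            # The pair occurs in every document: -log(p_xy) is 0, and the
            # usual convention assigns the maximum score of 1.
            scores[(g, p)] = 1.0
            continue
        pmi = math.log(p_xy / (p_x * p_y))
        scores[(g, p)] = pmi / -math.log(p_xy)
    return scores


# Toy corpus with placeholder names (invented, not data from the study).
toy_docs = [
    ({"GENE_A"}, {"PHENO_X"}),
    ({"GENE_A"}, {"PHENO_X"}),
    ({"GENE_B"}, {"PHENO_Y"}),
    ({"GENE_B"}, {"PHENO_X"}),
]
toy_scores = npmi_scores(toy_docs)
for pair, score in sorted(toy_scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Scores near 1 mean a gene and a phenotype almost always appear together, scores near 0 mean they co-occur about as often as chance predicts, and negative scores mean they co-occur less often than chance. A filtering step like the one the abstract describes (76,753 candidate pairs reduced to 12,562) could be approximated by keeping the pairs above a chosen threshold, although the threshold and any relation-specific weighting used by the authors are not stated here.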