computational prediction model

  • Article type: Journal Article
    The identification of microbe-drug associations can greatly facilitate drug research and development. Traditional methods for screening microbe-drug associations are time-consuming, labor-intensive, and costly, so computational methods are a good alternative. However, most of them ignore the combination of abundant sequence information, structural information, and microbe-drug network topology.
    In this study, we developed a computational framework based on a modified graph attention variational autoencoder (MGAVAEMDA) to infer potential microbe-drug associations by combining biological information with the variational autoencoder. In MGAVAEMDA, we first used multiple databases, covering microbial sequences, drug structures, and microbe-drug associations, to establish two comprehensive feature matrices for microbes and drugs through multiple similarity computations, fusion, smoothing, and thresholding. Then, we employed a combination of a variational autoencoder and graph attention to extract low-dimensional feature representations of microbes and drugs. Finally, the low-dimensional feature representations and the graph adjacency matrix were input into a random forest classifier to obtain microbe-drug association scores and thereby identify potential microbe-drug associations. Moreover, to reduce model complexity and redundant calculation and thus improve efficiency, we introduced a modified graph convolutional neural network embedded into the variational autoencoder for computing the low-dimensional features.
    The experimental results demonstrate that the prediction performance of MGAVAEMDA is better than that of five state-of-the-art methods. For the major measurements (AUC = 0.9357, AUPR = 0.9378), the relative improvements of MGAVAEMDA over the second-best methods are 1.76% and 1.47%, respectively.
    We conducted case studies on two drugs and found that more than 85% of the predicted associations have been reported in PubMed. The comprehensive experimental results validated the reliability of our model in accurately inferring potential microbe-drug associations.
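    The fusion and thresholding steps mentioned in this abstract can be illustrated with a minimal sketch. It assumes that fusion is a plain element-wise average of two similarity matrices and that thresholding binarizes the result at a fixed cutoff. Both choices, the function name `fuse_and_threshold`, and the toy matrices are illustrative assumptions, not the procedure of the paper; the smoothing step is omitted.

```python
def fuse_and_threshold(sim_a, sim_b, threshold=0.5):
    """Average two square similarity matrices, then binarize the result.

    Assumption: fusion = element-wise mean, thresholding = fixed cutoff.
    """
    n = len(sim_a)
    fused = [[(sim_a[i][j] + sim_b[i][j]) / 2.0 for j in range(n)]
             for i in range(n)]
    binary = [[1 if fused[i][j] >= threshold else 0 for j in range(n)]
              for i in range(n)]
    return fused, binary


# Toy, made-up similarities for three microbes (not data from the paper).
seq_sim = [[1.0, 0.8, 0.1],
           [0.8, 1.0, 0.3],
           [0.1, 0.3, 1.0]]
net_sim = [[1.0, 0.4, 0.2],
           [0.4, 1.0, 0.9],
           [0.2, 0.9, 1.0]]

fused, adjacency = fuse_and_threshold(seq_sim, net_sim)
```

    The binarized matrix can then serve as a graph adjacency matrix; a weighted fusion or a data-driven cutoff would be equally plausible choices.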

  • Article type: Journal Article
    Research has demonstrated that microorganisms are indispensable for nutrient transport, growth, and development in the human body, and that disorder and imbalance of the microbiota may lead to disease. Therefore, it is crucial to study the relationships between microbes and diseases. In this manuscript, we propose a novel prediction model named MADGAN to infer potential microbe-disease associations by combining biological information on microbes and diseases with generative adversarial networks. To our knowledge, this is the first attempt to use a generative adversarial network for this task. In MADGAN, we first constructed different features for microbes and diseases based on multiple similarity metrics. We then adopted a graph convolutional neural network (GCN) to derive further features for microbes and diseases automatically. Finally, we trained MADGAN to identify latent microbe-disease associations through the game between the generation network and the decision network. In particular, to prevent over-smoothing during model training, we introduced a cross-level weight distribution structure, based on the idea of residual networks, to increase the depth of the network. Moreover, to validate the performance of MADGAN, we conducted comprehensive experiments and case studies on the HMDAD and Disbiome databases, and the experimental results demonstrated that MADGAN not only achieved satisfactory prediction performance but also outperformed existing state-of-the-art prediction models.
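    To make the GCN and residual idea concrete, here is a minimal sketch of one graph-convolution step with a skip connection. It uses the common symmetric normalization of the adjacency matrix with self-loops and simply adds the layer input back to the output. This is a generic illustration, not the cross-level weight distribution structure of MADGAN, and all function names are hypothetical.

```python
import math


def normalize_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2) for a square adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer_with_skip(a_norm, h, w):
    """Compute ReLU(A_norm @ H @ W) + H; W must be square so shapes match."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, z[i][j]) + h[i][j] for j in range(len(h[0]))]
            for i in range(len(h))]


# Toy path graph 0-1-2 with two features per node (made-up values).
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[0.5, 0.0], [0.0, 0.5]]

a_norm = normalize_adjacency(adj)
hidden = gcn_layer_with_skip(a_norm, features, weights)
```

    Because the input is added back at every layer, node features cannot collapse entirely to the neighborhood average, which is the intuition behind using residual connections against over-smoothing.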

  • Article type: Journal Article
    BACKGROUND: Recent evidence suggests that human microorganisms participate in important biological activities in the human body. Dysfunction of host-microbiota interactions can lead to complex human disorders. Knowledge of host-microbiota interactions can provide valuable insights into the pathological mechanisms of diseases. However, it is time-consuming and costly to identify disorder-specific microbes from the biological "haystack" merely by routine wet-lab experiments. With the development of next-generation sequencing and omics-based trials, it is imperative to develop computational prediction models for predicting microbe-disease associations on a large scale.
    RESULTS: Based on the known microbe-disease associations derived from the Human Microbe-Disease Association Database (HMDAD), the proposed model shows reliable performance, with areas under the ROC curve (AUC) of 0.9456 and 0.8866 in leave-one-out cross-validation and five-fold cross-validation, respectively. In a case study of colorectal carcinoma, 80% of the top 20 predicted microbes have been experimentally confirmed in the published literature.
    CONCLUSIONS: Based on the assumption that functionally similar microbes tend to share similar interaction patterns with human diseases, we propose a group-based computational model of Bayesian disease-oriented ranking to prioritize the microbes most likely to be associated with various human diseases. Based on gene sequence information, two computational approaches (BLAST+ and MEGA 7) are used to measure microbe-microbe similarity from different perspectives. Disease-disease similarity is calculated by capturing hierarchy information from the Medical Subject Headings (MeSH) data. The experimental results illustrate the accuracy and effectiveness of the proposed model. This work is expected to facilitate the characterization and identification of promising microbial biomarkers.
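    Since several entries in this listing report AUC values, a short sketch of how the area under the ROC curve can be computed may help. It uses the pairwise-ranking definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The labels and scores below are made up for illustration and are unrelated to HMDAD.

```python
def auc_score(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# 1 = known association held out during validation, 0 = unknown pair.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
auc = auc_score(labels, scores)
```

    In leave-one-out cross-validation, each known association is hidden in turn, the model is re-run, and the held-out score is ranked against the scores of the unknown pairs before the AUC is computed over all rounds.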

  • Article type: Journal Article
    Long noncoding RNAs (lncRNAs), non-coding RNAs of more than 200 nucleotides, are related to various complex diseases. Precisely identifying potential lncRNA-disease associations is important for understanding disease pathogenesis, developing new drugs, and designing individualized diagnosis and treatment methods for different human diseases. Compared with complex and costly biological experiments, computational methods can predict potential lncRNA-disease associations quickly and effectively. Developing computational methods for lncRNA-disease prediction is therefore a promising avenue. However, owing to the low prediction accuracy of state-of-the-art methods, accurately and effectively identifying lncRNA-disease associations remains highly challenging. This article proposes an integrated method called LPARP, which is based on a label-propagation algorithm and random projection, to address the issue. Specifically, the label-propagation algorithm is first used to obtain estimated scores for lncRNA-disease associations, and random projections are then used to accurately predict disease-related lncRNAs. The empirical experiments showed that LPARP achieved good prediction performance on three gold datasets and was superior to existing state-of-the-art prediction methods. It can also be used to make predictions for isolated diseases and new lncRNAs. Case studies of bladder cancer, esophageal squamous-cell carcinoma, and colorectal cancer further support the reliability of the method. The proposed LPARP algorithm can predict potential lncRNA-disease interactions stably and effectively with less data. LPARP can be used as an effective and reliable tool for biomedical research.
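    The label-propagation stage can be sketched as follows, assuming the widely used update F ← αSF + (1 − α)Y, where S is a row-normalized lncRNA similarity matrix and Y holds the known associations. The value of α, the iteration count, and the toy matrices are assumptions for illustration; the random-projection stage of LPARP is not shown.

```python
def row_normalize(matrix):
    """Scale each row so it sums to 1 (all-zero rows stay zero)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([x / total if total else 0.0 for x in row])
    return normalized


def label_propagation(similarity, labels, alpha=0.5, iterations=100):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y from F = Y."""
    s = row_normalize(similarity)
    n, m = len(labels), len(labels[0])
    f = [[float(x) for x in row] for row in labels]
    for _ in range(iterations):
        f = [[alpha * sum(s[i][k] * f[k][j] for k in range(n))
              + (1 - alpha) * labels[i][j]
              for j in range(m)] for i in range(n)]
    return f


# Toy data: 3 lncRNAs x 2 diseases; lncRNA 1 has no known association.
lnc_similarity = [[0.0, 0.9, 0.1],
                  [0.9, 0.0, 0.1],
                  [0.1, 0.1, 0.0]]
known = [[1, 0],
         [0, 0],
         [0, 1]]

estimated = label_propagation(lnc_similarity, known)
```

    In this toy example the unlabeled lncRNA 1 inherits a higher score for disease 0 than for disease 1, because it is most similar to lncRNA 0, which is known to be associated with disease 0.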

  • Article type: Journal Article
    Biomedical knowledge graphs (KGs), which can help with the understanding of complex biological systems and pathologies, have begun to play a critical role in medical practice and research. However, challenges remain in their embedding and use due to their complex nature and the specific demands of their construction. Existing studies often suffer from problems such as sparse and noisy datasets, insufficient modeling methods, and non-uniform evaluation metrics. In this work, we established a comprehensive KG system for the biomedical field in an attempt to bridge the gap. Here, we introduce PharmKG, a multi-relational, attributed biomedical KG composed of more than 500 000 individual interconnections between genes, drugs, and diseases, with 29 relation types over a vocabulary of ~8000 disambiguated entities. Each entity in PharmKG is annotated with heterogeneous, domain-specific information obtained from multi-omics data, i.e. gene expression, chemical structure, and disease word embedding, while preserving the semantic and biomedical features. As baselines, we offer nine state-of-the-art KG embedding (KGE) approaches and a new biologically intuitive, graph neural network-based KGE method that uses a combination of global network structure and heterogeneous domain features. Based on the proposed benchmark, we conducted extensive experiments to assess these KGE models using multiple evaluation metrics. Finally, we discuss our observations across various downstream biological tasks and provide insights and guidelines on how to use a KG in biomedicine. We hope that the unprecedented quality and diversity of PharmKG will lead to advances in biomedical KG construction, embedding, and application.
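    The abstract does not name its evaluation metrics. As an illustration only, the sketch below computes two metrics that are commonly used for KG embedding link prediction, mean reciprocal rank (MRR) and Hits@k, from the rank of each true entity among the scored candidates. The function names and the numbers are hypothetical and are not results from PharmKG.

```python
def rank_of_true(scores, true_index):
    """Rank (1 = best) of the true candidate when higher scores are better."""
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s > true_score)


def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all test triples."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose true entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Made-up candidate scores for one test triple; the true entity is index 2.
example_rank = rank_of_true([0.2, 0.9, 0.5], true_index=2)

# Made-up ranks for four test triples.
ranks = [1, 2, 4, 10]
mrr = mean_reciprocal_rank(ranks)
hits3 = hits_at_k(ranks, 3)
```

    MRR rewards placing the true entity near the top of the list, while Hits@k only checks whether it falls inside the top k, so the two are usually reported together.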

  • Article type: Journal Article
    miRNAs are known to play an increasingly important role in many physiological processes. Disease-related miRNAs could be potential biomarkers for clinical diagnosis, prognosis, and treatment. Therefore, accurately inferring potential disease-related miRNAs has recently become a hot topic in the bioinformatics community. In this study, we proposed a mathematical model based on matrix decomposition, named MFMDA, to identify potential miRNA-disease associations by integrating known miRNA-disease association data with similarities between miRNAs and between diseases. We also compared MFMDA with some of the latest algorithms on several established miRNA-disease databases. MFMDA reached an AUC of 0.9061 in five-fold cross-validation. The experimental results show that MFMDA effectively infers novel miRNA-disease associations. In addition, we conducted case studies by applying MFMDA to three types of high-risk human cancers. While most predicted miRNAs are confirmed by external databases of experimental literature, we also identified a few novel disease-related miRNAs for further experimental validation.
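    The core idea of matrix decomposition for association prediction can be sketched as factorizing the known association matrix A into two low-rank factors, A ≈ UVᵀ, and reading predicted scores from the product. The sketch below is a plain regularized factorization trained by stochastic gradient descent. It omits the similarity terms that MFMDA integrates, and every hyperparameter and name in it is an assumption.

```python
import random


def factorize(assoc, rank=2, lr=0.05, reg=0.01, epochs=2000, seed=0):
    """Fit assoc ~= U @ V^T by SGD on squared error with L2 regularization."""
    rng = random.Random(seed)
    n_rows, n_cols = len(assoc), len(assoc[0])
    u = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    v = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i in range(n_rows):
            for j in range(n_cols):
                pred = sum(u[i][k] * v[j][k] for k in range(rank))
                err = assoc[i][j] - pred
                for k in range(rank):
                    u_ik = u[i][k]  # keep the old value for the V update
                    u[i][k] += lr * (err * v[j][k] - reg * u_ik)
                    v[j][k] += lr * (err * u_ik - reg * v[j][k])
    return u, v


def squared_error(assoc, u, v):
    """Sum of squared differences between assoc and U @ V^T."""
    return sum((assoc[i][j] - sum(a * b for a, b in zip(u[i], v[j]))) ** 2
               for i in range(len(assoc)) for j in range(len(assoc[0])))


# Toy miRNA x disease matrix (made up): 1 = known association.
assoc = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 1]]

u0, v0 = factorize(assoc, epochs=0)   # initial factors (same seed)
u, v = factorize(assoc)               # trained factors
initial_loss = squared_error(assoc, u0, v0)
final_loss = squared_error(assoc, u, v)
```

    After training, the entry of UVᵀ for a miRNA-disease pair with no known association can be used as its predicted association score.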

  • Article type: Journal Article
    Growing evidence indicates that microRNAs (miRNAs) play a significant role in many important bioprocesses, and that their mutations and disorders cause various complex diseases. Predicting miRNAs associated with underlying diseases via computational approaches helps to identify biomarkers and discover specific medicines, which can greatly reduce the cost of diagnosis, cure, prognosis, and prevention of human diseases. However, achieving more reliable prediction of potential miRNA-disease associations through effective integration of different biological data remains a challenge for researchers. In this study, we proposed a computational model that combines multiple-similarities fusion and space projection (MSFSP). MSFSP first fused the integrated disease similarity (composed of disease semantic similarity, disease functional similarity, and disease Hamming similarity) with the integrated miRNA similarity (composed of miRNA functional similarity, miRNA sequence similarity, and miRNA Hamming similarity). Second, it used the similarity networks to construct a weighted network of miRNA-disease associations from the Boolean network of experimentally verified miRNA-disease associations. Finally, it calculated the prediction results by weighting the miRNA space projection scores and the disease space projection scores. Leave-one-out cross-validation demonstrated that MSFSP has excellent predictive accuracy, with an area under the receiver operating characteristic curve (AUC) of 0.9613, better than that of five other existing models. In case studies, the predictive ability of MSFSP was further confirmed: 96% and 98% of the top 50 predictions for prostatic neoplasms and lung neoplasms, respectively, were validated by experimental evidence, and supporting experimental evidence was also found for 100% of the top 50 predictions for isolated diseases.
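    One plausible reading of the Hamming similarity named in this abstract is sketched below: two diseases (or two miRNAs) are compared through their binary association profiles, and the similarity is one minus the fraction of positions at which the profiles differ. This definition is an assumption for illustration and may differ in detail from the formula used in MSFSP.

```python
def hamming_similarity(profile_a, profile_b):
    """1 - (number of differing positions) / (profile length)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    if not profile_a:
        raise ValueError("profiles must not be empty")
    differing = sum(1 for a, b in zip(profile_a, profile_b) if a != b)
    return 1.0 - differing / len(profile_a)


# Made-up association profiles of two diseases over four miRNAs.
disease_x = [1, 0, 1, 0]
disease_y = [1, 1, 1, 0]
similarity = hamming_similarity(disease_x, disease_y)
```

    Applying the function to every pair of rows (or columns) of the Boolean association matrix yields a full similarity matrix that can be fused with the semantic and functional similarities.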

  • Article type: Journal Article
    Effectively representing Medical Subject Headings (MeSH) terms, such as diseases and drugs, as discriminative vectors could greatly improve the performance of downstream computational prediction models. However, these terms are often abstract and difficult to quantify. In this paper, we converted the MeSH tree structure into a relationship network and applied several graph embedding algorithms to it to represent these terms. Specifically, the relationship network consists of nodes (MeSH headings) and edges (relationships) and can be constructed from the tree numbers. Five graph embedding algorithms, DeepWalk, LINE, SDNE, LAP, and HOPE, were then run on the relationship network to represent MeSH headings as vectors. To evaluate the performance of the proposed methods, we carried out node classification and relationship prediction tasks. The results show that MeSH headings characterized by graph embedding algorithms can not only be treated as an independent representation but can also be used as additional information to enhance the representation ability of other vectors. Thus, they can serve as input to, and play a significant role in, computational models related to diseases, drugs, microbes, etc. In addition, we hope our method will inspire researchers to study the representation of terms from this network perspective.
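    Building a relationship network from tree numbers can be sketched as follows. A MeSH tree number encodes a path in the hierarchy with dot-separated segments, so dropping the last segment gives the tree number of the parent node. The helper name and the small mapping below are illustrative; the tree numbers should be checked against the current MeSH release before reuse.

```python
def tree_number_edges(term_to_tree_numbers):
    """Return (parent_term, child_term) edges implied by the tree numbers."""
    number_to_term = {}
    for term, numbers in term_to_tree_numbers.items():
        for number in numbers:
            number_to_term[number] = term

    edges = set()
    for term, numbers in term_to_tree_numbers.items():
        for number in numbers:
            if "." not in number:
                continue  # top-level node: no parent inside the tree
            parent_number = number.rsplit(".", 1)[0]
            parent_term = number_to_term.get(parent_number)
            if parent_term is not None and parent_term != term:
                edges.add((parent_term, term))
    return edges


# Illustrative excerpt; a heading may carry several tree numbers.
mesh_excerpt = {
    "Neoplasms": ["C04"],
    "Neoplasms by Site": ["C04.588"],
    "Breast Neoplasms": ["C04.588.180"],
}

edges = tree_number_edges(mesh_excerpt)
```

    The resulting edge list can be passed to any graph embedding implementation; because a heading may appear at several positions in the tree, the network is a general graph rather than a strict tree.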

  • Article type: Published Erratum
    [This corrects the article DOI: 10.3389/fgene.2019.01259.].

  • Article type: Journal Article
    Long non-coding RNAs (long ncRNAs, lncRNAs) of all kinds have been implicated in a range of cell developmental processes and diseases, although they are not translated into proteins. Inferring disease-associated lncRNAs by computational methods can help in understanding the pathogenesis of diseases, but current computational methods have not yet achieved remarkable predictive performance, owing to problems such as the inaccurate construction of similarity networks and the inadequate number of known lncRNA-disease associations. In this research, we proposed lncRNA-disease association inference based on integrated space projection scores (LDAI-ISPS), which is composed of the following key steps: changing the Boolean network of known lncRNA-disease associations into a weighted network by combining all the global information (e.g., disease semantic similarities, lncRNA functional similarities, and known lncRNA-disease associations); and obtaining the space projection scores via vector projections of the weighted network to form the final, unbiased prediction scores. The leave-one-out cross-validation (LOOCV) results showed that, compared with other methods, LDAI-ISPS had higher accuracy, with an area-under-the-curve (AUC) value of 0.9154 for inferring diseases, 0.8865 for inferring new lncRNAs (whose associations with diseases are unknown), and 0.7518 for inferring isolated diseases (whose associations with lncRNAs are unknown). A case study also confirmed the predictive performance of LDAI-ISPS as an aid to traditional biological experiments in inferring potential lncRNA-disease associations and isolated diseases.
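    The vector-projection step can be illustrated with a minimal sketch. It assumes that the lncRNA-space score for a pair (i, j) is the scalar projection of the similarity row of lncRNA i onto column j of the weighted network, that the disease-space score is defined symmetrically, and that the two are averaged. These formulas and all names are assumptions for illustration, not the exact definitions of LDAI-ISPS.

```python
import math


def scalar_projection(u, v):
    """Length of the projection of u onto v: (u . v) / |v|, or 0 if v = 0."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / norm


def space_projection_score(lnc_sim, dis_sim, weights, i, j):
    """Average of the lncRNA-space and disease-space projection scores."""
    column_j = [row[j] for row in weights]   # all lncRNAs for disease j
    row_i = weights[i]                       # all diseases for lncRNA i
    lnc_score = scalar_projection(lnc_sim[i], column_j)
    dis_score = scalar_projection(dis_sim[j], row_i)
    return (lnc_score + dis_score) / 2.0


# Toy data (made up): 2 lncRNAs x 2 diseases.
lnc_sim = [[1.0, 0.5], [0.5, 1.0]]
dis_sim = [[1.0, 0.0], [0.0, 1.0]]
weights = [[0.0, 0.0], [1.0, 0.0]]   # weighted association network

score = space_projection_score(lnc_sim, dis_sim, weights, 0, 0)
```

    Here lncRNA 0 has no known association, yet it receives a nonzero score for disease 0 because it is similar to lncRNA 1, which is associated with that disease.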