MeSH: Humans; Neural Networks, Computer; Phenotype; Cohort Studies; DNA Methylation / genetics; Male; Female; Middle Aged; Smoking / genetics; Genomics / methods; Adult; Computational Biology / methods; CpG Islands / genetics; Aged; Multiomics

Source: DOI:10.1038/s41540-024-00405-w

Abstract:
Integrating multi-omics data into predictive models has the potential to enhance accuracy, which is essential for precision medicine. In this study, we developed interpretable predictive models for multi-omics data by employing neural networks informed by prior biological knowledge, referred to as visible networks. These neural networks offer insights into the decision-making process and can unveil novel perspectives on the underlying biological mechanisms associated with traits and complex diseases. We tested the performance, interpretability and generalizability for inferring smoking status, subject age and LDL levels using genome-wide RNA expression and CpG methylation data from the blood of the BIOS consortium (four population cohorts, N_total = 2940). In a cohort-wise cross-validation setting, the consistency of the diagnostic performance and interpretation was assessed. Performance was consistently high for predicting smoking status with an overall mean AUC of 0.95 (95% CI: 0.90-1.00) and interpretation revealed the involvement of well-replicated genes such as AHRR, GPR15 and LRRN3. LDL-level predictions were only generalized in a single cohort with an R² of 0.07 (95% CI: 0.05-0.08). Age was inferred with a mean error of 5.16 (95% CI: 3.97-6.35) years with the genes COL11A2, AFAP1, OTUD7A, PTPRN2, ADARB2 and CD34 consistently predictive. For both regression tasks, we found that using multi-omics networks improved performance, stability and generalizability compared to interpretable single-omic networks. We believe that visible neural networks have great potential for multi-omics analysis; they combine multi-omic data elegantly, are interpretable, and generalize well to data from different cohorts.
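
The cohort-wise cross-validation mentioned in the abstract can be illustrated with a minimal sketch: each cohort is held out once as the test set while the remaining cohorts form the training set, and a metric such as AUC is computed per held-out cohort. This is not the authors' pipeline. The cohort labels "A" to "D", the toy labels and scores, and the helper names `cohort_wise_folds` and `auc` are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def cohort_wise_folds(cohorts):
    """Yield (held_out, train_idx, test_idx), holding out one cohort per fold."""
    by_cohort = defaultdict(list)
    for i, cohort in enumerate(cohorts):
        by_cohort[cohort].append(i)
    for held_out in sorted(by_cohort):
        test_idx = by_cohort[held_out]
        train_idx = sorted(
            i for cohort, idx in by_cohort.items() if cohort != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four cohorts, binary smoking status, made-up scores.
cohorts = ["A", "A", "B", "B", "C", "C", "D", "D"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.3, 0.4, 0.6, 0.7, 0.1]  # stand-in for model outputs

per_cohort_auc = {}
for held_out, train_idx, test_idx in cohort_wise_folds(cohorts):
    # A real pipeline would fit the model on train_idx here; this sketch
    # only evaluates the fixed toy scores on the held-out cohort.
    per_cohort_auc[held_out] = auc(
        [labels[i] for i in test_idx], [scores[i] for i in test_idx]
    )

mean_auc = sum(per_cohort_auc.values()) / len(per_cohort_auc)
print(per_cohort_auc, mean_auc)
```

Reporting the metric per held-out cohort, rather than pooling all samples, is what makes it possible to judge whether performance is consistent across cohorts or driven by a single one.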