Keywords: Imputation; Metabolomics; Multi-scale; Variational autoencoder; Whole genome sequencing

Source: DOI:10.1016/j.compbiomed.2024.108813

Abstract:
BACKGROUND: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies.
METHODS: In this study, we propose a novel method that leverages information from WGS data and reference metabolites to impute unknown metabolites. Our approach uses a multi-scale variational autoencoder to jointly model the burden score, polygenic risk score (PGS), and linkage disequilibrium (LD)-pruned single nucleotide polymorphisms (SNPs) for feature extraction and imputation of missing metabolomics data. By learning latent representations of both omics data types, our method can impute missing metabolomics values based on genomic information.
RESULTS: We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority over conventional imputation techniques. Using 35 template metabolite-derived burden scores, PGS, and LD-pruned SNPs, the proposed method achieved R² scores > 0.01 for 71.55% of metabolites.
CONCLUSIONS: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.
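As an illustration of the summary statistic quoted in RESULTS (the share of metabolites whose R² score exceeds 0.01), the following is a minimal sketch. It assumes the standard coefficient of determination computed per metabolite on held-out values; the abstract does not state the exact evaluation protocol, and the function names `per_metabolite_r2` and `fraction_above` are hypothetical.

```python
import numpy as np


def per_metabolite_r2(y_true, y_pred):
    """Coefficient of determination for each column (one column per metabolite).

    Assumption: rows are samples with held-out measured values in y_true
    and imputed values in y_pred. This is the textbook R2, not necessarily
    the exact definition used in the paper.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        r2 = 1.0 - ss_res / ss_tot
    # R2 is undefined for a metabolite with zero variance; mark it as NaN.
    r2[ss_tot == 0] = np.nan
    return r2


def fraction_above(r2, threshold=0.01):
    """Share of metabolites with a defined R2 strictly above the threshold."""
    r2 = np.asarray(r2, dtype=float)
    valid = ~np.isnan(r2)
    return float(np.mean(r2[valid] > threshold))


# Toy data: metabolite 0 is imputed perfectly, metabolite 1 is imputed
# with its own mean (which gives R2 = 0 by construction).
measured = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
imputed = np.array([[1.0, 4.0], [2.0, 4.0], [3.0, 4.0]])
scores = per_metabolite_r2(measured, imputed)
share = fraction_above(scores)
```

A threshold of 0.01 is low, so a statistic of this kind mainly indicates how many metabolites carry any recoverable signal rather than how accurately each one is imputed.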