Keywords: Gromov-Wasserstein; LC-MS; cancer biology; cancer metabolism; computational biology; data integration; human; optimal transport; systems biology; untargeted metabolomics

MeSH: Metabolomics / methods; Humans; Chromatography, Liquid / methods; Algorithms; Mass Spectrometry / methods; Pancreatic Neoplasms / metabolism; Liver Neoplasms / metabolism; Metabolome

Source: DOI:10.7554/eLife.91597

Abstract:
Untargeted metabolomic profiling through liquid chromatography-mass spectrometry (LC-MS) measures a vast array of metabolites within biospecimens, advancing drug development, disease diagnosis, and risk prediction. However, the low throughput of LC-MS poses a major challenge for biomarker discovery, annotation, and experimental comparison, necessitating the merging of multiple datasets. Current data pooling methods encounter practical limitations due to their vulnerability to data variations and hyperparameter dependence. Here, we introduce GromovMatcher, a flexible and user-friendly algorithm that automatically combines LC-MS datasets using optimal transport. By capitalizing on feature intensity correlation structures, GromovMatcher delivers superior alignment accuracy and robustness compared to existing approaches. This algorithm scales to thousands of features while requiring minimal hyperparameter tuning. Manually curated datasets for validating alignment algorithms are limited in the field of untargeted metabolomics, and hence we develop a dataset split procedure to generate pairs of validation datasets to test the alignments produced by GromovMatcher and other methods. Applying our method to experimental patient studies of liver and pancreatic cancer, we discover shared metabolic features related to patient alcohol intake, demonstrating how GromovMatcher facilitates the search for biomarkers associated with lifestyle risk factors linked to several cancer types.
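The abstract mentions a dataset split procedure for producing validation pairs but does not describe it. The sketch below illustrates only the general idea, under stated assumptions: one feature table is divided into two datasets with disjoint samples and a known set of shared features, so any alignment can be scored against that ground truth. The function names (`split_dataset`, `score_matching`), the `overlap` parameter, and the dict-based table layout are all hypothetical and are not taken from the GromovMatcher code; this is not a reproduction of the authors' procedure.

```python
import random


def split_dataset(intensities, overlap=0.5, seed=0):
    """Split one feature table into two datasets with a known feature overlap.

    intensities: dict mapping feature id -> list of per-sample intensities.
    Returns (dataset1, dataset2, shared), where `shared` is the set of
    feature ids present in both datasets (the ground-truth matches).
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    features = sorted(intensities)
    n_samples = len(next(iter(intensities.values())))

    # Assign each sample to exactly one of the two datasets.
    samples = list(range(n_samples))
    rng.shuffle(samples)
    half = n_samples // 2
    samples1, samples2 = sorted(samples[:half]), sorted(samples[half:])

    # Keep a fraction of features in both datasets; divide the rest.
    rng.shuffle(features)
    n_shared = int(overlap * len(features))
    shared, rest = features[:n_shared], features[n_shared:]
    only1, only2 = rest[: len(rest) // 2], rest[len(rest) // 2:]

    def subset(feature_ids, sample_ids):
        return {f: [intensities[f][s] for s in sample_ids] for f in feature_ids}

    return subset(shared + only1, samples1), subset(shared + only2, samples2), set(shared)


def score_matching(predicted_pairs, shared):
    """Precision and recall of predicted (id1, id2) pairs against the truth.

    A pair is correct when both ids are the same shared feature.
    """
    correct = sum(1 for a, b in predicted_pairs if a == b and a in shared)
    precision = correct / len(predicted_pairs) if predicted_pairs else 0.0
    recall = correct / len(shared) if shared else 0.0
    return precision, recall
```

An alignment method would be run on `dataset1` and `dataset2` without access to the feature ids, and its output pairs passed to `score_matching`. A realistic benchmark would presumably also need to simulate between-study variation, which this sketch leaves out.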