MeSH: Deep Learning; Tandem Mass Spectrometry / methods; Polysaccharides / chemistry, analysis; Glycomics / methods; Humans; Chromatography, Liquid / methods; Software; Workflow; Neural Networks, Computer

Source: DOI:10.1038/s41592-024-02314-6

Abstract:
Glycans constitute the most complicated post-translational modification, modulating protein activity in health and disease. However, structural annotation from tandem mass spectrometry (MS/MS) data is a bottleneck in glycomics, preventing high-throughput endeavors and relegating glycomics to a few experts. Trained on a newly curated set of 500,000 annotated MS/MS spectra, here we present CandyCrunch, a dilated residual neural network predicting glycan structure from raw liquid chromatography-MS/MS data in seconds (top-1 accuracy: 90.3%). We developed an open-access Python-based workflow of raw data conversion and prediction, followed by automated curation and fragment annotation, with predictions recapitulating and extending expert annotation. We demonstrate that this can be used for de novo annotation, diagnostic fragment identification and high-throughput glycomics. For maximum impact, this entire pipeline is tightly interlaced with our glycowork platform and can be easily tested at https://colab.research.google.com/github/BojarLab/CandyCrunch/blob/main/CandyCrunch.ipynb . We envision CandyCrunch to democratize structural glycomics and the elucidation of biological roles of glycans.
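
As a reading aid for the reported top-1 accuracy, the following is a minimal sketch of how such a metric is commonly computed: a prediction counts as correct only when the highest-ranked candidate structure matches the annotated one. The function name `top_k_accuracy`, the toy glycan strings, and the ranked-list input format are assumptions made for illustration; they are not taken from CandyCrunch or its evaluation code.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=1):
    """Fraction of spectra whose true structure is among the first k candidates.

    ranked_predictions: one list of candidate structures per spectrum,
                        ordered from most to least likely (assumed format).
    true_labels:        the annotated structure for each spectrum.
    """
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need one ranked candidate list per true label")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty set")
    hits = sum(
        1
        for candidates, truth in zip(ranked_predictions, true_labels)
        if truth in candidates[:k]
    )
    return hits / len(true_labels)


# Hypothetical toy data: four spectra, two ranked candidates each.
ranked = [
    ["Gal(b1-3)GalNAc", "GlcNAc(b1-3)GalNAc"],                     # correct at rank 1
    ["Neu5Ac(a2-6)GalNAc", "Neu5Ac(a2-3)Gal(b1-3)GalNAc"],         # correct at rank 2
    ["Fuc(a1-2)Gal(b1-3)GalNAc", "Gal(b1-3)GalNAc"],               # correct at rank 1
    ["GlcNAc(b1-3)GalNAc", "Gal(b1-3)GalNAc"],                     # not among candidates
]
truth = [
    "Gal(b1-3)GalNAc",
    "Neu5Ac(a2-3)Gal(b1-3)GalNAc",
    "Fuc(a1-2)Gal(b1-3)GalNAc",
    "Gal(b1-4)GlcNAc(b1-3)GalNAc",
]

top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Under this definition, a top-1 accuracy of 90.3% would mean that for roughly nine in ten evaluated spectra the first-ranked structure is the annotated one; how the authors split and scored their test data is described in the paper itself, not here.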