Keywords: Explainable deep neural network; Small molecule identification; Spectral similarity; Structural similarity; Tandem mass spectrometry

Source: DOI:10.1186/s13321-024-00858-5

Abstract:
Small molecule identification is a crucial task in analytical chemistry and the life sciences. One of the most commonly used technologies for elucidating small molecule structures is mass spectrometry. Spectral library search of product ion spectra (MS/MS) is a popular strategy to identify molecules or find structural analogues. This approach relies on the assumption that spectral similarity and structural similarity are correlated. However, popular spectral similarity measures, usually calculated from identical fragment matches between the MS/MS spectra, do not always accurately reflect structural similarity. In this study, we propose TransExION, a Transformer-based Explainable similarity metric for IONS. TransExION detects related fragments between MS/MS spectra through their mass difference and uses these to estimate spectral similarity. These related fragments can be nearly identical, but can also share a substructure. TransExION also provides a post hoc explanation of its estimate, which can support scientists in evaluating spectral library search results and thus in the structure elucidation of unknown molecules. Our model has a Transformer-based architecture and is trained on data derived from GNPS MS/MS libraries. The experimental results show that it improves on existing spectral similarity measures in searching for and interpreting structural analogues as well as in molecular networking. SCIENTIFIC CONTRIBUTION: We propose a Transformer-based spectral similarity metric that improves the comparison of small molecule tandem mass spectra. We provide a post hoc explanation that can serve as a good starting point for annotating unknown spectra based on database spectra.
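To make the limitation of identical-fragment matching concrete, the following minimal Python sketch computes a conventional greedy cosine score and, separately, counts fragment pairs related by a fixed mass difference. It is not the TransExION model, which is a trained Transformer; the function names, the 0.01 Da tolerance, the toy peak lists, and the 14.016 Da shift (roughly one CH2 group) are all illustrative assumptions.

```python
import math

def greedy_cosine(spec_a, spec_b, tol=0.01):
    """Cosine-like score using only peaks whose m/z agree within `tol`.

    Each spectrum is a list of (mz, intensity) tuples. Hypothetical helper,
    written for illustration only.
    """
    # Collect every candidate pair of near-identical fragments.
    pairs = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tol:
                pairs.append((int_a * int_b, i, j))

    # Greedily keep the strongest pairs, using each peak at most once.
    pairs.sort(reverse=True)
    used_a, used_b = set(), set()
    dot = 0.0
    for product, i, j in pairs:
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        dot += product

    norm_a = math.sqrt(sum(inten ** 2 for _, inten in spec_a))
    norm_b = math.sqrt(sum(inten ** 2 for _, inten in spec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def count_shifted_matches(spec_a, spec_b, shift, tol=0.01):
    """Count peaks in spec_a that have a partner in spec_b at mz + shift."""
    count = 0
    for mz_a, _ in spec_a:
        if any(abs(mz_b - mz_a - shift) <= tol for mz_b, _ in spec_b):
            count += 1
    return count

# Toy spectra (made-up values): spec_b is spec_a with every fragment
# shifted by an assumed 14.016 Da, mimicking a structural analogue.
SHIFT = 14.016
spec_a = [(91.05, 0.4), (119.05, 1.0), (147.04, 0.6)]
spec_b = [(mz + SHIFT, inten) for mz, inten in spec_a]

identical_score = greedy_cosine(spec_a, spec_a)
analogue_score = greedy_cosine(spec_a, spec_b)
related_fragments = count_shifted_matches(spec_a, spec_b, SHIFT)
```

In this toy case the analogue shares no identical fragments with the query, so the fragment-match score gives no hint of the relationship, while every fragment has a partner at the same mass difference. This is the kind of signal that a mass-difference-aware measure can exploit.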