Spectrum clustering

  • Article type: Congress
    The 2023 European Bioinformatics Community for Mass Spectrometry (EuBIC-MS) Developers Meeting was held from January 15 to January 20, 2023, at Congressi Stefano Franscini on Monte Verità in Ticino, Switzerland. The participants were scientists and developers working in computational mass spectrometry (MS), metabolomics, and proteomics. The 5-day program was split between introductory keynote lectures and parallel hackathon sessions focusing on "Artificial Intelligence in proteomics" to stimulate future directions in the MS-driven omics areas. During the hackathons, the participants developed bioinformatics tools and resources addressing outstanding needs in the community. The hackathons allowed less experienced participants to learn from more advanced computational MS experts and to contribute actively to highly relevant research projects. We successfully produced several new tools for the proteomics community that improve data analysis and facilitate future research.

  • Article type: Video-Audio Media
    Background: The high diversity and complexity of the microbial community make it a formidable challenge to identify and quantify the large number of proteins expressed in the community. Conventional metaproteomics approaches largely rely on accurately matching MS/MS spectra to their corresponding short peptides in the digested samples, followed by protein inference and subsequent taxonomic and functional analysis of the detected proteins. These approaches depend on the availability of protein sequence databases derived either from sample-specific metagenomic data or from public repositories. Because these protein sequence databases are incomplete and imperfect, and because homologous proteins expressed by different bacterial species are abundant in the community, the computational process of peptide identification and protein inference is challenging and error-prone, which hinders the comparison of metaproteomes across multiple samples.
    Results: We developed metaSpectraST, an unsupervised and database-independent metaproteomics workflow that quantitatively profiles and compares metaproteomics samples by clustering experimentally observed MS/MS spectra based on their spectral similarity. We applied metaSpectraST to fecal samples collected from littermates of two different mother mice right after weaning. Quantitative proteome profiles of the microbial communities of different mice were obtained without any peptide-spectrum identification and were used to evaluate the overall similarity between samples and to highlight any differentiating markers. Compared with conventional database-dependent metaproteomics analysis, metaSpectraST is more successful in classifying the samples and in detecting subtle changes in the mouse gut microbiome post-weaning. metaSpectraST could also be used as a tool to select suitable biological replicates from samples with wide inter-individual variation.
    Conclusions: metaSpectraST enables rapid quantitative profiling of metaproteomic samples without the need to construct a protein sequence database or to identify the MS/MS spectra. By clustering all experimental MS/MS spectra first, it maximally preserves the information they contain and is thus able to better profile complex microbial communities and highlight their functional changes, as compared with conventional approaches.
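    The identification-free comparison described above can be pictured with a minimal sketch. It is not metaSpectraST's implementation: the fixed-width binning, the cosine threshold, the greedy cluster assignment, and every function name here are assumptions chosen for illustration. The idea shown is only that spectra are grouped by similarity and each sample is then summarized by the fraction of its spectra in each cluster.

```python
from collections import Counter, defaultdict
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins: {bin_index: intensity}."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return dict(binned)


def cosine(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def greedy_cluster(spectra, threshold=0.8):
    """Assign each spectrum to the first cluster whose representative is similar enough.

    Illustrative only: the first member of a cluster serves as its representative.
    """
    representatives = []
    labels = []
    for peaks in spectra:
        vec = bin_spectrum(peaks)
        for cluster_id, rep in enumerate(representatives):
            if cosine(vec, rep) >= threshold:
                labels.append(cluster_id)
                break
        else:  # no existing cluster matched, so start a new one
            representatives.append(vec)
            labels.append(len(representatives) - 1)
    return labels


def sample_profiles(labels, sample_ids):
    """Fraction of each sample's spectra that falls into each cluster."""
    counts = defaultdict(Counter)
    for label, sample in zip(labels, sample_ids):
        counts[sample][label] += 1
    return {
        sample: {c: n / sum(cnt.values()) for c, n in cnt.items()}
        for sample, cnt in counts.items()
    }


# Toy data: (m/z, intensity) peak lists from two hypothetical samples.
spectra = [
    [(100.0, 10), (200.0, 5)],
    [(100.2, 9), (200.1, 6)],
    [(150.0, 8), (300.0, 8)],
    [(150.3, 7), (300.4, 9)],
    [(400.0, 5), (500.0, 5)],
]
sample_ids = ["A", "B", "A", "B", "B"]
labels = greedy_cluster(spectra)
print(labels)  # [0, 0, 1, 1, 2]
print(sample_profiles(labels, sample_ids))
```

    In this toy input, cluster 2 contains spectra from sample B only, which is the kind of differentiating marker a cluster-level profile can surface without any peptide identification.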

  • Article type: Journal Article
    Isobaric stable isotope labeling techniques such as tandem mass tags (TMTs) have become popular in proteomics because they enable the relative quantification of proteins with high precision from up to 18 samples in a single experiment. While missing values in peptide quantification are rare in a single TMT experiment, they rapidly increase when combining multiple TMT experiments. As the field moves toward analyzing ever higher numbers of samples, tools that reduce missing values also become more important for analyzing TMT datasets. To this end, we developed SIMSI-Transfer (Similarity-based Isobaric Mass Spectra 2 [MS2] Identification Transfer), a software tool that extends our previously developed software MaRaCluster (© Matthew The) by clustering similar tandem MS2 spectra from multiple TMT experiments. SIMSI-Transfer is based on the assumption that similarity-clustered MS2 spectra represent the same peptide. Therefore, peptide identifications made by database searching in one TMT batch can be transferred to another TMT batch in which the same peptide was fragmented but not identified. To assess the validity of this approach, we tested SIMSI-Transfer on masked search engine identification results and recovered >80% of the masked identifications while controlling errors in the transfer procedure to below 1% false discovery rate. Applying SIMSI-Transfer to six published full proteome and phosphoproteome datasets from the Clinical Proteomic Tumor Analysis Consortium led to an increase of 26 to 45% in identified MS2 spectra with TMT quantifications. This significantly decreased the number of missing values across batches and, in turn, increased the number of peptides and proteins identified in all TMT batches by 43 to 56% and 13 to 16%, respectively.
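    The transfer idea can be sketched in a few lines. The following is a simplified illustration, not SIMSI-Transfer's code: it assumes cluster labels are already available (for example from a clustering tool), uses a hypothetical function name, transfers an identification only when all identified spectra in a cluster agree on one peptide, and omits the false discovery rate control described in the abstract.

```python
from collections import defaultdict


def transfer_identifications(cluster_ids, peptides):
    """Fill missing peptide IDs (None) from cluster mates.

    A transfer happens only when the identified spectra of a cluster
    all carry the same peptide; ambiguous clusters are left untouched.
    """
    peptides_by_cluster = defaultdict(set)
    for cluster, peptide in zip(cluster_ids, peptides):
        if peptide is not None:
            peptides_by_cluster[cluster].add(peptide)

    result = []
    for cluster, peptide in zip(cluster_ids, peptides):
        candidates = peptides_by_cluster[cluster]
        if peptide is None and len(candidates) == 1:
            result.append(next(iter(candidates)))  # unambiguous transfer
        else:
            result.append(peptide)  # keep the original ID or leave unidentified
    return result


# Toy example with placeholder peptide sequences.
cluster_ids = [0, 0, 1, 1, 1, 2]
peptides = ["PEPTIDEK", None, "ELVISK", "LIVESK", None, None]
print(transfer_identifications(cluster_ids, peptides))
# ['PEPTIDEK', 'PEPTIDEK', 'ELVISK', 'LIVESK', None, None]
```

    Masking a subset of known identifications, running the transfer, and counting how many masked entries are recovered correctly gives a simple way to check such a procedure, in the spirit of the evaluation the abstract reports.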

  • Article type: Journal Article
    Modern shotgun proteomics experiments generate gigabytes of spectra every hour, only a fraction of which is used to draw biological conclusions. Instead of being stored as flat files in public data repositories, this large amount of data can be better organized to facilitate data reuse. Clustering these spectra by similarity can be helpful in building high-quality spectral libraries, correcting identification errors, and highlighting frequently observed but unidentified spectra. However, large-scale clustering is time-consuming. Here, we present ClusterSheep, a method utilizing Graphics Processing Units (GPUs) to accelerate the process. Unlike previously proposed algorithms for this purpose, our method performs true pairwise comparison of all spectra within a precursor mass-to-charge ratio tolerance, thereby preserving the full cluster structures. ClusterSheep was benchmarked against previously reported clustering tools, MS-Cluster, MaRaCluster, and msCRUSH. The software tool also functions as an interactive visualization tool with a persistent state, enabling the user to explore the resulting clusters visually and retrieve the clustering results as desired.
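    To make "true pairwise comparison within a precursor mass-to-charge ratio tolerance" concrete, here is a small CPU-only sketch. It is not ClusterSheep's GPU implementation or its scoring function: the binned cosine score, the thresholds, the union-find grouping into connected components, and all names are assumptions for illustration.

```python
from math import sqrt


def cosine(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity of two peak lists after fixed-width m/z binning."""
    def binned(peaks):
        vec = {}
        for mz, intensity in peaks:
            key = int(mz / bin_width)
            vec[key] = vec.get(key, 0.0) + intensity
        return vec

    a, b = binned(peaks_a), binned(peaks_b)
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster_within_tolerance(spectra, mz_tol=0.01, min_sim=0.7):
    """Compare every pair of spectra whose precursor m/z differ by at most mz_tol.

    spectra: list of (precursor_mz, peaks). Pairs scoring >= min_sim are linked,
    and connected components of the resulting graph form the clusters.
    """
    n = len(spectra)
    order = sorted(range(n), key=lambda i: spectra[i][0])
    parent = list(range(n))
    for pos in range(n):
        i = order[pos]
        for nxt in range(pos + 1, n):
            j = order[nxt]
            if spectra[j][0] - spectra[i][0] > mz_tol:
                break  # sorted by precursor m/z, so later spectra are farther away
            if cosine(spectra[i][1], spectra[j][1]) >= min_sim:
                parent[find(parent, i)] = find(parent, j)
    relabel = {}
    return [relabel.setdefault(find(parent, i), len(relabel)) for i in range(n)]


spectra = [
    (500.000, [(100, 10), (200, 5)]),
    (500.005, [(100, 9), (200, 6)]),
    (500.300, [(100, 10), (200, 5)]),  # same peaks, but precursor out of tolerance
    (500.004, [(300, 7), (400, 7)]),   # in tolerance, but dissimilar peaks
]
print(cluster_within_tolerance(spectra))  # [0, 0, 1, 2]
```

    Sorting by precursor m/z keeps the comparison exhaustive inside each tolerance window while skipping pairs that can never be linked; the pairs inside a window are independent of one another, which is what makes this step a natural fit for parallel hardware.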

  • Article type: Journal Article
    Mass spectrometry (MS)-based proteomics has developed rapidly, and the volume of proteomic MS data has grown exponentially. Developing fast, accurate, and reproducible methods to identify peptides and proteins is a major challenge. Spectral library searching has become a mature strategy for identifying proteins from tandem mass spectra in proteomics: it searches experimental spectra against a collection of confidently identified MS/MS spectra that have been observed previously, and makes full use of the peak abundances in the spectrum, peaks from non-canonical fragment ions, and other features. This review gives an overview of how the spectral library search strategy is implemented, covers its two key steps, spectral library construction and spectral library searching, in depth, and discusses the progress and challenges of the strategy.
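    The search step described above can be illustrated with a short sketch. This is a generic example, not the method of any particular search engine: the square-root intensity scaling, the bin width, the precursor tolerance, the score cutoff, the dictionary layout of library entries, and the function names are all assumptions.

```python
from math import sqrt


def scaled_vector(peaks, bin_width=1.0):
    """Bin peaks by m/z, square-root scale the intensities, normalize to unit length."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    scaled = {k: sqrt(v) for k, v in binned.items()}
    norm = sqrt(sum(v * v for v in scaled.values()))
    return {k: v / norm for k, v in scaled.items()} if norm else {}


def normalized_dot(query_peaks, reference_peaks):
    """Dot product of two unit-length spectrum vectors (0 = disjoint, 1 = identical)."""
    q, r = scaled_vector(query_peaks), scaled_vector(reference_peaks)
    return sum(v * r.get(k, 0.0) for k, v in q.items())


def search_library(query_mz, query_peaks, library, mz_tol=0.02, min_score=0.6):
    """Return (peptide, score) of the best library match, or None if nothing qualifies.

    Only library entries whose precursor m/z lies within mz_tol are scored.
    """
    best = None
    for entry in library:
        if abs(entry["precursor_mz"] - query_mz) > mz_tol:
            continue
        score = normalized_dot(query_peaks, entry["peaks"])
        if score >= min_score and (best is None or score > best[1]):
            best = (entry["peptide"], score)
    return best


# Toy library with placeholder peptide sequences.
library = [
    {"peptide": "PEPTIDEK", "precursor_mz": 450.25, "peaks": [(100, 9), (200, 4)]},
    {"peptide": "ELVISK", "precursor_mz": 450.26, "peaks": [(150, 9), (250, 4)]},
]
match = search_library(450.255, [(100, 16), (200, 4)], library)
print(match[0])  # PEPTIDEK
```

    Because the score uses the measured intensities of the reference spectrum rather than predicted fragment masses alone, the abundance information and non-canonical fragment peaks mentioned in the abstract contribute to the match.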

  • Article type: Journal Article
    Spectral library searching has become a mature method to identify tandem mass spectra in proteomics data analysis. This review provides a comprehensive overview of available spectral library search engines and highlights their distinct features. Additionally, resources providing spectral libraries are summarized, and tools are presented that extend experimental spectral libraries by simulating spectra. Finally, spectrum clustering algorithms are discussed; these use the same spectrum-to-spectrum matching algorithms as spectral library search engines and enable novel ways of analysing proteomics data.