peptide identification

  • Article type: Journal Article
    Advances in mass spectrometry (MS) have enabled high-throughput analysis of proteomes in biological systems. State-of-the-art MS data analysis relies on database search algorithms to quantify proteins by identifying peptide-spectrum matches (PSMs), which convert mass spectra to peptide sequences. Different database search algorithms use distinct search strategies and thus may identify unique PSMs. However, no existing approach can aggregate all user-specified database search algorithms while guaranteeing both an increase in the number of identified peptides and control of the false discovery rate (FDR). To fill this gap, we propose a statistical framework, Aggregation of Peptide Identification Results (APIR), that is universally compatible with all database search algorithms. Notably, under an FDR threshold, APIR is guaranteed to identify at least as many peptides as each individual database search algorithm, if not more. Evaluation of APIR on a complex proteomics standard dataset showed that APIR is more powerful than individual database search algorithms and empirically controls the FDR. Real data studies showed that APIR can identify disease-related proteins and post-translational modifications missed by some individual database search algorithms. The APIR framework is easily extendable to aggregating discoveries made by multiple algorithms in other high-throughput biomedical data analyses, e.g., differential gene expression analysis on RNA sequencing data. The APIR R package is available at https://github.com/yiling0210/APIR.

  • Article type: Journal Article
    Endogenous peptides are an abundant and versatile class of biomolecules with vital roles in the nervous, endocrine, and immune systems, among others. Mass spectrometry is a premier technique for identifying endogenous peptides, yet the field still faces challenges due to the lack of optimized computational resources for reliable analysis and interpretation of raw mass spectra. Current database searching programs can exhibit discrepancies because of the unique properties of endogenous peptides, which typically require specialized search considerations. Herein, we present a novel, high-throughput scoring algorithm for extracting and ranking conserved amino acid sequence motifs within any endogenous peptide database. Motifs are patterns conserved across organisms and represent sequence moieties crucial for biological functions, including maintenance of homeostasis. MotifQuest, our novel motif database generation algorithm, is designed to work in partnership with EndoGenius, a program optimized for database searching of endogenous peptides that uses a motif database to capitalize on biological context when producing identifications. MotifQuest aims to quickly build motif databases without any prior knowledge, a laborious task that is not possible with traditional sequence alignment resources. In this work, we illustrate how MotifQuest extends EndoGenius' identification capability to other endogenous peptides by showcasing its ability to identify antimicrobial peptides. Additionally, we discuss the potential of MotifQuest to parse motifs from a FASTA database file that can be further validated as new peptide drug candidates.
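    MotifQuest's scoring algorithm is not spelled out in this abstract, so the following is only a loose illustration of the simplest form that "extracting and ranking conserved sequence motifs" can take: ranking fixed-length k-mers by how many distinct peptides in a database contain them. The function `rank_kmers`, the choice of k = 4, the `min_support` cutoff, and the toy sequences are all hypothetical; this is not MotifQuest's method.

```python
from collections import Counter


def rank_kmers(peptides: list[str], k: int = 4, min_support: int = 2) -> list[tuple[str, int]]:
    """Rank k-mers by the number of distinct peptides that contain them.

    Hypothetical frequency-based stand-in for motif scoring; NOT the
    MotifQuest algorithm.
    """
    support: Counter = Counter()
    for seq in peptides:
        # A set, so a k-mer repeated inside one peptide is counted only once.
        kmers = {seq[i : i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    ranked = [(kmer, n) for kmer, n in support.items() if n >= min_support]
    # Highest support first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # Made-up toy sequences that share a C-terminal "LRFG" stretch.
    print(rank_kmers(["GSLRFG", "ASLRFG", "NRNFLRFG"]))  # → [('LRFG', 3), ('SLRF', 2)]
```

    Counting each k-mer once per peptide measures how widely a stretch is shared across the database rather than how often it repeats within one sequence; a realistic motif scorer would also need to handle variable motif length and amino acid substitutions.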

  • Article type: Journal Article
    Proteomics, the study of proteins within biological systems, has seen remarkable advancements in recent years, with protein isoform detection emerging as one of the next major frontiers. One of the primary challenges is achieving the peptide and protein coverage needed to confidently differentiate isoforms, given the protein inference problem and the difficulty of estimating protein false discovery rates in large data. In this chapter, we describe the application of artificial intelligence-assisted peptide property prediction to database search engine rescoring with Oktoberfest, an approach that has proven effective, particularly for complex samples and extensive search spaces, and that can greatly increase peptide coverage. Further, we illustrate a method for increasing isoform coverage with the PickedGroupFDR approach, which is designed to excel when applied to large data. Real-world examples illustrate the utility of the tools in the context of rescoring, protein grouping, and false discovery rate estimation. By implementing these cutting-edge techniques, researchers can achieve a substantial increase in both peptide and isoform coverage, unlocking the potential of protein isoform detection in their studies and shedding light on the roles and functions of isoforms in biological processes.
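    The core idea of picked target-decoy FDR estimation, which PickedGroupFDR extends to protein groups, can be sketched at the level of single proteins: each target protein is paired with its decoy, only the higher-scoring member of each pair is kept, and the FDR is then estimated from the ranked survivors. The sketch below assumes one score per protein and a decoy that shares its target's identifier; it deliberately omits protein grouping, so it is a simplified illustration rather than the PickedGroupFDR algorithm. The function name `picked_protein_qvalues` is hypothetical.

```python
def picked_protein_qvalues(
    target_scores: dict[str, float], decoy_scores: dict[str, float]
) -> dict[str, float]:
    """Simplified picked target-decoy q-values for target proteins (no grouping)."""
    picked: list[tuple[float, bool, str]] = []  # (score, is_decoy, protein)
    for protein in set(target_scores) | set(decoy_scores):
        t = target_scores.get(protein, float("-inf"))
        d = decoy_scores.get(protein, float("-inf"))
        # Keep only the better member of each target/decoy pair.
        # A tie goes to the decoy, which is the conservative choice.
        if t > d:
            picked.append((t, False, protein))
        else:
            picked.append((d, True, protein))

    # Best score first; among equal scores, decoys first, then by name.
    picked.sort(key=lambda x: (-x[0], not x[1], x[2]))

    # FDR estimate at each rank: picked decoys / picked targets so far.
    fdrs: list[float] = []
    n_target = n_decoy = 0
    for _, is_decoy, _ in picked:
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        fdrs.append(n_decoy / max(n_target, 1))

    # q-value: smallest FDR at this rank or any lower-scoring rank.
    qvals: dict[str, float] = {}
    running_min = float("inf")
    for (_, is_decoy, protein), fdr in zip(reversed(picked), reversed(fdrs)):
        running_min = min(running_min, fdr)
        if not is_decoy:
            qvals[protein] = running_min
    return qvals


if __name__ == "__main__":
    targets = {"P1": 10.0, "P2": 8.0, "P3": 3.0, "P4": 6.0}
    decoys = {"P1": 2.0, "P2": 9.0, "P3": 1.0, "P4": 4.0}
    # P2 loses to its own decoy, so it receives no q-value at all.
    print(picked_protein_qvalues(targets, decoys))
```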

  • Article type: Journal Article
    The aim of this work was to develop, for the first time, sustainable strategies based on Ultrasound-Assisted Extraction, Natural Deep Eutectic Solvents, and Pressurized Liquid Extraction (PLE) to extract proteins from lime (Citrus x latifolia) peels, and to evaluate their potential to release bioactive peptides. PLE gave the highest protein extraction yield (66-69%), and the extracted proteins were hydrolysed using three different enzymes (Alcalase 2.4 L FG, Alcalase® PURE 2.4 L, and Thermolysin). The in vitro antioxidant and antihypertensive activities of the released peptides were evaluated. Although all hydrolysates showed antioxidant and antihypertensive activity, the hydrolysate obtained with Thermolysin showed the highest values. Since the total phenolic content of all hydrolysates was low, peptides were likely the main contributors to these bioactivities. Hydrolysates were analyzed by UHPLC-QTOF-MS, and a total of 98 different peptides were identified. Most of these peptides were rich in amino acids associated with antioxidant activity.

  • Article type: Journal Article
    Plant genomics plays a pivotal role in enhancing global food security and sustainability by offering innovative solutions for improving crop yield, disease resistance, and stress tolerance. Although the number of sequenced genomes is growing and the accuracy and contiguity of genome assemblies are improving, structural annotation of plant genomes remains a significant challenge due to their large size, polyploidy, and rich repeat content. In this paper, we present an overview of the current landscape of crop genomics research, highlighting the diversity of genomic characteristics across crop species. We also assessed the accuracy of popular gene prediction tools in identifying genes within crop genomes and examined the factors that affect their performance. Our findings highlight the strengths and limitations of BRAKER2 and Helixer as leading structural genome annotation tools and underscore the impact of genome complexity, fragmentation, and repeat content on their performance. Furthermore, we evaluated the suitability of the predicted proteins as a reliable search space for proteomics studies using mass spectrometry data. Our results provide valuable insights for future efforts to refine and advance the field of structural genome annotation.

  • Article type: Journal Article
    Rescoring peptide spectrum matches from database search engines, enabled by peptide property predictors, now exceeds the performance of peptide identification by traditional database search engines. In contrast to the peptide spectrum match scores calculated by traditional database search engines, rescoring generates scores by comparing observed and predicted peptide properties, such as fragment ion intensities and retention times. These newly generated scores enable more effective discrimination between correct and incorrect peptide spectrum matches. This approach has been shown to substantially increase the number of confidently identified peptides, facilitating the analysis of challenging datasets in fields such as immunopeptidomics, metaproteomics, proteogenomics, and single-cell proteomics. In this review, we summarize the key developments that led to the recent introduction of multiple data-driven rescoring pipelines. We provide an overview of relevant post-processing rescoring tools, introduce prominent data-driven rescoring pipelines for various applications, and highlight the limitations, opportunities, and future perspectives of this approach and its impact on mass spectrometry-based proteomics.
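    One widely used example of comparing observed and predicted properties is the normalized spectral contrast angle between observed and predicted fragment ion intensities, the similarity measure used with Prosit predictions. The sketch below assumes both spectra are already annotated, so intensities are keyed by fragment ion label (for example "y3" or "b2"), and ions missing from one side are treated as zero. The function name `spectral_angle` is hypothetical, and real rescoring pipelines add further preprocessing and many other features.

```python
import math


def spectral_angle(observed: dict[str, float], predicted: dict[str, float]) -> float:
    """Normalized spectral contrast angle: SA = 1 - 2 * arccos(cos_sim) / pi.

    Returns a value in [0, 1] for non-negative intensities; 1 means the
    relative fragment intensities are identical.
    """
    ions = sorted(set(observed) | set(predicted))
    obs = [observed.get(ion, 0.0) for ion in ions]
    pred = [predicted.get(ion, 0.0) for ion in ions]
    norm_obs = math.sqrt(sum(x * x for x in obs))
    norm_pred = math.sqrt(sum(x * x for x in pred))
    if norm_obs == 0.0 or norm_pred == 0.0:
        return 0.0  # no signal on one side: treat as no similarity
    cosine = sum(o * p for o, p in zip(obs, pred)) / (norm_obs * norm_pred)
    # Clamp to guard against floating-point drift just outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


if __name__ == "__main__":
    obs = {"y1": 1.0, "y2": 0.2, "b3": 0.4}
    pred = {"y1": 0.9, "y2": 0.5, "b3": 0.1}
    print(round(spectral_angle(obs, pred), 3))
```

    Because both vectors are L2-normalized, the score depends only on relative intensities, so a prediction that is a scaled copy of the observed spectrum scores essentially 1.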

  • Article type: Journal Article
    The American lobster, Homarus americanus, is not only of considerable economic importance but has also emerged as a premier model organism in neuroscience research. Neuropeptides, an important class of cell-to-cell signaling molecules, play crucial roles in a wide array of physiological and psychological processes. Leveraging the recently sequenced high-quality draft genome of the American lobster, our study sought to profile the neuropeptidome of this model organism. Employing advanced mass spectrometry techniques, we identified 24 neuropeptide precursors and 101 unique mature neuropeptides in Homarus americanus. Intriguingly, 67 of these neuropeptides were discovered for the first time. Our findings provide a comprehensive overview of the peptidomic attributes of the lobster's nervous system and highlight the tissue-specific distribution of these neuropeptides. Collectively, this research not only enriches our understanding of the neuronal complexities of the American lobster but also lays a foundation for future investigations into the functional roles that these peptides play in crustacean species. The mass spectrometry data have been deposited in the PRIDE repository with the identifier PXD047230.

  • Article type: Journal Article
    Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the primary method for obtaining direct evidence of the presentation of disease- or patient-specific peptides by human leukocyte antigen (HLA) molecules. However, compared with the analysis of tryptic peptides in proteomics, the analysis of HLA peptides still poses computational and statistical challenges. Recently, fragment ion intensity-based matching scores that assess the similarity between predicted and observed spectra were shown to substantially increase the number of confidently identified peptides, particularly when non-tryptic peptides are analyzed. In this chapter, we describe in detail three procedures for using state-of-the-art deep learning models to analyze and validate single spectra, single measurements, and multiple measurements in mass spectrometry-based immunopeptidomics. For this, we explain how to use the Universal Spectrum Explorer (USE), online Oktoberfest, and offline Oktoberfest. For intensity-based scoring, Oktoberfest uses fragment ion intensity and retention time predictions from the deep learning framework Prosit, a deep neural network trained on a very large number of synthetic peptides and tandem mass spectra generated within the ProteomeTools project. The examples shown highlight how deep learning-assisted analysis can increase the number of identified HLA peptides, facilitate the discovery of confidently identified neo-epitopes, or help assess the presence of cryptic peptides, such as spliced peptides.
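    Retention time predictions enter rescoring in a similar spirit: predicted retention times are aligned to the observed ones, and each PSM's deviation from that alignment becomes a feature, with large deviations hinting at an incorrect match. The sketch below uses an ordinary least-squares line as a stand-in for whatever alignment Oktoberfest or another pipeline actually performs; the function name `rt_residuals` is hypothetical.

```python
def rt_residuals(predicted: list[float], observed: list[float]) -> list[float]:
    """Absolute residuals after a least-squares fit: observed ~ slope * predicted + intercept.

    Hypothetical helper; a plain linear fit stands in for a pipeline's real RT alignment.
    """
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need at least two paired retention times")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    if sxx == 0.0:
        raise ValueError("predicted retention times must not all be equal")
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_p
    # Deviation of each observed RT from its calibrated prediction.
    return [abs(o - (slope * p + intercept)) for p, o in zip(predicted, observed)]


if __name__ == "__main__":
    print(rt_residuals([0.0, 1.0, 2.0], [0.0, 1.0, 5.0]))  # → [0.5, 1.0, 0.5]
```

    A more robust variant would fit the line on high-confidence PSMs only and then apply it to all candidates, so that incorrect matches do not distort the calibration they are being judged against.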

  • Article type: Journal Article
    Rescoring of peptide-spectrum matches (PSMs) has emerged as a standard procedure for the analysis of tandem mass spectrometry data, which underscores the need for continued maintenance and improvement of such software. We introduce MS2Rescore 3.0, a versatile, modular, and user-friendly platform designed to increase peptide identifications. Researchers can install MS2Rescore across various platforms with minimal effort and benefit from a graphical user interface, a modular Python API, and extensive documentation. To showcase this new version, we connected MS2Rescore 3.0 with MS Amanda 3.0, a new release of the well-established search engine that addresses previous limitations on automatic rescoring. Among its new features, MS Amanda now provides additional output columns that can be used for rescoring. The full potential of rescoring is best revealed when it is applied to challenging data sets. We therefore evaluated the performance of these two tools on publicly available single-cell data sets, where the number of PSMs increased substantially, demonstrating that MS2Rescore offers a powerful solution to boost peptide identifications. MS2Rescore's modular design and user-friendly interface make data-driven rescoring easily accessible, even for inexperienced users. We therefore expect MS2Rescore to be a valuable tool for the wider proteomics community. MS2Rescore is available at https://github.com/compomics/ms2rescore.

  • Article type: Journal Article
    In shotgun proteomics, a database search engine analyzes experimentally acquired mass spectra and reports a peptide-spectrum match (PSM) for each spectrum. However, most of the reported PSMs are incorrect, and various post-processing tools have therefore been developed to rerank peptide identifications. Yet these methods suffer from issues such as dependence on distributional assumptions, reliance on shallow models, and limited effectiveness. In this work, we propose AttnPep, a deep learning model for rescoring PSMs that uses a self-attention module. This module helps the neural network focus on features relevant to PSM classification and ignore irrelevant ones, which allows AttnPep to analyze the output of different search engines and improve PSM discrimination accuracy. We considered a PSM correct if it achieved a q-value < 0.01 and compared AttnPep with the mainstream tools PeptideProphet, Percolator, and proteoTorch. On average, AttnPep identified 9.29% more correct PSMs than the other methods. Additionally, AttnPep better distinguished correct from incorrect PSMs and found more synthetic peptides in the complex SWATH data set.
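    The q-value criterion in this evaluation comes from target-decoy FDR estimation. The sketch below shows one common way to compute PSM-level q-values, assuming each PSM carries a single score (higher is better) and a flag marking decoy hits: the FDR at each rank is estimated as decoys divided by targets scoring at or above it, and a PSM's q-value is the smallest FDR at its rank or any lower-scoring rank. The function names are hypothetical, and AttnPep and the compared tools may estimate FDR differently (for example, with a +1 correction on the decoy count).

```python
def psm_qvalues(scores: list[float], is_decoy: list[bool]) -> list[float]:
    """Target-decoy q-values for PSMs, returned in the input order."""
    n = len(scores)
    if len(is_decoy) != n:
        raise ValueError("scores and is_decoy must have the same length")
    # Rank best-first; among tied scores, decoys come first (conservative).
    order = sorted(range(n), key=lambda i: (-scores[i], not is_decoy[i]))

    fdr = [0.0] * n
    n_target = n_decoy = 0
    for i in order:
        if is_decoy[i]:
            n_decoy += 1
        else:
            n_target += 1
        fdr[i] = n_decoy / max(n_target, 1)

    # Walk from the worst rank upward, keeping the running minimum FDR.
    qvals = [0.0] * n
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        qvals[i] = running_min
    return qvals


def count_confident_targets(scores: list[float], is_decoy: list[bool], threshold: float = 0.01) -> int:
    """Number of target PSMs whose q-value is below the threshold."""
    qvals = psm_qvalues(scores, is_decoy)
    return sum(1 for q, d in zip(qvals, is_decoy) if not d and q < threshold)


if __name__ == "__main__":
    scores = [9.0, 8.0, 7.0, 6.0, 5.0]
    decoy = [False, False, True, False, False]
    print(count_confident_targets(scores, decoy))  # → 2
```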