enzyme prediction

  • Article type: Journal Article
    Accurate understanding of the biological functions of enzymes is vital for various tasks in both pathology and industrial biotechnology. However, existing methods are usually not fast enough and offer no explanation of their predictions, which severely limits their real-world applications. Following our previous work, DEEPre, we propose a new interpretable and fast version (ifDEEPre) by designing novel self-guided attention and incorporating biological knowledge learned via large protein language models to accurately predict the Enzyme Commission (EC) numbers of enzymes and confirm their functions. The self-guided attention is designed to optimize the unique contributions of representations, automatically detecting key protein motifs to provide meaningful interpretations. Representations learned from raw protein sequences are strictly screened to improve the running speed of the framework, making it 50 times faster than DEEPre while requiring 12.89 times less storage space. Large language modules are incorporated to learn physical properties from hundreds of millions of proteins, extending the biological knowledge of the whole network. Extensive experiments indicate that ifDEEPre outperforms all current methods, achieving an F1-score more than 14.22% higher on the NEW dataset. Furthermore, the trained ifDEEPre models accurately capture multi-level protein biological patterns and infer evolutionary trends of enzymes by taking only raw sequences without label information. Meanwhile, ifDEEPre predicts the evolutionary relationships between different yeast sub-species, which are highly consistent with the ground truth. Case studies indicate that ifDEEPre can detect key amino acid motifs, which have important implications for designing novel enzymes. A web server running ifDEEPre is available at https://proj.cse.cuhk.edu.hk/aihlab/ifdeepre/ to provide convenient services to the public. Meanwhile, ifDEEPre is freely available on GitHub at https://github.com/ml4bio/ifDEEPre/.
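    The prediction target in this abstract, the EC number, is a four-level hierarchical label. The sketch below only illustrates that label structure and how agreement between a predicted and a true label could be measured level by level. It is not part of ifDEEPre, and the function names `ec_levels` and `shared_depth` are hypothetical.

```python
from typing import List


def ec_levels(ec_number: str) -> List[str]:
    """Return the cumulative prefixes of an EC number, one per hierarchy level."""
    parts = ec_number.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    return [".".join(parts[:i]) for i in range(1, 5)]


def shared_depth(predicted: str, true: str) -> int:
    """Count how many leading EC levels two labels have in common (0 to 4)."""
    depth = 0
    for p, t in zip(predicted.split("."), true.split(".")):
        if p != t:
            break
        depth += 1
    return depth
```

    A prediction that shares only the first two levels with the true label is correct about the broad enzyme class and subclass but wrong about the finer levels.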

  • Article type: Journal Article
    Bacteria within the gut microbiota possess the ability to metabolize a wide array of human drugs, foods, and toxins, but the responsible enzymes for these chemical events remain largely uncharacterized due to the time-consuming nature of current experimental approaches. Attempts have been made in the past to computationally predict which bacterial species and enzymes are responsible for chemical transformations in the gut environment, but with low accuracy due to minimal chemical representation and sequence similarity search schemes. Here, we present an in silico approach that employs chemical and protein Similarity algorithms that Identify MicrobioMe Enzymatic Reactions (SIMMER). We show that SIMMER accurately predicts the responsible species and enzymes for a queried reaction, unlike previous methods. We demonstrate SIMMER use cases in the context of drug metabolism by predicting previously uncharacterized enzymes for 88 drug transformations known to occur in the human gut. We validate these predictions on external datasets and provide an in vitro validation of SIMMER's predictions for metabolism of methotrexate, an anti-arthritic drug. After demonstrating its utility and accuracy, we made SIMMER available as both a command-line and web tool, with flexible input and output options for determining chemical transformations within the human gut. We present SIMMER as a computational addition to the microbiome researcher's toolbox, enabling them to make informed hypotheses before embarking on the lengthy laboratory experiments required to characterize novel bacterial enzymes that can alter human ingested compounds.
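    The abstract does not say which chemical similarity measure SIMMER uses. As a generic illustration of ranking known reactions by chemical similarity to a query, the sketch below applies the Tanimoto (Jaccard) coefficient to sets of feature identifiers. The function names and the set-of-integers fingerprint representation are assumptions made for this example, not SIMMER's implementation.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: shared features divided by all features present."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def rank_by_similarity(
    query: FrozenSet[int], library: Dict[str, FrozenSet[int]]
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

    The top-ranked entries would then be candidates whose known enzymes might also catalyze the queried transformation.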

  • Article type: Journal Article
    Synthetic biology and metabolic engineering rely on computational search tools for predictions of novel biosynthetic pathways to industrially important compounds, many of which are derived from aromatic amino acids. Pathway search tools vary in their scope of covered reactions and compounds, as well as in metrics for ranking and evaluation. In this work, we present a new computational resource called ARBRE: Aromatic compounds RetroBiosynthesis Repository and Explorer. It consists of a comprehensive biochemical reaction network centered around aromatic amino acid biosynthesis and a computational toolbox for navigating this network. ARBRE encompasses over 33'000 known and 390'000 novel reactions predicted with generalized enzymatic reaction rules and over 74'000 compounds, of which 19'000 are known to biochemical databases and 55'000 only to PubChem. Over 1'000 molecules that were previously found only in the PubChem database and could not be integrated into a biochemical network are included in the ARBRE reaction network by assigning enzymatic reactions. ARBRE can be applied for pathway search, enzyme annotation, pathway ranking, visualization, and network expansion around known biochemical pathways and products of lignin degradation to predict valuable compound derivatives. In line with the standards of open science, we have made the toolbox freely available to the scientific community on git (https://github.com/EPFL-LCSB/ARBRE) and we provide the web version at http://lcsb-databases.epfl.ch/arbre/. We envision that ARBRE will provide the community with a new computational resource and comprehensive search tool to predict and rank pathways towards industrially important aromatic compounds.
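    Pathway search over a reaction network can be pictured as a graph search from a precursor compound to a target compound. The sketch below uses a plain breadth-first search over a dictionary of compound-to-product edges to find a pathway with the fewest reaction steps. It is an assumption-laden illustration and does not reproduce the ARBRE toolbox or its ranking metrics. The function name `shortest_pathway` and the network layout are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_pathway(
    network: Dict[str, List[str]], source: str, target: str
) -> Optional[List[str]]:
    """Breadth-first search for a pathway with the fewest reaction steps."""
    parent: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        compound = queue.popleft()
        if compound == target:
            # Walk back through the parent links to rebuild the pathway.
            path: List[str] = []
            node: Optional[str] = compound
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for product in network.get(compound, []):
            if product not in parent:
                parent[product] = compound
                queue.append(product)
    return None  # target is not reachable from source
```

    A real tool would typically weight edges (for example by reaction feasibility) and return several ranked pathways rather than a single shortest one.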

  • Article type: Journal Article
    The ATLAS of Biochemistry is a repository of both known and novel predicted biochemical reactions between biological compounds listed in the Kyoto Encyclopedia of Genes and Genomes (KEGG). ATLAS was originally compiled based on KEGG 2015, though the number of KEGG reactions has increased by almost 20 percent since then. Here, we present an updated version of ATLAS created from KEGG 2018 using an increased set of generalized reaction rules. Furthermore, we improved the accuracy of the enzymes that are predicted for catalyzing novel reactions. ATLAS now contains ∼150 000 reactions, out of which 96% are novel. In this report, we present detailed statistics on the updated ATLAS and highlight the improvements with regard to the previous version. Most importantly, 107 reactions predicted in the original ATLAS are now known to KEGG, which validates the predictive power of our approach. The updated ATLAS is available at https://lcsb-databases.epfl.ch/atlas.

  • Article type: Journal Article
    BACKGROUND: Genome-scale metabolic modeling is a cornerstone of systems biology analysis of microbial organisms and communities, yet these genome-scale modeling efforts are invariably based on incomplete functional annotations. Annotated genomes typically contain 30-50% of genes without functional annotation, severely limiting our knowledge of the "parts lists" that the organisms have at their disposal. These incomplete annotations may be sufficient to derive a model of a core set of well-studied metabolic pathways that support growth in pure culture. However, pathways important for growth on unusual metabolites exchanged in complex microbial communities are often less understood, resulting in missing functional annotations in newly sequenced genomes.
    RESULTS: Here, we present results on a comprehensive reannotation of 27 bacterial reference genomes, focusing on enzymes with EC numbers annotated by KEGG, RAST, EFICAz, and the BRENDA enzyme database, and on membrane transport annotations by TransportDB, KEGG and RAST. Our analysis shows that annotation using multiple tools can result in a drastically larger metabolic network reconstruction, adding on average 40% more EC numbers, 3-8 times more substrate-specific transporters, and 37% more metabolic genes. These results are even more pronounced for bacterial species that are phylogenetically distant from well-studied model organisms such as E. coli.
    CONCLUSIONS: Metabolic annotations are often incomplete and inconsistent. Combining multiple functional annotation tools can greatly improve genome coverage and metabolic network size, especially for non-model organisms and non-core pathways.
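    The core idea of combining annotation tools can be sketched as a per-gene union of EC numbers, followed by a comparison against any single tool. This is a minimal illustration and not the authors' pipeline. The function names, the tool names, and the gene identifiers used when calling it are hypothetical.

```python
from typing import Dict, Set

Annotations = Dict[str, Set[str]]  # gene ID -> set of EC numbers


def merge_annotations(per_tool: Dict[str, Annotations]) -> Annotations:
    """Union the EC numbers assigned to each gene across all tools."""
    merged: Annotations = {}
    for gene_to_ecs in per_tool.values():
        for gene, ecs in gene_to_ecs.items():
            merged.setdefault(gene, set()).update(ecs)
    return merged


def distinct_ecs(annotations: Annotations) -> Set[str]:
    """Collect every distinct EC number appearing in an annotation set."""
    result: Set[str] = set()
    for ecs in annotations.values():
        result |= ecs
    return result


def percent_gain(per_tool: Dict[str, Annotations], baseline: str) -> float:
    """Percentage of additional distinct EC numbers gained over one baseline tool."""
    base = distinct_ecs(per_tool[baseline])
    if not base:
        raise ValueError("baseline tool has no EC numbers")
    combined = distinct_ecs(merge_annotations(per_tool))
    return 100.0 * (len(combined) - len(base)) / len(base)
```

    A plain union maximizes coverage but also carries over each tool's false positives, so a stricter variant might keep only EC numbers supported by at least two tools.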