Homology search

  • Article type: Journal Article
    The recent success of AlphaFold2 in protein structure prediction relied heavily on co-evolutionary information derived from homologous protein sequences found in the huge, integrated database of protein sequences (Big Fantastic Database). In contrast, the existing nucleotide databases were not consolidated to facilitate wider and deeper homology search. Here, we built a comprehensive database by incorporating the non-coding RNA (ncRNA) sequences from RNAcentral, the transcriptome assembly and metagenome assembly from Metagenomics RAST (MG-RAST), the genomic sequences from Genome Warehouse (GWH), and the genomic sequences from MGnify, in addition to the nucleotide (nt) database and its subsets at the National Center for Biotechnology Information (NCBI). The resulting Master database of All possible RNA sequences (MARS) is 20-fold larger than NCBI's nt database or 60-fold larger than RNAcentral. The new dataset along with a new split-search strategy allows a substantial improvement in homology search over existing state-of-the-art techniques. It also yields more accurate and more sensitive multiple sequence alignments (MSAs) than manually curated MSAs from Rfam for the majority of structured RNAs mapped to Rfam. The results indicate that MARS coupled with the fully automatic homology search tool RNAcmap will be useful for improved structural and functional inference of ncRNAs and RNA language models based on MSAs. MARS is accessible at https://ngdc.cncb.ac.cn/omix/release/OMIX003037, and RNAcmap3 is accessible at http://zhouyq-lab.szbl.ac.cn/download/.

  • Article type: Journal Article
    Homologous recombination (HR) is a template-based DNA double-strand break repair pathway that functions to maintain genomic integrity. A vital component of the HR reaction is the identification of template DNA to be used during repair. This occurs through a mechanism known as the homology search. The homology search occurs in two steps: a collision step in which two pieces of DNA are forced to collide and a selection step that results in homologous pairing between matching DNA sequences. Selection of a homologous template is facilitated by recombinases of the RecA/Rad51 family of proteins in cooperation with helicases, translocases, and topoisomerases that determine the overall fidelity of the match. This menagerie of molecular machines acts to regulate critical intermediates during the homology search. These intermediates include recombinase filaments that probe for short stretches of homology and early strand invasion intermediates in the form of displacement loops (D-loops) that stabilize paired DNA. Here, we will discuss recent advances in understanding how these specific intermediates are regulated on the molecular level during the HR reaction. We will also discuss how the stability of these intermediates influences the ultimate outcomes of the HR reaction. Finally, we will discuss recent physiological models developed to explain how the homology search protects the genome.

  • Article type: Journal Article
    BACKGROUND: Co-localized sets of genes that encode specialized functions are common across microbial genomes and occur in genomes of larger eukaryotes as well. Important examples include Biosynthetic Gene Clusters (BGCs) that produce specialized metabolites with medicinal, agricultural, and industrial value (e.g. antimicrobials). Comparative analysis of BGCs can aid in the discovery of novel metabolites by highlighting distribution and identifying variants in public genomes. Unfortunately, gene-cluster-level homology detection remains inaccessible, time-consuming and difficult to interpret.
    RESULTS: The comparative gene cluster analysis toolbox (CAGECAT) is a rapid and user-friendly platform to mitigate difficulties in comparative analysis of whole gene clusters. The software provides homology searches and downstream analyses without the need for command-line or programming expertise. By leveraging remote BLAST databases, which always provide up-to-date results, CAGECAT can yield relevant matches that aid in the comparison, taxonomic distribution, or evolution of an unknown query. The service is extensible and interoperable and implements the cblaster and clinker pipelines to perform homology search, filtering, gene neighbourhood estimation, and dynamic visualisation of resulting variant BGCs. With the visualisation module, publication-quality figures can be customized directly from a web-browser, which greatly accelerates their interpretation via informative overlays to identify conserved genes in a BGC query.
    CONCLUSIONS: Overall, CAGECAT is extensible software that can be accessed via a standard web browser for whole-region homology searches and comparison on continually updated genomes from NCBI. The public web server and installable Docker image are open source and freely available without registration at https://cagecat.bioinformatics.nl.

  • Article type: Journal Article
    Meiosis is a highly conserved, specialised cell division in the sexual life cycles of eukaryotes, forming the basis of gene reshuffling, biological diversity and evolution. Understanding the meiotic machinery across different plant lineages is essential to understanding the lineage-specific evolution of meiosis. Functional and cytogenetic studies of meiotic proteins from representatives of all plant lineages are nearly impossible. We therefore took advantage of the genomics revolution to search for core meiotic proteins in the growing collection of plant genomes using the highly sensitive homology search approaches PSI-BLAST, HMMER and CLANS. We found that most of the meiotic proteins are conserved in most of the lineages. As exceptions, orthologs of Arabidopsis thaliana ASY4, PHS1, PRD2 and PRD3 were mostly not detected in some distant algal lineages, suggesting their minimal conservation. Remarkably, a duplication of SPO11 ancestral to all eukaryotes could be confirmed. Loss of SPO11-1 in Chlorophyta and Charophyta is likely to have occurred, suggesting that SPO11-1 and SPO11-2 heterodimerisation may be a unique feature of land plants within Viridiplantae. The possible origin of DFO and HEIP1, meiotic proteins described only in plants so far, could be traced and seems to have occurred in the ancestor of vascular plants and Streptophyta, respectively. Our comprehensive approach is an attempt to provide insights into meiotic core proteins and thus the conservation of meiotic pathways across the plant kingdom. We hope that this will serve the meiosis research community as a basis for further characterisation of interesting candidates in the future.

  • Article type: Journal Article
    Since 1992, all state-of-the-art methods for fast and sensitive identification of evolutionary, structural, and functional relations between proteins (also referred to as "homology detection") use sequences and sequence-profiles (PSSMs). Protein Language Models (pLMs) generalize sequences, possibly capturing the same constraints as PSSMs, e.g., through embeddings. Here, we explored how to use such embeddings for nearest neighbor searches to identify relations between protein pairs with diverged sequences (remote homology detection for levels of <20% pairwise sequence identity, PIDE). While this approach excelled for proteins with single domains, we demonstrated the current challenges in applying it to multi-domain proteins and presented some ideas on how to overcome existing limitations, in principle. We observed that sufficiently challenging data set separations were crucial to provide deeply relevant insights into the behavior of nearest neighbor search when applied to the protein embedding space, and made all our methods readily available for others.
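    The nearest neighbor search over embeddings mentioned in this abstract can be illustrated with a minimal sketch. It assumes each protein has already been reduced to one fixed-length embedding vector and uses cosine similarity as the ranking score; the abstract does not state which metric or index the authors used, so both choices are assumptions. The function names and the three-dimensional toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def nearest_neighbors(query, database, k=1):
    """Return the k (name, score) pairs most similar to the query embedding."""
    scored = [(cosine_similarity(query, emb), name) for name, emb in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Hypothetical toy embeddings; real pLM embeddings have far more dimensions.
toy_db = {
    "protA": [0.9, 0.1, 0.0],
    "protB": [0.0, 1.0, 0.2],
    "protC": [0.7, 0.3, 0.1],
}
hits = nearest_neighbors([1.0, 0.2, 0.0], toy_db, k=2)
for name, score in hits:
    print(f"{name}\t{score:.3f}")
```

    This brute-force scan compares the query against every database entry, which is fine for a toy set; a search over millions of proteins would normally rely on an approximate nearest neighbor index instead.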

  • Article type: Journal Article
    While the molecular repertoire of the homologous recombination pathways is well studied, the search mechanism that enables recombination between distant homologous regions is poorly understood. Earlier work suggests that the recombinase RecA, an essential component for homology search, forms an elongated filament, nucleating at the break site. How this RecA structure carries out long-distance search remains unclear. Here, we follow the dynamics of RecA after induction of a single double-strand break on the Caulobacter chromosome. We find that the RecA-nucleoprotein filament, once formed, rapidly translocates in a directional manner in the cell, undergoing several pole-to-pole traversals, until homology search is complete. Concomitant with translocation, we observe dynamic variation in the length of the filament. Importantly, in vivo, the RecA filament alone is incapable of such long-distance movement; both translocation and associated length variations are contingent on the action of the structural maintenance of chromosomes (SMC)-like protein RecN, via its ATPase cycle. In summary, we have uncovered the three key elements of homology search driven by RecN: mobility of a finite segment of RecA, changes in filament length, and ability to conduct multiple pole-to-pole traversals, which together point to an optimal search strategy.

  • Article type: Journal Article
    Trypanosomatids belong to a remarkable group of unicellular, parasitic organisms of the order Kinetoplastida, an early diverging branch of the phylogenetic tree of eukaryotes, exhibiting intriguing biological characteristics affecting gene expression (intronless polycistronic transcription, trans-splicing, and RNA editing), metabolism, surface molecules, and organelles (compartmentalization of glycolysis, variation of the surface molecules, and unique mitochondrial DNA), cell biology and life cycle (phagocytic vacuoles evasion and intricate patterns of cell morphogenesis). With numerous genomic-scale data of several trypanosomatids becoming available since 2005 (genomes, transcriptomes, and proteomes), the scientific community can further investigate the mechanisms underlying these unusual features and address other unexplored phenomena possibly revealing biological aspects of the early evolution of eukaryotes. One fundamental aspect comprises the processes and mechanisms involved in the acquisition and loss of genes throughout the evolutionary history of these primitive microorganisms. Here, we present a comprehensive in silico analysis of pseudogenes in three major representatives of this group: Leishmania major, Trypanosoma brucei, and Trypanosoma cruzi. Pseudogenes, DNA segments originating from altered genes that lost their original function, are genomic relics that can offer an essential record of the evolutionary history of functional genes, as well as clues about the dynamics and evolution of hosting genomes. 
    Scanning these genomes with functional proteins as proxies to reveal intergenic regions with protein-coding features, relying on a customized threshold to distinguish statistically and biologically significant sequence similarities, and reassembling remnant sequences from their debris, we found thousands of pseudogenes and hundreds of open reading frames, with particular characteristics in each trypanosomatid: mutation profile, number, content, density, codon bias, average size, single- or multi-copy gene origin, number and type of mutations, putative primitive function, and transcriptional activity. These features suggest a common process of pseudogene formation, different patterns of pseudogene evolution and extant biological functions, and/or distinct genome organization undertaken by those parasites during evolution, as well as different evolutionary and/or selective pressures acting on distinct lineages.

  • Article type: Journal Article
    In this chapter, we outline a pipeline for ortholog prediction and phylogenetic analysis in plants. This computational pipeline uses algorithms from different software packages to enable biologists who are beginners in bioinformatics to predict orthologs shared among many distinct non-model and model plant species and to identify gene loss events. Prediction of orthologs allows (1) investigation of the evolutionary relationships of plant genomes, (2) discovery of their origin and function, and (3) assessment of the impact of their adaptability to the environment. We developed the pipeline to fit not only eukaryotic but also prokaryotic organisms, with small or large genomes. All results acquired from the ortholog prediction will enable phylogenetic tree construction, using gene and species (phylogenomic) phylogeny approaches.

  • Article type: Journal Article
    Enzymes catalyze a wide variety of reactions with exquisite precision under crowded conditions within cellular environments. When faced with a choice of small molecules in their vicinity, most enzymes remain specific about the substrate they pick, but some are able to accept a range of substrates and subsequently produce a variety of products. The biosynthesis of vitamin B12, an essential nutrient required by humans, involves a multi-substrate α-phosphoribosyltransferase enzyme, CobT, that activates the lower ligand of B12. Vitamin B12 is a member of the cobamide family of cofactors, which share a common tetrapyrrolic corrin scaffold with a centrally coordinated cobalt ion, and an upper and a lower ligand. The structural difference between B12 and other cobamides mainly arises from variations in the lower ligand, which is attached to the activated corrin ring by CobT and other downstream enzymes. In this chapter, we describe the steps involved in identifying and reconstituting the activity of new CobT homologs by deriving lessons from those previously characterized. We then highlight biochemical techniques to study the unique properties of these homologs. Finally, we describe a pairwise substrate competition assay to rank CobT substrate preference, a general method that can be applied to the study of other multi-substrate enzymes. Overall, the analysis with CobT provides insights into the range of cobamides that can be synthesized by an organism or a community, complementing efforts to predict cobamide diversity from complex metagenomic data.

  • Article type: Journal Article
    Metagenomic analysis, a technique used to comprehensively analyze microorganisms present in the environment, requires performing high-precision homology searches on large amounts of sequencing data, the size of which has increased dramatically with the development of next-generation sequencing. NCBI BLAST is the most widely used software for performing homology searches, but its speed is insufficient for the throughput of current DNA sequencers. In this paper, we propose a new, high-performance homology search algorithm that employs a two-step seed search strategy using multiple reduced amino acid alphabets to identify highly similar subsequences. Additionally, we evaluated the validity of the proposed method against several existing tools. Our method was faster than any other existing program for ≤120,000 queries, while DIAMOND, an existing tool, was the fastest method for >120,000 queries.
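    The idea of a two-step seed search over reduced amino acid alphabets can be sketched as follows. This is only one possible reading of the strategy named in the abstract, not the authors' algorithm: a coarse alphabet finds candidate k-mer matches through a hash index, and a finer alphabet then confirms a longer match at the same positions. The two residue groupings, the seed lengths, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical residue groupings for illustration; not the paper's alphabets.
# Each fine group is a subset of exactly one coarse group.
COARSE_GROUPS = ["AGST", "C", "DENQ", "FWY", "HKR", "ILMV", "P"]
FINE_GROUPS = ["A", "G", "ST", "C", "DE", "NQ", "FY", "W", "H", "KR", "ILV", "M", "P"]


def make_encoder(groups):
    """Build a function mapping each residue to its group's first letter."""
    table = {aa: group[0] for group in groups for aa in group}

    def encode(seq):
        return "".join(table[aa] for aa in seq)

    return encode


encode_coarse = make_encoder(COARSE_GROUPS)
encode_fine = make_encoder(FINE_GROUPS)


def build_index(subject, k, encode):
    """Index every k-mer of the reduced subject sequence by start position."""
    index = defaultdict(list)
    reduced = encode(subject)
    for i in range(len(reduced) - k + 1):
        index[reduced[i:i + k]].append(i)
    return index


def two_step_seeds(query, subject, k_coarse=4, k_fine=6):
    """Return (query_pos, subject_pos) pairs that pass both seed steps."""
    index = build_index(subject, k_coarse, encode_coarse)
    q_coarse = encode_coarse(query)
    q_fine, s_fine = encode_fine(query), encode_fine(subject)
    seeds = []
    for qi in range(len(q_coarse) - k_coarse + 1):
        # Step 1: short exact match under the coarse alphabet.
        for si in index.get(q_coarse[qi:qi + k_coarse], []):
            # Step 2: longer exact match under the finer alphabet.
            fits = qi + k_fine <= len(q_fine) and si + k_fine <= len(s_fine)
            if fits and q_fine[qi:qi + k_fine] == s_fine[si:si + k_fine]:
                seeds.append((qi, si))
    return seeds


print(two_step_seeds("MKVLDESTF", "GGMRILEDTSYPP"))
```

    In a full search tool, the surviving seeds would then be extended into scored alignments; that stage is omitted here.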