Nucleic Acid Conformation

  • Article type: Journal Article
    Hepatitis C virus (HCV) is a plus-stranded RNA virus that often chronically infects liver hepatocytes and causes liver cirrhosis and cancer. These viruses replicate their genomes employing error-prone replicases. Thereby, they routinely generate a large 'cloud' of RNA genomes (quasispecies) which, by trial and error, comprehensively explore the sequence space available for functional RNA genomes that maintain the ability for efficient replication and immune escape. In this context, it is important to identify which RNA secondary structures in the sequence space of the HCV genome are conserved, likely due to functional requirements. Here, we provide the first genome-wide multiple sequence alignment (MSA) with the prediction of RNA secondary structures throughout all representative full-length HCV genomes. We selected 57 representative genomes by clustering all complete HCV genomes from the BV-BRC database based on k-mer distributions and dimension reduction and adding RefSeq sequences. We include annotations of previously recognized features for easy comparison to other studies. Our results indicate that mainly the core coding region, the C-terminal NS5A region, and the NS5B region contain secondary structure elements that are conserved beyond coding sequence requirements, indicating functionality on the RNA level. In contrast, the genome regions in between contain less highly conserved structures. The results provide a complete description of all conserved RNA secondary structures and make clear that functionally important RNA secondary structures are present in certain HCV genome regions but are largely absent from other regions. Full-genome alignments of all branches of Hepacivirus C are provided in the supplement.
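    As a rough illustration of what "clustering based on k-mer distributions" can mean, the sketch below turns a sequence into a normalized k-mer frequency vector and compares two such vectors. The value of k, the Euclidean distance, and the function names `kmer_profile` and `euclidean` are assumptions made for this example; the abstract does not state which k, distance, or dimension-reduction method the authors used.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 3) -> list[float]:
    """Return the normalized k-mer frequency vector of an RNA sequence.

    Hypothetical helper: k=3 is an arbitrary choice for illustration.
    """
    seq = seq.upper().replace("T", "U")
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed ordering of all 4**k possible k-mers over A, C, G, U.
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    # Windows containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equally long profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

    Profiles like these could then be passed to any standard dimension-reduction and clustering routine; which one fits the data is a separate design choice.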

  • Article type: Journal Article
    BACKGROUND: RNA design shows growing applications in synthetic biology and therapeutics, driven by the crucial role of RNA in various biological processes. A fundamental challenge is to find functional RNA sequences that satisfy given structural constraints, known as the inverse folding problem. Computational approaches have emerged to address this problem based on secondary structures. However, designing RNA sequences directly from 3D structures is still challenging, due to the scarcity of data, the nonunique structure-sequence mapping, and the flexibility of RNA conformation.
    RESULTS: In this study, we propose RiboDiffusion, a generative diffusion model for RNA inverse folding that can learn the conditional distribution of RNA sequences given 3D backbone structures. Our model consists of a graph neural network-based structure module and a Transformer-based sequence module, which iteratively transforms random sequences into desired sequences. By tuning the sampling weight, our model allows for a trade-off between sequence recovery and diversity to explore more candidates. We split test sets based on RNA clustering with different cut-offs for sequence or structure similarity. Our model outperforms baselines in sequence recovery, with an average relative improvement of 11% for sequence similarity splits and 16% for structure similarity splits. Moreover, RiboDiffusion performs consistently well across various RNA length categories and RNA types. We also apply in silico folding to validate whether the generated sequences can fold into the given 3D RNA backbones. Our method could be a powerful tool for RNA design that explores the vast sequence space and finds novel solutions to 3D structural constraints.
    METHODS: The source code is available at https://github.com/ml4bio/RiboDiffusion.
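    For readers unfamiliar with the evaluation metric, sequence recovery is commonly computed as the fraction of positions at which a designed sequence matches the native one, and a relative improvement is the gain divided by the baseline value. The sketch below shows both under that common definition; the function names are hypothetical and this is not taken from the RiboDiffusion code base.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed base equals the native base."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        return 0.0
    matches = sum(d == n for d, n in zip(designed.upper(), native.upper()))
    return matches / len(native)


def relative_improvement(baseline: float, model: float) -> float:
    """Relative gain of `model` over `baseline` (0.11 means 11%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (model - baseline) / baseline
```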

  • Article type: Journal Article
    BACKGROUND: RNA design is a key technique to achieve new functionality in fields like synthetic biology or biotechnology. Computational tools could help to find such RNA sequences but they are often limited in their formulation of the search space.
    RESULTS: In this work, we propose partial RNA design, a novel RNA design paradigm that addresses the limitations of current RNA design formulations. Partial RNA design describes the problem of designing RNAs from arbitrary RNA sequences and structure motifs with multiple design goals. By separating the design space from the objectives, our formulation enables the design of RNAs with variable lengths and desired properties, while still allowing precise control over sequence and structure constraints at individual positions. Based on this formulation, we introduce a new algorithm, libLEARNA, capable of efficiently solving different constraint RNA design tasks. A comprehensive analysis of various problems, including a realistic riboswitch design task, reveals the outstanding performance of libLEARNA and its robustness.
    METHODS: libLEARNA is open-source and publicly available at: https://github.com/automl/learna_tools.

  • Article type: Journal Article
    BACKGROUND: Noncoding RNAs (ncRNAs) express their functions by adopting molecular structures. Specifically, RNA secondary structures serve as a relatively stable intermediate step before tertiary structures, offering a reliable signature of molecular function. Consequently, within an RNA functional family, secondary structures are generally more evolutionarily conserved than sequences. Conversely, homologous RNA families grouped within an RNA clan share ancestors but typically exhibit structural differences. Inferring the evolution of RNA structures within RNA families and clans is crucial for gaining insights into functional adaptations over time and providing clues about the Ancient RNA World Hypothesis.
    RESULTS: We introduce the median problem and the small parsimony problem for ncRNA families, where secondary structures are represented as leaf-labeled trees. We utilize the Robinson-Foulds (RF) tree distance, which corresponds to a specific edit distance between RNA trees, and a new metric called the Internal-Leafset (IL) distance. While the RF tree distance compares sets of leaves descending from internal nodes of two RNA trees, the IL distance compares the collection of leaf-children of internal nodes. The latter is better at capturing differences in structural elements of RNAs than the RF distance, which is more focused on base pairs. We also consider a more general tree edit distance that allows the mapping of base pairs that are not perfectly aligned. We study the theoretical complexity of the median problem and the small parsimony problem under the three distance metrics and various biologically relevant constraints, and we present polynomial-time maximum parsimony algorithms for solving some versions of the problems. Our algorithms are applied to ncRNA families from the RFAM database, illustrating their practical utility.
    METHODS: https://github.com/bmarchand/rna_small_parsimony.
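    To make the idea of comparing "sets of leaves descending from internal nodes" concrete, here is a minimal sketch of the classic clade-based Robinson-Foulds distance on rooted, leaf-labeled trees. The nested-tuple tree encoding and the function names are assumptions for illustration; the sketch does not reproduce the paper's RNA-tree representation, its IL distance, or its parsimony algorithms.

```python
def clades(tree):
    """Return (leaves under `tree`, leaf sets of all internal nodes).

    A tree is either a leaf label (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found |= child_found
    # Record the leaf set below this internal node.
    found.add(leaves)
    return leaves, found


def rf_distance(t1, t2) -> int:
    """Number of clades present in exactly one of the two trees."""
    _, c1 = clades(t1)
    _, c2 = clades(t2)
    return len(c1 ^ c2)
```

    For example, `rf_distance((("a", "b"), ("c", "d")), (("a", "c"), ("b", "d")))` counts the four clades that the two trees do not share.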

  • Article type: Journal Article
    Mitochondrial transcription factor A (TFAM) employs DNA bending to package mitochondrial DNA (mtDNA) into nucleoids and recruit mitochondrial RNA polymerase (POLRMT) at specific promoter sites, light strand promoter (LSP) and heavy strand promoter (HSP). Herein, we characterize the conformational dynamics of TFAM on promoter and non-promoter sequences using single-molecule fluorescence resonance energy transfer (smFRET) and single-molecule protein-induced fluorescence enhancement (smPIFE) methods. The DNA-TFAM complexes dynamically transition between partially and fully bent DNA conformational states. The bending/unbending transition rates and bending stability are DNA sequence-dependent: LSP forms the most stable fully bent complex and the non-specific sequence the least, which correlates with the lifetimes and affinities of TFAM with these DNA sequences. By quantifying the dynamic nature of the DNA-TFAM complexes, our study provides insights into how TFAM acts as a multifunctional protein through the DNA bending states to achieve sequence specificity and fidelity in mitochondrial transcription while performing mtDNA packaging.

  • Article type: Journal Article
    Type I toxin-antitoxin systems (T1TAs) are bipartite bacterial loci encoding a growth-inhibitory toxin and an antitoxin small RNA (sRNA). In many of these systems, the transcribed toxin mRNA is translationally inactive, but becomes translation-competent upon ribonucleolytic processing. The antitoxin sRNA targets the processed mRNA to inhibit its translation. This two-level control mechanism prevents cotranscriptional translation of the toxin and allows its synthesis only when the antitoxin is absent. Contrary to this, we found that the timP mRNA of the timPR T1TA locus does not undergo enzymatic processing. Instead, the full-length timP transcript is both translationally active and can be targeted by the antitoxin TimR. Thus, tight control in this system relies on a noncanonical mechanism. Based on the results from in vitro binding assays, RNA structure probing, and cell-free translation experiments, we suggest that timP mRNA adopts mutually exclusive structural conformations. The active form uniquely possesses an RNA pseudoknot structure which is essential for translation initiation. TimR preferentially binds to the active conformation, which leads to pseudoknot destabilization and inhibited translation. Based on this, we propose a model in which "structural processing" of timP mRNA enables tight inhibition by TimR in nonpermissive conditions, and TimP synthesis only upon TimR depletion.

  • Article type: Journal Article
    Extrinsic, experimental information can be incorporated into thermodynamics-based RNA folding algorithms in the form of pseudo-energies. Evolutionary conservation of RNA secondary structure elements is detectable in alignments of phylogenetically related sequences and provides evidence for the presence of certain base pairs that can also be converted into pseudo-energy contributions. We show that the centroid base pairs computed from a consensus folding model such as RNAalifold result in a substantial improvement of the prediction accuracy for single sequences. Evidence for specific base pairs turns out to be more informative than a position-wise profile for the conservation of the pairing status. A comparison with chemical probing data, furthermore, strongly suggests that phylogenetic base pairing data are more informative than position-specific data on (un)pairedness as obtained from chemical probing experiments. In this context we demonstrate, in addition, that the conversion of signal from probing data into pseudo-energies is possible using thermodynamic structure predictions as a reference instead of known RNA structures.

  • Article type: Journal Article
    We previously reported that deletion of a 44-nucleotide element in the 3' untranslated region (UTR) of the Chikungunya virus (CHIKV) genome enhances the virulence of CHIKV infection in mice. Here, we find that while this 44-nucleotide deletion enhances CHIKV fitness in murine embryonic fibroblasts in a manner independent of the type I interferon response, the same mutation decreases viral fitness in C6/36 mosquito cells. Further, the fitness advantage conferred by the UTR deletion in mammalian cells is maintained in vivo in a mouse model of CHIKV dissemination. Finally, SHAPE-MaP analysis of the CHIKV 3' UTR revealed that this 44-nucleotide element forms a distinctive two-stem-loop structure that is ablated in the mutant 3' UTR without altering additional 3' UTR RNA secondary structures.

  • Article type: Journal Article
    Accurately predicting the pairing order of bases in RNA molecules is essential for anticipating RNA secondary structures. Consequently, this task holds significant importance in unveiling previously unknown biological processes. The urgent need to comprehend RNA structures has been accentuated by the unprecedented impact of the widespread COVID-19 pandemic. This paper presents a framework, Knotify_V2.0, which makes use of syntactic pattern recognition techniques in order to predict RNA structures, with a specific emphasis on tackling the demanding task of predicting H-type pseudoknots that encompass bulges and hairpins. By leveraging the expressive capabilities of a Context-Free Grammar (CFG), the suggested framework integrates the inherent benefits of CFG and makes use of minimum free energy and maximum base pairing criteria. This integration enables the effective management of this inherently ambiguous task. The main contribution of Knotify_V2.0 compared to earlier versions lies in its capacity to identify additional motifs like bulges and hairpins within the internal loops of the pseudoknot. Notably, the proposed methodology, Knotify_V2.0, demonstrates superior accuracy in predicting core stems compared to state-of-the-art frameworks. Knotify_V2.0 exhibited exceptional performance by accurately identifying both core base pairings that form the ground truth pseudoknot in 70% of the examined sequences. Furthermore, Knotify_V2.0 narrowed the performance gap with Knotty, which had demonstrated better performance than Knotify and even surpassed it in Recall and F1-score metrics. Knotify_V2.0 achieved a higher count of true positives (tp) and a significantly lower count of false negatives (fn) compared to Knotify, highlighting improvements in Precision and Recall metrics, respectively. Consequently, Knotify_V2.0 achieved a higher F1-score than any other platform. The source code and comprehensive implementation details of Knotify_V2.0 are publicly available on GitHub.
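    The metrics named in this abstract follow the standard definitions: precision is tp / (tp + fp), recall is tp / (tp + fn), and the F1-score is their harmonic mean. The sketch below computes all three from raw counts; the function name and the zero-denominator convention are assumptions for illustration, and no figures from the paper are reproduced here.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 from base-pair prediction counts.

    Returns 0.0 for any metric whose denominator would be zero
    (a convention chosen for this sketch).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

    With these definitions, a lower false-negative count raises recall directly, which is consistent with the improvement the abstract attributes to the reduced fn count.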

  • Article type: Journal Article
    Insertion sequence (IS) elements are the simplest autonomous transposable elements found in prokaryotic genomes [1]. We recently discovered that IS110 family elements encode a recombinase and a non-coding bridge RNA (bRNA) that confers modular specificity for target DNA and donor DNA through two programmable loops [2]. Here we report the cryo-electron microscopy structures of the IS110 recombinase in complex with its bRNA, target DNA and donor DNA in three different stages of the recombination reaction cycle. The IS110 synaptic complex comprises two recombinase dimers, one of which houses the target-binding loop of the bRNA and binds to target DNA, whereas the other coordinates the bRNA donor-binding loop and donor DNA. We uncovered the formation of a composite RuvC-Tnp active site that spans the two dimers, positioning the catalytic serine residues adjacent to the recombination sites in both target and donor DNA. A comparison of the three structures revealed that (1) the top strands of target and donor DNA are cleaved at the composite active sites to form covalent 5'-phosphoserine intermediates, (2) the cleaved DNA strands are exchanged and religated to create a Holliday junction intermediate, and (3) this intermediate is subsequently resolved by cleavage of the bottom strands. Overall, this study reveals the mechanism by which a bispecific RNA confers target and donor DNA specificity to IS110 recombinases for programmable DNA recombination.