Tandem Repeat Sequences

  • Article type: Journal Article
    BACKGROUND: A common method for analyzing genomic repeats is to produce a sequence similarity matrix visualized via a dot plot. Innovative approaches such as StainedGlass have improved upon this classic visualization by rendering dot plots as a heatmap of sequence identity, enabling researchers to better visualize multi-megabase tandem repeat arrays within centromeres and other heterochromatic regions of the genome. However, computing the similarity estimates for heatmaps incurs high computational overhead and can suffer from reduced accuracy.
    RESULTS: In this work, we introduce ModDotPlot, an interactive and alignment-free dot plot viewer. By approximating average nucleotide identity via a k-mer-based containment index, ModDotPlot produces accurate plots orders of magnitude faster than StainedGlass. We accomplish this through the use of a hierarchical modimizer scheme that can visualize the full 128 Mb genome of Arabidopsis thaliana in under 5 min on a laptop. ModDotPlot is bundled with a graphical user interface supporting real-time interactive navigation of entire chromosomes.
    METHODS: ModDotPlot is available at https://github.com/marbl/ModDotPlot.
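The idea of estimating identity from a k-mer containment index can be illustrated with a minimal sketch. It is not ModDotPlot's implementation: the function names, the BLAKE2 hash, and the forward-strand-only k-mers are assumptions made for illustration, and the conversion `c ** (1 / k)` is a commonly used point estimate under a simple independent-mutation model, which may differ from what the tool actually computes. `mod_sample` shows the general "keep k-mers whose hash is 0 mod s" sampling idea behind modimizers, without the hierarchical scheme.

```python
import hashlib


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all k-mers in seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mod_sample(kmer_set: set[str], s: int) -> set[str]:
    """Keep only k-mers whose hash is divisible by s (roughly 1/s of them)."""
    def h(kmer: str) -> int:
        digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")
    return {km for km in kmer_set if h(km) % s == 0}


def containment(a: set[str], b: set[str]) -> float:
    """Fraction of k-mers in a that also occur in b."""
    return len(a & b) / len(a) if a else 0.0


def identity_from_containment(c: float, k: int) -> float:
    """Point estimate of sequence identity from a containment value c."""
    return c ** (1.0 / k)
```

In a dot plot setting, each cell of the heatmap would compare the sampled k-mer sets of two sequence windows, so the cost per cell depends on the sample size rather than on an alignment.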

  • Article type: Journal Article
    Tandem duplication (TD) is a common and important type of structural variation in the human genome. TDs have been shown to play an essential role in many diseases, including cancer. However, it is difficult to accurately detect TDs due to the uneven distribution of reads and the inherent complexity of next-generation sequencing (NGS) data.
    This article proposes a method called DTDHM (detection of tandem duplications based on hybrid methods), which utilizes NGS data to detect TDs in a single sample. DTDHM builds a pipeline that integrates read depth (RD), split read (SR), and paired-end mapping (PEM) signals. To solve the problem of uneven distribution of normal and abnormal samples, DTDHM uses the K-nearest neighbor (KNN) algorithm for multi-feature classification prediction. Then, the qualified split reads and discordant reads are extracted and analyzed to achieve accurate localization of variation sites. This article compares DTDHM with three other methods on 450 simulated datasets and five real datasets.
    In 450 simulated data samples, DTDHM consistently maintained the highest F1-score. The average F1-scores of DTDHM, SVIM, TARDIS, and TIDDIT were 80.0%, 56.2%, 43.4%, and 67.1%, respectively. The F1-score of DTDHM varied within a small range; its detection performance was the most stable and was 1.2 times that of the second-best method. Most of the boundary biases of DTDHM fluctuated around 20 bp, and its boundary deviation detection ability was better than that of TARDIS and TIDDIT. In real data experiments, five real sequencing samples (NA19238, NA19239, NA19240, HG00266, and NA12891) were used to test DTDHM. The results showed that DTDHM had the highest overlap density score (ODS) and F1-score of the four methods.
    Compared with the other three methods, DTDHM achieved excellent results in terms of sensitivity, precision, F1-score, and boundary bias. These results indicate that DTDHM can be used as a reliable tool for detecting TDs from NGS data, especially for samples with low coverage depth and low tumor purity.
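For readers unfamiliar with the metric, the F1-score reported above is the harmonic mean of precision and recall. The sketch below uses the standard definition; the function name and the example counts are hypothetical and are not taken from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct calls, 2 false calls, 2 missed duplications.
example_f1 = f1_score(tp=8, fp=2, fn=2)

# Ratio of the two highest reported average F1-scores (DTDHM vs. TIDDIT).
ratio = 80.0 / 67.1
```

The ratio of the reported averages is about 1.19, which is consistent with the "1.2 times" figure if that figure refers to average F1-score; the abstract does not state this explicitly.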

  • Article type: Journal Article
    Repetitive elements in DNA sequences are a hallmark of apicomplexan protozoa. A genome-wide screening for tandem repeats was conducted in Toxoplasma gondii and related coccidian parasites with a novel strategy to assess compositional bias. A conserved pattern of GC skew and purine-pyrimidine bias was observed. Compositional bias was also present at the protein level. Glutamic acid was the most abundant amino acid in the purine (GA)-rich cluster, while serine prevailed in the pyrimidine (CT)-rich cluster. Purine-rich repeats, and consequently glutamic acid abundance, correlated with high scores for intrinsically disordered protein regions/domains. Finally, variability was established for repetitive regions within a well-known rhoptry antigen (ROP1) and an uncharacterized hypothetical protein with similar features. The approach we present could be useful to identify potential antigens bearing repetitive elements.
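The two compositional measures named above have standard textbook definitions, sketched here for a single repeat sequence. The abstract does not describe the authors' exact strategy, so the function names and the treatment of non-ACGT characters (they are simply not counted as G, C, A, or purines) are assumptions for illustration.

```python
def gc_skew(seq: str) -> float:
    """GC skew, (G - C) / (G + C); 0.0 if the sequence has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def purine_fraction(seq: str) -> float:
    """Fraction of positions that are purines (A or G)."""
    s = seq.upper()
    return (s.count("A") + s.count("G")) / len(s) if s else 0.0
```

Under these definitions, a GA-rich repeat unit has a purine fraction near 1 and a CT-rich unit has one near 0, which is one simple way to separate the two clusters described in the abstract.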

  • Article type: Journal Article
    Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979.

  • Article type: Journal Article
    C-type lectins (CTLs) are an important class of pattern recognition receptors (PRRs) that exhibit structural and functional diversity in invertebrates. Repetitive DNA sequences are ubiquitous in eukaryotic genomes, representing distinct modes of genome evolution and promoting new gene generation. Our study revealed a new CTL in Exopalaemon carinicauda that is composed of two long tandem repeats, abundant threonine, and one carbohydrate recognition domain (CRD); it has been designated EcTR-CTL. The full-length cDNA of EcTR-CTL was 1242 bp long and had an open reading frame (ORF) of 999 bp that encoded a protein of 332 amino acids. The genomic structure of EcTR-CTL contains 4 exons and 3 introns. The length of each repeat unit in EcTR-CTL was 198 bp, which is different from the short tandem repeats reported previously in prawns and crayfish. EcTR-CTL was abundantly expressed in the intestine and hemocytes. After Vibrio parahaemolyticus and white spot syndrome virus (WSSV) challenge, the expression level of EcTR-CTL in the intestine was upregulated. Knockdown of EcTR-CTL downregulated the expression of anti-lipopolysaccharide factor, crustin, and lysozyme during Vibrio infection. The recombinant CRD of EcTR-CTL (rCRD) could bind to bacteria, lipopolysaccharides, and peptidoglycans. Additionally, rCRD could directly bind to WSSV. These findings indicate that 1) CTLs with tandem repeats may be ubiquitous in crustaceans, 2) EcTR-CTL may act as a PRR to participate in the innate immune defense against bacteria via nonself-recognition and antimicrobial peptide regulation, and 3) EcTR-CTL may play a positive or negative role in the process of WSSV infection by capturing virions.
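The ORF and protein lengths quoted above are consistent with each other if the 999 bp ORF includes the stop codon, which is an assumption here since the abstract does not say so: 999 / 3 = 333 codons, and 333 - 1 stop codon = 332 amino acids. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length in bp."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


ectr_ctl_aa = protein_length(999)  # assumes the stop codon is counted in the ORF
repeat_unit_aa = protein_length(198, includes_stop=False)  # one in-frame 198 bp repeat unit
```

By the same arithmetic, each 198 bp repeat unit would correspond to 66 residues, assuming the repeats are in frame.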

  • Article type: Journal Article
    No abstract available.

  • Article type: Journal Article
    Tandem duplication (TD) is a major type of structural variation (SV) that plays an important role in novel gene formation and human diseases. However, TDs are often missed or incorrectly classified as insertions by most modern SV detection methods because these methods lack specialized handling of TD-related mutational signals. Herein, we developed a TD detection module for the Pindel tool, referred to as Pindel-TD, based on a TD-specific pattern growth approach. Pindel-TD is capable of detecting TDs with a wide size range at single-nucleotide resolution. Using simulated and real read data from HG002, we demonstrated that Pindel-TD outperforms other leading methods in terms of precision, recall, F1-score, and robustness. Furthermore, by applying Pindel-TD to data generated from the K562 cancer cell line, we identified a TD located at the seventh exon of SAGE1, providing an explanation for its high expression. Pindel-TD is available for non-commercial use at https://github.com/xjtu-omics/pindel.

  • Article type: Journal Article
    Multivalency in lectins plays a pivotal role in influencing glycan cross-linking, thereby affecting lectin functionality. This multivalency can be achieved through oligomerization, the presence of tandemly repeated carbohydrate recognition domains, or a combination of both. Unlike lectins that rely on multiple factors for the oligomerization of identical monomers, tandem-repeat lectins inherently possess multivalency, independent of this complex process. The repeat domains, although not identical, display slightly distinct specificities within a predetermined geometry, enhancing specificity, affinity, avidity, and even oligomerization. Although numerous studies have recognized this structural characteristic in recently discovered lectins, a unified criterion to define tandem-repeat lectins is still necessary. We suggest defining them as multivalent lectins with intrachain tandem repeats corresponding to carbohydrate recognition domains, independent of oligomerization. This systematic review examines the folding and phyletic diversity of tandem-repeat lectins and refers to relevant literature. Our study categorizes all lectins with tandemly repeated carbohydrate recognition domains into nine distinct folding classes associated with specific biological functions. Our findings provide a comprehensive description and analysis of tandem-repeat lectins in terms of their functions and structural features. Our exploration of phyletic and functional diversity has revealed previously undocumented tandem-repeat lectins. We propose research directions aimed at enhancing our understanding of the origins of tandem-repeat lectins and fostering the development of medical and biotechnological applications, notably in the design of artificial sugars and neolectins.

  • Article type: Journal Article
    The non-coding regions of the mitochondrial DNAs (mtDNAs) of hares, rabbits, and pikas (Lagomorpha) contain short (∼20 bp) and long (130-160 bp) tandem repeats, which are absent in related mammalian orders. In the present study, we provide an in-depth analysis of mountain hare (Lepus timidus) and brown hare (L. europaeus) mtDNA non-coding regions, together with a species- and population-level analysis of tandem repeat variation. The short tandem repeats (SRs) of mountain hares, as well as those of the other analyzed hare species, consist of two conserved 10 bp motifs, with only brown hares exhibiting a single, more variable motif. Long tandem repeats (LRs) also differ in sequence and copy number between species. Mountain hares have four to seven LRs (median five), while brown hares exhibit five to nine LRs (median six). Interestingly, introgressed mountain hare mtDNA in brown hares obtained an intermediate LR length distribution, with the median copy number being the same as with conspecific brown hare mtDNA. In contrast, transfer of brown hare mtDNA into cultured mtDNA-less mountain hare cells maintained the original LR number, whereas the reciprocal transfer caused copy number instability, suggesting that the cellular environment rather than the nuclear genomic background plays a role in LR maintenance. Due to their dynamic nature and separation from other known conserved sequence elements in the non-coding region of hare mitochondrial genomes, the tandem repeat elements likely represent signatures of ancient genetic rearrangements. Clarifying the nature and dynamics of these rearrangements may shed light on the possible role of NCR repeated elements in mitochondria and in species evolution.

  • Article type: Journal Article
    Cryptosporidiosis, an infectious enteric disease caused by species (some of them zoonotic) of the genus Cryptosporidium, is under surveillance in many countries. Typing assays critical to the surveillance of cryptosporidiosis typically involve characterization of Cryptosporidium glycoprotein 60 genes (gp60). Here, we characterized the gp60 of Cryptosporidium suis from two samples (one human and one porcine faecal sample), on the basis of which a preliminary typing scheme was developed. A conspicuous feature of the C. suis gp60 was a novel type of tandem repeat located at the 5' end of the gene that took up 777/1635 bp (48%) of the gene. The C. suis gp60 lacked the classical poly-serine repeats (TCA/TCG/TCT), which are usually subject to major genetic variation, and the length of the tandem repeat made a typing assay incorporating this region based on Sanger sequencing practically unfeasible. We therefore designed a typing assay based on the post-repeat region only and applied it to C. suis-positive samples from suid hosts from Norway, Denmark, and Spain. We were able to distinguish three different subtypes: XXVa-1, XXVa-2, and XXVa-3. Subtype XXVa-1 had a wider geographic distribution than the other subtypes and was also observed in the human sample. We think that the present data will inform future strategies to develop a C. suis typing assay that could be even more informative by including a greater part of the gene, including the tandem repeat region, e.g., by the use of long-read next-generation sequencing.