Tandem Repeat Sequences

  • Article type: Journal Article
    Tandem repeats (TRs) are genomic regions in which a sequence unit is repeated in tandem with variable copy number, and they are often multiallelic. Their characteristics and contributions to gene expression and quantitative traits in rice are largely unknown. Here, we survey rice TR variations based on 231 genome assemblies and the rice pan-genome graph. We identify 227,391 multiallelic TR loci, including 54,416 TR variations that are absent from the Nipponbare reference genome. Only one-third of TR variations show strong linkage with nearby bi-allelic variants (SNPs, indels and PAVs). Using 193 panicle and 202 leaf transcriptomes, we show that 485 and 511 TRs, respectively, act as QTLs for nearby gene expression independently of other bi-allelic variations. Using plant height and grain width as examples, we identify and validate the contributions of TRs to variation in rice agronomic traits. These findings should enhance our understanding of the functions of multiallelic variants and facilitate rice molecular breeding.






  • Article type: Journal Article
    The insulin-linked polymorphic region is a variable number of tandem repeats region of DNA in the promoter of the insulin gene that regulates transcription of insulin. This region is known to form alternative DNA structures, namely i-motifs and G-quadruplexes. Individuals carry different sequence variants of the tandem repeats, and although previous work investigated the effects of some variants on G-quadruplex formation, there is no clear picture of the relationship between sequence diversity, the DNA structures formed, and the functional effects on insulin gene expression. Here we show that different sequence variants of the insulin-linked polymorphic region form different DNA structures in vitro. Additionally, reporter gene assays in cellulo indicate that insulin expression may change depending on which DNA structures form. We report the crystal structure and dynamics of an intramolecular i-motif, which reveal that sequences within the loop regions form additional stabilising interactions that are critical to the formation of stable i-motif structures. This work reveals in detail how stable i-motif DNA structures form, with potential for the rational design of drug compounds that target i-motif DNA.






  • Article type: Journal Article
    BACKGROUND: A common method for analyzing genomic repeats is to produce a sequence similarity matrix visualized via a dot plot. Innovative approaches such as StainedGlass have improved upon this classic visualization by rendering dot plots as a heatmap of sequence identity, enabling researchers to better visualize multi-megabase tandem repeat arrays within centromeres and other heterochromatic regions of the genome. However, computing the similarity estimates for heatmaps requires high computational overhead and can suffer from decreasing accuracy.
    RESULTS: In this work, we introduce ModDotPlot, an interactive and alignment-free dot plot viewer. By approximating average nucleotide identity via a k-mer-based containment index, ModDotPlot produces accurate plots orders of magnitude faster than StainedGlass. We accomplish this through the use of a hierarchical modimizer scheme that can visualize the full 128 Mb genome of Arabidopsis thaliana in under 5 min on a laptop. ModDotPlot is bundled with a graphical user interface supporting real-time interactive navigation of entire chromosomes.
    AVAILABILITY: ModDotPlot is available at https://github.com/marbl/ModDotPlot.
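
    The k-mer containment idea in this abstract can be illustrated with a minimal sketch. It is not ModDotPlot's implementation: the function names, the choice of hash, and the conversion identity = containment^(1/k) (a common Mash-style approximation) are assumptions made for illustration, and reverse-complement (canonical) k-mers are ignored for brevity.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers in seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def stable_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (hypothetical choice of hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def modimizers(seq, k, s):
    """Keep only k-mers whose hash is divisible by s, roughly 1/s of them."""
    return {km for km in kmers(seq, k) if stable_hash(km) % s == 0}


def containment(a, b):
    """Fraction of the k-mers in set a that also occur in set b."""
    if not a:
        return 0.0
    return len(a & b) / len(a)


def identity_from_containment(c, k):
    """Assumed approximation: per-base identity is about c ** (1 / k)."""
    return c ** (1.0 / k)


if __name__ == "__main__":
    x = "ACGTTGCATGACCGTAGGCTAACGTTAGCCATGCA"
    y = "ACGTTGCATGACCGTAGGATAACGTTAGCCATGCA"  # one substitution relative to x
    k = 5
    c = containment(kmers(x, k), kmers(y, k))
    print("containment:", c, "estimated identity:", identity_from_containment(c, k))
```

    Raising s trades accuracy for speed: fewer k-mers are compared, so the containment estimate becomes noisier on short windows.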






  • Article type: Journal Article
    BACKGROUND: Tandem duplication (TD) is a common and important type of structural variation in the human genome. TDs have been shown to play an essential role in many diseases, including cancer. However, it is difficult to accurately detect TDs due to the uneven distribution of reads and the inherent complexity of next-generation sequencing (NGS) data.
    METHODS: This article proposes a method called DTDHM (detection of tandem duplications based on hybrid methods), which utilizes NGS data to detect TDs in a single sample. DTDHM builds a pipeline that integrates read depth (RD), split read (SR), and paired-end mapping (PEM) signals. To solve the problem of uneven distribution of normal and abnormal samples, DTDHM uses the K-nearest neighbor (KNN) algorithm for multi-feature classification prediction. Then, the qualified split reads and discordant reads are extracted and analyzed to achieve accurate localization of variation sites. This article compares DTDHM with three other methods on 450 simulated datasets and five real datasets.
    RESULTS: On the 450 simulated datasets, DTDHM consistently maintained the highest F1-score. The average F1-scores of DTDHM, SVIM, TARDIS, and TIDDIT were 80.0%, 56.2%, 43.4%, and 67.1%, respectively. The F1-score of DTDHM varied over a small range; its detection performance was the most stable and was 1.2 times that of the second-best method. Most of the boundary biases of DTDHM fluctuated around 20 bp, and its boundary detection was more accurate than that of TARDIS and TIDDIT. In real data experiments, five real sequencing samples (NA19238, NA19239, NA19240, HG00266, and NA12891) were used to test DTDHM. The results showed that DTDHM had the highest overlap density score (ODS) and F1-score of the four methods.
    CONCLUSIONS: Compared with the other three methods, DTDHM achieved excellent results in terms of sensitivity, precision, F1-score, and boundary bias. These results indicate that DTDHM can be used as a reliable tool for detecting TDs from NGS data, especially in samples with low coverage depth and low tumor purity.
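
    As a worked check of the figures in this abstract, the short sketch below recalls how an F1-score is computed from precision and recall and recomputes the ratio between the best and second-best average F1-scores. The four averages are taken from the abstract; the function and variable names are illustrative, and the abstract does not report the underlying precision and recall values.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Average F1-scores (%) on the simulated datasets, as reported in the abstract.
reported_f1 = {"DTDHM": 80.0, "SVIM": 56.2, "TARDIS": 43.4, "TIDDIT": 67.1}

# Rank methods from highest to lowest average F1-score.
ranked = sorted(reported_f1.items(), key=lambda item: item[1], reverse=True)
best, runner_up = ranked[0], ranked[1]

# Ratio of the best method to the second-best method.
ratio = best[1] / runner_up[1]

if __name__ == "__main__":
    print(best[0], "vs", runner_up[0], "ratio:", round(ratio, 2))
```

    The ratio of 80.0 to 67.1 rounds to 1.2, which matches the comparison with the second-best method (TIDDIT) stated in the abstract.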






  • Article type: Journal Article
    Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long-read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979.
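
    To make "variation in repeat length" concrete, the sketch below counts the longest run of consecutive motif copies in each read and tallies the counts. It is a naive illustration only and does not reproduce LongTR's genotyping model; the function names and the example reads are hypothetical, and sequencing errors, flanking-sequence alignment, and phasing are all ignored.

```python
from collections import Counter


def max_tandem_copies(seq, motif):
    """Longest run of back-to-back exact copies of motif anywhere in seq."""
    if not motif:
        raise ValueError("motif must be non-empty")
    step = len(motif)
    best = 0
    for start in range(len(seq)):
        copies = 0
        # Extend the run while the next motif-sized window is an exact copy.
        while seq.startswith(motif, start + copies * step):
            copies += 1
        best = max(best, copies)
    return best


def copy_number_tally(reads, motif):
    """Count how many reads support each observed copy number."""
    return Counter(max_tandem_copies(read, motif) for read in reads)


if __name__ == "__main__":
    # Hypothetical reads spanning a CAG repeat with two different lengths.
    reads = [
        "TTGCAGCAGCAGCAGAAC",
        "TTGCAGCAGCAGCAGAAC",
        "TTGCAGCAGCAGCAGCAGCAGAAC",
    ]
    print(copy_number_tally(reads, "CAG"))
```

    Scanning from every start position, rather than using a non-overlapping pattern search, keeps the count correct for motifs that overlap themselves, such as ATA.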






  • Article type: Journal Article
    Multivalency in lectins plays a pivotal role in influencing glycan cross-linking, thereby affecting lectin functionality. This multivalency can be achieved through oligomerization, the presence of tandemly repeated carbohydrate recognition domains, or a combination of both. Unlike lectins that rely on multiple factors for the oligomerization of identical monomers, tandem-repeat lectins inherently possess multivalency, independent of this complex process. The repeat domains, although not identical, display slightly distinct specificities within a predetermined geometry, enhancing specificity, affinity, avidity and even oligomerization. Although numerous studies have recognized this structural characteristic in recently discovered lectins, a unified criterion to define tandem-repeat lectins is still needed. We suggest defining them as multivalent lectins with intrachain tandem repeats corresponding to carbohydrate recognition domains, independent of oligomerization. This systematic review examines the folding and phyletic diversity of tandem-repeat lectins and refers to relevant literature. Our study categorizes all lectins with tandemly repeated carbohydrate recognition domains into nine distinct folding classes associated with specific biological functions. Our findings provide a comprehensive description and analysis of tandem-repeat lectins in terms of their functions and structural features. Our exploration of phyletic and functional diversity has revealed previously undocumented tandem-repeat lectins. We propose research directions aimed at enhancing our understanding of the origins of tandem-repeat lectins and fostering the development of medical and biotechnological applications, notably in the design of artificial sugars and neolectins.






  • Article type: Journal Article
    Exceptions to Mendelian inheritance often highlight novel chromosomal behaviors. The maize Pl1-Rhoades allele conferring plant pigmentation can display inheritance patterns deviating from Mendelian expectations in a behavior known as paramutation. However, the chromosome features mediating such exceptions remain unknown. Here we show that small RNA production reflecting RNA polymerase IV function within a distal downstream set of five tandem repeats is coincident with meiotically heritable repression of the Pl1-Rhoades transcription unit. A related pl1 haplotype with three repeat units, but not one with two, also displays the trans-homolog silencing typifying paramutations. 4C interactions, CHD3a-dependent small RNA profiles, nuclease sensitivity, and polyadenylated RNA levels highlight a repeat subregion having regulatory potential. Our comparative and mutant analyses show that transcriptional repression of Pl1-Rhoades correlates with 24-nucleotide RNA production and cytosine methylation at this subregion, indicating the action of a specific DNA-dependent RNA polymerase complex. These findings support a working model in which pl1 paramutation depends on trans-chromosomal RNA-directed DNA methylation operating at a discrete cis-linked and copy-number-dependent transcriptional regulatory element.






  • Article type: Journal Article
    Cnaphalocrocis medinalis granulovirus (CnmeGV), belonging to Betabaculovirus cnamedinalis, infects the rice leaf roller, a pest of rice. In 1979, a CnmeGV isolate, CnmeGV-EP, was collected from Enping County, China. In 2014, we collected another CnmeGV isolate, CnmeGV-EPDH3, at the same location and obtained the complete virus genome sequence using Illumina and ONT sequencing technologies. By combining these two virus isolates, we updated the genome annotation of CnmeGV and conducted an in-depth analysis of its genome features. The CnmeGV genome contains abundant tandem repeat sequences, and the repeating units in the homologous regions (hrs) exhibit overlapping and nested patterns. The genetic variations within the EPDH3 population show the high stability of the CnmeGV genome, and tandem repeats are the only regions of high genetic variation during CnmeGV genome replication. Some defective viral genomes formed by recombination were found within the population. Comparative analysis of the two virus isolates collected from Enping showed that the proteins encoded by the CnmeGV-specific genes were less conserved than those encoded by the baculovirus core genes. At the genomic level, there are a large number of SNPs and indels between the two virus isolates, especially in and around the bro genes and hrs. Additionally, we discovered that CnmeGV acquired a segment of non-ORF sequence from its host, which does not encode any new proteins but rather serves as redundant genetic material integrated into the viral genome. Furthermore, we observed that the host's piggyBac transposon has inserted into some virus genes. Together, these results suggest that dsDNA viruses can acquire non-coding genetic material from their hosts to expand the size of their genomes. These findings provide new insights into the evolution of dsDNA viruses.






  • Article type: Journal Article
    IMPORTANCE: Persistence of FLT3 internal tandem duplication (ITD) in adults with acute myeloid leukemia (AML) in first complete remission (CR) prior to allogeneic hematopoietic cell transplant (HCT) is associated with increased relapse and death after transplant, but the association between the level of measurable residual disease (MRD) detected and clinical outcome is unknown.
    OBJECTIVE: To examine the association between pre-allogeneic HCT MRD level and relapse and death posttransplant in adults with AML in first CR.
    DESIGN, SETTING, AND PARTICIPANTS: In this cohort study, DNA sequencing was performed on first CR blood from patients with FLT3-ITD AML transplanted from March 2013 to February 2019. Clinical follow-up was through May 2022. Data were analyzed from October 2022 to December 2023.
    EXPOSURE: Centralized DNA sequencing for FLT3-ITD in pre-allogeneic HCT first CR blood using a commercially available kit.
    MAIN OUTCOMES AND MEASURES: The primary outcomes were overall survival and cumulative incidence of relapse, with non-relapse-associated mortality as a competing risk post-allogeneic HCT. Kaplan-Meier estimations (log-rank tests), Cox proportional hazards models, and Fine-Gray models were used to estimate the end points.
    RESULTS: Of 537 included patients with FLT3-ITD AML from the Pre-MEASURE study, 296 (55.1%) were female, and the median (IQR) age was 55.6 (42.9-64.1) years. Using the variant allele fraction (VAF) threshold of 0.01% or greater for MRD positivity, the results closely aligned with those previously reported. With no VAF threshold applied (VAF greater than 0%), 263 FLT3-ITD variants (median [range] VAF, 0.005% [0.0002%-44%]) and 177 patients (33.0%) with positive findings were identified. Multivariable analyses showed that residual FLT3-ITD was the variable most associated with relapse and overall survival, with a dose-dependent correlation. Patients receiving reduced-intensity conditioning without melphalan or nonmyeloablative conditioning had increased risk of relapse and death at any given level of MRD compared with those receiving reduced-intensity conditioning with melphalan or myeloablative conditioning.
    CONCLUSIONS AND RELEVANCE: This study provides generalizable and clinically applicable evidence that the detection of residual FLT3-ITD in the blood of adults in first CR from AML prior to allogeneic HCT is associated with an increased risk of relapse and death, particularly for those with a VAF of 0.01% or greater. While transplant conditioning intensification, an intervention not available to all, may help mitigate some of this risk, alternative approaches will be necessary for this high-risk population of patients who are underserved by the current standard of care.
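
    The two MRD definitions used in this abstract (a VAF of 0.01% or greater, versus any VAF greater than 0%) can be sketched as follows. The read counts and function names are hypothetical and serve only to show how the thresholds behave; the study's actual variant-calling pipeline is not described in the abstract.

```python
def vaf_percent(variant_reads, total_reads):
    """Variant allele fraction as a percentage of reads at the locus."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * variant_reads / total_reads


def mrd_positive(vaf_pct, threshold_pct=0.01):
    """MRD call using the 'VAF of 0.01% or greater' threshold."""
    return vaf_pct >= threshold_pct


def mrd_detected_any(vaf_pct):
    """MRD call with no threshold applied (VAF greater than 0%)."""
    return vaf_pct > 0.0


if __name__ == "__main__":
    # Hypothetical counts chosen to give a VAF of 0.005%, the median
    # VAF reported in the abstract.
    vaf = vaf_percent(variant_reads=5, total_reads=100_000)
    print("VAF %:", vaf)
    print("positive at 0.01% threshold:", mrd_positive(vaf))
    print("positive with no threshold:", mrd_detected_any(vaf))
```

    A variant at the reported median VAF of 0.005% is counted only under the no-threshold definition, since the median sits below the 0.01% cutoff.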






  • Article type: Journal Article
    The internal tandem duplication (ITD) mutation of the FMS-like receptor tyrosine kinase 3 (FLT3-ITD) is the most common mutation in acute myeloid leukemia (AML), observed in approximately 30% of patients. It confers a poor prognosis due to continuous activation of downstream growth-promoting signaling pathways such as STAT5 and PI3K/AKT. Hence, FLT3 is considered an attractive druggable target; selective small-molecule FLT3 inhibitors (FLT3Is), such as midostaurin and quizartinib, have been clinically approved. However, patients generally show poor remission rates and acquire resistance when FLT3Is are used alone. Various patient factors could cause these adverse outcomes, including altered epigenetic regulation, which mainly results in abnormal gene expression patterns. Epigenetic modifications are required for hematopoietic stem cell (HSC) self-renewal and differentiation; however, critical driver mutations have been identified in genes controlling DNA methylation (such as DNMT3A, TET2, IDH1/2). Mutations in these regulators contribute to leukemia pathogenesis and affect disease diagnosis and prognosis when they co-occur with the FLT3-ITD mutation. Therefore, understanding the role of different epigenetic alterations in FLT3-ITD AML pathogenesis and how they modulate the activity of FLT3Is is important for rationalizing combination treatment approaches that include FLT3Is and modulators of methylation regulators or pathways. Data from ongoing pre-clinical and clinical studies will more precisely define the potential use of epigenetic therapy together with FLT3Is, especially once patients' mutational status has been characterized in terms of FLT3 and DNA methylome regulators.





