segmental duplication

  • Article type: Journal Article
    The segmentally duplicated Pregnancy-specific glycoprotein (PSG) locus on chromosome 19q13 may be one of the most rapidly evolving in the human genome. It comprises ten coding genes (PSG1-9, 11) and one predominantly non-coding gene (PSG10) that are expressed in the placenta and gut, in addition to several poorly characterized long non-coding RNAs. We report that long non-coding RNA PSG8-AS1 has an oligodendrocyte-specific expression pattern and is co-expressed with genes encoding key myelin constituents. PSG8-AS1 exhibits two peaks of expression during human brain development coinciding with the most active periods of oligodendrogenesis and myelination. PSG8-AS1 orthologs were found in the genomes of several primates but significant expression was found only in the human, suggesting a recent evolutionary origin of its proposed role in myelination. Additionally, because co-deletion of chromosomes 1p/19q is a genomic marker of oligodendroglioma, expression of PSG8-AS1 was examined in these tumors. PSG8-AS1 may be a promising diagnostic biomarker for glioma, with prognostic value in oligodendroglioma.






  • Article type: Journal Article
    Recent years have seen a dramatic increase in the number of canine genome assemblies available. Duplications are an important source of evolutionary novelty and are also prone to misassembly. We explored the duplication content of nine canine genome assemblies using both genome self-alignment and read-depth approaches. We find that 8.58% of the genome is duplicated in the canFam4 assembly, derived from the German Shepherd Dog Mischka, including 90.15% of unplaced contigs. Highlighting the continued difficulty in properly assembling duplications, less than half of read-depth and assembly alignment duplications overlap, but the mCanLor1.2 Greenland wolf assembly shows greater concordance. Further study shows the presence of multiple segments that have alignments to four or more duplicate copies. These high-recurrence duplications correspond to gene retrocopies. We identified 3,892 candidate retrocopies from 1,316 parental genes in the canFam4 assembly and find that ∼8.82% of duplicated base pairs involve a retrocopy, confirming this mechanism as a major driver of gene duplication in canines. Similar patterns are found across eight other recent canine genome assemblies, with metrics supporting a greater quality of the PacBio HiFi mCanLor1.2 assembly. Comparison between the wolf and other canine assemblies found that 92% of retrocopy insertions are shared between assemblies. By calculating the number of generations since genome divergence, we estimate that new retrocopy insertions appear, on average, in 1 out of 3,514 births. Our analyses illustrate the impact of retrogene formation on canine genomes and highlight the variable representation of duplicated sequences among recently completed canine assemblies.
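    The concordance between read-depth and assembly-alignment duplication calls described above can be illustrated with a small interval-overlap calculation. This is a minimal sketch, not the authors' pipeline: the coordinates are hypothetical toy values on a single chromosome, intervals are assumed to be half-open, and the function names are chosen for illustration only.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumption)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or touching intervals so each base is counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_bp(intervals: List[Interval]) -> int:
    """Number of distinct base pairs covered by a callset."""
    return sum(end - start for start, end in merge(intervals))


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Base pairs covered by both callsets, via a two-pointer sweep."""
    a, b = merge(a), merge(b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical toy callsets, not data from the study.
self_alignment = [(100, 500), (400, 700), (1000, 1200)]
read_depth = [(600, 900), (1100, 1300)]

shared = overlap_bp(self_alignment, read_depth)
print(f"shared bp: {shared}")
print(f"fraction of self-alignment calls: {shared / total_bp(self_alignment):.2f}")
print(f"fraction of read-depth calls: {shared / total_bp(read_depth):.2f}")
```

    Reporting the shared fraction relative to each callset separately makes it clear when one method calls far more duplicated sequence than the other.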






  • Article type: Journal Article
    The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a complex genomic rearrangement (CGR). Although it has been identified as an important pathogenic DNA mutation signature in genomic disorders and cancer genomes, its architecture remains unresolved. Here, we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the DNA of 24 patients identified by array comparative genomic hybridization (aCGH), in whom we found evidence for the existence of 4 out of 4 predicted structural variant (SV) haplotypes. Using a combination of short-read genome sequencing (GS), long-read GS, optical genome mapping, and single-cell DNA template strand sequencing (strand-seq), the haplotype structure was resolved in 18 samples. The point of template switching in 4 samples was shown to be a segment of ∼2.2-5.5 kb of 100% nucleotide similarity within inverted repeat pairs. These data provide experimental evidence that inverted low-copy repeats act as recombinant substrates. This type of CGR can result in multiple conformers generating diverse SV haplotypes in susceptible dosage-sensitive loci.






  • Article type: Journal Article
    Leucine-rich repeat receptor-like kinases (LRR-RLKs) represent the largest subgroup of receptor-like kinases (RLKs) in plants. While some LRR-RLK members play a role in regulating various plant growth processes related to morphogenesis, disease resistance, and stress response, the functions of most LRR-RLK genes remain unclear. In this study, we identified 397 LRR-RLK genes from the genome of Camellia sinensis and categorized them into 16 subfamilies. Approximately 62% of CsLRR-RLK genes are situated in regions resulting from segmental duplications, suggesting that the expansion of CsLRR-RLK genes is due to segmental duplications. Analysis of gene expression patterns revealed differential expression of CsLRR-RLK genes across different tissues and in response to stress. Furthermore, we demonstrated that CssEMS1 localizes to the cell membrane and can complement the Arabidopsis ems1 mutant. This study is the first in-depth evolutionary examination of LRR-RLKs in tea and provides a basis for future investigations into their functionality.
    The online version contains supplementary material available at 10.1007/s12298-024-01458-1.






  • Article type: Journal Article
    Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.






  • Article type: Journal Article
    Collembola is a highly diverse and abundant group of soil arthropods with chromosome numbers ranging from 5 to 11. Previous karyotype studies indicated that the Tomoceridae family possesses an exceptionally long chromosome. To better understand chromosome size evolution in Collembola, we obtained a chromosome-level genome of Yoshiicerus persimilis with a size of 334.44 Mb and BUSCO completeness of 97.0% (n = 1013). Both genomes of Y. persimilis and Tomocerus qinae (recently published) have an exceptionally large chromosome (ElChr greater than 100 Mb), accounting for nearly one-third of the genome. Comparative genomic analyses suggest that chromosomal elongation occurred independently in the two species approximately 10 million years ago, rather than in the ancestor of the Tomoceridae family. The ElChr elongation was caused by large tandem and segmental duplications, as well as transposon proliferation, with genes in these regions experiencing weaker purifying selection (higher dN/dS) than conserved regions. Moreover, inter-genomic synteny analyses indicated that chromosomal fission/fusion events played a crucial role in the evolution of chromosome numbers (ranging from 5 to 7) within Entomobryomorpha. This study provides a valuable resource for investigating the chromosome evolution of Collembola.






  • Article type: Journal Article
    BACKGROUND: The genomic region that lies between the telomere and chromosome body, termed the subtelomere, is heterochromatic, repeat-rich, and frequently undergoes rearrangement. Within this region, large-scale structural changes enable gene diversification, and, as such, large multicopy gene families are often found at the subtelomere. In some parasites, genes associated with proliferation, invasion, and survival are often found in these regions, where they benefit from the subtelomere's highly plastic, rapidly changing nature. The increasing availability of complete (or near complete) parasite genomes provides an opportunity to investigate these typically poorly defined and overlooked genomic regions and potentially reveal relevant gene families necessary for the parasite's lifestyle.
    RESULTS: Using the latest chromosome-scale genome assembly and hallmark repeat richness observed at chromosome termini, we have identified and characterised the subtelomeres of Schistosoma mansoni, a metazoan parasitic flatworm that infects over 250 million people worldwide. Approximately 12% of the S. mansoni genome is classified as subtelomeric, and, in line with other organisms, we find these regions to be gene-poor but rich in transposable elements. We find that S. mansoni subtelomeres have undergone extensive interchromosomal recombination and that these sites disproportionately contribute to the 2.3% of the genome derived from segmental duplications. This recombination has led to the expansion of subtelomeric gene clusters containing 103 genes, including the immunomodulatory annexins and other gene families with unknown roles. The largest of these is a 49-copy plexin domain-containing protein cluster, exclusively expressed in the tegument (the tissue located at the host-parasite physical interface) of intramolluscan life stages.
    CONCLUSIONS: We propose that subtelomeric regions act as a genomic playground for trial-and-error of gene duplication and subsequent divergence. Owing to the importance of subtelomeric genes in other parasites, gene families implicated in this subtelomeric expansion within S. mansoni warrant further characterisation for a potential role in parasitism.






  • Article type: Journal Article
    Optical genome mapping (OGM) has been known as an all-in-one technology for chromosomal aberration detection. However, there are also aberrations beyond the detection range of OGM. This study aimed to report the aberrations missed by OGM and analyze the contributing factors. OGM was performed by taking both GRCh37 and GRCh38 as reference genomes. The OGM results were analyzed in a blinded fashion and compared to standard assays. Quality control (QC) metrics, sample types, reference genome, effective coverage, and classes and locations of aberrations were then analyzed. In total, 154 clinically reported variations from 123 samples were investigated. OGM failed to detect 10 (6.5%, 10/154) aberrations with the GRCh37 assembly, including five copy number variations (CNVs), two submicroscopic balanced translocations, two pericentric inversions and one isochromosome (mosaicism). All the samples passed pre-analytical and analytical QC. With the GRCh38 assembly, the false-negative rate of OGM fell to 4.5% (7/154). The breakpoints of the CNVs, balanced translocations and inversions undetected by OGM were located in segmental duplication (SD) regions or regions with no DLE-1 label. In conclusion, besides variations with centromeric breakpoints, structural variations (SVs) with breakpoints located in large repetitive sequences may also be missed by OGM. GRCh38 is recommended as the reference genome when OGM is performed. Our results highlight the necessity of fully understanding the detection range and limitation of OGM in clinical practice.
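    The false-negative rates quoted above follow directly from the reported counts. The short sketch below reproduces that arithmetic; the function and variable names are illustrative, and only the counts (154 reported variations, 10 and 7 missed, and the GRCh37 breakdown by class) come from the abstract.

```python
def false_negative_rate(missed: int, total: int) -> float:
    """Percentage of clinically reported aberrations that the assay failed to detect."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * missed / total


TOTAL_REPORTED = 154  # clinically reported variations across 123 samples

# Aberrations missed with each reference genome, as stated in the abstract.
missed_by_reference = {"GRCh37": 10, "GRCh38": 7}

# Breakdown of the 10 aberrations missed with GRCh37.
missed_grch37_by_class = {
    "copy number variation": 5,
    "submicroscopic balanced translocation": 2,
    "pericentric inversion": 2,
    "isochromosome (mosaic)": 1,
}

rates = {
    ref: round(false_negative_rate(missed, TOTAL_REPORTED), 1)
    for ref, missed in missed_by_reference.items()
}

for ref, rate in rates.items():
    print(f"{ref}: {rate}% false negatives")
```

    Summing the per-class counts gives a quick consistency check against the total of 10 missed aberrations reported for GRCh37.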






  • Article type: Journal Article
    The WUSCHEL-related homeobox (WOX) gene family has been implicated in promoting the transition of vegetative organs to an embryonic state and in maintaining plant embryonic stem cell identity. Using genome-wide analysis, we identified 17 candidate WOX genes in ramie (Boehmeria nivea). The genes (BnWOX) showed highly conserved homeodomain regions typical of WOX. Based on phylogenetic analysis, they were classified into three distinct groups: modern, intermediate, and ancient clades. The genes displayed 65% and 35% collinearities with their Arabidopsis thaliana and Oryza sativa orthologs, respectively, and exhibited similar motifs, suggesting similar functions. Furthermore, four segmental duplications (BnWOX10/14, BnWOX13A/13B, BnWOX9A/9B, and BnWOX6A/Maker00021031) and a tandem-duplicated pair (BnWOX5/7) among the putative ramie WOX genes were obtained, suggesting that whole-genome duplication (WGD) played a role in WOX gene expansion. Expression profiling analysis of the genes in the bud, leaf, stem, and root of the stem cuttings revealed higher expression levels of BnWOX10 and BnWOX14 in the stem and root and lower levels in the leaf, consistent with the qRT-PCR analysis, suggesting their direct roles in ramie root formation. Analysis of the rooting characteristics and expression in the stem cuttings of sixty-seven different ramie genetic resources showed a possible involvement of BnWOX14 in the adventitious rooting of ramie. Thus, this study provides valuable information on ramie WOX genes and lays the foundation for further research.






  • Article type: Journal Article
    The pentatricopeptide repeat (PPR) gene family is one of the largest gene families in land plants. However, current knowledge about the evolution of the PPR gene family remains limited. In this study, we performed a comparative genomic analysis of the PPR gene family in O. sativa and its wild progenitor, O. rufipogon, and outlined a comprehensive landscape of gene duplications. Our findings suggest that the majority of PPR genes originated from dispersed duplications. Although segmental duplications have expanded only approximately 11.30% and 13.57% of the PPR gene families in the O. sativa and O. rufipogon genomes, respectively, we obtained evidence that segmental duplication promotes the structural diversity of PPR genes through incomplete gene duplications. In the O. sativa and O. rufipogon genomes, 10 (~33.33%) and 22 (~45.83%) pairs of gene duplications, respectively, had non-PPR paralogous genes through incomplete gene duplication. Segmental duplications leading to incomplete gene duplications might result in the acquisition of domains, thus promoting functional innovation and structural diversification of PPR genes. This study offers a unique perspective on the evolution of PPR gene structures and underscores the potential role of segmental duplications in PPR gene structural diversity.





