Genome evolution

    The discovery of de novo emerged genes, which originate from previously noncoding DNA regions, challenges traditional views of species evolution. Indeed, it seems highly unlikely that neutrally evolving sequences would give rise to functional proteins. This conundrum has sparked numerous studies that quantify and characterize these genes, aiming to understand their functional roles and contributions to genome evolution. Yet no fully automated pipeline for their identification is available. We therefore introduce DENSE (DE Novo emerged gene SEarch), an automated Nextflow pipeline based on two distinct steps: detection of taxonomically restricted genes (TRGs) through phylostratigraphy, and filtering of TRGs for de novo emerged genes via genome comparisons and synteny search. DENSE is available as a user-friendly command-line tool, and its second step is also accessible through a web server, given a list of TRGs. DENSE is highly flexible: it offers various combinations of strategies and parameters, allowing users to adapt it to specific configurations or to define their own strategy within a rational framework, which facilitates protocol communication and study interoperability. We apply DENSE to seven model organisms, exploring the impact of its strategies and parameters on de novo gene predictions. This thorough analysis across species with different evolutionary rates reveals useful metrics that help users define input datasets, identify favorable and unfavorable conditions for de novo gene detection, and control potential biases in genome annotations. Additionally, the predictions made for the seven model organisms are compiled into a queryable database, which we hope will serve as a reference for de novo emerged gene lists generated with specific combinations of criteria.
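
    DENSE's first step detects TRGs through phylostratigraphy. The core idea can be sketched as follows: each gene is assigned to the oldest node of the focal species' lineage at which a homolog is detected, and genes whose oldest homolog lies within a chosen clade are called TRGs. This is a minimal illustration, not DENSE's implementation; the lineage, gene names, hit sets, and the `restriction_node` parameter are hypothetical, and the homology search itself (e.g. BLAST against a taxonomically annotated database) is assumed to have been run already.

```python
# Toy phylostratigraphy sketch (hypothetical lineage and hit sets, not DENSE's code).

# Hypothetical lineage of a focal species, ordered from the oldest node to the species itself.
LINEAGE = ["cellular organisms", "Eukaryota", "Metazoa", "Drosophila", "Drosophila melanogaster"]


def phylostratum(hit_nodes: set[str]) -> str:
    """Return the oldest lineage node in which a homolog of the gene was found.

    `hit_nodes` holds the lineage nodes to which homology-search hits were mapped.
    A gene with no hits outside the focal species falls in the youngest stratum.
    """
    for node in LINEAGE:
        if node in hit_nodes:
            return node
    return LINEAGE[-1]


def is_trg(hit_nodes: set[str], restriction_node: str = "Drosophila") -> bool:
    """A gene is taxonomically restricted if its phylostratum is at or below `restriction_node`."""
    return LINEAGE.index(phylostratum(hit_nodes)) >= LINEAGE.index(restriction_node)


# Hypothetical per-gene hit sets.
genes = {
    "geneA": {"Eukaryota", "Metazoa", "Drosophila"},  # ancient gene
    "geneB": {"Drosophila"},                           # genus-restricted
    "geneC": set(),                                    # species-specific orphan
}
trgs = sorted(g for g, hits in genes.items() if is_trg(hits))
```

    Here `trgs` is `['geneB', 'geneC']`. In DENSE, TRGs like these would then pass to the second step (genome comparisons and synteny search) to decide which are de novo emerged.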






    OBJECTIVE: Durum wheat, Triticum turgidum, and bread wheat, Triticum aestivum, are two allopolyploid species of very recent origin that have been subjected to intense selection programs during the thousands of years they have been cultivated. In this paper, we study the durum wheat satellitome and establish a comparative analysis with the previously published bread wheat satellitome.
    METHODS: We revealed the durum wheat satellitome using the satMiner protocol, which is based on consecutive rounds of clustering of Illumina reads with RepeatExplorer2, and estimated the abundance and variation of each identified satDNA with RepeatMasker v4.0.5. We also performed an in-depth characterization of satDNA families, including their chromosomal location by Fluorescence In Situ Hybridization (FISH) in durum wheat, and compared these with FISH patterns in bread wheat. The Basic Local Alignment Search Tool (BLAST®) was used to locate each satDNA in the durum wheat genome assembly through NCBI's Genome Data Viewer (GDV), and the genome assemblies of both species were compared. Sequence divergence and consensus turnover rate (CTR) between homologous satDNA families of durum and bread wheat were estimated using MEGA11.
    RESULTS: This study reveals that, in an exceedingly short period, significant qualitative and quantitative changes have occurred in the set of satellite DNAs (satDNAs) of both species: expansions and contractions in the number of repeats and loci per satellite, differing between the two species; a high rate of sequence change for most of these satellites; and the emergence or loss of satDNAs not shared between the two species analysed. Such changes in satDNA are common among species, but what is truly remarkable and novel in this study is that these processes have taken place in less than ~8000 years, the time separating the two species, indicating an accelerated evolution of their satDNAs.
    CONCLUSIONS: These results, together with the relationship of many of these satellites to transposable elements and the polymorphisms they generate in the centromeres and subtelomeric regions of the chromosomes, are analysed and discussed in the context of the evolutionary origin of these species and the selection pressure exerted by humans throughout the history of their cultivation.
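
    The abstract states that sequence divergence and CTR were estimated in MEGA11. The idea can be reproduced in a few lines of Python: a Kimura 2-parameter (K2P) distance between two aligned consensus sequences, and a CTR computed as that distance divided by the divergence time. Both the choice of K2P and this CTR definition are assumptions for illustration, and the function names are hypothetical; the paper's Methods give the exact model and formula used.

```python
import math


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    Gaps and ambiguous bases are skipped; raises ValueError if the
    distance is undefined (no comparable sites or saturated divergence).
    """
    purines, pyrimidines = {"A", "G"}, {"C", "T"}
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base
        sites += 1
        if a == b:
            continue
        if {a, b} <= purines or {a, b} <= pyrimidines:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    # math.log raises ValueError when its argument is <= 0 (saturation).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def consensus_turnover_rate(distance: float, divergence_time: float) -> float:
    """Assumed definition: consensus divergence per unit of divergence time."""
    if divergence_time <= 0:
        raise ValueError("divergence_time must be positive")
    return distance / divergence_time
```

    For two homologous consensus sequences, `consensus_turnover_rate(k2p_distance(cons_a, cons_b), t)` then gives a rate in distance units per whatever time unit `t` is expressed in (for example years).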






    We sequenced and assembled genomes for 17 isolates of Staphylococcus cohnii isolated from osteomyelitis lesions in young broilers from two separate experiments in which we induced lameness using a hybrid wire-litter flooring system. Whole-genome comparisons using three different methods support a close relationship between genomes of S. cohnii and Staphylococcus urealyticus. The data support three different lineages, which we designated Lineage 1, Lineage 2, and Lineage 3, uniting these two species within an evolving complex. We present evidence for horizontal transfer of genomic regions of 50-440 kbp between lineages. The transfer of a 186 kbp region from Lineage 1 to Lineage 2 appears to have generated Lineage 3. Human-associated isolates appear to be limited to Lineages 2 and 3, but Lineage 2 appears to contain a higher number of human pathogenic isolates. The chicken isolates from our lameness trials included genomically diverse isolates from both Lineages 1 and 2, and isolates from both lineages were obtained from osteomyelitis lesions of individual birds. Our results expand the diversity of Staphylococci associated with osteomyelitis in poultry and suggest a high diversity in the microbiome of day-old chicks. Our data also support a reevaluation and unification of the taxonomic classifications of S. cohnii and S. urealyticus.
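
    The abstract does not name the three whole-genome comparison methods. As one illustration of alignment-free genome comparison, the sketch below computes a Mash-style distance, D = -(1/k) ln(2J / (1 + J)), from the Jaccard index J of exact k-mer sets. Real Mash estimates J from MinHash sketches of canonical k-mers, so its values will differ; the function names here are hypothetical and this is not necessarily one of the methods used in the study.

```python
import math


def kmer_set(seq: str, k: int) -> set[str]:
    """All k-mers of one strand of `seq` (no canonicalisation, no sketching)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_like_distance(seq1: str, seq2: str, k: int = 21) -> float:
    """Mash-style distance D = -(1/k) * ln(2J / (1 + J)) from exact k-mer Jaccard J."""
    a, b = kmer_set(seq1, k), kmer_set(seq2, k)
    union = a | b
    if not union:
        raise ValueError("sequences shorter than k")
    j = len(a & b) / len(union)
    if j == 0:
        return 1.0  # no shared k-mers: treat as maximally distant
    return -math.log(2 * j / (1 + j)) / k
```

    Identical sequences give a distance of 0, and the distance grows as shared k-mers are lost to mutation; for bacterial genomes, k around 21 keeps random k-mer matches rare.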






    Genome evolution under speciation is poorly understood in nonmodel and nonvascular plants, such as bryophytes (the largest group of nonvascular land plants). Their genomes are structurally different from those of angiosperms and likely subject to stronger linked selection, which may have profound consequences for genome evolution in diversifying lineages, even more so when their genome architecture is conserved. We use the highly diverse, rapidly radiated group of peatmosses (Sphagnum) to characterize the processes affecting genome diversification in bryophytes. Using whole-genome sequencing data from populations of 12 species sampled at different phylogenetic and geographical scales, we describe a high correlation among the genomic landscapes of differentiation, divergence, and diversity in Sphagnum. Coupled with evidence from the patterns of covariation among different measures of genetic diversity, phylogenetic discordance, and gene density, this strongly supports the view that peatmoss genome evolution has been shaped by the long-term effects of linked selection, constrained by the distribution of selection targets in the genome. Thus, peatmosses join the growing number of animal and plant groups in which functional features of the genome, such as gene density, and linked selection drive genome evolution along predetermined and highly similar routes in different species. Our findings demonstrate the great potential of bryophytes for studying the genomics of speciation and highlight the urgent need to expand the genomic resources for this remarkable group of plants.






    Horizontal gene transfer (HGT) is fundamental to microbial evolution and adaptation. When a gene is horizontally transferred, it may either add itself as a new gene to the recipient genome (possibly displacing nonhomologous genes) or replace an existing homologous gene. Currently, studies do not usually distinguish between "additive" and "replacing" HGTs, and their relative frequencies, integration mechanisms, and specific roles in microbial evolution are poorly understood. In this work, we develop a novel computational framework for large-scale classification of HGTs as either additive or replacing. Our framework leverages recently developed phylogenetic approaches for HGT detection and classifies HGTs inferred between terminal edges based on gene orderings along genomes and phylogenetic relationships between the microbial species under consideration. The resulting method, called DART, is highly customizable and scalable and can classify a large fraction of inferred HGTs with high confidence and statistical support. Our application of DART to a large dataset of thousands of gene families from 103 Aeromonas genomes provides insights into the relative frequencies, functional biases, and integration mechanisms of additive and replacing HGTs. Among other results, we find that (i) the relative frequency of additive HGT increases with increasing phylogenetic distance, (ii) replacing HGT dominates at shorter phylogenetic distances, (iii) additive and replacing HGTs have strikingly different functional profiles, (iv) homologous recombination in flanking regions of a novel gene may be a frequent integration mechanism for additive HGT, and (v) phages and mobile genetic elements likely play an important role in facilitating additive HGT.
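
    The intuition behind using gene order can be illustrated with a toy rule: look at the genes flanking the acquired gene in the recipient and ask what sits between the same flanks in a close relative that did not receive the transfer. A homolog of the transferred family there suggests replacement; nothing, or a non-homologous gene, suggests an additive transfer. This is a hypothetical heuristic, not DART's algorithm: it ignores the phylogenetic inference and statistical support DART provides, handles only single-gene transfers, and assumes gene families are already labeled.

```python
def classify_hgt(recipient: list[str], hgt_index: int, sister: list[str]) -> str:
    """Toy additive-vs-replacing call from gene order (hypothetical heuristic, not DART).

    `recipient` and `sister` are gene-family labels in chromosomal order;
    `recipient[hgt_index]` is the horizontally acquired gene, and `sister`
    is a close relative assumed not to have received the transfer.
    """
    if not 0 < hgt_index < len(recipient) - 1:
        return "unclassified"  # need a flanking gene on both sides
    family = recipient[hgt_index]
    left, right = recipient[hgt_index - 1], recipient[hgt_index + 1]
    for order in (sister, sister[::-1]):  # allow for an inverted sister region
        for i, gene in enumerate(order):
            if gene != left:
                continue
            if i + 1 < len(order) and order[i + 1] == right:
                return "additive"  # flanks adjacent: plain insertion
            if i + 2 < len(order) and order[i + 2] == right:
                # One gene between the flanks: homologous -> replaced,
                # non-homologous -> displaced by an additive transfer.
                return "replacing" if order[i + 1] == family else "additive"
    return "unclassified"
```

    For example, with `recipient = ["a", "b", "X", "c", "d"]` and index 2, a sister order of `["a", "b", "c", "d"]` yields `"additive"`, while `["a", "b", "X", "c", "d"]` yields `"replacing"`.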






    Despite the highly conserved nature of the genetic code, the frequency of usage of each codon can vary significantly. The evolution of codon usage is shaped by two main evolutionary forces: mutational bias and selection pressures. These pressures can be driven by environmental factors, but also by the need for efficient translation, which depends heavily on the concentration of transfer RNAs (tRNAs) within the cell. The data presented here support the proposal that tRNA modifications play a key role in shaping the overall preference of codon usage in proteobacteria. Interestingly, some codons, such as CGA and AGG (encoding arginine), exhibit a surprisingly low level of variation in their frequency of usage, even across genomes with differing GC content. These findings suggest that the evolution of GC content in proteobacterial genomes might be primarily driven by changes in the usage of a specific subset of codons, whose usage is itself influenced by tRNA modifications.
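
    Codon preferences are commonly summarized as relative synonymous codon usage (RSCU: observed count divided by the count expected if all synonymous codons were used equally), and genome-wide GC trends as overall GC content and GC at third codon positions (GC3). A minimal sketch using arginine's six standard-code codons, since the abstract singles out CGA and AGG; the function names are hypothetical and this is not the authors' pipeline.

```python
from collections import Counter

# Standard-code synonymous codons for arginine (the amino acid of CGA and AGG).
ARG_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


def rscu(counts: Counter, synonymous: tuple[str, ...]) -> dict[str, float]:
    """Relative synonymous codon usage: observed count / count expected under uniform use."""
    total = sum(counts[c] for c in synonymous)
    if total == 0:
        return {c: 0.0 for c in synonymous}
    expected = total / len(synonymous)
    return {c: counts[c] / expected for c in synonymous}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in `seq`."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(cds: str) -> float:
    """GC content at third codon positions (complete codons only)."""
    return gc_fraction(cds.upper()[2::3])
```

    An RSCU of 1 means a codon is used exactly as often as uniform usage would predict; comparing the RSCU of CGA or AGG against genome GC3 across many genomes is one way to see whether a codon's usage tracks GC content.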






    BACKGROUND: In vertebrates, most protein-coding genes have a peak of GC-content near their 5' transcriptional start site (TSS). This feature promotes both the efficient nuclear export and the translation of mRNAs. Despite the importance of GC-content for RNA metabolism, its general features, origin, and maintenance remain mysterious. We investigate the evolutionary forces shaping GC-content at the TSS of genes through both a comparative genomic analysis of nucleotide substitution rates between different species and an examination of human de novo mutations.
    RESULTS: Our data suggest that GC-peaks at TSSs were present in the last common ancestor of amniotes, and likely in that of vertebrates. We observe that in apes and rodents, where PRDM9 directs recombination away from TSSs, GC-content at the 5' end of protein-coding genes is currently undergoing mutational decay. In canids, which lack PRDM9 and in which recombination occurs at TSSs, GC-content at the 5' end of protein-coding genes is increasing. We show that these patterns extend into the 5' end of the open reading frame, thus affecting synonymous codon choice.
    CONCLUSIONS: Our results indicate that the dynamics of this GC-peak in amniotes are largely shaped by historical patterns of recombination. Since decay of GC-content toward mutational equilibrium is the default state for non-functional DNA, the observed decrease in GC-content at TSSs in apes and rodents indicates that the GC-peak is not being maintained by selection on most protein-coding genes in those species.
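
    The argument about decay toward mutational equilibrium can be made concrete with a two-state model, assuming constant per-site rates u (A/T to G/C) and v (G/C to A/T). GC content then follows dGC/dt = u(1 - GC) - v*GC, whose equilibrium is GC* = u/(u + v) and whose solution relaxes exponentially toward it. This is a textbook simplification (no context dependence, no selection), not the model fitted in the study; in this sketch, any recombination-associated bias toward G/C would act like a larger u. The function names and rates are hypothetical.

```python
import math


def equilibrium_gc(u: float, v: float) -> float:
    """Equilibrium GC content GC* = u / (u + v).

    u: per-site rate of A/T -> G/C changes; v: per-site rate of G/C -> A/T changes.
    """
    return u / (u + v)


def gc_at_time(gc0: float, u: float, v: float, t: float) -> float:
    """Closed-form solution of dGC/dt = u * (1 - GC) - v * GC starting from gc0."""
    gc_eq = equilibrium_gc(u, v)
    return gc_eq + (gc0 - gc_eq) * math.exp(-(u + v) * t)


# Hypothetical rates with a G/C -> A/T bias: a GC-rich peak (0.6) decays toward 0.25.
u, v = 1.0, 3.0
trajectory = [gc_at_time(0.6, u, v, t) for t in (0.0, 0.1, 0.5, 2.0)]
```

    With u < v, a GC-rich region above GC* loses GC monotonically, the qualitative behavior described for apes and rodents; a region sitting below a higher effective equilibrium would instead gain GC.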






    Across eukaryotes, most genes required for mitochondrial function have been transferred to, or otherwise acquired by, the nucleus. Encoding genes in the nucleus has many advantages. So why do mitochondria retain any genes at all? Why does the set of mtDNA genes vary so much across different species? And how do species maintain functionality in the mtDNA genes they do retain? In this review, we will discuss some possible answers to these questions, attempting a broad perspective across eukaryotes. We hope to cover some interesting features which may be less familiar from the perspective of particular species, including the ubiquity of recombination outside bilaterian animals, encrypted chainmail-like mtDNA, single genes split over multiple mtDNA chromosomes, triparental inheritance, gene transfer by grafting, gain of mtDNA recombination factors, social networks of mitochondria, and the role of mtDNA dysfunction in feeding the world. We will discuss a unifying picture where organismal ecology and gene-specific features together influence whether organism X retains mtDNA gene Y, and where ecology and development together determine which strategies, importantly including recombination, are used to maintain the mtDNA genes that are retained.






    Almost all species in the genus Salix (willow) are dioecious, and willows have variable sex-determining systems, but the role of this variation in maintaining species barriers is relatively untested. We first analyzed the sex determination systems (SDS) of two species, Salix cardiophylla and Salix interior, whose positions in the Salix phylogeny make them important for understanding a sex chromosome turnover that has been detected in their relatives and that changed the system from male (XX/XY) to female (ZW/ZZ) heterogamety. We show that both species have male heterogamety, with sex-linked regions (SLRs) on chromosome 15 (termed a 15XY system). The SLRs occupy 21.3% and 22.8% of the entire reference chromosome, respectively. By constructing phylogenetic trees, we determined the phylogenetic positions of all species with known SDSs. Reconstruction of ancestral SDS character states revealed that the 15XY system is likely the ancestral state in willows. Turnovers from 15XY to 15ZW and from 15XY to 7XY likely contributed to early speciation in Salix and gave rise to major groups of the Vetrix and Salix clades. Finally, we tested for introgression among species in the phylogenetic trees based on autosomes and SLRs separately. Frequent introgression was observed on autosomes among species with 15XY, 15ZW, and 7XY systems, in contrast to the SLR datasets, which showed less introgression and, in particular, no gene flow between 15ZW and 7XY species. We argue that, although SDS turnovers in willow speciation may not create complete reproductive barriers, the evolution of SLRs plays an important role in preventing introgression and maintaining species boundaries.
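
    The abstract does not say which introgression test was applied to the autosomal and SLR datasets. One widely used option is Patterson's D (the ABBA-BABA test) on a rooted quartet ((P1, P2), P3, outgroup). A minimal single-haplotype sketch follows, assuming aligned sequences without gaps or missing data and omitting the block jackknife normally used to assess significance; the function name and inputs are illustrative assumptions, not the study's analysis.

```python
def patterson_d(p1: str, p2: str, p3: str, outgroup: str) -> float:
    """Patterson's D = (ABBA - BABA) / (ABBA + BABA) from four aligned haplotypes.

    The outgroup allele is taken as ancestral and only biallelic sites are used.
    D > 0 suggests excess allele sharing between P2 and P3, D < 0 between P1 and P3.
    """
    abba = baba = 0
    for a1, a2, a3, o in zip(p1, p2, p3, outgroup):
        if len({a1, a2, a3, o}) != 2:
            continue  # skip invariant and multiallelic sites
        if a1 == o and a2 != o and a3 != o:
            abba += 1  # P2 and P3 share the derived allele
        elif a2 == o and a1 != o and a3 != o:
            baba += 1  # P1 and P3 share the derived allele
    return (abba - baba) / (abba + baba) if abba + baba else 0.0
```

    Computing D separately on autosomal and SLR alignments for the same quartet would mirror the contrast described above: a signal present on autosomes but absent from SLRs is consistent with SLRs resisting gene flow.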






    Understanding the roles played by centromeres in chromosome evolution and speciation is complicated by the fact that centromeres comprise large arrays of tandemly repeated satellite DNA, which hinders high-quality assembly. Here, we used long-read sequencing to generate nearly complete genome assemblies for four karyotypically diverse Papaver species, P. setigerum (2n = 44), P. somniferum (2n = 22), P. rhoeas (2n = 14), and P. bracteatum (2n = 14), collectively representing 45 gapless centromeres. We identified four centromere satellite (cenSat) families and experimentally validated two representatives. For the two allopolyploid genomes (P. somniferum and P. setigerum), we characterized the subgenomic distribution of each satellite and identified a "homogenizing" phase of centromere evolution in the aftermath of hybridization. An interspecies comparison of the peri-centromeric regions further revealed extensive centromere-mediated chromosome rearrangements. Taking these results together, we propose a model for studying cenSat competition after hybridization and shed further light on the complex role of the centromere in speciation.
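
    Recognizing a satellite starts from its tandem periodicity. As a minimal illustration (not the study's cenSat identification pipeline, which the abstract does not describe), the function below scans candidate monomer lengths and returns the shift at which a sequence best matches itself. The function name and defaults are hypothetical; dedicated tools such as Tandem Repeats Finder additionally handle indels, diverged monomers, and megabase-scale arrays.

```python
from typing import Optional


def dominant_period(seq: str, min_period: int = 2,
                    max_period: Optional[int] = None) -> tuple[int, float]:
    """Return (period, identity) for the shift at which `seq` best matches itself.

    identity is the fraction of positions i with seq[i] == seq[i + period].
    Ties keep the smallest shift, so multiples of the true period are not reported.
    """
    seq = seq.upper()
    if max_period is None:
        max_period = len(seq) // 2
    max_period = min(max_period, len(seq) - 1)  # keep at least one compared position
    if max_period < min_period:
        raise ValueError("sequence too short for the requested period range")
    best_period, best_identity = min_period, -1.0
    for period in range(min_period, max_period + 1):
        compared = len(seq) - period
        matches = sum(seq[i] == seq[i + period] for i in range(compared))
        identity = matches / compared
        if identity > best_identity:  # strict '>' keeps the smallest period on ties
            best_period, best_identity = period, identity
    return best_period, best_identity
```

    For ten perfect copies of an 8 bp monomer, `dominant_period("ACGTTGCA" * 10)` returns `(8, 1.0)`; real centromeric arrays give identities below 1 because monomers diverge.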





