codon usage bias

  • Article type: Journal Article
    The nearly neutral theory of molecular evolution posits variation among species in the effectiveness of selection. In an idealized model, the census population size determines both the minimum magnitude of the selection coefficient required for deleterious variants to be reliably purged, and the amount of neutral diversity. Empirically, an 'effective population size' is often estimated from the amount of putatively neutral genetic diversity and is assumed to also capture a species' effectiveness of selection. A potentially more direct measure of the effectiveness of selection is the degree to which selection maintains preferred codons. However, past metrics that compare codon bias across species are confounded by among-species variation in %GC content and/or amino acid composition. Here, we propose a new Codon Adaptation Index of Species (CAIS), based on Kullback-Leibler divergence, that corrects for both confounders. We demonstrate the use of CAIS correlations, as well as the Effective Number of Codons, to show that the protein domains of more highly adapted vertebrate species evolve higher intrinsic structural disorder.
    Evolution is the process through which populations change over time, starting with mutations in the genetic sequence of an organism. Many of these mutations harm the survival and reproduction of an organism, but only by a very small amount. Some species, especially those with large populations, can purge these slightly harmful mutations more effectively than other species. This fact has been used by the ‘drift barrier theory’ to explain various profound differences amongst species, including differences in biological complexity. In this theory, the effectiveness of eliminating slightly harmful mutations is specified by an ‘effective’ population size, which depends on factors beyond just the number of individuals in the population. Effective population size is normally calculated from the amount of time a ‘neutral’ mutation (one with no effect at all) stays in the population before becoming lost or taking over. Estimating this time requires both representative data for genetic diversity and knowledge of the mutation rate. A major limitation is that these data are unavailable for most species. A second limitation is that a brief, temporary reduction in the number of individuals has an oversized impact on the metric, relative to its impact on the number of slightly harmful mutations accumulated. Weibel, Wheeler et al. developed a new metric to more directly determine how effectively a species purges slightly harmful mutations. Their approach is based on the fact that the genetic code has ‘synonymous’ sequences. These sequences code for the same amino acid building block, with one of these sequences being only slightly preferred over others. The metric by Weibel, Wheeler et al. quantifies the proportion of the genome from which less preferred synonymous sequences have been effectively purged. It judges a population to have a higher effective population size when the usage of synonymous sequences departs further from the usage predicted from mutational processes.
The researchers expected that natural selection would favour ‘ordered’ proteins with robust three-dimensional structures, i.e., that species with a higher effective population size would tend to have more ordered versions of a protein. Instead, they found the opposite: species with a higher effective population size tend to have more disordered versions of the same protein. This changes our view of how natural selection acts on proteins. Why species are so different remains a fundamental question in biology. Weibel, Wheeler et al. provide a useful tool for future applications of drift barrier theory to a broad range of ways that species differ.
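The idea of scoring how far synonymous codon usage departs from a composition-based expectation can be illustrated with a small sketch. This is not the published CAIS formula: it is a simplified illustration that assumes each base is drawn independently given a genome-wide GC fraction, and the function names `expected_codon_probs` and `kl_divergence` are hypothetical.

```python
import math

def expected_codon_probs(codons, gc):
    """Expected usage of synonymous codons under a naive mutational model.

    Assumption (not from the source): bases are independent, with G and C
    each at frequency gc/2 and A and T each at (1 - gc)/2.
    """
    base_p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    raw = {c: math.prod(base_p[b] for b in c) for c in codons}
    total = sum(raw.values())
    # Normalize within the synonymous family so the values sum to 1.
    return {c: v / total for c, v in raw.items()}

def kl_divergence(observed, expected):
    """Kullback-Leibler divergence D(observed || expected), in bits."""
    return sum(
        p * math.log2(p / expected[c])
        for c, p in observed.items()
        if p > 0  # terms with p == 0 contribute nothing
    )

# Toy example for the four glycine codons in a genome with 50% GC.
glycine = ["GGA", "GGC", "GGG", "GGT"]
expected = expected_codon_probs(glycine, gc=0.5)
observed = {"GGA": 0.2, "GGC": 0.4, "GGG": 0.2, "GGT": 0.2}
divergence_bits = kl_divergence(observed, expected)
```

A divergence of zero means usage matches the composition-based expectation; larger values mean a stronger departure from it, which is the kind of signal the abstract describes.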






  • Article type: Journal Article
    Malus baccata, a valuable germplasm resource in the genus Malus, is indigenous to China and widely distributed. However, little is known about the lineage composition and genetic basis of 'ZA', a mutant type of M. baccata. In this study, we compared the differences between 'ZA' and wild type from the perspective of morphology and ultrastructure and analyzed their chloroplast pigment content based on biochemical methods. Further, the complete mitogenome of M. baccata 'ZA' was sequenced by next-generation sequencing and assembled. Subsequently, its molecular characteristics were analyzed using Geneious, MISA-web, and CodonW toolkits. Furthermore, by examining 106 Malus germplasms and 42 Rosaceae species, we deduced and elucidated the evolutionary position of M. baccata 'ZA', as well as interspecific variations among different individuals. In comparison, the total length of the 'ZA' mitogenome (GC content: 45.4%) is 374,023 bp, which is approximately 2.33 times the size (160,202 bp) of the plastome (GC: 36.5%). The collinear analysis results revealed abundant repeats and genome rearrangements occurring between different Malus species. Additionally, we identified 14 plastid-driven fragment transfer events. A total of 54 genes have been annotated in the 'ZA' mitogenome, including 35 protein-coding genes, 16 tRNAs, and three rRNAs. By calculating nucleotide polymorphisms and selection pressure for 24 shared core mitochondrial CDSs from 42 Rosaceae species (including 'ZA'), we observed that the nad3 gene exhibited minimal variation, while nad4L appeared to be evolving rapidly. Population genetics analysis detected a total of 1578 high-quality variants (1424 SNPs, 60 insertions, and 94 deletions; variation rate: 1/237) among samples from 106 Malus individuals. Furthermore, by constructing phylogenetic trees based on both Malus and Rosaceae taxa datasets, it was preliminarily demonstrated that 'ZA' is closely related to M. baccata, M. sieversii, and other proximate species in terms of evolution. The sequencing data obtained in this study, along with our findings, contribute to expanding the mitogenomic resources available for Rosaceae research. They also serve as a reference for molecular identification studies as well as conservation and breeding efforts focused on elite germplasm.






  • Article type: Journal Article
    The codon usage bias (CUB) of genes encoded by different species' genomes varies greatly. The analysis of codon usage patterns enriches our comprehension of genetic and evolutionary characteristics across diverse species. In this study, we performed a genome-wide analysis of CUB and its influencing factors in six sequenced Eimeria species that cause coccidiosis in poultry: Eimeria acervulina, Eimeria necatrix, Eimeria brunetti, Eimeria tenella, Eimeria praecox, and Eimeria maxima. The GC content of protein-coding genes varies between 52.67% and 58.24% among the six Eimeria species. The distribution trend of GC content at different codon positions follows GC1 > GC3 > GC2. Most high-frequency codons tend to end with C/G, except in E. maxima. Additionally, there is a positive correlation between GC3 content and GC3s/C3s, but a significantly negative correlation with A3s. Analysis of the ENC-Plot, neutrality plot, and PR2-bias plot suggests that selection pressure has a stronger influence than mutational pressure on CUB in the six Eimeria genomes. Finally, we identified between 11 and 15 optimal codons, with GCA, CAG, and AGC being the most commonly used optimal codons across these species. This study offers a thorough exploration of the relationships between CUB and selection pressures within the protein-coding genes of Eimeria species. Genetic evolution in these species appears to be influenced by mutations and selection pressures. Additionally, the findings shed light on unique characteristics and evolutionary traits specific to the six Eimeria species.
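The position-specific GC values referred to above (GC1, GC2, GC3) can be computed directly from a coding sequence. The following is a minimal sketch with a hypothetical function name, `gc_by_position`; it counts all codons, whereas tools such as CodonW report GC3s over synonymous codons only, so the numbers would not match those tools exactly.

```python
def gc_by_position(cds):
    """Return (GC1, GC2, GC3): the G+C fraction at each codon position.

    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1  # i % 3 is the position within the codon
    n_codons = usable // 3
    return tuple(c / n_codons for c in counts)

# Toy sequence: codons GCT, GCA, GGT.
gc1, gc2, gc3 = gc_by_position("GCTGCAGGT")
```

Comparing the three values across all genes of a genome gives the kind of ordering reported in the abstract (GC1 > GC3 > GC2).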






  • Article type: Journal Article
    Dryas octopetala var. asiatica, a dwarf shrub belonging to the Rosaceae family and native to Asia, exhibits notable plasticity in photosynthesis in response to temperature variations. However, the codon usage patterns and factors influencing them in the chloroplast genome of this species have not yet been documented. This study sequenced and assembled the complete genome of D. octopetala var. asiatica. The annotated genes in the chloroplast genome were analyzed for codon composition through multivariate statistical methods including a neutrality plot, a parity rule 2 (PR2) bias plot, and an effective number of codons (ENC) plot using CodonW 1.4.2 software. The results indicated that the mean GC content of 53 CDSs was 38.08%, with the average GC content at the third codon base position being 27.80%, suggesting a preference for A/U(T) at the third codon position in chloroplast genes. Additionally, the chloroplast genes exhibited a weak overall codon usage bias (CUB) based on ENC values and other indicators. Correlation analysis showed a significant negative correlation between ENC value and GC2, a strongly positive correlation with GC3, but no correlation with GC1 content. These findings highlight the importance of the codon composition at the third position in influencing codon usage bias. Furthermore, our analysis indicated that the CUB of the chloroplast genome of D. octopetala var. asiatica was primarily influenced by natural selection and other factors. Finally, this study identified UCA, CCU, GCU, AAU, GAU, and GGU as the optimal codons. These results offer a foundational understanding for genetic modification and evolutionary dynamics of the chloroplast genome of D. octopetala var. asiatica.
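A PR2 bias plot places each gene at the coordinates G3/(G3+C3) and A3/(A3+T3), commonly computed over the third positions of fourfold degenerate codons; the centre (0.5, 0.5) corresponds to no strand-asymmetric bias. The sketch below assumes the standard genetic code and uses a hypothetical function name, `pr2_point`; it is an illustration, not the CodonW implementation.

```python
# Two-base prefixes whose third position is fourfold degenerate in the
# standard genetic code (Ala, Gly, Pro, Thr, Val, and the fourfold boxes
# of Arg, Leu, and Ser).
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CG", "CT", "TC"}

def pr2_point(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)) over fourfold degenerate codons."""
    cds = cds.upper().replace("U", "T")
    third = {"A": 0, "T": 0, "G": 0, "C": 0}
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in third:
            third[codon[2]] += 1
    gc = third["G"] + third["C"]
    at = third["A"] + third["T"]
    if gc == 0 or at == 0:
        raise ValueError("not enough fourfold degenerate codons")
    return third["G"] / gc, third["A"] / at

# Toy sequence: ATG is ignored; GCT, GCT, GCA, GGC, GGC are counted.
x, y = pr2_point("ATGGCTGCTGCAGGCGGC")
```

Points far from the centre indicate unequal use of G versus C or A versus T at the third position, which is usually read as evidence that mutation alone does not explain the pattern.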






  • Article type: Journal Article
    The latest research shows that ferns and lycophytes have distinct evolutionary lineages. The codon usage patterns of lycophytes and ferns have not yet been documented. To investigate the gene expression profiles across various plant lineages with respect to codon usage, analyze the disparities and determinants of gene evolution in primitive plant species, and identify appropriate exogenous gene expression platforms, the whole-genome sequences of four distinct species were retrieved from the NCBI database. The findings indicated that Ceratopteris richardii, Adiantum capillus-veneris, and Selaginella moellendorffii exhibited an elevated A/U content in their codon base composition and a tendency to end with A/U. Additionally, S. capillus-veneris had more C/G in its codons and a tendency to end with C/G. The ENC values derived from both ENC-plot and ENC-ratio analyses deviated significantly from the standard curves, suggesting that the codon usage preferences of these four species were primarily influenced by genetic mutations and natural selection, with natural selection exerting a more prominent influence. This finding was further supported by PR2-Plot, neutrality plot analysis, and COA. A combination of RSCU and ENC values was used as a reference criterion to rank the codons and further identify the optimal codons. The study identified 24 high-frequency codons in C. richardii, A. capillus-veneris, and Diphasiastrum complanatum, with no shared optimal codons among the four species. Arabidopsis thaliana and Ginkgo biloba exhibited similar codon preferences to the three species, except for S. moellendorffii. This research offers a theoretical framework at the genomic codon level for investigating the phylogenetic relationships between lycophytes and ferns, shedding light on gene codon optimization and its implications for genetic engineering in breeding.
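The "standard curve" in an ENC-plot is usually Wright's approximation of the expected ENC when codon usage is determined by GC3s alone, and the ENC-ratio compares an observed ENC with that expectation. A minimal sketch follows; the function names are hypothetical, and the source does not state which exact formulas the authors used.

```python
def expected_enc(gc3s):
    """Wright's expected ENC for a given GC3s (a fraction between 0 and 1)."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)

def enc_ratio(enc_observed, gc3s):
    """Relative shortfall of the observed ENC below the expected curve."""
    enc_exp = expected_enc(gc3s)
    return (enc_exp - enc_observed) / enc_exp

# A hypothetical gene with GC3s = 0.30 and an observed ENC of 45.
ratio = enc_ratio(45.0, 0.30)
```

Genes lying well below the curve (a clearly positive ratio) are typically interpreted as being shaped by factors other than base composition, such as selection.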






  • Article type: Journal Article
    Hepatitis A virus (HAV), a member of the genus Hepatovirus (Picornaviridae HepV), remains a significant viral pathogen, frequently causing enterically transmitted hepatitis worldwide. In this study, we conducted an epidemiological survey of HepVs carried by small terrestrial mammals in the wild in Yunnan Province, China. Utilizing HepV-specific broad-spectrum RT-PCR, next-generation sequencing (NGS), and QNome nanopore sequencing (QNS) techniques, we identified and characterized two novel HepVs provisionally named EpMa-HAV and EpLe-HAV, discovered in the long-tailed mountain shrew (Episoriculus macrurus) and long-tailed brown-toothed shrew (Episoriculus leucops), respectively. Our sequence and phylogenetic analyses of EpMa-HAV and EpLe-HAV indicated that they belong to the species Hepatovirus I (HepV-I) clade II, also known as the Chinese shrew HepV clade. Notably, the codon usage bias pattern of novel shrew HepVs is consistent with that of previously identified Chinese shrew HepV. Furthermore, our structural analysis demonstrated that shrew HepVs differ from other mammalian HepVs in RNA secondary structure and exhibit variances in key protein sites. Overall, the discovery of two novel HepVs in shrews expands the host range of HepV and underscores the existence of genetically diverse animal homologs of human HAV within the genus HepV.






  • Article type: Journal Article
    Magnolia lotungensis is an extremely endangered endemic tree in China. To elucidate the genetic basis of M. lotungensis, we performed a comprehensive transcriptome analysis using a sample integrating the plant's bark, leaves, and flowers. De novo transcriptome assembly yielded 177,046 transcripts and 42,518 coding sequences. Notably, we identified 796 species-specific genes enriched in organelle gene regulation and defense responses. A codon usage bias analysis revealed that mutation bias appears to be the primary driver of selection in shaping the species' genetic architecture. An evolutionary analysis based on dN/dS values of paralogous and orthologous gene pairs indicated a predominance of purifying selection, suggesting strong evolutionary constraints on most genes. A comparative transcriptomic analysis with Magnolia sinica identified approximately 1000 ultra-conserved genes, enriched in essential cellular processes such as transcriptional regulation, protein synthesis, and genome stability. Interestingly, only 511 rapidly evolving genes under positive selection were detected compared to M. sinica and Magnolia kuangsiensis. These genes were enriched in metabolic processes associated with adaptation to specific environments, potentially limiting the species' ability to expand its range. Our findings contribute to understanding the genetic architecture of M. lotungensis and suggest that an insufficient number of adaptive genes contributes to its endangered status.






  • Article type: Journal Article
    The Rutaceae family comprises plants that are economically important owing to their extensive applications in spices, food, oil, medicine, etc. Rutaceae plants could be better utilized through biotechnology. Modern biotechnological approaches primarily rely on the heterologous expression of functional proteins in different vectors. However, several proteins are difficult to express outside their native environment. The expression potential of functional genes in heterologous systems can be maximized by replacing the rare synonymous codons in the vector with preferred optimal codons of functional genes. Codon usage bias plays a critical role in biogenetic engineering-based research and development. In the current study, 727 coding sequences (CDSs) obtained from the chloroplast genomes of ten members of the Rutaceae family were analyzed for codon usage bias. The nucleotide composition analysis of codons showed that these codons were rich in A/T(U) bases and preferred A/T(U) endings. Analyses of neutrality plots, effective number of codons (ENC) plots, and correlations between ENC and codon adaptation index (CAI) were conducted, which revealed that natural selection is a major driving force for the Rutaceae family's codon usage bias, followed by base mutation. In the ENC vs. CAI plot, codon usage bias in the Rutaceae family had a negligible relationship with gene expression level. For each sample, we screened 12 codons as preferred and high-frequency codons simultaneously, of which GCU encoding Ala, UUA encoding Leu, and AGA encoding Arg were the most preferred codons. Taken together, our study unraveled the synonymous codon usage pattern in the Rutaceae family, providing valuable information for the genetic engineering of Rutaceae plant species in the future.
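High-frequency codons are commonly identified with relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below assumes the standard genetic code and a hypothetical function name, `rscu`; the screening thresholds the authors applied are not given in the source.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def rscu(cds_list):
    """Return {codon: RSCU} pooled over a list of coding sequences.

    Stop codons and single-codon amino acids (Met, Trp) are skipped, as are
    amino acids that never occur in the input.
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0 or len(codons) == 1:
            continue
        for c in codons:
            # Observed count relative to equal usage within the family.
            values[c] = counts[c] * len(codons) / total
    return values

# Toy input: alanine is encoded three times by GCT and once by GCA.
toy = rscu(["GCTGCTGCTGCA"])
```

An RSCU above 1 means a codon is used more often than expected under equal usage, which is the usual working definition of a high-frequency codon.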






  • Article type: Journal Article
    Goose circovirus (GoCV), a potential immunosuppressive virus possessing a circular single-stranded DNA genome, is widely distributed in both domesticated and wild geese. Infection with this virus causes significant economic losses in the waterfowl industry. The codon usage patterns of viruses reflect their evolutionary history and genetic architecture, allowing them to adapt quickly to changes in the external environment, particularly to their hosts. In this study, we retrieved the coding sequences (Rep and Cap) and the genome of GoCV from GenBank and explored the codon usage patterns in 144 GoCV strains. The overall codon usage of the GoCV strains was relatively similar and exhibited a slight bias. The effective number of codons (ENC) indicated a low overall extent of codon usage bias (CUB) in GoCV. Combined with the base composition and relative synonymous codon usage (RSCU) analysis, the results revealed a bias toward A- and G-ending codons in the overall codon usage. Analysis of the ENC-GC3s plot and neutrality plot suggested that natural selection plays an important role in shaping the codon usage pattern of GoCV, with mutation pressure having a minor influence. Furthermore, the correlations between ENC and relative indices, as well as correspondence analysis (COA), showed that hydrophobicity and geographical distribution also contribute to codon usage variation in GoCV, suggesting the possible involvement of natural selection. In conclusion, GoCV exhibits comparatively slight CUB, with natural selection being the major factor shaping its codon usage pattern. Our research contributes to a deeper understanding of GoCV evolution and its host adaptation, providing valuable insights for future basic studies and vaccine design related to GoCV.
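A neutrality plot regresses GC12 (the mean GC content at codon positions 1 and 2) on GC3 across genes or strains. A slope near 1 is conventionally read as mutation pressure dominating, and a slope near 0 as selection dominating. The sketch below is a plain least-squares fit with a hypothetical function name, `neutrality_slope`, and made-up input values; it does not reproduce the data in the abstract.

```python
def neutrality_slope(gc3, gc12):
    """Least-squares fit of gc12 = slope * gc3 + intercept.

    Returns (slope, intercept).
    """
    if len(gc3) != len(gc12) or len(gc3) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    if sxx == 0:
        raise ValueError("gc3 values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up values: GC12 barely changes while GC3 varies widely.
slope, intercept = neutrality_slope([0.2, 0.4, 0.6, 0.8],
                                    [0.41, 0.42, 0.43, 0.44])
```

With these toy values the fitted slope is small, the pattern that studies like the one above interpret as a minor role for mutation pressure.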






  • Article type: Journal Article
    Schlafen (SLFN) is a family of proteins upregulated by type I interferons with a regulatory role in translation. Intriguingly, SLFN14 associates with the ribosome and can degrade rRNA, tRNA, and mRNA in vitro, but a role in translation is still unknown. Ribosomes are important regulatory hubs during translation elongation of mRNAs rich in rare codons. Therefore, we evaluated the potential role of SLFN14 in the expression of mRNAs enriched in rare codons, using HIV-1 genes as a model. We found that, in a variety of cell types, including primary immune cells, SLFN14 regulates the expression of HIV-1 and non-viral genes based on their codon adaptation index, a measurement of the synonymous codon usage bias; consequently, SLFN14 inhibits the replication of HIV-1. The potent inhibitory effect of SLFN14 on the expression of the rare codon-rich transcript HIV-1 Gag was minimized by codon optimization. Mechanistically, we found that the endoribonuclease activity of SLFN14 is required, and that ribosomal RNA degradation is involved. Therefore, we propose that SLFN14 impairs the expression of HIV-1 transcripts rich in rare codons, in a catalytic-dependent manner.
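The codon adaptation index mentioned above is conventionally the geometric mean of per-codon relative adaptiveness weights, where each weight is a codon's frequency in a reference gene set divided by the frequency of the most used synonymous codon. The sketch below takes the weights as given; the function name `cai` and the toy weights are hypothetical and do not come from the study.

```python
import math

def cai(cds, weights):
    """Geometric mean of relative adaptiveness weights over a sequence.

    `weights` maps codons to values in (0, 1]; codons absent from the
    mapping (for example Met, Trp, and stop codons) are skipped.
    Weights must be strictly positive.
    """
    cds = cds.upper().replace("U", "T")
    logs = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in weights:
            logs.append(math.log(weights[codon]))
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))

# Toy weights for two alanine codons: GCT is optimal, GCA is rare.
toy_weights = {"GCT": 1.0, "GCA": 0.25}
rare_rich = cai("GCAGCAGCT", toy_weights)
optimized = cai("GCTGCTGCT", toy_weights)
```

Replacing rare codons with the optimal synonym raises the index toward 1, which is the effect of the codon optimization described in the abstract.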





