Codon usage

  • Article type: Journal Article
    Mouse (Mus musculus) models have been heavily utilized in developmental biology research to understand mammalian embryonic development, as mice share many genetic, physiological, and developmental characteristics with humans. New explorations into the integration of temporal (stage-specific) and transcriptional (tissue-specific) data have expanded our knowledge of mouse embryo tissue-specific gene functions. To better understand the substantial impact of synonymous mutational variations in the cell-state-specific transcriptome on a tissue's codon and codon pair usage landscape, we have established a novel resource, Mouse Embryo Codon and Codon Pair Usage Tables (Mouse Embryo CoCoPUTs). This web resource offers not only codon and codon pair usage but also GC, dinucleotide, and junction dinucleotide usage, encompassing four strains, 15 murine embryonic tissue groups, 18 Theiler stages, and 26 embryonic days. Here, we leverage Mouse Embryo CoCoPUTs and use heatmaps to depict usage changes over time and a comparison to human usage for each strain and embryonic time point, highlighting unique differences and similarities. The usage similarities found between mouse and human central nervous system data highlight the translational value of projects that leverage mouse models. Data for this analysis can be directly retrieved from Mouse Embryo CoCoPUTs. This resource plays a crucial role in deciphering the complex interplay between usage patterns and embryonic development, offering valuable insights into variation across diverse tissues, strains, and stages. Its applications extend across multiple domains, with notable advantages for biotherapeutic development, where optimizing codon usage can enhance protein expression; one can compare strains, tissues, and mouse embryonic stages in one query. Additionally, Mouse Embryo CoCoPUTs holds great potential in the field of tissue-specific genetic engineering, providing insights for tailoring gene expression to specific tissues for targeted interventions. Furthermore, this resource may enhance our understanding of the nuanced connections between usage biases and tissue-specific gene function, contributing to the development of more accurate predictive models for genetic disorders.
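
    As a rough illustration of the quantities such usage tables report, the sketch below counts codons, codon pairs, dinucleotides, and junction dinucleotides for a single in-frame coding sequence. It is not the Mouse Embryo CoCoPUTs pipeline: the helper names `usage_tables` and `per_thousand` are hypothetical, no expression weighting across a transcriptome is applied, and a junction dinucleotide is assumed here to mean the last base of one codon followed by the first base of the next.

```python
from collections import Counter


def usage_tables(cds: str) -> dict:
    """Count usage features for one in-frame coding sequence (hypothetical helper)."""
    seq = cds.upper().replace("U", "T")
    if not seq or len(seq) % 3 != 0:
        raise ValueError("expected a non-empty CDS whose length is a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Each codon paired with its immediate 3' neighbor.
    neighbors = list(zip(codons, codons[1:]))
    return {
        "codon": Counter(codons),
        "codon_pair": Counter(a + b for a, b in neighbors),
        # Every overlapping pair of adjacent bases, regardless of frame.
        "dinucleotide": Counter(seq[i:i + 2] for i in range(len(seq) - 1)),
        # Assumed definition: codon position 3 followed by position 1 of the next codon.
        "junction_dinucleotide": Counter(a[2] + b[0] for a, b in neighbors),
        "gc_fraction": (seq.count("G") + seq.count("C")) / len(seq),
    }


def per_thousand(counts: Counter) -> dict:
    """Convert raw counts to frequencies per 1000 observations."""
    total = sum(counts.values())
    return {key: 1000 * n / total for key, n in counts.items()}
```

    Calling `usage_tables` once per transcript and summing the resulting counters (optionally weighted by expression) would be one way to build a tissue-level table; that aggregation step is left out here because the abstract does not describe it.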






  • Article type: Journal Article
    While bioactivity and a favorable safety profile are of utmost importance for biotherapeutics, manufacturability is also worthy of consideration to ease the manufacturing process. Manufacturability in the scientific literature is mostly related to the stability of formulated drug substances, with limited focus on downstream process-related manufacturability, that is, how easily a protein can be purified. Process-related or biological impurities such as viruses and host cell proteins (HCP), which mostly have acidic isoelectric points, are present in the harvest and must be removed to ensure patient safety. Therefore, during molecule design, the surface charge of the target molecule should preferably differ sufficiently from the surface charge of the impurities to enable an efficient purification strategy. In this feasibility study, we evaluated the possibility of improving manufacturability by adapting the surface charge of the target protein. We generated several variants of a GLP1-receptor-agonist-Fc-domain-FGF21 fusion protein and demonstrated proof of concept for an anion exchange chromatography step, which can then be operated at high pH values with maximal product recovery while allowing removal of HCP and viruses. Altering the surface charge distribution of biotherapeutic proteins can thus enable an efficient manufacturing process for removing HCP and viruses, thereby reducing manufacturing costs.






  • Article type: Journal Article
    Chloroplast (cp) genome sequences have been extensively used for phylogenetic and evolutionary analyses, as many have been sequenced in recent years. Identification of Quercus is challenging because many species overlap phenotypically owing to interspecific hybridization, introgression, and incomplete lineage sorting. Therefore, we wanted to gain a better understanding of this genus at the level of the maternally inherited chloroplast genome. Here, we sequenced, assembled, and annotated the cp genomes of the threatened Quercus marlipoensis (160,995 bp) and Q. kingiana (161,167 bp), and mined these genomes for repeat sequences and codon usage bias. Comparative genomic analyses, phylogenomics, and selection pressure analysis were also performed in these two threatened species along with other species of Quercus. We found that the guanine and cytosine content of the two cp genomes was similar. All 131 annotated genes, including 86 protein-coding genes, 37 transfer RNA genes, and 8 ribosomal RNA genes, had the same order in the two species. A strong A/T bias was detected in the base composition of simple sequence repeats. Among the 59 synonymous codons, the codon usage pattern of the cp genomes in these two species was biased toward A/U-ending codons. Comparative genomic analyses indicated that the cp genomes of Quercus section Ilex are highly conserved. We detected eight highly variable regions that could be used as molecular markers for species identification. The cp genome structure was consistent and different within and among the sections of Quercus. The phylogenetic analysis showed that section Ilex was not monophyletic and was divided into two groups, which were respectively nested with section Cerris and section Cyclobalanopsis. The two threatened species sequenced in this study were grouped into the section Cyclobalanopsis. In conclusion, the analyses of cp genomes of Q. marlipoensis and Q. kingiana promote further study of the taxonomy, phylogeny and evolution of these two threatened species and Quercus.
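
    To make the repeat-mining step more concrete, here is a minimal sketch of how perfect simple sequence repeats (SSRs) could be located with regular expressions and their A/T fraction measured. The abstract does not name the tool or thresholds used, so the minimum repeat counts in `MIN_REPEATS` and the helper names `find_ssrs`, `is_degenerate`, and `at_fraction` are assumptions for illustration only.

```python
import re

# Hypothetical minimum repeat counts for mono- to hexanucleotide motifs.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_degenerate(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit, e.g. 'AA' or 'ATAT'."""
    size = len(motif)
    return any(size % k == 0 and motif == motif[:k] * (size // k) for k in range(1, size))


def find_ssrs(seq: str, min_repeats: dict = MIN_REPEATS) -> list:
    """Return (start, motif, repeat_count) for each perfect tandem repeat found."""
    seq = seq.upper()
    hits = []
    for size, minimum in sorted(min_repeats.items()):
        # A motif of the given size followed by at least (minimum - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, minimum - 1))
        # Non-overlapping scan: a skipped degenerate match can mask an SSR starting inside it.
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_degenerate(motif):
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return hits


def at_fraction(hits: list) -> float:
    """Fraction of A/T bases across all repeat tracts in `hits`."""
    bases = "".join(motif * count for _, motif, count in hits)
    return sum(b in "AT" for b in bases) / len(bases) if bases else 0.0
```

    An A/T fraction close to 1 across the detected tracts would correspond to the strong A/T bias reported for the SSRs; imperfect and compound repeats are not handled by this sketch.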






  • Article type: Journal Article
    CRISPR is a precise and effective genome editing technology, but despite several advancements during the last decade, our ability to computationally design gRNAs remains limited. Most predictive models have relatively low predictive power and utilize only the sequence of the target site as input. Here we suggest a new category of features, which incorporate the target site genomic position and the presence of genes close to it. We calculate four features based on gene expression and codon usage bias indices. We show, on CRISPR datasets taken from three different cell types, that such features perform comparably with 425 state-of-the-art predictive features, ranking in the top 2-12% of features. We trained new predictive models, showing that adding expression features to them significantly improves their r² by up to 0.04 (a relative increase of 39%), achieving average correlations of up to 0.38 on their validation sets, and that these features are deemed important by different feature importance metrics. We believe that incorporating the target site's position, in addition to its sequence, in features such as those we have generated here will improve our ability to predict, design and understand CRISPR experiments going forward.






  • Article type: Journal Article
    BACKGROUND: Plants of the tribe Ampelopsideae are important garden plants with both medicinal and ornamental value. The study of codon usage bias (CUB) facilitates a deeper comprehension of the molecular genetic evolution of species and their adaptive strategies. The joint analysis of CUB in chloroplast genomes (cpDNA) offers valuable insights for in-depth research on molecular genetic evolution, biological resource conservation, and elite breeding within this plant family.
    RESULTS: The base composition and codon usage preferences of the eighteen chloroplast genomes were highly similar, with the GC content at all three codon positions being less than 50%. This indicates that they preferred A/T bases. Their effective number of codons values were all in the range of 35-61, which indicates that the codon preferences of the chloroplast genomes of the 18 Ampelopsideae plants were relatively weak. A series of analyses indicated that the codon preference of the chloroplast genomes of the 18 Ampelopsideae plants was influenced by a combination of multiple factors, with natural selection being the primary influence. The clustering tree generated based on the relative usage of synonymous codons is consistent with some of the results obtained from the phylogenetic tree of chloroplast genomes, which indicates that the clustering tree based on the relative usage of synonymous codons can be an important supplement to the results of the sequence-based phylogenetic analysis. Finally, 10 shared optimal codons were identified on the basis of the chloroplast genomes of the 18 species.
    CONCLUSIONS: The codon preferences of the chloroplast genome in Ampelopsideae plants are relatively weak and are primarily influenced by natural selection. The codon composition of the chloroplast genomes of the eighteen Ampelopsideae plants and their usage preferences were sufficiently similar to demonstrate that the chloroplast genomes of Ampelopsideae plants are highly conserved. This study provides a scientific basis for studying the genetic evolution of chloroplast genes in Ampelopsideae species and their adaptive strategies.
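
    For readers unfamiliar with the effective number of codons, the sketch below computes it in the spirit of Wright's classic formulation, where the value runs from 20 (one codon per amino acid, extreme bias) to 61 (all synonymous codons used equally). It is an illustration rather than the authors' procedure: the standard genetic code is assumed, the function name `effective_number_of_codons` is hypothetical, and sequences lacking any usable amino acid in a degeneracy class are rejected instead of being patched with an estimate.

```python
from collections import Counter, defaultdict

# Standard genetic code in NCBI order (bases T, C, A, G at each position).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Number of amino acids in each degeneracy class of the standard code.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def effective_number_of_codons(cds: str) -> float:
    """Estimate the effective number of codons for pooled coding sequence."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Codon counts grouped by the amino acid they encode (stop codons excluded).
    family_counts = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            family_counts[aa].append(counts[codon])

    # Codon homozygosity F for each amino acid, collected per degeneracy class.
    homozygosity = defaultdict(list)
    for family in family_counts.values():
        n, k = sum(family), len(family)
        if k == 1 or n < 2:
            continue  # Met/Trp, or too few observations to estimate F
        f = (n * sum((x / n) ** 2 for x in family) - 1) / (n - 1)
        if f > 0:
            homozygosity[k].append(f)

    nc = 2.0  # Met and Trp each contribute exactly one codon
    for k, n_amino_acids in CLASS_SIZES.items():
        values = homozygosity[k]
        if not values:
            raise ValueError(f"no usable {k}-fold amino acid in this sequence")
        nc += n_amino_acids / (sum(values) / len(values))
    return min(nc, 61.0)  # values above 61 are conventionally capped
```

    Under this measure, values near the upper end of the reported 35-61 range indicate that synonymous codons are used almost evenly, which is why the abstract reads them as weak codon preference.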






  • Article type: Journal Article
    Despite the highly conserved nature of the genetic code, the frequency of usage of each codon can vary significantly. The evolution of codon usage is shaped by two main evolutionary forces: mutational bias and selection pressures. These pressures can be driven by environmental factors, but also by the need for efficient translation, which depends heavily on the concentration of transfer RNAs (tRNAs) within the cell. The data presented here supports the proposal that tRNA modifications play a key role in shaping the overall preference of codon usage in proteobacteria. Interestingly, some codons, such as CGA and AGG (encoding arginine), exhibit a surprisingly low level of variation in their frequency of usage, even across genomes with differing GC content. These findings suggest that the evolution of GC content in proteobacterial genomes might be primarily driven by changes in the usage of a specific subset of codons, whose usage is itself influenced by tRNA modifications.






  • Article type: Journal Article
    Despite the growing catalogue of studies detailing the taxonomic and functional composition of soil bacterial communities, the life history traits of those communities remain largely unknown. This study analyzes a global dataset of soil metagenomes to explore environmental drivers of growth potential, a fundamental aspect of bacterial life history. We find that growth potential, estimated from codon usage statistics, was highest in forested biomes and lowest in arid latitudes. This indicates that bacterial productivity generally reflects ecosystem productivity globally. Accordingly, the strongest environmental predictors of growth potential were productivity indicators, such as distance to the equator, and soil properties that vary along productivity gradients, such as pH and carbon to nitrogen ratios. We also observe that growth potential was negatively correlated with the relative abundances of genes involved in carbohydrate metabolism, demonstrating tradeoffs between growth and resource acquisition in soil bacteria. Overall, we identify macroecological patterns in bacterial growth potential and link growth rates to soil carbon cycling.






  • Article type: Journal Article
    The codon usage bias (CUB) of genes encoded by different species' genomes varies greatly. The analysis of codon usage patterns enriches our comprehension of genetic and evolutionary characteristics across diverse species. In this study, we performed a genome-wide analysis of CUB and its influencing factors in six sequenced Eimeria species that cause coccidiosis in poultry: Eimeria acervulina, Eimeria necatrix, Eimeria brunetti, Eimeria tenella, Eimeria praecox, and Eimeria maxima. The GC content of protein-coding genes varies between 52.67% and 58.24% among the six Eimeria species. The distribution trend of GC content at different codon positions follows GC1 > GC3 > GC2. Most high-frequency codons tend to end with C/G, except in E. maxima. Additionally, GC3 content is positively correlated with GC3s/C3s but significantly negatively correlated with A3s. Analysis of the ENC-plot, neutrality plot, and PR2-bias plot suggests that selection pressure has a stronger influence than mutational pressure on CUB in the six Eimeria genomes. Finally, we identified 11 to 15 optimal codons, with GCA, CAG, and AGC being the most commonly used optimal codons across these species. This study offers a thorough exploration of the relationships between CUB and selection pressures within the protein-coding genes of Eimeria species. Genetic evolution in these species appears to be influenced by mutations and selection pressures. Additionally, the findings shed light on unique characteristics and evolutionary traits specific to the six Eimeria species.
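
    A neutrality plot of the kind mentioned above regresses GC12 (the mean GC content of codon positions 1 and 2) on GC3 across genes. The sketch below shows that calculation with an ordinary least-squares fit. It rests on assumptions not stated in the abstract: all codons are included (published analyses often exclude Met, Trp, and stop codons when computing GC3), and the helper names `gc_by_position` and `neutrality_slope` are hypothetical.

```python
def gc_by_position(cds: str) -> tuple:
    """Return (GC1, GC2, GC3): GC fraction at each codon position of one CDS."""
    seq = cds.upper().replace("U", "T")
    seq = seq[:len(seq) - len(seq) % 3]  # drop any trailing partial codon
    if not seq:
        raise ValueError("sequence contains no complete codon")
    fractions = []
    for position in range(3):
        bases = seq[position::3]
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return tuple(fractions)


def neutrality_slope(cds_list: list) -> tuple:
    """Least-squares fit of GC12 on GC3 across genes; returns (slope, intercept)."""
    xs, ys = [], []
    for cds in cds_list:
        gc1, gc2, gc3 = gc_by_position(cds)
        xs.append(gc3)
        ys.append((gc1 + gc2) / 2)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("GC3 does not vary across the supplied genes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

    In the usual reading of such plots, a slope near 1 means GC12 tracks GC3 and mutational pressure dominates, whereas a slope near 0 points to selective constraint on the first two positions, which is the direction of the conclusion reported for the six Eimeria genomes.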






  • Article type: Journal Article
    Tetrastigma (Vitaceae) is known for its ornamental, medicinal, and ecological significance. However, the structural and variational characteristics of the Tetrastigma chloroplast genome and their impact on phylogenetic relationships remain underexplored. This study utilized bioinformatics methods to assemble and annotate the chloroplast genomes of 10 Tetrastigma species and compare them with five previously sequenced species. This study analyzed gene composition, simple sequence repeats, and codon usage patterns, revealing a high A/T content, pentanucleotide repeats uniquely identified in five species, and several preferred codons. In addition, comparative analyses were conducted of the chloroplast genomes of 15 Tetrastigma species, examining their structural differences and identifying polymorphic hotspots (rps16, rps16-trnQ, trnS, trnD, psbC-trnS-psbZ, accD-psaI, psbE-petL-petG, etc.) suitable for DNA marker development. Furthermore, phylogenetic and selective pressure analyses were performed based on the chloroplast genomes of these 15 Tetrastigma species, validating and elucidating intra-genus relationships within Tetrastigma. Several genes under positive selection, such as atpF and accD, were also identified, shedding light on the adaptive evolution of Tetrastigma. Utilizing 40 Vitaceae species, the divergence time of Tetrastigma was estimated, clarifying the evolutionary relationships within Tetrastigma relative to other genera. The analysis revealed diverse divergences of Tetrastigma in the Miocene and Pliocene, with possible ancient divergence events before the Eocene. Furthermore, family-level selective pressure analysis identified key features distinguishing Tetrastigma from other genera, showing a higher degree of purifying selection. This research enriches the chloroplast genome data for Tetrastigma and offers new insights into species identification, phylogenetic analysis, and adaptive evolution, enhancing our understanding of the genetic diversity and evolutionary history of these species.






  • Article type: Journal Article
    Apostasia fujianica belongs to the genus Apostasia and is part of the basal lineage in the phylogenetic tree of the Orchidaceae. Currently, there are only ten reported complete mitochondrial genomes in orchids, which greatly hinders the understanding of mitochondrial evolution in Orchidaceae. Therefore, we assembled and annotated the mitochondrial genome of A. fujianica, which has a length of 573,612 bp and a GC content of 44.5%. We annotated a total of 44 genes, including 30 protein-coding genes, 12 tRNA genes, and two rRNA genes. We also performed relative synonymous codon usage (RSCU) analysis, repeat sequence analysis, intergenomic transfer (IGT) analysis, and Ka/Ks analysis for A. fujianica and conducted RNA editing site analysis on the mitochondrial genomes of eight orchid species. We found that most protein-coding genes are under purifying selection, but nad6 is under positive selection, with a Ka/Ks value of 1.35. During the IGT event in A. fujianica's mitogenome, the trnN-GUU, trnD-GUC, trnW-CCA, trnP-UGG, and psaJ genes were identified as having transferred from the plastid to the mitochondrion. Compared to other monocots, the family Orchidaceae appears to have lost the rpl10, rpl14, sdh3, and sdh4 genes. Additionally, to further elucidate the evolutionary relationships among monocots, we constructed a phylogenetic tree based on the complete mitogenomes of monocots. Our results provide valuable data on the mitogenome of A. fujianica and lay the groundwork for future research on genetic variation, evolutionary relationships, and breeding of Orchidaceae.
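
    Relative synonymous codon usage (RSCU), named in the abstract above, is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch under stated assumptions: the standard genetic code is used, RNA editing is ignored, and the function name `rscu` is hypothetical rather than taken from any tool the authors used.

```python
from collections import Counter, defaultdict

# Standard genetic code in NCBI order (bases T, C, A, G at each position).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds: str) -> dict:
    """Return RSCU per codon for every amino acid observed in the sequence."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Group synonymous codons by amino acid, leaving out stop codons.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[codon] for codon in family)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined for its codons
        for codon in family:
            # Observed count divided by the count expected under equal usage.
            values[codon] = counts[codon] * len(family) / total
    return values
```

    An RSCU above 1 marks a codon used more often than expected under equal usage and a value below 1 marks one used less often; concatenating all protein-coding genes before calling `rscu` would give a genome-level profile.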





