Keywords: Bacterial genome; Codon usage bias; Essential genes; Machine learning; Molecular evolution; Selection

MeSH: Machine Learning; Genes, Essential / genetics; Codon Usage; Escherichia coli / genetics; Genome, Bacterial / genetics; Genes, Bacterial; Codon / genetics; Bacteria / genetics / classification

Source: DOI:10.1007/s00438-024-02163-0

Abstract:
Codon usage bias (CUB) is the uneven usage of synonymous codons that encode the same amino acid. It differs among genes both within and across bacterial genomes. Gene expression is known to influence CUB, and accordingly CUB differs between highly and lowly expressed genes in several bacteria. In this article, we extend the study of codon usage by considering gene essentiality as a feature. Using machine learning (ML) based approaches, we analysed Relative Synonymous Codon Usage (RSCU) values of essential and non-essential genes in Escherichia coli and thirty-four other bacterial genomes. Gene essentiality data for all of these genomes were available in public databases. We observed significant differences in codon usage patterns between essential and non-essential genes in the majority of the bacterial genomes. Accordingly, ML based classifiers achieved high area under the curve (AUC) scores, with a minimum AUC of 70.0 across twenty-eight organisms. Furthermore, the importance of individual codons for classifying genes was found to differ among codons within each genome. In Escherichia coli, the Arg codon CGT and the Gly codon GGT were the most preferred codons among essential genes. Interestingly, some codons, such as CGT, ATA, GGT and GGG, contributed consistently to classifying essential genes across all thirty-five bacterial genomes studied. On the other hand, the codons TGY and CAY, which encode Cys and His respectively, were among the codons contributing least to classification in all of these bacteria. This study demonstrates essentiality-based differences in synonymous codon usage in bacterial genomes and presents a codon usage pattern common across bacteria.
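The sketch below is a minimal Python illustration of the RSCU feature. It uses the standard definition, in which RSCU_ij = X_ij / ((1/n_i) * Σ_j X_ij). Here X_ij is the count of codon j for amino acid i and n_i is the number of synonymous codons for that amino acid. A value of 1.0 therefore means no bias.

The sketch also includes a rank-based AUC helper using the Mann-Whitney formulation. That helper scores how well one feature separates two gene sets.

Several parts are assumptions made for illustration:
- the function names `rscu` and `auc`
- the use of the standard genetic code (NCBI translation table 1)
- the exclusion of stop codons and single-codon amino acids (Met, Trp)

This is not the authors' pipeline, which trained ML classifiers on RSCU vectors rather than scoring single codons.

```python
from collections import Counter
from itertools import product

# Standard genetic code (assumed: NCBI translation table 1), in canonical TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group codons into synonymous families, e.g. "G" -> [GGT, GGC, GGA, GGG].
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def rscu(sequence: str) -> dict[str, float]:
    """Hypothetical helper: RSCU of each codon in one coding sequence.

    Stop codons and single-codon amino acids (Met, Trp) are skipped,
    an assumed convention. Amino acids absent from the gene get no entry.
    """
    seq = sequence.upper().replace("U", "T")
    # Count complete codons in frame 0; codons with ambiguous bases are never looked up.
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values: dict[str, float] = {}
    for aa, codons in SYNONYMS.items():
        if aa == "*" or len(codons) < 2:
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)  # count expected under uniform usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


def auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Hypothetical helper: probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney U formulation of ROC AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Gly-only toy gene: GGT x2, GGC x1, GGA x1, GGG x0.
    print(rscu("GGTGGTGGCGGA"))  # GGT 2.0, GGC 1.0, GGA 1.0, GGG 0.0
    print(auc([0.9, 0.8], [0.1, 0.85]))  # 0.75
```

Within each synonymous family, the RSCU values sum to the family size. This makes genes of different lengths and amino-acid compositions comparable as fixed-length feature vectors.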