Keywords: CpG; DNA virus; GC content; UpA; dinucleotide composition

MeSH terms: Animals; Bias; DNA Methylation / genetics; Genome, Viral / genetics; Phylogeny; Viruses / genetics; Prokaryotic Cells / chemistry; Eukaryotic Cells / chemistry

Source: DOI:10.1111/mec.17287

Abstract:
The genomes of cellular organisms display CpG and TpA dinucleotide composition biases. Such biases have been poorly investigated in dsDNA viruses. Here, we show that in dsDNA virus, bacterial, and eukaryotic genomes, the representation of TpA and CpG dinucleotides is strongly dependent on genomic G + C content. Thus, the classical observed/expected ratios do not fully capture dinucleotide biases across genomes. Because a larger portion of the variance in TpA frequency was explained by G + C content, we explored which additional factors drive the distribution of CpG dinucleotides. Using the residuals of the linear regressions as a measure of dinucleotide abundance, together with ancestral state reconstruction across eukaryotic and prokaryotic virus trees, we identified an important role for phylogeny in driving CpG representation. Nonetheless, phylogenetic ANOVA analyses showed that a few host associations also account for significant variation. Among eukaryotic viruses, the most significant differences were observed between arthropod-infecting viruses and viruses that infect vertebrates or unicellular organisms. However, an effect of viral DNA methylation status (driven either by the host or by virus-encoded methyltransferases) is also likely. Among prokaryotic viruses, cyanobacteria-infecting phages were significantly CpG-depleted, whereas phages that infect bacteria in the genera Burkholderia and Staphylococcus were CpG-rich. Comparison with bacterial genomes indicated that this effect is largely driven by the general tendency for phages to resemble the host's genomic CpG content. Notably, this tendency is stronger for temperate than for lytic phages. Our data shed light on the processes that shape virus genome composition and inform manipulation strategies for biotechnological applications.
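
To make the two measures named in the abstract concrete, the following is a minimal sketch of the classical observed/expected (O/E) dinucleotide ratio and of regression residuals against G + C content. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say whether the regression response was the O/E ratio or the raw dinucleotide frequency (the O/E ratio is assumed here), the function names are hypothetical, the toy sequences are invented, and the phylogenetic steps (ancestral state reconstruction, phylogenetic ANOVA) are not covered.

```python
def gc_content(seq):
    """Fraction of G + C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def dinucleotide_oe(seq, dinuc="CG"):
    """Classical observed/expected ratio: f(XY) / (f(X) * f(Y))."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return 0.0
    # Count overlapping dinucleotides over the n - 1 adjacent positions.
    hits = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    observed = hits / (n - 1)
    expected = (seq.count(dinuc[0]) / n) * (seq.count(dinuc[1]) / n)
    return observed / expected if expected else 0.0


def regression_residuals(x, y):
    """Residuals of the ordinary least-squares fit y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    # Toy sequences, invented for illustration only (not real genomes).
    genomes = {
        "toy_A": "ATGCGCGATATTACGCGTTA",
        "toy_B": "ATATTTAACGATTTAAATAT",
        "toy_C": "GCGCCGGCATGCCGGCGCTA",
    }
    gc = [gc_content(s) for s in genomes.values()]
    oe = [dinucleotide_oe(s, "CG") for s in genomes.values()]
    # Assumption: CpG O/E is the response and G + C fraction the predictor.
    for name, res in zip(genomes, regression_residuals(gc, oe)):
        print(f"{name}: CpG residual = {res:+.3f}")
```

Under this sketch, a positive residual marks a genome with more CpG than its G + C content alone would predict, and a negative residual marks relative depletion, which is the sense in which the abstract describes genomes as CpG-rich or CpG-depleted.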