Keywords: Accessory genes; Coronaviridae; Coronavirus; Open reading frame; Phylogeny; RNA genome

MeSH terms: COVID-19/virology; Genome, Viral; Genomics/methods; Humans; Open Reading Frames/genetics; SARS-CoV-2/genetics; Viral Proteins/genetics

Source: DOI:10.1016/j.meegid.2021.104858

Abstract:
Coronaviruses (CoVs), including SARS-CoV-2, the agent of the ongoing deadly COVID-19 (coronavirus disease 2019) pandemic, constitute a highly complex and diverse class of RNA viruses with large genomes, a complex gene repertoire, and intricate transcriptional and translational mechanisms. The 3'-terminal one-third of the genome encodes four structural proteins, namely spike, envelope, membrane, and nucleocapsid, interspersed with genes for accessory proteins. These accessory proteins are largely nonstructural and are called 'open reading frame' (ORF) proteins, with alphanumerical designations that do not follow a consistent or sequential order. Here, I report a comparative study of these ORF proteins, which are mainly encoded in two gene clusters: one between the spike and envelope genes, and one between the membrane and nucleocapsid genes. For brevity and focus, greater emphasis was placed on the first cluster, collectively designated the 'orf3 region' for ease of reference. Overall, an apparently diverse set of ORFs, such as ORF3a, ORF3b, ORF3c, ORF3d, ORF4, and ORF5 (not necessarily numbered in that order on all CoV genomes), was analyzed along with other ORFs. Unexpectedly, neither the gene order nor the naming of the ORFs was fully conserved, even among members of a single genus. These studies also revealed hitherto unrecognized orf genes in alternative translational frames, encoding potentially novel polypeptides as well as some that are highly similar to known ORFs. Finally, several options for an inclusive and systematic numbering are proposed, not only for the orf3 region but also for the other orf genes in the viral genome, in an effort to regularize the apparently confusing names and orders. Regardless of the ultimate acceptability of one system over the others, it is hoped that this treatise will initiate an informed discourse in this area.
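The abstract refers to orf genes found in alternative translational frames, that is, ORFs read from the same nucleotides with a one- or two-nucleotide offset (as ORF3b, ORF3c and ORF3d overlap ORF3a). The following is a minimal illustrative sketch of how such overlapping ORFs can be enumerated in the three forward frames of a sequence; it is not the author's method. The function name `find_orfs`, the rule "first in-frame ATG up to the next stop codon", and the `min_codons` threshold are assumptions chosen for illustration, and the test sequence is a made-up toy, not a coronavirus sequence.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (frame, start, end) tuples for ORFs in the three forward frames.

    Assumed ORF definition (hypothetical, for illustration): the first in-frame
    ATG up to and including the next in-frame stop codon. `start` is the 0-based
    index of the A in ATG; `end` is the exclusive index just past the stop codon.
    ORFs shorter than `min_codons` codons (stop codon included) are discarded.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA (U) or DNA (T) notation
    orfs = []
    for frame in range(3):  # frame offsets 0, +1, +2
        start = None
        # Step through complete codons only.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # open an ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # close the ORF; look for the next ATG
    # Report ORFs in genomic order, breaking ties by frame.
    return sorted(orfs, key=lambda orf: (orf[1], orf[0]))


if __name__ == "__main__":
    toy = "AUGAAAUGCCCCUAAGGUAG"  # hypothetical toy sequence
    print(find_orfs(toy, min_codons=3))
```

For the toy sequence, `find_orfs("AUGAAAUGCCCCUAAGGUAG", min_codons=3)` returns `[(0, 0, 15), (2, 5, 20)]`: two overlapping ORFs, one in frame 0 and one in frame +2. Only forward frames are scanned, matching the positive-sense orientation in which CoV ORFs are read; a general-purpose ORF finder would also scan the reverse complement.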