Keywords: Baculoviridae; alphabaculovirus; genotype; long read; machine learning; short read; viral quasispecies

MeSH: Nucleopolyhedroviruses / genetics, classification, isolation & purification; Haplotypes; Animals; Nanopore Sequencing / methods; Polymorphism, Single Nucleotide; Bombyx / virology; High-Throughput Nucleotide Sequencing / methods; Genome, Viral

Source: DOI:10.1099/jgv.0.001983

Abstract:
Naturally occurring isolates of baculoviruses, such as Bombyx mori nucleopolyhedrovirus (BmNPV), usually consist of numerous genetically distinct haplotypes. Deciphering the haplotypes of such isolates is hampered by the large size of the dsDNA genome and by the short read length of the next-generation sequencing (NGS) techniques that are widely applied for baculovirus isolate characterization. In this study, we addressed this challenge by combining the accuracy of NGS, used to determine single nucleotide variants (SNVs) as genetic markers, with the long read length of Nanopore sequencing. This hybrid approach allowed the comprehensive analysis of genetically homogeneous and heterogeneous isolates of BmNPV. Specifically, it allowed the identification of two putative major haplotypes in the heterogeneous isolate BmNPV-Ja by SNV position linkage. SNV positions, determined from NGS data, were linked by the long Nanopore reads in a Position Weight Matrix. Using a modified Expectation-Maximization algorithm, a machine learning approach, the Nanopore reads were assigned to cohorts according to the occurrence of variable SNV positions. The cohorts of reads were assembled de novo, which led to the identification of BmNPV haplotypes. The method demonstrates the strength of combining short- and long-read sequencing techniques to decipher the genetic diversity of baculovirus isolates.
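
The abstract does not spell out how the Expectation-Maximization step was modified, so the following is only a minimal sketch of the unmodified idea, not the authors' implementation. It assumes each Nanopore read has already been aligned and reduced to the bases it shows at the NGS-called SNV positions. It then fits one position weight matrix per putative haplotype with a plain EM for a mixture of per-position categorical distributions. The names `em_cluster_reads` and `consensus`, the input layout, and all parameter defaults are hypothetical choices made for illustration.

```python
import math
import random

ALLELES = "ACGT"

def em_cluster_reads(reads, n_haplotypes=2, n_iter=50, pseudocount=0.5, seed=0):
    """Cluster reads by their alleles at SNV positions (illustrative sketch).

    reads: list of dicts mapping SNV position -> observed base; a read only
    lists the positions it covers. Returns (hard assignments, list of PWMs).
    """
    if not reads:
        raise ValueError("need at least one read")
    rng = random.Random(seed)
    positions = sorted({p for read in reads for p in read})

    # Random soft initial responsibilities, one weight per haplotype per read.
    resp = []
    for _ in reads:
        w = [rng.random() + 1e-3 for _ in range(n_haplotypes)]
        total = sum(w)
        resp.append([x / total for x in w])

    pwms = []
    for _ in range(n_iter):
        # M-step: rebuild each haplotype's PWM from responsibility-weighted counts.
        pwms = []
        for k in range(n_haplotypes):
            pwm = {p: {a: pseudocount for a in ALLELES} for p in positions}
            for read, w in zip(reads, resp):
                for p, base in read.items():
                    if base in pwm[p]:  # skip gaps and ambiguous calls
                        pwm[p][base] += w[k]
            for p in positions:
                total = sum(pwm[p].values())
                for a in ALLELES:
                    pwm[p][a] /= total
            pwms.append(pwm)
        priors = [sum(w[k] for w in resp) / len(reads) for k in range(n_haplotypes)]

        # E-step: posterior probability of each haplotype for each read.
        new_resp = []
        for read in reads:
            logp = []
            for k in range(n_haplotypes):
                lp = math.log(priors[k] + 1e-12)
                for p, base in read.items():
                    if base in ALLELES:
                        lp += math.log(pwms[k][p][base])
                logp.append(lp)
            top = max(logp)
            w = [math.exp(x - top) for x in logp]  # subtract max for stability
            total = sum(w)
            new_resp.append([x / total for x in w])
        resp = new_resp

    assignments = [max(range(n_haplotypes), key=lambda k: w[k]) for w in resp]
    return assignments, pwms

def consensus(pwm):
    """Most probable base at each SNV position of one haplotype PWM."""
    return {p: max(dist, key=dist.get) for p, dist in pwm.items()}
```

Calling `em_cluster_reads(reads, n_haplotypes=2)` returns one cluster label per read together with the fitted matrices. The reads sharing a label would correspond to the cohorts that the abstract describes as being assembled de novo. The number of haplotypes is fixed in advance here, which is a simplification of this sketch.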