Keywords: assembly evaluation; de novo assembly; genome sequencing; heterozygosity; long-read sequencing; purging allelic sequences

MeSH: Sequence Analysis, DNA; Haplotypes; High-Throughput Nucleotide Sequencing; Heterozygote; Alleles

Source: DOI:10.1093/bib/bbad337

Abstract:
Although current long-read sequencing technologies produce reads long enough to facilitate genome assembly, they have high sequencing error rates. Various assemblers based on different approaches have been developed, but no systematic evaluation of long-read assemblers for diploid genomes with varying heterozygosity has been performed. Here, we evaluated a series of processes, including the estimation of genome characteristics such as genome size and heterozygosity, de novo assembly, polishing, and removal of allelic contigs, using six genomes with various heterozygosity levels. We evaluated five long-read-only assemblers (Canu, Flye, miniasm, NextDenovo and Redbean) and five hybrid assemblers that combine short and long reads (HASLR, MaSuRCA, Platanus-allee, SPAdes and WENGAN). We also propose a concrete guideline for constructing a haplotype representation according to the degree of heterozygosity, followed by polishing and purging haplotigs, using stable and high-performance assemblers: Redbean, Flye and MaSuRCA.