Keywords: Genome assembly; LRS Special Issue; Oxford Nanopore; duplex sequencing; genome sequencing; haplotype phasing; single-molecule sequencing; telomere-to-telomere assembly

Source: DOI:10.1101/2024.03.15.585294

Abstract:
The combination of ultra-long Oxford Nanopore (ONT) sequencing reads with long, accurate PacBio HiFi reads has enabled the completion of a human genome and spurred similar efforts to complete the genomes of many other species. However, this approach to complete, "telomere-to-telomere" genome assembly relies on multiple sequencing platforms, which limits its accessibility. ONT "Duplex" sequencing reads, in which both strands of the DNA are read to improve quality, promise high per-base accuracy. To evaluate this new data type, we generated ONT Duplex data for three widely studied genomes: human HG002, Solanum lycopersicum Heinz 1706 (tomato), and Zea mays B73 (maize). For the diploid, heterozygous HG002 genome, we also used "Pore-C" chromatin contact mapping to completely phase the haplotypes. We found the accuracy of Duplex data to be similar to that of HiFi sequencing, but with read lengths tens of kilobases longer, and the Pore-C data to be compatible with existing diploid assembly algorithms. This combination of read length and accuracy enables the construction of a high-quality initial assembly, which can then be further resolved using the ultra-long reads and finally phased into chromosome-scale haplotypes with Pore-C. The resulting assemblies have a base accuracy exceeding 99.999% (Q50) and near-perfect continuity, with most chromosomes assembled as single contigs. We conclude that ONT sequencing is a viable alternative to HiFi sequencing for de novo genome assembly, and has the potential to provide a single-instrument solution for the reconstruction of complete genomes.
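The abstract quotes base accuracy both as a percentage (99.999%) and as a quality value (Q50). These are related by the standard Phred scale, Q = -10 * log10(1 - accuracy). The following is a minimal sketch of that conversion; the function names are illustrative and do not come from the paper or from any assembly tool.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy (strictly between 0 and 1) to a Phred-scaled Q value."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    # Phred definition: Q = -10 * log10(error probability)
    return -10.0 * math.log10(1.0 - accuracy)


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred-scaled Q value back to a per-base accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # The accuracy threshold quoted in the abstract.
    print(round(accuracy_to_phred(0.99999), 3))
    print(phred_to_accuracy(50))
```

On this scale, Q50 corresponds to an expected one error per 100,000 bases.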