Accurate haplotyping helps distinguish allele-specific expression, identify cis-regulatory elements, and characterize genomic variation, enabling more precise investigation of the relationship between genotype and phenotype. Recent advances in third-generation single-molecule long-read sequencing and synthetic co-barcoded read sequencing harness long-range information to simplify the assembly graph and improve the assembled genomic sequence. However, reconstructing complete haplotypes remains methodologically challenging because of the high sequencing error rates of long reads and the limited capture efficiency of co-barcoded reads. Here we present AsmMix, a pipeline for generating diploid genome assemblies that are both contiguous and accurate. It first assembles co-barcoded reads into accurate haplotype-resolved assemblies that may contain many gaps, whereas the long-read assembly is contiguous but susceptible to errors. The two assembly sets are then integrated into haplotype-resolved assemblies with fewer misassemblies. In extensive evaluations on multiple synthetic datasets, AsmMix consistently achieves high haplotyping precision and recall across diverse sequencing platforms, coverage depths, read lengths, and read accuracies, significantly outperforming other existing tools in the field. Furthermore, we validate the pipeline on a human whole-genome dataset (HG002) and produce highly contiguous, accurate, haplotype-resolved assemblies. These assemblies are evaluated against the GIAB benchmarks, confirming the accuracy of variant calling. Our results demonstrate that AsmMix offers a straightforward yet highly efficient approach that effectively leverages both long reads and co-barcoded reads for haplotype-resolved assembly.
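The abstract reports haplotyping precision and recall without stating how they are computed. As a minimal illustration only, the sketch below assumes one plausible definition: a heterozygous site counts as correct when its phased allele pair matches the truth set, precision is the fraction of called sites that are correct, and recall is the fraction of truth sites recovered. The function name `phasing_precision_recall`, the dictionary layout, and the treatment of all sites as a single phase block are hypothetical choices made for this example, not the evaluation procedure used for AsmMix.

```python
def phasing_precision_recall(truth, called):
    """Toy precision/recall for phased heterozygous sites.

    truth, called: dict mapping position -> (hap1_allele, hap2_allele).
    Assumption: all sites belong to one phase block, so the two haplotype
    labels are interchangeable and we score the better of the two
    orientations (as given, or with hap1/hap2 swapped everywhere).
    """
    def count_matches(flip):
        matches = 0
        for pos, (a, b) in called.items():
            expected = truth.get(pos)
            if expected is None:
                continue  # called site absent from the truth set
            observed = (b, a) if flip else (a, b)
            if observed == expected:
                matches += 1
        return matches

    true_positives = max(count_matches(False), count_matches(True))
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: four truth sites, three called sites.
truth_sites = {100: ("A", "G"), 200: ("C", "T"), 300: ("G", "A"), 400: ("T", "C")}
called_sites = {100: ("G", "A"), 200: ("T", "C"), 300: ("G", "A")}

precision, recall = phasing_precision_recall(truth_sites, called_sites)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy input the swapped orientation matches two of the three called sites, so precision is 2/3 and recall is 2/4. Real evaluations typically score each phase block separately and also account for switch errors, which this sketch deliberately leaves out.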