The long reads from the monozygotic twin daughters are phased into paternal and maternal haplotypes using the parent-child genetic map. We also use the long reads to generate haplotype-resolved whole-genome assemblies whose completeness and continuity exceed those of GRCh38. Using this Quartet, we comprehensively catalogue the human variant landscape, generating a dataset of 3,962,453 SNVs, 886,648 indels (< 50 bp), 9,726 large deletions (≥ 50 bp), 15,600 large insertions (≥ 50 bp), 40 inversions, 31 complex structural variants, and 68 de novo mutations that are shared between the monozygotic twin daughters. This study systematically examines variants that previous benchmarks underrepresented because of their complexity. These include variants in long repeat regions, complex structural variants, and de novo mutations.
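The size categories and parent-child phasing mentioned above can be illustrated with a minimal Python sketch. It assumes simple biallelic REF/ALT strings and genotypes encoded as allele-index tuples such as `(0, 1)`. The function names `classify_variant` and `phase_child_het` are hypothetical. The Mendelian transmission rule shown is a generic textbook approach, not necessarily the pipeline used in this study, which phases long reads rather than individual genotype calls. Only the 50 bp threshold is taken from the text.

```python
from collections import Counter

# Threshold (bp) separating small indels from large insertions/deletions,
# as used in the variant counts of this study.
SV_THRESHOLD = 50


def classify_variant(ref: str, alt: str) -> str:
    """Assign a biallelic REF/ALT pair to a size class (illustrative sketch)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV" if ref != alt else "no_change"
    delta = len(alt) - len(ref)  # net inserted (+) or deleted (-) bases
    if delta == 0:
        return "other"  # e.g. multi-nucleotide substitutions, not counted above
    if abs(delta) < SV_THRESHOLD:
        return "indel"
    return "large_insertion" if delta > 0 else "large_deletion"


def phase_child_het(father: tuple, mother: tuple):
    """Report which parental haplotype carries the ALT allele at a site where
    the child is heterozygous, using Mendelian transmission (hypothetical helper).

    Returns "paternal", "maternal", or None when the parents cannot resolve it.
    """
    # ALT came from the father only if he has ALT and the mother can pass REF.
    paternal_alt = 1 in father and 0 in mother
    # ALT came from the mother only if she has ALT and the father can pass REF.
    maternal_alt = 1 in mother and 0 in father
    if paternal_alt and not maternal_alt:
        return "paternal"
    if maternal_alt and not paternal_alt:
        return "maternal"
    # Both parents heterozygous (ambiguous), or neither parent carries ALT
    # (a candidate de novo mutation or a genotyping error).
    return None


if __name__ == "__main__":
    calls = [("A", "G"), ("A", "ATTT"), ("A" + "C" * 60, "A"), ("G", "G" + "T" * 75)]
    print(Counter(classify_variant(r, a) for r, a in calls))
    print(phase_child_het((0, 0), (1, 1)))  # mother is homozygous ALT
```

A child het site whose parents are both homozygous REF cannot be phased by transmission. That is why such sites are flagged rather than assigned: they are where candidate de novo mutations would surface.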
In summary, this study provides high-quality haplotype-resolved assemblies and a comprehensive set of benchmarking resources for two Chinese monozygotic twin samples. Relative to existing benchmarks, these resources offer expanded genomic coverage and insight into complex variant categories.