MeSH: Humans; Benchmarking; East Asian People / genetics; Genomics; Haplotypes; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Twins, Monozygotic / genetics; Twin Studies as Topic

Source: DOI:10.1186/s13059-023-03116-3

Abstract:
Background: Recent state-of-the-art sequencing technologies enable the investigation of challenging regions in the human genome and expand the scope of variant benchmarking datasets. Herein, we sequence a Chinese Quartet, comprising two monozygotic twin daughters and their biological parents, using four short- and long-read sequencing platforms (Illumina, BGI, PacBio, and Oxford Nanopore Technologies).
Results: The long reads from the monozygotic twin daughters are phased into paternal and maternal haplotypes using the parent-child genetic map. For each haplotype, we also use long reads to generate haplotype-resolved whole-genome assemblies with completeness and continuity exceeding those of GRCh38. Using this Quartet, we comprehensively catalogue the human variant landscape, generating a dataset of 3,962,453 SNVs, 886,648 indels (< 50 bp), 9,726 large deletions (≥ 50 bp), 15,600 large insertions (≥ 50 bp), 40 inversions, 31 complex structural variants, and 68 de novo mutations that are shared between the monozygotic twin daughters. Variants underrepresented in previous benchmarks owing to their complexity, including those located in long repeat regions, complex structural variants, and de novo mutations, are systematically examined in this study.
Conclusions: In summary, this study provides high-quality haplotype-resolved assemblies and a comprehensive set of benchmarking resources for two Chinese monozygotic twin samples which, relative to existing benchmarks, offers expanded genomic coverage and insight into complex variant categories.
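The size categories used in the abstract (SNVs, indels under 50 bp, and large insertions or deletions of 50 bp or more) can be illustrated with a minimal sketch. This is not the study's pipeline: the function name `classify_variant`, the constant `SV_MIN_LEN`, and the "MNV" label for same-length substitutions are assumptions made for illustration, and inversions and complex structural variants are out of scope because they cannot be identified from allele lengths alone.

```python
SV_MIN_LEN = 50  # bp; the indel / large-variant boundary stated in the abstract


def classify_variant(ref: str, alt: str) -> str:
    """Classify a simple biallelic variant from its REF and ALT allele strings.

    Hypothetical helper for illustration only; it handles plain sequence
    alleles (as in a VCF REF/ALT pair), not symbolic alleles such as <INV>.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    size = abs(len(alt) - len(ref))
    if size == 0:
        # Same-length multi-base substitution; not a category in the abstract.
        return "MNV"
    if size < SV_MIN_LEN:
        return "indel"
    # At or above the threshold: report the direction of the length change.
    return "large insertion" if len(alt) > len(ref) else "large deletion"
```

For example, `classify_variant("A", "A" + "T" * 49)` falls just under the threshold and is labelled an indel, while adding one more inserted base makes it a large insertion.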