Keywords: DNA mixture; bioinformatics tool; long-read sequencing; multiple haplotypes; variant calling

MeSH: Humans; Polyploidy; Sequence Analysis, DNA / methods; High-Throughput Nucleotide Sequencing / methods; Software; Haplotypes; Polymorphism, Single Nucleotide; Algorithms; Computational Biology / methods; DNA / genetics analysis; Microsatellite Repeats / genetics; Forensic Genetics / methods; Genotyping Techniques / methods

Source: DOI:10.1002/elps.202300143

Abstract:
Macrohaplotypes combine multiple types of phased DNA variants, increasing forensic discrimination power. High-quality long sequencing reads, for example, PacBio HiFi reads, provide data to detect macrohaplotypes in multiploid samples and DNA mixtures. However, bioinformatics tools for detecting macrohaplotypes are lacking. In this study, we developed a bioinformatics tool, MacroHapCaller, in which targeted loci (i.e., short tandem repeats [STRs], single nucleotide polymorphisms, and insertions and deletions) are genotyped and combined with novel algorithms to call macrohaplotypes from long reads. MacroHapCaller uses physical phasing (i.e., read-backed phasing) to identify macrohaplotypes, and thus it can detect multi-allelic macrohaplotypes for a given sample. MacroHapCaller was validated with data generated from our targeted PacBio HiFi sequencing pipeline, which sequenced ∼8-kb amplicon regions harboring 20 core forensic STR loci in human benchmark samples HG002 and HG003. MacroHapCaller was also validated on whole-genome long-read sequencing data. Compared with the known ground truth, MacroHapCaller produced robust and accurate genotypes and phased macrohaplotypes. MacroHapCaller achieved higher or comparable genotyping accuracy and faster speed than the existing tools HipSTR and DeepVar. MacroHapCaller enables efficient macrohaplotype analysis from high-throughput sequencing data and supports applications using discriminating macrohaplotypes.
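The idea of read-backed phasing described in the abstract can be illustrated with a minimal Python sketch. This is not MacroHapCaller's algorithm or interface; the function `call_macrohaplotypes`, the `min_fraction` threshold, and the locus names are hypothetical and chosen only for illustration. The sketch assumes each read has already been genotyped at every locus it covers, and it treats the alleles seen together on one read as a single phased observation.

```python
from collections import Counter


def call_macrohaplotypes(read_alleles, loci, min_fraction=0.1):
    """Group per-read alleles into macrohaplotypes (illustrative sketch only).

    read_alleles: list of dicts mapping locus name -> allele seen on one read.
    loci: ordered list of locus names that define the macrohaplotype.
    min_fraction: hypothetical support threshold used to drop rare,
        error-like allele combinations.
    Returns a list of (haplotype_tuple, read_count), most supported first.
    """
    counts = Counter()
    for alleles in read_alleles:
        # Only a read spanning every locus gives a fully phased observation.
        if all(locus in alleles for locus in loci):
            counts[tuple(alleles[locus] for locus in loci)] += 1

    total = sum(counts.values())
    if total == 0:
        return []

    kept = [(hap, n) for hap, n in counts.items() if n / total >= min_fraction]
    # Sort by descending support, then by haplotype for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical example: one STR, one SNP, and one indel on the same amplicon.
loci = ["STR1", "SNP1", "INDEL1"]
reads = (
    [{"STR1": "12", "SNP1": "A", "INDEL1": "ins"}] * 4
    + [{"STR1": "14", "SNP1": "G", "INDEL1": "ref"}] * 4
    + [{"STR1": "12", "SNP1": "G", "INDEL1": "ref"}]  # rare, error-like read
    + [{"STR1": "12"}]  # partial read, not usable for phasing here
)
haplotypes = call_macrohaplotypes(reads, loci, min_fraction=0.2)
```

Because nothing in this approach limits the number of haplotypes to two, it returns however many well-supported allele combinations the reads contain, which is why read-backed phasing suits mixtures and multi-allelic samples.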