Keywords: De novo mutations; HiFi reads; Long-read sequencing

MeSH: Humans; High-Throughput Nucleotide Sequencing; Alleles; INDEL Mutation; Microsatellite Repeats

Source: DOI:10.1186/s13073-023-01183-6

Abstract:
Background: Long-read sequencing (LRS) techniques have been very successful in identifying structural variants (SVs). However, the high error rate of LRS has made the detection of small variants (substitutions and short indels < 20 bp) more challenging. The introduction of PacBio HiFi sequencing makes LRS suitable for detecting small variants as well. Here we evaluate the ability of HiFi reads to detect de novo mutations (DNMs) of all types, which are technically challenging to call and a major cause of sporadic, severe, early-onset disease.
Methods: We sequenced the genomes of eight parent-child trios using high-coverage PacBio HiFi LRS (~30-fold coverage) and Illumina short-read sequencing (SRS) (~50-fold coverage). De novo substitutions, small indels, short tandem repeats (STRs) and SVs were called in both datasets and compared to assess the accuracy of HiFi LRS. In addition, we determined the parent of origin of the small DNMs using phasing.
Results: We identified a total of 672 and 859 de novo substitutions/indels, 28 and 126 de novo STRs, and 24 and 1 de novo SVs in LRS and SRS, respectively. For the small variants, concordance between the platforms was 92% and 85% for LRS and SRS calls, respectively. For the STRs, the concordance was 3.6% and 0.8%, and for the SVs it was 4% and 100%, respectively. Validation was successful for 27 of 54 LRS-unique small variants, of which 11 (41%) were confirmed as true de novo events. For the SRS-unique small variants, validation was successful for 42 of 133 DNMs, of which 8 (19%) were confirmed as true de novo events. Validation of 18 LRS-unique de novo STR calls confirmed none of the repeat expansions as true DNMs. Of the 23 LRS-unique candidate SVs, 19 could be tested, and 10 of these (52.6%) were true de novo events. Furthermore, we were able to assign 96% of DNMs to their parental allele with LRS data, as opposed to just 20% with SRS data.
Conclusions: HiFi LRS can now produce the most comprehensive variant dataset obtainable by a single technology in a single laboratory, allowing accurate calling of substitutions, indels, STRs and SVs. Its accuracy even allows sensitive calling of DNMs at all variant levels and also enables phasing, which helps to distinguish true-positive from false-positive DNMs.
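The confirmation and concordance percentages quoted in the abstract can be rechecked from the reported counts. The following is a minimal sketch, not the authors' analysis code: the dictionary names and the `percentage` helper are illustrative, and the shared-call count of 1 for STRs and SVs is an assumption inferred from the reported percentages rather than a number stated in the abstract.

```python
# Counts copied from the abstract: (confirmed de novo, successfully validated).
validation = {
    "LRS-unique small variants": (11, 27),
    "SRS-unique small variants": (8, 42),
    "LRS-unique STRs": (0, 18),
    "LRS-unique SVs": (10, 19),
}

# (shared calls, total calls on that platform).
# ASSUMPTION: one shared call per variant class, inferred from the percentages.
concordance = {
    "STRs, LRS": (1, 28),
    "STRs, SRS": (1, 126),
    "SVs, LRS": (1, 24),
    "SVs, SRS": (1, 1),
}


def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage; reject an empty denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


confirmation_rates = {k: percentage(*v) for k, v in validation.items()}
concordance_rates = {k: percentage(*v) for k, v in concordance.items()}

for label, rate in {**confirmation_rates, **concordance_rates}.items():
    print(f"{label}: {rate:.1f}%")
```

The confirmation rates use the number of successfully validated candidates as the denominator, not the number of platform-unique calls, which is how the abstract reports them (for example, 11 of 27 rather than 11 of 54).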