Keywords: Deep learning; Multi-platform sequencing data; Variant calling

MeSH: Reproducibility of Results; Genomics; Genome; High-Throughput Nucleotide Sequencing

Source: DOI:10.1186/s12859-023-05434-6

Abstract:
BACKGROUND: With the continuous advances in third-generation sequencing technology and the increasing affordability of next-generation sequencing technology, sequencing data from different sequencing technology platforms is becoming more common. While numerous benchmarking studies have been conducted to compare variant-calling performance across different platforms and approaches, little attention has been paid to the potential of leveraging the strengths of different platforms to optimize overall performance, especially integrating Oxford Nanopore and Illumina sequencing data.
RESULTS: We investigated the impact of multi-platform data on the performance of variant calling through carefully designed experiments with a deep learning-based variant caller named Clair3-MP (Multi-Platform). Through our research, we not only demonstrated the capability of ONT-Illumina data for improved variant calling, but also identified the optimal scenarios for utilizing ONT-Illumina data. In addition, we revealed that the improvement in variant calling using ONT-Illumina data comes from an improvement in difficult genomic regions, such as the large low-complexity regions and segmental and collapse duplication regions. Moreover, Clair3-MP can incorporate reference genome stratification information to achieve a small but measurable improvement in variant calling. Clair3-MP is accessible as an open-source project at: https://github.com/HKU-BAL/Clair3-MP .
CONCLUSIONS: These insights have important implications for researchers and practitioners alike, providing valuable guidance for improving the reliability and efficiency of genomic analysis in diverse applications.