Keywords: Artificial intelligence; Genomics; NGS; Sequencing; Variant calling

MeSH: Software; Artificial Intelligence; Sequence Analysis, DNA / methods; High-Throughput Nucleotide Sequencing / methods; Genomics / methods

Source: DOI:10.1186/s12859-023-05596-3

Abstract:
BACKGROUND: The accurate detection of variants is essential for genomics-based studies. Various tools have been designed to detect genomic variants; however, deciding which tool to use has always been a challenge, especially because major genome projects have chosen different tools. Thus far, most existing tools were developed mainly for short-read data (e.g., Illumina); however, other sequencing technologies (e.g., PacBio and Oxford Nanopore) have recently been shown to be usable for variant calling as well. In addition, with the emergence of artificial intelligence (AI)-based variant calling tools, there is a pressing need to compare these tools in terms of efficiency, accuracy, computational requirements, and ease of use.
RESULTS: In this study, we evaluated five of the most widely used conventional and AI-based variant calling tools (BCFTools, GATK4, Platypus, DNAscope, and DeepVariant) in terms of accuracy and computational cost, using short-read and long-read data from three sequencing technologies (Illumina, PacBio HiFi, and ONT) for the same set of samples from the Genome in a Bottle project. The analysis showed that AI-based variant calling tools outperform conventional ones in most aspects when calling SNVs and INDELs from both long and short reads. In addition, we demonstrate the advantages and drawbacks of each tool and rank the tools in each aspect of these comparisons.
CONCLUSIONS: This study provides best practices for variant calling using AI-based and conventional variant callers with different types of sequencing data.