Keywords: computational pipeline; short-read paired-end sequencing; single nucleotide polymorphisms; structural variations; variant calling

MeSH: Polymorphism, Single Nucleotide / genetics; Workflow; Software; Computational Biology / methods; Genomics / methods; Molecular Sequence Annotation / methods; Whole Genome Sequencing / methods

Source: DOI:10.1002/cpz1.1046

Abstract:
Whole-genome sequencing is widely used to investigate population genomic variation in organisms of interest. Assorted tools have been independently developed to call variants, including single nucleotide polymorphisms (SNPs) and structural variations (SVs), from short-read sequencing data aligned to a reference genome. We developed SNP-SVant, an integrated, flexible, and computationally efficient bioinformatic workflow that predicts high-confidence SNPs and SVs in organisms that lack benchmarked variants, which are traditionally used to distinguish sequencing errors from real variants. In the absence of these benchmarked datasets, we leverage multiple rounds of statistical recalibration to increase the precision of variant prediction. The SNP-SVant workflow is flexible, with user options to trade off accuracy against sensitivity. The workflow predicts SNPs and small insertions and deletions using the Genome Analysis ToolKit (GATK), predicts SVs using the Genome Rearrangement IDentification Software Suite (GRIDSS), and culminates in variant annotation using custom scripts. A key utility of SNP-SVant is its scalability. Because variant calling is computationally expensive, SNP-SVant uses a workflow management system with intermediary checkpoint steps to ensure efficient use of resources by minimizing redundant computations and omitting steps whose dependent files are already available. SNP-SVant also provides metrics to assess the quality of called variants and converts between VCF and aligned FASTA format outputs to ensure compatibility with the downstream tools used to calculate selection statistics, which are commonplace in population genomics studies. By accounting for both small and large structural variants, users of this workflow can obtain a wide-ranging view of genomic alterations in an organism of interest.
Overall, this workflow advances our capabilities in assessing the functional consequences of different types of genomic alterations, ultimately improving our ability to associate genotypes with phenotypes. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol: Predicting single nucleotide polymorphisms and structural variations
Support Protocol 1: Downloading publicly available sequencing data
Support Protocol 2: Visualizing variant loci using Integrated Genome Viewer
Support Protocol 3: Converting between VCF and aligned FASTA formats
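The checkpointing idea in the abstract (omitting steps whose dependent files are already available) can be illustrated with a minimal sketch. This is not SNP-SVant's actual implementation, and the abstract does not name the workflow management system it uses. The function names `step_is_up_to_date` and `run_step` are hypothetical, and the timestamp-based freshness check is an assumption modeled on common make-style tools.

```python
from pathlib import Path


def step_is_up_to_date(inputs, outputs):
    """Return True when every output exists and is no older than every input.

    Assumption: freshness is judged by file modification time (make-style).
    A missing input raises FileNotFoundError, since the step cannot run anyway.
    """
    inputs = [Path(p) for p in inputs]
    outputs = [Path(p) for p in outputs]
    if not outputs or not all(p.exists() for p in outputs):
        return False
    if not inputs:
        return True
    newest_input = max(p.stat().st_mtime for p in inputs)
    oldest_output = min(p.stat().st_mtime for p in outputs)
    return oldest_output >= newest_input


def run_step(name, inputs, outputs, action):
    """Run action() only if its outputs are missing or stale.

    Returns a short status string so a caller can log what happened.
    """
    if step_is_up_to_date(inputs, outputs):
        return f"skipped {name}"
    action()
    return f"ran {name}"
```

Under these assumptions, a call such as `run_step("align", ["reads.fastq"], ["aligned.bam"], do_alignment)` (file names and `do_alignment` are placeholders) would rerun an expensive step only when its output is absent or predates its input, which is the kind of redundant computation the checkpoints are meant to avoid.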