Keywords: Candida; benchmarking; fungal genomics; variant calling pipelines; whole-genome sequencing

MeSH: Candida auris / genetics; Genome, Fungal; Phylogeny; Polymorphism, Single Nucleotide; Humans; Candidiasis / drug therapy / epidemiology; Disease Outbreaks; Drug Resistance, Fungal

Source: DOI:10.1099/mgen.0.000979

Abstract:
Genomic analyses are widely applied to epidemiological, population genetic and experimental studies of pathogenic fungi. A wide range of methods are employed to carry out these analyses, typically without including controls that gauge the accuracy of variant prediction. The importance of tracking outbreaks at a global scale has raised the urgency of establishing high-accuracy pipelines that generate consistent results between research groups. To evaluate currently employed methods for whole-genome variant detection and elaborate best practices for fungal pathogens, we compared how 14 independent variant calling pipelines performed across 35 Candida auris isolates from 4 distinct clades and evaluated the performance of variant calling, single-nucleotide polymorphism (SNP) counts and phylogenetic inference results. Although these pipelines used different variant callers and filtering criteria, we found high overall agreement of SNPs from each pipeline. This concordance correlated with site quality, as SNPs discovered by only a few pipelines tended to show lower mapping quality scores and depth of coverage than those recovered by all pipelines. We observed that the major differences between pipelines were due to variation in read trimming strategies, SNP calling methods and parameters, and downstream filtration criteria. We calculated specificity and sensitivity for each pipeline by aligning three isolates with chromosome-level assemblies and found that the GATK-based pipelines were well balanced between these metrics. Selection of trimming methods had a greater impact on SAMtools-based pipelines than on those using GATK. Phylogenetic trees inferred by each pipeline showed high consistency at the clade level, but there was more variability between isolates from a single outbreak, with pipelines that used more stringent cutoffs having lower resolution. This project generated two truth datasets useful for routine benchmarking of C. auris variant calling: a consensus VCF of genotypes discovered by 10 or more pipelines across these 35 diverse isolates, and variants for 2 samples identified from whole-genome alignments. This study provides a foundation for evaluating SNP calling pipelines and developing best practices for future fungal genomic studies.
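The two benchmarking ideas in the abstract, a consensus call set built from sites reported by a minimum number of pipelines and per-pipeline sensitivity and specificity against a truth set, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the pipeline names, the toy sites, and the representation of a variant as a `(contig, position)` tuple are hypothetical, and the abstract does not state how the authors defined the set of true-negative sites, so the sketch simply takes an explicit universe of evaluated positions.

```python
from collections import Counter


def consensus_sites(calls_by_pipeline, min_pipelines):
    """Return the sites reported by at least `min_pipelines` pipelines.

    calls_by_pipeline maps a pipeline name to a set of (contig, position) sites.
    """
    counts = Counter(
        site for calls in calls_by_pipeline.values() for site in set(calls)
    )
    return {site for site, n in counts.items() if n >= min_pipelines}


def sensitivity_specificity(called, truth, evaluated_sites):
    """Compute (sensitivity, specificity) of one call set against a truth set.

    evaluated_sites is the assumed universe of positions that could have been
    called; it is needed to count true negatives.
    """
    tp = len(called & truth)                     # true variants that were called
    fn = len(truth - called)                     # true variants that were missed
    fp = len(called - truth)                     # calls that are not true variants
    tn = len(evaluated_sites - called - truth)   # non-variant sites left uncalled
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data (hypothetical pipelines and sites, not taken from the study).
calls = {
    "pipeline_a": {("chr1", 100), ("chr1", 200), ("chr2", 50)},
    "pipeline_b": {("chr1", 100), ("chr1", 200)},
    "pipeline_c": {("chr1", 100), ("chr2", 75)},
}
evaluated = {
    ("chr1", 100), ("chr1", 200), ("chr1", 300), ("chr1", 400),
    ("chr2", 50), ("chr2", 75), ("chr2", 90),
}

consensus = consensus_sites(calls, min_pipelines=2)
sens_a, spec_a = sensitivity_specificity(calls["pipeline_a"], consensus, evaluated)
print(sorted(consensus), sens_a, spec_a)
```

In this toy setup the consensus plays the role of the truth set, mirroring the study's consensus VCF of genotypes found by 10 or more of 14 pipelines; with real data the sites would be parsed from VCF files and the threshold set accordingly.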