Keywords: SNPs; malaria; pathogen genomics; variant calling; whole-genome sequencing

MeSH: COVID-19 / epidemiology, genetics; Communicable Diseases; Genomics / methods; Humans; Malaria; Pandemics; Reproducibility of Results

Source: DOI:10.1093/bib/bbac314

Abstract:
As recently demonstrated by the COVID-19 pandemic, large-scale pathogen genomic data are crucial to characterize transmission patterns of human infectious diseases. Yet, current methods to process raw sequence data into analysis-ready variants remain slow to scale, hampering rapid surveillance efforts and epidemiological investigations for disease control. Here, we introduce an accelerated, scalable, reproducible, and cost-effective framework for pathogen genomic variant identification and present an evaluation of its performance and accuracy across benchmark datasets of Plasmodium falciparum malaria genomes. We demonstrate superior performance of the GPU framework relative to standard pipelines with mean execution time and computational costs reduced by 27× and 4.6×, respectively, while delivering 99.9% accuracy at enhanced reproducibility.