Variant calling

  • Article type: Journal Article
    As recently demonstrated by the COVID-19 pandemic, large-scale pathogen genomic data are crucial to characterize transmission patterns of human infectious diseases. Yet, current methods to process raw sequence data into analysis-ready variants remain slow to scale, hampering rapid surveillance efforts and epidemiological investigations for disease control. Here, we introduce an accelerated, scalable, reproducible, and cost-effective framework for pathogen genomic variant identification and present an evaluation of its performance and accuracy across benchmark datasets of Plasmodium falciparum malaria genomes. We demonstrate superior performance of the GPU framework relative to standard pipelines with mean execution time and computational costs reduced by 27× and 4.6×, respectively, while delivering 99.9% accuracy at enhanced reproducibility.

  • Article type: Journal Article
    Many traits of cancer progression (e.g., development of metastases or resistance to therapy) are facilitated by tumour evolution: Darwinian selection of subclones with distinct genotypes or phenotypes that enable such progression. Characterising these subclones provides an opportunity to develop drugs that better target their specific properties, but it requires the accurate identification of somatic mutations shared across multiple spatiotemporally distinct tumours from the same patient. Current best practices for calling somatic mutations are optimised for single samples and risk being too conservative to identify shared mutations with low prevalence in some samples. We reasoned that datasets from multiple matched tumours can be used for mutual validation and thus propose an adapted two-stage approach: (1) low-stringency mutation calling to identify mutations shared across samples irrespective of the weight of evidence in a single sample; (2) high-stringency mutation calling to further characterise mutations present in a single sample. We applied our approach to three independent cohorts of paired primary and recurrent glioblastoma tumours, two of which have previously been analysed using existing approaches, and found that it significantly increased the number of biologically relevant shared somatic mutations identified. We also found that duplicate removal was detrimental when identifying shared somatic mutations. Our approach is also applicable when multiple datasets (e.g., DNA and RNA) are available for the same tumour.
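
The two-stage idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the evidence score, both thresholds, the function name `two_stage_calls`, and the mutation and sample labels are all hypothetical and chosen only for illustration. Stage 1 keeps a mutation as shared when it passes a low threshold in at least two matched samples; stage 2 keeps a mutation seen in only one sample when it also passes a high threshold.

```python
from collections import defaultdict

# Hypothetical thresholds on a generic per-call evidence score
# (assumptions for illustration; the abstract gives no numeric values).
LOW_STRINGENCY = 0.2
HIGH_STRINGENCY = 0.8


def two_stage_calls(calls_by_sample, low=LOW_STRINGENCY, high=HIGH_STRINGENCY):
    """Split candidate calls into shared and single-sample mutations.

    calls_by_sample maps a sample name to {mutation id: evidence score}.
    Returns (shared, private):
      shared  maps mutation -> sorted list of samples passing `low` (two or more)
      private maps mutation -> the one sample in which it passes `high`
    """
    # Stage 1: collect every sample that supports a mutation at low stringency.
    support = defaultdict(list)
    for sample, calls in calls_by_sample.items():
        for mutation, score in calls.items():
            if score >= low:
                support[mutation].append(sample)

    # Mutations supported in two or more samples validate each other.
    shared = {m: sorted(s) for m, s in support.items() if len(s) >= 2}

    # Stage 2: a mutation supported in one sample only must pass high stringency.
    private = {}
    for mutation, samples in support.items():
        if len(samples) == 1:
            sample = samples[0]
            if calls_by_sample[sample][mutation] >= high:
                private[mutation] = sample
    return shared, private


# Made-up example data: two matched tumours from one patient.
example = {
    "primary": {"mut_A": 0.95, "mut_B": 0.30, "mut_C": 0.50},
    "recurrent": {"mut_B": 0.90, "mut_D": 0.85},
}
shared, private = two_stage_calls(example)
```

In this toy input, `mut_B` has weak evidence in the primary sample but is retained as shared because the recurrent sample corroborates it, while `mut_C` is dropped because it appears in one sample only and falls below the high threshold.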