Kataegis refers to the occurrence of regional genomic hypermutation in cancer and is a phenomenon that has been observed in a wide range of malignancies. A kataegis locus constitutes a genomic region with a high mutation rate (i.e., a higher frequency of closely interspersed somatic variants than the overall mutational background). It has been shown that kataegis is of biological significance and possibly clinically relevant. Therefore, an accurate and robust workflow for kataegis detection is paramount.
Here we present Katdetectr, an open-source R/Bioconductor-based package for the robust yet flexible and fast detection of kataegis loci in genomic data. In addition, Katdetectr houses functionalities to characterize and visualize kataegis and provides results in a standardized format useful for subsequent analysis. In brief, Katdetectr imports industry-standard formats (MAF, VCF, and VRanges), determines the intermutation distance of the genomic variants, and performs unsupervised changepoint analysis using the Pruned Exact Linear Time search algorithm, followed by kataegis calling according to user-defined parameters. We used synthetic data and an a priori labeled pan-cancer dataset of whole-genome sequenced malignancies to evaluate the performance of Katdetectr and 5 publicly available kataegis detection packages. Our performance evaluation shows that Katdetectr is robust regarding tumor mutational burden and has the fastest mean computation time. Additionally, Katdetectr achieves the highest accuracy (0.99, 0.99) and normalized Matthews correlation coefficient (0.98, 0.92) of all evaluated tools on both datasets.
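The workflow described above (intermutation distances, changepoint segmentation, then threshold-based calling) can be illustrated with a small conceptual sketch. The code below is not Katdetectr's API and is written in Python rather than R. The function names, the exponential cost function, the penalty of 2·log(n), and the thresholds (mean intermutation distance of at most 1000 bp, at least 6 variants) are all assumptions chosen for illustration. It handles a single chromosome and does not merge adjacent segments.

```python
import math


def intermutation_distances(positions):
    """Sort variant positions and return them with consecutive distances (bp)."""
    pos = sorted(positions)
    return pos, [b - a for a, b in zip(pos, pos[1:])]


def pelt_exponential(imd, penalty):
    """Penalized segmentation of `imd` with PELT-style pruning.

    Assumes an exponential model per segment (an illustrative choice).
    Returns the sorted changepoint indices, excluding 0 and len(imd).
    """
    n = len(imd)
    csum = [0.0]
    for x in imd:
        csum.append(csum[-1] + x)

    def cost(s, t):
        # Twice the negative maximized exponential log-likelihood of imd[s:t].
        m = t - s
        total = max(csum[t] - csum[s], 1e-12)  # guard against log(0)
        return 2.0 * m * (math.log(total / m) + 1.0)

    best = [0.0] * (n + 1)  # best[t]: optimal penalized cost of imd[:t]
    best[0] = -penalty
    last = [0] * (n + 1)    # last[t]: start index of the final segment
    candidates = [0]
    for t in range(1, n + 1):
        scores = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last[t] = min(scores)
        # Pruning step: keep only starts that can still become optimal later.
        candidates = [s for sc, s in scores if sc - penalty <= best[t]]
        candidates.append(t)

    # Backtrack from the end to recover the changepoints.
    changepoints = []
    t = n
    while last[t] > 0:
        changepoints.append(last[t])
        t = last[t]
    return sorted(changepoints)


def call_kataegis(positions, imd_cutoff=1000, min_variants=6, penalty=None):
    """Return (start, end, n_variants) for segments passing both thresholds."""
    pos, imd = intermutation_distances(positions)
    if not imd:
        return []
    if penalty is None:
        penalty = 2.0 * math.log(len(imd))  # assumed BIC-like penalty
    bounds = [0] + pelt_exponential(imd, penalty) + [len(imd)]
    foci = []
    for s, t in zip(bounds, bounds[1:]):
        mean_imd = sum(imd[s:t]) / (t - s)
        n_variants = t - s + 1  # a run of k distances spans k + 1 variants
        if mean_imd <= imd_cutoff and n_variants >= min_variants:
            foci.append((pos[s], pos[t], n_variants))
    return foci


# Synthetic example: sparse background with one dense cluster in the middle.
background_left = [100_000 * i for i in range(1, 21)]
cluster = [2_000_000 + 100 * j for j in range(1, 11)]
background_right = [2_001_000 + 100_000 * k for k in range(1, 21)]
print(call_kataegis(background_left + cluster + background_right))
```

In this toy example the dense cluster is separated from the sparse background by two changepoints, and only the middle segment satisfies both thresholds. Real data would need per-chromosome handling and the parameter choices documented by the package itself.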
Katdetectr is a robust workflow for the detection, characterization, and visualization of kataegis and is available on Bioconductor: https://doi.org/doi:10.18129/B9.bioc.katdetectr.