Keywords: counting-based biomarker; immunotherapy; meta-learning approach; sequencing data analysis; tumor mutation burden; variant calling

MeSH: Humans; Carcinoma, Non-Small-Cell Lung; High-Throughput Nucleotide Sequencing / methods; Lung Neoplasms; Genomics / methods; Genome; Algorithms

Source: DOI: 10.1093/bib/bbae159

Abstract:
In cancer genomics, variant calling has advanced considerably, but the traditional practice of evaluating callers by their mean accuracy is inadequate for counting-based biomarkers such as tumor mutation burden (TMB), because the accuracy of TMB estimates can vary significantly across samples, which affects immunotherapy patient selection and the setting of TMB thresholds. In this study, we introduce TMBstable, a method that uses a meta-learning framework to dynamically select the optimal variant calling strategy for specific genomic regions, unlike traditional callers, which apply a single strategy uniformly across the whole sample. The process first segments the sample into windows and extracts meta-features that are used to cluster them; a pre-trained meta-model then selects a suitable algorithm for each cluster. This addresses mismatches between strategy and sample, reduces performance fluctuations and ensures consistent performance across samples. We evaluated TMBstable on simulated samples and on real non-small cell lung cancer and nasopharyngeal carcinoma samples, comparing it with state-of-the-art callers. The assessment involved 300 simulated and 106 real tumor samples and focused on stability measures such as the variance and coefficient of variation of the false positive rate, false negative rate, precision and recall. Benchmark results showed that TMBstable was the most stable method, with the lowest variance and coefficient of variation across these performance metrics, highlighting its effectiveness for analyzing this counting-based biomarker. TMBstable is available at https://github.com/hello-json/TMBstable for academic use only.
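The abstract describes the method only at a high level (windows, then meta-features, then clustering, then a meta-model choosing a caller per cluster) and measures stability as the variance and coefficient of variation (CV = standard deviation / mean) of FPR, FNR, precision and recall across samples. The Python sketch below illustrates both ideas under explicit assumptions: the meta-features (mean and standard deviation of per-base read depth in each window), plain k-means clustering, the 1-nearest-neighbour stand-in for the "pre-trained meta-model", the caller names and the choice of population variance are all hypothetical and are not taken from TMBstable; the repository linked above holds the actual implementation.

```python
import math
import statistics

# ---- 1. Stability measures named in the abstract ----

def rates(tp, fp, fn, tn):
    """FPR, FNR, precision and recall for one sample's variant calls."""
    return {
        "FPR": fp / (fp + tn) if fp + tn else 0.0,
        "FNR": fn / (fn + tp) if fn + tp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def stability(per_sample):
    """(variance, coefficient of variation) of each metric across samples.
    Assumption: population variance; the abstract does not name the estimator."""
    result = {}
    for name in per_sample[0]:
        values = [m[name] for m in per_sample]
        mean = statistics.fmean(values)
        var = statistics.pvariance(values)
        result[name] = (var, math.sqrt(var) / mean if mean else math.nan)
    return result

# ---- 2. Hypothetical per-window strategy selection (illustration only) ----

def window_features(depth, size):
    """Split per-base depths into fixed-size windows and compute hypothetical
    meta-features: (mean depth, depth standard deviation) for each window."""
    feats = []
    for start in range(0, len(depth), size):
        w = depth[start:start + size]
        feats.append((statistics.fmean(w), statistics.pstdev(w)))
    return feats

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic initialisation (first k points)."""
    k = min(k, len(points))
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each window to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels, centroids

def select_callers(depth, size, k, meta_training):
    """Cluster windows by meta-features, then let a 1-nearest-neighbour 'meta-model'
    (hypothetical stand-in) pick, for each cluster, the caller whose training
    meta-features lie closest to the cluster centroid."""
    feats = window_features(depth, size)
    labels, centroids = kmeans(feats, k)
    choice = {c: min(meta_training, key=lambda item: sq_dist(item[0], cen))[1]
              for c, cen in enumerate(centroids)}
    return [choice[lab] for lab in labels]

if __name__ == "__main__":
    samples = [rates(90, 10, 10, 880), rates(80, 20, 20, 880)]  # toy counts, not real data
    for metric, (var, cv) in stability(samples).items():
        print(f"{metric}: variance={var:.6f}, CV={cv:.4f}")
    depth = [30, 30, 30, 200, 200, 200, 30, 30, 30, 200, 200, 200]  # toy per-base depths
    training = [((40.0, 5.0), "caller_low_depth"), ((180.0, 5.0), "caller_high_depth")]
    print(select_callers(depth, 3, 2, training))
    # → ['caller_low_depth', 'caller_high_depth', 'caller_low_depth', 'caller_high_depth']
```

The design point the sketch tries to convey is that the unit of decision is a cluster of similar windows rather than the whole sample, so two regions of the same sample can be processed by different callers; a lower variance or CV of the per-sample metrics is then the evidence of stability that the abstract reports.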