Keywords: Early recurrence; Machine learning; Multi-omics; SCCHN; XGBoost

MeSH: Carcinoma, Squamous Cell / genetics; Head and Neck Neoplasms / genetics; Humans; Proteomics; RNA, Messenger / genetics; Squamous Cell Carcinoma of Head and Neck / genetics; Transcriptome / genetics; rab GTP-Binding Proteins / genetics

Source: DOI:10.1016/j.compbiomed.2022.105991

Abstract:
Patients with squamous cell carcinoma of the head and neck (SCCHN) have a high risk of recurrence. We aimed to develop machine learning methods to identify transcriptomic and proteomic features that provide accurate classification models for predicting the risk of early recurrence in SCCHN patients.
Clinical, genomic, transcriptomic and proteomic features distinguishing recurrence risk were examined in SCCHN patients from The Cancer Genome Atlas (TCGA). Recurrence within one year after treatment was classified as high-risk and no recurrence as low-risk.
Conventional statistical analysis showed no significant differences between the groups in individual clinicopathological characteristics, mutation profiles or mRNA expression patterns. With the extreme gradient boosting (XGBoost) machine learning algorithm, ten proteins (RAD50, 4E-BP1, MYH11, MAP2K1, BECN1, NF2, RAB25, ERRFI1, KDR, SERPINE1) and five mRNAs (PLAUR, DKK1, AXIN2, ANG and VEGFA) made the greatest contribution to classification. These features were used to build improved XGBoost models; the best discrimination was achieved by combining transcriptomic and proteomic data, with an accuracy of 0.939 and an area under the ROC curve (AUC) of 0.951.
This study highlights the use of machine learning to identify transcriptomic and proteomic factors that play important roles in predicting the risk of recurrence in patients with SCCHN, and to refine such models through iterative cycles to enhance their accuracy, thereby aiding the introduction of personalized treatment regimens.
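The two performance figures quoted in the abstract, accuracy and AUC, can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline: the function names `accuracy` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for illustration, with 1 standing for high-risk (recurrence within one year) and 0 for low-risk. AUC is computed here through its rank interpretation, the probability that a randomly chosen high-risk patient receives a higher score than a randomly chosen low-risk patient.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    sample has the higher score; tied scores count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = high-risk, 0 = low-risk.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.1, 0.2, 0.7]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two values are reported side by side.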