OBJECTIVE: We developed explainable machine learning models to predict the overall survival (OS) of patients with retroperitoneal liposarcoma (RLPS), with the aim of making the modeling results more interpretable and transparent.
METHODS: We collected clinicopathological information on RLPS patients from the Surveillance, Epidemiology, and End Results (SEER) database and split the patients into training and validation sets at a 7:3 ratio. In addition, we obtained an external validation cohort from The First Affiliated Hospital of Naval Medical University (Shanghai, China). We performed LASSO regression and multivariate Cox proportional hazards analysis to identify relevant risk factors, which were then combined to develop six machine learning (ML) models: Cox proportional hazards model (Coxph), random survival forest (RSF), ranger, gradient boosting with component-wise linear models (GBM), decision trees, and boosting trees. The predictive performance of these ML models was evaluated using the concordance index (C-index), the integrated cumulative/dynamic area under the curve (AUC), the integrated Brier score, and Cox-Snell residual plots. To provide a global explanation of the optimal model, we used time-dependent variable importance, partial dependence survival plots, and aggregated survival SHapley Additive exPlanations (SurvSHAP) plots. To provide a local explanation of the optimal model, we used SurvSHAP(t) and survival local interpretable model-agnostic explanations (SurvLIME) plots.
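As an illustration of two steps named above, the 7:3 split and the C-index, the following is a minimal sketch in plain Python. It is not the study's code: the function names, the random seed, and the toy follow-up data are assumptions made for this example, and pairs with tied survival times are simply skipped.

```python
import random


def train_validation_split(records, train_fraction=0.7, seed=0):
    """Shuffle records and split them into training and validation sets.

    train_fraction=0.7 mirrors the 7:3 ratio; the seed is an arbitrary choice.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event and
    a strictly shorter follow-up time than patient j. The pair is
    concordant when the model gave patient i the higher risk score;
    tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration: follow-up in months,
# event flag (1 = death observed, 0 = censored), predicted risk.
toy_times = [5, 8, 12, 20, 30]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.7, 0.6, 0.4, 0.1]

train, valid = train_validation_split(range(10))
c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A C-index of 1.0 means every comparable pair is ranked correctly, and 0.5 is what random risk scores would give on average. The integrated AUC and the integrated Brier score used in the study additionally require handling censoring over time, which this sketch does not attempt.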
RESULTS: The final ML models consisted of six factors: patient's age, gender, marital status, surgical history, as well as tumor's histopathological classification, histological grade, and SEER stage. The prognostic models showed strong discriminative ability, with the ranger model performing best. In the training set, validation set, and external validation set, the AUCs for 1-, 3-, and 5-year OS were all above 0.83, and the integrated Brier scores were consistently below 0.15. The explainability analysis of the ranger model also indicated that histological grade, histopathological classification, and age were the most influential factors in predicting OS.
CONCLUSIONS: The ranger ML prognostic model performed best among the models evaluated and can be used to predict the OS of RLPS patients, offering clinicians a valuable reference for making informed decisions in advance.