We aimed to develop a machine learning-based prediction model for severe radiation pneumonitis (RP) that integrates relevant clinicopathological and genetic factors, given the associations of clinical factors, dosimetric parameters, and single nucleotide polymorphisms (SNPs) in TGF-β1 pathway genes with RP.
We prospectively enrolled 59 patients with primary lung cancer undergoing radiotherapy and analyzed pretreatment blood samples, clinicopathological and dosimetric variables, and 11 functional SNPs in TGF-β pathway genes. Using the Synthetic Minority Over-sampling Technique (SMOTE) and nested cross-validation, we developed a machine learning-based prediction model for severe RP (grade ≥ 2). Feature selection was performed with four methods (filter-based, wrapper-based, embedded, and logistic regression), and performance was evaluated with three machine learning models.
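To make the oversampling step concrete, the following is a minimal pure-Python sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. It is not the study's implementation. The function names `smote` and `oversample_training_fold`, the default `k`, and the toy data in the usage note are illustrative assumptions, and the nested cross-validation loop itself is omitted. The helper is written on the assumption that resampling is applied to training folds only, so that held-out folds keep the original class balance.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE idea)."""
    if n_new == 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def oversample_training_fold(X_train, y_train, k=5, seed=0):
    """Balance the two classes of ONE training fold (hypothetical helper).
    The matching validation/test fold must be left untouched."""
    pos = [x for x, y in zip(X_train, y_train) if y == 1]
    neg = [x for x, y in zip(X_train, y_train) if y == 0]
    minority, label = (pos, 1) if len(pos) < len(neg) else (neg, 0)
    n_new = abs(len(pos) - len(neg))
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return X_train + new_rows, y_train + [label] * n_new
```

A usage example on made-up two-feature data (label 1 marks the minority class): `oversample_training_fold([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0], [5.5, 5.5]], [1, 1, 1, 0, 0, 0, 0, 0], k=2)` returns the eight original rows plus two synthetic minority rows, giving five samples per class.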
Severe RP occurred in 20.3% of patients, and the median follow-up was 39.7 months. In the final model, age (>66 years), smoking history, PTV volume (>300 cc), and the AG/GG genotype of BMP2 rs1979855 were identified as the most significant predictors. Adding genomic variables to the clinicopathological variables significantly improved the AUC compared with clinicopathological variables alone (0.822 vs. 0.741, p = 0.029). The wrapper-based method and the logistic regression model selected the same feature set, which gave the best performance across all three machine learning models (AUC: XGBoost 0.815, random forest 0.805, SVM 0.712).
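For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen patient with severe RP receives a higher predicted score than a randomly chosen patient without it, with ties counted as one half. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not study data, and the sketch does not reproduce the statistical test behind the reported p-value, which the abstract does not specify.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical labels and scores):
# three of the four positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

This pairwise form is quadratic in the number of patients, which is not a concern at the cohort size reported here.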
We developed a machine learning-based prediction model for severe RP and identified age, smoking history, PTV volume, and BMP2 rs1979855 genotype as significant predictors. Notably, incorporating SNP data significantly improved predictive performance compared with clinicopathological factors alone.