OBJECTIVE: Conventional normal tissue complication probability (NTCP) models for head and neck cancer (HNC) patients are typically based on single-value variables, which for radiation-induced xerostomia are baseline xerostomia and mean salivary gland doses. This study aims to improve the prediction of late xerostomia by utilizing 3D information from radiation dose distributions, CT imaging, organ-at-risk segmentations, and clinical variables with deep learning (DL).
METHODS: An international cohort of 1208 HNC patients from two institutes was used to train and twice validate DL models (DCNN, EfficientNet-v2, and ResNet) with 3D dose distribution, CT scan, organ-at-risk segmentations, baseline xerostomia score, sex, and age as input. The NTCP endpoint was moderate-to-severe xerostomia 12 months post-radiotherapy. The DL models' prediction performance was compared to a reference model: a recently published xerostomia NTCP model that used baseline xerostomia score and mean salivary gland doses as input. Attention maps were created to visualize the focus regions of the DL predictions. Transfer learning was conducted to improve the DL model performance on the external validation set.
RESULTS: All DL-based NTCP models showed better performance (AUCtest=0.78-0.79) than the reference NTCP model (AUCtest=0.74) in the independent test. Attention maps showed that the DL model focused on the major salivary glands, particularly the stem cell-rich region of the parotid glands. DL models obtained lower external validation performance (AUCexternal=0.63) than the reference model (AUCexternal=0.66). After transfer learning on a small external subset, the DL model (AUCtl, external=0.66) performed better than the reference model (AUCtl, external=0.64).
CONCLUSIONS: DL-based NTCP models performed better than the reference model when validated in data from the same institute. Improved performance in the external dataset was achieved with transfer learning, demonstrating the need for multicenter training data to realize generalizable DL-based NTCP models.
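
For readers unfamiliar with the metric, the AUC values used above to compare the DL and reference models can be read as the probability that a randomly chosen patient with the endpoint receives a higher predicted NTCP than a randomly chosen patient without it. The sketch below illustrates that definition only. The function name `auc`, the labels, and the two score lists are hypothetical toy values, not data or code from the study, and the abstract does not state how its AUCs were computed.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: fraction of positive/negative pairs
    in which the positive case has the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = endpoint observed, 0 = not observed.
labels = [1, 0, 1, 0, 1, 0]
# Hypothetical predicted probabilities from two illustrative models.
model_a = [0.9, 0.2, 0.7, 0.4, 0.45, 0.5]
model_b = [0.9, 0.8, 0.3, 0.4, 0.6, 0.1]

print(f"AUC model A: {auc(labels, model_a):.2f}")
print(f"AUC model B: {auc(labels, model_b):.2f}")
```

The quadratic pairwise loop is chosen for clarity rather than speed; for cohorts of realistic size a rank-based implementation from an established statistics library would be the more usual choice.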