Keywords: Blood biochemical indices; Colorectal cancer diagnosis; Diagnostic model optimization; Machine learning algorithms; Serum tumor markers

Source: DOI:10.1007/s12094-024-03564-8

Abstract:
BACKGROUND: Colorectal cancer has a high incidence and mortality rate due to a low rate of early diagnosis. Therefore, efficient diagnostic methods are urgently needed.
OBJECTIVE: This study assesses the diagnostic effectiveness of Carbohydrate Antigen 19-9 (CA19-9), Carcinoembryonic Antigen (CEA), Alpha-fetoprotein (AFP), and Cancer Antigen 125 (CA125) serum tumor markers for colorectal cancer (CRC) and investigates a machine learning-based diagnostic model incorporating these markers with blood biochemical indices for improved CRC detection.
METHODS: Between January 2019 and December 2021, data from 800 CRC patients and 697 controls were collected; a further 52 patients and 63 controls attending the same hospital in 2022 served as an external validation set. The markers' effectiveness was analyzed individually and in combination, using metrics such as the area under the ROC curve (AUC) and the F1 score. Variables chosen through backward stepwise regression, including demographics and blood tests, were evaluated in six machine learning models using these metrics.
RESULTS: Levels of CEA, CA19-9, and CA125 were higher in the case group than in the control group. Combining these with the fourth serum marker, AFP, significantly improved predictive efficacy over any single marker alone, achieving an Area Under the Curve (AUC) value of 0.801. Backward stepwise regression selected 17 variables for evaluation in six machine learning models. Among these models, the Gradient Boosting Machine (GBM) performed best in the training set, test set, and external validation set, with an AUC value of over 0.9.
CONCLUSIONS: Machine learning models integrating tumor markers and blood indices offer superior CRC diagnostic accuracy, potentially enhancing clinical practice.
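
The two metrics named in the Methods, ROC curve AUC and F1 score, can be illustrated with a minimal sketch. It is not the study's code: the function names `roc_auc` and `f1_at_threshold`, the 0.5 decision threshold, and the eight toy scores are all assumptions made for illustration, and the data are synthetic rather than patient measurements. AUC is computed here as the probability that a randomly chosen case receives a higher model score than a randomly chosen control, which is equivalent to the area under the ROC curve.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly.

    A tie between a case and a control counts as half a correct pair.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    correct = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                correct += 1.0
            elif c == k:
                correct += 0.5
    return correct / (len(cases) * len(controls))


def f1_at_threshold(labels: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> float:
    """F1 score after calling every score >= threshold a predicted case."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Synthetic toy data (hypothetical): 1 = case, 0 = control.
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]
TOY_SCORES = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

if __name__ == "__main__":
    print("AUC:", roc_auc(TOY_LABELS, TOY_SCORES))
    print("F1 :", f1_at_threshold(TOY_LABELS, TOY_SCORES))
```

On the toy data, 14 of the 16 case-control pairs are ranked correctly, so the AUC is 0.875; at the 0.5 threshold there are 3 true positives, 1 false positive, and 1 false negative, giving an F1 of 0.75. AUC summarizes ranking quality across all thresholds, whereas F1 depends on the single threshold chosen, which is one reason to report both.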