OBJECTIVE: Accurate hepatocellular carcinoma (HCC) risk prediction facilitates appropriate surveillance strategies and reduces cancer mortality. We aimed to derive and validate novel machine learning models to predict HCC in a territory-wide cohort of patients with chronic viral hepatitis (CVH) using data from the Hospital Authority Data Collaboration Lab (HADCL).
METHODS: This was a territory-wide, retrospective, observational cohort study of patients with CVH in Hong Kong from 2000 to 2018, who were identified from HADCL based on viral markers, diagnosis codes, and antiviral treatment for chronic hepatitis B and/or C. The cohort was randomly split into training and validation cohorts in a 7:3 ratio. Five popular machine learning methods, namely logistic regression, ridge regression, AdaBoost, decision tree, and random forest, were trained and compared to identify the best prediction model.
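As an illustration of the split-and-compare design described in the methods, the following is a minimal sketch using only the Python standard library. It is not the authors' code: the function names `split_cohort` and `auroc`, the seed, and the rounding rule for the training size are assumptions made for this example. The AUROC is computed with the rank-based (Mann-Whitney) formulation, which is one common way to obtain it.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient IDs into training and validation lists.

    Hypothetical helper: the seed and rounding rule are assumptions,
    not details reported in the abstract.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formula.

    labels: iterable of 0/1 outcomes (1 = HCC); scores: predicted risks.
    Tied scores receive their average rank, so a tie counts as 0.5.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one case and one non-case")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

In a workflow of this kind, each candidate model would be fitted on the training list and `auroc` evaluated on both lists, so that a large gap between the training and validation values can be noticed.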
RESULTS: A total of 124,006 patients with CVH and complete data were included to build the models. In the training cohort (n = 86,804; 6,821 with HCC), ridge regression (area under the receiver operating characteristic curve [AUROC] 0.842), decision tree (0.952), and random forest (0.992) performed the best. In the validation cohort (n = 37,202; 2,875 with HCC), ridge regression (AUROC 0.844) and random forest (0.837) maintained their accuracy, which was significantly higher than that of existing HCC risk scores: CU-HCC (0.672), GAG-HCC (0.745), REACH-B (0.671), PAGE-B (0.748), and REAL-B (0.712). The low cut-off (0.07) of the HCC ridge score (HCC-RS) achieved 90.0% sensitivity and 98.6% negative predictive value (NPV) in the validation cohort. The high cut-off (0.15) of HCC-RS achieved high specificity (90.0%) and NPV (95.6%); 31.1% of patients remained indeterminate.
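The dual cut-off reporting above can be sketched as follows. This is an illustrative example, not the authors' implementation: the function name `dual_cutoff_summary` is hypothetical, and the handling of scores that fall exactly on a cut-off (treated here as at or above it) is an assumption, since the abstract does not specify it. Only the cut-off values 0.07 and 0.15 come from the text.

```python
def dual_cutoff_summary(labels, scores, low=0.07, high=0.15):
    """Summarise a risk score evaluated at a low and a high cut-off.

    labels: 0/1 outcomes (1 = HCC); scores: predicted risk scores.
    Assumption: a score equal to a cut-off counts as at or above it.
    Returns None for any metric whose denominator is zero.
    """
    tp_low = fn_low = tn_low = 0  # low cut-off, used to rule out HCC
    tn_high = fp_high = 0         # high cut-off, used to rule in HCC
    indeterminate = 0
    total = 0

    for y, s in zip(labels, scores):
        total += 1
        if s >= low:
            if y == 1:
                tp_low += 1
        elif y == 1:
            fn_low += 1
        else:
            tn_low += 1

        if y == 0:
            if s >= high:
                fp_high += 1
            else:
                tn_high += 1

        if low <= s < high:
            indeterminate += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity_at_low": ratio(tp_low, tp_low + fn_low),
        "npv_at_low": ratio(tn_low, tn_low + fn_low),
        "specificity_at_high": ratio(tn_high, tn_high + fp_high),
        "indeterminate_fraction": ratio(indeterminate, total),
    }


# Toy data for illustration only; these are not study data.
toy_labels = [0, 0, 0, 0, 1, 1, 0, 1]
toy_scores = [0.01, 0.03, 0.05, 0.10, 0.06, 0.12, 0.20, 0.30]
summary = dual_cutoff_summary(toy_labels, toy_scores)
```

Patients scoring below the low cut-off form the rule-out group, those at or above the high cut-off form the rule-in group, and the remainder are the indeterminate group referred to in the results.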
CONCLUSIONS: HCC-RS from the ridge regression machine learning model accurately predicted HCC in patients with CVH. These machine learning models may be developed as built-in functional keys or calculators in electronic health systems to reduce cancer mortality.
BACKGROUND: Novel machine learning models generated accurate risk scores for hepatocellular carcinoma (HCC) in patients with chronic viral hepatitis. The HCC ridge score was consistently more accurate than existing HCC risk scores. These models may be incorporated into electronic health systems to develop appropriate cancer surveillance strategies and reduce cancer deaths.