Keywords: Clinical decision support systems; Electronic Health Records; Machine learning; Oncology; Palliative Medicine

MeSH terms: Humans; Machine Learning / standards; Electronic Health Records / statistics & numerical data; Palliative Care / methods, standards, statistics & numerical data; Male; Female; Middle Aged; Aged; Risk Assessment / methods; Neoplasms / mortality, therapy; Cohort Studies; Adult; Medical Oncology / methods, standards; Aged, 80 and over; Mortality / trends

Source: DOI:10.1186/s12904-024-01457-9

Abstract:
BACKGROUND: Ex-ante identification of the last year of life facilitates a proactive palliative approach. Machine learning models trained on electronic health records (EHR) demonstrate promising performance in cancer prognostication. However, gaps in the literature include incomplete reporting of model performance, inadequate alignment of model formulation with the implementation use-case, and insufficient explainability, which hinders trust and adoption in clinical settings. Hence, we aim to develop an explainable, EHR-based machine learning model that prompts palliative care processes by predicting 365-day mortality risk among patients with advanced cancer in an outpatient setting.
METHODS: Our cohort consisted of 5,926 adults diagnosed with Stage 3 or 4 solid organ cancer between July 1, 2017, and June 30, 2020 and receiving ambulatory cancer care within a tertiary center. The classification problem was modelled using Extreme Gradient Boosting (XGBoost) and aligned to our envisioned use-case: "Given a prediction point that corresponds to an outpatient cancer encounter, predict for mortality within 365-days from prediction point, using EHR data up to 365-days prior." The model was trained with 75% of the dataset (n = 39,416 outpatient encounters) and validated on a 25% hold-out dataset (n = 13,122 outpatient encounters). To explain model outputs, we used Shapley Additive Explanations (SHAP) values. Clinical characteristics, laboratory tests and treatment data were used to train the model. Performance was evaluated using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC), while model calibration was assessed using the Brier score.
RESULTS: In total, 17,149 of the 52,538 prediction points (32.6%) had a mortality event within the 365-day prediction window. The model demonstrated an AUROC of 0.861 (95% CI 0.856-0.867) and AUPRC of 0.771. The Brier score was 0.147, indicating slight overestimations of mortality risk. Explanatory diagrams utilizing SHAP values allowed visualization of feature impacts on predictions at both the global and individual levels.
CONCLUSIONS: Our machine learning model demonstrated good discrimination and precision-recall in predicting 365-day mortality risk among individuals with advanced cancer. It has the potential to provide personalized mortality predictions and facilitate earlier integration of palliative care.
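As a reading aid for the three metrics named in the abstract, the following is a minimal pure-Python sketch of how AUROC, AUPRC (computed here as step-wise average precision) and the Brier score are defined. It is not the authors' code: the XGBoost model and SHAP analysis are not reproduced, the `toy_labels` and `toy_risks` values are hypothetical, and the study may have computed AUPRC with a different interpolation. The only figures taken from the abstract are the encounter counts, which are used to derive the event prevalence and the Brier score that an uninformative model (one predicting the prevalence for every encounter) would obtain.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random event is scored above a random non-event.

    Ties between an event and a non-event count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Simplification: assumes no tied scores, which holds for the toy data below.
    """
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each true event
    return total / sum(y_true)


# Hypothetical toy data: 1 = death within 365 days of the prediction point.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_risks = [0.9, 0.2, 0.7, 0.6, 0.1, 0.4, 0.3, 0.05]

toy_auroc = auroc(toy_labels, toy_risks)
toy_auprc = average_precision(toy_labels, toy_risks)
toy_brier = brier_score(toy_labels, toy_risks)

# Counts reported in the abstract.
n_train, n_holdout, n_events = 39_416, 13_122, 17_149
n_total = n_train + n_holdout

prevalence = n_events / n_total                     # share of prediction points with an event
uninformative_brier = prevalence * (1 - prevalence)  # Brier score of always predicting the prevalence
train_fraction = n_train / n_total

print(f"toy AUROC={toy_auroc:.3f} AUPRC={toy_auprc:.3f} Brier={toy_brier:.3f}")
print(f"prevalence={prevalence:.3f} uninformative Brier={uninformative_brier:.3f}")
```

The prevalence is a useful reference for the reported AUPRC, because a classifier that ranks encounters at random has an expected AUPRC close to the event prevalence. Likewise, the uninformative Brier score gives a ceiling against which the reported 0.147 can be read.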