Keywords: clinical prediction model; electronic health record; imputation; missing data; prediction; simulation

MeSH: Humans; Data Interpretation, Statistical; Computer Simulation; Research Design; Critical Care

Source: DOI:10.1177/09622802231165001

Abstract:
Background: In clinical prediction modelling, missing data can occur at any stage of the model pipeline: development, validation or deployment. Multiple imputation is often recommended but is challenging to apply at deployment; for example, the outcome is not yet known at deployment and so cannot be included in the imputation model, as is recommended under multiple imputation. Regression imputation uses a fitted model to impute the predicted value of missing predictors from observed data, and could offer a pragmatic alternative at deployment. Moreover, the use of missing indicators has been proposed to handle informative missingness, but it is currently unknown how well this method performs in the context of clinical prediction models.
Methods: We simulated data under various missing data mechanisms to compare the predictive performance of clinical prediction models developed using both imputation methods. We considered deployment scenarios where missing data are permitted or prohibited, imputation models that use or omit the outcome, and clinical prediction models that include or omit missing indicators. We assumed that the missingness mechanism remains constant across the model pipeline. We also applied the proposed strategies to critical care data.
Results: With complete data available at deployment, our findings were in line with existing recommendations: the outcome should be used to impute development data when using multiple imputation, and omitted under regression imputation. When missingness is allowed at deployment, omitting the outcome from the imputation model at development was preferred. Missing indicators improved model performance in many cases but can be harmful under outcome-dependent missingness.
Conclusion: We provide evidence that commonly taught principles of handling missing data via multiple imputation may not apply to clinical prediction models, particularly when data can be missing at deployment. We observed comparable predictive performance under multiple imputation and regression imputation. The performance of the missing data handling method must be evaluated on a study-by-study basis, and the most appropriate strategy for handling missing data at development should consider whether missing data are allowed at deployment. Some guidance is provided.
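As an illustration of the two techniques the abstract describes, the following is a minimal sketch of regression imputation without the outcome, combined with a missing indicator. It is not the authors' code: the toy data, the single fully observed predictor `x1`, the partially observed predictor `x2`, and the function names `fit_imputation_model` and `impute_with_indicator` are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical helper names and toy data, for illustration only.


def fit_imputation_model(
    x1: List[float], x2: List[Optional[float]]
) -> Tuple[float, float]:
    """Least-squares fit of x2 on x1, using only rows where x2 is observed.

    The outcome is deliberately not used, so the same fitted model can be
    applied at deployment, when the outcome is unknown.
    """
    pairs = [(a, b) for a, b in zip(x1, x2) if b is not None]
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_a) ** 2 for a, _ in pairs)
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    slope = sxy / sxx
    intercept = mean_b - slope * mean_a
    return intercept, slope


def impute_with_indicator(
    x1: List[float], x2: List[Optional[float]], model: Tuple[float, float]
) -> List[Tuple[float, float, int]]:
    """Return rows of (x1, x2 observed or imputed, missing indicator)."""
    intercept, slope = model
    rows = []
    for a, b in zip(x1, x2):
        missing = b is None
        # Regression imputation: fill in the predicted value of x2 given x1.
        value = intercept + slope * a if missing else b
        rows.append((a, value, int(missing)))
    return rows


# Development data: fit the imputation model once and store it.
dev_x1 = [1.0, 2.0, 3.0, 4.0]
dev_x2 = [2.0, 4.0, None, 8.0]
model = fit_imputation_model(dev_x1, dev_x2)
dev_rows = impute_with_indicator(dev_x1, dev_x2, model)

# Deployment: a new patient with x2 missing is imputed with the stored model.
new_rows = impute_with_indicator([5.0], [None], model)
```

The imputed predictor and the indicator column would then both enter the clinical prediction model. Because the imputation model is fitted once at development and reused unchanged, it can be applied to a single new patient at deployment, which is the practical appeal of regression imputation noted in the abstract.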