BACKGROUND: Concept drift and covariate shift lead to a degradation of machine learning (ML) models. The objective of our study was to characterize sudden data drift as caused by the COVID pandemic. Furthermore, we investigated the suitability of certain methods in model training to prevent model degradation caused by data drift.
METHODS: We trained different ML models with the H2O AutoML method on a dataset comprising 102,666 cases of surgical patients collected in the years 2014-2019 to predict postoperative mortality using preoperatively available data. The models applied were a Generalized Linear Model with regularization, Default Random Forest, Gradient Boosting Machine, eXtreme Gradient Boosting, Deep Learning, and Stacked Ensembles comprising all base models. Furthermore, we modified the original models by applying three different methods when training on the original pre-pandemic dataset: (1) we assigned lower weights to older data (Rahmani K, et al, Int J Med Inform 173:104930, 2023), (2) used only the most recent data for model training (Morger A, et al, Sci Rep 12:7244, 2022), and (3) performed a z-transformation of the numerical input parameters (Dilmegani C, 2023). Afterwards, we tested model performance on a pre-pandemic and an in-pandemic dataset not used in the training process, and analysed common features.
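The three training-time modifications described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names, the exponential decay factor, and the cutoff year are illustrative assumptions.

```python
import numpy as np

def recency_weights(years, current_year=2019, decay=0.5):
    """Method (1): down-weight older samples exponentially by age.

    `decay` is a hypothetical per-year factor, not a value from the study.
    """
    age = current_year - np.asarray(years)
    return decay ** age

def recent_only(X, years, cutoff=2018):
    """Method (2): keep only the most recent samples for training."""
    mask = np.asarray(years) >= cutoff
    return X[mask]

def z_transform(X_train, X_test):
    """Method (3): standardize numeric features using training-set statistics."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant columns
    return (X_train - mu) / sigma, (X_test - mu) / sigma
```

Note that the z-transformation uses only training-set statistics, so that the test set does not leak information into preprocessing.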
RESULTS: The resulting models showed excellent areas under the receiver-operating characteristic curve and acceptable areas under the precision-recall curve when tested on a dataset from January-March 2020, but significant degradation when tested on a dataset collected during the first wave of the COVID pandemic, from April-May 2020. When comparing the probability distributions of the input parameters, significant differences between pre-pandemic and in-pandemic data were found. The endpoint of our models, in-hospital mortality after surgery, did not differ significantly between pre- and in-pandemic data and was about 1% in each case. However, the models varied considerably in the composition of their input parameters. None of our applied modifications prevented a loss of performance, although they produced very different models using a large variety of parameters.
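Comparisons of input-parameter distributions, as reported above, are commonly carried out with a two-sample Kolmogorov-Smirnov test. The sketch below uses synthetic normal data as a stand-in for one numeric parameter; the study's actual distributions are not reproduced here, and the shift parameters are invented for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Hypothetical stand-ins for one numeric input parameter before and
# during the pandemic; the location/scale shift is assumed, not observed.
pre_pandemic = rng.normal(loc=0.0, scale=1.0, size=5000)
in_pandemic = rng.normal(loc=0.4, scale=1.2, size=5000)

# KS statistic: maximum distance between the empirical CDFs.
stat, p_value = ks_2samp(pre_pandemic, in_pandemic)
print(f"KS statistic = {stat:.3f}, p = {p_value:.2e}")
if p_value < 0.05:
    print("Distributions differ significantly -> possible covariate shift")
```

In practice such a test would be run per input parameter, with a multiple-testing correction across parameters.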
CONCLUSIONS: Our results show that none of the easy-to-implement measures we tested in model training can prevent performance degradation in the case of sudden external events. Therefore, we conclude that, in the presence of concept drift and covariate shift, close monitoring and critical review of model predictions are necessary.
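The close monitoring recommended above could, for example, track discrimination per incoming batch of cases and raise an alert when it drops below a chosen threshold. The sketch below uses a rank-based AUROC (valid for untied scores); the 0.8 alert threshold and the batch structure are illustrative assumptions, not values from the study.

```python
import numpy as np

def auroc(y_true, y_score):
    """Rank-based AUROC: probability that a positive case outranks a negative one.

    Assumes continuous (untied) scores; ties are not averaged here.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def monitor(batches, threshold=0.8):
    """Flag batches whose AUROC falls below the alert threshold."""
    alerts = []
    for name, y_true, y_score in batches:
        score = auroc(y_true, y_score)
        if score < threshold:
            alerts.append((name, score))
    return alerts
```

A flagged batch would then trigger the critical human review of model predictions that the conclusion calls for, rather than an automatic retraining step.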