Keywords: U-processes; covariate shift; domain adaptation; importance weighting; model performance; prediction models; transportability

MeSH: Models, Statistical; ROC Curve; Nutrition Surveys; Area Under Curve

Source: DOI:10.1111/biom.13796

Abstract:
We propose methods for estimating the area under the receiver operating characteristic (ROC) curve (AUC) of a prediction model in a target population that differs from the source population that provided the data used for original model development. If covariates that are associated with model performance, as measured by the AUC, have a different distribution in the source and target populations, then AUC estimators that only use data from the source population will not reflect model performance in the target population. Here, we provide identification results for the AUC in the target population when outcome and covariate data are available from the sample of the source population, but only covariate data are available from the sample of the target population. In this setting, we propose three estimators for the AUC in the target population and show that they are consistent and asymptotically normal. We evaluate the finite-sample performance of the estimators using simulations and use them to estimate the AUC in a nationally representative target population from the National Health and Nutrition Examination Survey for a lung cancer risk prediction model developed using source population data from the National Lung Screening Trial.
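To make the importance-weighting idea listed in the keywords concrete, the following is a minimal sketch of an inverse-odds weighted AUC computed from source-sample data only. It is an illustration under stated assumptions and not necessarily any of the three estimators proposed in the paper: the function names `inverse_odds_weights` and `weighted_auc` are hypothetical, the source-membership probabilities `p_source` are assumed to come from some separately fitted model of P(source | covariates), and tied scores are counted as half-concordant by convention.

```python
import numpy as np


def inverse_odds_weights(p_source):
    """Weights proportional to P(target | X) / P(source | X).

    p_source: assumed estimates of P(source | X) for each source observation,
    e.g. from a membership model fitted on the pooled source and target samples.
    """
    p = np.asarray(p_source, dtype=float)
    return (1.0 - p) / p


def weighted_auc(scores, y, w):
    """Weighted concordance over all (case, non-case) pairs in the source sample."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y)
    w = np.asarray(w, dtype=float)

    case, ctrl = (y == 1), (y == 0)
    s1, w1 = scores[case], w[case]
    s0, w0 = scores[ctrl], w[ctrl]

    # Each pair is weighted by the product of its two observation weights.
    pair_w = np.outer(w1, w0)
    # 1 if the case scores higher, 0.5 for ties (a convention assumed here).
    concordant = (s1[:, None] > s0[None, :]) + 0.5 * (s1[:, None] == s0[None, :])
    return float((pair_w * concordant).sum() / pair_w.sum())


# Toy data (made up for illustration only).
scores = [0.9, 0.8, 0.3, 0.2]      # model predictions g(X)
y = [1, 0, 1, 0]                   # observed outcomes in the source sample
p_source = [0.5, 0.5, 0.8, 0.8]    # assumed P(source | X)

auc_source = weighted_auc(scores, y, np.ones(4))
auc_target = weighted_auc(scores, y, inverse_odds_weights(p_source))
```

With all weights equal to one, the function reduces to the usual empirical AUC of the source sample; observations whose covariates are more typical of the target population receive larger weights, which shifts the estimate toward target-population performance.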