Keywords: Calibration; Discrimination; Sample size; Simulation

MeSH: Humans; Sample Size; Risk Assessment / methods, statistics & numerical data; Models, Statistical; Computer Simulation; Algorithms

Source: DOI:10.1186/s12874-024-02268-5

Abstract:
BACKGROUND: Risk prediction models are routinely used to assist in clinical decision making. A small sample size for model development can compromise model performance when the model is applied to new patients. For binary outcomes, the calibration slope (CS) and the mean absolute prediction error (MAPE) are two key measures on which sample size calculations for the development of risk models have been based. CS quantifies the degree of model overfitting while MAPE assesses the accuracy of individual predictions.
METHODS: Recently, two formulae were proposed to calculate the sample size required, given anticipated features of the development data such as the outcome prevalence and c-statistic, to ensure that the expectation of the CS and MAPE (over repeated samples) in models fitted using maximum likelihood estimation (MLE) will meet prespecified target values. In this article, we use a simulation study to evaluate the performance of these formulae.
RESULTS: We found that both formulae work reasonably well when the anticipated model strength is not too high (c-statistic < 0.8), regardless of the outcome prevalence. However, for higher model strengths the CS formula underestimates the sample size substantially. For example, for c-statistic = 0.85 and 0.9, the sample size needed to be increased by at least 50% and 100%, respectively, to meet the target expected CS. On the other hand, the MAPE formula tends to overestimate the sample size for high model strengths. These conclusions were more pronounced for higher prevalence than for lower prevalence. Similar conclusions were drawn when the outcome was time to event with censoring. Given these findings, we propose a simulation-based approach, implemented in the new R package 'samplesizedev', to correctly estimate the sample size even for high model strengths. The software can also calculate the variability in CS and MAPE, thus allowing for assessment of model stability.
CONCLUSIONS: The calibration and MAPE formulae suggest sample sizes that are generally appropriate for use when the model strength is not too high. However, they tend to be biased for higher model strengths, which are not uncommon in clinical risk prediction studies. On those occasions, our proposed adjustments to the sample size calculations will be relevant.
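To make the simulation-based idea concrete, the following is a minimal Python sketch, not the 'samplesizedev' implementation and not the authors' simulation design. It estimates the expected CS and MAPE, and their variability over repeated samples, for a candidate development sample size n. Everything about the data-generating mechanism is an assumption made for illustration: independent standard normal predictors with equal coefficients, and a model described directly by a hypothetical `intercept` and `lp_sd` (the standard deviation of the true linear predictor) rather than by prevalence and c-statistic. The function and parameter names are likewise hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Clip to avoid overflow in exp for extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_logistic_mle(X, y, n_iter=25, tol=1e-8):
    """Unpenalised logistic regression by Newton-Raphson (X includes an intercept column)."""
    beta = np.zeros(X.shape[1])
    jitter = 1e-8 * np.eye(X.shape[1])  # numerical safeguard only, not a shrinkage penalty
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + jitter
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def simulate_cs_mape(n, n_pred=10, intercept=-2.0, lp_sd=1.0,
                     n_sim=200, n_val=20000, seed=0):
    """Mean and SD of calibration slope (CS) and MAPE over repeated development samples.

    Assumed (illustrative) setup: independent N(0, 1) predictors with equal
    true coefficients, scaled so the true linear predictor has SD `lp_sd`.
    """
    rng = np.random.default_rng(seed)
    beta_true = np.full(n_pred, lp_sd / np.sqrt(n_pred))

    # One large validation set stands in for "new patients".
    X_val = rng.standard_normal((n_val, n_pred))
    p_true_val = sigmoid(intercept + X_val @ beta_true)
    y_val = rng.binomial(1, p_true_val)
    X_val1 = np.column_stack([np.ones(n_val), X_val])

    cs, mape = [], []
    for _ in range(n_sim):
        # Draw a development sample of size n and fit the model by MLE.
        X = rng.standard_normal((n, n_pred))
        y = rng.binomial(1, sigmoid(intercept + X @ beta_true))
        beta_hat = fit_logistic_mle(np.column_stack([np.ones(n), X]), y)

        # CS: slope from regressing validation outcomes on the estimated linear predictor.
        lp_hat_val = X_val1 @ beta_hat
        calib = fit_logistic_mle(np.column_stack([np.ones(n_val), lp_hat_val]), y_val)
        cs.append(calib[1])

        # MAPE: mean absolute difference between estimated and true risks.
        mape.append(np.mean(np.abs(sigmoid(lp_hat_val) - p_true_val)))

    cs, mape = np.asarray(cs), np.asarray(mape)
    return {"cs_mean": cs.mean(), "cs_sd": cs.std(ddof=1),
            "mape_mean": mape.mean(), "mape_sd": mape.std(ddof=1)}


if __name__ == "__main__":
    for n in (200, 2000):
        print(n, simulate_cs_mape(n, n_sim=50, n_val=5000))
```

Under these assumptions, a sample size could be chosen by increasing n until `cs_mean` reaches the target (for example 0.9), with `cs_sd` and `mape_sd` giving a rough sense of model stability at that n.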