Keywords: Algorithmic Bias; Algorithmic Fairness; Health Disparity; Hospital Readmission; Population Health Management; Predictive Models

MeSH: Humans; Patient Readmission / statistics & numerical data; Racism; Retrospective Studies; Male; Female; Middle Aged; Adult; Aged; Maryland; Algorithms; Healthcare Disparities

Source: DOI: 10.1016/j.jbi.2024.104683

Abstract:
OBJECTIVE: Despite the increased availability of methodologies to identify algorithmic bias, the operationalization of bias evaluation for healthcare predictive models is still limited. Therefore, this study proposes a process for bias evaluation through an empirical assessment of common hospital readmission models. The process includes selecting bias measures, interpreting them, determining disparity impact, and identifying potential mitigations.
METHODS: This retrospective analysis evaluated the racial bias of four common models predicting 30-day unplanned readmission (i.e., the LACE Index, the HOSPITAL Score, and the CMS readmission measure applied both as-is and retrained). The models were assessed using 2.4 million adult inpatient discharges in Maryland from 2016 to 2019. Fairness metrics that are model-agnostic, easy to compute, and interpretable were implemented and appraised to select the most appropriate bias measures. The impact of changing the models' risk thresholds on these measures was further assessed to guide the selection of optimal thresholds to control and mitigate bias.
RESULTS: Four bias measures were selected for the predictive task: zero-one-loss difference, false negative rate (FNR) parity, false positive rate (FPR) parity, and the generalized entropy index. Based on these measures, the HOSPITAL Score and the retrained CMS measure demonstrated the lowest racial bias. White patients showed a higher FNR, while Black patients showed a higher FPR and zero-one loss. As the models' risk thresholds changed, trade-offs between fairness and overall performance were observed, and the assessment showed that all models' default thresholds were reasonable for balancing accuracy and bias.
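To make the four selected measures concrete, the following is a minimal Python sketch of how they can be computed from binary readmission labels and thresholded predictions for two racial groups. The function names, the two-group assumption, and the per-patient "benefit" formulation for the generalized entropy index (a common choice in the fairness literature, e.g., Speicher et al.) are illustrative assumptions, not the study's actual implementation.

```python
# Hypothetical sketch of the four selected bias measures, assuming binary
# labels y_true (1 = readmitted within 30 days), thresholded predictions
# y_pred, and a race indicator with exactly two groups.
import numpy as np

def group_rates(y_true, y_pred):
    """Zero-one loss, FNR, and FPR for one group of patients."""
    loss = np.mean(y_true != y_pred)         # zero-one loss (error rate)
    fnr = np.mean(y_pred[y_true == 1] == 0)  # missed true readmissions
    fpr = np.mean(y_pred[y_true == 0] == 1)  # flagged non-readmissions
    return loss, fnr, fpr

def generalized_entropy_index(y_true, y_pred, alpha=2):
    """GEI over per-patient benefits b_i = y_pred_i - y_true_i + 1,
    a common formulation in the algorithmic fairness literature."""
    b = y_pred - y_true + 1.0
    mu = b.mean()
    return np.mean((b / mu) ** alpha - 1.0) / (alpha * (alpha - 1.0))

def bias_report(y_true, y_pred, group):
    """Between-group gaps for the rate-based measures, plus the GEI."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    (l_a, fnr_a, fpr_a), (l_b, fnr_b, fpr_b) = (
        group_rates(y_true[group == g], y_pred[group == g])
        for g in np.unique(group)            # assumes exactly two groups
    )
    return {
        "zero_one_loss_diff": abs(l_a - l_b),
        "fnr_parity_gap": abs(fnr_a - fnr_b),  # FNR parity violation
        "fpr_parity_gap": abs(fpr_a - fpr_b),  # FPR parity violation
        "generalized_entropy_index": generalized_entropy_index(y_true, y_pred),
    }
```

These measures are model-agnostic in the sense the abstract describes: they only require labels, thresholded predictions, and group membership, so they apply equally to the LACE Index, the HOSPITAL Score, and the CMS measure.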
CONCLUSIONS: This study proposes an Applied Framework to Assess Fairness of Predictive Models (AFAFPM) and demonstrates the process using 30-day hospital readmission models as the example. It suggests the feasibility of applying algorithmic bias assessment to determine optimized risk thresholds so that predictive models can be used more equitably and accurately. It is evident that a combination of qualitative and quantitative methods and a multidisciplinary team are necessary to identify, understand, and respond to algorithmic bias in real-world healthcare settings. Users should also apply multiple bias measures to ensure a more comprehensive, tailored, and balanced view. The results of bias measures, however, must be interpreted with caution and in the larger operational, clinical, and policy context.
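The threshold analysis described in the abstract can likewise be sketched: sweeping the risk threshold and recording overall accuracy alongside the fairness gaps exposes the fairness-performance trade-off and candidate operating points. The sketch below reuses `bias_report` from the earlier block; the threshold grid, the selection rule (smallest summed FNR/FPR gap subject to an accuracy floor), and the floor value are illustrative assumptions, not the study's method.

```python
# Hypothetical threshold sweep: evaluate accuracy and fairness gaps at each
# candidate risk threshold, then pick a threshold that balances the two.
def sweep_thresholds(y_true, risk_scores, group,
                     thresholds=np.linspace(0.05, 0.95, 19)):
    rows = []
    for t in thresholds:
        y_pred = (np.asarray(risk_scores) >= t).astype(int)
        report = bias_report(y_true, y_pred, group)
        accuracy = np.mean(np.asarray(y_true) == y_pred)
        rows.append({"threshold": t, "accuracy": accuracy, **report})
    return rows

def pick_threshold(rows, min_accuracy=0.80):
    """Among thresholds meeting an accuracy floor, minimize the summed
    FNR/FPR parity gaps; fall back to all rows if none qualify."""
    eligible = [r for r in rows if r["accuracy"] >= min_accuracy] or rows
    return min(eligible,
               key=lambda r: r["fnr_parity_gap"] + r["fpr_parity_gap"])
```

A sweep like this is one way to reproduce the abstract's observation that fairness and overall performance trade off as the threshold moves, and to check whether a model's default threshold sits near a reasonable balance point.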