Keywords: algorithm; algorithmic fairness; artificial intelligence; assessment; bias; clinical machine learning; diagnosis; fairness; machine learning; medical machine learning; mitigation; model; outcome prediction; prediction; race; racial bias; scoping review; score prediction

Source: DOI:10.2196/36388

Abstract:
BACKGROUND: Racial bias is a key concern regarding the development, validation, and implementation of machine learning (ML) models in clinical settings. Despite the potential of bias to propagate health disparities, racial bias in clinical ML has yet to be thoroughly examined and best practices for bias mitigation remain unclear.
OBJECTIVE: Our objective was to perform a scoping review to characterize the methods by which the racial bias of ML has been assessed and describe strategies that may be used to enhance algorithmic fairness in clinical ML.
METHODS: A scoping review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) Extension for Scoping Reviews. A literature search using PubMed, Scopus, and Embase databases, as well as Google Scholar, identified 635 records, of which 12 studies were included.
RESULTS: Applications of ML were varied and involved diagnosis, outcome prediction, and clinical score prediction performed on data sets including images, diagnostic studies, clinical text, and clinical variables. Of the 12 studies, 1 (8%) described a model in routine clinical use, 2 (17%) examined prospectively validated clinical models, and the remaining 9 (75%) described internally validated models. In addition, 8 (67%) studies concluded that racial bias was present, 2 (17%) concluded that it was not, and 2 (17%) assessed the implementation of bias mitigation strategies without comparison to a baseline model. Fairness metrics used to assess algorithmic racial bias were inconsistent. The most commonly observed metrics were equal opportunity difference (5/12, 42%), accuracy (4/12, 33%), and disparate impact (2/12, 17%). All 8 (67%) studies that implemented methods for mitigation of racial bias successfully increased fairness, as measured by the authors' chosen metrics. Among the studies that implemented bias mitigation, preprocessing methods were the most commonly used.
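The two group-fairness metrics named most often above, equal opportunity difference and disparate impact, have standard definitions in the algorithmic fairness literature. The sketch below is a minimal illustration of how they can be computed; it is not code from any of the reviewed studies, and the variable names, threshold, and synthetic data are hypothetical.

```python
# Minimal sketch of two standard group-fairness metrics (illustrative only).
import numpy as np

def equal_opportunity_difference(y_true, y_pred, group):
    """True-positive-rate gap between two groups (0 = parity).

    Equal opportunity compares recall across groups:
    TPR_g = P(y_pred = 1 | y_true = 1, group = g).
    """
    tprs = []
    for g in (0, 1):
        mask = (group == g) & (y_true == 1)
        tprs.append(y_pred[mask].mean())
    return tprs[1] - tprs[0]

def disparate_impact(y_pred, group):
    """Ratio of favorable-outcome rates between groups (1 = parity).

    DI = P(y_pred = 1 | group = 0) / P(y_pred = 1 | group = 1).
    A common rule of thumb flags DI below 0.8 as potentially biased.
    """
    rate_0 = y_pred[group == 0].mean()
    rate_1 = y_pred[group == 1].mean()
    return rate_0 / rate_1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1000
    group = rng.integers(0, 2, n)    # hypothetical binary group label
    y_true = rng.integers(0, 2, n)   # hypothetical true outcomes
    # A deliberately biased classifier: more positive predictions for group 1.
    y_pred = (rng.random(n) < np.where(group == 1, 0.6, 0.4)).astype(int)
    print("Equal opportunity difference:", equal_opportunity_difference(y_true, y_pred, group))
    print("Disparate impact:", disparate_impact(y_pred, group))
```

On the synthetic data above, both metrics flag the injected bias (a nonzero TPR gap and a disparate impact well below 1), which is the kind of baseline comparison the review notes was sometimes missing.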
CONCLUSIONS: The broad scope of medical ML applications and potential patient harms demand an increased emphasis on evaluation and mitigation of racial bias in clinical ML. However, the adoption of algorithmic fairness principles in medicine remains inconsistent and is limited by poor data availability and ML model reporting. We recommend that researchers and journal editors emphasize standardized reporting and data availability in medical ML studies to improve transparency and facilitate evaluation for racial bias.