MeSH terms: Humans; Artificial Intelligence; COVID-19 / diagnostic imaging; X-Rays; Benchmarking; Decision Support Systems, Clinical

Source: DOI: 10.1038/s41598-023-46493-2 | PDF (PubMed)

Abstract:
Recent advances in artificial intelligence (AI) have sparked interest in developing explainable AI (XAI) methods for clinical decision support systems, especially in translational research. Although using XAI methods may enhance trust in black-box models, evaluating their effectiveness has been challenging, primarily due to the absence of human (expert) intervention, additional annotations, and automated strategies. To conduct a thorough assessment, we propose a patch perturbation-based approach to automatically evaluate the quality of explanations in medical imaging analysis. To eliminate the need for human effort in conventional evaluation methods, our approach executes poisoning attacks during model retraining by generating both static and dynamic triggers. We then propose a comprehensive set of evaluation metrics for the model inference stage that enable evaluation from multiple perspectives, covering correctness, completeness, consistency, and complexity. In addition, we include an extensive case study that demonstrates the proposed evaluation strategy by applying widely used XAI methods to COVID-19 X-ray imaging classification tasks, along with a thorough review of existing XAI methods in medical imaging analysis and the availability of their evaluations. The proposed patch perturbation-based workflow offers model developers an automated and generalizable evaluation strategy to identify potential pitfalls and optimize their explainable solutions, while also helping end users compare and select XAI methods that meet specific clinical needs in real-world clinical research and practice.
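The abstract describes the core mechanism only at a high level: stamp a synthetic patch trigger onto training images (statically at a fixed location, or dynamically at varying locations), retrain the model so the trigger becomes a known causal feature, and then check how well each XAI method's saliency map localizes that trigger region at inference time. The paper's exact trigger designs and metric definitions are not given in this abstract, so the sketch below is a hypothetical illustration, not the authors' implementation: it assumes NumPy image arrays, a generic saliency map produced by some XAI method, and a made-up `trigger_localization_score` as a simple stand-in for the correctness-style metrics mentioned above.

```python
import numpy as np


def apply_static_trigger(image, patch_size=16, value=1.0):
    """Stamp a fixed bright square in the top-left corner (hypothetical static trigger)."""
    poisoned = image.copy()
    poisoned[:patch_size, :patch_size] = value
    mask = np.zeros(image.shape[:2], dtype=bool)
    mask[:patch_size, :patch_size] = True
    return poisoned, mask


def apply_dynamic_trigger(image, patch_size=16, value=1.0, rng=None):
    """Stamp the same square at a random location per image (hypothetical dynamic trigger)."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    y = int(rng.integers(0, h - patch_size))
    x = int(rng.integers(0, w - patch_size))
    poisoned = image.copy()
    poisoned[y:y + patch_size, x:x + patch_size] = value
    mask = np.zeros((h, w), dtype=bool)
    mask[y:y + patch_size, x:x + patch_size] = True
    return poisoned, mask


def trigger_localization_score(saliency, trigger_mask, top_k_fraction=0.05):
    """Fraction of the top-k most salient pixels that fall inside the known trigger region.

    After poisoning, the trigger drives the prediction, so a faithful explanation
    should concentrate attribution there; this overlap is a simple correctness proxy.
    """
    k = max(1, int(top_k_fraction * saliency.size))
    flat = saliency.ravel()
    top_idx = np.argpartition(flat, -k)[-k:]
    return float(trigger_mask.ravel()[top_idx].mean())
```

In this sketch the score is simply the overlap between the most-attributed pixels and the known trigger mask; a low score for a poisoned-and-retrained model would flag an explanation that fails to reveal the feature the model actually relies on, which is the kind of pitfall the proposed workflow is meant to expose automatically.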