Keywords: artificial intelligence; implementation science; machine learning; monitoring; patient safety; quality control

Source: DOI:10.2196/50437

Abstract:
Integrating machine learning (ML) models into clinical practice presents the challenge of maintaining their efficacy over time. While the existing literature offers valuable strategies for detecting declining model performance, there is a need to document the broader challenges and solutions associated with the real-world development and integration of model-monitoring solutions. This work details the development and use of a platform for monitoring the performance of a production-level ML model operating at Mayo Clinic. In this paper, we aimed to provide a series of considerations and guidelines necessary for integrating such a platform into a team's technical infrastructure and workflow. We have documented our experiences with this integration process, discussed the broader challenges encountered with real-world implementation and maintenance, and included the source code for the platform. Our monitoring platform was built as an R Shiny application, developed and implemented over the course of 6 months. The platform has been used and maintained for 2 years and is still in use as of July 2023. The considerations necessary for the implementation of the monitoring platform center on 4 pillars: feasibility (what resources can be used for platform development?); design (through what statistics or models will the model be monitored, and how will these results be efficiently displayed to the end user?); implementation (how will this platform be built, and where will it exist within the IT ecosystem?); and policy (based on monitoring feedback, when and what actions will be taken to fix problems, and how will these problems be communicated to clinical staff?). While much of the literature surrounding ML performance monitoring emphasizes methodological approaches for capturing changes in performance, a range of other challenges and considerations must still be addressed for successful real-world implementation.
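To make the design concrete, the sketch below shows what a minimal Shiny-based monitoring dashboard of the kind described could look like: a simulated weekly AUC series plotted against a user-adjustable alert threshold, with a simple text alert when performance dips below it. The data, metric, and threshold here are illustrative assumptions, not the authors' actual platform (whose source code accompanies the article).

# A minimal sketch of a performance-monitoring dashboard in R Shiny.
# All values are simulated; a real deployment would read scored
# predictions and observed outcomes from the production pipeline.
library(shiny)

# Simulated weekly AUC values for a deployed model (hypothetical).
set.seed(42)
monitor_data <- data.frame(
  week = seq(as.Date("2021-07-01"), by = "week", length.out = 52),
  auc  = pmin(pmax(rnorm(52, mean = 0.82, sd = 0.02), 0.5), 1)
)

ui <- fluidPage(
  titlePanel("Model performance monitor (sketch)"),
  sidebarLayout(
    sidebarPanel(
      # Threshold below which a week is flagged for review.
      sliderInput("threshold", "Alert threshold (AUC)",
                  min = 0.5, max = 1, value = 0.75, step = 0.01)
    ),
    mainPanel(
      plotOutput("auc_plot"),
      textOutput("alert")
    )
  )
)

server <- function(input, output) {
  # Time series of the monitored statistic with the alert limit overlaid.
  output$auc_plot <- renderPlot({
    plot(monitor_data$week, monitor_data$auc, type = "b",
         xlab = "Week", ylab = "AUC", ylim = c(0.5, 1))
    abline(h = input$threshold, col = "red", lty = 2)
  })
  # Plain-language alert, the kind of feedback the policy pillar acts on.
  output$alert <- renderText({
    n_below <- sum(monitor_data$auc < input$threshold)
    if (n_below > 0) {
      sprintf("%d week(s) below threshold: flag model for review.", n_below)
    } else {
      "All weeks at or above threshold."
    }
  })
}

shinyApp(ui = ui, server = server)

Running this with shiny::runApp() serves the dashboard locally; in practice the monitored statistic, its control limits, and the escalation path would be chosen under the design and policy pillars described above.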