Keywords: concept drift, drift detection, drift explanation, drift localization, explainability, monitoring, survey

Source: DOI:10.3389/frai.2024.1330258 | PDF (PubMed)

Abstract:
In an increasing number of industrial and technical processes, machine learning-based systems are being entrusted with supervision tasks. While they have been successfully utilized in many application areas, they frequently fail to generalize to changes in the observed data, which may be caused by environmental changes or degrading sensors. These changes, commonly referred to as concept drift, can trigger malfunctions in the deployed solutions, which are safety-critical in many cases. Thus, detecting and analyzing concept drift is a crucial step when building reliable and robust machine learning-driven solutions. In this work, we consider the setting of unsupervised data streams, which is highly relevant for various monitoring and anomaly detection scenarios. In particular, we focus on the tasks of localizing and explaining concept drift, which are crucial for enabling human operators to take appropriate action. In addition to providing precise mathematical definitions of the problem of concept drift localization, we survey the body of literature on this topic. By performing standardized experiments on parametric artificial datasets, we provide a direct comparison of different strategies. Thereby, we can systematically analyze the properties of different schemes and suggest initial guidelines for practical applications. Finally, we explore the emerging topic of explaining concept drift.
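To make the unsupervised setting concrete, the following is a minimal illustrative sketch (not the surveyed paper's method): window-based drift detection via a per-feature two-sample Kolmogorov–Smirnov test, which also yields a crude form of drift localization by reporting which features changed. The function name `detect_drift`, the window sizes, and the significance level are assumptions for illustration only.

```python
# Minimal sketch: compare a reference window against the current window,
# feature-wise, with a two-sample KS test. Drifting feature indices give a
# rough localization of the drift in feature space.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, window: np.ndarray, alpha: float = 0.01):
    """Return (drift_detected, drifting_feature_indices)."""
    n_features = reference.shape[1]
    p_values = np.array([
        ks_2samp(reference[:, j], window[:, j]).pvalue for j in range(n_features)
    ])
    # Bonferroni correction over features to control the overall false-alarm rate.
    drifting = np.where(p_values < alpha / n_features)[0]
    return len(drifting) > 0, drifting

# Usage on synthetic data: only feature 1 shifts its mean ("drift").
rng = np.random.default_rng(0)
ref = rng.normal(size=(500, 3))
cur = rng.normal(size=(500, 3))
cur[:, 1] += 2.0  # simulated drift in feature 1
print(detect_drift(ref, cur))  # expected: (True, array([1]))
```

Marginal per-feature tests such as this one cannot detect drift that only affects the dependence structure between features; multivariate two-sample tests address that case at higher computational cost.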