Keywords: Anomaly detection; Data augmentation; Data generation; Deep learning; Imbalanced dataset; Signal processing

Source: DOI: 10.1016/j.heliyon.2021.e07687

Abstract:
Unlike data augmentation, data generation for extremely rare cases is an approach that can produce a large number of high-quality samples from very little original data. This could be useful for anomaly detection and classification tasks, where publicly available research datasets are limited. Although other approaches, such as data augmentation techniques, have attempted to solve this problem, none of them ensures the characteristics of the synthesized samples. Previously, we introduced a framework called Data Augmentation and Generation for Anomalous Time-series Signals (DAGAT), built from several important components: Data Augmentation, a Variational Autoencoder (VAE), a Data Picker (DP), a Signal Fragment Assembler (SFA), and a Quality Classifier (QC). Subsequently, an upgraded framework, An Advanced Data Generation for Anomalous Signals (ADGAS), was introduced to eliminate DAGAT's limitations, namely uncontrollable outputs and the possibility of bad data being included in the training set. By restructuring the DAGAT architecture, ADGAS generates better samples. Nonetheless, ADGAS could be improved further through a better SFA, DP, and QC. Hence, this paper proposes a Data Generation Framework for Extremely Rare Case Signals, which can generate reliable data for various objectives. We evaluated this framework using a 1D-CNN as the performance evaluator for multi-class anomaly classification, with the water treatment and water distribution testbeds (SWaT and WADI) as real-world anomaly datasets. The results show that it outperforms baseline anomaly data augmentation and data generation methods.
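The abstract contrasts plain data augmentation with a generation pipeline that screens its outputs (the Data Picker and Quality Classifier). The sketch below is not DAGAT, ADGAS, or the proposed framework, whose internals the abstract does not describe. It only illustrates, under stated assumptions, the generic idea of spawning many candidates from one scarce signal with conventional time-series augmentation (jittering and scaling) and discarding candidates that fail a quality check. The functions `jitter`, `scale`, `passes_quality_check`, and `generate`, the noise levels, and the correlation threshold of 0.9 are hypothetical choices made for illustration.

```python
import math
import random


def jitter(signal, sigma=0.05, rng=random):
    """Conventional augmentation: add Gaussian noise to every time step."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


def scale(signal, sigma=0.1, rng=random):
    """Conventional augmentation: multiply the whole signal by one random factor near 1.0."""
    factor = rng.gauss(1.0, sigma)
    return [x * factor for x in signal]


def pearson(a, b):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def passes_quality_check(original, candidate, min_corr=0.9):
    """Hypothetical stand-in for a quality filter: keep candidates whose shape
    stays strongly correlated with the original signal."""
    return pearson(original, candidate) >= min_corr


def generate(original, n_candidates=100, seed=0):
    """Augment one rare signal many times and keep only candidates that pass the filter."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_candidates):
        candidate = scale(jitter(original, rng=rng), rng=rng)
        if passes_quality_check(original, candidate):
            kept.append(candidate)
    return kept
```

For example, `generate([math.sin(2 * math.pi * t / 50) for t in range(200)])` returns up to 100 jittered, rescaled copies of a toy sine wave. A correlation filter ignores amplitude changes and only checks shape, so a real quality classifier would need criteria matched to the anomaly class of interest.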