Isolation forest

  • Article type: Journal Article
    Soft sensors have been widely used for real-time power prediction in wind power generation, a quantity that is difficult to measure instantaneously. Short-term wind power forecasting aims to provide a reference for intraday power grid dispatch. This study proposes a soft sensor model based on a Long Short-Term Memory (LSTM) network, combining data preprocessing with Variational Mode Decomposition (VMD) to improve wind power prediction accuracy. The isolation forest algorithm is adopted for anomaly detection on the original wind power series, and missing data are handled by multiple imputation. Based on the processed data samples, VMD is then used to decompose the power data and reduce noise. An LSTM network is introduced to predict each modal component separately, and the component predictions are summed to reconstruct the final wind power forecast. The experimental results show that an LSTM network trained with the Adam optimizer has better convergence accuracy. The VMD method exhibited superior decomposition results owing to its inherent Wiener filtering, which effectively mitigates noise and prevents mode aliasing. The Mean Absolute Percentage Error (MAPE) was reduced by 9.3508%, indicating that the LSTM network combined with VMD achieves better prediction accuracy.
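The anomaly-detection preprocessing step described above can be sketched with scikit-learn's IsolationForest flagging outliers in a synthetic stand-in for a raw wind power series (the data, contamination rate, and variable names are illustrative assumptions, not the paper's settings):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic stand-in for a raw wind-power series (values are illustrative).
power = rng.normal(loc=1.5, scale=0.3, size=500)  # MW
power[[50, 200, 350]] = [9.0, -4.0, 12.0]         # injected anomalous readings

# Preprocessing stage of the pipeline: flag anomalous samples before VMD/LSTM.
clf = IsolationForest(contamination=0.01, random_state=0)
labels = clf.fit_predict(power.reshape(-1, 1))    # -1 = anomaly, 1 = inlier

anomaly_idx = np.where(labels == -1)[0]
clean = power[labels == 1]                        # series passed downstream
```

The flagged positions would then be treated as missing and filled by multiple imputation before the VMD decomposition.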

  • Article type: Journal Article
    Hyperspectral anomaly detection is used to recognize unusual patterns or anomalies in hyperspectral data. Many spectral-spatial detection methods have been proposed in a cascaded manner; however, they often neglect the complementary characteristics between the spectral and spatial dimensions, which easily leads to a high false alarm rate. To alleviate this issue, a spectral-spatial information fusion (SSIF) method is designed for hyperspectral anomaly detection. First, an isolation forest is exploited to obtain a spectral anomaly map, in which object-level features are constructed with an entropy rate segmentation algorithm. Then, a local spatial saliency detection scheme is proposed to produce the spatial anomaly result. Finally, the spectral and spatial anomaly scores are integrated, followed by domain transform recursive filtering to generate the final detection result. Experiments on five hyperspectral datasets covering ocean and airport scenes demonstrate that the proposed SSIF produces superior detection results compared with other state-of-the-art detection techniques.
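The spectral stage of the fusion can be sketched by scoring every pixel spectrum of a synthetic hyperspectral cube with an isolation forest; the entropy-rate segmentation and spatial stages are omitted, and the cube size, band count, and implanted target are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Synthetic hyperspectral cube: 32x32 pixels, 20 spectral bands (illustrative).
h, w, bands = 32, 32, 20
cube = rng.normal(0.0, 1.0, size=(h, w, bands))
cube[5, 5] += 6.0  # implant one spectrally anomalous pixel

# Spectral stage: treat each pixel spectrum as a sample and score it.
pixels = cube.reshape(-1, bands)
iforest = IsolationForest(n_estimators=100, random_state=0).fit(pixels)
# Negate sklearn's score (higher = more normal) so higher = more anomalous.
spectral_map = -iforest.score_samples(pixels).reshape(h, w)

top = np.unravel_index(np.argmax(spectral_map), spectral_map.shape)
```

In the full SSIF pipeline this map would then be fused with the spatial saliency result and filtered.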

  • Article type: Journal Article
    Long-term electroencephalogram (Long-Term EEG) recording has the capacity to monitor over a long period, making it a valuable tool in medical institutions. However, due to the large volume of patient data, selecting clean data segments from raw Long-Term EEG for further analysis is an extremely time-consuming and labor-intensive task. Furthermore, the various movements of patients during recording make it difficult to denoise parts of the EEG data algorithmically, leading to the rejection of those data. Therefore, tools for quickly rejecting heavily corrupted epochs in Long-Term EEG records are highly beneficial. In this paper, a new reliable and fast automatic artifact rejection method for Long-Term EEG based on Isolation Forest (IF) is proposed. Specifically, the IF algorithm is applied repeatedly to detect outliers in the EEG data, and the boundary of inliers is promptly adjusted using a statistical indicator so that the algorithm proceeds iteratively. The iteration terminates when the distance metric between clean epochs and artifact-corrupted epochs remains unchanged. Six statistical indicators (min, max, median, mean, kurtosis, and skewness) are evaluated by setting each as the centroid used to adjust the boundary during iteration, and the proposed method is compared with several state-of-the-art methods on a retrospectively collected dataset. The experimental results indicate that using the min value of the data as the centroid yields the best performance, and that the proposed method is highly effective and reliable for automatic artifact rejection in Long-Term EEG, as it significantly improves overall data quality. Furthermore, the proposed method surpasses the compared methods on most data segments with poor data quality, demonstrating its superior capacity to enhance the quality of heavily corrupted data. Moreover, owing to the linear time complexity of IF, the proposed method is much faster than the other methods, which is an advantage when dealing with extensive datasets.

  • Article type: Journal Article
    Outlier detection is an important research direction in the field of data mining. To address the unstable detection results and low efficiency caused by the random partitioning of dataset features in the Isolation Forest algorithm, a Cluster-based Improved Isolation Forest (CIIF) algorithm that combines clustering with Isolation Forest is proposed. CIIF first clusters the dataset with the k-means method, selects a specific cluster to construct a selection matrix based on the clustering results, and implements the algorithm's selection mechanism through this matrix; it then builds multiple isolation trees. Finally, outlier scores are computed from the average search length of each sample across the isolation trees, and the Top-n objects with the highest scores are regarded as outliers. Comparative experiments with six algorithms on eleven real datasets show that CIIF performs better. Compared to the Isolation Forest algorithm, the average AUC (area under the ROC curve) of the proposed CIIF algorithm is improved by 7%.
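The ingredients CIIF builds on can be sketched roughly: cluster with k-means, score with an isolation forest, and rank the Top-n outliers. The selection-matrix construction is omitted, so this is only the baseline combination, with made-up data and parameters:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
# 95 inliers around the origin plus 5 scattered outliers (illustrative data).
data = np.vstack([rng.normal(0, 1, (95, 2)), rng.normal(8, 2, (5, 2))])

# Cluster first (CIIF derives a selection matrix from this step; skipped here),
# then score every sample with an isolation forest.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
scores = -IsolationForest(random_state=0).fit(data).score_samples(data)

n = 5
top_n = np.argsort(scores)[::-1][:n]  # indices of the n most outlying samples
```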

  • Article type: Journal Article
    With the development and promotion of wearable devices and their mobile health (mHealth) apps, physiological signals have become a research hotspot. However, the noise in signals obtained in daily life is complex, making automatic analysis difficult and resulting in a high false alarm rate. At present, screening out high-quality segments from huge volumes of data with few labels remains a problem. Signal quality assessment (SQA) is therefore essential and can advance the mining of valuable information from signals.
    The aims of this study were to design an SQA algorithm based on the unsupervised isolation forest model to classify the signal quality into 3 grades: good, acceptable, and unacceptable; validate the algorithm on labeled data sets; and apply the algorithm on real-world data to evaluate its efficacy.
    Data used in this study were collected by a wearable device (SensEcho) from healthy individuals and patients. The observation windows for electrocardiogram (ECG) and respiratory signals were 10 and 30 seconds, respectively. In the experimental procedure, the unlabeled training set was used to train the models. The validation and test sets were labeled according to preset criteria and used to evaluate the classification performance quantitatively. The validation set consisted of 3460 and 2086 windows of ECG and respiratory signals, respectively, whereas the test set was made up of 4686 and 3341 windows of signals, respectively. The algorithm was also compared with self-organizing maps (SOMs) and 4 classic supervised models (logistic regression, random forest, support vector machine, and extreme gradient boosting). One case validation was illustrated to show the application effect. The algorithm was then applied to 1144 cases of ECG signals collected from patients and the detected arrhythmia false alarms were calculated.
    The quantitative results showed that the ECG SQA model achieved 94.97% and 95.58% accuracy on the validation and test sets, respectively, whereas the respiratory SQA model achieved 81.06% and 86.20% accuracy on the validation and test sets, respectively. The algorithm was superior to SOM and achieved moderate performance when compared with the supervised models. The example case showed that the algorithm was able to correctly classify the signal quality even when there were complex pathological changes in the signals. The algorithm application results indicated that some specific types of arrhythmia false alarms such as tachycardia, atrial premature beat, and ventricular premature beat could be significantly reduced with the help of the algorithm.
    This study verified the feasibility of applying the anomaly detection unsupervised model to SQA. The application scenarios include reducing the false alarm rate of the device and selecting signal segments that can be used for further research.
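The three-grade classification idea can be sketched by thresholding isolation forest scores into good/acceptable/unacceptable; the quantile cut-offs, window features, and data below are assumptions, since the study's actual thresholds are not given here:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(4)
# Toy feature vectors for 10-second ECG windows (features are illustrative).
windows = rng.normal(0, 1, (300, 4))
windows[:6] += 10.0  # grossly corrupted windows

clf = IsolationForest(random_state=0).fit(windows)
scores = clf.score_samples(windows)  # lower = more anomalous

# Ad-hoc quantile cut-offs mapping scores to three quality grades.
q_low, q_mid = np.quantile(scores, [0.05, 0.20])
grades = np.where(scores < q_low, "unacceptable",
                  np.where(scores < q_mid, "acceptable", "good"))
```

In deployment, only "good" windows would be passed to downstream arrhythmia analysis, which is how the false alarm reduction is obtained.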

  • Article type: Journal Article
    This paper proposes an indoor positioning method based on iBeacon technology that combines anomaly detection and a weighted Levenberg-Marquardt (LM) algorithm. The proposed solution uses the isolation forest algorithm for anomaly detection on the Received Signal Strength Indicator (RSSI) data collected from different iBeacon base stations, and calculates the anomaly rate of each signal source while eliminating abnormal signals. Then, a weight matrix is built from each anomaly ratio and the RSSI values that remain after eliminating abnormal signals. Finally, the constructed weight matrix and the weighted LM algorithm are combined to solve for the positioning coordinates. An Android smartphone was used to verify the proposed positioning method in an indoor scene. This experimental scenario revealed an average positioning error of 1.540 m and a root mean square error (RMSE) of 1.748 m. A large majority (85.71%) of the positioning point errors were less than 3 m. Furthermore, the RMSE of the proposed method was, respectively, 38.69%, 36.60%, and 29.52% lower than the RMSE of three other methods used for comparison. The experimental results show that the proposed iBeacon-based indoor positioning method can improve the precision of indoor positioning and is highly practical.
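The anomaly-rate step can be sketched as follows: score the RSSI samples of each base station with an isolation forest, compute a per-source anomaly ratio, and derive weights. The simulated RSSI values and the weighting formula are assumptions; the weighted LM solve itself is not shown:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(5)
# Simulated RSSI samples (dBm) from 3 iBeacon base stations (illustrative).
rssi = {b: rng.normal(-70 - 5 * b, 2.0, size=100) for b in range(3)}
rssi[1][:8] = -110.0  # burst of abnormal readings on beacon 1

anomaly_rate, filtered = {}, {}
for b, samples in rssi.items():
    labels = IsolationForest(contamination=0.1, random_state=0).fit_predict(
        samples.reshape(-1, 1))
    anomaly_rate[b] = float(np.mean(labels == -1))  # per-source anomaly ratio
    filtered[b] = samples[labels == 1]              # abnormal signals removed

# Assumed weighting: sources with fewer anomalies get larger weights; these
# weights would populate the weight matrix of the weighted LM solver.
weights = {b: 1.0 - r for b, r in anomaly_rate.items()}
```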

  • Article type: Journal Article
    Visual perception-based methods are a promising means of capturing the surface damage state of wire ropes and hence provide a potential way to monitor their condition. Previous methods mainly concentrated on handcrafted feature-based flaw representations, with a classifier constructed to realize fault recognition. However, the appearance of outdoor wire ropes is seriously affected by noise sources such as lubricating oil, dust, and light. In addition, in real applications, it is difficult to prepare a sufficient amount of flaw data to train a fault classifier. Against this background, this study proposes a new flaw detection method based on a convolutional denoising autoencoder (CDAE) and Isolation Forest (iForest). The CDAE is first trained using an image reconstruction loss. It is then fine-tuned to minimize a cost function that penalizes the iForest-based flaw score difference between normal and flaw data. Real hauling rope images from mine cableways were used to test the effectiveness and advantages of the newly developed method. Comparisons of various methods showed that CDAE-iForest performed better in discriminative feature learning and flaw isolation with a small amount of flaw training data.
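The iForest scoring stage can be sketched in isolation, with random embeddings standing in for CDAE latent features (no autoencoder is trained here; all data, dimensions, and the normal/flaw separation are synthetic assumptions):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(6)
# Stand-ins for CDAE latent features of rope-image patches (no real CDAE here).
normal_feats = rng.normal(0, 1, (150, 16))
flaw_feats = rng.normal(4, 1, (5, 16))

# iForest stage: fit on (mostly normal) features; flaw patches should receive
# higher anomaly scores, which is the gap the fine-tuning loss then widens.
iforest = IsolationForest(random_state=0).fit(normal_feats)
normal_score = -iforest.score_samples(normal_feats)  # higher = more anomalous
flaw_score = -iforest.score_samples(flaw_feats)
```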

  • Article type: Journal Article
    For analyzing traffic anomalies within dashcam videos from the perspective of the ego-vehicle, the agent should spatio-temporally localize the abnormal occasions and regions and give a semantic recounting of what happened. Most existing formulations concentrate on the spatial-temporal aspect and mainly approach this goal by training normal-pattern classifiers/regressors/dictionaries with large-scale labeled data. However, anomalies are context-related, and it is difficult to clearly distinguish the margin between abnormal and normal. This paper proposes a progressive unsupervised driving anomaly detection and recounting (D&R) framework. The highlights are three-fold: (1) We formulate driving anomaly D&R as a temporal-spatial-semantic (TSS) model, which achieves a coarse-to-fine focusing and generates convincing driving anomaly D&R. (2) This work contributes an unsupervised D&R that requires no training data while remaining effective. (3) We introduce traffic saliency, isolation forest, and visual-semantic causal relations of the driving scene to effectively construct the TSS model. Extensive experiments on a driving anomaly dataset with 106 video clips (temporal-spatial-semantically labeled carefully by ourselves) demonstrate superior performance over existing techniques.
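The temporal localization component can be sketched by scoring per-frame features with an isolation forest and thresholding; the saliency and semantic stages are omitted, and the features, threshold, and clip below are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
# Per-frame motion features of a 120-frame dashcam clip (synthetic stand-ins).
frames = rng.normal(0, 1, (120, 3))
frames[60:66] += 5.0  # a short abnormal occasion

# Unsupervised frame scoring, then keep the most anomalous frames (ad-hoc
# 95th-percentile threshold) as the temporally localized abnormal occasion.
scores = -IsolationForest(random_state=0).fit(frames).score_samples(frames)
thr = np.quantile(scores, 0.95)
anomalous_frames = np.where(scores > thr)[0]
```

The spatial and semantic stages would then refine these frames into regions and a recounting.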
