One-class classification

  • Article type: Journal Article
    The increasing use of interconnected devices in the Internet of Things (IoT) and Industrial IoT (IIoT) has significantly enhanced efficiency and utility in both personal and industrial settings, but it has also heightened cybersecurity vulnerabilities, particularly through IoT malware. This paper explores one-class classification, an unsupervised learning method that is especially suitable for unlabeled data, dynamic environments, and malware detection, itself a form of anomaly detection. We introduce the TF-IDF method for transforming nominal features into numerical formats that avoid information loss and manage dimensionality effectively, which is crucial for enhancing pattern recognition when combined with n-grams. Furthermore, we compare the performance of multi-class and one-class classification models, including Isolation Forest and a deep autoencoder, trained respectively on both benign and malicious NetFlow samples and exclusively on benign NetFlow samples. Using one-class classification, we achieve 100% recall with precision rates above 80% and 90% across various test datasets. These models show the adaptability of unsupervised learning, especially one-class classification, to the evolving malware threats in the IoT domain, offering insights into enhancing IoT security frameworks and suggesting directions for future research in this critical area.
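    The pipeline described in this abstract (TF-IDF over n-grams of nominal features, then a detector fitted on benign traffic only) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the feature strings are hypothetical, and a cosine distance to the benign centroid stands in for the Isolation Forest and deep autoencoder named in the abstract.

```python
import math
from collections import Counter

def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def fit_idf(docs, n=2):
    """Smoothed inverse document frequency, learned from training strings."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc, n)))
    total = len(docs)
    return {g: math.log((1 + total) / (1 + c)) + 1.0 for g, c in df.items()}

def tfidf_vector(doc, idf, n=2):
    """L2-normalised sparse TF-IDF vector; n-grams unseen in training are dropped."""
    tf = Counter(g for g in char_ngrams(doc, n) if g in idf)
    vec = {g: c * idf[g] for g, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else {}

def centroid(vectors):
    """Mean of sparse vectors."""
    acc = Counter()
    for v in vectors:
        for g, w in v.items():
            acc[g] += w / len(vectors)
    return dict(acc)

def anomaly_score(vec, center):
    """1 - cosine similarity to the benign centroid (higher = more unusual)."""
    dot = sum(w * center.get(g, 0.0) for g, w in vec.items())
    cnorm = math.sqrt(sum(w * w for w in center.values()))
    return 1.0 - dot / cnorm if cnorm and vec else 1.0

# Hypothetical nominal NetFlow fields (protocol plus flags), benign samples only.
benign = ["tcp ACK PSH", "tcp ACK", "tcp ACK PSH FIN", "udp DNS"]
idf = fit_idf(benign)
benign_vecs = [tfidf_vector(s, idf) for s in benign]
center = centroid(benign_vecs)
# Assumed rule: the threshold is the largest score seen on benign training data.
threshold = max(anomaly_score(v, center) for v in benign_vecs)

def is_anomalous(text):
    """Flag a sample whose score exceeds the benign-only threshold."""
    return anomaly_score(tfidf_vector(text, idf), center) > threshold
```

    Because no malicious sample is needed to fit `idf`, `center`, or `threshold`, the same sketch applies when only benign traffic is labeled; a stricter threshold would trade recall for precision.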

  • Article type: Journal Article
    We present a system for anomaly detection in histopathological images. In histology, normal samples are usually abundant, whereas anomalous (pathological) cases are scarce or not available. Under such settings, one-class classifiers trained on healthy data can detect out-of-distribution anomalous samples. Such approaches, combined with pre-trained Convolutional Neural Network (CNN) representations of images, were previously employed for anomaly detection (AD). However, pre-trained off-the-shelf CNN representations may not be sensitive to abnormal conditions in tissues, while natural variations of healthy tissue may result in distant representations. To adapt representations to relevant details in healthy tissue, we propose training a CNN on an auxiliary task that discriminates healthy tissue of different species, organs, and staining reagents. Almost no additional labeling workload is required, since healthy samples come automatically with the aforementioned labels. During training we enforce compact image representations with a center-loss term, which further improves representations for AD. The proposed system outperforms established AD methods on a published dataset of liver anomalies. Moreover, it provides results comparable to those of conventional methods specifically tailored for quantification of liver anomalies. We show that our approach can be used for toxicity assessment of candidate drugs at early development stages and thereby may reduce expensive late-stage drug attrition.
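    The abstract does not state its training objective. A center-loss term is commonly written in the form below, given here only as an assumed illustration: $f(x_i)$ is the CNN representation of image $x_i$, $y_i$ its auxiliary label (species, organ, stain), $c_{y_i}$ the learned center of that class, $\lambda$ a weighting factor, and $\mathcal{L}_{\mathrm{cls}}$ the auxiliary classification loss. None of these symbols comes from the source.

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{cls}}
  + \frac{\lambda}{2} \sum_{i=1}^{m} \left\lVert f(x_i) - c_{y_i} \right\rVert_2^2
```

    The second term pulls each representation toward its class center, which is what makes the healthy-tissue clusters compact.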

  • Article type: Journal Article
    A new strategy has been developed to enhance the assessment of the authenticity of whole grape juice within the organic class. This approach is based on the analysis of data from different analytical sources. The novel method employs a multiblock regression technique, specifically the one-class partial least squares (OC-PLS) classifier, to establish a relationship between each predictor block and the response variable. Sequential calculations are performed after orthogonalization with respect to the preceding regression scores. The proposed method has demonstrated effectiveness in detecting targeted samples. On the test set, the best models achieved up to 100% sensitivity, 89% specificity, and 83% accuracy. For comparison with the multiblock models, the DD-SIMCA method was employed, but it yielded inferior results when applied to visible data. The multiblock approach proved efficient at combining datasets from varied sources to classify organic grape juice.

  • Article type: Journal Article
    Geographical Indication (GI) agricultural products possess specific geographical origins and high quality, and these important protective trademarks require an effective method for tracing geographical origin. In this study, authentication models for Changshan camellia oil were developed using fatty acid profiles and one-class classification methods, including data-driven soft independent modeling of class analogy (DD-SIMCA) and one-class partial least squares (OCPLS), and compared with traditional two-class classification models. The results indicated that the prediction errors of three two-class classification models were 63.8%, 12.1%, and 65.2%, respectively, for samples from outside the targeted geographical origins. By contrast, the one-class classification models could completely differentiate Changshan from non-Changshan camellia oils, even those from adjacent counties. Moreover, compared with traditional indicators based on mineral elements, the model built from fatty acid profiles possessed higher sensitivity and specificity. It also offers a reference strategy for the geographical origin identification of other high-value oils or foods.

  • Article type: Journal Article
    Unsupervised anomaly detection (UAD) methods are trained with normal (or healthy) images only, but during testing, they are able to classify normal and abnormal (or disease) images. UAD is an important medical image analysis (MIA) method to be applied in disease screening problems because the training sets available for those problems usually contain only normal images. However, the exclusive reliance on normal images may result in the learning of ineffective low-dimensional image representations that are not sensitive enough to detect and segment unseen abnormal lesions of varying size, appearance, and shape. Pre-training UAD methods with self-supervised learning, based on computer vision techniques, can mitigate this challenge, but such approaches are sub-optimal because they do not explore domain knowledge for designing the pretext tasks, and their contrastive learning losses do not try to cluster the normal training images, which may result in a sparse distribution of normal images that is ineffective for anomaly detection. In this paper, we propose a new self-supervised pre-training method for MIA UAD applications, named Pseudo Multi-class Strong Augmentation via Contrastive Learning (PMSACL). PMSACL consists of a novel optimisation method that contrasts a normal image class against multiple pseudo classes of synthesised abnormal images, with each class enforced to form a dense cluster in the feature space. In the experiments, we show that our PMSACL pre-training improves the accuracy of SOTA UAD methods on many MIA benchmarks using colonoscopy, fundus screening and Covid-19 Chest X-ray datasets.

  • Article type: Journal Article
    It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem (Janssens and Postma, in: Proceedings of the 18th annual Belgian-Dutch on machine learning, pp 56-64, 2009; Janssens et al. in: Proceedings of the 2009 ICMLA international conference on machine learning and applications, IEEE Computer Society, pp 147-153, 2009. 10.1109/ICMLA.2009.16). In this paper, we focus on the comparison of one-class classification algorithms with such adapted unsupervised outlier detection methods, improving on previous comparison studies in several important aspects. We study a number of one-class classification and unsupervised outlier detection methods in a rigorous experimental setup, comparing them on a large number of datasets with different characteristics, using different performance measures. In contrast to previous comparison studies, where the models (algorithms, parameters) are selected by using examples from both classes (outlier and inlier), here we also study and compare different approaches for model selection in the absence of examples from the outlier class, which is more realistic for practical applications since labeled outliers are rarely available. Our results showed that, overall, SVDD and GMM are top performers, regardless of whether the ground truth is used for parameter selection. However, in specific application scenarios, other methods exhibited better performance. Combining one-class classifiers into ensembles showed better accuracy than individual methods, as long as the ensemble members are properly selected.
    The online version contains supplementary material available at 10.1007/s10618-023-00931-x.
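    To make the idea of fitting and tuning a one-class model without any outlier examples concrete, here is a minimal sketch. It is an assumption-laden simplification rather than the study's protocol: a single diagonal Gaussian stands in for a GMM, and the decision threshold is taken as a quantile of the inlier training scores. All names and data points are hypothetical.

```python
import math

def fit_gaussian(rows):
    """Estimate per-feature mean and variance from inlier rows only."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    # A small constant keeps the variance strictly positive.
    var = [sum((r[j] - mean[j]) ** 2 for r in rows) / n + 1e-9 for j in range(d)]
    return mean, var

def outlier_score(row, mean, var):
    """Squared standardised distance to the inlier mean (higher = more outlying)."""
    return sum((x - m) ** 2 / v for x, m, v in zip(row, mean, var))

def quantile_threshold(scores, q=0.95):
    """Smallest training score that at least a fraction q of inliers do not exceed."""
    ordered = sorted(scores)
    idx = max(0, min(len(ordered) - 1, math.ceil(q * len(ordered)) - 1))
    return ordered[idx]

# Hypothetical two-feature inlier sample; no outliers are used anywhere below.
inliers = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (1.1, 2.0), (1.0, 2.2), (0.8, 1.8)]
mean, var = fit_gaussian(inliers)
threshold = quantile_threshold([outlier_score(r, mean, var) for r in inliers])

def is_outlier(row):
    """Apply the threshold chosen from inlier data alone."""
    return outlier_score(row, mean, var) > threshold
```

    The quantile `q` fixes the expected false-alarm rate on inliers in advance, which is one simple way to set a parameter when labeled outliers are unavailable.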

  • Article type: Journal Article
    Defect inspection is important to ensure consistent quality and efficiency in industrial manufacturing. Recently, machine vision systems integrating artificial intelligence (AI)-based inspection algorithms have exhibited promising performance in various applications, but in practice they often suffer from data imbalance. This paper proposes a defect inspection method using a one-class classification (OCC) model to deal with imbalanced datasets. A two-stream network architecture consisting of global and local feature extractor networks is presented, which can alleviate the representation collapse problem of OCC. By combining an object-oriented invariant feature vector with a training-data-oriented local feature vector, the proposed two-stream network model prevents the decision boundary from collapsing to the training dataset and obtains an appropriate decision boundary. The performance of the proposed model is demonstrated in the practical application of automotive-airbag bracket-welding defect inspection. The effects of the classification layer and two-stream network architecture on the overall inspection accuracy were clarified by using image samples collected in a controlled laboratory environment and from a production site. The results are compared with those of a previous classification model, demonstrating that the proposed model can improve the accuracy, precision, and F1 score by up to 8.19%, 10.74%, and 4.02%, respectively.
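    For readers unfamiliar with the three metrics reported here, the sketch below computes them from confusion-matrix counts. The counts are hypothetical and chosen only to mimic an imbalanced inspection set; they are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical imbalanced set: 10 defective parts among 100 inspected.
metrics = classification_metrics(tp=8, fp=2, tn=88, fn=2)
```

    With these counts accuracy is dominated by the many defect-free parts, while precision and F1 depend only on how the few defects are handled, which is why imbalanced problems are usually reported with all three.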

  • Article type: Journal Article
    Cachaça is a Brazilian beverage obtained from the fermentation of sugarcane juice (sugarcane spirit) and is considered one of the most consumed alcoholic beverages in the world, with a strong economic impact on northeastern Brazil, more specifically in the Brejo. This microregion produces high-quality sugarcane spirits, a quality associated with its edaphoclimatic conditions. In this sense, sample authentication and quality control analyses that use solvent-free, environmentally friendly, rapid and non-destructive methods are advantageous for cachaça producers and the production chain. Thus, in this work, commercial cachaça samples were analyzed by near-infrared spectroscopy (NIRS) and classified by geographical origin using the one-class classifiers Data-Driven Soft Independent Modelling of Class Analogy (DD-SIMCA) and One-Class Partial Least Squares (OCPLS), and the quality parameters alcohol content and density were predicted with different chemometric algorithms. A total of 150 sugarcane spirit samples were purchased from the Brazilian retail market, 100 from Brejo and 50 from other regions of Brazil. The one-class chemometric classification model was obtained with DD-SIMCA using a Savitzky-Golay first derivative with a 9-point window and a 1st degree polynomial as preprocessing; sensitivity was 96.70% and specificity 100% in the spectral range 7,290-11,726 cm-1. Satisfactory results were obtained for the density model: the iSPA-PLS algorithm with baseline offset as preprocessing gave a root mean square error of prediction (RMSEP) of 0.0011 mg/L and a relative error of prediction (REP) of 0.12%. The chemometric model for alcohol content prediction used the iSPA-PLS algorithm with a Savitzky-Golay first derivative, 9-point window and 1st degree polynomial as preprocessing, obtaining RMSEP and REP of 0.69 and 1.81% (v/v), respectively. Both models used the spectral range 7,290-11,726 cm-1. The results reflect the potential of vibrational spectroscopy coupled with chemometrics to build reliable models for identifying the geographical origin of cachaça samples and for predicting their quality parameters.
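    The preprocessing named in this abstract (Savitzky-Golay first derivative, 9-point window, 1st degree polynomial) reduces to a least-squares slope in each window, which the sketch below computes directly. It is an illustration under assumptions: the function name is hypothetical, points are assumed evenly spaced, and the edge points that lack a full window are simply dropped.

```python
def savgol_first_derivative(y, window=9, spacing=1.0):
    """First derivative via a straight line least-squares fitted in each window.

    For polynomial degree 1 the Savitzky-Golay derivative at the window
    centre equals the slope of that fitted line. Only interior points with
    a full window are returned.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    half = window // 2
    offsets = range(-half, half + 1)
    # Sum of squared offsets (60 for a 9-point window), scaled by the spacing.
    denom = sum(k * k for k in offsets) * spacing
    return [
        sum(k * y[i + k] for k in offsets) / denom
        for i in range(half, len(y) - half)
    ]

# Synthetic check signal: a straight line with slope 2 and offset 1.
signal = [2 * x + 1 for x in range(20)]
derivative = savgol_first_derivative(signal)
```

    A constant added to the signal leaves the result unchanged, which is why a derivative step also removes baseline offsets between spectra.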

  • Article type: Journal Article
    In recent years, copy detection patterns (CDP) have attracted a lot of attention as a link between the physical and digital worlds, which is of great interest for Internet of Things and brand protection applications. However, the security of CDP in terms of their reproducibility by unauthorized parties, or clonability, remains largely unexplored. In this respect, this paper addresses the problem of anti-counterfeiting of physical objects and aims to investigate the authentication aspects of modern CDP and their resistance to illegal copying from a machine learning perspective. Special attention is paid to reliable authentication under real-life verification conditions, when the codes are printed on an industrial printer and enrolled via modern mobile phones under regular light conditions. The theoretical and empirical investigation of authentication aspects of CDP is performed with respect to four types of copy fakes from the point of view of (i) multi-class supervised classification as a baseline approach and (ii) one-class classification as a real-life application case. The obtained results show that modern machine-learning approaches and the technical capacities of modern mobile phones make it possible to reliably authenticate CDP on end-user mobile phones under the considered classes of fakes.

  • Article type: Journal Article
    Network intrusion detection technology is key to cybersecurity in the Internet of Things (IoT). Traditional intrusion detection systems based on binary or multi-class classification can detect known attacks, but they struggle to resist unknown attacks (such as zero-day attacks). Unknown attacks require security experts to confirm them and retrain the model, but new models do not stay up to date. This paper proposes a Lightweight Intelligent NIDS using a One-Class Bidirectional GRU Autoencoder and Ensemble Learning. It can not only accurately distinguish normal from abnormal data, but also identify unknown attacks as the type most similar to known attacks. First, a One-Class Classification model based on a Bidirectional GRU Autoencoder is introduced. This model is trained with normal data and has high prediction accuracy on abnormal data and unknown attack data. Second, a multi-class recognition method based on ensemble learning is proposed. It uses Soft Voting to evaluate the results of various base classifiers and identifies unknown attacks (novelty data) as the type most similar to known attacks, so that anomaly classification becomes more accurate. Experiments are conducted on the WSN-DS, UNSW-NB15, and KDD CUP99 datasets, and the recognition rates of the proposed models on the three datasets are raised to 97.91%, 98.92%, and 98.23%, respectively. The results verify the feasibility, efficiency, and portability of the proposed algorithm.
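    The Soft Voting step mentioned in this abstract can be sketched as follows: each base classifier outputs a probability per known attack type, the probabilities are averaged, and the sample is assigned to the type with the highest mean. The labels and probability values below are hypothetical, and the paper's base classifiers are not reproduced.

```python
def soft_vote(probabilities, labels):
    """Average per-class probabilities over base classifiers; return the top label.

    `probabilities` holds one row per base classifier, ordered like `labels`.
    """
    if not probabilities or any(len(p) != len(labels) for p in probabilities):
        raise ValueError("each classifier must give one probability per label")
    n_models = len(probabilities)
    averaged = [sum(p[j] for p in probabilities) / n_models
                for j in range(len(labels))]
    best = max(range(len(labels)), key=averaged.__getitem__)
    return labels[best], averaged

# Hypothetical attack types and outputs of three base classifiers for one flow.
labels = ["DoS", "Probe", "R2L"]
outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
predicted, mean_probs = soft_vote(outputs, labels)
```

    Unlike majority voting, this keeps each classifier's confidence, so a sample flagged as novel by the one-class stage can still be mapped to the known attack type it most resembles.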