malware detection

  • Article type: Journal Article
    Combining memory forensics with deep learning has advanced malware detection, but most existing methods convert process dumps to images for classification and therefore still classify on process byte features. Once malware is loaded into memory, its original byte features change. Compared with byte features, function call features represent malware behavior more robustly. This article therefore proposes ProcGCN, a deep learning model based on DGCNN (Deep Graph Convolutional Neural Network), to detect malicious processes in memory images. First, process dumps are extracted from the whole-system memory image; then the Function Call Graph (FCG) of each process is extracted, and feature vectors for the function nodes in the FCG are generated with a bag-of-words model; finally, the FCG is fed to the ProcGCN model for classification and detection. In experiments on a public dataset, ProcGCN achieved an accuracy of 98.44% and an F1 score of 0.9828. It outperforms existing deep learning methods based on static features and detects faster, which demonstrates the effectiveness of function call features and graph representation learning in memory forensics.
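
The node-feature step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which tokens feed the bag-of-words model, so the sketch assumes each function node carries a list of instruction mnemonics; the toy graph, the function names, and the helper names `build_vocabulary` and `bag_of_words` are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def build_vocabulary(node_tokens):
    """Return the sorted list of distinct tokens seen across all nodes."""
    return sorted({tok for toks in node_tokens.values() for tok in toks})

def bag_of_words(tokens, vocab):
    """Count how often each vocabulary token occurs in one node."""
    counts = Counter(tokens)
    return [counts[tok] for tok in vocab]

# Hypothetical function call graph: caller -> list of callees.
fcg_edges = {
    "main": ["decrypt", "connect"],
    "decrypt": [],
    "connect": ["decrypt"],
}

# Assumed per-node tokens (instruction mnemonics); the paper may use others.
node_tokens = {
    "main": ["push", "call", "call", "ret"],
    "decrypt": ["xor", "xor", "loop", "ret"],
    "connect": ["push", "call", "ret"],
}

vocab = build_vocabulary(node_tokens)
features = {name: bag_of_words(toks, vocab) for name, toks in node_tokens.items()}

for name in fcg_edges:
    print(name, features[name], "->", fcg_edges[name])
```

A graph neural network such as DGCNN would take the edge structure and these per-node vectors as input; that part is not shown here.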

  • Article type: Journal Article
    The increasing use of interconnected devices within the Internet of Things (IoT) and Industrial IoT (IIoT) has significantly enhanced efficiency and utility in both personal and industrial settings, but it has also heightened cybersecurity vulnerabilities, particularly through IoT malware. This paper explores one-class classification, an unsupervised learning method that is especially suitable for unlabeled data, dynamic environments, and malware detection treated as a form of anomaly detection. We introduce the TF-IDF method for transforming nominal features into numerical formats that avoid information loss and manage dimensionality effectively, which is crucial for enhancing pattern recognition when combined with n-grams. Furthermore, we compare multi-class models, trained on both benign and malicious NetFlow samples, with one-class models, including Isolation Forest and a deep autoencoder, trained exclusively on benign NetFlow samples. Using one-class classification, we achieve 100% recall with precision rates above 80% and 90% across various test datasets. These models show the adaptability of unsupervised learning, especially one-class classification, to the evolving malware threats in the IoT domain, offering insights into enhancing IoT security frameworks and suggesting directions for future research in this critical area.
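
As a rough illustration of turning nominal NetFlow fields into numbers with TF-IDF over n-grams, the sketch below uses character bigrams and a smoothed IDF. The field strings, the bigram choice, and the helper names are assumptions made for illustration. The `unseen_ratio` score is a deliberately crude stand-in for a one-class model fitted on benign data only; it is not Isolation Forest or an autoencoder.

```python
import math
from collections import Counter

def char_ngrams(text, n=2):
    """Split a string into overlapping character n-grams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def fit_idf(docs, n=2):
    """Smoothed inverse document frequency, learned from benign samples only."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc, n)))
    total = len(docs)
    return {g: math.log((1 + total) / (1 + c)) + 1.0 for g, c in df.items()}

def tfidf_vector(doc, idf, n=2):
    """Sparse TF-IDF vector; n-grams never seen in training are dropped."""
    tf = Counter(char_ngrams(doc, n))
    return {g: count * idf[g] for g, count in tf.items() if g in idf}

def unseen_ratio(doc, idf, n=2):
    """Crude novelty score: share of n-grams absent from the benign vocabulary."""
    grams = char_ngrams(doc, n)
    if not grams:
        return 0.0
    return sum(g not in idf for g in grams) / len(grams)

# Hypothetical nominal features joined into one string per flow.
benign = ["tcp:80:http", "tcp:443:https", "udp:53:dns"]
idf = fit_idf(benign)

print(tfidf_vector("tcp:80:http", idf))
print(unseen_ratio("tcp:80:http", idf))
print(unseen_ratio("icmp:0:xx", idf))
```

Rare n-grams receive a larger IDF weight than common ones, which is what lets a downstream one-class model separate unusual flows from routine traffic.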

  • Article type: Journal Article
    The rapid expansion of AI-enabled Internet of Things (IoT) devices presents significant security challenges, impacting both privacy and organizational resources. The dynamic increase in big data generated by IoT devices poses a persistent problem, particularly when decisions must be made on continuously growing data. To address this challenge in a dynamic environment, this study introduces a specialized BERT-based Feed Forward Neural Network Framework (BEFNet) designed for IoT scenarios. In this evaluation, a novel framework with distinct modules is employed for a thorough analysis of 8 datasets, each representing a different type of malware. BEFSONet is optimized using the Spotted Hyena Optimizer (SO), highlighting its adaptability to diverse shapes of malware data. Thorough exploratory analyses and comparative evaluations underscore BEFSONet's exceptional performance metrics, achieving 97.99% accuracy, 97.96 Matthews Correlation Coefficient, 97% F1-Score, 98.37% Area under the ROC Curve (AUC-ROC), and 95.89 Cohen's Kappa. This research positions BEFSONet as a robust defense mechanism in the era of IoT security, offering an effective solution to evolving challenges in dynamic decision-making environments.
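
For readers less familiar with the metrics listed in this abstract, here is a small sketch that computes accuracy, F1, the Matthews Correlation Coefficient, and Cohen's Kappa from a binary confusion matrix. The counts are invented for illustration, and the binary setting is a simplification of the multi-dataset evaluation the abstract describes. Both MCC and Kappa lie in [-1, 1], so figures such as 97.96 are presumably scaled by 100.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Accuracy, F1, MCC and Cohen's kappa from binary confusion counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # MCC: correlation between predicted and true labels.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {"accuracy": accuracy, "f1": f1, "mcc": mcc, "kappa": kappa}

# Invented counts: 100 samples, 45 true positives, 45 true negatives.
metrics = binary_metrics(tp=45, fp=5, fn=5, tn=45)
print(metrics)
```

On balanced data like this toy example, accuracy (0.9) overstates agreement relative to MCC and Kappa (both 0.8), which is why the latter two are often reported alongside it.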

  • Article type: Journal Article
    The Internet has become a vital source of knowledge and communication in recent times. Continuous technological advancements have changed the way businesses operate, and everyone today lives in the digital world of engineering. Because of the Internet of Things (IoT) and its applications, people's impressions of the information revolution have improved. Malware detection and categorization are becoming more of a problem in the cybersecurity world. As a result, strong security on the Internet could protect billions of internet users from harmful behavior. Several types of deep learning models are used in malware detection and classification; however, they still have limitations. This study explores malware detection and classification using modern machine learning (ML) approaches, including K-Nearest Neighbors (KNN), Extra Tree (ET), Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), and neural network Multilayer Perceptron (nnMLP). The study uses the publicly available UNSW-NB15 dataset. We applied feature encoding to convert the dataset into purely numeric values. After that, we applied an entropy-based feature selection method named Term Frequency-Inverse Document Frequency (TFIDF) to select the best features. The dataset is then balanced and provided to the ML models for classification. The study concludes that Random Forest, out of all tested ML models, yielded the best accuracy of 97.68%.

  • Article type: Journal Article
    In recent years, the number and sophistication of malware attacks on computer systems have increased significantly. One technique employed by malware authors to evade detection and analysis, known as Heaven's Gate, enables 64-bit code to run within a 32-bit process. Heaven's Gate exploits a feature in the operating system that allows the transition from a 32-bit mode to a 64-bit mode during execution, enabling the malware to evade detection by security software designed to monitor only 32-bit processes. Heaven's Gate poses significant challenges for existing security tools, including dynamic binary instrumentation (DBI) tools, which are widely used for program analysis, unpacking, and de-virtualization. In this paper, we provide a comprehensive analysis of the Heaven's Gate technique. We also propose a novel approach to bypass the Heaven's Gate technique using black-box testing. Our experimental results show that the proposed approach effectively bypasses and prevents the Heaven's Gate technique and strengthens the capabilities of DBI tools in combating advanced malware threats.
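
To make the 32-bit to 64-bit transition more concrete, the sketch below scans raw x86 code bytes for far jumps or far calls whose target segment selector is 0x33, the 64-bit code segment that 32-bit (WoW64) processes on 64-bit Windows can switch into. This is a naive byte-pattern heuristic written for illustration, not the black-box approach proposed in the paper: it does no disassembly, can report false positives inside data, and misses variants such as a push of 0x33 followed by a far return. The function name and sample bytes are hypothetical.

```python
FAR_JMP = 0xEA   # jmp ptr16:32 (opcode, 4-byte offset, 2-byte selector)
FAR_CALL = 0x9A  # call ptr16:32, same operand layout

def find_far_transfers(code, selector=0x33):
    """Return (offset, kind, target) for far jmp/call into the given selector."""
    wanted = selector.to_bytes(2, "little")
    hits = []
    # Each candidate instruction is 7 bytes long.
    for i in range(len(code) - 6):
        opcode = code[i]
        if opcode in (FAR_JMP, FAR_CALL) and code[i + 5:i + 7] == wanted:
            kind = "jmp" if opcode == FAR_JMP else "call"
            target = int.from_bytes(code[i + 1:i + 5], "little")
            hits.append((i, kind, target))
    return hits

# Hypothetical snippet: nop; jmp 0x33:0x00402010; ret
sample = bytes([0x90, 0xEA, 0x10, 0x20, 0x40, 0x00, 0x33, 0x00, 0xC3])
hits = find_far_transfers(sample)

for offset, kind, target in hits:
    print(f"far {kind} at +{offset:#x} -> 0x33:{target:#010x}")
```

A tool that only models 32-bit execution loses track of control flow after such a transfer, which is the gap the abstract describes for DBI tools.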

  • Article type: Journal Article
    High-quality datasets are crucial for building realistic and high-performance supervised malware detection models. Currently, one of the major challenges of machine learning-based solutions is the scarcity of datasets that are both representative and of high quality. To foster future research and provide updated and public data for comprehensive evaluation and comparison of existing classifiers, we introduce the MH-100K dataset [1], an extensive collection of Android malware information comprising 101,975 samples. It encompasses a main CSV file with valuable metadata, including the SHA256 hash (APK's signature), file name, package name, Android's official compilation API, 166 permissions, 24,417 API calls, and 250 intents. Moreover, the MH-100K dataset features an extensive collection of files containing useful metadata from the VirusTotal analysis. This repository of information can serve future research by enabling the analysis of antivirus scan result patterns to discern the prevalence and behaviour of various malware families. Such analysis can help extend existing malware taxonomies, identify novel variants, and explore malware evolution over time.

  • Article type: Journal Article
    The Internet of Things (IoT) environment demands a malware detection (MD) framework for protecting sensitive data from unauthorized access. The study intends to develop an image-based MD framework. The authors apply image conversion and enhancement techniques to convert malware binaries into RGB images. You Only Look Once (YOLOv7) is employed to extract the key features from the malware images. Harris Hawks optimization is used to optimize the DenseNet161 model, which classifies images as malware or benign. The IoT malware and Virusshare datasets are used to evaluate the proposed framework's performance. The outcome reveals that the proposed framework outperforms current MD frameworks. The framework achieves an accuracy and F1-score of 98.65 and 98.5 on the IoT malware dataset and 97.3 and 96.63 on the Virusshare dataset. In addition, it achieves areas under the receiver operating characteristic curve and the precision-recall curve of 0.98 and 0.85 on the IoT malware dataset and 0.97 and 0.84 on the Virusshare dataset. The study's outcome reveals that the proposed framework can be deployed in the IoT environment to protect resources.
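
The abstract does not specify how binaries are mapped to RGB images. One common and simple mapping, sketched below under that assumption, treats every three consecutive bytes as one (R, G, B) pixel and lays the pixels out in rows of a fixed width, zero-padding the tail. The function name and the default width are hypothetical, and no enhancement step is shown.

```python
def bytes_to_rgb_rows(data, width=4):
    """Map raw bytes to rows of (R, G, B) pixels, zero-padding the tail."""
    # Pad so the byte count is a multiple of 3 (one pixel = 3 bytes).
    padded = data + bytes((-len(data)) % 3)
    pixels = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]

    # Pad the last row with black pixels so every row has the same width.
    while len(pixels) % width:
        pixels.append((0, 0, 0))

    return [pixels[r:r + width] for r in range(0, len(pixels), width)]

# Hypothetical 10-byte "binary" rendered as a 2-pixel-wide image.
rows = bytes_to_rgb_rows(bytes(range(10)), width=2)
for row in rows:
    print(row)
```

The resulting rows could be handed to any image library to save or resize the picture before feature extraction; real pipelines usually pick the width from the file size.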

  • Article type: Journal Article
    In Machine Learning, the datasets used to build models are one of the main factors limiting what these models can achieve and how good their predictive performance is. Machine Learning applications for cyber-security or computer security are numerous, including cyber threat mitigation and security infrastructure enhancement through pattern recognition, real-time attack detection, and in-depth penetration testing. Therefore, for these applications in particular, the datasets used to build the models must be carefully considered so that they are representative of real-world data. However, because of the scarcity of labelled data and the cost of manually labelling positive examples, there is a growing corpus of literature utilizing Semi-Supervised Learning with cyber-security data repositories. In this work, we provide a comprehensive overview of publicly available data repositories and datasets used for building computer security or cyber-security systems based on Semi-Supervised Learning, where only a few labels are necessary or available for building strong models. We highlight the strengths and limitations of the data repositories and datasets and provide an analysis of the performance assessment metrics used to evaluate the built models. Finally, we discuss open challenges and provide future research directions for using cyber-security datasets and evaluating models built upon them.

  • Article type: Journal Article
    The tremendous growth in online activity and the Internet of Things (IoT) has led to an increase in cyberattacks. Malware has infiltrated at least one device in almost every household. Various malware detection methods that use shallow or deep IoT techniques have been discovered in recent years. Deep learning models with a visualization method are the most commonly used strategy in most works. This method has the benefit of automatically extracting features, requiring less technical expertise, and using fewer resources during data processing. Training deep learning models that generalize effectively without overfitting is not feasible or appropriate with large datasets and complex architectures. In this paper, a novel ensemble model, Stacked Ensemble-autoencoder, GRU, and MLP (SE-AGM), is proposed. It is composed of three lightweight neural network models (an autoencoder, a GRU, and an MLP) and is trained on 25 essential, encoded features extracted from the benchmark MalImg dataset for classification. The GRU model was tested for its suitability in malware detection because it is less commonly used in this domain. The proposed model used a concise set of malware features for training and classifying the malware classes, which reduced time and resource consumption in comparison to other existing models. The novelty lies in the stacked ensemble method, where the output of one intermediate model serves as input for the next model, thereby refining the features as compared to the general notion of an ensemble approach. Inspiration was drawn from earlier image-based malware detection works and transfer learning ideas. To extract features from the MalImg dataset, a CNN-based transfer learning model that was trained from scratch on domain data was used. Data augmentation was an important step in the image processing stage to investigate its effect on classifying grayscale malware images in the MalImg dataset. SE-AGM outperformed existing approaches on the benchmark MalImg dataset with an average accuracy of 99.43%, demonstrating that our method was on par with or even surpassed them.

  • Article type: Journal Article
    Digitization of most of the services that people use in their everyday life has, among other things, led to increased needs for cybersecurity. As digital tools increase day by day and new software and hardware launch out of the box, detection of existing but undisclosed vulnerabilities, or zero-days as they are commonly known, becomes one of the most challenging situations for cybersecurity experts. Zero-day vulnerabilities, which can be found in almost every newly launched software and/or hardware product, can be exploited instantly by malicious actors with different motives, posing threats for end-users. In this context, this study proposes and describes a holistic methodology starting from the generation of zero-day-type, yet realistic, data in tabular format and concluding with the evaluation of a Neural Network zero-day attack detector that is trained with and without synthetic data. This methodology involves the design and employment of Generative Adversarial Networks (GANs) for synthetically generating a new and larger dataset of zero-day attack data. The dataset newly generated by the Zero-Day GAN (ZDGAN) is then used to train and evaluate a Neural Network classifier for zero-day attacks. The results show that the generation of zero-day attack data in tabular format reaches an equilibrium after about 5000 iterations and produces data that are almost identical to the original data samples. Finally, the Neural Network model trained with the dataset containing the ZDGAN-generated samples outperformed the same model trained with only the original dataset, achieving high validation accuracy and minimal validation loss.