Anonymous Traffic Detection Based on Feature Engineering and Reinforcement Learning [J]

Abstract:

Anonymous networks, designed primarily to protect user identities, have gained prominence as tools for enhancing network security and anonymity. However, they have also become a platform for malicious activity and a source of suspicious attack traffic. Detecting anonymous network traffic has therefore become necessary for defending against unpredictable adversaries on the Internet. Many supervised approaches to identifying anonymous traffic rely on machine learning, but they often require engineered datasets and complex architectures to extract the desired information. Because anonymous network traffic resists traffic analysis and publicly available datasets are scarce, these approaches leave room for improvement in both training efficiency and detection performance. This study uses feature engineering techniques to extract pattern information from static traces of anonymous traffic and to rank feature importance. To leverage these pattern attributes effectively, we developed a reinforcement learning framework comprising four key components: states, actions, rewards, and state transitions. A lightweight system is devised to classify anonymous and non-anonymous network traffic. Two fine-tuned thresholds are then proposed to replace the traditional labels in a binary classification system, so that the system identifies anonymous network traffic without relying on labeled data. Experimental results show that the system can identify anonymous traffic with an accuracy exceeding 80% when based on pattern information.
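
The abstract names the four components of the framework (states, actions, rewards, state transitions) and two thresholds that stand in for labels, but gives no implementation details. The following is a minimal illustrative sketch of how such pieces could fit together, not the authors' method. It assumes each flow has already been reduced to a single hypothetical "pattern score" in [0, 1] derived from the engineered features, and it swaps in plain tabular Q-learning as the learner. The names `to_state`, `reward`, `train`, and `classify`, the bin count, and the threshold values `LOW` and `HIGH` are all assumptions made for illustration.

```python
import random

N_BINS = 10            # hypothetical discretization of the pattern score
ACTIONS = (0, 1)       # 0 = non-anonymous, 1 = anonymous
LOW, HIGH = 0.3, 0.7   # hypothetical thresholds, not the paper's values


def to_state(score):
    """State: the bin index of a pattern score in [0, 1]."""
    return min(int(score * N_BINS), N_BINS - 1)


def reward(score, action):
    """Reward: derived from the two thresholds instead of a ground-truth label."""
    if score >= HIGH:
        return 1.0 if action == 1 else -1.0
    if score <= LOW:
        return 1.0 if action == 0 else -1.0
    return 0.0  # ambiguous region between the thresholds: no signal


def train(scores, episodes=50, alpha=0.1, gamma=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning over a trace of pattern scores."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_BINS)]
    for _ in range(episodes):
        for i in range(len(scores) - 1):
            s = to_state(scores[i])
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[s][x])
            r = reward(scores[i], a)
            # State transition: move on to the next flow in the trace.
            s_next = to_state(scores[i + 1])
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
    return q


def classify(q, score):
    """Pick the action with the highest learned value for this score."""
    s = to_state(score)
    return max(ACTIONS, key=lambda x: q[s][x])
```

For example, `q = train([i / 20 for i in range(21)])` trains on an evenly spaced trace, after which `classify(q, score)` returns 1 for flows the learned policy treats as anonymous and 0 otherwise. In this sketch no labeled flows are needed, because the only supervision signal comes from comparing the score against the two thresholds.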
