MeSH: Software; Internet; Algorithms; Humans; Deep Learning

Source: DOI:10.1371/journal.pone.0304066

Abstract:
With the development of the Internet in recent years, the attribution classification of APT malware remains an important problem. Existing methods do not yet consider the DLLs and hidden file addresses involved during execution, and they fall short in capturing the local and global correlations of event behaviors. Compared with the structural features of binary code, opcode features reflect runtime instructions, but existing opcode-based approaches do not consider the repeated reuse of local operation behaviors within the same APT organization. Attribution classification based on a single feature is also more easily affected by obfuscation techniques. To address these issues, (1) an event behavior graph based on API instructions and related operations is constructed, and a GNN model is used to capture the execution traces on the host; (2) ImageCNTM captures the local spatial correlation and continuous long-term dependency of opcode images; and (3) the word frequency and behavior features are concatenated and fused into a multi-feature, multi-input deep learning model. We collected a publicly available dataset of APT malware to evaluate our method. The single-feature models reached attribution classification results of 89.24% and 91.91%. Compared to the single-feature classifiers, the multi-feature fusion model achieves better classification performance.
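The concatenation step in (3) can be illustrated with a minimal sketch. It is not the paper's implementation: the vocabularies, the function names, and the use of weighted out-degree as a stand-in for a learned GNN embedding are all assumptions made here for illustration, and the abstract does not specify how the event behavior graph or the word-frequency features are actually built.

```python
from collections import Counter

# Hypothetical vocabularies; the paper's real feature sets are not given in the abstract.
OPCODE_VOCAB = ["mov", "push", "call", "xor", "jmp"]
API_VOCAB = ["CreateFileW", "LoadLibraryA", "RegSetValueExA", "WriteFile"]


def opcode_frequency(opcodes):
    """Normalized term-frequency vector over OPCODE_VOCAB."""
    counts = Counter(opcodes)
    total = sum(counts[op] for op in OPCODE_VOCAB) or 1  # avoid division by zero
    return [counts[op] / total for op in OPCODE_VOCAB]


def behavior_graph(api_trace):
    """Directed graph as {(src, dst): count}, linking consecutive API calls."""
    return Counter(zip(api_trace, api_trace[1:]))


def graph_features(edges):
    """Weighted out-degree per API node (assumed stand-in for a GNN embedding)."""
    out_degree = Counter()
    for (src, _dst), weight in edges.items():
        out_degree[src] += weight
    return [float(out_degree[api]) for api in API_VOCAB]


def fuse(opcodes, api_trace):
    """Concatenate the word-frequency and behavior feature vectors."""
    return opcode_frequency(opcodes) + graph_features(behavior_graph(api_trace))
```

Calling `fuse(opcodes, api_trace)` returns one flat vector whose first `len(OPCODE_VOCAB)` entries are opcode frequencies and whose remaining entries describe the behavior graph; a downstream classifier would take this fused vector as input.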