Keywords: Classification; Machine learning; Phishing; Phishing detection

Source: DOI:10.7717/peerj-cs.2131

Abstract:
The advent of Internet technologies has driven the growth of electronic trading and online transactions, and with it a rise in unauthorized access to sensitive user information and the depletion of enterprise resources. As a consequence, phishing has increased markedly and is now considered one of the most common types of online theft. Phishing attacks typically aim to obtain confidential information, such as login credentials for online banking platforms and sensitive systems, which is then used for financial gain or identity theft. Recent studies have sought to combat phishing by examining website addresses (URLs), website content, and combinations of both that consider the website together with its source code. However, businesses require more effective anti-phishing technologies to identify phishing URLs and safeguard their users. The present research evaluates the effectiveness of eight machine learning (ML) and deep learning (DL) algorithms in identifying phishing: support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), decision tree (DT), Extreme Gradient Boosting (XGBoost), logistic regression (LR), convolutional neural network (CNN), and a DL model. The study uses two real datasets, Mendeley and UCI, and reports accuracy, precision, recall, false positive rate (FPR), and F1 score. Notably, CNN achieves the highest accuracy among the evaluated models. Contributions include the use of purpose-specific datasets, meticulous feature engineering, the application of SMOTE to address class imbalance, a novel CNN model, and rigorous hyperparameter tuning. Model performance is consistent across both datasets, which indicates stability and reliability.
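The five metrics named in the abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the names `Metrics` and `binary_metrics`, the convention that label 1 means "phishing", and the choice to return 0.0 when a denominator is zero are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    fpr: float
    f1: float


def _ratio(numerator: float, denominator: float) -> float:
    # Assumption: report 0.0 rather than raising when a denominator is zero.
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred, positive=1) -> Metrics:
    """Compute accuracy, precision, recall, FPR and F1 for binary labels.

    `positive` is the label treated as the phishing class (assumed to be 1).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # phishing correctly flagged
            else:
                fp += 1  # legitimate URL wrongly flagged
        else:
            if truth == positive:
                fn += 1  # phishing missed
            else:
                tn += 1  # legitimate URL correctly passed

    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return Metrics(
        accuracy=_ratio(tp + tn, tp + fp + tn + fn),
        precision=precision,
        recall=recall,
        fpr=_ratio(fp, fp + tn),
        f1=_ratio(2 * precision * recall, precision + recall),
    )


if __name__ == "__main__":
    # Hypothetical labels, for illustration only (not data from the study).
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

On the hypothetical labels above there are 3 true positives, 1 false positive, 3 true negatives and 1 false negative, so accuracy, precision, recall and F1 each come out to 0.75 and FPR to 0.25. FPR is reported alongside accuracy because, in phishing detection, a legitimate site wrongly blocked has a cost that accuracy alone does not expose.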