Keywords: Adversarial Training, Alignment, Deep Learning, GAN, Gradient, Gradient Direction, PGD, Robust Models, Robustness

Source: DOI: 10.1016/j.patcog.2023.109430

Abstract:
Adversarial training, especially projected gradient descent (PGD), has proven to be a successful approach for improving robustness against adversarial attacks. After adversarial training, gradients of models with respect to their inputs have a preferential direction. However, the direction of alignment is not mathematically well established, making it difficult to evaluate quantitatively. We propose a novel definition of this direction as the direction of the vector pointing toward the closest point of the support of the closest inaccurate class in decision space. To evaluate the alignment with this direction after adversarial training, we apply a metric that uses generative adversarial networks to produce the smallest residual needed to change the class present in the image. We show that PGD-trained models have a higher alignment than the baseline according to our definition, that our metric presents higher alignment values than a competing metric formulation, and that enforcing this alignment increases the robustness of models.
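The abstract refers to PGD adversarial training and to measuring how well the input gradient aligns with the smallest class-changing residual. Below is a minimal sketch of both ideas, assuming a PyTorch image classifier; the budget `epsilon`, step size `alpha`, and `num_steps` are illustrative choices, and the residual passed to `gradient_alignment` stands in for the GAN-produced residual described in the paper, so these names and settings are assumptions rather than the authors' implementation.

```python
# Hedged sketch: L-infinity PGD attack, one adversarial-training step, and a
# cosine-similarity alignment measure between the input gradient and a residual
# that changes the predicted class (in the paper, produced by a GAN).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8/255, alpha=2/255, num_steps=10):
    """Projected gradient descent attack within an L-infinity ball of radius epsilon."""
    x_adv = (x.detach() + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0.0, 1.0)
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss along the sign of the input gradient.
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the epsilon-ball around the clean input and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0.0, 1.0)
    return x_adv

def adversarial_training_step(model, optimizer, x, y):
    """One PGD adversarial-training step: train on adversarial examples instead of clean inputs."""
    model.eval()
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

def gradient_alignment(model, x, y, residual):
    """Cosine similarity between the input gradient and a class-changing residual (per sample)."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return F.cosine_similarity(grad.flatten(1), residual.flatten(1), dim=1)
```

Under this sketch, a PGD-trained model would be expected to yield higher `gradient_alignment` values than a standard-trained baseline, which is the qualitative claim the abstract makes.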