Keywords: Dataset; Face anti-spoofing; Presentation attack; Transformer

MeSH: Humans; Neural Networks, Computer; Automated Facial Recognition / methods; Image Processing, Computer-Assisted / methods; Face; Computer Security; Algorithms

Source: DOI:10.1016/j.neunet.2024.106275

Abstract:
Face Anti-Spoofing (FAS) seeks to protect face recognition systems from spoofing attacks and is applied extensively in scenarios such as access control, electronic payment, and security surveillance. FAS requires integrating local details with global semantic information. Existing CNN-based methods rely on small-stride or image patch-based feature extraction structures, which struggle to capture spatial and cross-layer feature correlations effectively. Meanwhile, Transformer-based methods are limited in extracting discriminative fine-grained features. To address these issues, we introduce a multi-stage CNN-Transformer framework that extracts local features through convolutional layers and long-range feature relationships via self-attention. Building on this, we propose a cross-attention multi-stage feature fusion that uses semantically rich high-stage features to query task-relevant features in low-stage features for further cross-stage fusion. To make local features more discriminative of subtle differences, we design pixel-wise material classification supervision and add an auxiliary branch in the intermediate layers of the model. Moreover, to address the single acquisition environment and scarcity of acquisition devices in existing Near-Infrared datasets, we create a large-scale Near-Infrared Face Anti-Spoofing dataset with 380k images of 1040 identities. The proposed method achieves state-of-the-art performance on OULU-NPU and our proposed Near-Infrared dataset with only 1.3 GFLOPs and 3.2M parameters, which demonstrates its effectiveness.
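The cross-attention fusion described in the abstract, where high-stage features act as queries over low-stage features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the token counts, channel widths, projection matrices (`w_q`, `w_k`, `w_v`), the residual connection, and the function name `cross_stage_attention` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_stage_attention(high, low, w_q, w_k, w_v):
    """Hypothetical cross-stage fusion: high-stage tokens query low-stage tokens.

    high: (N_high, C_high) flattened high-stage feature map
    low:  (N_low,  C_low)  flattened low-stage feature map
    """
    q = high @ w_q                      # queries from high-stage features
    k = low @ w_k                       # keys from low-stage features
    v = low @ w_v                       # values from low-stage features
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)     # each query's weights over low-stage tokens
    fused = q + attn @ v                # residual fusion (an assumption)
    return fused, attn


# Toy shapes (assumed): a 4x4 high-stage map and an 8x8 low-stage map.
rng = np.random.default_rng(0)
high = rng.standard_normal((16, 64))
low = rng.standard_normal((64, 32))
d = 48                                  # shared attention width (assumed)
w_q = rng.standard_normal((64, d)) * 0.1
w_k = rng.standard_normal((32, d)) * 0.1
w_v = rng.standard_normal((32, d)) * 0.1

fused, attn = cross_stage_attention(high, low, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

In this sketch each high-stage token receives a weighted mixture of low-stage detail features, which is one plausible reading of "query task-relevant features in low-stage features"; a real model would learn the projections and likely use multiple heads and normalization.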