Goal: As an essential human-machine interaction task, emotion recognition has become an emerging research area over the past decades. Although previous attempts to classify emotions have achieved high performance, several challenges remain open: 1) how to effectively recognize emotions using different modalities remains challenging; 2) given the increasing amount of computing power required by deep learning, how to provide real-time detection and improve the robustness of deep neural networks is important. Method: In this paper, we propose a deep learning-based multimodal emotion recognition (MER) framework called Deep-Emotion, which can adaptively integrate the most discriminative features from facial expressions, speech, and electroencephalogram (EEG) signals to improve MER performance. Specifically, the proposed Deep-Emotion framework consists of three branches, i.e., the facial branch, the speech branch, and the EEG branch. The facial branch uses the improved GhostNet neural network proposed in this paper for feature extraction, which effectively alleviates overfitting during training and improves classification accuracy compared with the original GhostNet.
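The abstract does not detail the improvements made to GhostNet, but the baseline building block it extends, the Ghost module of the original GhostNet (Han et al., 2020), is well documented: a small pointwise convolution produces a few intrinsic feature maps, and a cheap depthwise operation derives additional "ghost" maps from them. The PyTorch snippet below is a minimal sketch of that baseline module only; the `ratio` and kernel sizes are illustrative defaults, not the paper's configuration.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Sketch of the baseline Ghost module: a pointwise convolution
    produces a few intrinsic feature maps, a cheap depthwise convolution
    derives additional "ghost" maps from them, and the two sets are
    concatenated. ratio=2 splits the output channels evenly between
    intrinsic and ghost maps (illustrative default)."""

    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        init_ch = out_ch // ratio        # intrinsic maps from the 1x1 conv
        ghost_ch = out_ch - init_ch      # cheap "ghost" maps
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(init_ch),
            nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, ghost_ch, dw_kernel,
                      padding=dw_kernel // 2, groups=init_ch, bias=False),
            nn.BatchNorm2d(ghost_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

# (1, 16, 32, 32) -> (1, 32, 32, 32)
print(GhostModule(16, 32)(torch.randn(1, 16, 32, 32)).shape)
```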
For the speech branch, this paper proposes a lightweight fully convolutional neural network (LFCNN) for the efficient extraction of speech emotion features.
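The LFCNN architecture itself is not specified in the abstract. Purely for orientation, the sketch below shows one hypothetical way a lightweight fully convolutional speech-emotion feature extractor could be structured: 1-D convolution blocks over a log-mel spectrogram, with a 1x1 convolutional head and global average pooling in place of dense layers. All layer sizes and the input representation are assumptions, not the paper's LFCNN.

```python
import torch
import torch.nn as nn

class SpeechFCN(nn.Module):
    """Hypothetical lightweight fully convolutional network for speech
    emotion recognition. Input: (batch, n_mels, frames) log-mel
    spectrogram; output: per-class logits. The 1x1 convolutional head
    plus global average pooling keeps the network fully convolutional
    (no dense layers), so variable-length utterances are accepted."""

    def __init__(self, n_mels=40, n_classes=7):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(
                nn.Conv1d(cin, cout, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm1d(cout),
                nn.ReLU(inplace=True),
                nn.MaxPool1d(2),
            )
        self.features = nn.Sequential(block(n_mels, 64), block(64, 128))
        self.head = nn.Conv1d(128, n_classes, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool1d(1)  # global average pooling over time

    def forward(self, x):
        return self.pool(self.head(self.features(x))).squeeze(-1)

# (batch=2, 40 mel bands, 200 frames) -> (2, 7) logits
print(SpeechFCN()(torch.randn(2, 40, 200)).shape)
```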
For the EEG branch, we propose a tree-like LSTM (tLSTM) model capable of fusing multi-stage features for EEG emotion feature extraction. Finally, we adopt a decision-level fusion strategy to integrate the recognition results of the three modalities, yielding more comprehensive and accurate performance. Results and Conclusions: Extensive experiments on the CK+, EMO-DB, and MAHNOB-HCI datasets demonstrate the effectiveness of the proposed Deep-Emotion method, as well as the feasibility and superiority of the MER approach.
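Decision-level fusion as described above can be illustrated with a minimal sketch: each branch emits a probability distribution over the same emotion classes, and the final decision is taken on a weighted combination of the three. The class list, branch outputs, and equal weights below are illustrative assumptions; the paper's actual fusion rule may differ.

```python
import numpy as np

# Hypothetical emotion classes; the actual label sets differ per dataset
# (CK+, EMO-DB, and MAHNOB-HCI use different emotion inventories).
CLASSES = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def fuse_decisions(p_face, p_speech, p_eeg, weights=(1.0, 1.0, 1.0)):
    """Decision-level fusion: weighted average of the per-branch softmax
    probabilities (facial / speech / EEG), then argmax over classes.
    Equal weights are an illustrative assumption, not the paper's rule."""
    probs = np.stack([p_face, p_speech, p_eeg])         # (3, n_classes)
    fused = np.average(probs, axis=0, weights=weights)  # (n_classes,)
    return CLASSES[int(np.argmax(fused))], fused

# Made-up branch outputs for a single sample:
p_face   = np.array([0.05, 0.05, 0.10, 0.60, 0.10, 0.10])
p_speech = np.array([0.10, 0.05, 0.05, 0.55, 0.15, 0.10])
p_eeg    = np.array([0.15, 0.10, 0.10, 0.40, 0.15, 0.10])

label, fused = fuse_decisions(p_face, p_speech, p_eeg)
print(label, np.round(fused, 3))  # -> "happiness"
```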