Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability

Abstract:

Emotions in speech are expressed in various ways, and a speech emotion recognition (SER) model may perform poorly on unseen corpora whose emotional factors differ from those expressed in the training databases. To build SER models that are robust to unseen corpora, regularization approaches and metric losses have been studied. In this paper, we propose an SER method that incorporates the relative difficulty and labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that gives higher gradients to the samples whose emotion labels are more difficult to estimate among those in the given minibatch. Annotators may label an emotion based on emotional expression that resides in the conversational context or in another modality but is not apparent in the given speech utterance. As a result, some of the emotion labels may be unreliable, and these unreliable labels may affect the proposed loss function more severely. To address this, we propose to apply label smoothing to the samples misclassified by a pre-trained SER model. Experimental results showed that the performance of SER on unseen corpora was improved by adopting the proposed loss function with label smoothing on the misclassified data.
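The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the proposed loss, so the first function only reproduces the per-sample gradient weighting of the original Proxy-Anchor loss (positive term, with its commonly used defaults alpha = 32 and delta = 0.1) to show how samples that are harder relative to the rest of the minibatch receive larger gradients. The second function applies standard label smoothing only to samples that a pre-trained model misclassified. All function names, the smoothing factor `eps = 0.1`, and the toy inputs are assumptions made for illustration.

```python
import math


def proxy_anchor_positive_weights(sims, alpha=32.0, delta=0.1):
    """Relative gradient magnitude of each positive sample for one proxy.

    In the original Proxy-Anchor loss, the gradient w.r.t. the similarity
    s(x, p) of a positive sample is proportional to h(x) / (1 + sum h(x')),
    with h(x) = exp(-alpha * (s(x, p) - delta)). A low similarity (a hard
    sample) therefore gets a larger weight relative to the minibatch.
    """
    h = [math.exp(-alpha * (s - delta)) for s in sims]
    denom = 1.0 + sum(h)
    return [v / denom for v in h]


def selective_smooth_targets(labels, pretrained_preds, num_classes, eps=0.1):
    """One-hot targets, smoothed only where the pre-trained model was wrong."""
    targets = []
    for y, y_hat in zip(labels, pretrained_preds):
        one_hot = [1.0 if c == y else 0.0 for c in range(num_classes)]
        if y != y_hat:
            # Standard label smoothing: (1 - eps) * one_hot + eps / K.
            one_hot = [(1.0 - eps) * v + eps / num_classes for v in one_hot]
        targets.append(one_hot)
    return targets


# Toy minibatch (hypothetical values).
# Cosine similarities of three positive samples to their class proxy:
weights = proxy_anchor_positive_weights([0.9, 0.5, 0.2])

# Four emotion classes; the pre-trained model misclassifies the second sample.
labels = [0, 2, 1]
pretrained_preds = [0, 1, 1]
targets = selective_smooth_targets(labels, pretrained_preds, num_classes=4)

print(weights)   # the weight grows as the similarity drops
print(targets)   # only the second row is smoothed
```

In this sketch the smoothed targets would be fed to whatever classification loss is in use; how the paper combines them with its proposed loss is not specified in the abstract.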
