Keywords: cognitive/emotional state; machine learning; regression; visual features

MeSH: Humans; Emotions / physiology; Arousal / physiology; Neural Networks, Computer; Machine Learning; Virtual Reality; Female; Male; Deep Learning; Adult

Source: DOI:10.3390/s24134398

Abstract:
The cognitive state of a person can be categorized using the circumplex model of emotional states, a continuous model with two dimensions: arousal and valence. The purpose of this research is to select machine learning model(s) to be integrated into a virtual reality (VR) system that runs cognitive remediation exercises for people with mental health disorders. As such, the prediction of emotional states is essential to customize treatments for those individuals. We exploit the Remote Collaborative and Affective Interactions (RECOLA) database to predict arousal and valence values using machine learning techniques. RECOLA includes audio, video, and physiological recordings of interactions between human participants. To allow learners to focus on the most relevant data, features are extracted from raw data. Such features can be predesigned, learned, or extracted implicitly using deep learners. Our previous work on video recordings focused on predesigned and learned visual features. In this paper, we extend our work to deep visual features. Our deep visual features are extracted using the MobileNet-v2 convolutional neural network (CNN) that we previously trained on RECOLA's video frames of full/half faces. As the final purpose of our work is to integrate our solution into a practical VR application using head-mounted displays, we experimented with half faces as a proof of concept. The extracted deep features were then used to predict arousal and valence values via optimizable ensemble regression. We also fused the extracted visual features with the predesigned visual features and predicted arousal and valence values using the combined feature set. In an attempt to enhance our prediction performance, we further fused the predictions of the optimizable ensemble model with the predictions of the MobileNet-v2 model.
After decision fusion, we achieved a root mean squared error (RMSE) of 0.1140, a Pearson's correlation coefficient (PCC) of 0.8000, and a concordance correlation coefficient (CCC) of 0.7868 on arousal predictions. We achieved an RMSE of 0.0790, a PCC of 0.7904, and a CCC of 0.7645 on valence predictions.
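The three evaluation metrics quoted above (RMSE, PCC, CCC) have standard closed-form definitions. The sketch below computes them in NumPy and illustrates equal-weight decision fusion of two models' predictions on toy data; the synthetic labels, model names, and the 50/50 fusion weights are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pcc(y_true, y_pred):
    """Pearson's correlation coefficient."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient:
    CCC = 2*cov(x,y) / (var(x) + var(y) + (mean(x) - mean(y))^2)."""
    mx, my = y_true.mean(), y_pred.mean()
    vx, vy = y_true.var(), y_pred.var()  # population variances
    cov = np.mean((y_true - mx) * (y_pred - my))
    return float(2 * cov / (vx + vy + (mx - my) ** 2))

# Toy gold-standard labels and two models' predictions (illustrative only).
rng = np.random.default_rng(0)
y = rng.uniform(-1, 1, 200)                 # e.g. arousal annotations
pred_a = y + rng.normal(0, 0.2, 200)        # e.g. ensemble regressor
pred_b = y + rng.normal(0, 0.3, 200)        # e.g. MobileNet-v2 head
fused = 0.5 * pred_a + 0.5 * pred_b         # equal-weight decision fusion

for name, p in [("model A", pred_a), ("model B", pred_b), ("fused", fused)]:
    print(f"{name}: RMSE={rmse(y, p):.4f} "
          f"PCC={pcc(y, p):.4f} CCC={ccc(y, p):.4f}")
```

Note that CCC penalizes both scale and location shifts between predictions and labels, while PCC only measures linear agreement; this is why CCC is typically reported alongside PCC for continuous affect prediction.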