Keywords: Arousal; Emoji; Frequency; Norming study; Subjective rating; Valence; Visual complexity; Word embeddings

Source: DOI: 10.3758/s13428-024-02444-x

Abstract:
We introduce a novel dataset of affective, semantic, and descriptive norms for all face emojis available at the time of data collection. We collected and analyzed subjective ratings of emojis from 138 German speakers on five key dimensions: valence, arousal, familiarity, clarity, and visual complexity. In addition, we provide absolute frequency counts of emoji use drawn from a large Twitter corpus and from a much smaller WhatsApp database. Our results replicate the quadratic relationship between valence and arousal that is well established for words. We also report associations among the variables; for example, the subjective familiarity of an emoji is strongly correlated with its usage frequency and positively associated with its emotional valence and the clarity of its meaning. To establish the meanings associated with face emojis, we asked participants to provide up to three descriptions for each emoji. From these linguistic data, we computed a vector embedding for each emoji, which allows us to explore how emojis are distributed in semantic space. Our description-based emoji embeddings not only capture typical meaning components of emojis, such as their valence, but also reflect the semantic relationship between emojis and words better than both simple definitions and direct emoji2vec models. The dataset shows high reliability and validity. These new norms for face emojis can inform the design of tightly controlled experiments on the cognitive processing of emojis, their lexical representation, and their linguistic properties.
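The abstract states that each emoji's embedding was computed from participants' descriptions. The sketch below illustrates one way such description-based vectors could work, assuming (hypothetically) that every description word is looked up in a pretrained word-embedding table and that an emoji's vector is the mean of its description vectors; emoji-word relatedness is then measured with cosine similarity. The German words, the 3-dimensional toy vectors, and the names `WORD_VECTORS`, `emoji_vector`, and `cosine` are illustrative assumptions, not the authors' actual pipeline or embedding model.

```python
import numpy as np

# Hypothetical toy word vectors (3 dimensions for readability). A real setup
# would load a pretrained German embedding model instead of this dict.
WORD_VECTORS = {
    "freude": np.array([0.9, 0.1, 0.0]),
    "lachen": np.array([0.8, 0.2, 0.1]),
    "glücklich": np.array([0.85, 0.05, 0.05]),
    "trauer": np.array([-0.9, 0.1, 0.0]),
    "weinen": np.array([-0.8, 0.3, 0.1]),
}


def emoji_vector(descriptions, word_vectors):
    """Average the vectors of all description words that have an embedding.

    Descriptions are pooled across participants, so a word given by many
    participants contributes proportionally more to the emoji's vector.
    """
    vecs = [word_vectors[d] for d in descriptions if d in word_vectors]
    if not vecs:
        raise ValueError("no description has a known word vector")
    return np.mean(vecs, axis=0)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    # Hypothetical descriptions pooled from several participants for one emoji.
    joy = emoji_vector(["lachen", "freude", "lachen", "glücklich"], WORD_VECTORS)
    print("similarity to 'freude':", round(cosine(joy, WORD_VECTORS["freude"]), 3))
    print("similarity to 'trauer':", round(cosine(joy, WORD_VECTORS["trauer"]), 3))
```

Under these assumptions, an emoji described mostly with positive words ends up close to positive words and far from negative ones, which is one way valence can emerge in description-based vectors. Weighting descriptions differently or using another aggregation than the mean would be equally defensible design choices.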