Keywords: ChatGPT, algorithm aversion, competence, large language models, morality, quality assessment, technology

Source: DOI:10.3389/frai.2024.1412710, PDF (PubMed)

Abstract:
Background: While Large Language Models (LLMs) are viewed positively with respect to technological progress and their abilities, people tend to oppose machines making moral decisions. However, the circumstances under which algorithm aversion or algorithm appreciation is more likely to occur with respect to LLMs have not yet been sufficiently investigated. The aim of this study was therefore to investigate how texts on moral or technological topics, allegedly written either by a human author or by ChatGPT, are perceived.
Methods: In a randomized controlled experiment, n = 164 participants read six texts, three with a moral and three with a technological topic (predictor: text topic). The alleged author of each text was randomly labeled either "ChatGPT" or "human author" (predictor: authorship). We captured three dependent variables: assessment of author competence, assessment of content quality, and participants' intention to submit the text in a hypothetical university course (sharing intention). We hypothesized interaction effects: we expected ChatGPT to score lower than alleged human authors on moral topics and higher on technological topics, and vice versa.
Results: We found only a small interaction effect for perceived author competence, p = 0.004, d = 0.40, but none for the other dependent variables. However, ChatGPT was consistently devalued relative to alleged human authors across all dependent variables: there were main effects of authorship on assessment of author competence, p < 0.001, d = 0.95; on assessment of content quality, p < 0.001, d = 0.39; and on sharing intention, p < 0.001, d = 0.57. There was also a small main effect of text topic on the assessment of content quality, p = 0.002, d = 0.35.
Conclusion: These results are more in line with previous findings on algorithm aversion than on algorithm appreciation. We discuss the implications of these findings for the acceptance of LLMs for text composition.