MeSH terms: Humans; Attitude; Communication; Minority Groups; Social Group; Artificial Intelligence

Source: DOI:10.1038/s41598-024-51969-w (PDF via PubMed)

Abstract:
Autoregressive language models, which use deep learning to produce human-like texts, have surged in prevalence. Despite advances in these models, concerns arise about their equity across diverse populations. While AI fairness is discussed widely, metrics to measure equity in dialogue systems are lacking. This paper presents a framework, rooted in deliberative democracy and science communication studies, to evaluate equity in human-AI communication. Using it, we conducted an algorithm auditing study to examine how GPT-3 responded to different populations who vary in sociodemographic backgrounds and viewpoints on crucial science and social issues: climate change and the Black Lives Matter (BLM) movement. We analyzed 20,000 dialogues with 3290 participants differing in gender, race, education, and opinions. We found a substantively worse user experience among the opinion minority groups (e.g., climate deniers, racists) and the education minority groups; however, these groups changed attitudes toward supporting BLM and climate change efforts much more compared to other social groups after the chat. GPT-3 used more negative expressions when responding to the education and opinion minority groups. We discuss the social-technological implications of our findings for a conversational AI system that centralizes diversity, equity, and inclusion.
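The algorithm audit the abstract describes boils down to systematically varying the user's profile and stated opinion, collecting the model's replies, and comparing response qualities (e.g., negativity of expression) across groups. The sketch below illustrates that general shape only; the persona texts, the `get_model_reply` stub, and the use of VADER sentiment scoring are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal audit-loop sketch (assumptions, not the paper's method): vary the
# user persona, query the dialogue system under audit, and score reply sentiment.
from dataclasses import dataclass

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer


@dataclass
class Persona:
    group: str    # e.g., opinion majority vs. opinion minority
    opening: str  # the user's stated viewpoint on the issue


# Hypothetical personas for the climate-change topic.
PERSONAS = [
    Persona("opinion majority", "I think climate change is a serious threat."),
    Persona("opinion minority", "I doubt climate change is caused by humans."),
]


def get_model_reply(prompt: str) -> str:
    # Placeholder for a call to the audited system (e.g., a GPT-3 API request);
    # returns a canned string so the sketch runs without network access.
    return "That is an interesting view, though the evidence points elsewhere."


analyzer = SentimentIntensityAnalyzer()
for persona in PERSONAS:
    reply = get_model_reply(persona.opening)
    # VADER compound score ranges from -1 (most negative) to +1 (most positive).
    score = analyzer.polarity_scores(reply)["compound"]
    print(f"{persona.group}: sentiment={score:+.3f}")
```

Aggregating such scores over many dialogues per group is one way to quantify the kind of disparity the study reports (more negative expressions toward opinion and education minority groups).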