Keywords: AI chatbot; artificial intelligence (AI) in medicine; artificial intelligence in health care; artificial intelligence in medicine; gender bias; patient education

Source: DOI: 10.7759/cureus.45911   PDF (PubMed)

Abstract:
OBJECTIVE: To evaluate the accuracy and bias of ophthalmologist recommendations made by three AI chatbots, namely ChatGPT 3.5 (OpenAI, San Francisco, CA, USA), Bing Chat (Microsoft Corp., Redmond, WA, USA), and Google Bard (Alphabet Inc., Mountain View, CA, USA). This study analyzed chatbot recommendations for the 20 most populous U.S. cities.
METHODS: Each chatbot returned 80 total recommendations when given the prompt "Find me four good ophthalmologists in (city)." Characteristics of the physicians, including specialty, location, gender, practice type, and fellowship, were collected. A one-proportion z-test was performed to compare the proportion of female ophthalmologists recommended by each chatbot to the national average (27.2% per the Association of American Medical Colleges (AAMC)). Pearson's chi-squared test was performed to determine differences between the three chatbots in male versus female recommendations and recommendation accuracy.
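For reference, the following is a minimal Python sketch of the two tests named above: a one-proportion z-test against the AAMC 27.2% benchmark and Pearson's chi-squared test across the three chatbots. The counts used here are hypothetical placeholders, since the abstract reports only percentages, not raw sample sizes; the statsmodels and scipy calls show the general form of the analysis, not the authors' actual computation.

# Sketch of the abstract's statistical tests with hypothetical counts.
from statsmodels.stats.proportion import proportions_ztest
from scipy.stats import chi2_contingency

# One-proportion z-test: compare one chatbot's share of recommended
# female ophthalmologists with the AAMC national average of 27.2%.
NATIONAL_FEMALE_PROP = 0.272
female_count = 5        # hypothetical: female ophthalmologists recommended
total_identified = 62   # hypothetical: recommendations with identifiable gender
z_stat, p_value = proportions_ztest(
    count=female_count,
    nobs=total_identified,
    value=NATIONAL_FEMALE_PROP,
    alternative="two-sided",
)
print(f"one-proportion z-test: z={z_stat:.3f}, p={p_value:.4f}")

# Pearson's chi-squared test: male vs. female recommendation counts
# across the three chatbots (rows: chatbots; columns: male, female).
observed = [
    [43, 18],  # hypothetical ChatGPT counts
    [61, 1],   # hypothetical Bing Chat counts
    [57, 5],   # hypothetical Bard counts
]
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-squared: chi2={chi2:.3f}, dof={dof}, p={p:.4f}")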
RESULTS: The proportions of female ophthalmologists recommended by Bing Chat (1.61%) and Bard (8.0%) were significantly lower than the national proportion of 27.2% practicing female ophthalmologists (p<0.001 and p<0.01, respectively). ChatGPT recommended fewer female (29.5%) than male ophthalmologists, though its female proportion did not differ significantly from the national average (p=0.722). ChatGPT (73.8%), Bing Chat (67.5%), and Bard (62.5%) all produced inaccurate recommendations at high rates. Compared to the national average of academic ophthalmologists (17%), the proportion of recommended ophthalmologists in academic medicine or in combined academic and private practice was significantly greater for all three chatbots.
CONCLUSIONS: This study revealed substantial bias and inaccuracy in the AI chatbots' recommendations. The chatbots struggled to recommend ophthalmologists reliably and accurately: most recommended physicians were in specialties other than ophthalmology or were not in or near the requested city. Bing Chat and Google Bard were significantly less likely to recommend female ophthalmologists, and all three chatbots disproportionately recommended ophthalmologists in academic medicine.