Keywords: pediatric ophthalmology; artificial intelligence; chatbot; health information; strabismus

Source: DOI:10.1177/11206721241272251

Abstract:
BACKGROUND: The growing popularity of chatbots, particularly OpenAI's ChatGPT, among the general public, and their utility in healthcare, is a topic of current controversy. The present study aimed to assess the reliability and accuracy of ChatGPT's responses to inquiries posed by parents, focusing on a range of pediatric ophthalmological and strabismus conditions.
METHODS: Patient queries were collected via thematic analysis and posed to ChatGPT (version 3.5) in three unique instances each. The questions were divided into 12 domains, totaling 817 unique questions. Two experienced pediatric ophthalmologists scored each response for quality on a Likert scale. All questions were evaluated for readability using the Flesch-Kincaid Grade Level (FKGL) and character count.
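The FKGL metric used in the methods is a standard published formula: 0.39 × (words/sentence) + 11.8 × (syllables/word) − 15.59. A minimal Python sketch is below; the syllable counter is a rough vowel-group heuristic (an assumption on our part — production readability tools use dictionary-based syllabification):

```python
import re

def count_syllables(word: str) -> int:
    # Heuristic: count vowel groups; drop a trailing silent "e".
    # Real readability tools use dictionary lookups instead (assumption).
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39*(words/sentences) + 11.8*(syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / max(len(sentences), 1))
            + 11.8 * (syllables / max(len(words), 1))
            - 15.59)
```

A score of 14.49, as reported below, corresponds roughly to college sophomore-level text.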
RESULTS: A total of 638 (78.09%) responses were scored as perfectly correct, 156 (19.09%) as correct but incomplete, and only 23 (2.81%) as partially incorrect. None of the responses were scored as completely incorrect. The average FKGL score was 14.49 (95% CI 14.40-14.59) and the average character count was 1825.33 (95% CI 1791.95-1858.70), with p = 0.831 and 0.697, respectively. The minimum and maximum FKGL scores were 10.6 and 18.34, respectively. FKGL predicted character count: R² = 0.012, F(1, 815) = 10.26, p = .001.
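The reported R² and F(1, 815) statistics are tied together by the standard identity for simple linear regression, F = (R²/1) / ((1 − R²)/(n − 2)), where n − 2 = 815 matches the 817 questions in the study. A minimal pure-Python sketch of that computation (the sample data here is illustrative, not the study's data):

```python
def simple_ols_f(x, y):
    """Simple (one-predictor) OLS: return (R^2, F-statistic).

    R^2 = Sxy^2 / (Sxx * Syy); F = (R^2/1) / ((1-R^2)/(n-2)),
    with degrees of freedom (1, n-2).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    r2 = sxy ** 2 / (sxx * syy)
    f = (r2 / 1) / ((1 - r2) / (n - 2))
    return r2, f
```

Note how a tiny R² (here 0.012) can still yield a significant F when n is large, which is exactly the pattern the study reports.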
CONCLUSIONS: ChatGPT provided accurate and reliable information for the majority of questions. However, the readability of the questions was well above the level typically recommended for adult patient materials, which is concerning. Despite these limitations, it is evident that this technology will play a significant role in the healthcare industry.