Keywords: Accuracy; ChatGPT; Comprehensiveness; Parental education; Pediatric otorhinolaryngology; Readability

MeSH: Humans; Tonsillectomy / methods; Adenoidectomy; Parents / psychology; Comprehension; Middle Ear Ventilation; Female; Male; Internet; Child; Surveys and Questionnaires; Health Literacy

Source: DOI:10.1016/j.ijporl.2024.111998

Abstract:
OBJECTIVE: This study examined the potential of ChatGPT as an accurate and readable source of information for parents seeking guidance on adenoidectomy, tonsillectomy, and ventilation tube insertion surgeries (ATVtis).
METHODS: ChatGPT was tasked with identifying the top 15 most frequently asked questions by parents on internet search engines for each of the three specific surgical procedures. We removed repeated questions from the initial set of 45. Subsequently, we asked ChatGPT to generate answers to the remaining 33 questions. Seven highly experienced otolaryngologists individually assessed the accuracy of the responses using a four-level grading scale, from completely incorrect to comprehensive. The readability of responses was determined using the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) scores. The questions were categorized into four groups: Diagnosis and Preparation Process, Surgical Information, Risks and Complications, and Postoperative Process. Responses were then compared based on accuracy grade, FRE, and FKGL scores.
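The two readability measures used in Methods are computed from standard published formulas over average sentence length and average syllables per word. A minimal sketch follows; the syllable counter is a rough vowel-group heuristic of our own (an assumption for illustration), not the validated counter built into readability tools, so scores will differ slightly from those reported in the study.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic (assumption): one syllable per run of vowels,
    # minus a typical silent final "e". Real tools use validated counters.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw    # standard FRE formula
    fkgl = 0.39 * wps + 11.8 * spw - 15.59      # standard FKGL formula
    return fre, fkgl
```

Higher FRE means easier text (sixth-grade material typically scores in the 80s), while FKGL maps directly to a U.S. school grade, which is why the study reports both.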
RESULTS: Seven evaluators each assessed 33 AI-generated responses, providing a total of 231 evaluations. Among the evaluated responses, 167 (72.3 %) were classified as 'comprehensive.' Sixty-two responses (26.8 %) were categorized as 'correct but inadequate,' and two responses (0.9 %) were assessed as 'some correct, some incorrect.' None of the responses were adjudged 'completely incorrect' by any assessors. The average FRE and FKGL scores were 57.15 (±10.73) and 9.95 (±1.91), respectively. Upon analyzing the responses from ChatGPT, 3 (9.1 %) were at or below the sixth-grade reading level recommended by the American Medical Association (AMA). No significant differences were found between the groups regarding readability and accuracy scores (p > 0.05).
CONCLUSIONS: ChatGPT can provide accurate answers to questions on various topics related to ATVtis. However, ChatGPT's answers may be too complex for some readers, as they are generally written at a high school level. This is above the sixth-grade reading level recommended for patient information by the AMA. According to our study, more than three-quarters of the AI-generated responses were at or above the 10th-grade reading level, raising concerns about the readability of ChatGPT's text.