Keywords: ChatGPT; artificial intelligence; hip arthroscopy; machine learning

Source: DOI: 10.1016/j.arthro.2024.06.017

Abstract:
OBJECTIVE: To assess the ability of ChatGPT to answer common patient questions regarding hip arthroscopy, and to analyze the accuracy and appropriateness of its responses.
METHODS: Ten questions were selected from well-known patient education websites, and the responses of ChatGPT (version 3.5) to these questions were graded by 2 fellowship-trained hip preservation surgeons. Responses were analyzed, compared with the current literature, and graded from A to D (A being the highest and D the lowest) on a grading scale based on the accuracy and completeness of the response. When the grades of the 2 surgeons differed, a consensus was reached. Inter-rater agreement was calculated. The readability of the responses was also assessed using the Flesch-Kincaid Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL).
RESULTS: Responses received the following consensus grades: A (50%, n = 5), B (30%, n = 3), C (10%, n = 1), and D (10%, n = 1). Inter-rater agreement based on the initial individual grading was 30%. The mean FRES was 28.2 (standard deviation, ±9.2; range, 11.7-42.5), corresponding to a college-graduate reading level. The mean FKGL was 14.4 (standard deviation, ±1.8; range, 12.1-18), indicating a college-student reading level.
CONCLUSIONS: ChatGPT can answer common patient questions regarding hip arthroscopy with satisfactory accuracy, as graded by 2 high-volume hip arthroscopists; however, incorrect information was identified in more than one instance. Caution must be exercised when using ChatGPT for patient education related to hip arthroscopy.
CLINICAL RELEVANCE: Given the increasing number of hip arthroscopies performed annually, ChatGPT has the potential to aid physicians in educating their patients about this procedure and in addressing any questions they may have.
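The FRES and FKGL values reported above come from the standard Flesch-Kincaid formulas, which depend only on words per sentence and syllables per word. A minimal sketch of how such scores could be computed is shown below; the regex-based tokenization and the vowel-group syllable counter are simplifying assumptions (published tools use more careful syllable rules), so results will approximate, not reproduce, the study's values.

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: count vowel groups, dropping a silent
    trailing 'e'. This heuristic is an assumption, not the method used
    in the study."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid(text: str) -> tuple:
    """Return (FRES, FKGL) using the standard Flesch-Kincaid formulas:
    FRES = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
    FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl
```

A mean FRES of 28.2 falls in the "very difficult / college graduate" band of the Flesch scale (0-30), and a mean FKGL of 14.4 corresponds to roughly 14 years of schooling, which is how the abstract arrives at its college-level readability interpretation.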
Abstract:
OBJECTIVE: To assess the ability of ChatGPT to answer common patient questions regarding hip arthroscopy and to analyze the accuracy and appropriateness of its responses.
METHODS: Ten questions were selected from well-known patient education websites, and the responses of ChatGPT (version 3.5) to these questions were graded by 2 fellowship-trained hip preservation surgeons. Responses were analyzed, compared with the current literature, and graded from A to D (A being the highest and D the lowest) on a grading scale based on the accuracy and completeness of the response. When the grades of the 2 surgeons differed, a consensus was reached. Inter-rater agreement was calculated. The readability of the responses was also assessed using the Flesch-Kincaid Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL).
RESULTS: Responses received the following consensus grades: A (50%, n = 5), B (30%, n = 3), C (10%, n = 1), and D (10%, n = 1) (Table 2). Inter-rater agreement based on the initial individual grading was 30%. The mean FRES was 28.2 (standard deviation, ±9.2; range, 11.7-42.5), corresponding to a college-graduate reading level. The mean FKGL was 14.4 (standard deviation, ±1.8; range, 12.1-18), indicating a college-student reading level.
CONCLUSIONS: ChatGPT can answer common patient questions regarding hip arthroscopy with satisfactory accuracy, as graded by 2 high-volume hip arthroscopists; however, incorrect information was identified in more than one instance. Caution must be exercised when using ChatGPT for patient education related to hip arthroscopy.
CLINICAL RELEVANCE: Given the increasing number of hip arthroscopies performed annually, ChatGPT has the potential to aid physicians in educating their patients about this procedure and in addressing any questions they may have.