Keywords: ACL; AI; diagnostics; flat foot; natural language processing; orthopedics; pediatric orthopedics; sports medicine

Source: DOI: 10.3390/diagnostics14121253 (PubMed)

Abstract:
BACKGROUND: This study evaluates the potential of ChatGPT and Google Bard as educational tools for patients in orthopedics, focusing on sports medicine and pediatric orthopedics. The aim is to compare the quality of responses provided by these natural language processing (NLP) models, addressing concerns about the potential dissemination of incorrect medical information.
METHODS: Ten ACL- and flat foot-related questions drawn from a Google search were presented to ChatGPT-3.5 and Google Bard. Expert orthopedic surgeons rated the responses using the Global Quality Score (GQS). The study minimized bias by clearing the chat history before each question, maintaining respondent anonymity, and employing statistical analysis to compare response quality.
RESULTS: For sports medicine, ChatGPT-3.5 and Google Bard yielded good-quality responses, with average scores of 4.1 ± 0.7 and 4.0 ± 0.78, respectively. For pediatric orthopedics, Google Bard scored 3.5 ± 1, while the average score for responses generated by ChatGPT was 3.8 ± 0.83. In both cases, no statistically significant difference was found between the platforms (p = 0.6787 and p = 0.3092, respectively). Although ChatGPT's responses were considered more readable, both platforms showed promise for AI-driven patient education, with no misinformation reported.
CONCLUSIONS: ChatGPT and Google Bard demonstrate significant potential as supplementary patient education resources in orthopedics. However, improvements are needed for increased reliability. The study underscores the evolving role of AI in orthopedics and calls for continued research to ensure a conscientious integration of AI in healthcare education.
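As a minimal sketch of the kind of comparison the abstract reports: the authors do not name the statistical test they used, but for ordinal 5-point GQS ratings from two independent raters' score sets, a nonparametric Mann-Whitney U test is a common choice, so it is assumed here. The score arrays below are illustrative placeholders, not the study's data.

```python
# Hypothetical sketch: comparing GQS ratings between two NLP platforms.
# The test choice (Mann-Whitney U) is an assumption; the abstract does not
# specify which statistical test produced the reported p-values.
from scipy.stats import mannwhitneyu

chatgpt_gqs = [4, 4, 5, 4, 3, 4, 5, 4, 4, 4]  # illustrative per-question GQS scores
bard_gqs = [4, 3, 5, 4, 4, 4, 3, 5, 4, 4]     # illustrative per-question GQS scores

# Two-sided test: is there any difference in rating distributions?
stat, p_value = mannwhitneyu(chatgpt_gqs, bard_gqs, alternative="two-sided")
print(f"U = {stat}, p = {p_value:.4f}")  # p > 0.05 -> no significant difference
```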