%0 Journal Article
%T Using Large Language Models to Generate Educational Materials on Childhood Glaucoma.
%A Dihan Q
%A Chauhan MZ
%A Eleiwa TK
%A Hassan AK
%A Sallam AB
%A Khouri AS
%A Chang TC
%A Elhusseiny AM
%J Am J Ophthalmol
%V 265
%N 0
%D 2024 Apr 16
%M 38614196
%F 5.488
%R 10.1016/j.ajo.2024.04.004
%X <B>OBJECTIVE: </B>To evaluate the quality, readability, and accuracy of large language model (LLM)-generated patient education materials (PEMs) on childhood glaucoma, and their ability to improve existing the readability of online information.<BR><B>METHODS: </B>Cross-sectional comparative study.<BR><B>METHODS: </B>We evaluated responses of ChatGPT-3.5, ChatGPT-4, and Bard to 3 separate prompts requesting that they write PEMs on "childhood glaucoma." Prompt A required PEMs be "easily understandable by the average American." Prompt B required that PEMs be written "at a 6th-grade level using Simple Measure of Gobbledygook (SMOG) readability formula." We then compared responses' quality (DISCERN questionnaire, Patient Education Materials Assessment Tool [PEMAT]), readability (SMOG, Flesch-Kincaid Grade Level [FKGL]), and accuracy (Likert Misinformation scale). To assess the improvement of readability for existing online information, Prompt C requested that LLM rewrite 20 resources from a Google search of keyword "childhood glaucoma" to the American Medical Association-recommended "6th-grade level." Rewrites were compared on key metrics such as readability, complex words (≥3 syllables), and sentence count.<BR><B>RESULTS: </B>All 3 LLMs generated PEMs that were of high quality, understandability, and accuracy (DISCERN ≥4, ≥70% PEMAT understandability, Misinformation score = 1). Prompt B responses were more readable than Prompt A responses for all 3 LLM (P ≤ .001). ChatGPT-4 generated the most readable PEMs compared to ChatGPT-3.5 and Bard (P ≤ .001). Although Prompt C responses showed consistent reduction of mean SMOG and FKGL scores, only ChatGPT-4 achieved the specified 6th-grade reading level (4.8 ± 0.8 and 3.7 ± 1.9, respectively).<BR><B>CONCLUSIONS: </B>LLMs can serve as strong supplemental tools in generating high-quality, accurate, and novel PEMs, and improving the readability of existing PEMs on childhood glaucoma.