patient education materials

  • Article type: Journal Article
    BACKGROUND: Artificial intelligence (AI) is a burgeoning new field that has increased in popularity over the past couple of years, coinciding with the public release of large language model (LLM)-driven chatbots. These chatbots, such as ChatGPT, can be engaged directly in conversation, allowing users to ask them questions or issue other commands. Since LLMs are trained on large amounts of text data, they can also answer questions reliably and factually, an ability that has allowed them to serve as a source for medical inquiries. This study seeks to assess the readability of patient education materials on cardiac catheterization across four of the most common chatbots: ChatGPT, Microsoft Copilot, Google Gemini, and Meta AI.
    METHODS: A set of 10 questions regarding cardiac catheterization was developed using website-based patient education materials on the topic. We then asked these questions in consecutive order to four of the most common chatbots: ChatGPT, Microsoft Copilot, Google Gemini, and Meta AI. The Flesch Reading Ease Score (FRES) was used to assess the readability score. Readability grade levels were assessed using six tools: Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), Simple Measure of Gobbledygook (SMOG) Index, Automated Readability Index (ARI), and FORCAST Grade Level.
    RESULTS: The mean FRES across all four chatbots was 40.2, while the overall mean grade levels across the four chatbots were 11.2, 13.7, 13.7, 13.3, 11.2, and 11.6 on the FKGL, GFI, CLI, SMOG, ARI, and FORCAST indices, respectively. Mean reading grade levels across the six tools were 14.8 for ChatGPT, 12.3 for Microsoft Copilot, 13.1 for Google Gemini, and 9.6 for Meta AI. Further, the FRES values for the four chatbots were 31, 35.8, 36.4, and 57.7, respectively.
    CONCLUSIONS: This study shows that AI chatbots are capable of providing answers to medical questions regarding cardiac catheterization. However, the responses across the four chatbots had overall mean reading grade levels at the 11th-13th-grade level, depending on the tool used. This means that the materials were at the high school and even college reading level, which far exceeds the recommended sixth-grade level for patient education materials. Further, there is significant variability in the readability levels provided by different chatbots as, across all six grade-level assessments, Meta AI had the lowest scores and ChatGPT generally had the highest.
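    The methods above score each chatbot's responses with FRES and six grade-level indices. As a rough illustration of what two of those metrics compute, the following is a minimal Python sketch of the Flesch Reading Ease Score and Flesch-Kincaid Grade Level formulas; the syllable counter is a crude vowel-group heuristic, so its output will only approximate that of the dedicated readability tools the study used.

```python
import re

def count_syllables(word: str) -> int:
    """Crude syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    """Compute FRES and FKGL from raw text using the standard formulas."""
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)

    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words

    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return {"FRES": round(fres, 1), "FKGL": round(fkgl, 1)}

if __name__ == "__main__":
    sample = ("Cardiac catheterization is a procedure used to diagnose and treat "
              "certain heart conditions. A thin tube is guided to the heart.")
    print(readability(sample))
```

    On the conventional Flesch scale, scores in the 30-50 band are usually interpreted as "difficult" (college-level) text, which is consistent with the conclusion above that the chatbot responses read at a high school to college level rather than the recommended sixth-grade level.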

  • Article type: Journal Article
    BACKGROUND: Dermatologic patient education materials (PEMs) are often written above the national average seventh- to eighth-grade reading level. ChatGPT-3.5, GPT-4, DermGPT, and DocsGPT are large language models (LLMs) that are responsive to user prompts. Our project assesses their use in generating dermatologic PEMs at specified reading levels.
    OBJECTIVE: This study aims to assess the ability of select LLMs to generate PEMs for common and rare dermatologic conditions at unspecified and specified reading levels. Further, the study aims to assess the preservation of meaning across such LLM-generated PEMs, as assessed by dermatology resident trainees.
    METHODS: The Flesch-Kincaid reading level (FKRL) of current American Academy of Dermatology PEMs was evaluated for 4 common (atopic dermatitis, acne vulgaris, psoriasis, and herpes zoster) and 4 rare (epidermolysis bullosa, bullous pemphigoid, lamellar ichthyosis, and lichen planus) dermatologic conditions. We prompted ChatGPT-3.5, GPT-4, DermGPT, and DocsGPT to "Create a patient education handout about [condition] at a [FKRL]" to iteratively generate 10 PEMs per condition at unspecified, fifth-, and seventh-grade FKRLs, evaluated with Microsoft Word readability statistics. The preservation of meaning across LLMs was assessed by 2 dermatology resident trainees.
    RESULTS: The current American Academy of Dermatology PEMs had an average (SD) FKRL of 9.35 (1.26) and 9.50 (2.3) for common and rare diseases, respectively. For common diseases, the FKRLs of LLM-produced PEMs ranged between 9.8 and 11.21 (unspecified prompt), between 4.22 and 7.43 (fifth-grade prompt), and between 5.98 and 7.28 (seventh-grade prompt). For rare diseases, the FKRLs of LLM-produced PEMs ranged between 9.85 and 11.45 (unspecified prompt), between 4.22 and 7.43 (fifth-grade prompt), and between 5.98 and 7.28 (seventh-grade prompt). At the fifth-grade reading level, GPT-4 was better at producing PEMs for both common and rare conditions than ChatGPT-3.5 (P=.001 and P=.01, respectively), DermGPT (P<.001 and P=.03, respectively), and DocsGPT (P<.001 and P=.02, respectively). At the seventh-grade reading level, no significant difference was found between ChatGPT-3.5, GPT-4, DocsGPT, or DermGPT in producing PEMs for common conditions (all P>.05); however, for rare conditions, ChatGPT-3.5 and DocsGPT outperformed GPT-4 (P=.003 and P<.001, respectively). The preservation of meaning analysis revealed that for common conditions, DermGPT ranked the highest for overall ease of reading, patient understandability, and accuracy (14.75/15, 98%); for rare conditions, handouts generated by GPT-4 ranked the highest (14.5/15, 97%).
    CONCLUSIONS: GPT-4 appeared to outperform ChatGPT-3.5, DocsGPT, and DermGPT at the fifth-grade FKRL for both common and rare conditions, although both ChatGPT-3.5 and DocsGPT performed better than GPT-4 at the seventh-grade FKRL for rare conditions. LLM-produced PEMs may reliably meet seventh-grade FKRLs for select common and rare dermatologic conditions and are easy to read, understandable for patients, and mostly accurate. LLMs may play a role in enhancing health literacy and disseminating accessible, understandable PEMs in dermatology.
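    The methods above issue a fixed prompt template to each LLM and grade the resulting handouts with Microsoft Word's readability statistics. The sketch below is only an illustration of that generate-and-score loop for a single model: it assumes the OpenAI chat completions client as the generation interface and the textstat package's Flesch-Kincaid implementation as a stand-in for Word's statistic, and it expands the abstract's prompt template with assumed phrasing for the reading level; DermGPT and DocsGPT are not modeled here.

```python
# pip install openai textstat
from typing import Optional

import textstat
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative subset of the study's 4 common and 4 rare conditions
CONDITIONS = ["atopic dermatitis", "bullous pemphigoid"]
READING_LEVELS = [None, "5th-grade", "7th-grade"]  # None = unspecified prompt

def make_prompt(condition: str, level: Optional[str]) -> str:
    """Build the handout prompt; the template follows the abstract's wording."""
    if level is None:
        return f"Create a patient education handout about {condition}."
    return (f"Create a patient education handout about {condition} "
            f"at a {level} Flesch-Kincaid reading level.")

for condition in CONDITIONS:
    for level in READING_LEVELS:
        for _ in range(10):  # the study generated 10 handouts per condition and level
            response = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": make_prompt(condition, level)}],
            )
            handout = response.choices[0].message.content
            fkrl = textstat.flesch_kincaid_grade(handout)  # stand-in for MS Word's FKRL
            print(f"{condition} | {level or 'unspecified'} | FKRL {fkrl:.1f}")
```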

  • Article type: Journal Article
    Patient education materials (PEMs) are frequently used to help patients make cancer screening decisions. However, because PEMs are typically developed by experts, they might inadequately address patient barriers to screening. We co-created, with patients, a prostate cancer (PCa) screening PEM, and we compared how the co-created PEM and a PEM developed by experts affected decisional conflict and screening intention in patients.
    We identified and used patient barriers to PCa screening to co-create a PCa screening PEM with patients, clinicians, and researchers. We then conducted a parallel-group randomized controlled trial with men 40 years of age and older in Ontario to compare decisional conflict and intention about PCa screening after those men had viewed the co-created PEM (intervention) or an expert-created PEM (control). Participants were randomized using dynamic block randomization, and the study team was blinded to the allocation.
    Of 287 participants randomized to exposure to the co-created PEM, 230 were analyzed, and of 287 randomized to exposure to the expert-created PEM, 223 were analyzed. After PEM exposure, intervention and control participants did not differ significantly in Decisional Conflict Scale scores [mean difference: 0.37 ± 1.23; 95% confidence interval (CI): -2.05 to 2.79]; in SURE (Sure of myself, Understand information, Risk-benefit ratio, or Encouragement) scores (odds ratio: 0.75; 95% CI: 0.52 to 1.08); or in screening intention (mean difference: 0.09 ± 0.08; 95% CI: -0.06 to 0.24).
    The effectiveness of the co-created PEM did not differ from that of the PEM developed by experts. Thus, PEM developers should choose the method that best fits their goals and resources.
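    As a quick arithmetic check, the reported confidence intervals are consistent with a normal approximation around each mean difference, assuming the ± values above are standard errors (the abstract does not state this explicitly):

```python
# 95% CI ≈ mean difference ± 1.96 × standard error (normal approximation)
for label, diff, se in [("Decisional Conflict Scale", 0.37, 1.23),
                        ("screening intention", 0.09, 0.08)]:
    lo, hi = diff - 1.96 * se, diff + 1.96 * se
    print(f"{label}: {lo:.2f} to {hi:.2f}")
# -> roughly -2.04 to 2.78 and -0.07 to 0.25, matching the reported
#    -2.05 to 2.79 and -0.06 to 0.24 up to rounding
```

    Both intervals span zero, which matches the reported absence of a significant difference between the co-created and expert-created PEMs.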