generative

  • Article Type: Journal Article
    BACKGROUND: Although uncertainties exist regarding implementation, artificial intelligence-driven generative language models (GLMs) have enormous potential in medicine. Deployment of GLMs could improve patient comprehension of clinical texts and improve low health literacy.
    OBJECTIVE: The goal of this study is to evaluate the potential of ChatGPT-3.5 and GPT-4 to tailor the complexity of medical information to patient-specific input education level, which is crucial if it is to serve as a tool in addressing low health literacy.
    METHODS: Input templates related to 2 prevalent chronic diseases, type II diabetes and hypertension, were designed. Each clinical vignette was adjusted for hypothetical patient education levels to evaluate output personalization. To assess the success of a GLM (GPT-3.5 and GPT-4) in tailoring output writing, the readability of pre- and posttransformation outputs was quantified using the Flesch reading ease score (FKRE) and the Flesch-Kincaid grade level (FKGL).
    RESULTS: Responses (n=80) were generated using GPT-3.5 and GPT-4 across 2 clinical vignettes. For GPT-3.5, FKRE means were 57.75 (SD 4.75), 51.28 (SD 5.14), 32.28 (SD 4.52), and 28.31 (SD 5.22) for 6th grade, 8th grade, high school, and bachelor's, respectively; FKGL mean scores were 9.08 (SD 0.90), 10.27 (SD 1.06), 13.4 (SD 0.80), and 13.74 (SD 1.18). GPT-3.5 only aligned with the prespecified education level at the bachelor's degree. Conversely, GPT-4's FKRE mean scores were 74.54 (SD 2.6), 71.25 (SD 4.96), 47.61 (SD 6.13), and 13.71 (SD 5.77), with FKGL mean scores of 6.3 (SD 0.73), 6.7 (SD 1.11), 11.09 (SD 1.26), and 17.03 (SD 1.11) for the same respective education levels. GPT-4 met the target readability for all groups except the 6th-grade FKRE average. Both GLMs produced statistically significant differences (FKRE: 6th grade P<.001; 8th grade P<.001; high school P<.001; bachelor's P=.003; FKGL: 6th grade P=.001; 8th grade P<.001; high school P<.001; bachelor's P<.001) in mean FKRE and FKGL across input education levels.
    CONCLUSIONS: GLMs can change the structure and readability of medical text outputs according to input-specified education. However, GLMs categorize input education designation into 3 broad tiers of output readability: easy (6th and 8th grade), medium (high school), and difficult (bachelor's degree). This is the first result to suggest that there are broader boundaries in the success of GLMs in output text simplification. Future research must establish how GLMs can reliably personalize medical texts to prespecified education levels to enable a broader impact on health care literacy.
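The FKRE and FKGL metrics used in this study are the standard Flesch reading ease and Flesch-Kincaid grade level formulas. A minimal sketch, using a naive vowel-group syllable counter (dedicated readability tools count syllables more carefully):

```python
import re

def counts(text):
    # Naive tokenization: sentences end at . ! ?; words are letter runs.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    # Approximate syllables as vowel groups (at least one per word).
    syllables = sum(max(1, len(re.findall(r"[aeiouyAEIOUY]+", w)))
                    for w in words)
    return sentences, len(words), syllables

def flesch_reading_ease(text):
    # Higher score = easier text (roughly 0-100 for typical prose).
    s, w, syl = counts(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)

def flesch_kincaid_grade(text):
    # Maps text difficulty to a US school grade level.
    s, w, syl = counts(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59
```

Simpler text yields a higher reading-ease score and a lower grade level, which is the direction of the comparisons reported above.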

  • Article Type: Journal Article
    This study investigates how to reduce nurses' repetitive electronic nursing record tasks. We applied generative AI by learning from nursing record data practiced with virtual patient data. We aim to evaluate generative AI's usefulness, usability, and availability when applied to nursing record creation tasks. The nursing record data, collected through an electronic nursing record system for nursing students and free of privacy issues, takes the form of NANDA, FocusDAR, SOAPIE, and narrative records. We trained the model on 50,000 nursing records and improved its performance through fine-tuning. A separate API was used to connect with the practice electronic nursing record system, and 40 experienced nurses from a university hospital conducted tests. Electronic nursing records generated through generative AI are expected to help ease the workload of nurses.

  • Article Type: Journal Article
    BACKGROUND: The rapid evolution of ChatGPT has generated substantial interest and led to extensive discussions in both public and academic domains, particularly in the context of medical education.
    OBJECTIVE: This study aimed to evaluate ChatGPT's performance in a pulmonology examination through a comparative analysis with that of third-year medical students.
    METHODS: In this cross-sectional study, we conducted a comparative analysis with 2 distinct groups. The first group comprised 244 third-year medical students who had previously taken our institution\'s 2020 pulmonology examination, which was conducted in French. The second group involved ChatGPT-3.5 in 2 separate sets of conversations: without contextualization (V1) and with contextualization (V2). In both V1 and V2, ChatGPT received the same set of questions administered to the students.
    RESULTS: V1 demonstrated exceptional proficiency in radiology, microbiology, and thoracic surgery, surpassing the majority of medical students in these domains. However, it faced challenges in pathology, pharmacology, and clinical pneumology. In contrast, V2 consistently delivered more accurate responses across various question categories, regardless of the specialization. ChatGPT exhibited suboptimal performance in multiple-choice questions compared to medical students. V2 excelled in responding to structured open-ended questions. Both ChatGPT conversations, particularly V2, outperformed students in addressing questions of low and intermediate difficulty. Interestingly, students showcased enhanced proficiency when confronted with highly challenging questions. V1 fell short of passing the examination. Conversely, V2 passed the examination, outperforming 139 (62.1%) of the medical students.
    CONCLUSIONS: While ChatGPT has access to a comprehensive web-based data set, its performance closely mirrors that of an average medical student. Outcomes are influenced by question format, item complexity, and contextual nuances. The model faces challenges in medical contexts requiring information synthesis, advanced analytical aptitude, and clinical judgment, as well as in non-English language assessments and when confronted with data outside mainstream internet sources.

  • Article Type: Journal Article
    BACKGROUND: The introduction of ChatGPT by OpenAI has garnered significant attention. Among its capabilities, paraphrasing stands out.
    OBJECTIVE: This study aims to investigate the level of plagiarism in the paraphrased texts produced by this chatbot.
    METHODS: Three texts of varying lengths were presented to ChatGPT. ChatGPT was then instructed to paraphrase the provided texts using five different prompts. In the subsequent stage of the study, the texts were divided into separate paragraphs, and ChatGPT was requested to paraphrase each paragraph individually. Lastly, in the third stage, ChatGPT was asked to paraphrase the texts it had previously generated.
    RESULTS: The average plagiarism rate in the texts generated by ChatGPT was 45% (SD 10%). ChatGPT exhibited a substantial reduction in plagiarism for the provided texts (mean difference -0.51, 95% CI -0.54 to -0.48; P<.001). Furthermore, when comparing the second attempt with the initial attempt, a significant decrease in the plagiarism rate was observed (mean difference -0.06, 95% CI -0.08 to -0.03; P<.001). The number of paragraphs in the texts demonstrated a noteworthy association with the percentage of plagiarism, with texts consisting of a single paragraph exhibiting the lowest plagiarism rate (P<.001).
    CONCLUSIONS: Although ChatGPT demonstrates a notable reduction of plagiarism within texts, the existing levels of plagiarism remain relatively high. This underscores a crucial caution for researchers when incorporating this chatbot into their work.
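The mean differences and 95% CIs reported above follow from a standard paired analysis of per-text plagiarism rates. A rough sketch, assuming a t-based interval; the critical value `t_crit` is a placeholder that depends on the actual sample size:

```python
import math
from statistics import mean, stdev

def paired_mean_diff_ci(before, after, t_crit=2.045):
    """Mean paired difference (after - before) with a t-based 95% CI.
    t_crit is the two-sided critical t value for the sample's degrees
    of freedom (2.045 suits n=30; look it up for other sample sizes)."""
    diffs = [a - b for a, b in zip(after, before)]
    m = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    return m, (m - t_crit * se, m + t_crit * se)
```

A negative mean difference with a CI entirely below zero, as in the results above, indicates a reduction in the plagiarism rate after paraphrasing.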

  • Article Type: Journal Article
    OBJECTIVE: To assess the ability of ChatGPT-4, an automated Chatbot powered by artificial intelligence, to answer common patient questions concerning the Latarjet procedure for patients with anterior shoulder instability and compare this performance with Google Search Engine.
    METHODS: Using previously validated methods, a Google search was first performed using the query "Latarjet." Subsequently, the top 10 frequently asked questions (FAQs) and associated sources were extracted. ChatGPT-4 was then prompted to provide the top 10 FAQs and answers concerning the procedure. This process was repeated to identify additional FAQs requiring discrete numeric answers to allow for a comparison between ChatGPT-4 and Google. Discrete, numeric answers were subsequently assessed for accuracy on the basis of the clinical judgment of 2 fellowship-trained sports medicine surgeons who were blinded to search platform.
    RESULTS: Mean (± standard deviation) accuracy to numeric-based answers was 2.9 ± 0.9 for ChatGPT-4 versus 2.5 ± 1.4 for Google (P = .65). ChatGPT-4 derived information for answers only from academic sources, which was significantly different from Google Search Engine (P = .003), which used only 30% academic sources and websites from individual surgeons (50%) and larger medical practices (20%). For general FAQs, 40% of FAQs were found to be identical when comparing ChatGPT-4 and Google Search Engine. In terms of sources used to answer these questions, ChatGPT-4 again used 100% academic resources, whereas Google Search Engine used 60% academic resources, 20% surgeon personal websites, and 20% medical practices (P = .087).
    CONCLUSIONS: ChatGPT-4 demonstrated the ability to provide accurate and reliable information about the Latarjet procedure in response to patient queries, using multiple academic sources in all cases. This was in contrast to Google Search Engine, which more frequently used single-surgeon and large medical practice websites. Despite differences in the resources accessed to perform information retrieval tasks, the clinical relevance and accuracy of information provided did not significantly differ between ChatGPT-4 and Google Search Engine.
    CLINICAL RELEVANCE: Commercially available large language models (LLMs), such as ChatGPT-4, can perform diverse information retrieval tasks on-demand. An important medical information retrieval application for LLMs consists of the ability to provide comprehensive, relevant, and accurate information for various use cases such as investigation about a recently diagnosed medical condition or procedure. Understanding the performance and abilities of LLMs for use cases has important implications for deployment within health care settings.

  • Article Type: Journal Article
    Mesoscopic photoacoustic imaging (PAI) enables label-free visualization of vascular networks in tissues with high contrast and resolution. Segmenting these networks from 3D PAI data and interpreting their physiological and pathological significance is crucial yet challenging due to the time-consuming and error-prone nature of current methods. Deep learning offers a potential solution; however, supervised analysis frameworks typically require human-annotated ground-truth labels. To address this, an unsupervised image-to-image translation deep learning model is introduced, the Vessel Segmentation Generative Adversarial Network (VAN-GAN). VAN-GAN integrates synthetic blood vessel networks that closely resemble real-life anatomy into its training process and learns to replicate the underlying physics of the PAI system in order to learn how to segment vasculature from 3D photoacoustic images. Applied to a diverse range of in silico, in vitro, and in vivo data, including patient-derived breast cancer xenograft models and 3D clinical angiograms, VAN-GAN demonstrates its capability to facilitate accurate and unbiased segmentation of 3D vascular networks. By leveraging synthetic data, VAN-GAN reduces the reliance on manual labeling, thus lowering the barrier to entry for high-quality blood vessel segmentation (F1 score: VAN-GAN vs. U-Net = 0.84 vs. 0.87) and enhancing preclinical and clinical research into vascular structure and function.

  • Article Type: Journal Article
    Symbolic task planning is a widely used approach to enforce robot autonomy due to its ease of understanding and deployment in engineered robot architectures. However, techniques for symbolic task planning are difficult to scale in real-world, highly dynamic, human-robot collaboration scenarios because of poor performance in planning domains where action effects may not be immediate, or when frequent re-planning is needed due to changed circumstances in the robot workspace. The long-term validity of plans, plan length, and planning time can hinder the robot's efficiency and negatively affect the fluency of the overall human-robot interaction. We present a framework, which we refer to as Teriyaki, specifically aimed at bridging the gap between symbolic task planning and machine learning approaches. The rationale is to train Large Language Models (LLMs), namely GPT-3, into a neurosymbolic task planner compatible with the Planning Domain Definition Language (PDDL), and then to leverage its generative capabilities to overcome a number of limitations inherent to symbolic task planners. Potential benefits include (i) better scalability as planning domain complexity increases, since LLMs' response time scales linearly with the combined length of the input and the output, instead of super-linearly as in the case of symbolic task planners, and (ii) the ability to synthesize a plan action-by-action instead of end-to-end, and to make each action available for execution as soon as it is generated instead of waiting for the whole plan to be available, which in turn enables concurrent planning and execution. In the past year, significant efforts have been devoted by the research community to evaluating the overall cognitive capabilities of LLMs, with alternating success. With Teriyaki, by contrast, we aim to provide overall planning performance comparable to traditional planners in specific planning domains, while leveraging LLM capabilities in other metrics, specifically those related to their short- and mid-term generative capabilities, which are used to build a look-ahead predictive planning model. Preliminary results in selected domains show that our method can: (i) solve 95.5% of problems in a test data set of 1,000 samples; (ii) produce plans up to 13.5% shorter than a traditional symbolic planner; (iii) reduce the average overall waiting time for plan availability by up to 61.4%.
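The abstract's key design point, synthesizing a plan action-by-action so that each action becomes executable while later actions are still being generated, can be sketched roughly as follows. `llm_next_action` is a hypothetical stand-in for the fine-tuned GPT-3 call described in the paper, stubbed here with a canned plan:

```python
from queue import Queue
from threading import Thread

def llm_next_action(problem, plan_so_far):
    # Hypothetical stand-in for an LLM call that returns the next PDDL
    # action given the problem and the partial plan, or None when the
    # plan is complete. Stubbed with a canned three-step plan.
    canned = ["(pick block-a)", "(move block-a table)", "(place block-a)"]
    return canned[len(plan_so_far)] if len(plan_so_far) < len(canned) else None

def plan_action_by_action(problem, execute):
    """Stream each action to the executor as soon as it is generated,
    instead of waiting for the whole plan (concurrent planning and
    execution, as the abstract describes)."""
    q = Queue()

    def consumer():
        # Execute actions in generation order until the end marker.
        while (action := q.get()) is not None:
            execute(action)

    worker = Thread(target=consumer)
    worker.start()
    plan = []
    while (action := llm_next_action(problem, plan)) is not None:
        plan.append(action)
        q.put(action)  # executable immediately, before planning finishes
    q.put(None)        # signal plan completion
    worker.join()
    return plan
```

The FIFO queue preserves action order, so the executor sees exactly the plan sequence while generation is still in progress.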

  • Article Type: Journal Article
    BACKGROUND: The integration of artificial intelligence (AI), particularly deep learning models, has transformed the landscape of medical technology, especially in the field of diagnosis using imaging and physiological data. In otolaryngology, AI has shown promise in image classification for middle ear diseases. However, existing models often lack patient-specific data and clinical context, limiting their universal applicability. The emergence of GPT-4 Vision (GPT-4V) has enabled a multimodal diagnostic approach, integrating language processing with image analysis.
    OBJECTIVE: In this study, we investigated the effectiveness of GPT-4V in diagnosing middle ear diseases by integrating patient-specific data with otoscopic images of the tympanic membrane.
    METHODS: The design of this study was divided into two phases: (1) establishing a model with appropriate prompts and (2) validating the ability of the optimal prompt model to classify images. In total, 305 otoscopic images of 4 middle ear diseases (acute otitis media, middle ear cholesteatoma, chronic otitis media, and otitis media with effusion) were obtained from patients who visited Shinshu University or Jichi Medical University between April 2010 and December 2023. The optimized GPT-4V settings were established using prompts and patients' data, and the model created with the optimal prompt was used to verify the diagnostic accuracy of GPT-4V on 190 images. To compare the diagnostic accuracy of GPT-4V with that of physicians, 30 clinicians completed a web-based questionnaire consisting of 190 images.
    RESULTS: The multimodal AI approach achieved an accuracy of 82.1%, which is superior to that of certified pediatricians at 70.6%, but trailing behind that of otolaryngologists at more than 95%. The model's disease-specific accuracy rates were 89.2% for acute otitis media, 76.5% for chronic otitis media, 79.3% for middle ear cholesteatoma, and 85.7% for otitis media with effusion, which highlights the need for disease-specific optimization. Comparisons with physicians revealed promising results, suggesting the potential of GPT-4V to augment clinical decision-making.
    CONCLUSIONS: Despite its advantages, challenges such as data privacy and ethical considerations must be addressed. Overall, this study underscores the potential of multimodal AI for enhancing diagnostic accuracy and improving patient care in otolaryngology. Further research is warranted to optimize and validate this approach in diverse clinical settings.

  • Article Type: Journal Article
    The increasing development of artificial intelligence (AI) generative models in otolaryngology-head and neck surgery will progressively change our practice. Practitioners and patients have access to AI resources, improving information, knowledge, and practice of patient care. This article summarizes the currently investigated applications of AI generative models, particularly Chatbot Generative Pre-trained Transformer, in otolaryngology-head and neck surgery.

  • Article Type: Letter
    No abstract available.
