Prompt design

  • Article type: Journal Article
    Background: Artificial intelligence models tailored to diagnosing cognitive impairment have shown excellent results. However, it is unclear whether large language models can rival specialized models using text alone.
    Objective: In this study, we explored the performance of ChatGPT in the primary screening of mild cognitive impairment (MCI) and standardized the design steps and components of the prompts.
    Methods: We gathered a total of 174 participants from the DementiaBank screening and assigned 70% of them to the training set and 30% to the test set. Only text dialogues were kept. Sentences were cleaned using a macro code, followed by a manual check. The prompt consisted of 5 main parts: character setting, scoring system setting, indicator setting, output setting, and explanatory information setting. Three dimensions of variables from published studies were included: vocabulary (ie, word frequency and word ratio, phrase frequency and phrase ratio, and lexical complexity), syntax and grammar (ie, syntactic complexity and grammatical components), and semantics (ie, semantic density and semantic coherence). We used R 4.3.0 for the analysis of variables and diagnostic indicators.
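    The five prompt components above are named but not reproduced in the abstract. Purely as an illustrative sketch of such a structure (the part names come from the abstract; all wording, the openai client usage, and the model string are assumptions, not the authors' actual prompt):
    ```python
    # Hypothetical reconstruction of a 5-part screening prompt; not the authors' text.
    from openai import OpenAI  # assumes the openai Python package is installed

    PROMPT_TEMPLATE = """\
    [Character setting] You are a clinician screening transcripts for signs of MCI.
    [Scoring system setting] Rate the speaker from 0 (cognitively normal) to 10 (likely MCI).
    [Indicator setting] Consider vocabulary (word/phrase frequency and ratio, lexical
    complexity), syntax and grammar (syntactic complexity, grammatical components),
    and semantics (semantic density, semantic coherence).
    [Output setting] Reply with a single line: score=<0-10>; label=<MCI|NORMAL>.
    [Explanatory information setting] The text is a cleaned picture-description dialogue.

    Transcript:
    {transcript}
    """

    def screen(transcript: str) -> str:
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = client.chat.completions.create(
            model="gpt-4",  # the study used GPT-4; the exact model string is an assumption
            messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(transcript=transcript)}],
        )
        return resp.choices[0].message.content
    ```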
    Results: Three additional indicators related to the severity of MCI were incorporated into the final prompt for the model. These indicators were effective in discriminating between MCI and cognitively normal participants: tip-of-the-tongue phenomenon (P<.001), difficulty with complex ideas (P<.001), and memory issues (P<.001). The final GPT-4 model achieved a sensitivity of 0.8636, a specificity of 0.9487, and an area under the curve of 0.9062 on the training set; on the test set, the sensitivity, specificity, and area under the curve reached 0.7727, 0.8333, and 0.8030, respectively.
    Conclusions: ChatGPT was effective in the primary screening of participants with possible MCI. Improved standardization of prompts by clinicians would further improve the performance of the model. It is important to note that ChatGPT is not a substitute for a clinician making a diagnosis.
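    For reference, the diagnostic metrics reported above follow their standard definitions; the study's analysis was done in R, but a minimal toy sketch with scikit-learn shows how such numbers are derived (the arrays are made-up placeholders, not study data):
    ```python
    # Illustrative metric computation with scikit-learn; labels/scores are placeholders.
    from sklearn.metrics import confusion_matrix, roc_auc_score

    y_true = [1, 1, 0, 0, 1, 0]               # 1 = MCI, 0 = cognitively normal (toy data)
    y_pred = [1, 0, 0, 0, 1, 1]               # model's binary screening decision
    y_score = [0.9, 0.4, 0.2, 0.1, 0.8, 0.6]  # model's MCI score/probability

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    auc = roc_auc_score(y_true, y_score)
    print(f"sensitivity={sensitivity:.4f} specificity={specificity:.4f} AUC={auc:.4f}")
    ```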

  • Article type: Journal Article
    Large language models (LLMs) such as ChatGPT have recently attracted significant attention due to their impressive performance on many real-world tasks. These models have also demonstrated potential in facilitating various biomedical tasks. However, little is known about their potential in biomedical information retrieval, especially in identifying drug-disease associations. This study aims to explore the potential of ChatGPT, a popular LLM, in discerning drug-disease associations. We collected 2694 true drug-disease associations and 5662 false drug-disease pairs. Our approach involved creating various prompts to instruct ChatGPT to identify these associations. Under varying prompt designs, ChatGPT identified drug-disease associations with an accuracy of 74.6-83.5% for the true pairs and 96.2-97.6% for the false pairs. This study shows that ChatGPT has the potential to identify drug-disease associations and may serve as a helpful tool for searching pharmacy-related information. However, the accuracy of its insights warrants comprehensive examination before implementation in medical practice.
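    The abstract describes prompting ChatGPT to label candidate drug-disease pairs as true or false associations and scoring accuracy per class. A minimal sketch of one plausible prompt-and-scoring loop (the prompt wording, model string, and helper names are illustrative assumptions; the paper's actual prompts are not reproduced here):
    ```python
    # Hypothetical yes/no classification of drug-disease pairs; not the study's prompt.
    from openai import OpenAI

    client = OpenAI()

    def is_associated(drug: str, disease: str) -> bool:
        prompt = (f"Is the drug '{drug}' used to treat the disease '{disease}'? "
                  "Answer with exactly one word: yes or no.")
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",  # assumption; the paper only says "ChatGPT"
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content.strip().lower().startswith("yes")

    # Accuracy over labeled pairs: fraction of model answers matching the gold label.
    pairs = [("metformin", "type 2 diabetes", True), ("aspirin", "glaucoma", False)]
    correct = sum(is_associated(drug, disease) == label for drug, disease, label in pairs)
    print(f"accuracy = {correct / len(pairs):.3f}")
    ```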

  • Article type: Journal Article
    Background: This study aimed to evaluate the efficacy of ChatGPT, an advanced natural language processing model, in adapting and synthesizing clinical guidelines for diabetic ketoacidosis (DKA) by comparing and contrasting different guideline sources.
    Methodology: We employed a comprehensive comparison approach and examined three reputable guideline sources: Diabetes Canada Clinical Practice Guidelines Expert Committee (2018), Emergency Management of Hyperglycaemia in Primary Care, and Joint British Diabetes Societies (JBDS) 02 The Management of Diabetic Ketoacidosis in Adults. Data extraction focused on diagnostic criteria, risk factors, signs and symptoms, investigations, and treatment recommendations. We compared the synthesized guidelines generated by ChatGPT and identified any misreporting or non-reporting errors.
    Results: ChatGPT was capable of generating a comprehensive table comparing the guidelines. However, multiple recurrent errors, including misreporting and non-reporting errors, were identified, rendering the results unreliable. Additionally, inconsistencies were observed in the repeated reporting of data. The study highlights the limitations of using ChatGPT for the adaptation of clinical guidelines without expert human intervention.
    Conclusions: Although ChatGPT demonstrates the potential for the synthesis of clinical guidelines, the presence of multiple recurrent errors and inconsistencies underscores the need for expert human intervention and validation. Future research should focus on improving the accuracy and reliability of ChatGPT, as well as exploring its potential applications in other areas of clinical practice and guideline development.
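    The methodology amounts to prompting ChatGPT for a field-by-field comparison of the three named sources. A minimal sketch of such a prompt (the wording is an illustrative assumption; the abstract does not include the study's prompt):
    ```python
    # Hypothetical prompt for guideline synthesis; source names come from the abstract.
    FIELDS = ["diagnostic criteria", "risk factors", "signs and symptoms",
              "investigations", "treatment recommendations"]
    SOURCES = [
        "Diabetes Canada Clinical Practice Guidelines Expert Committee (2018)",
        "Emergency Management of Hyperglycaemia in Primary Care",
        "Joint British Diabetes Societies (JBDS) 02 The Management of "
        "Diabetic Ketoacidosis in Adults",
    ]

    prompt = (
        "Compare the following DKA guidelines in a table with one row per item "
        f"and one column per source.\nItems: {'; '.join(FIELDS)}.\n"
        f"Sources: {'; '.join(SOURCES)}.\n"
        "Quote each source verbatim where possible and write 'not reported' "
        "rather than guessing."  # the study found misreporting/non-reporting errors
    )
    print(prompt)
    ```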

  • Article type: Journal Article
    Objective: Evictions are important social and behavioral determinants of health. Evictions are associated with a cascade of negative events that can lead to unemployment, housing insecurity/homelessness, long-term poverty, and mental health problems. In this study, we developed a natural language processing system to automatically detect eviction status from electronic health record (EHR) notes.
    Methods: We first defined eviction status (eviction presence and eviction period) and then annotated eviction status in 5000 EHR notes from the Veterans Health Administration (VHA). We developed a novel model, KIRESH, which has been shown to substantially outperform other state-of-the-art models, such as fine-tuned pretrained language models like BioBERT and Bio_ClinicalBERT. Moreover, we designed a novel prompt to further improve model performance by exploiting the intrinsic connection between the 2 subtasks of eviction presence and period prediction. Finally, we applied temperature scaling-based calibration to our KIRESH-Prompt method to avoid overconfidence issues arising from the imbalanced dataset.
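    The calibration step refers to temperature scaling, a standard post hoc method: a single scalar T > 0 is fitted on held-out logits to minimize cross-entropy, and calibrated probabilities are softmax(z / T). A minimal sketch of the generic technique (not the KIRESH-specific implementation):
    ```python
    # Generic temperature scaling (Guo et al., 2017); not the KIRESH implementation.
    import torch
    import torch.nn.functional as F

    def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
        """Fit a scalar T on validation logits by minimizing cross-entropy."""
        log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
        opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

        def closure():
            opt.zero_grad()
            loss = F.cross_entropy(logits / log_t.exp(), labels)
            loss.backward()
            return loss

        opt.step(closure)
        return log_t.exp().item()

    # Usage: scaling leaves the argmax (and hence accuracy) unchanged.
    logits = torch.randn(100, 2) * 3          # toy over-confident binary logits
    labels = torch.randint(0, 2, (100,))
    T = fit_temperature(logits, labels)
    probs = F.softmax(logits / T, dim=-1)     # calibrated probabilities
    ```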
    Results: KIRESH-Prompt substantially outperformed strong baseline models, including a fine-tuned Bio_ClinicalBERT model, achieving 0.74672 MCC, 0.71153 Macro-F1, and 0.83396 Micro-F1 in predicting eviction period, and 0.66827 MCC, 0.62734 Macro-F1, and 0.7863 Micro-F1 in predicting eviction presence. We also conducted additional experiments on a benchmark social and behavioral determinants of health (SBDH) dataset to demonstrate the generalizability of our methods.
    Conclusions: KIRESH-Prompt substantially improves eviction status classification. We plan to deploy KIRESH-Prompt to the VHA EHRs as an eviction surveillance system to help address housing insecurity among US Veterans.
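    The MCC and macro/micro F1 values above are standard classification metrics; a minimal scikit-learn sketch shows how such scores are computed (toy labels, not study data):
    ```python
    # Illustrative computation of the reported metric types; data is made up.
    from sklearn.metrics import matthews_corrcoef, f1_score

    y_true = [0, 1, 2, 1, 0, 2, 1]   # e.g., eviction-period classes (toy)
    y_pred = [0, 1, 2, 0, 0, 2, 1]

    mcc = matthews_corrcoef(y_true, y_pred)
    macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean over classes
    micro_f1 = f1_score(y_true, y_pred, average="micro")  # global TP/FP/FN counts
    print(f"MCC={mcc:.5f} Macro-F1={macro_f1:.5f} Micro-F1={micro_f1:.5f}")
    ```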