Large language model

  • Article type: Journal Article
    OBJECTIVE: This study aimed to investigate the efficacy of fine-tuned large language models (LLMs) in classifying brain MRI reports into pretreatment, posttreatment, and nontumor cases.
    METHODS: This retrospective study included 759, 284, and 164 brain MRI reports for the training, validation, and test datasets, respectively. Radiologists stratified the reports into three groups: nontumor (group 1), posttreatment tumor (group 2), and pretreatment tumor (group 3) cases. A pretrained Bidirectional Encoder Representations from Transformers Japanese model was fine-tuned using the training dataset and evaluated on the validation dataset. The model that demonstrated the highest accuracy on the validation dataset was selected as the final model. Two additional radiologists classified the reports in the test dataset into the three groups. The model's performance on the test dataset was compared with that of the two radiologists.
    RESULTS: The fine-tuned LLM attained an overall accuracy of 0.970 (95% CI: 0.930-0.990). The model's sensitivity for groups 1/2/3 was 1.000/0.864/0.978, and its specificity for groups 1/2/3 was 0.991/0.993/0.958. No statistically significant differences were found in accuracy, sensitivity, or specificity between the LLM and the human readers (p ≥ 0.371). The LLM completed the classification task approximately 20-26-fold faster than the radiologists. The area under the receiver operating characteristic curve was 0.994 (95% CI: 0.982-1.000) for discriminating groups 2 and 3 from group 1, and 0.992 (95% CI: 0.982-1.000) for discriminating group 3 from groups 1 and 2.
    CONCLUSIONS: The fine-tuned LLM demonstrated performance comparable to that of radiologists in classifying brain MRI reports, while requiring substantially less time.
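
    For orientation, the following is a minimal sketch of how a three-class report classifier of this kind could be set up with the Hugging Face transformers library. The checkpoint name (cl-tohoku/bert-base-japanese), hyperparameters, and data handling are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch (not the authors' configuration): fine-tune a BERT Japanese
# checkpoint for 3-way classification of brain MRI reports
# (0 = nontumor, 1 = posttreatment tumor, 2 = pretreatment tumor).
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "cl-tohoku/bert-base-japanese"  # assumed checkpoint; needs fugashi + ipadic


class ReportDataset(Dataset):
    """Wraps free-text reports and their group labels."""

    def __init__(self, texts, labels, tokenizer, max_len=512):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.enc.items()}
        item["labels"] = self.labels[idx]
        return item


def build_trainer(train_texts, train_labels, val_texts, val_labels):
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)
    args = TrainingArguments(output_dir="brain-mri-report-cls", num_train_epochs=3,
                             per_device_train_batch_size=16, learning_rate=2e-5)
    return Trainer(model=model, args=args,
                   train_dataset=ReportDataset(train_texts, train_labels, tokenizer),
                   eval_dataset=ReportDataset(val_texts, val_labels, tokenizer))

# trainer = build_trainer(...); trainer.train()
# held-out evaluation: trainer.predict(ReportDataset(test_texts, test_labels, tokenizer))
```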

  • Article type: Journal Article
    OBJECTIVE: The purpose of this study was to assess the utility of information generated by ChatGPT for residency education in China.
    METHODS: We designed a three-step survey to evaluate the performance of ChatGPT in China's residency training education, covering residency final examination questions, patient cases, and resident satisfaction scores. First, 204 questions from the residency final exam were input into ChatGPT's interface to obtain the percentage of correct answers. Next, ChatGPT was asked to generate 20 clinical cases, which were subsequently evaluated by three instructors using a pre-designed 5-point Likert scale. The quality of the cases was assessed on criteria including clarity, relevance, logicality, credibility, and comprehensiveness. Finally, interaction sessions between 31 third-year residents and ChatGPT were conducted. Residents' perceptions of ChatGPT's feedback were assessed using a Likert scale, focusing on aspects such as ease of use, accuracy and completeness of responses, and its effectiveness in enhancing understanding of medical knowledge.
    RESULTS: ChatGPT-3.5 correctly answered 45.1% of the exam questions. In the virtual patient cases, ChatGPT received mean ratings of 4.57 ± 0.50, 4.68 ± 0.47, 4.77 ± 0.46, 4.60 ± 0.53, and 3.95 ± 0.59 points for clarity, relevance, logicality, credibility, and comprehensiveness from the clinical instructors, respectively. Among the training residents, ChatGPT scored 4.48 ± 0.70, 4.00 ± 0.82, and 4.61 ± 0.50 points for ease of use, accuracy and completeness, and usefulness, respectively.
    CONCLUSIONS: Our findings demonstrate ChatGPT's immense potential for personalized Chinese medical education.
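
    The study entered exam questions through the ChatGPT web interface; a minimal sketch of how that accuracy measurement could be automated is shown below, assuming the openai Python v1 client and an illustrative prompt and question format.

```python
# Minimal sketch (illustrative, not the study's procedure): submit multiple-choice
# exam questions to a chat model and compute the fraction answered correctly.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def model_answer(stem: str, options: dict[str, str]) -> str:
    """Return the single option letter the model picks, e.g. 'A'."""
    prompt = stem + "\n" + "\n".join(f"{k}. {v}" for k, v in options.items())
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative choice to mirror ChatGPT-3.5
        messages=[
            {"role": "system", "content": "Answer with a single option letter only."},
            {"role": "user", "content": prompt},
        ],
    )
    return resp.choices[0].message.content.strip()[:1].upper()


def exam_accuracy(questions: list[dict]) -> float:
    """questions: [{'stem': ..., 'options': {'A': ..., ...}, 'answer': 'B'}, ...]"""
    correct = sum(model_answer(q["stem"], q["options"]) == q["answer"]
                  for q in questions)
    return correct / len(questions)
```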

  • Article type: Journal Article
    BACKGROUND: The introduction of ChatGPT by OpenAI has garnered significant attention. Among its capabilities, paraphrasing stands out.
    OBJECTIVE: This study aims to investigate whether the level of plagiarism in text paraphrased by this chatbot is satisfactory.
    METHODS: Three texts of varying lengths were presented to ChatGPT. ChatGPT was then instructed to paraphrase the provided texts using five different prompts. In the subsequent stage of the study, the texts were divided into separate paragraphs, and ChatGPT was requested to paraphrase each paragraph individually. Lastly, in the third stage, ChatGPT was asked to paraphrase the texts it had previously generated.
    RESULTS: The average plagiarism rate in the texts generated by ChatGPT was 45% (SD 10%). ChatGPT exhibited a substantial reduction in plagiarism relative to the provided texts (mean difference -0.51, 95% CI -0.54 to -0.48; P<.001). Furthermore, when the second attempt was compared with the initial attempt, a significant decrease in the plagiarism rate was observed (mean difference -0.06, 95% CI -0.08 to -0.03; P<.001). The number of paragraphs in the texts showed a noteworthy association with the percentage of plagiarism, with texts consisting of a single paragraph exhibiting the lowest plagiarism rate (P<.001).
    CONCLUSIONS: Although ChatGPT achieves a notable reduction in plagiarism within texts, the remaining level of plagiarism is still relatively high. This underscores the need for caution when researchers incorporate this chatbot into their work.
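
    The mean differences and 95% CIs above come from paired comparisons of plagiarism rates; a minimal sketch of such a computation is shown below, assuming per-text rates, a paired t test, and illustrative variable names (the abstract does not detail the exact statistical procedure).

```python
# Minimal sketch (illustrative data and names): paired comparison of per-text
# plagiarism rates before vs. after paraphrasing, reported as a mean difference
# with a 95% CI and a paired t-test p value.
import numpy as np
from scipy import stats


def paired_mean_difference(before, after, alpha=0.05):
    """Return (mean difference, (ci_low, ci_high), p_value) for paired samples."""
    before, after = np.asarray(before, float), np.asarray(after, float)
    diff = after - before
    n = diff.size
    mean = diff.mean()
    sem = diff.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    p_value = stats.ttest_rel(after, before).pvalue
    return mean, (mean - t_crit * sem, mean + t_crit * sem), p_value

# Example with made-up rates (fraction of each text flagged as plagiarized):
# original = [0.92, 0.88, 0.95]; paraphrased = [0.41, 0.47, 0.45]
# print(paired_mean_difference(original, paraphrased))
```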

  • Article type: Journal Article
    BACKGROUND: In response to the increasing prevalence of electronic medical records (EMRs) stored in databases, healthcare staff are encountering difficulties retrieving these records due to their limited technical expertise in database operations. As these records are crucial for delivering appropriate medical care, there is a need for an accessible method for healthcare staff to access EMRs.
    METHODS: To address this, natural language processing (NLP) for Text-to-SQL has emerged as a solution, enabling non-technical users to generate SQL queries using natural language text. This research assesses existing work on Text-to-SQL conversion and proposes the MedT5SQL model, designed specifically for EMR retrieval. The proposed model utilizes the Text-to-Text Transfer Transformer (T5) model, a large language model (LLM) commonly used in various text-based NLP tasks. The model is fine-tuned on the MIMICSQL dataset, the first Text-to-SQL dataset for the healthcare domain. Performance evaluation involves benchmarking the MedT5SQL model on two optimizers, varying numbers of training epochs, and two datasets, MIMICSQL and WikiSQL.
    RESULTS: On the MIMICSQL dataset, the model demonstrates considerable effectiveness in generating question-SQL pairs, achieving 80.63%, 98.937%, and 90% for exact-match accuracy, approximate string matching, and manual evaluation, respectively. When tested on the WikiSQL dataset, the model demonstrates efficiency in generating SQL queries, with an accuracy of 44.2% and an approximate string-matching score of 94.26%.
    CONCLUSIONS: Results indicate improved performance with increased training epochs. This work highlights the potential of a fine-tuned T5 model to convert medical questions written in natural language into Structured Query Language (SQL) in the healthcare domain, providing a foundation for future research in this area.
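
    As a concrete illustration of the Text-to-SQL setup described above, here is a minimal fine-tuning and inference sketch built on Hugging Face transformers; the T5 checkpoint size, task prefix, and hyperparameters are assumptions rather than the published MedT5SQL configuration.

```python
# Minimal sketch (assumed checkpoint and settings): fine-tune T5 to map a
# natural-language clinical question to a SQL query, then decode new queries.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL_NAME = "t5-base"  # assumed; the abstract does not state which T5 size was used

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)


def training_step(question: str, sql: str) -> float:
    """One supervised step on a (question, SQL) pair from a MIMICSQL-style dataset."""
    inputs = tokenizer("translate question to SQL: " + question,
                       return_tensors="pt", truncation=True)
    labels = tokenizer(sql, return_tensors="pt", truncation=True).input_ids
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()


def generate_sql(question: str) -> str:
    """Greedy decoding of a SQL query for a new question."""
    inputs = tokenizer("translate question to SQL: " + question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```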

  • Article type: Journal Article
    OBJECTIVE: This short communication explores the potential, limitations, and future directions of generative artificial intelligence (GAI) in enhancing diagnostics.
    METHODS: This commentary reviews current applications and advancements in GAI, particularly focusing on its integration into medical diagnostics. It examines the role of GAI in supporting medical interviews, assisting in differential diagnosis, and aiding clinical reasoning through the lens of dual-process theory. The discussion is supported by recent examples and theoretical frameworks to illustrate the practical and potential uses of GAI in medicine.
    RESULTS: GAI shows significant promise in enhancing diagnostic processes by supporting the translation of patient descriptions into visual formats, providing differential diagnoses, and facilitating complex clinical reasoning. However, limitations such as the potential for generating medical misinformation, known as hallucinations, exist. Furthermore, the commentary highlights the integration of GAI with both intuitive and analytical decision-making processes in clinical diagnostics, demonstrating potential improvements in both the speed and accuracy of diagnoses.
    CONCLUSIONS: While GAI presents transformative potential for medical diagnostics, it also introduces risks that must be carefully managed. Future advancements should focus on refining GAI technologies to better align with human diagnostic reasoning, ensuring that GAI enhances rather than replaces the expertise of medical professionals.

  • Article type: Journal Article
    BACKGROUND: Academic paper writing holds significant importance in the education of medical students, and poses a clear challenge for those whose first language is not English. This study aims to investigate the effectiveness of employing large language models, particularly ChatGPT, in improving the English academic writing skills of these students.
    METHODS: A cohort of 25 third-year medical students from China was recruited. The study consisted of two stages. First, the students were asked to write a mini paper. Second, the students were asked to revise the mini paper using ChatGPT within two weeks. The evaluation of the mini papers focused on three key dimensions: structure, logic, and language. The evaluation combined manual scoring with AI scoring using the ChatGPT-3.5 and ChatGPT-4 models. Additionally, we employed a questionnaire to gather feedback on the students' experience of using ChatGPT.
    RESULTS: After the students used ChatGPT for writing assistance, manual scores increased by a notable 4.23 points. Similarly, AI scores based on the ChatGPT-3.5 model increased by 4.82 points, while the ChatGPT-4 model showed an increase of 3.84 points. These results highlight the potential of large language models in supporting academic writing. Statistical analysis revealed no significant difference between manual scoring and ChatGPT-4 scoring, indicating the potential of ChatGPT-4 to assist teachers in the grading process. Feedback from the questionnaire indicated a generally positive response from students, with 92% acknowledging an improvement in the quality of their writing, 84% noting advancements in their language skills, and 76% recognizing the contribution of ChatGPT in supporting academic research.
    CONCLUSIONS: The study highlighted the efficacy of large language models like ChatGPT in augmenting the English academic writing proficiency of non-native speakers in medical education. Furthermore, it illustrated the potential of these models to make a contribution to the educational evaluation process, particularly in environments where English is not the primary language.
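
    A minimal sketch of the AI-scoring step is shown below: prompting a GPT model to rate a mini paper on structure, logic, and language. The rubric wording, score range, and JSON output format are illustrative assumptions, not the study's exact prompt.

```python
# Minimal sketch (illustrative rubric and output format, not the study's prompt):
# ask a GPT model to rate a mini paper on structure, logic, and language.
import json

from openai import OpenAI

client = OpenAI()

RUBRIC = ("Rate the following student mini paper on three dimensions - structure, "
          "logic, and language - each on a 0-10 scale. Reply with JSON only, "
          'e.g. {"structure": 7, "logic": 6, "language": 8}.')


def ai_score(paper_text: str, model: str = "gpt-4") -> dict:
    """Return the model's rubric scores as a dict; assumes the reply is valid JSON."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": RUBRIC},
                  {"role": "user", "content": paper_text}],
    )
    return json.loads(resp.choices[0].message.content)

# scores = ai_score(mini_paper_text); total = sum(scores.values())
```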

  • Article type: Journal Article
    OBJECTIVE: With their unmatched ability to interpret and engage with human language and context, large language models (LLMs) hint at the potential to bridge AI and human cognitive processes. This review explores the current application of LLMs, such as ChatGPT, in the field of psychiatry.
    METHODS: We followed PRISMA guidelines and searched PubMed, Embase, Web of Science, and Scopus up until March 2024.
    RESULTS: From 771 retrieved articles, we included 16 that directly examine LLMs' use in psychiatry. LLMs, particularly ChatGPT and GPT-4, showed diverse applications in clinical reasoning, social media, and education within psychiatry. They can assist in diagnosing mental health issues, managing depression, evaluating suicide risk, and supporting education in the field. However, our review also points out their limitations, such as difficulties with complex cases and potential underestimation of suicide risks.
    CONCLUSIONS: Early research in psychiatry reveals LLMs' versatile applications, from diagnostic support to educational roles. Given the rapid pace of advancement, future investigations are poised to explore the extent to which these models might redefine traditional roles in mental health care.

  • Article type: Journal Article
    OBJECTIVE: Headache disorders are highly prevalent worldwide. Rapidly advancing capabilities in artificial intelligence (AI) have expanded headache-related research with the potential to solve unmet needs in the headache field. We provide an overview of AI in headache research in this article.
    RESULTS: We briefly introduce machine learning models and commonly used evaluation metrics. We then review studies that have utilized AI in the field to advance diagnostic accuracy and classification, predict treatment responses, gather insights from various data sources, and forecast migraine attacks. Furthermore, given the emergence of ChatGPT, a type of large language model (LLM), and the popularity it has gained, we also discuss how LLMs could be used to advance the field. Finally, we discuss the potential pitfalls, biases, and future directions of employing AI in headache medicine. Many recent studies on headache medicine have incorporated machine learning, generative AI, and LLMs. A comprehensive understanding of potential pitfalls and biases is crucial to using these novel techniques with minimal harm. When used appropriately, AI has the potential to revolutionize headache medicine.
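
    As a brief illustration of the evaluation metrics such studies commonly report, the snippet below computes accuracy, sensitivity, specificity, and ROC AUC with scikit-learn on made-up predictions.

```python
# Illustration only: common classification metrics on made-up labels and scores
# (e.g. predicting migraine vs. non-migraine).
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # ground-truth labels
y_prob = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3])    # model scores
y_pred = (y_prob >= 0.5).astype(int)                           # thresholded predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # recall for the positive class
specificity = tn / (tn + fp)
auc = roc_auc_score(y_true, y_prob)
print(accuracy, sensitivity, specificity, auc)
```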

  • Article type: Journal Article
    BACKGROUND: Large language models (LLMs) driven by artificial intelligence allow people to engage in direct conversations about their health. The accuracy and readability of the answers provided by ChatGPT, the most famous LLM, about essential tremor (ET), one of the most common movement disorders, have not yet been evaluated.
    METHODS: Answers given by ChatGPT to 10 questions about ET were evaluated by 5 professionals and 15 laypeople, with scores ranging from 1 (poor) to 5 (excellent) for clarity, relevance, accuracy (professionals only), comprehensiveness, and the overall value of the response. We further calculated the readability of the answers.
    RESULTS: ChatGPT's answers received relatively positive evaluations, with median scores between 4 and 5 in both groups and independently of the type of question. However, there was only moderate agreement between raters, especially among the professionals. Moreover, readability levels were poor for all examined answers.
    CONCLUSIONS: ChatGPT provided relatively accurate and relevant answers, with some variability in the professionals' judgments, suggesting that the raters' degree of literacy about ET influenced the ratings and, indirectly, that the quality of information provided in clinical practice is also variable. Moreover, the readability of the answers provided by ChatGPT was poor. LLMs will likely play a significant role in the future; therefore, health-related content generated by these tools should be monitored.
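
    The abstract does not name the readability index used; a minimal sketch of such a check is shown below, assuming a Flesch-style measure computed with the textstat package.

```python
# Minimal sketch (assumed index): readability of a chatbot answer via textstat.
import textstat


def readability(answer: str) -> dict:
    """Return two Flesch-style readability measures for one answer."""
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(answer),    # higher = easier
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(answer),  # US school grade
    }

# Flesch Reading Ease scores below roughly 60 are generally considered difficult
# for lay readers, consistent with the poor readability reported above.
```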

  • Article type: Journal Article
    BACKGROUND: Identifying health problems in audio-recorded patient-nurse communication is important to improve outcomes in home healthcare patients who have complex conditions with increased risks of hospital utilization. Training machine learning classifiers for identifying problems requires resource-intensive human annotation.
    OBJECTIVE: To generate synthetic patient-nurse communication and to automatically annotate for common health problems encountered in home healthcare settings using GPT-4. We also examined whether augmenting real-world patient-nurse communication with synthetic data can improve the performance of machine learning to identify health problems.
    METHODS: Secondary data analysis of patient-nurse verbal communication data in home healthcare settings.
    METHODS: The data were collected from one of the largest home healthcare organizations in the United States. We used 23 audio recordings of patient-nurse communications from 15 patients. The audio recordings were transcribed verbatim and manually annotated for health problems (e.g., circulation, skin, pain) indicated in the Omaha System Classification scheme. Synthetic data of patient-nurse communication were generated using the in-context learning prompting method, enhanced by chain-of-thought prompting to improve the automatic annotation performance. Machine learning classifiers were applied to three training datasets: real-world communication, synthetic communication, and real-world communication augmented by synthetic communication.
    RESULTS: Average F1 scores improved from 0.62 to 0.63 after the training data were augmented with synthetic communication. The largest increase was observed with the XGBoost classifier, where F1 scores improved from 0.61 to 0.64 (about a 5% improvement). When trained solely on real-world communication or on synthetic communication, the classifiers showed comparable F1 scores of 0.62 and 0.61, respectively.
    CONCLUSIONS: Integrating synthetic data improves machine learning classifiers' ability to identify health problems in home healthcare, with performance comparable to training on real-world data alone, highlighting the potential of synthetic data in healthcare analytics.
    CONCLUSIONS: This study demonstrates the clinical relevance of leveraging synthetic patient-nurse communication data to enhance machine learning classifier performances to identify health problems in home healthcare settings, which will contribute to more accurate and efficient problem identification and detection of home healthcare patients with complex health conditions.
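
    A minimal sketch of the augmentation comparison is shown below: training the same classifier on real, synthetic, and combined transcripts and comparing F1 on a held-out set of real-world transcripts. TF-IDF features and a single binary label are simplifying assumptions; the study annotated multiple Omaha System problem categories and averaged F1 across classifiers.

```python
# Minimal sketch (simplified to one binary label and TF-IDF features): train the
# same classifier on real, synthetic, or combined transcripts and compare F1 on
# a held-out set of real-world transcripts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from xgboost import XGBClassifier


def evaluate(train_texts, train_labels, test_texts, test_labels) -> float:
    """Fit TF-IDF + XGBoost on the training texts and return test-set F1."""
    vec = TfidfVectorizer(max_features=5000)
    clf = XGBClassifier(n_estimators=200)
    clf.fit(vec.fit_transform(train_texts), train_labels)
    return f1_score(test_labels, clf.predict(vec.transform(test_texts)))

# f1_real      = evaluate(real_texts, real_y, test_texts, test_y)
# f1_synthetic = evaluate(synth_texts, synth_y, test_texts, test_y)
# f1_augmented = evaluate(real_texts + synth_texts, real_y + synth_y, test_texts, test_y)
```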