Large language model

  • Article type: Journal Article
    OBJECTIVE: Researchers commonly use automated solutions such as Natural Language Processing (NLP) systems to extract clinical information from large volumes of unstructured data. However, the poor semantic structure and domain-specific vocabulary of clinical text can make it challenging to develop a one-size-fits-all solution. Large Language Models (LLMs), such as OpenAI's Generative Pre-Trained Transformer 3 (GPT-3), offer a promising solution for capturing and standardizing unstructured clinical information. This study evaluated the performance of InstructGPT, a family of models derived from the LLM GPT-3, in extracting relevant patient information from medical case reports and discussed the advantages and disadvantages of LLMs versus dedicated NLP methods.
    METHODS: In this paper, 208 articles related to case reports of foreign body injuries in children were identified by searching PubMed, Scopus, and Web of Science. A reviewer manually extracted each patient's sex, age, the object that caused the injury, and the injured body part to build a gold standard against which to compare the performance of InstructGPT.
    RESULTS: InstructGPT achieved high accuracy in classifying the sex, age, object, and body part involved in the injury, with 94%, 82%, 94%, and 89%, respectively. When excluding articles for which InstructGPT could not retrieve any information, the accuracy for determining the child's sex and age improved to 97%, and the accuracy for identifying the injured body part improved to 93%. InstructGPT was also able to extract information from non-English-language articles.
    CONCLUSIONS: The study highlights that LLMs have the potential to eliminate the necessity for task-specific training (zero-shot extraction), allowing the retrieval of clinical information from unstructured natural language text, particularly from published scientific literature like case reports, by directly utilizing the PDF file of the article without any pre-processing and without requiring any technical expertise in NLP or Machine Learning. The diverse nature of the corpus, which includes articles written in languages other than English, some of which contain a wide range of clinical details while others lack information, adds to the strength of the study.
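    The accuracy figures reported above (overall, and after excluding articles where the model returned nothing) can be reproduced with a short evaluation script. The sketch below is illustrative only: the field names, the normalisation step, and the toy records are assumptions, not the authors' code.

```python
# Minimal sketch: per-field accuracy of LLM-extracted fields against a manual gold standard.
# Field names, normalisation, and the toy records are illustrative assumptions.

FIELDS = ["sex", "age", "object", "body_part"]

def normalise(value):
    """Lower-case and strip strings so trivial formatting differences are not counted as errors."""
    return value.strip().lower() if isinstance(value, str) else value

def field_accuracy(predictions, gold, field, skip_missing=False):
    """Accuracy for one field; optionally exclude cases where the model returned nothing."""
    pairs = [
        (normalise(p.get(field)), normalise(g.get(field)))
        for p, g in zip(predictions, gold)
        if not (skip_missing and p.get(field) in (None, ""))
    ]
    return sum(p == g for p, g in pairs) / len(pairs) if pairs else float("nan")

# Toy example; real inputs would hold one record per case report.
gold = [{"sex": "male", "age": "3 years", "object": "coin", "body_part": "esophagus"}]
pred = [{"sex": "Male", "age": "3 years", "object": "coin", "body_part": "esophagus"}]
for f in FIELDS:
    print(f, field_accuracy(pred, gold, f), field_accuracy(pred, gold, f, skip_missing=True))
```
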
  • Article type: Journal Article
    OBJECTIVE: This study aimed to investigate the efficacy of fine-tuned large language models (LLMs) in classifying brain MRI reports into pretreatment, posttreatment, and nontumor cases.
    METHODS: This retrospective study included 759, 284, and 164 brain MRI reports for the training, validation, and test datasets, respectively. Radiologists stratified the reports into three groups: nontumor (group 1), posttreatment tumor (group 2), and pretreatment tumor (group 3) cases. A pretrained Bidirectional Encoder Representations from Transformers Japanese model was fine-tuned using the training dataset and evaluated on the validation dataset. The model that demonstrated the highest accuracy on the validation dataset was selected as the final model. Two additional radiologists classified the reports in the test dataset into the three groups. The model's performance on the test dataset was compared to that of the two radiologists.
    RESULTS: The fine-tuned LLM attained an overall accuracy of 0.970 (95% CI: 0.930-0.990). The model's sensitivity for groups 1/2/3 was 1.000/0.864/0.978, and its specificity for groups 1/2/3 was 0.991/0.993/0.958. No statistically significant differences were found in accuracy, sensitivity, or specificity between the LLM and the human readers (p ≥ 0.371). The LLM completed the classification task approximately 20-26-fold faster than the radiologists. The area under the receiver operating characteristic curve was 0.994 (95% CI: 0.982-1.000) for discriminating groups 2 and 3 from group 1 and 0.992 (95% CI: 0.982-1.000) for discriminating group 3 from groups 1 and 2.
    CONCLUSIONS: The fine-tuned LLM demonstrated performance comparable to that of radiologists in classifying brain MRI reports, while requiring substantially less time.
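    The fine-tuning step described in the methods follows the usual sequence-classification recipe. The sketch below uses Hugging Face transformers; the checkpoint name, hyperparameters, and toy dataset are assumptions for illustration, not the study's actual configuration.

```python
# Sketch: fine-tuning a Japanese BERT checkpoint for 3-class report classification.
# Checkpoint, hyperparameters, and the toy dataset are illustrative assumptions.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)
from datasets import Dataset

checkpoint = "cl-tohoku/bert-base-japanese"  # assumed checkpoint; needs fugashi/ipadic installed
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

# labels: 0 = nontumor, 1 = posttreatment tumor, 2 = pretreatment tumor
train = Dataset.from_dict({"text": ["腫瘍は認められない。"], "label": [0]})  # replace with real reports
val = Dataset.from_dict({"text": ["術後変化を認める。"], "label": [1]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

train, val = train.map(tokenize, batched=True), val.map(tokenize, batched=True)

args = TrainingArguments(output_dir="bert-mri-triage",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=train, eval_dataset=val).train()
```
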
  • Article type: Journal Article
    BACKGROUND: Patients find technology tools to be more approachable for seeking sensitive health-related information, such as reproductive health information. The inventive conversational ability of artificial intelligence (AI) chatbots, such as ChatGPT (OpenAI Inc), offers a potential means for patients to effectively locate answers to their health-related questions digitally.
    OBJECTIVE: A pilot study was conducted to compare the novel ChatGPT with existing Google Search technology in their ability to offer accurate, effective, and current information on the actions to take after missing a dose of an oral contraceptive pill.
    METHODS: A sequence of 11 questions, mimicking a patient inquiring about the action to take after missing a dose of an oral contraceptive pill, was input into ChatGPT as a cascade, given the conversational ability of ChatGPT. The questions were input into 4 different ChatGPT accounts, with the account holders being of various demographics, to evaluate potential differences and biases in the responses given to different account holders. The leading question, "what should I do if I missed a day of my oral contraception birth control?", alone was then input into Google Search, given its nonconversational nature. The responses to the ChatGPT questions and the Google Search results for the leading question were evaluated on their readability, accuracy, and effective delivery of information.
    RESULTS: The ChatGPT results were at an overall higher-grade reading level, had a longer reading duration, and were less accurate, less current, and less effective in delivering information. In contrast, the Google Search answer box and snippets were at a lower-grade reading level, had a shorter reading duration, were more current, referenced the origin of the information (transparent), and provided the information in various formats in addition to text.
    CONCLUSIONS: ChatGPT has room for improvement in accuracy, transparency, recency, and reliability before it can be equitably implemented into health care information delivery and provide the potential benefits it offers. However, AI may be used as a tool for providers to educate their patients in preferred, creative, and efficient ways, such as using AI to generate accessible short educational videos from health care provider-vetted information. Larger studies representing a diverse group of users are needed.
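    The readability comparison above can be approximated programmatically. The abstract does not name the readability formula or reading-speed assumption used, so the sketch below assumes the Flesch-Kincaid grade (via the textstat package) and 200 words per minute; the example answers are placeholders.

```python
# Sketch: reading grade level and estimated reading duration of two responses.
# Flesch-Kincaid grade and a 200-words-per-minute reading speed are assumptions.
import textstat

def readability_profile(text, words_per_minute=200):
    words = len(text.split())
    return {
        "grade_level": textstat.flesch_kincaid_grade(text),
        "reading_minutes": round(words / words_per_minute, 2),
        "word_count": words,
    }

chatgpt_answer = "If you miss one active pill, take it as soon as you remember and continue the pack as usual."
google_snippet = "Take the missed pill as soon as possible, then continue as usual."
print("ChatGPT:", readability_profile(chatgpt_answer))
print("Google :", readability_profile(google_snippet))
```
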
  • Article type: Journal Article
    The purposes were to assess the efficacy of AI-generated radiology reports in terms of report summary, patient-friendliness, and recommendations, and to evaluate the consistency of report quality and accuracy, contributing to the advancement of the radiology workflow. A total of 685 spine MRI reports were retrieved from our hospital database. AI-generated radiology reports were produced in three formats: (1) summary reports, (2) patient-friendly reports, and (3) recommendations. The occurrence of artificial hallucinations was evaluated in the AI-generated reports. Two radiologists conducted qualitative and quantitative assessments considering the original report as a standard reference. Two non-physician raters assessed their understanding of the content of the original and patient-friendly reports using a 5-point Likert scale. The AI-generated radiology reports received high average scores overall across all three formats. The average comprehension score for the original reports was 2.71 ± 0.73, while the score for the patient-friendly reports significantly increased to 4.69 ± 0.48 (p < 0.001). There were 1.12% artificial hallucinations and 7.40% potentially harmful translations. In conclusion, the potential benefits of using generative AI assistants to generate these reports include improved report quality; greater efficiency in the radiology workflow for producing summaries, patient-centered reports, and recommendations; and a move toward patient-centered radiology.
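    The comprehension comparison and hallucination rate above lend themselves to a short analysis script. The abstract does not state which statistical test produced p < 0.001, so the paired Wilcoxon signed-rank test below is an assumption, and all numbers in the sketch are toy values.

```python
# Sketch: hallucination rate and paired comparison of per-report comprehension scores.
# The Wilcoxon signed-rank test and the toy scores/counts are assumptions, not the study's data.
import numpy as np
from scipy.stats import wilcoxon

original_scores = np.array([3, 2, 3, 2, 3, 3, 2, 3])   # Likert ratings of original reports (toy)
friendly_scores = np.array([5, 4, 5, 5, 4, 5, 5, 5])   # Likert ratings of patient-friendly reports (toy)

stat, p_value = wilcoxon(original_scores, friendly_scores)
print(f"original {original_scores.mean():.2f} vs patient-friendly {friendly_scores.mean():.2f}, p = {p_value:.4f}")

n_reports, n_hallucinations = 685, 8                    # toy hallucination count
print(f"hallucination rate: {100 * n_hallucinations / n_reports:.2f}%")
```
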
  • Article type: Journal Article
    Channel modeling is the first step towards the successful design of any wireless communication system. Hence, in this paper, we analyze the performance at the output of a multi-branch selection combining (SC) diversity receiver in a wireless environment impaired by fading and co-channel interference (CCI), whereby the fading is modelled by the newer Beaulieu-Xie (BX) distribution and the CCI is modelled by the κ-µ distribution. The BX distribution makes it possible to take into account any number of line-of-sight (LOS) and non-LOS (NLOS) useful signal components. Thanks to its flexible fading parameters, this distribution contains the characteristics of several other fading models, which also applies to the κ-µ distribution. We derive here the expressions for the probability density function (PDF) and cumulative distribution function (CDF) of the output signal-to-co-channel-interference ratio (SIR). From these, other performance measures are obtained, namely: outage probability (Pout), channel capacity (CC), moment-generating function (MGF), average bit error probability (ABEP), level crossing rate (LCR), and average fade duration (AFD). Numerical results are presented in several graphs versus the SIR for different values of the fading and CCI parameters, as well as the number of input branches in the SC receiver, and the impact of these parameters on all performance measures is examined. From our numerical results, the performance for all derived and displayed quantities can be obtained directly for cases of previously known fading and CCI distributions by inserting the appropriate parameter values. In the second part of the paper, a workflow for automated network experimentation relying on the synergy of Large Language Models (LLMs) and model-driven engineering (MDE) is presented, with the previously derived expressions used for evaluation. For these reasons, the greatest value of the obtained results is their applicability to a large number of other fading and CCI distributions by replacing the corresponding parameters in the formulas for the respective performance measures.
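    The derivation chain summarized above rests on the standard ratio-distribution and selection-combining relations. The block below restates only those generic relations as a reference point; it assumes independent desired-signal power x and interference power y and i.i.d. branch SIRs with selection on the instantaneous SIR, and it does not reproduce the paper's closed-form BX/κ-µ expressions.

```latex
% Generic relations underlying the SIR analysis (assumptions: x and y independent,
% i.i.d. branch SIRs, selection on the instantaneous SIR); not the paper's closed forms.
f_{\lambda}(z) = \int_{0}^{\infty} y \, f_{x}(z y) \, f_{y}(y) \, \mathrm{d}y ,
\qquad
F_{\lambda}(z) = \int_{0}^{\infty} F_{x}(z y) \, f_{y}(y) \, \mathrm{d}y ,
\qquad
F_{\mathrm{SC}}(z) = \bigl[ F_{\lambda}(z) \bigr]^{L} ,
\qquad
P_{\mathrm{out}} = F_{\mathrm{SC}}(\gamma_{\mathrm{th}}) .
```
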
  • Article type: Journal Article
    This study explores the potential of using large language models to assist content analysis by conducting a case study to identify adverse events (AEs) in social media posts. The case study compares ChatGPT's performance with that of human annotators in detecting AEs associated with delta-8-tetrahydrocannabinol, a cannabis-derived product. Using the identical instructions given to human annotators, ChatGPT closely approximated the human results, with a high degree of agreement noted: 94.4% (9436/10,000) for any AE detection (Fleiss κ=0.95) and 99.3% (9931/10,000) for serious AEs (κ=0.96). These findings suggest that ChatGPT has the potential to replicate human annotation accurately and efficiently. The study recognizes possible limitations, including concerns about generalizability due to ChatGPT's training data, and prompts further research with different models, data sources, and content analysis tasks. The study highlights the promise of large language models for enhancing the efficiency of biomedical research.
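    Percent agreement and Fleiss' κ of the kind reported above can be computed from a rater-by-post label matrix. The sketch below uses statsmodels; the toy label matrix and the way ChatGPT is combined with the human annotators are assumptions about the workflow, not the study's actual data handling.

```python
# Sketch: percent agreement and Fleiss' kappa between ChatGPT and human annotators.
# The label matrix and the majority-vote comparison are illustrative assumptions.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = posts, columns = raters (two human annotators, then ChatGPT); 1 = AE present, 0 = absent
labels = np.array([
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
])

table, _ = aggregate_raters(labels)             # posts x categories count table
print("Fleiss kappa:", fleiss_kappa(table, method="fleiss"))

# Percent agreement between ChatGPT (last column) and the human majority vote
human_majority = (labels[:, :2].mean(axis=1) >= 0.5).astype(int)
print("agreement:", (labels[:, 2] == human_majority).mean())
```
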
  • Article type: Journal Article
    BACKGROUND: Healthcare systems are increasingly resource constrained, leaving less time for important patient-provider interactions. Conversational agents (CAs) could be used to support the provision of information and to answer patients' questions. However, information must be accessible to a variety of patient populations, which requires understanding questions expressed at different language levels.
    METHODS: This study describes the use of Large Language Models (LLMs) to evaluate predefined medical content in CAs across patient populations. These simulated populations are characterized by a range of health literacy. The evaluation framework includes both fully automated and semi-automated procedures to assess the performance of a CA.
    RESULTS: A case study in the domain of mammography shows that LLMs can simulate questions from different patient populations. However, the accuracy of the answers provided varies depending on the level of health literacy.
    CONCLUSIONS: Our scalable evaluation framework enables the simulation of patient populations with different health literacy levels and helps to evaluate domain-specific CAs, thus promoting their integration into clinical practice. Future research aims to extend the framework to CAs without predefined content and to apply LLMs to adapt medical information to the specific (health) literacy level of the user.
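    The simulation idea described above amounts to prompting an LLM to generate questions in the voice of patients with different literacy levels. The sketch below only illustrates that idea: the prompt wording and the ask_llm() helper are hypothetical, since the abstract does not disclose the framework's actual prompts or model.

```python
# Sketch: simulating patient questions at different health-literacy levels with an LLM.
# The prompt wording and the ask_llm() placeholder are hypothetical, not the framework's code.
LITERACY_LEVELS = ["low", "average", "high"]

PROMPT_TEMPLATE = (
    "You are simulating a patient with {level} health literacy preparing for a mammography "
    "appointment. Write one question this patient might ask the clinic's conversational agent."
)

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to whichever LLM backs the simulation."""
    raise NotImplementedError

def simulate_questions(n_per_level: int = 10) -> dict:
    return {
        level: [ask_llm(PROMPT_TEMPLATE.format(level=level)) for _ in range(n_per_level)]
        for level in LITERACY_LEVELS
    }
```
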
  • Article type: Journal Article
    OBJECTIVE: The Liver Imaging Reporting and Data System (LI-RADS) offers a standardized approach for imaging hepatocellular carcinoma. However, the diverse styles and structures of radiology reports complicate automatic data extraction. Large language models hold the potential for structured data extraction from free-text reports. Our objective was to evaluate the performance of Generative Pre-trained Transformer (GPT)-4 in extracting LI-RADS features and categories from free-text liver magnetic resonance imaging (MRI) reports.
    METHODS: Three radiologists generated 160 fictitious free-text liver MRI reports written in Korean and English, simulating real-world practice. Of these, 20 were used for prompt engineering, and 140 formed the internal test cohort. Seventy-two genuine reports, authored by 17 radiologists, were collected and de-identified for the external test cohort. LI-RADS features were extracted using GPT-4, with a Python script calculating the categories. Accuracies in each test cohort were compared.
    RESULTS: On the external test, the accuracy for the extraction of major LI-RADS features, which encompass size, nonrim arterial phase hyperenhancement, nonperipheral 'washout', enhancing 'capsule', and threshold growth, ranged from .92 to .99. For the rest of the LI-RADS features, the accuracy ranged from .86 to .97. For the LI-RADS category, the model showed an accuracy of .85 (95% CI: .76, .93).
    CONCLUSIONS: GPT-4 shows promise in extracting LI-RADS features, yet further refinement of its prompting strategy and advancements in its neural network architecture are crucial for reliable use in processing complex real-world MRI reports.
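    The two-stage pipeline described above (GPT-4 extracts features, a script assigns the category) can be sketched as follows. The prompt, the JSON field names, and the use of the OpenAI chat-completions endpoint are assumptions; assign_li_rads_category() is left as a placeholder because the LI-RADS v2018 decision table is not reproduced here.

```python
# Sketch: asking GPT-4 to return LI-RADS major features as JSON, then assigning the category.
# Prompt, field names, and SDK usage are assumptions; the category rule is a placeholder.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PROMPT = (
    "Extract the following from the liver MRI report and answer only with JSON: "
    '{"size_mm": number or null, "nonrim_aphe": bool, "washout": bool, '
    '"enhancing_capsule": bool, "threshold_growth": bool}\n\nReport:\n'
)

def extract_features(report_text: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT + report_text}],
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)

def assign_li_rads_category(features: dict) -> str:
    """Placeholder: apply the LI-RADS v2018 table (size plus major features) here."""
    raise NotImplementedError
```
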
  • Article type: Journal Article
    OBJECTIVE: This study addresses the critical need for accurate summarization in radiology by comparing various Large Language Model (LLM)-based approaches for automatic summary generation. With the increasing volume of patient information, accurately and concisely conveying radiological findings becomes crucial for effective clinical decision-making. Minor inaccuracies in summaries can lead to significant consequences, highlighting the need for reliable automated summarization tools.
    METHODS: We employed two language models - Text-to-Text Transfer Transformer (T5) and Bidirectional and Auto-Regressive Transformers (BART) - in both fine-tuned and zero-shot learning scenarios and compared them with a Recurrent Neural Network (RNN). Additionally, we conducted a comparative analysis of 100 MRI report summaries, using expert human judgment and criteria such as coherence, relevance, fluency, and consistency, to evaluate the models against the original radiologist summaries. To facilitate this, we compiled a dataset of 15,508 retrospective knee Magnetic Resonance Imaging (MRI) reports from our Radiology Information System (RIS), focusing on the findings section to predict the radiologist's summary.
    RESULTS: The fine-tuned models outperformed the RNN and showed superior performance compared with their zero-shot variants. Specifically, the T5 model achieved a Rouge-L score of 0.638. Based on the radiologist readers' study, the summaries produced by this model were found to be very similar to those produced by a radiologist, with about 70% similarity in fluency and consistency between the T5-generated summaries and the original ones.
    CONCLUSIONS: Technological advances, especially in NLP and LLM, hold great promise for improving and streamlining the summarization of radiological findings, thus providing valuable assistance to radiologists in their work.
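    Generating a summary with a T5 checkpoint and scoring it with ROUGE-L, as in the results above, follows a standard pattern. In the sketch below the checkpoint ("t5-base"), the "summarize:" prefix, the generation settings, and the example texts are illustrative assumptions, not the study's fine-tuned model or data.

```python
# Sketch: summarising a findings section with a T5 checkpoint and scoring it with ROUGE-L.
# Checkpoint, prompt prefix, generation settings, and example texts are assumptions.
from transformers import T5ForConditionalGeneration, T5Tokenizer
from rouge_score import rouge_scorer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

findings = "There is a complex tear of the posterior horn of the medial meniscus with mild joint effusion."
reference_summary = "Medial meniscus posterior horn tear with mild effusion."

inputs = tokenizer("summarize: " + findings, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
candidate = tokenizer.decode(output_ids[0], skip_special_tokens=True)

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
print(candidate)
print("ROUGE-L F1:", scorer.score(reference_summary, candidate)["rougeL"].fmeasure)
```
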
  • Article type: Journal Article
    Background: The scarcity of medical resources and personnel has worsened due to COVID-19. Telemedicine faces challenges in assessing wounds without physical examination, and evaluating pressure injuries is time consuming, energy intensive, and inconsistent. Most of today's telemedicine platforms utilize graphical user interfaces with complex operational procedures and limited channels for information dissemination. This study aims to establish a smart telemedicine diagnosis system based on YOLOv7 and a large language model. Methods: The YOLOv7 model is trained using a clinical dataset, with data augmentation techniques employed to enhance the dataset for identifying six types of pressure injury images. The established system features a front-end interface that includes responsive web design and a chatbot built on ChatGPT, and it is integrated with a database for personal information management. Results: This research provides a practical pressure injury staging classification model with an average F1 score of 0.9238. The system remotely provides real-time, accurate diagnoses and prescriptions, guiding patients to seek the appropriate level of medical help based on symptom severity. Conclusions: This study establishes a smart telemedicine auxiliary diagnosis system based on the YOLOv7 model, which possesses capabilities for classification and real-time detection. During teleconsultations, it provides immediate and accurate diagnostic information and prescription recommendations, and it guides patients to the appropriate level of medical assistance based on the severity of their symptoms. Through the setup of a chatbot with ChatGPT, different users can quickly achieve their respective objectives.
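    An average F1 score across the six pressure-injury classes, like the 0.9238 reported above, is typically a macro-averaged F1. The sketch below shows that computation with scikit-learn; the stage labels and predictions are toy values, not the authors' evaluation code or data.

```python
# Sketch: macro-averaged F1 across six pressure-injury stages with scikit-learn.
# Stage labels and toy predictions are illustrative only.
from sklearn.metrics import f1_score, classification_report

STAGES = ["stage1", "stage2", "stage3", "stage4", "unstageable", "deep_tissue"]

y_true = ["stage1", "stage2", "stage3", "stage4", "unstageable", "deep_tissue", "stage2"]
y_pred = ["stage1", "stage2", "stage3", "stage4", "unstageable", "stage1", "stage2"]

print("macro F1:", f1_score(y_true, y_pred, labels=STAGES, average="macro"))
print(classification_report(y_true, y_pred, labels=STAGES, zero_division=0))
```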