GPT

  • Article type: Journal Article
    BACKGROUND: Infantile café au lait spot is a brown macule of various sizes (diameter: 0.5-30 cm). Infantile giant café au lait spot (IGCALS) is a large (diameter >20 cm), irregularly shaped, benign hyperpigmented skin disorder that arises in infants. No consensus on laser treatment of IGCALS has been clearly established, because infants are too fragile to undergo long treatment sessions over broad areas, and undesirable cosmetic results are possible.
    OBJECTIVE: This study investigated the safety and efficacy of Golden Parameter Therapy (GPT) using a high-fluence 1064-nm Q-switched Nd:YAG laser (QSNL) for IGCALS treatment.
    METHODS: This study included 24 Korean patients with IGCALS. Twenty-one patients were treated with a 1064-nm QSNL weekly for 30-50 treatment sessions with GPT. The parameters were a spot size of 7 mm, a fluence of 2.2 J/cm2, and a pulse rate of 10 Hz, with one pass using a sliding-stacking technique over the IGCALS. In the control group, three patients were treated with a 532-nm picosecond laser monthly for three treatment sessions, with a spot size of 3 mm, a fluence of 1 J/cm2, and a pulse rate of 2 Hz.
    RESULTS: After the last treatment, the 21 patients with IGCALS achieved complete removal of the pigmented lesions, which can be considered an optimal cosmetic result, without side effects such as purpura, crusting, post-inflammatory hyperpigmentation, iatrogenic punctate leukoderma, or scarring. There were no recurrences in any patient after 6-21 months' follow-up, but treatment failure occurred in the three patients treated with the 532-nm picosecond laser.
    CONCLUSIONS: We argue that early intervention before 12 months of age with GPT using a high-fluence 1064-nm QSNL is a safe, applicable, and effective treatment for IGCALS, minimizing side effects without any recurrence.

  • Article type: Journal Article
    The social and behavioral sciences have been increasingly using automated text analysis to measure psychological constructs in text. We explore whether GPT, the large-language model (LLM) underlying the AI chatbot ChatGPT, can be used as a tool for automated psychological text analysis in several languages. Across 15 datasets (n = 47,925 manually annotated tweets and news headlines), we tested whether different versions of GPT (3.5 Turbo, 4, and 4 Turbo) can accurately detect psychological constructs (sentiment, discrete emotions, offensiveness, and moral foundations) across 12 languages. We found that GPT (r = 0.59 to 0.77) performed much better than English-language dictionary analysis (r = 0.20 to 0.30) at detecting psychological constructs as judged by manual annotators. GPT performed nearly as well as, and sometimes better than, several top-performing fine-tuned machine learning models. Moreover, GPT's performance improved across successive versions of the model, particularly for lesser-spoken languages, and became less expensive. Overall, GPT may be superior to many existing methods of automated text analysis, since it achieves relatively high accuracy across many languages, requires no training data, and is easy to use with simple prompts (e.g., "is this text negative?") and little coding experience. We provide sample code and a video tutorial for analyzing text with the GPT application programming interface. We argue that GPT and other LLMs help democratize automated text analysis by making advanced natural language processing capabilities more accessible, and may help facilitate more cross-linguistic research with understudied languages.
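The workflow this abstract describes (a simple yes/no prompt per text, parsed into a label) can be sketched as follows. This is a minimal illustration, not the authors' released sample code; the prompt wording, the helper names, and the model name in the commented-out API call are assumptions.

```python
# Minimal sketch of prompt-based sentiment labeling with an LLM API.
# The prompt wording and label mapping are illustrative assumptions,
# not the exact prompts used in the study.

def build_prompt(text: str) -> str:
    """Wrap a document in a simple binary classification prompt."""
    return (
        "Is this text negative? Answer with exactly one word, "
        "'yes' or 'no'.\n\nText: " + text
    )

def parse_label(reply: str) -> int:
    """Map a model reply to a binary label (1 = negative)."""
    word = reply.strip().lower().rstrip(".")
    if word not in ("yes", "no"):
        raise ValueError(f"unexpected reply: {reply!r}")
    return 1 if word == "yes" else 0

# The actual API call (requires the `openai` package and an API key) is
# sketched here for context only -- the model name is an assumption:
#
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-4-turbo",
#     messages=[{"role": "user", "content": build_prompt("I hate this.")}],
# )
# label = parse_label(resp.choices[0].message.content)
```

Correlating such labels against manual annotations over a dataset yields the r values the abstract reports.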

  • Article type: Journal Article
    To evaluate the response capabilities, in a public healthcare system otolaryngology job competition examination, of ChatGPT 3.5 and an internet-connected GPT-4 engine (Microsoft Copilot) with the real scores of otolaryngology specialists as the control group. In September 2023, 135 questions divided into theoretical and practical parts were input into ChatGPT 3.5 and an internet-connected GPT-4. The accuracy of AI responses was compared with the official results from otolaryngologists who took the exam, and statistical analysis was conducted using Stata 14.2. Copilot (GPT-4) outperformed ChatGPT 3.5. Copilot achieved a score of 88.5 points, while ChatGPT scored 60 points. Both AIs had discrepancies in their incorrect answers. Despite ChatGPT's proficiency, Copilot displayed superior performance, ranking as the second-best score among the 108 otolaryngologists who took the exam, while ChatGPT was placed 83rd. A chat powered by GPT-4 with internet access (Copilot) demonstrates superior performance in responding to multiple-choice medical questions compared to ChatGPT 3.5.
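The ranking reported above (Copilot second among the 108 examinees, ChatGPT 83rd) amounts to inserting each AI's score into the human score distribution. A minimal sketch; the `humans` list is made up for illustration, not the real exam data.

```python
def rank_among(human_scores, ai_score):
    """1-based rank of ai_score when pooled with the human scores
    (rank 1 = best; on ties the AI is ranked above the tied humans)."""
    better = sum(1 for s in human_scores if s > ai_score)
    return better + 1

# Illustrative scores only -- not the study's data.
humans = [72.0, 88.0, 91.0, 65.5, 79.0]
copilot_rank = rank_among(humans, 88.5)  # only 91.0 beats it -> rank 2
```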

  • Article type: Journal Article
    The availability of data from profiling of cancer patients with multiomics is rapidly increasing. However, integrative analysis of such data for personalized target identification is not trivial. Multiomics2Targets is a platform that enables users to upload transcriptomics, proteomics, and phosphoproteomics data matrices collected from the same cohort of cancer patients. After uploading the data, Multiomics2Targets produces a report that resembles a research publication. The uploaded matrices are processed, analyzed, and visualized using the tools Enrichr, KEA3, ChEA3, Expression2Kinases, and TargetRanger to identify and prioritize proteins, genes, and transcripts as potential targets. Figures and tables, as well as descriptions of the methods and results, are automatically generated. Reports include an abstract, introduction, methods, results, discussion, conclusions, and references and are exportable as citable PDFs and Jupyter Notebooks. Multiomics2Targets is applied to analyze version 3 of the Clinical Proteomic Tumor Analysis Consortium (CPTAC3) pan-cancer cohort, identifying potential targets for each CPTAC3 cancer subtype. Multiomics2Targets is available from https://multiomics2targets.maayanlab.cloud/.

  • Article type: Journal Article
    OBJECTIVE: To develop an Artificial Intelligence (AI)-based anomaly detection model as a complement to an "astute physician" in detecting novel disease cases in a hospital and preventing emerging outbreaks.
    METHODS: Data included hospitalized patients (n = 120,714) at a safety-net hospital in Massachusetts. A novel Generative Pre-trained Transformer (GPT)-based clinical anomaly detection system was designed and further trained using Empirical Risk Minimization (ERM), which can model a hospitalized patient's Electronic Health Records (EHR) and detect atypical patients. Methods and performance metrics, similar to the ones behind the recent Large Language Models (LLMs), were leveraged to capture the dynamic evolution of the patient's clinical variables and compute an Out-Of-Distribution (OOD) anomaly score.
    RESULTS: In a completely unsupervised setting, hospitalizations for Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infection could have been predicted by our GPT model at the beginning of the COVID-19 pandemic, with an Area Under the Receiver Operating Characteristic Curve (AUC) of 92.2%, using 31 extracted clinical variables and a 3-day detection window. Our GPT achieves individual patient-level anomaly detection and mortality prediction AUCs of 78.3% and 94.7%, outperforming traditional linear models by 6.6% and 9%, respectively. Different types of clinical trajectories of a SARS-CoV-2 infection are captured by our model to make interpretable detections, while a trend of over-pessimistic outcome prediction yields a more effective detection pathway. Furthermore, our comprehensive GPT model can potentially assist clinicians with forecasting patient clinical variables and developing personalized treatment plans.
    CONCLUSIONS: This study demonstrates that an emerging outbreak can be accurately detected within a hospital, by using a GPT to model patient EHR time sequences and labeling them as anomalous when actual outcomes are not supported by the model. Such a GPT is also a comprehensive model with the functionality of generating future patient clinical variables, which can potentially assist clinicians in developing personalized treatment plans.
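The abstract does not give the exact formula for its OOD anomaly score. One common way to score a sequence against a generative model, shown here purely as an illustration of the idea (not the study's implementation), is the average negative log-likelihood the model assigns to what actually happened at each time step.

```python
import math

def anomaly_score(step_probs):
    """Average negative log-likelihood of the observed events.

    step_probs: the probability the sequence model assigned to the
    event that actually occurred at each time step. Higher score
    means a more atypical (out-of-distribution) trajectory.
    """
    if not step_probs:
        raise ValueError("need at least one step")
    return -sum(math.log(p) for p in step_probs) / len(step_probs)

# A patient whose EHR trajectory the model predicts well scores low;
# a surprising trajectory scores high and would be flagged as anomalous.
typical = anomaly_score([0.9, 0.8, 0.85])
atypical = anomaly_score([0.2, 0.1, 0.05])
```

Thresholding such a score over a detection window is one way to turn a generative model into the anomaly detector the abstract describes.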

  • Article type: Journal Article
    Demographics, social determinants of health, and family history documented in the unstructured text within electronic health records are increasingly being studied to understand how this information can be utilized with the structured data to improve healthcare outcomes. After the GPT models were released, many studies applied GPT models to extract this information from narrative clinical notes. Different from existing work, our research focuses on investigating zero-shot learning for extracting this information jointly, providing minimal information to the GPT model. We utilize de-identified real-world clinical notes annotated for demographics, various social determinants, and family history information. Given that the GPT model might produce text different from the text in the original data, we explore two sets of evaluation metrics, the traditional NER evaluation metrics and semantic similarity evaluation metrics, to completely understand the performance. Our results show that the GPT-3.5 method achieved an average of 0.975 F1 on demographics extraction, 0.615 F1 on social determinants extraction, and 0.722 F1 on family history extraction. We believe these results can be further improved through model fine-tuning or few-shot learning. Through the case studies, we also identified limitations of the GPT models, which need to be addressed in future research.
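The "traditional NER evaluation metrics" mentioned above are entity-level precision, recall, and F1 over exact matches; a minimal sketch follows (the semantic-similarity metrics would relax the exact-match comparison). The entity tuples below are illustrative, not from the study's data.

```python
def entity_f1(gold, pred):
    """Exact-match entity-level precision, recall, and F1.

    gold, pred: sets of (entity_text, entity_type) tuples.
    """
    tp = len(gold & pred)  # entities found with exactly the right text and type
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical annotations for one note:
gold = {("diabetes", "family_history"), ("smoker", "social")}
pred = {("diabetes", "family_history"), ("retired", "social")}
```

Exact matching penalizes the model for paraphrases ("retired" vs. "smoker" above), which is exactly why the study pairs it with semantic-similarity metrics.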

  • Article type: Journal Article
    BACKGROUND: In the United States, 1 in 5 adults currently serves as a family caregiver for an individual with a serious illness or disability. Unlike professional caregivers, family caregivers often assume this role without formal preparation or training. Thus, there is an urgent need to enhance the capacity of family caregivers to provide quality care. Leveraging technology as an educational tool or an adjunct to care is a promising approach that has the potential to enhance the learning and caregiving capabilities of family caregivers. Large language models (LLMs) can potentially be used as a foundation technology for supporting caregivers. An LLM can be categorized as a foundation model (FM), which is a large-scale model trained on a broad data set that can be adapted to a range of different domain tasks. Despite their potential, FMs have the critical weakness of "hallucination," where the models generate information that can be misleading or inaccurate. Information reliability is essential when language models are deployed as front-line help tools for caregivers.
    OBJECTIVE: This study aimed to (1) develop a reliable caregiving language model (CaLM) by using FMs and a caregiving knowledge base, (2) develop an accessible CaLM using a small FM that requires fewer computing resources, and (3) evaluate the model's performance compared with a large FM.
    METHODS: We developed a CaLM using the retrieval-augmented generation (RAG) framework combined with FM fine-tuning, improving the quality of FM answers by grounding the model on a caregiving knowledge base. The key components of the CaLM are the caregiving knowledge base, a fine-tuned FM, and a retriever module. We used 2 small FMs as candidates for the foundation of the CaLM (LLaMA [large language model Meta AI] 2 and Falcon, each with 7 billion parameters) and adopted a large FM (GPT-3.5, with an estimated 175 billion parameters) as a benchmark. We developed the caregiving knowledge base by gathering various types of documents from the internet. We focused on caregivers of individuals with Alzheimer disease and related dementias. We evaluated the models' performance using benchmark metrics commonly used in evaluating language models, and their reliability in providing accurate references with their answers.
    RESULTS: The RAG framework improved the performance of all FMs used in this study across all measures. As expected, the large FM performed better than the small FMs across all metrics. Interestingly, the small fine-tuned FMs with RAG performed significantly better than GPT-3.5 across all metrics. The fine-tuned LLaMA 2 (a small FM) performed better than GPT-3.5 (even with RAG) in returning references with the answers.
    CONCLUSIONS: The study shows that a reliable and accessible CaLM can be developed using small FMs with a knowledge base specific to the caregiving domain.
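The RAG loop described in Methods (retrieve passages from the caregiving knowledge base, then ground the model's answer in them) can be sketched as below. The word-overlap retriever is a crude stand-in for the study's retriever module, and the prompt template and sample documents are assumptions for illustration.

```python
def retrieve(query, docs, k=2):
    """Rank docs by word overlap with the query (a toy stand-in for an
    embedding-based retriever) and return the top k."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def build_grounded_prompt(query, docs, k=2):
    """Assemble a prompt that grounds the answer in retrieved passages,
    asking the model to cite its sources (the reliability goal above)."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer the caregiving question using only the sources below, "
        "and cite them.\n\nSources:\n" + context + "\n\nQuestion: " + query
    )

# Hypothetical knowledge-base snippets:
kb = [
    "Wandering is common in dementia; secure exits and use ID bracelets.",
    "Respite care gives family caregivers short-term relief.",
    "Aspirin dosing guidance for cardiac patients.",
]
```

The grounded prompt is then passed to the fine-tuned FM; grounding plus fine-tuning is what the study credits for the small models' reliability gains.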

  • Article type: Journal Article
    BACKGROUND: Previous studies evaluated the ability of large language models (LLMs) in medical disciplines; however, few have focused on image analysis, and none specifically on cardiovascular imaging or nuclear cardiology.
    OBJECTIVE: This study assesses four LLMs - GPT-4, GPT-4 Turbo, GPT-4 omni (GPT-4o) (OpenAI), and Gemini (Google Inc.) - in responding to questions from the 2023 American Society of Nuclear Cardiology Board Preparation Exam, reflecting the scope of the Certification Board of Nuclear Cardiology (CBNC) examination.
    METHODS: We used 168 questions: 141 text-only and 27 image-based, categorized into four sections mirroring the CBNC exam. Each LLM was presented with the same standardized prompt and applied to each section 30 times to account for stochasticity. Performance over six weeks was assessed for all models except GPT-4o. McNemar's test compared correct response proportions.
    RESULTS: GPT-4, Gemini, GPT-4 Turbo, and GPT-4o correctly answered median percentages of 56.8% (95% confidence interval 55.4%-58.0%), 40.5% (39.9%-42.9%), 60.7% (59.9%-61.3%), and 63.1% (62.5%-64.3%) of questions, respectively. GPT-4o significantly outperformed the other models (p=0.007 vs. GPT-4 Turbo; p<0.001 vs. GPT-4 and Gemini). GPT-4o excelled on text-only questions compared to GPT-4, Gemini, and GPT-4 Turbo (p<0.001, p<0.001, and p=0.001, respectively), while Gemini performed worse on image-based questions (p<0.001 for all).
    CONCLUSIONS: GPT-4o demonstrated superior performance among the four LLMs, achieving scores likely within or just outside the range required to pass a test akin to the CBNC examination. Although improvements in medical image interpretation are needed, GPT-4o shows potential to support physicians in answering text-based clinical questions.
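McNemar's test, used in the methods above, compares two models on the same questions using only the discordant pairs (questions exactly one of the two answered correctly). An exact, standard-library-only sketch:

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value.

    b: questions model A answered correctly and model B incorrectly.
    c: questions model B answered correctly and model A incorrectly.
    Under H0 (equal accuracy) the discordant counts follow
    Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence either way
    k = min(b, c)
    # two-sided exact test: double the tail P(X <= k), capped at 1
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, if one model wins 8 discordant questions and the other none, the exact p-value is 2/256 ≈ 0.008, on the order of the p=0.007 comparison reported above.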

  • Article type: Letter
    No abstract available.

  • Article type: Journal Article
    OBJECTIVE: To investigate the consistency of Chatbot Generative Pretrained Transformer (ChatGPT)-4 in the analysis of clinical pictures of common laryngological conditions.
    DESIGN: Prospective uncontrolled study.
    SETTING: Multicenter study.
    METHODS: Patient history and clinical videolaryngostroboscopic images were presented to ChatGPT-4 for differential diagnoses, management, and treatment(s). ChatGPT-4 responses were assessed by 3 blinded laryngologists with the artificial intelligence performance instrument (AIPI). The complexity of cases and the consistency between practitioners and ChatGPT-4 for interpreting clinical images were evaluated with a 5-point Likert Scale. The intraclass correlation coefficient (ICC) was used to measure the strength of interrater agreement.
    RESULTS: Forty patients with a mean complexity score of 2.60 ± 1.15 were included. The mean consistency score for ChatGPT-4 image interpretation was 2.46 ± 1.42. ChatGPT-4 perfectly analyzed the clinical images in 6 cases (15%; 5/5), while the consistency between GPT-4 and the judges was high in 5 cases (12.5%; 4/5). The judges reported an ICC of 0.965 for the consistency score (P = .001). ChatGPT-4 erroneously documented vocal fold irregularity (mass or lesion), glottic insufficiency, and vocal cord paralysis in 21 (52.5%), 2 (5%), and 5 (12.5%) cases, respectively. ChatGPT-4 and the practitioners indicated 153 and 63 additional examinations, respectively (P = .001). The ChatGPT-4 primary diagnosis was correct in 20.0% to 25.0% of cases. The clinical image consistency score was significantly associated with the AIPI score (rs = 0.830; P = .001).
    CONCLUSIONS: ChatGPT-4 was more effective at primary diagnosis than at image analysis or at selecting the most appropriate additional examinations and treatments.
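The interrater agreement above is reported as an ICC. As a rough illustration of what that measures, here is a one-way random-effects ICC(1,1); the study likely used a different ICC variant (e.g., a two-way model), so this is not a reproduction of its analysis.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC(1,1).

    ratings: list of per-subject lists, each containing the k raters'
    scores for that subject. Returns (MSB - MSW) / (MSB + (k-1) * MSW).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    subj_means = [sum(r) / k for r in ratings]
    # between-subject and within-subject mean squares
    msb = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for r, m in zip(ratings, subj_means) for x in r
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

When the raters agree perfectly on every subject the within-subject variance vanishes and the ICC is exactly 1, which is what a value as high as 0.965 approaches.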