Practical model

  • Article type: Journal Article
    BACKGROUND: Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement.
    OBJECTIVE: This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in health care decision-making while comparing seniors' and residents' ratings, and specific question types.
    METHODS: A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications.
    RESULTS: Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5's accuracy, beneficial, and completeness dimensions.
    CONCLUSIONS: ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatments, and ethics. While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments.

  • Article type: Journal Article
    BACKGROUND: Increased workload, including workload related to electronic health record (EHR) documentation, is reported as a main contributor to nurse burnout and adversely affects patient safety and nurse satisfaction. Traditional methods for workload analysis are either administrative measures (such as the nurse-patient ratio) that do not represent actual nursing care or are subjective and limited to snapshots of care (eg, time-motion studies). Observing care and testing workflow changes in real time can be obstructive to clinical care. An examination of EHR interactions using EHR audit logs could provide a scalable, unobtrusive way to quantify the nursing workload, at least to the extent that nursing work is represented in EHR documentation. EHR audit logs are extremely complex; however, simple analytical methods cannot discover complex temporal patterns, requiring use of state-of-the-art temporal data-mining approaches. To effectively use these approaches, it is necessary to structure the raw audit logs into a consistent and scalable logical data model that can be consumed by machine learning (ML) algorithms.
    OBJECTIVE: We aimed to conceptualize a logical data model for nurse-EHR interactions that would support the future development of temporal ML models based on EHR audit log data.
    METHODS: We conducted a preliminary review of EHR audit logs to understand the types of nursing-specific data captured. Using concepts derived from the literature and our previous experience studying temporal patterns in biomedical data, we formulated a logical data model that can describe nurse-EHR interactions, the nurse-intrinsic and situational characteristics that may influence those interactions, and outcomes of relevance to the nursing workload in a scalable and extensible manner.
    RESULTS: We describe the data structure and concepts from EHR audit log data associated with nursing workload as a logical data model named RNteract. We conceptually demonstrate how using this logical data model could support temporal unsupervised ML and state-of-the-art artificial intelligence (AI) methods for predictive modeling.
    CONCLUSIONS: The RNteract logical data model appears capable of supporting a variety of AI-based systems and should be generalizable to any type of EHR system or health care setting. Quantitatively identifying and analyzing temporal patterns of nurse-EHR interactions is foundational for developing interventions that support the nursing documentation workload and address nurse burnout.
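    The abstract does not give the RNteract schema, so the following is only a minimal sketch of how a logical data model of this kind might be laid out in code. Every class and field name here (`Interaction`, `NurseShiftRecord`, `nurse_traits`, `situation`, `outcome`, and the action labels) is a hypothetical illustration, not taken from the paper. It groups the three elements the abstract names: nurse-EHR interactions, nurse-intrinsic and situational characteristics, and workload-related outcomes.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class Interaction:
    """One nurse-EHR interaction from an audit log row (hypothetical fields)."""
    nurse_id: str
    timestamp: datetime
    action: str  # illustrative labels such as "open_chart"; not from the paper


@dataclass
class NurseShiftRecord:
    """Interactions plus context and outcome for one nurse (hypothetical layout)."""
    nurse_id: str
    nurse_traits: Dict[str, str]   # nurse-intrinsic characteristics
    situation: Dict[str, str]      # situational characteristics
    interactions: List[Interaction] = field(default_factory=list)
    outcome: Dict[str, float] = field(default_factory=dict)

    def ordered_actions(self) -> List[str]:
        """Return action labels in time order, ready for temporal pattern mining."""
        ordered = sorted(self.interactions, key=lambda i: i.timestamp)
        return [i.action for i in ordered]


record = NurseShiftRecord("n1", {"experience": "5y"}, {"unit": "ICU"})
record.interactions.append(Interaction("n1", datetime(2024, 1, 1, 8, 5), "document_vitals"))
record.interactions.append(Interaction("n1", datetime(2024, 1, 1, 8, 0), "open_chart"))
print(record.ordered_actions())  # ['open_chart', 'document_vitals']
```

    Keeping the raw events separate from the context dictionaries is one way to let new characteristics be added without changing the event structure; whether RNteract does this is not stated in the abstract.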

  • Article type: Journal Article
    BACKGROUND: HALE is now a regular strategic planning indicator for all levels of the Chinese government. However, HALE measurements necessitate comprehensive data collection and intricate technology. Therefore, effectively converting numerous diseases into the years lived with disability (YLD) rate is a significant challenge for HALE measurements. Our study aimed to construct a simple YLD rate measurement model with high applicability based on the current situation of actual data resources within China to address challenges in measuring HALE target values during planning.
    METHODS: First, based on the Chinese YLD rate in the Global Burden of Disease (GBD) 2019, Pearson correlation analysis, the global optimum method, and other techniques were used to screen the best predictor variables from the current Chinese data resources. Missing data for predictor variables were filled in via spline interpolation. Then, multiple linear regression models were fitted to construct the YLD rate measurement model. The Sullivan method was used to measure HALE. The Monte Carlo method was employed to generate 95% uncertainty intervals. Finally, model performance was assessed using the mean absolute error (MAE) and mean absolute percentage error (MAPE).
    RESULTS: A three-input-parameter model was constructed to measure the age-specific YLD rates by sex in China, directly using the incidence of infectious diseases, the incidence of chronic diseases among persons aged 15 and older, and the addition of an under-five mortality rate covariate. The total MAE and MAPE for the combined YLD rate were 0.0007 and 0.5949%, respectively. The MAE and MAPE of the combined HALE in the 0-year-old group were 0.0341 and 0.0526%, respectively. The errors were slightly lower for males (0.0197, 0.0311%) than for females (0.0501, 0.0755%).
    CONCLUSIONS: We constructed a high-accuracy model to measure the YLD rate in China by using three monitoring indicators from the Chinese national routine as predictor variables. The model provides a realistic and feasible solution for measuring HALE at the national and especially regional levels, considering limited data.
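    For readers unfamiliar with the two error metrics named in the methods, the sketch below shows the standard definitions of MAE and MAPE. The function names and the sample numbers are assumptions made for illustration; they are not the study's code or data.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |actual - predicted| over all paired observations."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |actual - predicted| / |actual|, expressed as a percentage."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up reference and modelled rates, purely illustrative.
reference = [0.10, 0.20]
modelled = [0.11, 0.18]
print(mean_absolute_error(reference, modelled))
print(mean_absolute_percentage_error(reference, modelled))
```

    MAE is in the units of the quantity itself, while MAPE is scale-free, which is why an abstract can report both a small absolute error and a percentage error for the same model.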

  • Article type: Journal Article
    Artificial intelligence (AI) is reshaping health care, including nursing, across Asia, presenting opportunities to improve patient care and outcomes. This viewpoint presents our perspective and interpretation of the current AI landscape, acknowledging its evolution driven by enhanced processing capabilities, extensive data sets, and refined algorithms. Notable applications in countries such as Singapore, South Korea, Japan, and China showcase the integration of AI-powered technologies such as chatbots, virtual assistants, data mining, and automated risk assessment systems. This paper further explores the transformative impact of AI on nursing education, emphasizing personalized learning, adaptive approaches, and AI-enriched simulation tools, and discusses the opportunities and challenges of these developments. We argue for the harmonious coexistence of traditional nursing values with AI innovations, marking a significant stride toward a promising health care future in Asia.

  • Article type: Journal Article
    Recent breakthroughs in artificial intelligence (AI) language models have elevated the vision of using conversational AI support for mental health, with a growing body of literature indicating varying degrees of efficacy. In this paper, we ask when, in therapy, it will be easier to replace humans and, conversely, in what instances, human connection will still be more valued. We suggest that empathy lies at the heart of the answer to this question. First, we define different aspects of empathy and outline the potential empathic capabilities of humans versus AI. Next, we consider what determines when these aspects are needed most in therapy, both from the perspective of therapeutic methodology and from the perspective of patient objectives. Ultimately, our goal is to prompt further investigation and dialogue, urging both practitioners and scholars engaged in AI-mediated therapy to keep these questions and considerations in mind when investigating AI implementation in mental health.

  • Article type: Journal Article
    BACKGROUND: The growth in the capabilities of telehealth has made it possible to identify individuals with a higher risk of uncontrolled diabetes and provide them with targeted support and resources to help them manage their condition. Thus, predictive modeling has emerged as a valuable tool for the advancement of diabetes management.
    OBJECTIVE: This study aimed to conceptualize and develop a novel machine learning (ML) approach to proactively identify participants enrolled in a remote diabetes monitoring program (RDMP) who were at risk of uncontrolled diabetes at 12 months in the program.
    METHODS: Registry data from the Livongo for Diabetes RDMP were used to design separate dynamic predictive ML models to predict participant outcomes at each monthly checkpoint of the participants' program journey (month-n models) from the first day of onboarding (month-0 model) up to the 11th month (month-11 model). A participant's program journey began upon onboarding into the RDMP and monitoring their own blood glucose (BG) levels through the RDMP-provided BG meter. Each participant passed through 12 predictive models during their first year enrolled in the RDMP. Four categories of participant attributes (ie, survey data, BG data, medication fills, and health signals) were used for feature construction. The models were trained using the light gradient boosting machine and underwent hyperparameter tuning. The performance of the models was evaluated using standard metrics, including precision, recall, specificity, the area under the curve, the F1-score, and accuracy.
    RESULTS: The ML models exhibited strong performance, accurately identifying observable at-risk participants, with recall ranging from 70% to 94% and precision from 40% to 88% across the 12-month program journey. Unobservable at-risk participants also showed promising performance, with recall ranging from 61% to 82% and precision from 42% to 61%. Overall, model performance improved as participants progressed through their program journey, demonstrating the importance of engagement data in predicting long-term clinical outcomes.
    CONCLUSIONS: This study explored the Livongo for Diabetes RDMP participants' temporal and static attributes, identification of diabetes management patterns and characteristics, and their relationship to predict diabetes management outcomes. Proactive targeting ML models accurately identified participants at risk of uncontrolled diabetes with a high level of precision that was generalizable through future years within the RDMP. The ability to identify participants who are at risk at various time points throughout the program journey allows for personalized interventions to improve outcomes. This approach offers significant advancements in the feasibility of large-scale implementation in remote monitoring programs and can help prevent uncontrolled glycemic levels and diabetes-related complications. Future research should include the impact of significant changes that can affect a participant's diabetes management.
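    The threshold-based metrics listed in the methods (precision, recall, specificity, F1-score, accuracy) all derive from the four cells of a confusion matrix. The sketch below shows those standard definitions; the function name `classification_metrics` and the sample labels are assumptions for illustration and do not come from the study.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute confusion-matrix metrics for binary labels (1 = at risk, 0 = not at risk)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 when the denominator is empty instead of dividing by zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
        "accuracy": ratio(tp + tn, len(y_true)),
    }


# Made-up labels: 3 true positives, 1 false negative, 1 false positive, 3 true negatives.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
preds = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(truth, preds))
```

    Recall answers "how many at-risk participants were caught", while precision answers "how many flagged participants were truly at risk", so the two can diverge widely, as the reported ranges in the results show.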

  • Article type: Journal Article
    BACKGROUND: Large language models (LLMs) hold potential for mental health applications. However, their opaque alignment processes may embed biases that shape problematic perspectives. Evaluating the values embedded within LLMs that guide their decision-making has ethical importance. Schwartz's theory of basic values (STBV) provides a framework for quantifying cultural value orientations and has shown utility for examining values in mental health contexts, including cultural, diagnostic, and therapist-client dynamics.
    OBJECTIVE: This study aimed to (1) evaluate whether the STBV can measure value-like constructs within leading LLMs and (2) determine whether LLMs exhibit distinct value-like patterns from humans and each other.
    METHODS: In total, 4 LLMs (Bard, Claude 2, Generative Pretrained Transformer [GPT]-3.5, GPT-4) were anthropomorphized and instructed to complete the Portrait Values Questionnaire-Revised (PVQ-RR) to assess value-like constructs. Their responses over 10 trials were analyzed for reliability and validity. To benchmark the LLMs' value profiles, their results were compared to published data from a diverse sample of 53,472 individuals across 49 nations who had completed the PVQ-RR. This allowed us to assess whether the LLMs diverged from established human value patterns across cultural groups. Value profiles were also compared between models via statistical tests.
    RESULTS: The PVQ-RR showed good reliability and validity for quantifying value-like infrastructure within the LLMs. However, substantial divergence emerged between the LLMs' value profiles and population data. The models lacked consensus and exhibited distinct motivational biases, reflecting opaque alignment processes. For example, all models prioritized universalism and self-direction, while de-emphasizing achievement, power, and security relative to humans. Successful discriminant analysis differentiated the 4 LLMs' distinct value profiles. Further examination found the biased value profiles strongly predicted the LLMs' responses when presented with mental health dilemmas requiring choosing between opposing values. This provided further validation for the models embedding distinct motivational value-like constructs that shape their decision-making.
    CONCLUSIONS: This study leveraged the STBV to map the motivational value-like infrastructure underpinning leading LLMs. Although the study demonstrated the STBV can effectively characterize value-like infrastructure within LLMs, substantial divergence from human values raises ethical concerns about aligning these models with mental health applications. The biases toward certain cultural value sets pose risks if integrated without proper safeguards. For example, prioritizing universalism could promote unconditional acceptance even when clinically unwise. Furthermore, the differences between the LLMs underscore the need to standardize alignment processes to capture true cultural diversity. Thus, any responsible integration of LLMs into mental health care must account for their embedded biases and motivation mismatches to ensure equitable delivery across diverse populations. Achieving this will require transparency and refinement of alignment techniques to instill comprehensive human values.

  • Article type: Journal Article
    BACKGROUND: Mentalization, which is integral to human cognitive processes, pertains to the interpretation of one's own and others' mental states, including emotions, beliefs, and intentions. With the advent of artificial intelligence (AI) and the prominence of large language models in mental health applications, questions persist about their aptitude in emotional comprehension. The prior iteration of the large language model from OpenAI, ChatGPT-3.5, demonstrated an advanced capacity to interpret emotions from textual data, surpassing human benchmarks. Given the introduction of ChatGPT-4, with its enhanced visual processing capabilities, and considering Google Bard's existing visual functionalities, a rigorous assessment of their proficiency in visual mentalizing is warranted.
    OBJECTIVE: The aim of the research was to critically evaluate the capabilities of ChatGPT-4 and Google Bard with regard to their competence in discerning visual mentalizing indicators as contrasted with their textual-based mentalizing abilities.
    METHODS: The Reading the Mind in the Eyes Test developed by Baron-Cohen and colleagues was used to assess the models' proficiency in interpreting visual emotional indicators. Simultaneously, the Levels of Emotional Awareness Scale was used to evaluate the large language models' aptitude in textual mentalizing. Collating data from both tests provided a holistic view of the mentalizing capabilities of ChatGPT-4 and Bard.
    RESULTS: ChatGPT-4, displaying a pronounced ability in emotion recognition, secured scores of 26 and 27 in 2 distinct evaluations, significantly deviating from a random response paradigm (P<.001). These scores align with established benchmarks from the broader human demographic. Notably, ChatGPT-4 exhibited consistent responses, with no discernible biases pertaining to the sex of the model or the nature of the emotion. In contrast, Google Bard's performance aligned with random response patterns, securing scores of 10 and 12 and rendering further detailed analysis redundant. In the domain of textual analysis, both ChatGPT and Bard surpassed established benchmarks from the general population, with their performances being remarkably congruent.
    CONCLUSIONS: ChatGPT-4 proved its efficacy in the domain of visual mentalizing, aligning closely with human performance standards. Although both models displayed commendable acumen in textual emotion interpretation, Bard's capabilities in visual emotion interpretation necessitate further scrutiny and potential refinement. This study stresses the criticality of ethical AI development for emotional recognition, highlighting the need for inclusive data, collaboration with patients and mental health experts, and stringent governmental oversight to ensure transparency and protect patient privacy.

  • Article type: Journal Article
    OBJECTIVE: This study aimed to identify reliable risk factors for residual/recurrent cervical intraepithelial lesions in patients with negative margins after cold-knife conization.
    METHODS: A total of 2352 women with high-grade squamous intraepithelial lesions (HSILs) with negative margins who underwent cold-knife conization between January 2014 and December 2020 were included; 1411 women were assigned to the development cohort, and 941 women were assigned to the validation cohort. Multivariate logistic regression was used to build four predictive models based on different combinations of follow-up data (Model A: preoperative factors; Model B: first-follow-up data; Model C: second-follow-up data; Model D: data from both follow-ups). The accuracy, sensitivity, specificity, false-positive rate (FPR), false-negative rate (FNR), and area under the receiver operating characteristic curve (AUC) were evaluated on the validation cohort. The predictive power of the risk factors was further validated using six machine learning algorithms.
    RESULTS: Model D demonstrated the highest AUC of 0.91 (95% CI, 0.87 to 0.96) in the validation cohort, whereas Models A, B, and C achieved AUCs of 0.69 (95% CI, 0.59 to 0.78), 0.88 (95% CI, 0.80 to 0.95), and 0.89 (95% CI, 0.81 to 0.97), respectively. The six machine learning methods achieved consistent results. Kaplan-Meier (KM) survival curves demonstrated that the models could effectively stratify patients (p < 0.05 for all models).
    CONCLUSIONS: Our model, which is based on preoperative and follow-up factors, can serve as a complementary screening procedure for the early detection or prediction of recurrence after cold-knife conization in HSIL patients.
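    The AUC values compared across Models A to D can be read as the probability that a randomly chosen patient with recurrence receives a higher predicted risk than a randomly chosen patient without it. The sketch below computes AUC directly from that pairwise definition, counting ties as half; the function name `roc_auc` and the sample scores are illustrative assumptions, not the study's implementation.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have equal length")
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up predicted risks for four patients (1 = recurrence, 0 = no recurrence).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

    This pairwise loop is quadratic in the number of patients, which is fine for illustration; rank-based formulas give the same value more efficiently on large cohorts.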

  • Article type: Journal Article
    BACKGROUND: For patients with portal hypertension (PH), portal vein thrombosis (PVT) is a fatal complication after splenectomy. Postoperative platelet elevation is considered the foremost reason for PVT. However, the value of postoperative platelet elevation rate (PPER) in predicting PVT has never been studied.
    OBJECTIVE: To investigate the predictive value of PPER for PVT and establish PPER-based prediction models to early identify individuals at high risk of PVT after splenectomy.
    METHODS: We retrospectively reviewed 483 patients with PH related to hepatitis B virus who underwent splenectomy between July 2011 and September 2018, and they were randomized into either a training (n = 338) or a validation (n = 145) cohort. The generalized linear (GL) method, least absolute shrinkage and selection operator (LASSO), and random forest (RF) were used to construct models. The receiver operating characteristic curves (ROC), calibration curve, decision curve analysis (DCA), and clinical impact curve (CIC) were used to evaluate the robustness and clinical practicability of the GL model (GLM), LASSO model (LSM), and RF model (RFM).
    RESULTS: Multivariate analysis showed that PPER on the first and third postoperative days (PPER1, PPER3) was strongly associated with PVT [odds ratio (OR): 1.78, 95% confidence interval (CI): 1.24-2.62, P = 0.002; OR: 1.43, 95%CI: 1.16-1.77, P < 0.001, respectively]. The areas under the ROC curves of the GLM, LSM, and RFM in the training cohort were 0.83 (95%CI: 0.79-0.88), 0.84 (95%CI: 0.79-0.88), and 0.84 (95%CI: 0.79-0.88), respectively; and were 0.77 (95%CI: 0.69-0.85), 0.83 (95%CI: 0.76-0.90), and 0.78 (95%CI: 0.70-0.85) in the validation cohort, respectively. The calibration curves showed satisfactory agreement between prediction by models and actual observation. DCA and CIC indicated that all models conferred high clinical net benefits.
    CONCLUSIONS: PPER1 and PPER3 are effective indicators for postoperative prediction of PVT. We have successfully developed PPER-based practical models to accurately predict PVT, which would conveniently help clinicians rapidly differentiate individuals at high risk of PVT, and thus guide the adoption of timely interventions.