BACKGROUND: Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement.
OBJECTIVE: This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in health care decision-making while comparing seniors' and residents' ratings, as well as ratings across specific question types.
METHODS: A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications.
RESULTS: Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 higher than residents did on utility and comprehensiveness (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), with a similar pattern for GPT-3.5 (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and comprehensiveness criteria. Distinctions among question types were significant, particularly for GPT-4's mean comprehensiveness scores across emergency, internal medicine, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5's accuracy, utility, and comprehensiveness dimensions.
CONCLUSIONS: ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostic, therapeutic, and ethical decision-making. While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments.