self-diagnose

  • Article type: Journal Article
    The growing prominence of artificial intelligence (AI) in mobile health (mHealth) has given rise to a distinct subset of apps that provide users with diagnostic information based on their inputted health status and symptom information: AI-powered symptom checker apps (AISympCheck). While these apps may potentially increase access to health care, they raise consequential ethical and legal questions. This paper highlights notable concerns with AI usage in the health care system: the further entrenchment of preexisting biases and issues with professional accountability. To provide an in-depth analysis of the issues of bias and the complications of professional obligations and liability, we focus on 2 mHealth apps as examples: Babylon and Ada. We selected these 2 apps because they were both widely distributed during the COVID-19 pandemic and make prominent claims about their use of AI for the purpose of assessing user symptoms. First, bias entrenchment often originates from the data used to train AI systems, causing the AI to replicate these inequalities through a "garbage in, garbage out" phenomenon. Users of these apps are also unlikely to be demographically representative of the larger population, leading to distorted results. Second, professional accountability poses a substantial challenge given the vast diversity and lack of regulation surrounding the reliability of AISympCheck apps. It is unclear whether these apps should be subject to safety reviews, who is responsible for app-mediated misdiagnosis, and whether these apps ought to be recommended by physicians. With the rapidly increasing number of apps, there remains little guidance available for health professionals. Professional bodies and advocacy organizations have a particularly important role to play in addressing these ethical and legal gaps. Implementing technical safeguards within these apps could mitigate bias, AI systems could be trained primarily on neutral data, and apps could be subject to a system of regulation that allows users to make informed decisions. In our view, it is critical that these legal concerns be considered throughout the design and implementation of these potentially disruptive technologies. Entrenched bias and professional responsibility, while operating in different ways, are ultimately exacerbated by the unregulated nature of mHealth.
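One of the technical safeguards mentioned above, auditing training data for demographic skew and reweighting under-represented groups, can be sketched in a few lines. This is a minimal illustration under assumed data: the age bands, counts, and the demographic_reweighting helper are hypothetical and are not drawn from Babylon, Ada, or the paper.

```python
from collections import Counter

def demographic_reweighting(groups):
    """Inverse-frequency sample weights for training records.

    Records from under-represented demographic groups receive larger
    weights, so each group contributes equally to the training signal
    instead of the model being dominated by the app's typical users.
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    # Each record's weight is n / (k * size of its group); every group's
    # weights then sum to the same total, n / k.
    return [n / (k * counts[g]) for g in groups]

# Hypothetical skew: young, digitally literate users dominate the data.
groups = ["18-35"] * 80 + ["36-64"] * 15 + ["65+"] * 5
weights = demographic_reweighting(groups)
print(round(weights[0], 2), round(weights[-1], 2))  # 0.42 for "18-35", 6.67 for "65+"
```

Passing such weights to a model's sample_weight argument at training time (as many scikit-learn estimators allow) is one way a vendor could blunt, though not eliminate, the "garbage in, garbage out" effect described above.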

  • Article type: Journal Article
    Background: Diagnosis is a core component of effective health care, but misdiagnosis is common and can put patients at risk. Diagnostic decision support systems can play a role in improving diagnosis by physicians and other health care workers. Symptom checkers (SCs) have been designed to improve diagnosis and triage (ie, which level of care to seek) by patients.
    Objective: The aim of this study was to evaluate the performance of the new large language model ChatGPT (versions 3.5 and 4.0), the widely used WebMD SC, and an SC developed by Ada Health in the diagnosis and triage of patients with urgent or emergent clinical problems, compared with the final emergency department (ED) diagnoses and physician reviews.
    Methods: We used previously collected, deidentified, self-report data from 40 patients presenting to an ED for care who used the Ada SC to record their symptoms prior to seeing the ED physician. Deidentified data were entered into ChatGPT versions 3.5 and 4.0 and WebMD by a research assistant blinded to diagnoses and triage. Diagnoses from all 4 systems were compared with the previously abstracted final diagnoses in the ED as well as with diagnoses and triage recommendations from three independent board-certified ED physicians who had blindly reviewed the self-report clinical data from Ada. Diagnostic accuracy was calculated as the proportion of the diagnoses from ChatGPT, Ada SC, WebMD SC, and the independent physicians that matched at least one ED diagnosis (stratified as top 1 or top 3). Triage accuracy was calculated as the number of recommendations from ChatGPT, WebMD, or Ada that agreed with at least 2 of the independent physicians or were rated "unsafe" or "too cautious."
    Results: Overall, 30 and 37 cases had sufficient data for diagnostic and triage analysis, respectively. The rate of top-1 diagnosis matches for Ada, ChatGPT 3.5, ChatGPT 4.0, and WebMD was 9 (30%), 12 (40%), 10 (33%), and 12 (40%), respectively, with a mean rate of 47% for the physicians. The rate of top-3 diagnostic matches for Ada, ChatGPT 3.5, ChatGPT 4.0, and WebMD was 19 (63%), 19 (63%), 15 (50%), and 17 (57%), respectively, with a mean rate of 69% for the physicians. The distribution of triage results for Ada was 62% (n=23) agree, 14% (n=5) unsafe, and 24% (n=9) too cautious; that for ChatGPT 3.5 was 59% (n=22) agree, 41% (n=15) unsafe, and 0% (n=0) too cautious; that for ChatGPT 4.0 was 76% (n=28) agree, 22% (n=8) unsafe, and 3% (n=1) too cautious; and that for WebMD was 70% (n=26) agree, 19% (n=7) unsafe, and 11% (n=4) too cautious. The unsafe triage rate for ChatGPT 3.5 (41%) was significantly higher (P=.009) than that of Ada (14%).
    Conclusions: ChatGPT 3.5 had high diagnostic accuracy but a high unsafe triage rate. ChatGPT 4.0 had the poorest diagnostic accuracy, but a lower unsafe triage rate and the highest triage agreement with the physicians. The Ada and WebMD SCs performed better overall than ChatGPT. Unsupervised patient use of ChatGPT for diagnosis and triage is not recommended without improvements to triage accuracy and extensive clinical evaluation.
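The reported figures can be checked with straightforward arithmetic. The sketch below recomputes the top-1 and top-3 match rates and compares the unsafe-triage counts for ChatGPT 3.5 (15/37) and Ada (5/37). The abstract does not name the statistical test behind P=.009, so the chi-square test without continuity correction used here is an assumption; with these counts it yields a value of about .009, consistent with the reported figure.

```python
from scipy.stats import chi2_contingency

N_DIAG, N_TRIAGE = 30, 37  # cases with sufficient data for each analysis

# Diagnostic matches out of 30 cases, as reported in the abstract.
top1 = {"Ada": 9, "ChatGPT 3.5": 12, "ChatGPT 4.0": 10, "WebMD": 12}
top3 = {"Ada": 19, "ChatGPT 3.5": 19, "ChatGPT 4.0": 15, "WebMD": 17}
for name in top1:
    print(f"{name}: top-1 {top1[name]}/{N_DIAG} = {top1[name] / N_DIAG:.0%}, "
          f"top-3 {top3[name]}/{N_DIAG} = {top3[name] / N_DIAG:.0%}")

# Unsafe-triage comparison: ChatGPT 3.5 (15 of 37 unsafe) vs Ada (5 of 37).
table = [[15, N_TRIAGE - 15],   # ChatGPT 3.5: unsafe, not unsafe
         [5, N_TRIAGE - 5]]     # Ada: unsafe, not unsafe
chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.2f}, P = {p_value:.3f}")
```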