clinical notes

  • Article type: Journal Article
    OBJECTIVE: Alzheimer's disease (AD) is the most common form of dementia in the United States. Sleep is one of the lifestyle-related factors that have been shown to be critical for optimal cognitive function in old age. However, research on the association between sleep and AD incidence is lacking. A major bottleneck for such research is that the traditional way of acquiring sleep information is time-consuming, inefficient, non-scalable, and limited to patients' subjective experience. We aim to automate the extraction of specific sleep-related patterns, such as snoring, napping, poor sleep quality, daytime sleepiness, night wakings, other sleep problems, and sleep duration, from the clinical notes of AD patients. These sleep patterns are hypothesized to play a role in the incidence of AD, and extracting them could provide insight into the relationship between sleep and AD onset and progression.
    METHODS: A gold standard dataset was created by manually annotating 570 randomly sampled clinical note documents from adSLEEP, a corpus of 192 000 de-identified clinical notes of 7266 AD patients retrieved from the University of Pittsburgh Medical Center (UPMC). We developed a rule-based natural language processing (NLP) algorithm, machine learning models, and large language model (LLM)-based NLP algorithms to automate the extraction of sleep-related concepts, including snoring, napping, sleep problem, bad sleep quality, daytime sleepiness, night wakings, and sleep duration, from the gold standard dataset.
    RESULTS: The annotated dataset of 482 patients comprised a predominantly White (89.2%), older adult population with an average age of 84.7 years; females represented 64.1%, and the vast majority were non-Hispanic or Latino (94.6%). The rule-based NLP algorithm achieved the best F1 score across all sleep-related concepts. In terms of positive predictive value (PPV), the rule-based NLP algorithm achieved the highest PPV for daytime sleepiness (1.00) and sleep duration (1.00), the machine learning models had the highest PPV for napping (0.95) and bad sleep quality (0.86), and LLAMA2 with finetuning had the highest PPV for night wakings (0.93) and sleep problem (0.89).
    CONCLUSIONS: Although sleep information is infrequently documented in clinical notes, the proposed rule-based and LLM-based NLP algorithms still achieved promising results. In comparison, the machine learning-based approaches did not achieve good results, owing to the small amount of sleep information in the training data.
    CONCLUSIONS: The results show that the rule-based NLP algorithm consistently achieved the best performance for all sleep concepts. This study focused on the clinical notes of patients with AD but could be extended to general sleep information extraction for other diseases.
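To make the idea of rule-based concept extraction concrete, here is a minimal Python sketch. The patterns and the `extract_sleep_concepts` helper are hypothetical illustrations, not the study's actual rules, which the abstract does not describe. The sketch also ignores negation ("denies snoring"), which a clinical pipeline would need to handle.

```python
import re

# Hypothetical keyword patterns for a few sleep concepts; the rules used in
# the study are not given in the abstract.
SLEEP_PATTERNS = {
    "snoring": re.compile(r"\bsnor(?:e|es|ed|ing)\b", re.IGNORECASE),
    "napping": re.compile(r"\bnap(?:s|ped|ping)?\b", re.IGNORECASE),
    "daytime_sleepiness": re.compile(
        r"\b(?:daytime|excessive) (?:sleepiness|somnolence)\b", re.IGNORECASE
    ),
    "sleep_duration": re.compile(
        r"\bsleeps? (?:about |around )?\d+(?:\.\d+)? (?:hours?|hrs?)\b",
        re.IGNORECASE,
    ),
}


def extract_sleep_concepts(note: str) -> dict[str, list[str]]:
    """Return each concept found in the note with its matched text spans."""
    found = {}
    for concept, pattern in SLEEP_PATTERNS.items():
        matches = [m.group(0) for m in pattern.finditer(note)]
        if matches:
            found[concept] = matches
    return found


note = "Patient snores loudly and naps after lunch. Sleeps about 5 hours per night."
result = extract_sleep_concepts(note)
print(result)
```

Rules of this kind tend to favor precision on concepts with a narrow vocabulary, which is consistent with the high PPV the abstract reports for sleep duration and daytime sleepiness.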

  • Article type: Journal Article
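    Traditional methods of identifying AAV cases for research rely on registries assembled through clinical care and/or on billing codes, which may miss important subgroups. Unstructured data entered by clinicians as free text document a patient's diagnosis, symptoms, manifestations, and other features of the condition that may be useful for identifying AAV cases. We found that a deep learning approach can classify notes as indicative of AAV and, when applied at the case level, identifies more cases of AAV than rule-based algorithms.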
    ANCA-associated vasculitis (AAV) is a rare but serious disease. Traditional case-identification methods using claims data can be time-intensive and may miss important subgroups. We hypothesized that a deep learning model analyzing electronic health records (EHR) can more accurately identify AAV cases.
    We examined the Mass General Brigham (MGB) repository of clinical documentation from 12/1/1979 to 5/11/2021, using expert-curated keywords and ICD codes to identify a large cohort of potential AAV cases. Three labeled datasets (I, II, III) were created, each containing note sections. We trained and evaluated a range of machine learning and deep learning algorithms for note-level classification, using metrics such as positive predictive value (PPV), sensitivity, F-score, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). The deep learning model was further evaluated for its ability to classify AAV cases at the patient level, compared with rule-based algorithms, in 2,000 randomly chosen samples.
    Datasets I, II, and III comprised 6,000, 3,008, and 7,500 note sections, respectively. Deep learning achieved the highest AUROC in all three datasets, with scores of 0.983, 0.991, and 0.991. The deep learning approach also had among the highest PPVs across the three datasets (0.941, 0.954, and 0.800, respectively). In a test cohort of 2,000 cases, the deep learning model achieved a PPV of 0.262 and an estimated sensitivity of 0.975. Compared to the best rule-based algorithm, the deep learning model identified six additional AAV cases, representing 13% of the total.
    The deep learning model effectively classifies clinical note sections for AAV diagnosis. Its application to EHR notes can potentially uncover additional cases missed by traditional rule-based methods.

  • Article type: Journal Article
    In many countries, health care professionals are legally obliged to share information from electronic health records with patients. However, concerns have been raised regarding the sharing of notes with adolescents in mental health care, and health care professionals have called for recommendations to guide this practice.
    The aim was to reach a consensus among authors of scientific papers on recommendations for health care professionals' digital sharing of notes with adolescents in mental health care and to investigate whether staff at child and adolescent specialist mental health care clinics agreed with the recommendations.
    A Delphi study was conducted with authors of scientific papers to reach a consensus on recommendations. The process of making the recommendations involved three steps. First, scientific papers meeting the eligibility criteria were identified through a PubMed search where the references were screened. Second, the results from the included papers were coded and transformed into recommendations in an iterative process. Third, the authors of the included papers were asked to provide feedback and consider their agreement with each of the suggested recommendations in two rounds. After the Delphi process, a cross-sectional study was conducted among staff at specialist child and adolescent mental health care clinics to assess whether they agreed with the recommendations that reached a consensus.
    Of the 84 invited authors, 27 responded. A consensus was reached on 17 recommendations on areas related to digital sharing of notes with adolescents in mental health care. The recommendations considered how to introduce digital access to notes, write notes, and support health care professionals, and when to withhold notes. Of the 41 staff members at child and adolescent specialist mental health care clinics, 60% or more agreed with the 17 recommendations. No consensus was reached regarding the age at which adolescents should receive digital access to their notes and the timing of digitally sharing notes with parents.
    A total of 17 recommendations related to key aspects of health care professionals' digital sharing of notes with adolescents in mental health care achieved consensus. Health care professionals can use these recommendations to guide their practice of sharing notes with adolescents in mental health care. However, the effects and experiences of following these recommendations should be tested in clinical practice.

  • Article type: Journal Article
    OBJECTIVE: In a growing list of countries, patients are granted access to their clinical notes ("open notes") as part of their online record access. Especially in the field of mental health, open notes remain controversial, with some clinicians perceiving open notes as a tool for improving therapeutic outcomes by increasing patient involvement, while others fear that patients might experience psychological distress and perceived stigmatization, particularly when reading clinicians' notes. More research is needed to optimize the benefits and mitigate the risks.
    METHODS: Using a qualitative research design, we conducted semi-structured interviews with psychiatrists practicing in Germany, to explore what conditions they believe need to be in place to ensure successful implementation of open notes in psychiatric practice as well as expected subsequent changes to their workload and treatment outcomes. Data were analyzed using thematic analysis.
    RESULTS: We interviewed 18 psychiatrists; interviewees believed four key conditions needed to be in place prior to implementation of open notes, including careful consideration of (1) diagnoses and symptom severity, (2) the availability of additional time for writing clinical notes and discussing them with patients, (3) available resources and system compatibility, and (4) legal and data protection aspects. As a result of introducing open notes, interviewees expected changes in documentation, treatment processes, and doctor-patient interaction. While open notes were expected to improve transparency and trust, participants anticipated negative unintended consequences, including the risk of deteriorating therapeutic relationships due to note access-related misunderstandings and conflicts.
    CONCLUSIONS: Psychiatrists practiced in Germany where open notes have not yet been established as part of the healthcare data infrastructure. Interviewees were supportive of open notes but had some reservations. They found open notes to be generally beneficial but anticipated effects to vary depending on patient characteristics. Clear guidelines for managing access, time constraints, usability, and privacy are crucial. Open notes were perceived to increase transparency and patient involvement but were also believed to raise issues of stigmatization and conflicts.

  • Article type: Journal Article
    BACKGROUND: Extracting Research Domain Criteria (RDoC) from high-risk populations such as those with post-traumatic stress disorder (PTSD) is crucial for positive mental health improvements and policy enhancements. Collecting, integrating, and effectively leveraging clinical notes for this purpose is complex.
    METHODS: In our study, we created a natural language processing (NLP) workflow to analyze electronic medical record (EMR) data and to identify and extract Research Domain Criteria using a pre-trained transformer-based natural language model, all-mpnet-base-v2. We built dictionaries from 100,000 clinical notes and analyzed 5.67 million clinical notes from 38,807 PTSD patients from the University of Pittsburgh Medical Center. We then showcased the significance of our approach by extracting and visualizing RDoC information in two use cases: (i) across multiple patient populations and (ii) throughout various disease trajectories.
    RESULTS: The sentence transformer model demonstrated high F1 macro scores across all RDoC domains, achieving the highest performance with a cosine similarity threshold value of 0.3. This ensured an F1 score of at least 80% across all RDoC domains. The study revealed consistent reductions in all six RDoC domains among PTSD patients after psychotherapy. We found that 60.6% of women with PTSD had at least one abnormal instance of the six RDoC domains, compared with 51.3% of men with PTSD, and that 45.1% of women with PTSD had higher levels of sensorimotor disturbances, compared with 41.3% of men. We also found that 57.3% of PTSD patients had at least one abnormal instance of the six RDoC domains based on our records. In addition, veterans had higher rates of abnormalities in the negative and positive valence systems (60% and 51.9% of veterans, respectively) than non-veterans (59.1% and 49.2%, respectively). The domains following first diagnoses of PTSD were associated with heightened cue reactivity to trauma, suicide, alcohol, and substance consumption.
    CONCLUSIONS: The findings provide initial insights into RDoC functioning in different populations and disease trajectories. Natural language processing proves valuable for capturing real-time, context-dependent RDoC instances from extensive clinical notes.
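The cosine similarity threshold mentioned in the results can be illustrated with a short Python sketch. The three-dimensional vectors and the `match_domains` helper below are toy assumptions for illustration; in the study the vectors would come from the all-mpnet-base-v2 sentence encoder, and only the 0.3 threshold is taken from the abstract.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def match_domains(sentence_vec, domain_vecs, threshold=0.3):
    """Keep every domain whose similarity to the sentence meets the threshold."""
    scores = {
        name: cosine_similarity(sentence_vec, vec)
        for name, vec in domain_vecs.items()
    }
    return {name: score for name, score in scores.items() if score >= threshold}


# Toy embeddings (hypothetical); real sentence embeddings have hundreds of dimensions.
domain_vecs = {
    "negative_valence": [1.0, 0.0, 0.0],
    "sensorimotor": [0.0, 1.0, 0.0],
}
sentence_vec = [0.9, 0.1, 0.0]

matched = match_domains(sentence_vec, domain_vecs)
print(matched)
```

A lower threshold admits more sentences per domain (higher recall, lower precision), so the chosen value is a tuning decision that would normally be made against annotated data.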

  • Article type: Journal Article
    BACKGROUND: Home healthcare (HHC) enables patients to receive healthcare services within their homes to manage chronic conditions and recover from illnesses. Recent research has identified disparities in HHC based on race or ethnicity. Social determinants of health (SDOH) describe the external factors influencing a patient's health, such as access to care and social support. Individuals from racially or ethnically minoritized communities are known to be disproportionately affected by SDOH. Existing evidence suggests that SDOH are documented in clinical notes. However, no prior study has investigated the documentation of SDOH across individuals from different racial or ethnic backgrounds in the HHC setting. This study aimed to (1) describe frequencies of SDOH documented in clinical notes by race or ethnicity and (2) determine associations between race or ethnicity and SDOH documentation.
    METHODS: Retrospective data analysis.
    METHODS: We conducted a cross-sectional secondary data analysis of 86,866 HHC episodes representing 65,693 unique patients from one large HHC agency in New York collected between January 1, 2015, and December 31, 2017. We reported the frequency of six SDOH (physical environment, social environment, housing and economic circumstances, food insecurity, access to care, and education and literacy) documented in clinical notes across individuals reported as Asian/Pacific Islander, Black, Hispanic, multi-racial, Native American, or White. We analyzed differences in SDOH documentation by race or ethnicity using logistic regression models.
    RESULTS: Compared to patients reported as White, patients across other racial or ethnic groups had higher frequencies of SDOH documented in their clinical notes. Our results suggest that race or ethnicity is associated with SDOH documentation in HHC.
    CONCLUSIONS: As the study of SDOH in HHC continues to evolve, our results provide a foundation to evaluate social information in the HHC setting and understand how it influences the quality of care provided.
    CONCLUSIONS: The results of this exploratory study can help clinicians understand the differences in SDOH across individuals from different racial and ethnic groups and serve as a foundation for future research aimed at fostering more inclusive HHC documentation practices.
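As a rough illustration of how an association between group membership and SDOH documentation can be quantified, the Python sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table. This matches what a logistic regression with a single binary predictor would estimate; the study's actual models and covariates are not described in the abstract, and the counts used here are invented for the example.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a, b: comparison group with / without the outcome
    c, d: reference group with / without the outcome
    Assumes every cell count is greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 30 of 100 comparison-group notes and 20 of 100
# reference-group notes document a given SDOH.
or_value, ci_lower, ci_upper = odds_ratio_with_ci(30, 70, 20, 80)
print(f"OR = {or_value:.2f}, 95% CI ({ci_lower:.2f}, {ci_upper:.2f})")
```

An interval that includes 1 would indicate that the data are compatible with no association; adjusted models add covariates to account for confounding, which this sketch does not attempt.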

  • Article type: Journal Article
    Building clinical registries is an important step in clinical research and in improving the quality of patient care. Natural Language Processing (NLP) methods have shown promising results in extracting valuable information from unstructured clinical notes. However, the structure and nature of clinical notes are very different from the regular text that state-of-the-art NLP models are trained and tested on, and they pose their own set of challenges. In this study, we propose Sentence Extractor with Keywords (SE-K), an efficient and interpretable classification approach for extracting information from clinical notes, and show that it outperforms more computationally expensive methods in text classification. Following Institutional Review Board (IRB) approval, we used SE-K and two embedding-based NLP approaches (Sentence Extractor with Embeddings (SE-E) and Bidirectional Encoder Representations from Transformers (BERT)) to develop a comprehensive registry of anterior cruciate ligament surgeries from 20 years of unstructured clinical data at a multi-site tertiary-care regional children's hospital. The low-resource approach (SE-K) had better performance (average AUROC of 0.94 ± 0.04) than the embedding-based approaches (SE-E: 0.93 ± 0.04 and BERT: 0.87 ± 0.09) for out-of-sample validation, in addition to the smallest performance drop between test and out-of-sample validation. Moreover, the SE-K approach (on CPU) was at least six times faster than SE-E (on CPU) and BERT (on GPU) and provides interpretability. Our proposed approach, SE-K, can be effectively used to extract relevant variables from clinical notes to build large-scale registries, with consistently better performance than the more resource-intensive approaches (e.g., BERT). Such approaches can facilitate information extraction from unstructured notes for registry building, quality improvement, and adverse event monitoring.

  • Article type: Journal Article
    BACKGROUND: Real-time surveillance of emerging infectious diseases necessitates a dynamically evolving, computable case definition, which frequently incorporates symptom-related criteria. For symptom detection, both population health monitoring platforms and research initiatives primarily depend on structured data extracted from electronic health records.
    OBJECTIVE: This study sought to validate and test an artificial intelligence (AI)-based natural language processing (NLP) pipeline for detecting COVID-19 symptoms from physician notes in pediatric patients. We specifically study patients presenting to the emergency department (ED) who can be sentinel cases in an outbreak.
    METHODS: Subjects in this retrospective cohort study were patients 21 years of age and younger who presented to a pediatric ED at a large academic children's hospital between March 1, 2020, and May 31, 2022. The ED notes for all patients were processed with an NLP pipeline tuned to detect the mention of 11 COVID-19 symptoms based on Centers for Disease Control and Prevention (CDC) criteria. For a gold standard, 3 subject matter experts labeled 226 ED notes and had strong agreement (F1-score=0.986; positive predictive value [PPV]=0.972; and sensitivity=1.0). F1-score, PPV, and sensitivity were used to compare the performance of both NLP and International Classification of Diseases, 10th Revision (ICD-10) coding against the gold standard chart review. As a formative use case, variations in symptom patterns were measured across SARS-CoV-2 variant eras.
    RESULTS: There were 85,678 ED encounters during the study period, including 4% (n=3420) with patients with COVID-19. NLP was more accurate at identifying encounters with patients who had any of the COVID-19 symptoms (F1-score=0.796) than ICD-10 codes (F1-score=0.451). NLP accuracy was higher for positive symptoms (sensitivity=0.930) than ICD-10 (sensitivity=0.300). However, ICD-10 accuracy was higher for negative symptoms (specificity=0.994) than NLP (specificity=0.917). Congestion or runny nose showed the highest accuracy difference (NLP: F1-score=0.828 and ICD-10: F1-score=0.042). For encounters with patients with COVID-19, prevalence estimates of each NLP symptom differed across variant eras. Patients with COVID-19 were more likely to have each NLP symptom detected than patients without this disease. Effect sizes (odds ratios) varied across pandemic eras.
    CONCLUSIONS: This study establishes the value of AI-based NLP as a highly effective tool for real-time COVID-19 symptom detection in pediatric patients, outperforming traditional ICD-10 methods. It also reveals the evolving nature of symptom prevalence across different virus variants, underscoring the need for dynamic, technology-driven approaches in infectious disease surveillance.
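The metrics used to compare NLP with ICD-10 coding all derive from the same confusion-matrix counts. The Python sketch below shows the standard definitions; the `classification_metrics` helper and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute PPV, sensitivity, specificity, and F1 from confusion-matrix counts."""
    ppv = tp / (tp + fp) if (tp + fp) else 0.0          # precision
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if ppv + sensitivity:
        f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    else:
        f1 = 0.0
    return {
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for one symptom against a chart-review gold standard.
metrics = classification_metrics(tp=45, fp=5, fn=15, tn=135)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 ignores true negatives, a method can have high specificity and still a low F1 when its sensitivity is poor, which is the pattern the abstract reports for ICD-10 codes.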

  • Article type: Journal Article
    Heart failure has become a major public health problem, and failure to accurately predict readmission further contributes to the disease's high cost and high mortality. A readmission prediction model can assist doctors in making decisions that prevent patients from deteriorating and reduce the cost burden. This paper extracts patient discharge records from the MIMIC-III database and divides the patients into three categories: no readmission, readmission within 30 days, and readmission after 30 days. We propose the HR-BGCN model to predict patient readmission. First, we use Adaptive-TMix to improve prediction metrics for the minority categories and reduce the impact of class imbalance. Then, a knowledge-informed graph attention mechanism is proposed. By introducing an explicit document-level graph structure, the encoding ability of graph node features is significantly improved. The paragraph-level representation obtained through graph learning is combined with the contextual token-level representation of BERT, and finally the multi-class classification task is carried out. We also compare against several typical graph learning classification models, such as the IA-GCN model and the GAT model, to verify the model's effectiveness. The results show that the average F1 score of the proposed HR-BGCN model for 30-day readmission of heart failure patients is 88.26%, and the average accuracy is 90.47%. The HR-BGCN model is significantly better than the graph learning classification models at predicting heart failure readmission. It can help doctors predict 30-day readmission and thereby reduce patients' readmission rate.

  • Article type: Preprint
    Extracting Research Domain Criteria (RDoC) from high-risk populations such as those with post-traumatic stress disorder (PTSD) is crucial for positive mental health improvements and policy enhancements. Collecting, integrating, and effectively leveraging clinical notes for this purpose is complex.
    In our study, we created an NLP workflow to analyze electronic medical record (EMR) data and to identify and extract Research Domain Criteria using a pre-trained transformer-based natural language model, all-mpnet-base-v2. We built dictionaries from 100,000 clinical notes and analyzed 5.67 million clinical notes from 38,807 PTSD patients from the University of Pittsburgh Medical Center. We then showcased the significance of our approach by extracting and visualizing RDoC information in two use cases: (i) across multiple patient populations and (ii) throughout various disease trajectories.
    The sentence transformer model demonstrated superior F1 macro scores across all RDoC domains, achieving the highest performance with a cosine similarity threshold value of 0.3. This ensured an F1 score of at least 80% across all RDoC domains. The study revealed consistent reductions in all six RDoC domains among PTSD patients after psychotherapy. Women had the highest abnormalities of sensorimotor systems, while veterans had the highest abnormalities of negative and positive valence systems. The domains following first diagnoses of PTSD were associated with heightened cue reactivity to trauma, suicide, alcohol, and substance consumption.
    The findings provide initial insights into RDoC functioning in different populations and disease trajectories. Natural language processing proves valuable for capturing real-time, context-dependent RDoC instances from extensive clinical notes.