clinical notes

  • Article type: Journal Article
    As patient complexity increases and patient data are stored in fragmented health information systems, effective clinical decision making requires automated, time-efficient ways of gathering important information from patients' medical histories. Using COVID-19 as a case study, we developed a query-bot information retrieval system with user feedback that allows clinicians to ask natural-language questions to retrieve data from patient notes.
    We applied clinicalBERT, a pre-trained contextual language model, to our dataset of patient notes to obtain sentence embeddings, and used K-Means clustering to reduce computation time for real-time interaction. The Rocchio algorithm was then employed to incorporate user feedback and improve retrieval performance.
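    The retrieval loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: sentence embeddings (e.g. from clinicalBERT) are assumed to be precomputed and passed in as lists of floats, K-Means is a basic Lloyd's iteration with squared Euclidean distance, only the cluster whose centroid is nearest to the query is searched, and the Rocchio update q' = alpha*q + beta*mean(relevant) - gamma*mean(non-relevant) uses common textbook weights (alpha = 1.0, beta = 0.75, gamma = 0.15) that are assumptions, not values reported in the paper. All function names are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return centroids, labels


def retrieve(query, vectors, centroids, labels, top_k=5):
    """Rank sentences in the query's nearest cluster by cosine similarity."""
    nearest = min(range(len(centroids)),
                  key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for i, lab in enumerate(labels) if lab == nearest]
    candidates.sort(key=lambda i: cosine(query, vectors[i]), reverse=True)
    return candidates[:top_k]


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant)."""
    new = [alpha * q for q in query]
    if relevant:
        new = [x + beta * r for x, r in zip(new, mean(relevant))]
    if nonrelevant:
        new = [x - gamma * n for x, n in zip(new, mean(nonrelevant))]
    return new
```

    In use, the clinician marks returned sentences as relevant or not, `rocchio` moves the query vector toward the relevant ones and away from the rest, and `retrieve` is called again with the updated vector. Restricting the search to one cluster trades some recall for speed, which is consistent with the abstract's remark that clustering can reduce performance on some queries.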
    In an iterative feedback loop experiment, the MAP for the final iteration was 0.93/0.94, compared with an initial MAP of 0.66/0.52, for generic queries, and 1.0/1.0, compared with 0.79/0.83, for COVID-19-specific queries, confirming that the contextual model handles ambiguity in natural language queries and that feedback helps improve retrieval performance. The user-in-the-loop experiment also outperformed the automated pseudo-relevance feedback method. Moreover, the null hypothesis of identical precision between initial retrieval and relevance feedback was rejected with high statistical significance (p ≪ 0.05). Compared with the Word2Vec, TF-IDF, and bioBERT models, clinicalBERT offered the best balance between response precision and user feedback.
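    MAP (mean average precision) is the metric behind the comparisons above. A small sketch of how it is typically computed, assuming binary relevance judgments for each retrieved sentence in rank order; by default it normalizes by the number of relevant sentences found in the ranked list, which may differ from the normalization used in the paper.

```python
def average_precision(relevance, total_relevant=None):
    """Average precision of one ranked list.

    relevance: 0/1 judgments in rank order.
    total_relevant: number of relevant items for the query; defaults to the
    number of relevant items present in the list.
    """
    if total_relevant is None:
        total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    score = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at this relevant position
    return score / total_relevant


def mean_average_precision(runs):
    """MAP over a list of per-query relevance lists."""
    return sum(average_precision(r) for r in runs) / len(runs)
```

    For example, a ranking judged [1, 0, 1] has AP = (1/1 + 2/3) / 2 ≈ 0.83, and a perfect ranking has AP = 1.0.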
    Our model works well for both generic and COVID-19-specific queries. However, some generic queries are answered less well than others, because clustering reduces query performance and vague relations between queries and sentences are treated as non-relevant. We also tested our model on queries with the same meaning but different wording and showed that these query variations yielded similar performance after user feedback was incorporated.
    In conclusion, we developed an NLP-based query-bot that handles synonyms and natural language ambiguity to retrieve relevant information from the patient chart. User feedback is critical to improving model performance.

  • Article type: Journal Article
    Developing clinical natural language processing systems often requires access to many clinical documents, which are not widely available to the public because of privacy and security concerns. To address this challenge, we propose methods to generate synthetic clinical notes and evaluate their utility in real clinical natural language processing tasks.
    We implemented 4 state-of-the-art text generation models, namely CharRNN, SeqGAN, GPT-2, and CTRL, to generate clinical text for the History and Present Illness section. We then manually annotated clinical entities in 500 randomly selected History and Present Illness notes generated by the best-performing algorithm. To compare the utility of natural and synthetic corpora, we trained named entity recognition (NER) models on all 3 corpora and evaluated their performance on 2 independent natural corpora.
    Our evaluation shows that GPT-2 achieved the best BLEU (bilingual evaluation understudy) score, with a BLEU-2 of 0.92. NER models trained on the synthetic corpus generated by GPT-2 performed slightly better on the 2 independent corpora (strict F1 scores of 0.709 and 0.748, respectively) than NER models trained on the natural corpus (F1 scores of 0.706 and 0.737, respectively), indicating the good utility of synthetic corpora for clinical NER model development. In addition, we demonstrated that an augmented method combining natural and synthetic corpora performed better than one using the natural corpus alone.
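    Strict F1 credits a predicted entity only when both its span boundaries and its type exactly match an annotation. A minimal sketch, assuming entities are represented as hypothetical (doc_id, start, end, entity_type) tuples; the exact matching rules of the authors' evaluation script are not stated in the abstract.

```python
def strict_f1(gold, predicted):
    """Strict (exact-match) entity-level precision, recall, and F1.

    gold, predicted: iterables of (doc_id, start, end, entity_type) tuples.
    A prediction is a true positive only if the whole tuple matches.
    """
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

    Under strict matching, a prediction that is off by a single character counts as both a false positive and a false negative; a relaxed (overlap) criterion would instead give partial credit, which is why strict scores tend to be lower.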
    Recent advances in text generation have made it possible to generate synthetic clinical notes that could be useful for training NER models for information extraction from natural clinical notes, thus lowering privacy concerns and increasing data availability. Further investigation is needed before this technology can be applied in practice.

  • Article type: Journal Article
    Hypoglycemia is a common safety event when attempting to optimize glycemic control in diabetes mellitus (DM). While electronic medical records provide a natural basis for detecting and analyzing hypoglycemia, the ICD codes used in these databases may be invalid, insensitive, or non-specific for detecting new hypoglycemic events. We developed text preprocessing methods to improve the automatic detection of hypoglycemia from clinical encounter notes.
    We set out to improve hypoglycemia detection from clinical notes by introducing three preprocessing methods: stop word filtering, medication signaling, and ICD narrative enrichment. To test the proposed methods, we selected clinical notes from the VA Maryland Health Care System based on various combinations of three criteria suggestive of hypoglycemia: ICD-9 codes for diabetes and hypoglycemia, laboratory glucose values < 70 mg/dL, and text references to a proximate hypoglycemia event. In addition, we constructed one dataset of 395 clinical notes from 2009 and another of 460 notes from 2014 to test the generality of the proposed methods. For each dataset, two physician judges manually reviewed individual clinical notes to determine whether hypoglycemia was present or absent, and a third physician judge served as the final adjudicator for disagreements.
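    The three preprocessing steps can be illustrated with a short sketch. Everything here is an assumption for illustration: the stop word list, medication lexicon, and ICD-9 narrative table are tiny hypothetical stand-ins, and "medication signaling" is interpreted as appending a marker token when a glucose-lowering medication is mentioned; the paper's exact definitions may differ.

```python
import re

# Hypothetical, illustrative lexicons (not the study's actual lists).
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "was", "is", "to", "with"}
GLUCOSE_LOWERING_MEDS = {"insulin", "glipizide", "glyburide", "metformin"}
ICD9_NARRATIVES = {
    "250.80": "diabetes with other specified manifestations",
    "251.2": "hypoglycemia unspecified",
}


def preprocess(note_text, icd9_codes):
    """Return a token list after the three (assumed) preprocessing steps."""
    tokens = re.findall(r"[a-z0-9]+", note_text.lower())
    # 1. Stop word filtering.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 2. Medication signaling (assumed form): add a marker token when any
    #    glucose-lowering medication is mentioned in the note.
    if any(t in GLUCOSE_LOWERING_MEDS for t in tokens):
        tokens.append("MED_SIGNAL")
    # 3. ICD narrative enrichment: append the text description of each
    #    ICD-9 code attached to the encounter (stop words removed).
    for code in icd9_codes:
        narrative = ICD9_NARRATIVES.get(code)
        if narrative:
            tokens.extend(w for w in narrative.split() if w not in STOP_WORDS)
    return tokens
```

    The resulting token lists would then feed whatever classifier performs detection; enrichment adds vocabulary the note itself may lack, which fits the reported recall gain from ICD narrative enrichment.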
    Each of the proposed preprocessing methods significantly improved hypoglycemia detection on one dataset, increasing the F1 score by 5.3% to 7.4% (p < .01). Among the methods, stop word filtering contributed most to the improvement (7.4%). Combining all the preprocessing methods led to a greater performance gain (p < .001) than using any method individually. A similar pattern was observed on the other dataset, where individual methods increased the F1 score by 7.7% to 9.4% (p < .001); however, combining the three methods did not yield an additional gain there.
    The proposed text preprocessing methods improved the performance of hypoglycemia detection from clinical text notes. Stop word filtering achieved the largest improvement, and ICD narrative enrichment boosted the recall of detection. Combining the three preprocessing methods led to additional performance gains on one of the two datasets.

  • Article type: Journal Article
    The widespread use of dietary supplements has drawn extensive attention because of concerns about their safety and efficacy. Clinical notes document a great deal of detailed information on dietary supplement usage, providing a rich source for clinical research on supplement safety surveillance. Identifying the use status of dietary supplements is one of the initial steps toward the ultimate goal of supplement safety surveillance. In this study, we built rule-based and machine learning-based classifiers to automatically classify the use status of supplements into four categories: Continuing (C), Discontinued (D), Started (S), and Unclassified (U). Compared with the machine learning classifier trained on the same datasets, the rule-based classifier performed better, with F-measures of 0.93, 0.98, 0.95, and 0.83 for the C, D, S, and U statuses, respectively. We further analyzed the errors generated by the rule-based classifier. The classifier could potentially be applied to extract supplement information from clinical notes to support research and clinical practice related to patient safety in supplement usage.
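    A rule-based status classifier of the kind described can be sketched as an ordered list of cue patterns, with the first match winning and Unclassified as the fallback. The cue words and their priority order below are hypothetical illustrations; the study's actual rules are not given in the abstract.

```python
import re

# Hypothetical cue patterns, checked in priority order (first match wins).
RULES = [
    ("D", re.compile(r"\b(stopped|discontinued|no longer taking|quit)\b")),
    ("S", re.compile(r"\b(started|starting|began|begin|initiated)\b")),
    ("C", re.compile(r"\b(continue|continues|continuing|still taking|takes|taking)\b")),
]


def classify_status(sentence):
    """Return 'C', 'D', 'S', or 'U' for a sentence mentioning a supplement."""
    text = sentence.lower()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "U"  # no cue found: Unclassified
```

    Checking Discontinued before Continuing matters: "no longer taking" also contains the Continuing cue "taking", so the order of rules encodes which reading takes precedence.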

  • Article type: Journal Article
    The heterogeneity of patient phenotype data is an impediment to research into the origins and progression of neuropsychiatric disorders. This difficulty is compounded in rare disorders such as Phelan-McDermid Syndrome (PMS) by the paucity of patient clinical data. PMS is a rare syndromic genetic cause of autism and intellectual deficiency. In this paper, we describe the Phelan-McDermid Syndrome Data Network (PMS_DN), a platform that facilitates research into phenotype-genotype correlation and progression of PMS by: a) integrating knowledge of patient phenotypes extracted from Patient Reported Outcomes (PRO) data and clinical notes (two heterogeneous, underutilized sources of knowledge about patient phenotypes) with curated genetic information from the same patient cohort, and b) making this integrated knowledge, along with a suite of statistical tools, available free of charge to authorized investigators on a Web portal, https://pmsdn.hms.harvard.edu. PMS_DN is a Patient Centric Outcomes Research Initiative (PCORI) in which patients and their families are involved in all aspects of managing the patient data that drive research into PMS. To foster collaborative research, PMS_DN also makes patient aggregates from this knowledge available to authorized investigators through distributed research networks such as PCORnet PopMedNet. PMS_DN is hosted in a scalable cloud-based environment and complies with all patient data privacy regulations. As of October 31, 2016, PMS_DN integrates high-quality knowledge extracted from the clinical notes of 112 patients and curated genetic reports of 176 patients with preprocessed PRO data from 415 patients.

  • Article type: Journal Article
    The objective of this study was to determine whether the Food and Drug Administration's Adverse Event Reporting System (FAERS) data set could serve as the basis of automated electronic health record (EHR) monitoring for the adverse drug reaction (ADR) subset of adverse drug events. We retrospectively collected EHR entries for 71,909 pediatric inpatient visits at Cincinnati Children's Hospital Medical Center. Natural language processing (NLP) techniques were used to identify positive diseases/disorders and signs/symptoms (DDSSs) in the patients' clinical narratives. We downloaded all FAERS reports submitted by medical providers and extracted the reported drug-DDSS pairs. For each patient, we aligned the drug-DDSS pairs extracted from their clinical notes with the corresponding drug-DDSS pairs from the FAERS data set to identify Drug-Reaction Pair Sentences (DRPSs). The DRPSs were then processed with NLP techniques to identify ADR-related DRPSs. We used clinician-annotated, real-world EHR data as the reference standard to evaluate the proposed algorithm. In this evaluation, the algorithm achieved promising performance and showed great potential for accurately identifying ADRs in pediatric patients.
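    The alignment step that yields Drug-Reaction Pair Sentences can be pictured as a lookup of each sentence's co-occurring drug and DDSS mentions against the set of FAERS-reported pairs. A minimal sketch, assuming that drugs and DDSSs have already been extracted and normalized per sentence, that a DRPS is a sentence in which a drug and a DDSS co-occur as a FAERS pair, and that the dictionary keys used here are hypothetical.

```python
def find_drps(sentences, faers_pairs):
    """Return (sentence_index, drug, ddss) for sentences containing a FAERS pair.

    sentences: list of dicts with hypothetical keys "drugs" and "ddss", each a
    set of normalized concept strings already extracted by NLP.
    faers_pairs: set of (drug, ddss) tuples extracted from FAERS reports.
    """
    hits = []
    for i, sent in enumerate(sentences):
        # Check every drug x DDSS combination within the same sentence.
        for drug in sorted(sent["drugs"]):
            for ddss in sorted(sent["ddss"]):
                if (drug, ddss) in faers_pairs:
                    hits.append((i, drug, ddss))
    return hits
```

    The candidate sentences returned here would still need the second NLP pass described above to decide which ones actually describe an ADR rather than, for example, an indication or a negated finding.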

  • Article type: Journal Article
    OBJECTIVE: Through manual review of clinical notes for patients with type 2 diabetes mellitus attending a Danish diabetes center, the study aimed to identify adverse drug reactions (ADRs) associated with three classes of glucose-lowering medicines: "Combinations of oral blood glucose lowering medicines" (A10BD), "dipeptidyl peptidase-4 (DPP-4) inhibitors" (A10BH), and "other blood glucose lowering medicines" (A10BX). Specifically, we aimed to describe the potential of clinical notes to identify new ADRs and to evaluate whether sufficient information can be obtained for causality assessment.
    METHODS: For observed adverse events (AEs), we extracted time to onset, outcome, and suspected medicine(s). AEs were assessed according to the World Health Organization-Uppsala Monitoring Centre (WHO-UMC) causality criteria and analyzed with respect to suspected medicines, type of ADR (system organ class), seriousness, and labeling status.
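    As a rough illustration of how the WHO-UMC causality categories relate to case facts, here is a deliberately simplified decision sketch. It is not a substitute for the published criteria, which rely on clinical judgment: it reduces each criterion to a boolean, omits the "conditional/unclassified" category, and the `CaseFacts` fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CaseFacts:
    plausible_timing: bool            # time to onset fits the drug exposure
    other_explanation: bool           # disease or other drugs could explain it
    dechallenge_positive: bool        # improved after the drug was withdrawn
    definitive_or_rechallenged: bool  # pharmacologically definitive or positive rechallenge
    sufficient_information: bool = True


def who_umc_category(f: CaseFacts) -> str:
    """Simplified mapping of case facts to a WHO-UMC causality category."""
    if not f.sufficient_information:
        return "unassessable/unclassifiable"
    if not f.plausible_timing:
        return "unlikely"
    if f.other_explanation:
        return "possible"
    if f.dechallenge_positive and f.definitive_or_rechallenged:
        return "certain"
    if f.dechallenge_positive:
        return "probable/likely"
    # Withdrawal information lacking or unclear, no competing explanation.
    return "possible"
```

    The sketch makes explicit why clinical notes matter here: each branch depends on a fact (timing, competing causes, dechallenge outcome) that has to be recovered from the narrative.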
    RESULTS: A total of 207 patients were included in the study, and 163 AEs were identified. Of these, 14% were categorized as certain, 60% as probable/likely, and 26% as possible. Fifteen (9%) ADRs were unlabeled, two of which were serious: peripheral edema associated with sitagliptin and stomach ulcer associated with liraglutide. Of the unlabeled ADRs, 13 (87%) were associated with "other blood glucose lowering medicines" and the remaining 2 (13%) with "DPP-4 inhibitors."
    CONCLUSIONS: Clinical notes could potentially reveal unlabeled ADRs associated with prescribed medicines, and sufficient information is generally available for causality assessment. However, manual review of clinical notes is too time-consuming for routine use; hence, information technology (IT) tools are needed to automatically screen patient records for information about potentially serious and unlabeled ADRs.