Keywords: ANCA-associated vasculitis; Case Identification; Clinical Notes; Deep Learning; Electronic Health Records; Machine learning

Source: DOI:10.1101/2024.06.09.24308603

Abstract:
ANCA-associated vasculitis (AAV) is a rare but serious disease. Traditional case-identification methods using claims data can be time-intensive and may miss important subgroups. We hypothesized that a deep learning model analyzing electronic health records (EHR) can more accurately identify AAV cases.
We examined the Mass General Brigham (MGB) repository of clinical documentation from December 1, 1979 to May 11, 2021, using expert-curated keywords and ICD codes to identify a large cohort of potential AAV cases. Three labeled datasets (I, II, III) were created, each containing note sections. We trained and evaluated a range of machine learning and deep learning algorithms for note-level classification, using metrics including positive predictive value (PPV), sensitivity, F-score, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). The deep learning model was further evaluated for its ability to classify AAV cases at the patient level, compared with rule-based algorithms, in 2,000 randomly chosen samples.
Datasets I, II, and III comprised 6,000, 3,008, and 7,500 note sections, respectively. Deep learning achieved the highest AUROC in all three datasets, with scores of 0.983, 0.991, and 0.991. The deep learning approach also had among the highest PPVs across the three datasets (0.941, 0.954, and 0.800, respectively). In a test cohort of 2,000 cases, the deep learning model achieved a PPV of 0.262 and an estimated sensitivity of 0.975. Compared with the best rule-based algorithm, the deep learning model identified six additional AAV cases, representing 13% of the total.
The deep learning model effectively classifies clinical note sections for AAV diagnosis. Its application to EHR notes can potentially uncover additional cases missed by traditional rule-based methods.
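Traditional methods of identifying AAV cases for research rely on registries assembled through clinical care and/or on billing codes, both of which may miss important subgroups. Unstructured data entered by clinicians as free text record patients' diagnoses, symptoms, manifestations, and other features of the condition that may be useful for identifying AAV cases. We found that a deep learning approach can classify notes as indicative of AAV and, when applied at the case level, identifies more cases with AAV than rule-based algorithms.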
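For readers less familiar with the note-level metrics named in the abstract, the following is a minimal sketch of how PPV, sensitivity, and an F-score relate to confusion-matrix counts. It is not the authors' evaluation code: the function name `note_level_metrics` and the counts passed to it are hypothetical, and the balanced F1 form of the F-score is an assumption, since the abstract does not say which variant was used.

```python
def note_level_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute PPV, sensitivity, and F1 from confusion-matrix counts.

    tp: notes correctly flagged as indicating AAV
    fp: notes flagged as AAV that are not
    fn: AAV notes the classifier missed
    """
    # PPV (precision): share of flagged notes that are true AAV notes
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # Sensitivity (recall): share of true AAV notes that were flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of PPV and sensitivity (assumed F-score variant)
    denom = ppv + sensitivity
    f1 = 2 * ppv * sensitivity / denom if denom else 0.0
    return {"ppv": ppv, "sensitivity": sensitivity, "f1": f1}


# Hypothetical counts for illustration only; not taken from the study.
metrics = note_level_metrics(tp=80, fp=20, fn=10)
print(metrics)
```

AUROC and AUPRC, by contrast, are computed from the model's predicted scores across all thresholds rather than from a single set of counts, so they are not covered by this sketch.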