clinical coding

  • Article type: Journal Article
    BACKGROUND: Assigning International Classification of Diseases (ICD) codes to clinical texts is a common and crucial practice in patient classification, hospital management, and further statistical analysis. Current auto-coding methods mainly recast this task as a multi-label classification problem. Such solutions suffer from a high-dimensional mapping space and excessive redundant information in long clinical texts. To alleviate this, we introduce text summarization methods to ICD coding and apply text matching to select ICD codes.
    METHODS: We focus on the tenth revision of the ICD (ICD-10) and design a novel summarization-based approach (SuM) with an end-to-end strategy to efficiently assign ICD-10 codes to clinical texts. In this approach, a knowledge-guided pointer network is proposed to distill and summarize key information in clinical texts precisely. A matching model with a matching-aggregation architecture then aligns the summary result with codes, converting the one-vs-all scenario into one-vs-one matching so that the large-label-space obstacle inherent in classification approaches is avoided.
    RESULTS: A total of 12,788 ICD-10-coded discharge summaries from a Chinese hospital were collected to evaluate the proposed approach. Compared with existing methods, the proposed model achieves the best coding results, with a Micro AUC of 0.9548, MRR@10 of 0.7977, Precision@10 of 0.0944, and Recall@10 of 0.9439 on the TOP-50 dataset. Results on the FULL dataset are consistent. In addition, the proposed knowledge encoder and the end-to-end strategy are shown to help the whole model select the most suitable code.
    CONCLUSIONS: The proposed automatic ICD-10 code assignment approach via text summarization can effectively capture critical information in long clinical texts and improve the performance of ICD-10 coding of clinical texts.
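
    The ranking metrics reported above (Precision@10, Recall@10, MRR@10) can be illustrated with a minimal sketch. The function names, the ranked list, and the gold code below are hypothetical and are not taken from the paper; the sketch only shows how such metrics are commonly computed for a single document.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of the top-k ranked codes that are correct."""
    return sum(code in gold for code in ranked[:k]) / k


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of the gold codes that appear in the top-k ranked codes."""
    return sum(code in gold for code in ranked[:k]) / len(gold)


def reciprocal_rank_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """1 / rank of the first correct code within the top k, or 0 if none."""
    for rank, code in enumerate(ranked[:k], start=1):
        if code in gold:
            return 1.0 / rank
    return 0.0


# Hypothetical model output for one discharge summary (illustrative codes only).
ranked_codes = ["I10", "E11.9", "I25.1", "J18.9", "N18.9",
                "K29.7", "I50.9", "E78.5", "I63.9", "J44.9"]
gold_codes = {"E11.9"}  # a single gold code, assumed for illustration

p10 = precision_at_k(ranked_codes, gold_codes, 10)
r10 = recall_at_k(ranked_codes, gold_codes, 10)
rr10 = reciprocal_rank_at_k(ranked_codes, gold_codes, 10)
print(p10, r10, rr10)
```

    When a document has a single gold code, Precision@10 cannot exceed 0.1 even if the code is always retrieved. A Precision@10 near 0.09 alongside a Recall@10 near 0.94 would be consistent with roughly one gold code per summary, although the abstract does not state the label cardinality.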

  • Article type: Journal Article
    Electronic health record (EHR) coding assigns International Classification of Diseases (ICD) codes to each EHR document. These standard medical codes represent diagnoses or procedures and play a critical role in medical applications. However, an EHR is a long medical text that is difficult to represent, the ICD code label space is large, and the labels have an extremely unbalanced distribution. These factors pose challenges to automatic EHR coding. Previous studies have not explored the disease attributes (e.g., symptoms, tests, medications) of ICD codes and the disease relationships (e.g., causes, risk factors, comorbidities) between them. In addition, the important roles of medical [...] and sentences associated with these attributes and relationships have been neglected. In this paper, we propose an end-to-end model called Knowledge Graph Enhanced neural network (KGENet) to address the above shortcomings. Specifically, we first construct a disease knowledge graph that focuses on the multi-view disease attributes of ICD codes and the disease relationships between these codes. We also use a long sequence encoder to obtain the EHR document representation. Most importantly, KGENet leverages multi-view disease attributes and structured disease relationships for knowledge enhancement through hybrid attention and graph propagation, respectively. Furthermore, the above processes can provide attribute-aware and relationship-augmented explainability for the model prediction results based on our disease knowledge graph. Experiments conducted on the MIMIC-III benchmark dataset show that KGENet outperforms state-of-the-art models in both model effectiveness and explainability.

  • Article type: Journal Article
    OBJECTIVE: The aim of this study was to disseminate insights from a nationwide pilot of the International Classification of Diseases-11th revision (ICD-11).
    METHODS: The strategies and methodologies employed to implement the ICD-11 morbidity coding in 59 hospitals in China are described. The key considerations for the ICD-11 implementation were summarized based on feedback obtained from the pilot hospitals. Coding accuracy and Krippendorff's alpha reliability were computed based on the coding results in the ICD-11 exam.
    RESULTS: Among the 59 pilot hospitals, 58 integrated ICD-11 Coding Software into their health information management systems and 56 implemented the ICD-11 in morbidity coding, resulting in 3 723 959 diagnoses for 873 425 patients being coded over a 2-month pilot coding phase. The key considerations in the transition to the ICD-11 in morbidity coding encompassed the enrichment of ICD-11 content, refinement of tools, provision of systematic and tailored training, improvement of clinical documentation, promotion of downstream data utilization, and the establishment of a national process and mechanism for implementation. The overall coding accuracy was 82.9% when considering the entire coding field (including postcoordination) and 92.2% when only one stem code was considered. Krippendorff's alpha was 0.792 (95% CI, 0.788-0.796) and 0.799 (95% CI, 0.795-0.803) with and without consideration of the code sequence, respectively.
    CONCLUSIONS: This nationwide pilot study has enhanced national technical readiness for the ICD-11 implementation in morbidity, elucidating key factors warranting careful consideration in future endeavors. The good accuracy and intercoder reliability of the ICD-11 coding achieved following a brief training program underscore the potential for the ICD-11 to reduce training costs and provide high-quality health data. Experiences and lessons learned from this study have contributed to WHO's work on the ICD-11 and can inform other countries when formulating their transition plan.
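
    Intercoder reliability of the kind reported above can be sketched as follows. The abstract does not say which variant of Krippendorff's alpha was used; this sketch assumes the nominal variant (codes either match or do not), and the function name and the toy labels `code_A` and `code_B` are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def krippendorff_alpha_nominal(units: Sequence[Sequence[Hashable]]) -> float:
    """Krippendorff's alpha for nominal data.

    `units` holds one sequence per coded case, containing the codes assigned
    by the coders who rated that case (missing ratings are simply omitted).
    """
    # Coincidence matrix: each ordered pair of values from different coders
    # within a unit contributes 1 / (m - 1), where m is the number of ratings.
    coincidences: Counter = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a case rated by a single coder carries no agreement information
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals: Counter = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1.0 - observed / expected


# Hypothetical toy data: four cases, each coded by two coders.
toy_units = [
    ["code_A", "code_A"],
    ["code_B", "code_B"],
    ["code_A", "code_B"],
    ["code_B", "code_B"],
]
alpha = krippendorff_alpha_nominal(toy_units)
print(round(alpha, 4))
```

    An alpha of 1 means perfect agreement and 0 means agreement no better than chance, so values close to 0.8, as reported in the pilot, indicate substantial agreement among coders.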

  • Article type: Journal Article
    BACKGROUND: Sepsis surveillance using electronic health record (EHR)-based data may provide more accurate epidemiologic estimates than administrative data, but experience with this approach to estimate population-level sepsis burden is lacking.
    METHODS: This was a retrospective cohort study including all adults admitted to publicly funded hospitals in Hong Kong between 2009 and 2018. Sepsis was defined as clinical evidence of presumed infection (clinical cultures and treatment with antibiotics) and concurrent acute organ dysfunction (≥2 point increase in baseline SOFA score). Trends in incidence, mortality, and case fatality risk (CFR) were modelled by exponential regression. Performance of the EHR-based definition was compared with 4 administrative definitions using 500 medical record reviews.
    RESULTS: Among 13,550,168 hospital episodes during the study period, 485,057 (3.6%) had sepsis by EHR-based criteria, with a CFR of 21.5%. In 2018, age- and sex-adjusted standardized sepsis incidence was 759 per 100,000 (relative +2.9%/year [95% CI 2.0, 3.8%] between 2009 and 2018) and standardized sepsis mortality was 156 per 100,000 (relative +1.9%/year [95% CI 0.9, 2.9%]). Despite decreasing CFR (relative -0.5%/year [95% CI -1.0, -0.1%]), sepsis accounted for an increasing proportion of all deaths (relative +3.9%/year [95% CI 2.9, 4.9%]). Medical record reviews demonstrated that the EHR-based definition more accurately identified sepsis than administrative definitions (AUC 0.91 vs 0.52-0.55, p < 0.001).
    CONCLUSIONS: An objective EHR-based surveillance definition demonstrated an increase in population-level standardized sepsis incidence and mortality in Hong Kong between 2009 and 2018 and was much more accurate than administrative definitions. These findings demonstrate the feasibility and advantages of an EHR-based approach for wide-scale sepsis surveillance.
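
    A "relative %/year" trend of the kind quoted above can be obtained from an exponential model. The abstract does not specify the estimator; the sketch below assumes ordinary least squares on the logarithm of the rate, and the yearly rates are synthetic values generated for illustration, not the study's data.

```python
import math
from typing import Sequence


def annual_relative_change(years: Sequence[float], rates: Sequence[float]) -> float:
    """Fit ln(rate) = a + b * year by least squares and return exp(b) - 1,
    i.e. the relative change per year implied by an exponential trend."""
    logs = [math.log(r) for r in rates]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return math.exp(slope) - 1.0


# Synthetic rates per 100,000 that grow by exactly 2.9% per year (assumed values).
years = list(range(2009, 2019))
rates = [100.0 * 1.029 ** (year - 2009) for year in years]

change = annual_relative_change(years, rates)
print(f"{change:+.1%} per year")
```

    On real surveillance data the points would not lie exactly on the curve, and a confidence interval for the slope would be reported alongside the point estimate, as in the abstract.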

  • Article type: Journal Article
    The International Classification of Diseases (ICD) serves as the foundation for generating comparable global disease statistics across regions and over time. ICD coding involves assigning codes to diseases based on clinical notes, so that a patient's condition can be described in a standard way. However, this process is complicated by the vast number of codes and the intricate taxonomy of ICD codes, which are hierarchically organized into various levels, including chapter, category, subcategory, and further subdivisions. Many existing studies focus solely on predicting subcategory codes, ignoring the hierarchical relationships among codes. To address this limitation, we propose a multitask learning model that trains multiple classifiers for different code levels, while also capturing the relations between coarser and finer-grained labels through a reinforcement mechanism. Our approach is evaluated on both English and Chinese benchmark datasets, and we demonstrate that our method achieves performance competitive with baseline models, particularly in terms of macro-F1 results. These findings suggest that our approach effectively leverages the hierarchical structure of ICD codes to improve disease code prediction accuracy. Analysis of the attention mechanism shows that the multigranularity attention of our model captures crucial features of the input text at different granularity levels, which can provide reasonable explanations for the prediction results.
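
    The hierarchy described above can be made concrete with a small sketch that derives one label set per level from a list of ICD-10 style codes, which is the kind of target a multitask model with one classifier per level would need. The function names are hypothetical, and using the first letter as a stand-in for the chapter is a simplifying assumption: real ICD-10 chapters are defined by code ranges, and some chapters span or split letters.

```python
from typing import Dict, Iterable, Optional, Set


def code_levels(code: str) -> Dict[str, Optional[str]]:
    """Split an ICD-10 style code such as 'I21.0' into hierarchy levels."""
    code = code.strip().upper()
    return {
        "chapter_proxy": code[0],          # assumption: first letter approximates the chapter
        "category": code[:3],              # three-character category, e.g. 'I21'
        "subcategory": code if len(code) > 3 else None,  # e.g. 'I21.0'
    }


def level_targets(codes: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect the label set at each level for one document."""
    targets: Dict[str, Set[str]] = {
        "chapter_proxy": set(), "category": set(), "subcategory": set(),
    }
    for code in codes:
        for level, label in code_levels(code).items():
            if label is not None:
                targets[level].add(label)
    return targets


# Illustrative codes for one hypothetical clinical note.
doc_targets = level_targets(["I21.0", "I10", "E11.9"])
print(doc_targets)
```

    Two codes that differ at the subcategory level can still share a category or chapter label, which is the coarse-to-fine relationship that a hierarchical model can exploit.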

  • Article type: Journal Article
    OBJECTIVE: Electronic medical record (EMR) databases can facilitate epidemiology research in various diseases including bronchiectasis. Given the diagnostic challenges of bronchiectasis, the validity of the coding in EMR requires clarification. We aimed to assess the validity of International Classification of Diseases, 9th Revision (ICD-9) code algorithms for identifying bronchiectasis in the Clinical Data Analysis and Reporting System (CDARS), the territory-wide electronic health record system in Hong Kong.
    METHODS: Adult patients with a diagnosis of bronchiectasis entered at Queen Mary Hospital in 2011-2020 were identified in CDARS using ICD-9 code 494. All patients who had high-resolution computed tomography (HRCT) were reviewed by respiratory specialists to confirm the presence of bronchiectasis on HRCT.
    RESULTS: In the same period, 19 617 patients had the diagnostic code of bronchiectasis across all public hospitals in Hong Kong and 1866 at Queen Mary Hospital. Six hundred and forty-eight cases were randomly selected and validated using medical record and HRCT review by a respiratory specialist. The overall positive predictive value (PPV) was 92.7% (95% CI 90.7-94.7).
    CONCLUSIONS: This was the first ICD-9 coding validation for bronchiectasis in Hong Kong CDARS. Our study demonstrated that ICD-9 code 494 is reliable enough to support the use of the CDARS database for further clinical research on bronchiectasis.

  • Article type: Journal Article
    The International Classification of Diseases (ICD), which is endorsed by the World Health Organization, is a diagnostic classification standard. ICD codes are used to store, retrieve, and analyze health information to support clinical decisions. Currently, ICD coding has been adopted by more than 137 countries. However, in Pakistan, very few hospitals have implemented ICD coding and conducted different epidemiological studies. Moreover, none of them have reported the spectrum of liver disease burden based on ICD coding, nor implemented automated ICD coding. In this study, we annotated ICD codes for the database of the liver transplant unit of the Pir Abdul Qadir Shah Jeelani Institute of Medical Sciences. We named this database Medical Information Mart for Liver Transplantation (MIMLT). The results revealed that the database contains 34 ICD codes, of which V70.8 is the most frequent code. Furthermore, we determined the spectrum of liver disease burden in liver recipients based on ICD coding. We found that chronic hepatitis C (070.54) is the most frequent indication for liver transplantation. Additionally, we implemented automated ICD coding utilizing the MIMLT database and proposed a novel Deep Recurrent Convolutional Neural Network with Transfer Learning through pre-trained Embeddings (DRCNNTLe) model, which is an extended version of our DRCNN-HP model. DRCNNTLe extracts robust text representations from its pre-trained embedding layer, which is trained on a large domain-specific MIMIC III database corpus. The results indicate that utilizing pre-trained word embeddings, which are trained on large domain-specific corpora, can significantly improve the performance of the DRCNNTLe model and provide state-of-the-art results when the target database is small.

  • Article type: Journal Article
    Little is known about family members' and patients' expression of negative emotions in high-risk preoperative conversations.
    This study aimed to identify the occurrence and patterns of the negative emotions of family members and patients in preoperative conversations, to investigate the conversation themes and to explore the correlation between the negative emotions and the conversation themes.
    A retrospective study was conducted using the Chinese version of Verona Coding Definitions of Emotional Sequences (VR-CoDES-C) to code 297 conversations on high-risk procedures. Inductive content analysis was used to analyse the topics in which negative emotions were nested. The χ² test was used to test the association between the cues and the conversation themes.
    The occurrence rate of family members' and patients' negative emotions was very high (85.9%), much higher than in most conversations in other medical settings. The negative emotions were mainly expressed by cues (96.4%), and cue-b (67.4%) was the most frequent category. Cues and concerns were mostly elicited by family members and patients (71.6%). Negative emotions were observed among seven themes, in which 'Psychological stress relating to illness severity, family's care and financial burden' (30.3%) ranked the top. Cue-b, cue-c and cue-d had a significant correlation (p < .001) with certain themes.
    Family members and patients conveyed significantly more negative emotions in the high-risk preoperative conversations than in other medical communications. Certain categories of cues were induced by specific emotional conversation contents.
    Family members and patients contributed to data.

  • Article type: Journal Article
    Large electronic medical record (EMR) databases can facilitate epidemiology research into uncommon diseases such as interstitial lung disease (ILD). Given the rarity and diagnostic difficulty of ILD, the validity of the coding in EMR requires clarification. We aimed to assess the validity of International Classification of Diseases, 9th Revision (ICD-9) code algorithms for identifying ILD in the Clinical Data Analysis and Reporting System (CDARS), the territory-wide electronic health record system in Hong Kong.
    Patients who visited the Queen Mary Hospital in 2005-2018 with ILD were identified using the following ICD-9 codes: post-inflammatory pulmonary fibrosis (PPF; ICD-9: 515), idiopathic fibrosing alveolitis (IFA; ICD-9: 516.3), connective tissue disease-associated interstitial lung disease (CTD-ILD; ICD-9: 517.2, 517.8, 714.81), sarcoidosis (ICD-9: 135) and extrinsic allergic alveolitis (EAA; ICD-9: 495). Cases with a diagnostic code of PPF or IFA, for which relatively high case numbers were identified, were randomly sampled. All cases of CTD-ILD, sarcoidosis and EAA were included in the validation because of their relatively small case numbers.
    Two hundred and sixty-nine cases were validated using medical record review by a respiratory specialist. The overall positive predictive value (PPV) was 79% (95% CI, 74%-84%). In subgroup analysis, true positive case numbers of PPF, IFA, CTD-ILD, sarcoidosis and EAA were 74/100 (74%), 95/100 (95%), 11/15 (73%), 27/32 (84%) and 6/22 (27%), respectively.
    This was the first ICD-9 coding validation for ILD in Hong Kong CDARS. Our study demonstrated that the ICD-9 algorithms 515, 516.3, 517.2, 517.8, 714.81 and 135 identified ILD with a PPV reliable enough to support the use of the CDARS database for further clinical research on ILD. The validity was particularly high for 516.3.
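
    The overall PPV quoted above can be recomputed from the subgroup counts in the abstract. The abstract does not state which interval method was used; the sketch below assumes a normal-approximation (Wald) interval, which happens to reproduce the reported figures after rounding. The function name is hypothetical.

```python
import math
from typing import Tuple


def ppv_with_wald_ci(true_pos: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Positive predictive value with a normal-approximation (Wald) interval."""
    p = true_pos / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# (true positives, validated cases) per subgroup, as reported in the abstract.
subgroups = {
    "PPF": (74, 100),
    "IFA": (95, 100),
    "CTD-ILD": (11, 15),
    "sarcoidosis": (27, 32),
    "EAA": (6, 22),
}

total_tp = sum(tp for tp, _ in subgroups.values())
total_n = sum(n for _, n in subgroups.values())
ppv, lower, upper = ppv_with_wald_ci(total_tp, total_n)
print(f"PPV {ppv:.0%} (95% CI {lower:.0%}-{upper:.0%}) from {total_tp}/{total_n}")

for name, (tp, n) in subgroups.items():
    print(f"{name}: {tp / n:.0%}")
```

    For small subgroups such as CTD-ILD (15 cases), the Wald interval is known to behave poorly, so a Wilson or exact interval would usually be preferred there.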

  • Article type: Journal Article
    Computer-assisted clinical coding (CAC) based on automated coding algorithms is expected to improve the quality and productivity of International Classification of Diseases, 10th Revision (ICD-10) coding, but studies on auto-coding of the primary diagnosis are limited in the Chinese context.
    This study aims to develop a machine learning (ML) model for automated primary diagnosis ICD-10 coding.
    A total of 71,709 admissions at Fuwai Hospital, corresponding to 168 primary diagnosis ICD-10 codes, were included in this study. Based on clinical implications, two feature engineering methods were used to process discharge diagnosis and procedure texts into sequential features and sequential grouping features, respectively, from which two kinds of models were built and compared. A baseline model using one-hot encoding features was also considered. Light Gradient Boosting Machine (LightGBM) was adopted as the classifier, and grid search and cross-validation were used to select the optimal hyperparameters. SHapley Additive exPlanations (SHAP) values were applied to make the models interpretable.
    Our best prediction model was developed based on sequential grouping features. It showed good performance in the test phase, with accuracy and macro-averaged F1 (Macro-F1) of 95.2% and 88.3%, respectively. The comparison of the models demonstrated the effectiveness of the sequential information and the grouping strategy in boosting model performance (P-value < 0.01). Subgroup analysis of the best model on each individual code showed that 91.1% of the codes achieved an F1 over 70.0%.
    Our model has demonstrated its effectiveness for automated primary diagnosis coding in the Chinese context, and its results are interpretable. Hence, it has the potential to help clinical coders improve coding efficiency and quality in Chinese inpatient settings.
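
    The gap between accuracy and Macro-F1 reported above is typical when some codes are rare. The sketch below computes both metrics on a tiny hypothetical prediction set; the labels and predictions are invented for illustration and the function names are not from the study.

```python
from typing import List


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Share of admissions whose predicted primary code matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-code F1, so rare codes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical test set: one common code and one rare code.
y_true = ["I25.1"] * 8 + ["I21.0"] * 2
y_pred = ["I25.1"] * 8 + ["I25.1", "I21.0"]  # one rare-code admission is missed

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

    A single error on the rare code leaves accuracy at 0.9 but pulls Macro-F1 down to about 0.80, which is why per-code F1, as in the subgroup analysis above, is a useful complement to overall accuracy.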