clinical coding

  • Article type: Journal Article
    BACKGROUND: Assigning International Classification of Diseases (ICD) codes to clinical texts is a common and crucial practice in patient classification, hospital management, and further statistical analysis. Current auto-coding methods mainly treat this task as a multi-label classification problem. Such solutions suffer from a high-dimensional mapping space and excessive redundant information in long clinical texts. To alleviate this, we introduce text summarization methods to ICD coding and apply text matching to select ICD codes.
    METHODS: We focus on the tenth revision of the ICD (ICD-10) and design a novel summarization-based approach (SuM) with an end-to-end strategy to efficiently assign ICD-10 codes to clinical texts. In this approach, a knowledge-guided pointer network is proposed to distill and summarize key information in clinical texts precisely. A matching model with a matching-aggregation architecture then aligns the summary with candidate codes, turning the one-vs-all scenario into one-vs-one matching so that the large label space that hampers classification approaches is avoided.
    RESULTS: A total of 12,788 ICD-10 coded discharge summaries from a Chinese hospital were collected to evaluate the proposed approach. Compared with existing methods, the proposed model achieves the best coding results, with Micro AUC of 0.9548, MRR@10 of 0.7977, Precision@10 of 0.0944, and Recall@10 of 0.9439 on the TOP-50 dataset. Results on the FULL dataset remain consistent. In addition, the proposed knowledge encoder and the end-to-end strategy are shown to help the model select the most suitable code.
    CONCLUSIONS: The proposed automatic ICD-10 code assignment approach via text summarization can effectively capture critical messages in long clinical texts and improve the performance of ICD-10 coding of clinical texts.
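
The ranking metrics reported in this abstract (MRR@10, Precision@10, Recall@10) can be illustrated with a minimal Python sketch. The function names, the ranked candidate list, and the gold code below are hypothetical and are not taken from the paper, whose exact metric definitions may differ.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], gold: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked codes that are in the gold set."""
    hits = sum(1 for code in ranked[:k] if code in gold)
    return hits / k


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int = 10) -> float:
    """Fraction of gold codes that appear among the top-k ranked codes."""
    if not gold:
        return 0.0
    hits = sum(1 for code in ranked[:k] if code in gold)
    return hits / len(gold)


def reciprocal_rank_at_k(ranked: Sequence[str], gold: Set[str], k: int = 10) -> float:
    """1 / rank of the first correct code in the top k, or 0 if none is found."""
    for rank, code in enumerate(ranked[:k], start=1):
        if code in gold:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking for one discharge summary (illustrative labels only).
ranked_codes = ["I10", "E11.9", "I25.1", "J18.9", "N18.9",
                "K29.7", "I50.9", "E78.5", "I63.9", "J44.9"]
gold_codes = {"I25.1"}  # assumed single gold code for this summary

print(precision_at_k(ranked_codes, gold_codes))
print(recall_at_k(ranked_codes, gold_codes))
print(reciprocal_rank_at_k(ranked_codes, gold_codes))
```

Averaging the reciprocal rank over all documents gives MRR@10. If each summary has a single gold code, Precision@10 cannot exceed 0.1, which would be consistent with the reported Precision@10 (0.0944) being roughly one tenth of Recall@10 (0.9439).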






  • Article type: Journal Article
    Electronic health record (EHR) coding assigns International Classification of Diseases (ICD) codes to each EHR document. These standard medical codes represent diagnoses or procedures and play a critical role in medical applications. However, an EHR is a long medical text that is difficult to represent, the ICD code label space is large, and the labels have an extremely unbalanced distribution. These factors pose challenges to automatic EHR coding. Previous studies have not explored the disease attributes (e.g., symptoms, tests, medications) of ICD codes and the disease relationships (e.g., causes, risk factors, comorbidities) between them. In addition, the important roles of medical ... and sentences associated with these attributes and relationships have been neglected. In this paper, we propose an end-to-end model called Knowledge Graph Enhanced neural network (KGENet) to address the above shortcomings. Specifically, we first construct a disease knowledge graph that focuses on the multi-view disease attributes of ICD codes and the disease relationships between these codes. We also use a long sequence encoder to get the EHR document representation. Most importantly, KGENet leverages multi-view disease attributes and structured disease relationships for knowledge enhancement through hybrid attention and graph propagation, respectively. Furthermore, the above processes can provide attribute-aware and relationship-augmented explainability for the model prediction results based on our disease knowledge graph. Experiments conducted on the MIMIC-III benchmark dataset show that KGENet outperforms state-of-the-art models in both model effectiveness and explainability.






  • Article type: Journal Article
    OBJECTIVE: The aim of this study was to disseminate insights from a nationwide pilot of the International Classification of Diseases-11th revision (ICD-11).
    METHODS: The strategies and methodologies employed to implement the ICD-11 morbidity coding in 59 hospitals in China are described. The key considerations for the ICD-11 implementation were summarized based on feedback obtained from the pilot hospitals. Coding accuracy and Krippendorff's alpha reliability were computed based on the coding results in the ICD-11 exam.
    RESULTS: Among the 59 pilot hospitals, 58 integrated ICD-11 Coding Software into their health information management systems and 56 implemented the ICD-11 in morbidity coding, resulting in 3 723 959 diagnoses for 873 425 patients being coded over a 2-month pilot coding phase. The key considerations in the transition to the ICD-11 in morbidity coding encompassed the enrichment of ICD-11 content, refinement of tools, provision of systematic and tailored training, improvement of clinical documentation, promotion of downstream data utilization, and the establishment of a national process and mechanism for implementation. The overall coding accuracy was 82.9% when considering the entire coding field (including postcoordination) and 92.2% when only one stem code was considered. Krippendorff's alpha was 0.792 (95% CI, 0.788-0.796) and 0.799 (95% CI, 0.795-0.803) with and without consideration of the code sequence, respectively.
    CONCLUSIONS: This nationwide pilot study has enhanced national technical readiness for the ICD-11 implementation in morbidity, elucidating key factors warranting careful consideration in future endeavors. The good accuracy and intercoder reliability of the ICD-11 coding achieved following a brief training program underscore the potential for the ICD-11 to reduce training costs and provide high-quality health data. Experiences and lessons learned from this study have contributed to WHO's work on the ICD-11 and can inform other countries when formulating their transition plan.
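
Krippendorff's alpha, the intercoder reliability statistic used in this study, can be sketched for nominal labels as follows. This is a minimal illustration of the standard coincidence-matrix formulation, not the study's own computation: the function name and the toy codings are hypothetical, the code strings are placeholder labels, and the study's handling of code sequence and postcoordination is not modelled.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of tuples; each tuple holds the labels that the coders
    assigned to one unit (None marks a missing coding).
    """
    coincidences = Counter()
    for values in units:
        present = [v for v in values if v is not None]
        m = len(present)
        if m < 2:
            continue  # units coded by fewer than two coders carry no pairs
        # Every ordered pair of codings from different coders counts 1/(m-1).
        for a, b in permutations(present, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("not enough pairable codings")

    observed = sum(w for (a, b), w in coincidences.items() if a != b) / n
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected


# Hypothetical exam items, each coded by two coders (placeholder labels).
codings = [
    ("BA00", "BA00"),
    ("5A11", "5A11"),
    ("CA40.0", "CA40.Z"),  # the single disagreement
    ("BA00", "BA00"),
]
print(round(krippendorff_alpha_nominal(codings), 3))
```

Alpha equals 1 for perfect agreement and is about 0 when agreement is no better than chance, which is why it is reported alongside raw coding accuracy.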






  • Article type: Journal Article
    BACKGROUND: Sepsis surveillance using electronic health record (EHR)-based data may provide more accurate epidemiologic estimates than administrative data, but experience with this approach to estimate population-level sepsis burden is lacking.
    METHODS: This was a retrospective cohort study including all adults admitted to publicly funded hospitals in Hong Kong between 2009 and 2018. Sepsis was defined as clinical evidence of presumed infection (clinical cultures and treatment with antibiotics) and concurrent acute organ dysfunction (≥2-point increase in baseline SOFA score). Trends in incidence, mortality, and case fatality risk (CFR) were modelled by exponential regression. Performance of the EHR-based definition was compared with 4 administrative definitions using 500 medical record reviews.
    RESULTS: Among 13,550,168 hospital episodes during the study period, 485,057 (3.6%) had sepsis by EHR-based criteria with 21.5% CFR. In 2018, age- and sex-adjusted standardized sepsis incidence was 759 per 100,000 (relative +2.9%/year [95% CI 2.0, 3.8%] between 2009 and 2018) and standardized sepsis mortality was 156 per 100,000 (relative +1.9%/year [95% CI 0.9, 2.9%]). Despite decreasing CFR (relative -0.5%/year [95% CI -1.0, -0.1%]), sepsis accounted for an increasing proportion of all deaths (relative +3.9%/year [95% CI 2.9, 4.9%]). Medical record reviews demonstrated that the EHR-based definition more accurately identified sepsis than administrative definitions (AUC 0.91 vs 0.52-0.55, p < 0.001).
    CONCLUSIONS: An objective EHR-based surveillance definition demonstrated an increase in population-level standardized sepsis incidence and mortality in Hong Kong between 2009 and 2018 and was much more accurate than administrative definitions. These findings demonstrate the feasibility and advantages of an EHR-based approach for widescale sepsis surveillance.
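
A relative annual change such as "+2.9%/year" can be obtained from an exponential trend by fitting a straight line to the logarithm of the yearly rate. The sketch below assumes an ordinary least-squares fit on log rates; the abstract does not specify the exact model, and the yearly series here is synthetic (generated to grow at 2.9% per year from a hypothetical starting value), not the study's data.

```python
import math


def annual_relative_change(years, rates):
    """Fit log(rate) = a + b * year by least squares; return exp(b) - 1."""
    n = len(years)
    log_rates = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(log_rates) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, log_rates))
    slope = sxy / sxx
    return math.exp(slope) - 1.0


# Synthetic incidence per 100,000: hypothetical start of 600, +2.9% per year.
years = list(range(2009, 2019))
rates = [600.0 * 1.029 ** (year - 2009) for year in years]

change = annual_relative_change(years, rates)
print(f"{change:+.1%} per year")
```

Because the model is multiplicative, the slope translates into a constant percentage change per year rather than a constant absolute change.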






  • Article type: Journal Article
    The International Classification of Diseases (ICD) serves as the foundation for generating comparable global disease statistics across regions and over time. The process of ICD coding involves assigning codes to diseases based on clinical notes, so that a patient's condition can be described in a standard way. However, this process is complicated by the vast number of codes and the intricate taxonomy of ICD codes, which are hierarchically organized into various levels, including chapter, category, subcategory, and its subdivisions. Many existing studies focus solely on predicting subcategory codes, ignoring the hierarchical relationships among codes. To address this limitation, we propose a multitask learning model that trains multiple classifiers for different code levels, while also capturing the relations between coarser and finer-grained labels through a reinforcement mechanism. Our approach is evaluated on both English and Chinese benchmark datasets, and we demonstrate that our method achieves competitive performance with baseline models, particularly in terms of macro-F1 results. These findings suggest that our approach effectively leverages the hierarchical structure of ICD codes to improve disease code prediction accuracy. Analysis of the attention mechanism shows that the multigranularity attention of our model captures crucial features of the input text at different granularity levels, which can provide reasonable explanations for the prediction results.
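
The code hierarchy described in this abstract can be illustrated by deriving coarser labels from a single code, which is one way per-level targets for a multitask model might be prepared. This is a sketch under assumptions: it uses ICD-10 style codes, the chapter table is deliberately partial, and the function names and the mapping of levels to string prefixes are illustrative rather than the paper's preprocessing.

```python
from typing import Dict, Optional

# Partial ICD-10 chapter table (illustrative subset; other chapters omitted).
CHAPTER_RANGES = [
    ("A00", "B99", "I"),
    ("C00", "D48", "II"),
    ("D50", "D89", "III"),
    ("E00", "E90", "IV"),
    ("I00", "I99", "IX"),
    ("J00", "J99", "X"),
    ("K00", "K93", "XI"),
]


def chapter_of(category: str) -> Optional[str]:
    """Look up the chapter for a three-character category, if it is listed."""
    for start, end, chapter in CHAPTER_RANGES:
        if start <= category <= end:
            return chapter
    return None  # chapter not covered by the partial table


def code_levels(code: str) -> Dict[str, Optional[str]]:
    """Split a code into chapter, category, and subcategory labels."""
    normalized = code.strip().upper()
    compact = normalized.replace(".", "")
    category = compact[:3]
    subcategory = f"{category}.{compact[3]}" if len(compact) > 3 else None
    return {
        "chapter": chapter_of(category),
        "category": category,
        "subcategory": subcategory,
        "code": normalized,
    }


print(code_levels("I21.0"))
print(code_levels("j18"))
```

Each level then becomes the target of its own classifier, and predictions at a coarse level can constrain or inform predictions at a finer one.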






  • Article type: Journal Article
    Electronic medical record (EMR) databases can facilitate epidemiology research in various diseases including bronchiectasis. Given the diagnostic challenges of bronchiectasis, the validity of the coding in EMR requires clarification. We aimed to assess the validity of International Classification of Diseases, 9th Revision (ICD-9) code algorithms for identifying bronchiectasis in the territory-wide electronic medical health record system of the Clinical Data Analysis and Reporting System (CDARS) in Hong Kong.
    Adult patients who had the diagnosis of bronchiectasis input from Queen Mary Hospital in 2011-2020 were identified using the ICD-9 code of 494 by CDARS. All patients who had high resolution computed tomography (HRCT) were reviewed by respiratory specialists to confirm the presence of bronchiectasis on HRCT.
    A total of 19 617 patients had the diagnostic code of bronchiectasis among all public hospitals in Hong Kong, and 1866 in Queen Mary Hospital, in the same period. Six hundred and forty-eight cases were randomly selected and validated using medical record and HRCT review by a respiratory specialist. The overall positive predictive value (PPV) was 92.7% (95% CI 90.7-94.7).
    This was the first ICD-9 coding validation for bronchiectasis in Hong Kong CDARS. Our study demonstrated that the ICD-9 code 494 was reliable enough to support use of the CDARS database for further clinical research on bronchiectasis.






  • Article type: Journal Article
    The International Classification of Diseases (ICD), which is endorsed by the World Health Organization, is a diagnostic classification standard. ICD codes are used to store, retrieve, and analyze health information to make clinical decisions. Currently, ICD coding has been adopted by more than 137 countries. However, in Pakistan, very few hospitals have implemented ICD coding and conducted different epidemiological studies. Moreover, none of them have reported the spectrum of liver disease burden based on ICD coding, nor implemented automated ICD coding. In this study, we annotated ICD codes for the database of the liver transplant unit of the Pir Abdul Qadir Shah Jeelani Institute of Medical Sciences. We named this database Medical Information Mart for Liver Transplantation (MIMLT). The results revealed that the database contains 34 ICD codes, of which V70.8 is the most frequent code. Furthermore, we determined the spectrum of liver disease burden in liver recipients based on ICD coding. We found that chronic hepatitis C (070.54) is the most frequent indication for liver transplantation. Additionally, we implemented automated ICD coding utilizing the MIMLT database and proposed a novel Deep Recurrent Convolutional Neural Network with Transfer Learning through pre-trained Embeddings (DRCNNTLe) model, which is an extended version of our DRCNN-HP model. DRCNNTLe extracts robust text representations from its pre-trained embedding layer, which is trained on a large domain-specific MIMIC III database corpus. The results indicate that utilizing pre-trained word embeddings, which are trained on large domain-specific corpora, can significantly improve the performance of the DRCNNTLe model and provide state-of-the-art results when the target database is small.






  • Article type: Journal Article
    Little is known about family members' and patients' expression of negative emotions in high-risk preoperative conversations.
    This study aimed to identify the occurrence and patterns of the negative emotions of family members and patients in preoperative conversations, to investigate the conversation themes and to explore the correlation between the negative emotions and the conversation themes.
    A retrospective study was conducted using the Chinese version of Verona Coding Definitions of Emotional Sequences (VR-CoDES-C) to code 297 conversations on high-risk procedures. Inductive content analysis was used to analyse the topics in which negative emotions were nested. The χ² test was used to test the association between the cues and the conversation themes.
    The occurrence rate of family members' and patients' negative emotions was very high (85.9%), much higher than in most conversations in other medical settings. The negative emotions were mainly expressed by cues (96.4%), and cue-b (67.4%) was the most frequent category. Cues and concerns were mostly elicited by family members and patients (71.6%). Negative emotions were observed among seven themes, of which 'Psychological stress relating to illness severity, family's care and financial burden' (30.3%) ranked the top. Cue-b, cue-c and cue-d had a significant correlation (p < .001) with certain themes.
    Family members and patients conveyed significantly more negative emotions in the high-risk preoperative conversations than in other medical communications. Certain categories of cues were induced by specific emotional conversation contents.
    Family members and patients contributed to data.
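
The χ² test of association between a cue category and a conversation theme can be sketched for the simplest 2×2 case. The counts below are hypothetical and not from the study, whose tables may have more rows and columns; the function name is illustrative, no continuity correction is applied, and the p-value formula is valid only for one degree of freedom.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 table."""
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("expected a 2x2 table")
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows = cue-b present / absent,
# columns = theme present / absent.
observed = [[30, 10], [20, 40]]
statistic, p_value = chi_square_2x2(observed)
print(f"chi2 = {statistic:.2f}, p = {p_value:.2g}")
```

A small p-value indicates that the cue category and the theme co-occur more (or less) often than independence would predict.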






  • Article type: Journal Article
    Large electronic medical record (EMR) databases can facilitate epidemiology research into uncommon diseases such as interstitial lung disease (ILD). Given the rarity and diagnostic difficulty of ILD, the validity of the coding in EMR requires clarification. We aimed to assess the validity of International Classification of Diseases, 9th Revision (ICD-9) code algorithms for identifying ILD in the territory-wide electronic medical health record system of the Clinical Data Analysis and Reporting System (CDARS) in Hong Kong.
    Patients who visited the Queen Mary Hospital in 2005-2018 with ILD were identified using the following ICD-9 codes: post-inflammatory pulmonary fibrosis (PPF; ICD-9: 515), idiopathic fibrosing alveolitis (IFA; ICD-9: 516.3), connective tissue disease-associated interstitial lung disease (CTD-ILD; ICD-9: 517.2, 517.8, 714.81), sarcoidosis (ICD-9: 135) and extrinsic allergic alveolitis (EAA; ICD-9: 495). Cases with a diagnostic code of PPF or IFA, for which relatively more cases were identified, were randomly sampled. All cases of CTD-ILD, sarcoidosis and EAA were included in the validation because of their relatively small case numbers.
    Two hundred and sixty-nine cases were validated using medical record review by a respiratory specialist. The overall positive predictive value (PPV) was 79% (95% CI, 74%-84%). In subgroup analysis, true positive case numbers of PPF, IFA, CTD-ILD, sarcoidosis and EAA were 74/100 (74%), 95/100 (95%), 11/15 (73%), 27/32 (84%) and 6/22 (27%), respectively.
    This was the first ICD-9 coding validation for ILD in Hong Kong CDARS. Our study demonstrated that using ICD-9 algorithms 515, 516.3, 517.2, 517.8, 714.81 and 135 enhanced identification of ILDs, with a PPV reliable enough to support use of the CDARS database for further clinical research on ILDs. The validity is particularly high with 516.3.
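
The overall PPV and its confidence interval can be reproduced from the subgroup counts given in this abstract. The sketch below assumes a normal-approximation (Wald) interval; the abstract does not state which interval method was used, and the function name is illustrative.

```python
import math


def ppv_with_ci(true_positives: int, validated: int, z: float = 1.96):
    """PPV with a normal-approximation (Wald) confidence interval."""
    ppv = true_positives / validated
    half_width = z * math.sqrt(ppv * (1.0 - ppv) / validated)
    return ppv, max(0.0, ppv - half_width), min(1.0, ppv + half_width)


# Subgroup counts as reported: (true positives, validated cases).
subgroups = {
    "PPF": (74, 100),
    "IFA": (95, 100),
    "CTD-ILD": (11, 15),
    "sarcoidosis": (27, 32),
    "EAA": (6, 22),
}

total_tp = sum(tp for tp, _ in subgroups.values())
total_n = sum(n for _, n in subgroups.values())
ppv, lower, upper = ppv_with_ci(total_tp, total_n)
print(f"{total_tp}/{total_n}: PPV {ppv:.0%} (95% CI {lower:.0%}-{upper:.0%})")

for name, (tp, n) in subgroups.items():
    print(f"{name}: {tp / n:.0%}")
```

The pooled counts (213 of 269) give 79% with an interval of 74% to 84%, matching the reported figures. For small subgroups such as EAA, a Wald interval is a rough approximation and an exact or Wilson interval would usually be preferred.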






  • Article type: Journal Article
    Computer-assisted clinical coding (CAC) based on automated coding algorithms has been expected to improve the quality and productivity of International Classification of Diseases, Tenth Revision (ICD-10) coding, whereas studies oriented to primary diagnosis auto-coding are limited in the Chinese context.
    This study aims at developing a machine learning (ML) model for automated primary diagnosis ICD-10 coding.
    A total of 71,709 admissions in Fuwai Hospital were included in this study, corresponding to 168 primary diagnosis ICD-10 codes. Based on clinical implications, two feature engineering methods were used to process discharge diagnosis and procedure texts into sequential features and sequential grouping features, respectively, from which two kinds of models were built and compared. One baseline model using one-hot encoding features was also considered. Light Gradient Boosting Machine (LightGBM) was adopted as the classifier, and grid search and cross-validation were used to select the optimal hyperparameters. SHapley Additive exPlanations (SHAP) values were applied to make the models interpretable.
    Our best prediction model was developed based on sequential grouping features. It showed good performance in the test phase, with accuracy and macro-averaged F1 (Macro-F1) of 95.2% and 88.3%, respectively. The comparison of the models demonstrated the effectiveness of the sequential information and the grouping strategy in boosting model performance (P-value < 0.01). Subgroup analysis of the best model on each individual code showed that 91.1% of the codes achieved an F1 over 70.0%.
    Our model has demonstrated its effectiveness for automated primary diagnosis coding in the Chinese context, and its results are interpretable. Hence, it has the potential to assist clinical coders in improving coding efficiency and quality in Chinese inpatient settings.
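
The evaluation measures reported in this abstract (accuracy, Macro-F1, and the share of codes whose per-code F1 exceeds a threshold) can be sketched in plain Python. The labels and predictions below are hypothetical, the function name is illustrative, and the label set is taken as the union of true and predicted codes, which may differ from the study's setup.

```python
from collections import Counter


def accuracy_and_macro_f1(y_true, y_pred):
    """Return accuracy, macro-averaged F1, and the per-code F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted code was wrong
            fn[truth] += 1  # true code was missed
    per_code_f1 = {}
    for label in labels:
        denominator = 2 * tp[label] + fp[label] + fn[label]
        per_code_f1[label] = 2 * tp[label] / denominator if denominator else 0.0
    accuracy = sum(tp.values()) / len(y_true)
    macro_f1 = sum(per_code_f1.values()) / len(labels)
    return accuracy, macro_f1, per_code_f1


# Hypothetical primary diagnosis codes for six admissions.
y_true = ["I25.1", "I25.1", "I21.0", "I10", "I10", "I10"]
y_pred = ["I25.1", "I21.0", "I21.0", "I10", "I10", "I25.1"]

accuracy, macro_f1, per_code_f1 = accuracy_and_macro_f1(y_true, y_pred)
share_above = sum(f1 > 0.70 for f1 in per_code_f1.values()) / len(per_code_f1)
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f} share_f1>0.70={share_above:.3f}")
```

Macro-F1 weights every code equally, so it is more sensitive than accuracy to poor performance on rare codes, which helps explain the gap between the two reported figures.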





