mortality risk prediction

  • Article type: Journal Article
    OBJECTIVE: Real-world data encompass population diversity, enabling insights into chronic disease mortality risk among the elderly. Deep learning excels on large datasets, offering promise for real-world data. However, current models focus on single diseases, neglecting comorbidities prevalent in patients. Moreover, mortality is infrequent compared to illness, causing extreme class imbalance that impedes reliable prediction. We aim to develop a deep learning framework that accurately forecasts mortality risk from real-world data by addressing comorbidities and class imbalance.
    METHODS: We integrated multi-task and cost-sensitive learning, developing an enhanced deep neural network architecture that extends multi-task learning to predict mortality risk across multiple chronic diseases. Each patient cohort with a chronic disease was assigned to a separate task, with shared lower-level parameters capturing inter-disease complexities through distinct top-level networks. Cost-sensitive functions were incorporated to ensure learning of positive class characteristics for each task and achieve accurate prediction of the risk of death from multiple chronic diseases.
    RESULTS: Our study covers 15 prevalent chronic diseases and was evaluated on real-world data from 482,145 patients (including 9,516 deaths) in Shenzhen, China. The proposed model was compared with six models: three machine learning models (logistic regression, XGBoost, and CatBoost) and three state-of-the-art deep learning models (1D-CNN, TabNet, and SAINT). The experimental results show that MTL-CSDNN, the proposed model, achieved better prediction results on the test set than the comparison algorithms (ACC = 0.99, REC = 0.99, PRAUC = 0.97, MCC = 0.98, G-means = 0.98).
    CONCLUSIONS: Our method provides valuable insights into leveraging real-world data for precise multi-disease mortality risk prediction, offering potential applications in optimizing chronic disease management, enhancing well-being, and reducing healthcare costs for the elderly population.
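    The cost-sensitive, multi-task idea described in the METHODS above can be illustrated with a small sketch. It is not the authors' MTL-CSDNN: the inverse-frequency weighting, the function names, and the plain averaging across tasks are all assumptions made for illustration, since the abstract does not specify the cost function.

```python
import math

def inverse_frequency_weight(labels):
    """Positive-class weight: negatives per positive (an assumed weighting scheme)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("need at least one positive example")
    return negatives / positives

def weighted_bce(labels, probs, pos_weight):
    """Binary cross-entropy where errors on positives (deaths) cost pos_weight times more."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def multitask_loss(tasks):
    """tasks maps a disease cohort name to (labels, predicted probabilities).

    Each cohort is one task with its own class weight; the losses are averaged.
    """
    losses = []
    for labels, probs in tasks.values():
        weight = inverse_frequency_weight(labels)
        losses.append(weighted_bce(labels, probs, weight))
    return sum(losses) / len(losses)
```

    With a weight above 1, a missed death contributes more to the loss than a false alarm, which is one common way to keep a rare positive class from being ignored during training.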

  • Article type: Journal Article
    The mortality rate from pulmonary tuberculosis (PTB) complicated by severe community-acquired pneumonia (SCAP) in the intensive care unit (ICU) remains high. We aimed to develop a rapid and simple model for the early assessment and stratification of prognosis in these patients.
    All adult patients with PTB complicated by SCAP admitted to the ICU of a tertiary hospital in Chengdu, Sichuan, China between 2019 and 2021 (development cohort) and in 2022 (validation cohort) were retrospectively included. Data on demographics, comorbidities, laboratory values, and interventions were collected. The outcome was 28-day mortality. Stepwise backward multivariate Cox analysis was used to develop a mortality risk prediction score model. Receiver operating characteristic (ROC) and calibration curves were used to evaluate the model's predictive efficiency. Decision curve analysis (DCA) was used to validate the model's clinical value and impact on decision making.
    Overall, 357 and 168 patients were included in the development and validation cohorts, respectively. The Pulmonary Tuberculosis Severity Index (PTSI) score included long-term use of glucocorticoids, body mass index (BMI) <18.5 kg/m², diabetes, blood urea nitrogen (BUN) ≥7.14 mmol/L, PO2/FiO2 <150 mmHg, and vasopressor use. The area under the ROC curve (AUC) values were 0.817 (95% CI: 0.772-0.863) and 0.814 for the development and validation cohorts, respectively. The PTSI score had a higher AUC than the APACHE II, SOFA, and CURB-65 scores. The calibration curves indicated good calibration in both cohorts. The DCA of the PTSI score indicated that the model has high clinical application value compared with the APACHE II and SOFA scores.
    This prognostic tool was designed to rapidly evaluate the 28-day mortality risk in individuals with PTB complicated by SCAP. It can stratify this patient group into relevant risk categories, guide targeted interventions, and enhance clinical decision making, thereby optimizing patient care and improving outcomes.

  • Article type: Journal Article
    Hemodialysis (HD) is the main treatment for end-stage renal disease, which carries high mortality and a heavy economic burden. Predicting the mortality risk in patients undergoing maintenance HD and identifying high-risk patients are critical to enable early intervention and improve quality of life. In this study, we proposed a two-stage protocol based on electronic health record (EHR) data to predict the mortality risk of maintenance HD patients. First, we developed a multilayer perceptron (MLP) model to predict mortality risk. Second, an Active Contrastive Learning (ACL) method was proposed to select sample pairs and optimize the representation space to improve the prediction performance of the MLP model. Our ACL method outperformed other methods, with an average F1-score of 0.820 and an average area under the receiver operating characteristic curve of 0.853. This work is generalizable to analyses of cross-sectional EHR data, and the two-stage approach can be applied to other diseases as well.
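    To make the idea of selecting sample pairs for contrastive learning concrete, the sketch below uses the classic margin-based pairwise contrastive loss and keeps the pairs with the highest current loss. This is an assumed stand-in, not the ACL method itself: the abstract does not describe its loss or its selection rule, and the function names and the `margin` parameter are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_pair_loss(a, b, same_label, margin=1.0):
    """Pull same-label pairs together; push different-label pairs at least `margin` apart."""
    d = euclidean(a, b)
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2

def select_hard_pairs(embeddings, labels, k, margin=1.0):
    """Return the k index pairs with the largest loss (the most informative pairs)."""
    scored = []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            loss = contrastive_pair_loss(
                embeddings[i], embeddings[j], labels[i] == labels[j], margin
            )
            scored.append((loss, i, j))
    scored.sort(reverse=True)
    return [(i, j) for _, i, j in scored[:k]]
```

    Scoring all pairs is quadratic in the number of samples, so a practical version would sample candidate pairs within a mini-batch rather than across the whole cohort.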

  • Article type: Journal Article
    BACKGROUND: Mortality risk prediction aims to predict whether a patient is at risk of death based on relevant diagnosis and treatment data. Accurately predicting patient mortality risk from electronic health records (EHRs) is currently a hot research topic in healthcare. Real medical datasets often contain many missing values, which can seriously degrade model predictions. However, when missing values are imputed, most existing methods do not take into account the fidelity or confidence of the imputed values. Misestimation of missing variables can lead to modeling difficulties and performance degradation, and the reliability of the model may be compromised in clinical environments.
    METHODS: We propose a model based on Missing Value Imputation and Reliability Assessment for mortality risk prediction (MVIRA). The model combines a variational autoencoder with recurrent neural networks to impute missing values and enhance the representation of EHR data, thus improving the performance of mortality risk prediction. In addition, we introduce the Monte Carlo Dropout method to calculate the uncertainty of the model's predictions and thereby assess the reliability of the model.
    RESULTS: We validated the model's performance on the public datasets MIMIC-III and MIMIC-IV. The proposed model showed improved performance compared with competitive models in terms of overall specialties.
    CONCLUSIONS: The proposed model can effectively improve the accuracy of mortality risk prediction, and can help medical institutions assess the condition of patients.
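    The Monte Carlo Dropout step mentioned in the METHODS above can be sketched as follows. This is only an illustration under assumptions: a single logistic unit stands in for the MVIRA network, dropout is applied to the inputs, and the names, weights, and default rates are hypothetical. The core technique is the same, though: dropout stays active at prediction time, and the spread of repeated stochastic predictions serves as the uncertainty estimate.

```python
import math
import random
import statistics

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def stochastic_forward(x, weights, bias, drop_rate, rng):
    """One forward pass with dropout left ON (inverted-dropout scaling)."""
    keep = 1.0 - drop_rate
    z = bias
    for xi, wi in zip(x, weights):
        if rng.random() < keep:  # keep this input with probability `keep`
            z += wi * xi / keep
    return sigmoid(z)

def mc_dropout_predict(x, weights, bias, drop_rate=0.2, n_samples=100, seed=0):
    """Return (mean risk, standard deviation) over n_samples stochastic passes."""
    rng = random.Random(seed)
    samples = [
        stochastic_forward(x, weights, bias, drop_rate, rng)
        for _ in range(n_samples)
    ]
    return statistics.fmean(samples), statistics.pstdev(samples)
```

    A large standard deviation flags a prediction the model is unsure about, which is the kind of signal a reliability assessment could surface to clinicians.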

  • Article type: Journal Article
    BACKGROUND: A high mortality rate has consistently been observed in patients with severe community-acquired pneumonia (SCAP) admitted to the intensive care unit (ICU); however, few predictive models for the prognosis of this group of patients have been reported. This study aimed to screen for risk factors and construct a useful nomogram to predict mortality in these patients.
    METHODS: As a development cohort, we used 455 patients with SCAP admitted to the ICU. Logistic regression analyses were used to identify independent risk factors for death. A mortality prediction model was built based on statistically significant risk factors. Furthermore, the model was visualized using a nomogram. As a validation cohort, we used 88 patients with SCAP admitted to the ICU of another hospital. The performance of the nomogram was evaluated by analysis of the area under the receiver operating characteristic (ROC) curve (AUC), calibration curve analysis, and decision curve analysis (DCA).
    RESULTS: Lymphocytes, PaO2/FiO2, shock, and APACHE II score were independent risk factors for in-hospital mortality in the development cohort. External validation results showed a C-index of 0.903 (95% CI 0.838-0.968). The AUC of the model for the development cohort was 0.85, which was better than the APACHE II score (0.795) and the SOFA score (0.69). The AUC for the validation cohort was 0.893, which was better than the APACHE II score (0.746) and the SOFA score (0.742). Calibration curves for both cohorts showed agreement between predicted and actual probabilities. The DCA curves for both cohorts indicated that the model had high clinical application value in comparison with the APACHE II and SOFA scoring systems.
    CONCLUSIONS: We developed a predictive model based on lymphocytes, PaO2/FiO2, shock, and APACHE II scores to predict in-hospital mortality in patients with SCAP admitted to the ICU. The model has the potential to help physicians assess the prognosis of this group of patients.
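    Several of the abstracts in this list compare models by AUC. As a reminder of what that number measures, here is a minimal sketch that computes it directly from its rank interpretation: the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function name is illustrative and the pairwise loop is meant for small examples only.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))
```

    An AUC of 0.5 corresponds to chance-level ranking, so values such as 0.85 versus 0.69 describe how much more often one score orders a death above a survivor than another score does.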

  • Article type: Journal Article
    We attempted to establish a model for predicting the mortality risk of sepsis patients during hospitalization.
    Data were collected from a clinical record mining database on patients with sepsis who were hospitalized at the Affiliated Dongyang Hospital of Wenzhou Medical University between January 2013 and August 2022. The included patients were divided into modeling and validation groups. In the modeling group, the independent risk factors for death during hospitalization were determined using univariate and multivariate regression analyses. After stepwise regression analysis (both directions), a nomogram was drawn. The discrimination ability of the model was evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, and the GiViTI calibration chart was used to assess model calibration. Decision curve analysis (DCA) was performed to evaluate the clinical effectiveness of the prediction model. In the validation group, the logistic regression model was compared with the models established by the SOFA scoring system, the random forest method, and the stacking method.
    A total of 1740 subjects were included in this study, 1218 in the modeling population and 522 in the validation population. The results revealed that serum cholinesterase, total bilirubin, respiratory failure, lactic acid, creatinine, and pro-brain natriuretic peptide were independent risk factors for death. The AUC values in the modeling group and validation group were 0.847 and 0.826, respectively. The P values of the calibration charts in the two populations were 0.838 and 0.771. The DCA curves were above the two extreme curves. Moreover, the AUC values of the models established by the SOFA scoring system, random forest method, and stacking method in the validation group were 0.777, 0.827, and 0.832, respectively.
    The nomogram model established by combining multiple risk factors could effectively predict the mortality risk of sepsis patients during hospitalization.

  • Article type: Journal Article
    Risk prediction models are fundamental to effectively triaging incoming COVID-19 patients. However, current triaging methods often have poor predictive performance, are based on variables that are expensive to measure, and often lead to hard-to-interpret decisions. We introduce two new classification methods that can predict COVID-19 mortality risk from the automatic analysis of routine clinical variables with high accuracy and interpretability. The SVM22-GASS and Clinical-GASS classifiers leverage machine learning methods and clinical expertise, respectively. Both were developed using a derivation cohort of 499 patients from the first wave of the pandemic and were validated with an independent validation cohort of 250 patients from the second pandemic phase. The Clinical-GASS classifier is a threshold-based classifier that leverages the General Assessment of SARS-CoV-2 Severity (GASS) score, a COVID-19-specific clinical score that recently showed its effectiveness in predicting COVID-19 mortality risk. The SVM22-GASS model is a binary classifier that non-linearly processes clinical data using a Support Vector Machine (SVM). In this study, we show that SVM22-GASS was able to predict the mortality risk of the validation cohort with an AUC of 0.87 and an accuracy of 0.88, better than most previously developed scores. Similarly, the Clinical-GASS classifier predicted the mortality risk of the validation cohort with an AUC of 0.77 and an accuracy of 0.78, on par with other established and emerging machine-learning-based methods. Our results demonstrate the feasibility of accurate COVID-19 mortality risk prediction using only routine clinical variables, readily collected in the early stages of hospital admission.
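    A threshold-based classifier on a clinical score, as described for Clinical-GASS, can be sketched in a few lines. The abstract does not state how the cut-off was chosen, so the rule below (maximizing Youden's J on the derivation cohort) is an assumption, and the function names and example scores are hypothetical rather than GASS values.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as high risk and return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

def fit_threshold(scores, labels):
    """Pick the cut-off maximizing Youden's J = sensitivity + specificity - 1.

    Assumes both outcome classes are present in the derivation data.
    """
    best_threshold, best_j = None, -1.0
    for candidate in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, candidate)
        j = sens + spec - 1.0
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold

def classify(score, threshold):
    """1 = predicted high mortality risk, 0 = predicted low risk."""
    return int(score >= threshold)
```

    The cut-off is fitted once on the derivation cohort and then applied unchanged to the validation cohort, which keeps the decision rule easy to interpret and audit.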

  • Article type: Journal Article
    BACKGROUND: Accurate precision approaches have so far not been developed for modeling mortality risk in intensive care unit (ICU) patients. Conventional mortality risk prediction methods can hardly extract the information in longitudinal electronic health records (EHRs) effectively, since they simply aggregate the heterogeneous variables in EHRs, ignoring the complex relationships and interactions between variables and the time dependence in longitudinal records. Recently, deep learning approaches have been widely used in modeling longitudinal EHR data. However, most existing deep learning-based risk prediction approaches only use the information of a single disease, neglecting the interactions between multiple diseases and different conditions.
    RESULTS: In this paper, we address this unmet need by leveraging disease and treatment information in EHRs to develop a mortality risk prediction model based on deep learning (DeepMPM). DeepMPM utilizes a two-level attention mechanism, i.e., visit-level and variable-level attention, to derive the representation of patient risk status from a patient's multiple longitudinal medical records. Benefiting from the use of EHRs of patients with multiple diseases and different conditions, DeepMPM can achieve state-of-the-art performance in mortality risk prediction.
    CONCLUSIONS: Experimental results on the MIMIC-III database demonstrate that, with the disease and treatment information, DeepMPM can achieve good performance in terms of area under the ROC curve (0.85). Moreover, DeepMPM can successfully model the complex interactions between diseases to achieve better representation learning of disease and treatment than other deep learning approaches, thereby improving the accuracy of mortality prediction. A case study also shows that DeepMPM offers the potential to provide users with insights into feature correlation in data as well as model behavior for each prediction.
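    The two-level attention described in the RESULTS above can be sketched as two rounds of softmax-weighted pooling: variable vectors are pooled into a visit vector, and visit vectors are pooled into a patient vector. This is a simplified illustration, not DeepMPM's architecture: the dot-product scoring, the fixed query vectors (`var_query`, `visit_query`), and the function names are assumptions, and a real model would learn the queries and usually add recurrent layers.

```python
import math

def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(vectors, query):
    """Return (attention weights, weighted sum of vectors)."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return weights, pooled

def patient_representation(visits, var_query, visit_query):
    """visits: list of visits, each a list of variable embedding vectors.

    Level 1 pools variables within each visit; level 2 pools visits per patient.
    """
    visit_vectors = []
    for variables in visits:
        _, visit_vector = attend(variables, var_query)
        visit_vectors.append(visit_vector)
    visit_weights, patient_vector = attend(visit_vectors, visit_query)
    return visit_weights, patient_vector
```

    The returned attention weights are what make such a model inspectable: they indicate which visits, and within each visit which variables, contributed most to a given prediction.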

  • Article type: Journal Article
    The widespread secondary use of electronic medical records (EMRs) promotes health care quality improvement. Representation learning that can automatically extract hidden information from EMR data has gained increasing attention.
    We aimed to propose a patient representation with more feature associations and task-specific feature importance to improve the outcome prediction performance for inpatients with acute myocardial infarction (AMI).
    Medical concepts, including patients' age, gender, disease diagnoses, laboratory tests, structured radiological features, procedures, and medications, were first embedded into real-value vectors using the improved skip-gram algorithm, where concepts in the context windows were selected by feature association strengths measured by association rule confidence. Then, each patient was represented as the sum of the feature embeddings weighted by the task-specific feature importance, which was applied to facilitate predictive model prediction from global and local perspectives. We finally applied the proposed patient representation to mortality risk prediction for 3010 and 1671 AMI inpatients from a public data set and a private data set, respectively, and compared it with several reference representation methods in terms of the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and F1-score.
    Compared with the reference methods, the proposed embedding-based representation showed consistently superior predictive performance on the 2 data sets, achieving mean AUROCs of 0.878 and 0.973, AUPRCs of 0.220 and 0.505, and F1-scores of 0.376 and 0.674 for the public and private data sets, respectively, while the greatest AUROCs, AUPRCs, and F1-scores among the reference methods were 0.847 and 0.939, 0.196 and 0.283, and 0.344 and 0.361 for the public and private data sets, respectively. Feature importance integrated in patient representation reflected features that were also critical in prediction tasks and clinical practice.
    The introduction of feature associations and feature importance facilitated an effective patient representation and contributed to prediction performance improvement and model interpretation.
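    Two pieces of the method above are easy to show in code: association rule confidence, used to decide which concepts count as context for each other, and the patient vector formed as an importance-weighted sum of concept embeddings. The sketch below is illustrative only; the concept names, embedding values, and importance weights are made up, and the skip-gram training itself is not shown.

```python
def association_confidence(records, antecedent, consequent):
    """confidence(A -> B) = count(records with A and B) / count(records with A)."""
    with_a = [r for r in records if antecedent in r]
    if not with_a:
        return 0.0
    with_both = [r for r in with_a if consequent in r]
    return len(with_both) / len(with_a)

def patient_vector(concepts, embeddings, importance):
    """Sum of concept embeddings, each weighted by its task-specific importance.

    Concepts without an embedding are skipped; missing importance counts as 0.
    """
    dim = len(next(iter(embeddings.values())))
    vector = [0.0] * dim
    for concept in concepts:
        if concept not in embeddings:
            continue
        weight = importance.get(concept, 0.0)
        for d in range(dim):
            vector[d] += weight * embeddings[concept][d]
    return vector
```

    Because the importance weights are shared across patients, they give a global view of which features matter, while the weighted terms for one patient show what drove that individual representation.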

  • Article type: Journal Article
    The increase in coronavirus disease 2019 (COVID-19) infection caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has placed pressure on healthcare services worldwide. Therefore, it is crucial to identify critical factors for the assessment of the severity of COVID-19 infection and the optimization of an individual treatment strategy. In this regard, the present study leverages a dataset of blood samples from 485 COVID-19 individuals in the region of Wuhan, China to identify essential blood biomarkers that predict the mortality of COVID-19 individuals. For this purpose, a hybrid of filter, statistical, and heuristic-based feature selection approaches was used to select the best subset of informative features. As a result, the three methods used, minimum redundancy maximum relevance (mRMR), a two-tailed unpaired t-test, and the whale optimization algorithm (WOA), selected the three most informative blood biomarkers: international normalized ratio (INR), platelet large cell ratio (P-LCR), and D-dimer. In addition, various machine learning (ML) algorithms (random forest (RF), support vector machine (SVM), extreme gradient boosting (EGB), naïve Bayes (NB), logistic regression (LR), and k-nearest neighbor (KNN)) were trained. The performance of the trained models was compared to determine the model that assists in predicting the mortality of COVID-19 individuals with higher accuracy, F1 score, and area under the curve (AUC) values. In this paper, the best-performing RF-based model built using the three most informative blood parameters predicts the mortality of COVID-19 individuals with an accuracy of 0.96 ± 0.062, an F1 score of 0.96 ± 0.099, and an AUC value of 0.98 ± 0.024 on the independent test data. Furthermore, the performance of our proposed RF-based model in terms of accuracy, F1 score, and AUC was significantly better than that of the known blood biomarker-based ML models built using the Pre_Surv_COVID_19 data. Therefore, the present study provides a novel hybrid approach to screen the most informative blood biomarkers to develop an RF-based model, which accurately and reliably predicts in-hospital mortality of confirmed COVID-19 individuals during surge periods. An application based on our proposed model was implemented and deployed on Heroku.