Explainable Boosting Machine

  • Article type: Journal Article
    Acute myocardial infarction (AMI), a common disease that can have serious consequences, occurs when myocardial blood flow stops due to occlusion of a coronary artery. Early and accurate prediction of AMI is critical for rapid prognosis and improved patient outcomes. Metabolomics, the study of small molecules within biological systems, is an effective tool for discovering biomarkers associated with many diseases. This study aimed to construct a predictive model for AMI using metabolomics data and an explainable machine learning approach, the Explainable Boosting Machine (EBM). The EBM model was trained on a dataset of 102 prognostic metabolites gathered from 99 individuals (34 healthy controls and 65 AMI patients). After comprehensive data preprocessing, 21 metabolites were selected as candidate predictors of AMI. The EBM model showed satisfactory performance in predicting AMI across several classification performance metrics. The model's predictions were based on the combined effects of individual metabolites and their interactions. Results from two EBM models, one using only individual metabolite features and one also including their interaction effects, were discussed. The most important predictors included creatinine, nicotinamide, and isocitrate. These metabolites are involved in different biological activities, such as energy metabolism, DNA repair, and cellular signaling. The results demonstrate the potential of combining metabolomics with the EBM model to produce reliable and interpretable predictions for AMI. The discussed metabolite biomarkers may assist in early diagnosis, risk assessment, and personalized treatment of AMI patients. This study developed a pipeline incorporating extensive data preprocessing and the EBM model to identify potential metabolite biomarkers for predicting AMI. The EBM model, with its ability to incorporate interaction terms, demonstrated satisfactory classification performance and revealed significant metabolite interactions that could be valuable in assessing AMI risk. However, these results should be validated in studies with larger, well-defined samples.

  • Article type: Journal Article
    Precise prediction of COVID-19 prognosis remains a clinical challenge. Early identification of severe cases facilitates the triage and management of COVID-19 patients. This paper explores the prognosis of COVID-19 patients based on routine laboratory tests taken at hospital admission.
    A data set was gathered covering 1455 COVID-19 patients (727 male, 728 female), including routine laboratory tests conducted upon hospital admission, age, Intensive Care Unit (ICU) admission, and outcome. The data set was randomly split into a training set (75% of the data) and a test set (25% of the data). The explainable boosting machine (EBM) and extreme gradient boosting (XGBoost) were used to predict mortality and ICU admission of COVID-19 cases. Feature importance was also extracted using EBM and XGBoost.
    For mortality prediction, EBM and XGBoost achieved 86.38% and 88.56% accuracy on the test data set, respectively. For ICU admission, EBM and XGBoost achieved accuracies of 89.37% and 79.29% on the test data set, respectively. The models indicated that aspartate transaminase (AST), lymphocyte count, blood urea nitrogen (BUN), and age were the most significant predictors of COVID-19 mortality. The lymphocyte count, AST, and BUN level were the most significant predictors of ICU admission.
    The study indicated that both EBM and XGBoost could predict ICU admission and mortality of COVID-19 cases based on routine hematological and clinical chemistry evaluation at the time of admission. Based on the results, AST, lymphocyte count, and BUN levels could be used as early predictors of COVID-19 prognosis.
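The random 75%/25% split described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name, the seed, and the rounding rule for the test-set size are assumptions made for the example, not details reported in the abstract.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices) from a shuffled index list.

    The seed and the use of round() for the test-set size are
    illustrative assumptions; the abstract only states a random
    75%/25% split.
    """
    indices = list(range(n_samples))
    rng = random.Random(seed)  # local generator, so the split is reproducible
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# 1455 is the cohort size given in the abstract.
train_idx, test_idx = train_test_split_indices(1455)
print(len(train_idx), len(test_idx))
```

Splitting on indices rather than on the records themselves keeps the sketch independent of how the laboratory data are stored.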

  • Article type: Journal Article
    BACKGROUND: Predicting cognitive decline in patients with mild cognitive impairment (MCI) is crucial for identifying high-risk individuals and implementing effective management. To improve prediction of conversion from MCI to Alzheimer's disease (AD), it is necessary to consider various factors using explainable machine learning (XAI) models, which provide interpretability while maintaining predictive accuracy. This study used the Explainable Boosting Machine (EBM) model with multimodal features to predict the conversion of MCI to AD over different follow-up periods while providing interpretability.
    METHODS: This retrospective case-control study used data from the ADNI database, including records of 1042 MCI patients from 2006 to 2022. The exposures were MRI biomarkers, cognitive scores, demographics, and clinical features. The main outcome was AD conversion from aMCI during follow-up. The EBM model was used to predict aMCI conversion to AD based on three feature combinations, obtaining interpretability while ensuring accuracy. Interaction effects were also considered in the model. The three feature combinations were compared across follow-up periods using accuracy, sensitivity, specificity, and AUC-ROC. Global and local explanations were displayed with importance rankings and feature interpretability plots.
    RESULTS: The five-year prediction accuracy reached 85% (AUC = 0.92) using both cognitive scores and MRI markers. Beyond accuracy, feature importance was obtained for the different follow-up periods. In the early stage of AD, MRI markers play a major role, while in the middle term cognitive scores are more important. Feature risk scoring plots demonstrated insightful nonlinear interactive associations between selected factors and outcome. For one-year prediction, lower right inferior temporal volume (<9000) was significantly associated with AD conversion. For two-year prediction, low left inferior temporal thickness (<2) was most critical. For three-year prediction, higher FAQ scores (>4) were the most important. For four-year prediction, APOE4 was the most critical. For five-year prediction, lower right entorhinal volume (<1000) was the most critical feature.
    CONCLUSIONS: The established glass-box EBM model with multimodal features demonstrated a superior ability, with detailed interpretability, in predicting AD conversion from MCI. Multiple features with significant importance were identified. Further study is needed to determine whether the established prediction tool would improve clinical management for AD patients.

  • Article type: English Abstract
    OBJECTIVE: To construct an inherently interpretable machine learning model, an explainable boosting machine (EBM), for predicting the one-year risk of death in patients with severe ischemic stroke.
    METHODS: We randomly divided the data of 2369 eligible patients with severe ischemic stroke in the MIMIC-IV (2.0) database, who were admitted to the ICU between 2008 and 2019, into a training dataset (80%) and a test dataset (20%), and assessed the prognosis of the patients using the EBM model. The prediction performance of the model was evaluated by calculating the area under the receiver operating characteristic curve (AUC). The calibration curve and Brier score were used to evaluate the degree of calibration of the model, and a decision curve was generated to assess the net clinical benefit.
    RESULTS: The EBM model constructed in this study had good discrimination, calibration, and net benefit, with an AUC of 0.857 (95% CI: 0.831-0.887) for predicting the prognosis of severe ischemic stroke. Calibration curve analysis showed that the calibration curve of the EBM model was the closest to the ideal curve. Decision curve analysis showed that the model had the greatest net benefit at prediction probability thresholds from 0.10 to 0.80. The top 5 independent predictive variables based on the EBM model were age, SOFA score, mean heart rate, mechanical ventilation, and mean respiratory rate, with significance scores ranging from 0.179 to 0.370.
    CONCLUSIONS: This EBM model performs well in predicting the risk of death within one year in patients with severe ischemic stroke and allows clinicians to better understand the factors contributing to patient outcomes through the model's interpretability.
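The two evaluation quantities named in this abstract, AUC for discrimination and the Brier score for calibration, can be illustrated with a small standard-library sketch. The labels and probabilities below are toy values invented for the example; they are not data from the study, and the function names are hypothetical.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy example (hypothetical values, 1 = died within one year).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
print(auc, brier)
```

A higher AUC indicates better ranking of cases, while a lower Brier score indicates predicted probabilities closer to the observed outcomes. The pairwise AUC loop is quadratic in the number of cases, which is fine for illustration but not for large cohorts.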

  • Article type: Journal Article
    OBJECTIVE: Prediction of survival in patients diagnosed with a brain tumour is challenging because of heterogeneous tumour behaviours and treatment responses. Advances in machine learning have led to the development of clinical prognostic models, but because these models lack interpretability, integration into clinical practice is almost non-existent. In this retrospective study, we compare five classification models with varying degrees of interpretability for predicting survival of more than one year following a brain tumour diagnosis.
    METHODS: 1028 patients aged ≥16 years with a brain tumour diagnosis between April 2012 and April 2020 were included in our study. Three intrinsically interpretable 'glass box' classifiers (Bayesian Rule Lists [BRL], Explainable Boosting Machine [EBM], and Logistic Regression [LR]) and two 'black box' classifiers (Random Forest [RF] and Support Vector Machine [SVM]) were trained on electronic patient records to predict one-year survival. All models were evaluated using balanced accuracy (BAC), F1-score, sensitivity, specificity, and receiver operating characteristics. Black box model interpretability and misclassified predictions were quantified using SHapley Additive exPlanations (SHAP) values, and model feature importance was evaluated by clinical experts.
    RESULTS: The RF model achieved the highest BAC of 78.9%, closely followed by SVM (77.7%), LR (77.5%), and EBM (77.1%). Across all models, age, diagnosis (tumour type), functional features, and first treatment were the top contributors to the prediction of one-year survival. We used EBM and SHAP to explain model misclassifications and investigated the role of feature interactions in prognosis.
    CONCLUSIONS: Interpretable models are a natural choice for the domain of predictive medicine. Intrinsically interpretable models, such as EBMs, may provide an advantage over traditional clinical assessment of brain tumour prognosis by weighting potential risk factors and their interactions that may be unknown to clinicians. Agreement between model predictions and clinical knowledge is essential for establishing trust in the model's decision-making process, as well as trust that the model will make accurate predictions when applied to new data.
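The threshold-based metrics listed in the methods (balanced accuracy, F1-score, sensitivity, specificity) all derive from the confusion matrix. The sketch below shows those standard definitions on toy labels; the data and the function name are hypothetical and unrelated to the study cohort.

```python
def binary_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, balanced accuracy, and F1
    from binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Toy example (hypothetical values, 1 = survived more than one year).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Balanced accuracy averages sensitivity and specificity, so it is less flattering than plain accuracy when the two classes are unevenly represented. The sketch assumes both classes and at least one positive prediction are present; otherwise the divisions would fail.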

  • Article type: Journal Article
    Scans without evidence of dopaminergic deficit (SWEDD) refers to patients whose motor and non-motor symptoms mimic those of Parkinson's disease (PD) but who show an intact dopaminergic system. For this reason, the differential diagnosis between SWEDD and PD patients is often not possible in the absence of dopamine imaging. Machine learning (ML) has shown optimal performance in automatically distinguishing these two conditions from clinical and imaging data. However, the most commonly applied ML algorithms provide high accuracy at the expense of the intelligibility of findings. In this work, a novel ML glass-box model, the Explainable Boosting Machine (EBM), based on Generalized Additive Models plus interactions (GA2Ms), was employed to obtain interpretability in classifying PD and SWEDD while still providing optimal performance. The dataset (168 healthy controls, HC; 396 PD; 58 SWEDD) was obtained from the PPMI database and comprised 178 clinical and imaging features. Six binary EBM classifiers were trained on feature spaces with (SBR) and without (noSBR) the dopaminergic striatal specific binding ratio: HC-PD_SBR, HC-SWEDD_SBR, PD-SWEDD_SBR and HC-PD_noSBR, HC-SWEDD_noSBR, PD-SWEDD_noSBR. An excellent AUC-ROC (1) was reached in classifying HC from PD and SWEDD, both with and without SBR, and by PD-SWEDD_SBR (0.986), while PD-SWEDD_noSBR showed a lower AUC-ROC (0.882). Apart from optimal accuracies, the EBM algorithm was able to provide global and local explanations, revealing that pairwise interactions between UPSIT Booklet #1 and Epworth Sleepiness Scale item 3 (ESS3), and between MDS-UPDRS-III pronation-supination movements right hand (NP3PRSPR) and MDS-UPDRS-III rigidity left upper limb (NP3RIGLU), could provide good performance in predicting PD and SWEDD even without imaging features.
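The GA2M family named in this abstract is usually written in the general form below. It is given here as background, not as the specific model fitted in the study; the symbols (link function g, intercept β₀, shape functions f_i, pairwise terms f_ij, and the set S of selected feature pairs) are notation chosen for this sketch.

```latex
g\bigl(\mathbb{E}[y]\bigr) \;=\; \beta_0 \;+\; \sum_{i} f_i(x_i) \;+\; \sum_{(i,j) \in S} f_{ij}(x_i, x_j)
```

Each f_i depends on a single feature and each f_ij on one pair of features, so every term can be plotted and inspected on its own. This is what makes global and local explanations, including the pairwise interactions reported above, directly readable from the model.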