Explainable machine learning

  • Article type: Journal Article
    Pulmonary hypertension (PH) in newborns and infants is a complex condition associated with several pulmonary, cardiac, and systemic diseases that contribute to morbidity and mortality. Accurate and early detection of PH and classification of its severity are therefore crucial for appropriate and successful management. Echocardiography is the primary diagnostic tool in pediatrics, but human assessment of echocardiograms is both time-consuming and expertise-demanding, raising the need for an automated approach. Little effort has been directed towards automatic assessment of PH using echocardiography, and the few proposed methods focus only on binary PH classification in the adult population. In this work, we present an explainable multi-view video-based deep learning approach to predict and classify the severity of PH for a cohort of 270 newborns using echocardiograms. We use spatio-temporal convolutional architectures to predict PH from each view, and aggregate the predictions of the different views using majority voting. Our results show a mean F1-score of 0.84 for severity prediction and 0.92 for binary detection using 10-fold cross-validation, and 0.63 for severity prediction and 0.78 for binary detection on the held-out test set. We complement our predictions with saliency maps and show that the learned model focuses on clinically relevant cardiac structures, motivating its use in clinical practice. To the best of our knowledge, this is the first work on automated assessment of PH in newborns using echocardiograms.
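
    The view-aggregation step can be illustrated with a short sketch. The view names, class labels, and tie-breaking rule below are assumptions made for illustration; the abstract does not say how ties between views are resolved.

```python
from collections import Counter

def majority_vote(view_predictions, tie_break_order=None):
    """Aggregate per-view class predictions into one label by majority vote.

    view_predictions: dict mapping a view name to its predicted label.
    tie_break_order: optional list of labels; on a tie, the first label in
        this list that is among the tied labels wins (an assumption, not
        something stated in the abstract).
    """
    if not view_predictions:
        raise ValueError("need at least one view prediction")
    counts = Counter(view_predictions.values())
    top = max(counts.values())
    tied = [label for label, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    if tie_break_order is not None:
        for label in tie_break_order:
            if label in tied:
                return label
    # Fall back to a deterministic choice among the tied labels.
    return sorted(tied)[0]

# Hypothetical view names and severity labels.
preds = {"view_a": "mild", "view_b": "severe", "view_c": "mild"}
print(majority_vote(preds))
```

    With an even number of views, the tie-breaking rule matters; a clinically cautious choice might prefer the more severe label, which is what `tie_break_order` allows.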

  • Article type: Journal Article
    OBJECTIVE: Cardiovascular disease (CD) is a major global health concern, affecting millions with symptoms such as fatigue and chest discomfort. Timely identification is crucial because of its significant contribution to global mortality. In healthcare, artificial intelligence (AI) holds promise for advancing disease risk assessment and treatment outcome prediction. However, the evolution of machine learning (ML) raises concerns about data privacy and bias, especially in sensitive healthcare applications. The objective is to develop and implement a responsible AI model for CD prediction that prioritizes patient privacy and security while ensuring transparency, explainability, fairness, and ethical adherence in healthcare applications.
    METHODS: To predict CD while prioritizing patient privacy, our study employed data anonymization, which involved adding Laplace noise to sensitive features such as age and gender. The anonymized dataset was analyzed using a differential privacy (DP) framework to preserve data privacy. DP ensured confidentiality while extracting insights. The methodology compared Logistic Regression (LR), Gaussian Naïve Bayes (GNB), and Random Forest (RF) classifiers and integrated feature selection, statistical analysis, and SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) for interpretability. This approach facilitates transparent and interpretable AI decision-making, aligning with responsible AI development principles. Overall, it combines privacy preservation, interpretability, and ethical considerations for accurate CD predictions.
    RESULTS: Our results from the DP framework with LR were promising, with an area under the curve (AUC) of 0.848 ± 0.03, an accuracy of 0.797 ± 0.02, a precision of 0.789 ± 0.02, a recall of 0.797 ± 0.02, and an F1 score of 0.787 ± 0.02, comparable to the performance of the non-privacy framework. The SHAP- and LIME-based results support clinical findings, show a commitment to transparent and interpretable AI decision-making, and align with the principles of responsible AI development.
    CONCLUSIONS: Our study endorses a novel approach to predicting CD that combines data anonymization, privacy-preserving methods, the interpretability tools SHAP and LIME, and ethical considerations. This responsible AI framework ensures accurate predictions, privacy preservation, and user trust, underscoring the significance of comprehensive and transparent ML models in healthcare. This research therefore strengthens the ability to predict CD, providing a vital lifeline to millions of CD patients globally and potentially preventing numerous fatalities.
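
    The anonymization step described in METHODS adds Laplace noise to sensitive features. A minimal sketch of that idea follows, using the standard Laplace mechanism in which the noise scale is sensitivity divided by epsilon. The sensitivity, epsilon, and example ages are assumptions chosen for illustration; the abstract does not report the values the study used.

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5             # uniform on [-0.5, 0.5)
    u = max(u, -0.5 + 1e-12)           # avoid log(0) at the boundary
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def add_laplace_noise(values, sensitivity, epsilon, seed=0):
    """Perturb each value with Laplace noise of scale sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    rng = random.Random(seed)
    return [v + laplace_noise(scale, rng) for v in values]

# Hypothetical ages; sensitivity and epsilon are illustrative choices.
ages = [34.0, 51.0, 67.0, 45.0]
noisy_ages = add_laplace_noise(ages, sensitivity=1.0, epsilon=0.5, seed=42)
print(noisy_ages)
```

    A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of noisier features, which is the trade-off behind comparing the DP and non-privacy frameworks.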

  • Article type: Journal Article
    The escalating global occurrence of algal blooms poses a growing threat to ecosystem services. In this study, the spatiotemporal heterogeneity of water quality parameters was used to partition Lake Dianchi into three clusters. Considering water quality parameters and both the delayed and instantaneous effects of meteorological factors, ensemble learning and quasi-Monte Carlo methods were employed to predict daily algal cell density (AD) between January 2021 and January 2024. The Stacking-Elastic-Net regularization model consistently exhibited superior predictive accuracy across all three clusters. Furthermore, the minimum combination of drivers that achieved near-optimal accuracy for each cluster was identified, striking a balance between accuracy and cost. The ranking of the effects of drivers on AD varied by cluster, while the delayed effect of meteorological factors on AD generally outweighed their instantaneous effect in all clusters. Additionally, the heterogeneous or homogeneous thresholds and responses between drivers and AD were explored. These findings could serve as a scientific and cost-effective basis for government agencies to develop regional sustainable strategies for managing water quality.
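
    One common way to let a model see both the instantaneous and the delayed effect of a meteorological driver is to add lagged copies of each series as extra features. The sketch below shows that idea only; the column names, lag lengths, and values are made up, and the abstract does not describe how the study encoded delays.

```python
def add_lagged_features(rows, columns, lags):
    """Return new rows holding same-day and lagged copies of the given columns.

    rows: list of dicts ordered by day (one dict per day).
    columns: names of the driver columns to lag.
    lags: list of positive integers (in days).
    Rows without a full lag history are dropped.
    """
    max_lag = max(lags)
    out = []
    for i in range(max_lag, len(rows)):
        new_row = dict(rows[i])  # same-day (instantaneous) values
        for col in columns:
            for lag in lags:
                new_row[f"{col}_lag{lag}"] = rows[i - lag][col]
        out.append(new_row)
    return out

# Made-up daily records with hypothetical column names.
daily = [
    {"day": 1, "air_temp": 14.0, "rain": 0.0},
    {"day": 2, "air_temp": 15.5, "rain": 2.1},
    {"day": 3, "air_temp": 16.0, "rain": 0.0},
    {"day": 4, "air_temp": 13.2, "rain": 5.4},
]
features = add_lagged_features(daily, ["air_temp", "rain"], lags=[1, 2])
for row in features:
    print(row)
```

    Comparing the importance of `air_temp` with that of `air_temp_lag1` or `air_temp_lag2` in a fitted model is one way to contrast instantaneous and delayed effects.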

  • Article type: Journal Article
    The global aging population presents a significant challenge, with older adults experiencing declining physical and cognitive abilities and increased vulnerability to chronic diseases and adverse health outcomes. This study aims to develop an interpretable deep learning (DL) model to predict adverse events in geriatric patients within 72 hours of hospitalization.
    The study used retrospective data (2017-2020) from a major medical center in Taiwan. It included non-trauma geriatric patients who visited the emergency department and were admitted to the general ward. Data preprocessing involved collecting prognostic factors such as vital signs, laboratory results, medical history, and clinical management. A deep feedforward neural network was developed, and its performance was evaluated using accuracy, sensitivity, specificity, positive predictive value (PPV), and area under the receiver operating characteristic curve (AUC). Model interpretation used the SHapley Additive exPlanations (SHAP) technique.
    The analysis included 127,268 patients, 2.6% of whom experienced imminent intensive care unit transfer, respiratory failure, or death during hospitalization. The DL model achieved AUCs of 0.86 and 0.84 in the validation and test sets, respectively, outperforming the Sequential Organ Failure Assessment (SOFA) score. Sensitivity and specificity values ranged from 0.79 to 0.81. The SHAP technique provided insights into feature importance and interactions.
    The developed DL model demonstrated high accuracy in predicting serious adverse events in geriatric patients within 72 hours of hospitalization. It outperformed the SOFA score and provided valuable insights into the model's decision-making process.
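
    The evaluation metrics named above can be computed from a confusion matrix, as the sketch below shows. It also includes the Bayes-rule relation between PPV and prevalence. The toy labels and the 0.80 sensitivity and specificity used in the example are illustrative assumptions, not the study's data or its reported PPV.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and PPV from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
    }

def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """PPV implied by sensitivity, specificity and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Toy labels for illustration only.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
print(binary_metrics(y_true, y_pred))

# Illustrative values: 0.80 sensitivity and specificity at 2.6% prevalence.
print(ppv_from_prevalence(0.80, 0.80, 0.026))
```

    The second call illustrates why PPV is worth reporting alongside sensitivity and specificity when the outcome is rare: at a prevalence of a few percent, most positive predictions can be false alarms even when both rates are around 0.8.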

  • Article type: Journal Article
    Sex differences affect the development and manifestation of Parkinson's disease (PD), yet current PD identification and treatment approaches underuse these distinctions. Sex-focused PD literature often prioritizes prevalence rates over feature importance analysis. However, underlying aspects can make a feature important for predicting PD regardless of its score. Interactions between features require consideration, as does the distinction between scoring disparities and actual feature importance. For instance, a higher score in males for a certain feature doesn't necessarily mean that the feature is less important for characterizing PD in females. This article proposes an explainable machine learning (ML) model to elucidate these underlying factors, emphasizing feature importance. This insight could be critical for personalized medicine, suggesting the need to tailor data collection and analysis for males and females. The model identifies sex-specific differences in PD, aiding in predicting outcomes as "Healthy" or "Pathological". It adopts a system-level approach, integrating heterogeneous data (clinical, imaging, genetic, and demographic) to study new biomarkers for diagnosis. The explainable ML approach helps non-ML experts understand model decisions, fostering trust and facilitating interpretation of complex ML outcomes, thus enhancing usability and translational research. The ML model identifies muscle rigidity, autonomic and cognitive assessments, and family history as key contributors to PD diagnosis, with sex differences noted. The genetic variant SNCA-rs356181 may be more significant in characterizing PD in males. Interaction analysis reveals more feature interplay among males than among females. These disparities offer insights into PD pathophysiology and could guide the development of sex-specific diagnostic and therapeutic approaches.

  • Article type: Journal Article
    BACKGROUND: This study aimed to develop a higher-performance nomogram based on explainable machine learning methods and to predict the risk of death of stroke patients within 30 days based on clinical characteristics on the first day of intensive care unit (ICU) admission.
    METHODS: Data on stroke patients were extracted from the Medical Information Mart for Intensive Care (MIMIC) IV and III databases. The LightGBM machine learning approach together with SHapley Additive exPlanations (together termed explainable machine learning, EML) was used to select clinical features and define cut-off points for the selected features. These selected features and cut-off points were then evaluated using the Cox proportional hazards regression model and Kaplan-Meier survival curves. Finally, two logistic regression-based nomograms for predicting 30-day mortality of stroke patients were constructed, one using the original variables and one using the variables dichotomized by cut-off points. The performance of the two nomograms was evaluated in the overall and individual dimensions.
    RESULTS: A total of 2982 stroke patients and 64 clinical features were included, and the 30-day mortality rate was 23.6% in the MIMIC-IV dataset. Ten variables ("sofa (sepsis-related organ failure assessment)", "minimum glucose", "maximum sodium", "age", "mean spo2 (blood oxygen saturation)", "maximum temperature", "maximum heart rate", "minimum bun (blood urea nitrogen)", "minimum wbc (white blood cells)" and "charlson comorbidity index") and their respective cut-off points were defined by the EML. In the Cox proportional hazards regression model (Cox regression) and Kaplan-Meier survival curves, after stroke patients were grouped according to the cut-off point of each variable, patients in the high-risk subgroup had higher 30-day mortality than those in the low-risk subgroup. The evaluation found that the EML-based nomogram not only outperformed the conventional nomogram in NRI (net reclassification index), Brier score, and clinical net benefit in the overall dimension, but also improved significantly in the individual dimension, especially for patients with low "maximum temperature".
    CONCLUSIONS: The 10 selected clinical features from the first day of ICU admission require greater attention in stroke patients. The nomogram based on explainable machine learning will have greater clinical application value.
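
    Two pieces of the workflow above are simple enough to sketch directly: dichotomizing a variable at a cut-off point, and scoring predicted probabilities with the Brier score. The cut-off values and example numbers below are hypothetical; the abstract does not report the cut-offs the EML step produced.

```python
def dichotomize(value, cutoff, high_risk_if_above=True):
    """Map a continuous value to 1 (high-risk side of the cut-off) or 0.

    high_risk_if_above: set to False for variables where low values carry
        the risk; which side is high-risk is an input here, not derived.
    """
    if high_risk_if_above:
        return 1 if value >= cutoff else 0
    return 1 if value < cutoff else 0

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Hypothetical cut-offs, for illustration only.
print(dichotomize(80, cutoff=65))                              # e.g. age
print(dichotomize(36.0, cutoff=36.5, high_risk_if_above=False))  # e.g. temperature

# Toy outcomes (1 = died within 30 days) and predicted probabilities.
outcomes = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4]
print(brier_score(outcomes, probs))
```

    A lower Brier score is better, and a model that always predicts 0.5 scores 0.25, which gives a rough reference point when comparing two nomograms.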

  • Article type: Journal Article
    OBJECTIVE: We developed explainable machine learning models to predict the overall survival (OS) of retroperitoneal liposarcoma (RLPS) patients. This approach aims to enhance the explainability and transparency of our modeling results.
    METHODS: We collected clinicopathological information on RLPS patients from the Surveillance, Epidemiology, and End Results (SEER) database and allocated the patients to training and validation sets in a 7:3 ratio. We also obtained an external validation cohort from The First Affiliated Hospital of Naval Medical University (Shanghai, China). We performed LASSO regression and multivariate Cox proportional hazards analysis to identify relevant risk factors, which were then combined to develop six machine learning (ML) models: Cox proportional hazards model (Coxph), random survival forest (RSF), ranger, gradient boosting with component-wise linear models (GBM), decision trees, and boosting trees. The predictive performance of these ML models was evaluated using the concordance index (C-index), the integrated cumulative/dynamic area under the curve (AUC), and the integrated Brier score, as well as the Cox-Snell residual plot. We also used time-dependent variable importance, partial dependence survival plots, and aggregated survival SHapley Additive exPlanations (SurvSHAP) plots to provide a global explanation of the optimal model. Additionally, SurvSHAP(t) and survival local interpretable model-agnostic explanations (SurvLIME) plots were used to provide a local explanation of the optimal model.
    RESULTS: The final ML models consist of seven factors: the patient's age, gender, marital status, and surgical history, as well as the tumor's histopathological classification, histological grade, and SEER stage. Our prognostic models exhibit significant discriminative ability, with the ranger model performing best. In the training set, validation set, and external validation set, the AUCs for 1-, 3-, and 5-year OS are all above 0.83, and the integrated Brier scores are consistently below 0.15. The explainability analysis of the ranger model also indicates that histological grade, histopathological classification, and age are the most influential factors in predicting OS.
    CONCLUSIONS: The ranger ML prognostic model exhibits the best performance and can be used to predict the OS of RLPS patients, offering a valuable reference that helps clinicians make informed decisions in advance.
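
    The concordance index used to compare the survival models can be sketched in a few lines. This is a simplified version of Harrell's C-index that ignores pairs with tied survival times; the times, event flags, and risk scores are toy values, not data from the study.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i has the shorter time and an
    observed event (events[i] == 1). The pair is concordant when subject i
    also has the higher predicted risk; ties in risk count as 0.5.
    Pairs with equal times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: survival time in months, event flag (1 = death observed,
# 0 = censored), and a model's predicted risk score.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.1, 0.5, 0.7]
print(concordance_index(times, events, risk))
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.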

  • Article type: Journal Article
    Delirium is the most common neuropsychological complication among older adults admitted to the intensive care unit (ICU) and is often associated with a poor prognosis. This study aimed to construct and validate an interpretable machine learning (ML) model for early delirium prediction in older ICU patients.
    This was a retrospective observational cohort study, and patient data were extracted from the Medical Information Mart for Intensive Care-IV database. Feature variables associated with delirium, including predisposing factors, disease-related factors, and iatrogenic and environmental factors, were selected using least absolute shrinkage and selection operator regression, and prediction models were built using logistic regression, decision trees, support vector machines, extreme gradient boosting (XGBoost), k-nearest neighbors, and naive Bayes methods. Multiple metrics were used to evaluate model performance, including the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, recall, F1 score, calibration plot, and decision curve analysis. SHapley Additive exPlanations (SHAP) were used to improve the interpretability of the final model.
    A total of 9,748 adults aged 65 years or older were included in the analysis. Twenty-six features were selected to construct ML prediction models. Among the models compared, the XGBoost model demonstrated the best performance, including the highest AUC (0.836), accuracy (0.765), sensitivity (0.713), recall (0.713), and F1 score (0.725) in the training set. It also exhibited excellent discrimination with an AUC of 0.810, good calibration, and the highest net benefit in the validation cohort. The SHAP summary analysis showed that Glasgow Coma Scale, mechanical ventilation, and sedation were the top three risk features for outcome prediction. The SHAP dependency plot and SHAP force analysis interpreted the model at the factor level and individual level, respectively.
    ML is a reliable tool for predicting the risk of critical delirium in elderly patients. Combining XGBoost and SHAP provides clear explanations for personalized risk prediction and a more intuitive understanding of the effect of key features in the model. The establishment of such a model would facilitate early risk assessment and prompt intervention for delirium.
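
    SHAP attributes a prediction to features using Shapley values. For a handful of features these can be computed exactly by enumerating coalitions, which the sketch below does for a toy scoring function. The function, its coefficients, and the binary inputs are invented for illustration and are not the XGBoost model from the study; only the three feature names echo the abstract.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance relative to a baseline.

    Features outside a coalition are set to their baseline value. All
    coalitions are enumerated, so this is only practical for a few
    features; SHAP libraries use faster, model-specific algorithms.
    """
    n = len(instance)

    def value(coalition):
        x = [instance[k] if k in coalition else baseline[k] for k in range(n)]
        return predict(x)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Toy risk score with one interaction term (made-up coefficients).
def toy_model(x):
    gcs_low, ventilated, sedated = x
    return 0.1 + 0.3 * gcs_low + 0.2 * ventilated + 0.1 * ventilated * sedated

patient = [1, 1, 1]    # hypothetical patient with all three risk flags set
reference = [0, 0, 0]  # baseline with none of them set
phi = shapley_values(toy_model, patient, reference)
print(phi)
```

    The attributions sum to the difference between the prediction for the patient and the prediction for the baseline, and the interaction term is split evenly between the two features involved. This additivity is what makes per-patient force plots readable.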

  • Article type: Journal Article
    Machine learning models are revolutionizing our approaches to discovering and designing bioactive peptides. These models often lack protein structure awareness because they rely heavily on sequence data. They excel at identifying sequences with a particular biological property or activity, but they frequently fail to capture the intricate mechanism(s) of action. To address both problems at once, we studied the mechanisms of action and structural landscape of antimicrobial peptides as (i) membrane-disrupting peptides, (ii) membrane-penetrating peptides, and (iii) protein-binding peptides. By analyzing critical features such as dipeptides and physicochemical descriptors, we developed models with high accuracy (86-88%) in predicting these categories. However, our initial models (1.0 and 2.0) exhibited a bias towards α-helical and coiled structures, which influenced predictions. To address this structural bias, we implemented subset selection and data reduction strategies. The former gave three structure-specific models for peptides likely to fold into α-helices (models 1.1 and 2.1), coils (1.3 and 2.3), or mixed structures (1.4 and 2.4). The latter depleted over-represented structures, leading to the structure-agnostic predictors 1.5 and 2.5. Additionally, our research highlights the sensitivity of important features to different structure classes across models.
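
    Dipeptide features of the kind mentioned above are commonly encoded as the relative frequency of each of the 400 ordered pairs of standard amino acids in a sequence. The sketch below shows that encoding; the example sequence is arbitrary, and the abstract does not specify the exact feature definition the authors used.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def dipeptide_composition(sequence):
    """Return the 400-dimensional dipeptide frequency vector as a dict."""
    seq = sequence.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return {pair: 0.0 for pair in counts}
    return {pair: n / total for pair, n in counts.items()}

# Arbitrary example sequence, not a peptide from the study.
comp = dipeptide_composition("KKLLKKLLKK")
nonzero = {pair: round(freq, 3) for pair, freq in comp.items() if freq > 0}
print(nonzero)
```

    Because the vector depends only on adjacent residues, it carries no explicit structural information, which is consistent with the structural bias the abstract reports for sequence-only models.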

  • Article type: Journal Article
    A late post-traumatic seizure (LPTS), a consequence of traumatic brain injury (TBI), can potentially evolve into a lifelong condition known as post-traumatic epilepsy (PTE). Presently, the mechanism that triggers epileptogenesis in TBI patients remains elusive, inspiring the epilepsy community to devise ways to predict which TBI patients will develop PTE and to identify potential biomarkers. In response to this need, our study collected comprehensive, longitudinal multimodal data from 48 TBI patients across multiple participating institutions. A supervised binary classification task was created, contrasting data from patients with LPTS against data from patients without LPTS. To accommodate missing modalities in some subjects, we took a two-pronged approach. First, we extended a graphical model-based Bayesian estimator to directly classify subjects with incomplete modalities. Second, we explored conventional imputation techniques. The imputed multimodal information was then combined, following several fusion and dimensionality reduction techniques found in the literature, and subsequently fitted to a kernel- or tree-based classifier. For this fusion, we proposed two new algorithms: recursive elimination of correlated components (RECC), which filters information based on the correlation between the already selected features, and information decomposition and selective fusion (IDSF), which effectively recombines information from decomposed multimodal features. Our cross-validation findings showed that the proposed IDSF algorithm delivers superior performance based on the area under the curve (AUC) score. Ultimately, after rigorous statistical comparisons and interpretable machine learning examination using Shapley values of the most frequently selected features, we recommend the following two magnetic resonance imaging (MRI) abnormalities as potential biomarkers: the left anterior limb of the internal capsule in diffusion MRI (dMRI), and the right middle temporal gyrus in functional MRI (fMRI).
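
    The abstract describes RECC only as filtering based on correlation with already selected features. The sketch below is a generic greedy correlation filter in that spirit, not the authors' RECC algorithm: the ranking, the threshold, and the feature values are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # treat constant features as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def greedy_correlation_filter(features, ranked_names, threshold=0.9):
    """Keep a feature only if it is not strongly correlated with any
    feature that has already been kept.

    features: dict mapping feature name to its list of values.
    ranked_names: names in decreasing order of preference (hypothetical
        ranking, e.g. by a univariate score).
    """
    selected = []
    for name in ranked_names:
        if all(abs(pearson(features[name], features[kept])) < threshold
               for kept in selected):
            selected.append(name)
    return selected

# Made-up feature values; f2 is almost a scaled copy of f1.
data = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(greedy_correlation_filter(data, ["f1", "f2", "f3"]))
```

    With only 48 patients and many multimodal features, removing near-duplicate features before fitting a classifier is one way to limit overfitting, which is the general motivation for this family of filters.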