SHAP

  • Article type: Journal Article
    BACKGROUND: There are significant geographic inequities in COVID-19 case fatality rates (CFRs), and a comprehensive understanding of their country-level determinants from a global perspective is necessary. This study aims to quantify the country-specific risk of COVID-19 CFR and propose tailored response strategies, including vaccination strategies, in 156 countries.
    METHODS: Cross-temporal and cross-country variations in COVID-19 CFR were identified using extreme gradient boosting (XGBoost), including 35 factors from seven dimensions in 156 countries from 28 January 2020 to 31 January 2022. SHapley Additive exPlanations (SHAP) was used to further clarify the clustering of countries by the key factors driving CFR and the effect of concurrent risk factors for each country. Increases in vaccination rates were simulated to illustrate the reduction of CFR in different classes of countries.
    RESULTS: Overall COVID-19 CFRs varied across countries from 28 January 2020 to 31 January 2022, ranging from 68 to 6373 per 100,000 population. During the COVID-19 pandemic, the determinants of CFRs first changed from health conditions to universal health coverage, and then to a multifactorial mixed effect dominated by vaccination. In the Omicron period, countries were divided into five classes according to risk determinants. The low vaccination-driven class (70 countries) was mainly distributed in sub-Saharan Africa and Latin America and included the majority of low-income countries (95.7%), which have many concurrent risk factors. The aging-driven class (26 countries) was mainly distributed in high-income European countries. The high disease burden-driven class (32 countries) was mainly distributed in Asia and North America. The low GDP-driven class (14 countries) was scattered across continents. Simulating a 5% increase in vaccination rate resulted in CFR reductions of 31.2% and 15.0% for the low vaccination-driven class and the high disease burden-driven class, respectively, with greater CFR reductions for countries with high overall risk (SHAP value > 0.1), but only 3.1% for the aging-driven class.
    CONCLUSIONS: Evidence from this study suggests that geographic inequities in COVID-19 CFR are jointly determined by key and concurrent risks, and that lowering COVID-19 CFR requires not only increased vaccination coverage but also targeted intervention strategies based on country-specific risks.
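
The vaccination simulation described in this abstract can be pictured as a what-if comparison: predict CFR for a country, raise its vaccination rate, predict again, and report the relative drop. The sketch below is only an illustration under stated assumptions. `toy_cfr_model`, its coefficients, and the two example countries are hypothetical stand-ins and not the study's fitted XGBoost model or data, and the 5% increase is assumed to be an absolute increase of 0.05 in the vaccination rate, which the abstract does not specify.

```python
import math


def toy_cfr_model(vaccination_rate, share_over_65):
    """Hypothetical stand-in for a fitted CFR model (not the study's XGBoost)."""
    # Risk falls with vaccination and rises with the share of older people.
    return 0.05 * math.exp(-2.0 * vaccination_rate) + 0.04 * share_over_65


def relative_cfr_reduction(model, country, delta=0.05):
    """Percent drop in predicted CFR when vaccination rises by `delta`.

    `delta` is treated as an absolute increase in the rate (an assumption),
    and the shifted rate is capped at 1.0.
    """
    before = model(**country)
    new_rate = min(1.0, country["vaccination_rate"] + delta)
    shifted = dict(country, vaccination_rate=new_rate)
    after = model(**shifted)
    return 100.0 * (before - after) / before


# Made-up example countries, chosen only to contrast two risk profiles.
low_vaccination = {"vaccination_rate": 0.10, "share_over_65": 0.03}
aging = {"vaccination_rate": 0.75, "share_over_65": 0.22}

low_vax_drop = relative_cfr_reduction(toy_cfr_model, low_vaccination)
aging_drop = relative_cfr_reduction(toy_cfr_model, aging)
print(f"low-vaccination profile: {low_vax_drop:.1f}% lower predicted CFR")
print(f"aging profile: {aging_drop:.1f}% lower predicted CFR")
```

In this toy setup the same vaccination increase helps the low-vaccination profile more than the aging profile, because the aging profile's risk is dominated by a term that vaccination does not change. That mirrors the qualitative pattern reported in the abstract, not its numbers.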

  • Article type: Journal Article
    This study aims to explore the impacts of several key factors on taxi drivers' traffic violations and to support traffic management departments in making evidence-based decisions that reduce traffic fatalities and injuries.
    A total of 43,458 electronic enforcement records of taxi drivers' traffic violations in Nanchang City, Jiangxi Province, China, from July 1, 2020, to June 30, 2021, were used to explore the characteristics of traffic violations. A random forest algorithm was used to predict the severity of taxi drivers' traffic violations, and 11 factors affecting traffic violations, including time, road conditions, environment, and taxi companies, were analyzed using the SHapley Additive exPlanations (SHAP) framework.
    First, the ensemble method Balanced Bagging Classifier (BBC) was applied to balance the dataset. The imbalance ratio (IR) of the original imbalanced dataset was reduced from 6.61% to 2.60%. Next, a prediction model for the severity of taxi drivers' traffic violations was built with random forest; its accuracy, m_F1, m_G-mean, m_AUC, and m_AP were 0.877, 0.849, 0.599, 0.976, and 0.957, respectively. Compared with decision tree, XGBoost, AdaBoost, and neural network models, the random forest model had the best performance measures. Finally, the SHAP framework was used to improve the interpretability of the model and identify important factors affecting taxi drivers' traffic violations. Functional district, location of the violation, and road grade had a high impact on the probability of traffic violations; their mean SHAP values were 0.39, 0.36, and 0.26, respectively.
    The findings of this paper may help to reveal the relationship between the influencing factors and the severity of traffic violations, and provide a theoretical basis for reducing taxi drivers' traffic violations and improving road safety management.
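
Per-feature "mean SHAP values" such as those quoted in this abstract are commonly obtained by averaging the absolute SHAP value of each feature over all samples. Whether the paper used exactly this aggregation is an assumption. The sketch below shows that computation in plain Python; the function `mean_abs_shap`, the feature names, and the numbers in `shap_rows` are illustrative and are not the study's data.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across samples.

    `shap_rows` holds one list of per-feature SHAP values per sample.
    Absolute values are used so positive and negative effects do not cancel.
    """
    n_samples = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n_samples
        for j, name in enumerate(feature_names)
    }
    # Largest mean |SHAP| first, i.e. most influential feature first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up per-sample SHAP values for three hypothetical features.
features = ["functional_district", "violation_location", "road_grade"]
shap_rows = [
    [0.5, -0.25, 0.25],
    [-0.25, 0.5, -0.125],
    [0.75, -0.25, 0.375],
    [-0.5, 0.5, -0.25],
]

ranking = mean_abs_shap(shap_rows, features)
for name, score in ranking:
    print(f"{name}: {score}")
```

Averaging signed values instead would let opposite effects cancel, which is why a feature with a large but mixed-direction influence can still rank first here.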

  • Article type: Journal Article
    One of the major challenges that medical experts face during a pandemic is the time required to identify and validate the risk factors of the novel disease and to develop an effective treatment protocol. Traditionally, this process involves numerous clinical trials that may take up to several years, during which strict preventive measures must be in place to control the outbreak and reduce deaths. Advanced data analytics techniques, however, can be leveraged to guide and speed up this process. In this study, we combine evolutionary search algorithms, deep learning, and advanced model interpretation methods to develop a holistic exploratory-predictive-explanatory machine learning framework that can assist clinical decision-makers in reacting to the challenges of a pandemic in a timely manner. The proposed framework is showcased in a study of emergency department (ED) readmissions of COVID-19 patients using ED visits from a real-world electronic health records database. After an exploratory feature selection phase using a genetic algorithm, we develop and train a deep artificial neural network to predict early (i.e., 7-day) readmissions (AUC = 0.883). Lastly, a SHAP model is formulated to estimate additive Shapley values (i.e., importance scores) of the features and to interpret the magnitude and direction of their effects. The findings are mostly in line with those reported by lengthy and expensive clinical trial studies.
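
The "additive Shapley values" mentioned in this abstract have a defining property: for one instance, the per-feature values sum to the difference between the model's prediction and its prediction at a baseline, and the sign of each value gives the direction of that feature's effect. The sketch below computes exact Shapley values by enumerating all feature coalitions for a tiny model. It is a minimal illustration, not the paper's pipeline: `toy_model`, its three features, and the input values are hypothetical, and replacing absent features with a fixed baseline is one assumed way to represent a "missing" feature. Libraries such as `shap` use faster approximations for real models.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating all coalitions.

    Features outside a coalition take their baseline value (an assumption
    about how a "missing" feature is represented).
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


def toy_model(features):
    """Hypothetical risk score with an interaction term (not the paper's network)."""
    age, prior_visits, oxygen = features
    return 0.02 * age + 0.5 * prior_visits - 0.03 * oxygen + 0.01 * age * prior_visits


x = [70.0, 2.0, 98.0]          # instance to explain (made-up values)
baseline = [50.0, 0.0, 96.0]   # reference instance (made-up values)
phi = shapley_values(toy_model, x, baseline)

for name, contribution in zip(["age", "prior_visits", "oxygen"], phi):
    direction = "raises" if contribution > 0 else "lowers"
    print(f"{name}: {contribution:+.3f} ({direction} the score)")
```

Exact enumeration costs on the order of 2^n model evaluations per feature, so it is only practical for a handful of features; it is shown here because it makes the additivity property easy to check by hand.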