Shapley

  • Article type: Journal Article
    Aiming at early detection and accurate prediction of cardiovascular disease (CVD) to reduce mortality rates, this study focuses on the development of an intelligent predictive system to identify individuals at risk of CVD. The primary objective of the proposed system is to combine deep learning models with advanced data mining techniques to facilitate informed decision-making and precise CVD prediction. This approach involves several essential steps, including the preprocessing of acquired data, optimized feature selection, and disease classification, all aimed at enhancing the effectiveness of the system. The chosen optimal features are fed as input to the disease classification models and to several Machine Learning (ML) algorithms for improved performance in CVD classification. The experiment was simulated on the Python platform, and evaluation metrics such as accuracy, sensitivity, and F1_score were employed to assess the models' performance. The ML classifiers (Extra Trees (ET), Random Forest (RF), AdaBoost, and XG-Boost) achieved high accuracies of 94.35%, 97.87%, 96.44%, and 99.00%, respectively, on the test set, while the proposed CardioVitalNet (CVN) achieved 87.45% accuracy. These results offer valuable insights into the process of selecting models for medical data analysis, ultimately enhancing the ability to make more accurate diagnoses and predictions.
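
The evaluation metrics named in this abstract (accuracy, sensitivity, F1 score) can all be derived from the confusion-matrix counts. The following is a minimal sketch, assuming binary labels where 1 marks a CVD case; the function name and the toy labels are hypothetical and are not taken from the study.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity (recall) and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Sensitivity: share of true positives that the model actually finds.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1}


# Hypothetical labels for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Reporting sensitivity alongside accuracy matters in screening settings, because a model can reach high accuracy on imbalanced data while still missing many at-risk individuals.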

  • Article type: Journal Article
    BACKGROUND: Robust and accurate prediction of severity for patients with COVID-19 is crucial for patient triaging decisions. Many proposed models were prone to either high bias risk or low-to-moderate discrimination. Some also suffered from a lack of clinical interpretability and were developed based on early pandemic period data. Hence, there has been a compelling need for advancements in prediction models for better clinical applicability.
    OBJECTIVE: The primary objective of this study was to develop and validate a machine learning-based Robust and Interpretable Early Triaging Support (RIETS) system that predicts severity progression (involving any of the following events: intensive care unit admission, in-hospital death, mechanical ventilation required, or extracorporeal membrane oxygenation required) within 15 days upon hospitalization based on routinely available clinical and laboratory biomarkers.
    METHODS: We included data from 5945 hospitalized patients with COVID-19 from 19 hospitals in South Korea collected between January 2020 and August 2022. For model development and external validation, the whole data set was partitioned into 2 independent cohorts by stratified random cluster sampling according to hospital type (general and tertiary care) and geographical location (metropolitan and nonmetropolitan). Machine learning models were trained and internally validated through a cross-validation technique on the development cohort. They were externally validated using a bootstrapped sampling technique on the external validation cohort. The best-performing model was selected primarily based on the area under the receiver operating characteristic curve (AUROC), and its robustness was evaluated using bias risk assessment. For model interpretability, we used Shapley and patient clustering methods.
    RESULTS: Our final model, RIETS, was developed based on a deep neural network of 11 clinical and laboratory biomarkers that are readily available within the first day of hospitalization. The features predictive of severity included lactate dehydrogenase, age, absolute lymphocyte count, dyspnea, respiratory rate, diabetes mellitus, C-reactive protein, absolute neutrophil count, platelet count, white blood cell count, and saturation of peripheral oxygen. RIETS demonstrated excellent discrimination (AUROC=0.937; 95% CI 0.935-0.938) with high calibration (integrated calibration index=0.041), satisfied all the criteria of low bias risk in a risk assessment tool, and provided detailed interpretations of model parameters and patient clusters. In addition, RIETS showed potential for transportability across variant periods with its sustainable prediction on Omicron cases (AUROC=0.903, 95% CI 0.897-0.910).
    CONCLUSIONS: RIETS was developed and validated to assist early triaging by promptly predicting the severity of hospitalized patients with COVID-19. Its high performance with low bias risk ensures considerably reliable prediction. The use of a nationwide multicenter cohort in the model development and validation implies generalizability. The use of routinely collected features may enable wide adaptability. Interpretations of model parameters and patients can promote clinical applicability. Together, we anticipate that RIETS will facilitate the patient triaging workflow and efficient resource allocation when incorporated into routine clinical practice.
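
The external validation described above reports AUROC with a 95% CI obtained by bootstrapped sampling. The following sketch shows one common way to do this, a percentile bootstrap over resampled patients. The abstract does not specify the exact resampling scheme, so the percentile method, the function names, and the toy scores are all assumptions.

```python
import random


def auroc(y_true, scores):
    """AUROC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for AUROC."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, so AUROC is undefined; draw again
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical outcomes and predicted risks, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7, 0.6, 0.3]
point_estimate = auroc(y_true, scores)
ci_low, ci_high = bootstrap_ci(y_true, scores)
```

With only ten cases the interval is wide. The narrow intervals reported in the abstract reflect a much larger validation cohort.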

  • Article type: Journal Article
    BACKGROUND: Depression is a global concern, with a significant number of people affected worldwide, particularly in low- and middle-income countries. The rising prevalence of depression emphasizes the importance of early detection and understanding the origins of such conditions.
    OBJECTIVE: This paper proposes a framework for detecting depression using a hybrid visualization approach that combines local and global interpretation. This approach aims to assist in model adaptation, provide insights into patient characteristics, and evaluate prediction model suitability in a different environment.
    METHODS: This study utilizes the R programming language with the Caret, ggplot2, Plotly, and Dalex libraries for model training, visualization, and interpretation. Data from the NHANES repository were used for secondary data analysis. The NHANES repository is a comprehensive source for examining the health and nutrition of individuals in the United States, and covers demographic, dietary, medication use, lifestyle, reproductive, and mental health data. Penalized logistic regression models were built using NHANES 2015-2018 data, while NHANES 2019-March 2020 data were used to evaluate interpretation at the global and local levels.
    RESULTS: The prediction model that supports this framework achieved an average AUC score of 0.748 (95% CI: 0.743-0.752), with minimal variability in sensitivity and specificity.
    CONCLUSIONS: The built-in prediction model highlights chest pain, the ratio of family income to poverty, and smoking status as crucial features for predicting depressive states in both the original and local environments.
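
The study fits penalized logistic regression in R with Caret. For consistency with the other examples here, the sketch below is written in Python with NumPy instead, and it assumes an L2 (ridge) penalty fitted by plain gradient descent. The abstract does not state the penalty type, and the function name, hyperparameters, and synthetic data are hypothetical.

```python
import numpy as np


def fit_l2_logistic(X, y, lam=0.1, lr=0.1, n_iter=2000):
    """Fit logistic regression with an L2 penalty on the weights (intercept unpenalized)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        # Gradient of the mean log-loss plus the penalty term (lam / 2) * ||w||^2.
        w -= lr * (X.T @ (p - y) / len(y) + lam * w)
        b -= lr * np.mean(p - y)
    return w, b


# Synthetic data for illustration only (not NHANES data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(float)

w_weak, _ = fit_l2_logistic(X, y, lam=0.01)   # light penalty
w_strong, _ = fit_l2_logistic(X, y, lam=1.0)  # heavy penalty shrinks the coefficients
```

A stronger penalty pulls the coefficients toward zero, which stabilizes the model when many correlated survey variables are available.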

  • Article type: Journal Article
    The demand for food delivery services (FDSs) during the COVID-19 crisis has been fuelled by consumers who prefer to order meals online and have them delivered to their door rather than wait at a restaurant. Since many restaurants moved online and joined FDSs such as Uber Eats, Menulog, and Deliveroo, customer reviews on internet platforms have become a valuable source of information about a company's performance. FDS organisations strive to collect customer complaints and effectively utilise the information to identify improvements needed to enhance customer satisfaction. However, only a few customer opinions are addressed because of the large amount of customer feedback data and the lack of customer service consultants. Instead of relying on customer service experts to read each review, organisations can use artificial intelligence (AI) to find solutions on their own and save money. Based on the literature, deep learning (DL) methods have shown remarkable results in obtaining better accuracy when working with large datasets in other domains, but lack explainability. Rapid research on explainable AI (XAI) to explain predictions made by opaque models looks promising but remains to be explored in the FDS domain. This study conducted a sentiment analysis by comparing simple and hybrid DL techniques (LSTM, Bi-LSTM, Bi-GRU-LSTM-CNN) in the FDS domain and explained the predictions using SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME). The DL models were trained and tested on the customer review dataset extracted from the ProductReview website. Results showed that the LSTM, Bi-LSTM and Bi-GRU-LSTM-CNN models achieved an accuracy of 96.07%, 95.85% and 96.33%, respectively. The model should exhibit fewer false negatives because FDS organisations aim to identify and address each and every customer complaint. The LSTM model was chosen over the other two DL models, Bi-LSTM and Bi-GRU-LSTM-CNN, due to its lower rate of false negatives. XAI techniques, such as SHAP and LIME, revealed the feature contribution of the words used towards positive and negative sentiments, which were used to validate the model.
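
The selection rule in this abstract, which prefers the model with the lowest false-negative rate even when another model is slightly more accurate, can be sketched as follows. The example assumes that label 1 marks a complaint; the model names and predictions are hypothetical and are not the study's results.

```python
def false_negative_rate(y_true, y_pred, positive=1):
    """Share of truly positive cases that the model failed to flag."""
    positives = sum(1 for t in y_true if t == positive)
    missed = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return missed / positives if positives else 0.0


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical labels: 1 = complaint, 0 = no complaint.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = {
    "model_a": [1, 1, 1, 0, 0, 0, 1, 1],  # misses one complaint
    "model_b": [1, 1, 0, 0, 0, 0, 0, 0],  # misses two complaints
}

fn_rates = {name: false_negative_rate(y_true, pred) for name, pred in predictions.items()}
accuracies = {name: accuracy(y_true, pred) for name, pred in predictions.items()}

# Pick the model that misses the fewest complaints, not the most accurate one.
selected = min(fn_rates, key=fn_rates.get)
```

In this toy case `model_b` is more accurate overall, yet `model_a` is selected because it overlooks fewer complaints, mirroring the trade-off described in the abstract.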

  • Article type: Journal Article
    During the COVID-19 crisis, customers' preference for having food delivered to their doorstep instead of waiting in a restaurant has propelled the growth of food delivery services (FDSs). With all restaurants going online and bringing FDSs onboard, such as UberEATS, Menulog or Deliveroo, customer reviews on online platforms have become an important source of information about a company's performance. FDS organisations aim to gather complaints from customer feedback and effectively use the data to determine the areas for improvement to enhance customer satisfaction. This work aimed to review machine learning (ML) and deep learning (DL) models and explainable artificial intelligence (XAI) methods to predict customer sentiments in the FDS domain. A literature review revealed the wide usage of lexicon-based and ML techniques for predicting sentiments through customer reviews in FDS. However, limited studies applying DL techniques were found due to the lack of model interpretability and explainability of the decisions made. The key findings of this systematic review are as follows: 77% of the models are non-interpretable in nature, and organisations can argue for the explainability and trust in the system. DL models in other domains perform well in terms of accuracy but lack explainability, which can be achieved with XAI implementation. Future research should focus on implementing DL models for sentiment analysis in the FDS domain and incorporating XAI techniques to bring out the explainability of the models.

  • Article type: Journal Article
    In the field of landscape epidemiology, the contribution of machine learning (ML) to modeling of epidemiological risk scenarios presents itself as a good alternative. This study aims to break with the "black box" paradigm that underlies the application of automatic learning techniques by using SHAP to determine the contribution of each variable in ML models applied to geospatial health. The case study is the prevalence of hookworms, intestinal parasites, in Ethiopia, where they are widely distributed; the country bears the third-highest burden of hookworm in Sub-Saharan Africa. XGBoost, a very popular ML model, was used to fit and analyze the data. The Python SHAP library was used to understand the importance of the variables for predictions in the trained model. The contribution of these variables to a particular prediction was described using different types of plots. The results show that the ML models are superior to the classical statistical models, not only demonstrating similar results but also explaining, by using the SHAP package, the influence of and interactions between the variables in the generated models. This analysis provides information to help understand the epidemiological problem presented and provides a tool for similar studies.
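
SHAP attributes a prediction to individual variables using Shapley values from cooperative game theory. The sketch below computes exact Shapley values by enumerating every coalition of features for a tiny hand-written model. It is not the SHAP library's algorithm for XGBoost, which relies on efficient tree-specific computation and a background dataset. The toy model, the instance, and the all-zero baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a coalition value function over n_features players."""
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of feature i when it joins the coalition.
                phi[i] += weight * (value_fn(coalition | {i}) - value_fn(coalition))
    return phi


def model(x):
    # Hypothetical model with an interaction between features 0 and 2.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


instance = [1.0, 2.0, 4.0]   # the prediction to explain
baseline = [0.0, 0.0, 0.0]   # stand-in values for "absent" features (an assumption)


def value_fn(coalition):
    # Features in the coalition take the instance value, the rest the baseline value.
    mixed = [instance[j] if j in coalition else baseline[j] for j in range(len(instance))]
    return model(mixed)


phi = shapley_values(value_fn, len(instance))
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split equally between the two features involved. Exact enumeration grows exponentially with the number of features, which is why practical tools use approximations or model-specific shortcuts.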

  • Article type: Published Erratum
    [This corrects the article DOI: 10.3389/fpubh.2021.675766.].

  • Article type: Journal Article
    The Severe Acute Respiratory Syndrome Coronavirus 2 pandemic has pushed medical systems around the globe to the brink of collapse. In this paper, logistic regression and three other artificial intelligence models (XGBoost, Artificial Neural Network and Random Forest) are described and used to predict the mortality risk of individual patients. The database is based on census data for the designated area and co-morbidities obtained using data from the Ontario Health Data Platform. The dataset consisted of more than 280,000 COVID-19 cases in Ontario across a wide range of age groups: 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, and 90+. Logistic regression, XGBoost, Artificial Neural Network and Random Forest all demonstrated excellent discrimination (the area under the curve exceeded 0.948 for all models, with the best performance being 0.956 for an XGBoost model). Based on SHapley Additive exPlanations values, the importance of 24 variables is identified, and the findings indicate that the most important variables are, in order of importance, age, date of test, sex, and presence/absence of chronic dementia. The findings from this study allow the identification of out-patients who are likely to deteriorate into severe cases, allowing medical professionals to make decisions on timely treatments. Furthermore, the methodology and results may be extended to other public health regions.
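
Ranking variables by SHAP-based importance is commonly done by averaging the absolute SHAP value of each variable across all cases. The sketch below assumes that convention, which the abstract does not spell out. The three feature names echo variables mentioned above, but the SHAP values themselves are invented for illustration.

```python
def rank_by_mean_abs(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value across cases (largest first)."""
    n_cases = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_cases
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-case SHAP values (rows = cases, columns = features).
feature_names = ["age", "date_of_test", "sex"]
shap_matrix = [
    [0.50, -0.20, 0.10],
    [-0.70, 0.30, -0.05],
    [0.60, -0.10, 0.15],
]

ranking = rank_by_mean_abs(shap_matrix, feature_names)
ranked_names = [name for name, _ in ranking]
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling out, so a variable that pushes risk up for some patients and down for others still ranks as influential.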