Shapley Additive Explanations

  • Article type: Journal Article
    With the prevalence of thyroid nodules increasing globally, this study investigates the potential correlation between Bluetooth headset use and the incidence of thyroid nodules, considering the cumulative effects of the non-ionizing radiation (NIR) emitted by these devices. We analyzed 600 valid questionnaires from the WenJuanXing platform using propensity score matching (PSM) and an XGBoost model, supplemented by SHAP analysis, to assess the risk of thyroid nodules. PSM was used to balance differences in baseline characteristics, thereby reducing bias. The XGBoost model was then used to predict risk, with model performance measured by the area under the receiver operating characteristic (ROC) curve (AUC). SHAP analysis quantified and explained the impact of each feature on the prediction outcomes, identifying key risk factors. PSM processing of the 600 questionnaires yielded a matched dataset of 96 cases for modeling. The XGBoost model reached an AUC of 0.95, demonstrating high accuracy in differentiating thyroid nodule risk. SHAP analysis identified age and daily Bluetooth headset usage duration as the two most important factors affecting thyroid nodule risk; specifically, longer daily usage was strongly linked to an increased risk of developing thyroid nodules. Our study highlights a significant relationship between prolonged Bluetooth headset use and increased thyroid nodule risk, emphasizing the importance of considering health impacts in the use of modern technology, especially for frequently used devices such as Bluetooth headsets. Through model predictions and variable importance analysis, our research provides a scientific basis for public health policy and personal health habits, suggesting that attention should be paid to the duration of daily Bluetooth headset use to reduce the potential risk of thyroid nodules. Future research should further investigate the biological mechanisms of this relationship and consider additional potential influencing factors to offer more comprehensive health guidance and preventive measures.
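
    The abstract above relies on SHAP to attribute a prediction to individual features. As a rough illustration of the underlying idea only, the sketch below computes exact Shapley values for a made-up two-feature risk score. The scoring function, the baseline and instance values, and the "replace missing features with a single baseline value" convention are all assumptions for this example; they are not the study's model, and this is not the `shap` library's implementation.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley formula.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_risk(age, hours):
    """Hypothetical risk score with an interaction term (illustration only)."""
    return 0.01 * age + 0.1 * hours + 0.002 * age * hours


baseline = (40.0, 1.0)   # assumed reference respondent (age, daily hours)
instance = (60.0, 3.0)   # assumed respondent being explained


def value_fn(coalition):
    # Features in the coalition take the instance value, the rest the baseline.
    x = [instance[j] if j in coalition else baseline[j] for j in range(2)]
    return toy_risk(*x)


phi = shapley_values(value_fn, n_features=2)
gap = toy_risk(*instance) - toy_risk(*baseline)
print("attributions (age, hours):", phi)
print("sum of attributions:", sum(phi), "prediction gap:", gap)
```

    The attributions sum to the difference between the explained prediction and the baseline prediction, which is the additivity property that gives the method its name. Exact enumeration scales exponentially with the number of features, so practical tools use approximations or model-specific algorithms.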

  • Article type: Journal Article
    The recent advancements in autonomous driving come with the associated cybersecurity issue of compromised autonomous vehicle (AV) networks, motivating the use of AI models for detecting anomalies on these networks. In this context, using explainable AI (XAI) to explain the behavior of these anomaly detection models is crucial. This work introduces a comprehensive framework to assess black-box XAI techniques for anomaly detection within AVs, facilitating the examination of both global and local XAI methods to elucidate the decisions made by XAI techniques that explain the behavior of AI models classifying anomalous AV behavior. By considering six evaluation metrics (descriptive accuracy, sparsity, stability, efficiency, robustness, and completeness), the framework evaluates two well-known black-box XAI techniques, SHAP and LIME. The evaluation first applies the XAI techniques to identify the primary features crucial for anomaly classification, then runs extensive experiments assessing SHAP and LIME across the six metrics on two prevalent autonomous driving datasets, VeReMi and Sensor. This study advances the deployment of black-box XAI methods for real-world anomaly detection in autonomous driving systems, contributing valuable insights into the strengths and limitations of current black-box XAI methods within this critical domain.

  • Article type: Journal Article
    Halide perovskite materials have broad prospects for applications in various fields such as solar cells, LED devices, photodetectors, fluorescence labeling, bioimaging, and photocatalysis due to their bandgap characteristics. This study compiled experimental data from the published literature and utilized the excellent predictive capabilities, low overfitting risk, and strong robustness of ensemble learning models to analyze the bandgaps of halide perovskite compounds. The results demonstrate the effectiveness of ensemble learning decision tree models, especially the gradient boosting decision tree model, with a root mean square error of 0.090 eV, a mean absolute error of 0.053 eV, and a determination coefficient of 93.11%. Research on data related to ratios calculated through element molar quantity normalization indicates significant influences of ions at the X and B positions on the bandgap. Additionally, doping with iodine atoms can effectively reduce the intrinsic bandgap, while hybridization of the s and p orbitals of tin atoms can also decrease the bandgap. The accuracy of the model is validated by predicting the bandgap of the photovoltaic material MASn1-xPbxI3. In conclusion, this study emphasizes the positive impact of machine learning on material development, especially in predicting the bandgaps of halide perovskite compounds, where ensemble learning methods demonstrate significant advantages.

  • Article type: Journal Article
    BACKGROUND: Health care-associated infections due to multidrug-resistant organisms (MDROs), such as methicillin-resistant Staphylococcus aureus (MRSA) and Clostridioides difficile (CDI), place a significant burden on our health care infrastructure.
    OBJECTIVE: Screening for MDROs is an important mechanism for preventing spread but is resource intensive. The objective of this study was to develop automated tools that can predict colonization or infection risk using electronic health record (EHR) data, provide useful information to aid infection control, and guide empiric antibiotic coverage.
    METHODS: We retrospectively developed a machine learning model to detect MRSA colonization and infection in undifferentiated patients at the time of sample collection from hospitalized patients at the University of Virginia Hospital. We used clinical and nonclinical features derived from on-admission and throughout-stay information from the patient's EHR data to build the model. In addition, we used a class of features derived from contact networks in EHR data; these network features can capture patients' contacts with providers and other patients, improving model interpretability and accuracy for predicting the outcome of surveillance tests for MRSA. Finally, we explored whether heterogeneous models for different patient subpopulations (for example, those admitted to an intensive care unit or emergency department, or those with specific testing histories) perform better.
    RESULTS: We found that penalized logistic regression performs better than other methods, and this model's performance, measured by its area under the receiver operating characteristic curve, improves by nearly 11% when we use a polynomial (second-degree) transformation of the features. Some significant features in predicting MDRO risk include antibiotic use, surgery, use of devices, dialysis, patient's comorbidity conditions, and network features. Among these, network features add the most value and improve the model's performance by at least 15%. The penalized logistic regression model with the same transformation of features also performs better than other models for specific patient subpopulations.
    CONCLUSIONS: Our study shows that MRSA risk prediction can be conducted quite effectively by machine learning methods using clinical and nonclinical features derived from EHR data. Network features are the most predictive and provide significant improvement over prior methods. Furthermore, heterogeneous prediction models for different patient subpopulations enhance the model's performance.
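
    The results above mention a polynomial (second-degree) transformation of the features before fitting penalized logistic regression. The sketch below shows one common form of such a transformation: keep the original features and append every pairwise product, including squares. Whether the authors included interaction terms, squares, or a bias column is not stated in the abstract, so this layout is an assumption, and the function name is hypothetical.

```python
from itertools import combinations_with_replacement


def degree2_features(x):
    """Return the original features followed by all products x_i * x_j, i <= j."""
    expanded = list(x)
    expanded.extend(a * b for a, b in combinations_with_replacement(x, 2))
    return expanded


# Two made-up feature values, purely for illustration.
row = [2.0, 3.0]
expanded_row = degree2_features(row)
print(expanded_row)  # original features, then x0*x0, x0*x1, x1*x1
```

    For n input features this produces n + n(n + 1)/2 columns, which is why such expansions are usually paired with a penalty term to limit overfitting.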

  • Article type: Journal Article
    OBJECTIVE: The purpose of this retrospective study was to establish a combined model based on ultrasound (US)-radiomics and clinical factors to predict patients with stage I cervical cancer (CC) before surgery.
    METHODS: A total of 209 CC patients from the First Affiliated Hospital of Anhui Medical University who had cervical lesions found by transvaginal sonography (TVS) were retrospectively reviewed. Patients were divided into a training set (n = 146) and an internal validation set (n = 63), and 52 CC patients from Anhui Provincial Maternity and Child Health Hospital and Nanchong Central Hospital were taken as the external validation set. The clinical independent predictors were selected by univariate and multivariate logistic regression analyses. US-radiomics features were extracted from US images. After selecting the most significant features by univariate analysis, Spearman's correlation analysis, and the least absolute shrinkage and selection operator (LASSO) algorithm, six machine learning (ML) algorithms were used to build the radiomics model. Next, the ability of the clinical, US-radiomics, and clinical US-radiomics combined models to diagnose stage I CC was compared. Finally, the Shapley additive explanations (SHAP) method was used to explain the contribution of each feature.
    RESULTS: Long diameter of the cervical lesion (L) and squamous cell carcinoma-associated antigen (SCCa) were independent clinical predictors of stage I CC. The eXtreme Gradient Boosting (XGBoost) model performed the best among the six ML radiomics models, with area under the curve (AUC) values in the training, internal validation, and external validation sets of 0.778, 0.751, and 0.751, respectively. Among the final three models, the combined model based on clinical features and rad-score showed good discriminative power, with AUC values in the training, internal validation, and external validation sets of 0.837, 0.828, and 0.839, respectively. The decision curve analysis validated the clinical utility of the combined nomogram. The SHAP algorithm illustrated the contribution of each feature in the combined model.
    CONCLUSIONS: We established an interpretable combined model to predict stage I CC. This non-invasive prediction method may be used for the preoperative identification of patients with stage I CC.
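
    Several abstracts in this listing report discrimination as an AUC. As a reminder of what that number measures, the sketch below computes the AUC directly from its probabilistic definition: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The labels and scores are invented for illustration and do not come from any of the studies.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Invented example: six cases with predicted risk scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

    The pairwise loop is quadratic in the number of cases; it is written this way for clarity, and rank-based formulas give the same value more efficiently on large validation sets.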

  • Article type: Journal Article
    BACKGROUND: Suicide is the second-leading cause of death among adolescents and is associated with clusters of suicides. Despite numerous studies on this preventable cause of death, the focus has primarily been on single nations and traditional statistical methods.
    OBJECTIVE: This study aims to develop a predictive model for adolescent suicidal thinking using multinational data sets and machine learning (ML).
    METHODS: We used data from the Korea Youth Risk Behavior Web-based Survey with 566,875 adolescents aged between 13 and 18 years and conducted external validation using the Youth Risk Behavior Survey with 103,874 adolescents and Norway's University National General Survey with 19,574 adolescents. Several tree-based ML models were developed, and feature importance and Shapley additive explanations values were analyzed to identify risk factors for adolescent suicidal thinking.
    RESULTS: When trained on the Korea Youth Risk Behavior Web-based Survey data from South Korea, the XGBoost model achieved an area under the receiver operating characteristic curve (AUROC) of 90.06% (95% CI 89.97-90.16), displaying superior performance compared to other models. For external validation using the Youth Risk Behavior Survey data from the United States and the University National General Survey from Norway, the XGBoost model achieved AUROCs of 83.09% and 81.27%, respectively. Across all data sets, XGBoost consistently achieved the highest AUROC and was selected as the optimal model. In terms of predictors of suicidal thinking, feelings of sadness and despair were the most influential, accounting for 57.4% of the impact, followed by stress status at 19.8%, age (5.7%), household income (4%), academic achievement (3.4%), sex (2.1%), and others, which contributed less than 2% each.
    CONCLUSIONS: This study used ML by integrating diverse data sets from 3 countries to address adolescent suicide. The findings highlight the important role of emotional health indicators in predicting suicidal thinking among adolescents. Specifically, sadness and despair were identified as the most significant predictors, followed by stressful conditions and age. These findings emphasize the critical need for early diagnosis and prevention of mental health issues during adolescence.
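
    The results above express each predictor's influence as a percentage of the total impact. The abstract does not say how those percentages were derived; one common convention, assumed here, is to take the mean absolute SHAP value of each feature across samples and normalize so the shares sum to 100%. The feature names and SHAP values below are invented for illustration and are unrelated to the study's data.

```python
def importance_shares(shap_matrix, feature_names):
    """Percent share of each feature's mean absolute SHAP value."""
    n_samples = len(shap_matrix)
    mean_abs = [
        sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    if total == 0:
        raise ValueError("all SHAP values are zero; shares are undefined")
    return {name: 100.0 * m / total for name, m in zip(feature_names, mean_abs)}


# Invented per-sample SHAP values: one row per sample, one column per feature.
feature_names = ["sadness", "stress", "age"]
shap_matrix = [
    [0.6, -0.2, 0.1],
    [-0.4, 0.3, -0.1],
    [0.5, -0.1, 0.0],
]
shares = importance_shares(shap_matrix, feature_names)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

    Absolute values are used so that positive and negative contributions do not cancel; the resulting shares describe how strongly a feature moves predictions, not the direction of its effect.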

  • Article type: Journal Article
    This study focuses on the prediction of concrete cover separation (CCS) in reinforced concrete beams strengthened in flexure with fiber-reinforced polymer (FRP). First, machine learning models were constructed based on linear regression, support vector regression, BP neural networks, decision trees, random forests, and XGBoost algorithms. Second, the most suitable model for predicting CCS was identified based on the evaluation metrics and compared with the codes and researchers' models. Finally, a parametric study based on SHapley Additive exPlanations (SHAP) was carried out, and the following conclusions were obtained. XGBoost is best suited for the prediction of CCS, whereas the accuracy of the codes and researchers' models needs to be improved, as they suffer from over- or conservative estimation. The contribution of the concrete to the shear force and the yield strength of the reinforcement are the most important parameters for CCS: the shear force at the onset of CCS is approximately proportional to the contribution of the concrete to the shear force and approximately inversely proportional to the yield strength of the reinforcement.

  • Article type: Journal Article
    Metabolomics generates complex data necessitating advanced computational methods for generating biological insight. While machine learning (ML) is promising, the challenges of selecting the best algorithms and tuning hyperparameters, particularly for nonexperts, remain. Automated machine learning (AutoML) can streamline this process; however, the issue of interpretability could persist. This research introduces a unified pipeline that combines AutoML with explainable AI (XAI) techniques to optimize metabolomics analysis. We tested our approach on two data sets: renal cell carcinoma (RCC) urine metabolomics and ovarian cancer (OC) serum metabolomics. AutoML, using Auto-sklearn, surpassed standalone ML algorithms like SVM and k-Nearest Neighbors in differentiating between RCC patients and healthy controls, as well as between OC patients and those with other gynecological cancers. The effectiveness of Auto-sklearn is highlighted by its AUC scores of 0.97 for RCC and 0.85 for OC, obtained from the unseen test sets. Importantly, on most of the metrics considered, Auto-sklearn demonstrated better classification performance, leveraging a mix of algorithms and ensemble techniques. Shapley Additive Explanations (SHAP) provided a global ranking of feature importance, identifying dibutylamine and ganglioside GM3(d34:1) as the top discriminative metabolites for RCC and OC, respectively. Waterfall plots offered local explanations by illustrating the influence of each metabolite on individual predictions. Dependence plots spotlighted metabolite interactions, such as the connection between hippuric acid and one of its derivatives in RCC, and between GM3(d34:1) and GM3(18:1_16:0) in OC, hinting at potential mechanistic relationships. Through decision plots, a detailed error analysis was conducted, contrasting feature importance for correctly versus incorrectly classified samples. In essence, our pipeline emphasizes the importance of harmonizing AutoML and XAI, facilitating both simplified ML application and improved interpretability in metabolomics data science.

  • Article type: Journal Article
    OBJECTIVE: This study aims to build a machine learning (ML) model to predict the recurrence probability of postoperative non-lactating mastitis (NLM) using Random Forest (RF) and XGBoost algorithms. Such a model can help identify the risk of NLM recurrence and guide clinical treatment planning.
    METHODS: This study was conducted on inpatients who were admitted to the Mammary Department of Shuguang Hospital affiliated to Shanghai University of Traditional Chinese Medicine between July 2019 and December 2021. Inpatient follow-up was completed through December 2022. Ten features were selected to build the ML model: age, body mass index (BMI), number of abortions, presence of inverted nipples, extent of breast mass, white blood cell count (WBC), neutrophil to lymphocyte ratio (NLR), albumin-globulin ratio (AGR), triglyceride (TG), and presence of intraoperative discharge. We used two ML approaches (RF and XGBoost) to build models and predict the NLM recurrence risk of female patients. A total of 258 patients were randomly divided into a training set and a test set in a 75%-25% proportion. Model performance was evaluated based on accuracy, precision, recall, F1-score, and AUC. The Shapley Additive Explanations (SHAP) method was used to interpret the model.
    RESULTS: There were 48 (18.6%) NLM patients who experienced recurrence during the follow-up period. For the RF model, BMI was the most important influencing factor; for the XGBoost model, it was intraoperative discharge. The results of tenfold cross-validation suggest that both the RF model and the XGBoost model have good predictive performance, but the XGBoost model performed better than the RF model in our study. The trends of the SHAP values of all features in our models are consistent with the trends of these features' clinical presentation. The inclusion of these ten features in the model is necessary to build practical prediction models for recurrence.
    CONCLUSIONS: The results of tenfold cross-validation and SHAP values suggest that the models have predictive ability. The trends of the SHAP values provide auxiliary validation of our models and give them more clinical significance.

  • Article type: Journal Article
    The tomato as a raw material for processing is globally important and is pivotal in dietary and agronomic research due to its nutritional, economic, and health significance. This study explored the potential of machine learning (ML) for predicting tomato quality, utilizing data from 48 cultivars and 28 locations in Hungary over 5 seasons. It focused on °Brix, lycopene content, and colour (a/b ratio) using extreme gradient boosting (XGBoost) and artificial neural network (ANN) models. The results revealed that XGBoost consistently outperformed ANN, achieving high accuracy in predicting °Brix (R² = 0.98, RMSE = 0.07) and lycopene content (R² = 0.87, RMSE = 0.61), and excelling in colour prediction (a/b ratio) with an R² of 0.93 and RMSE of 0.03. ANN lagged behind, particularly in colour prediction, where it showed a negative R² value of -0.35. Shapley additive explanations (SHAP) summary plot analysis indicated that both models are effective in predicting °Brix and lycopene content in tomatoes, highlighting different aspects of the data. SHAP analysis highlighted the models' efficiency (especially in °Brix and lycopene predictions) and underscored the significant influence of cultivar choice and environmental factors like climate and soil. These findings emphasize the importance of selecting and fine-tuning the appropriate ML model for enhancing precision agriculture, underlining XGBoost's superiority in handling complex agronomic data for quality assessment.