Shapley Additive Explanations

  • Article type: Journal Article
    Recent advances in biomaterials research have used artificial intelligence to predict diverse material properties. However, studies predicting the mechanical properties of biomaterials from amino acid sequences have been notably absent. This research pioneers the use of classification models to predict ultimate tensile strength from silk fiber amino acid sequences, employing logistic regression, support vector machines with various kernels, and a deep neural network (DNN). Remarkably, the model achieves a high accuracy of 0.83 in the generalization test. The study introduces an approach to predicting biomaterial mechanical properties that goes beyond traditional experimental methods. Recognizing the limitations of conventional linear prediction models, the research points to DNNs, which can capture non-linear relationships with high precision, as the direction for future work. Moreover, by comparing the performance of the different prediction models, the study offers insights into which models are effective for predicting the mechanical properties of particular materials. In conclusion, this study serves as a pioneering contribution, laying the groundwork for future work and advocating the integration of AI methods into materials research.
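The abstract does not say how the silk sequences were encoded before being passed to logistic regression, the SVMs, or the DNN. The sketch below assumes a simple amino-acid composition encoding, which is one common choice and not necessarily the authors' method. Each sequence becomes a fixed-length vector of 20 residue frequencies. The names `aa_composition` and `STANDARD_AMINO_ACIDS` and the example fragment are hypothetical.

```python
# Hypothetical featurization: amino-acid composition (not confirmed as the study's encoding).
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each of the 20 standard amino acids in `sequence`.

    Letters outside the standard alphabet (e.g. 'X') are skipped, so the
    fractions sum to 1 over the residues that were counted.
    """
    residues = [aa for aa in sequence.upper() if aa in STANDARD_AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    return [residues.count(aa) / n for aa in STANDARD_AMINO_ACIDS]


if __name__ == "__main__":
    # A short hypothetical fragment; real inputs would be full silk protein sequences.
    features = aa_composition("GAGAGSX")
    for aa, frac in zip(STANDARD_AMINO_ACIDS, features):
        if frac > 0:
            print(aa, round(frac, 3))
```

Composition discards residue order, so motif-level information is lost. That trade-off is worth weighing against order-aware encodings.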

  • Article type: Journal Article
    BACKGROUND: The incidence of oropharyngeal squamous cell carcinoma (OPSCC) worldwide is alarming. The clinical community urgently needs to understand the etiology of OPSCC in order to deliver effective treatments.
    METHODS: This study presents an integrative genomics approach for identifying key oncogenic drivers of OPSCC pathogenesis. The dataset contains RNA sequencing (RNA-Seq) samples from 46 human papillomavirus-positive head and neck squamous cell carcinoma cases and 25 normal uvulopalatopharyngoplasty cases. Differential marker selection between the groups, using a log2FoldChange (FC) threshold of 2 and an adjusted p-value < 0.01, yielded 714 genes. The Particle Swarm Optimization (PSO) algorithm then selected a candidate gene subset, reducing the number of genes to 73. State-of-the-art machine learning algorithms were trained on both the differentially expressed genes and the PSO candidate subset.
    RESULTS: Analysis of the predictive models using SHapley Additive exPlanations (SHAP) revealed that seven genes contribute significantly to model performance. ECT2, LAMC2, and DSG2 contributed most to discriminating between the sample groups, followed in importance by FAT1, PLOD2, COL1A1, and PLAU. The Random Forest and Bayes Net algorithms also achieved perfect validation scores when using the PSO features. Furthermore, gene set enrichment analysis, protein-protein interaction analysis, and disease ontology mining revealed a significant association between these genes and the target condition. Survival analysis of the three key genes identified by SHAP revealed strong overexpression in samples from "The Cancer Genome Atlas".
    CONCLUSIONS: Our findings elucidate critical oncogenic drivers in OPSCC, offering vital insights for developing targeted therapies and improving understanding of its pathogenesis.
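The METHODS filter combines a fold-change cutoff with an adjusted p-value cutoff. The sketch below rests on two assumptions the abstract does not confirm. First, "adjusted" is taken to mean Benjamini-Hochberg FDR adjustment. Second, the threshold of 2 is taken to apply to |log2FC| in either direction. The gene names and values are hypothetical.

```python
# Sketch of the differential-expression filter described in METHODS.
# Assumptions: "adjusted p-value" means Benjamini-Hochberg FDR adjustment, and
# "log2FoldChange of 2" means |log2FC| >= 2 in either direction.


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending raw p-values
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_markers(genes, log2fc, pvalues, fc_cutoff=2.0, alpha=0.01):
    """Keep genes passing both the fold-change and adjusted p-value cutoffs."""
    padj = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, fc, q in zip(genes, log2fc, padj)
        if abs(fc) >= fc_cutoff and q < alpha
    ]


if __name__ == "__main__":
    # Hypothetical toy values, not data from the study.
    genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
    log2fc = [3.1, -2.5, 0.4, 2.2]
    pvalues = [0.0001, 0.002, 0.00001, 0.03]
    print(select_markers(genes, log2fc, pvalues))  # ['GENE_A', 'GENE_B']
```

GENE_C has a tiny p-value but fails the fold-change cutoff. GENE_D passes the fold-change cutoff but its adjusted p-value (0.03) fails.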

  • Article type: Journal Article
    With the prevalence of thyroid nodules increasing globally, this study investigates the potential association between Bluetooth headset use and the incidence of thyroid nodules, considering the cumulative effects of the non-ionizing radiation (NIR) these devices emit. We analyzed 600 valid questionnaires collected through the WenJuanXing platform using Propensity Score Matching (PSM) and an XGBoost model, supplemented by SHAP analysis, to assess thyroid nodule risk. PSM was used to balance differences in baseline characteristics, thereby reducing bias, and produced a matched dataset of 96 cases for modeling. The XGBoost model was then used to predict risk factors, with model efficacy measured by the area under the receiver operating characteristic (ROC) curve (AUC). SHAP analysis quantified and explained the impact of each feature on the predictions, identifying key risk factors. The XGBoost model reached an AUC of 0.95, demonstrating high accuracy in discriminating thyroid nodule risk. SHAP analysis identified age and daily Bluetooth headset usage duration as the two most important factors affecting thyroid nodule risk; specifically, longer daily usage was strongly linked to a higher risk of developing thyroid nodules. Our study highlights a significant association between prolonged Bluetooth headset use and increased thyroid nodule risk. It emphasizes the importance of considering health impacts when using modern technology, especially devices such as Bluetooth headsets that are used frequently every day.
    Through model predictions and variable importance analysis, our research provides a scientific basis for public health policy and personal health habits, suggesting that attention should be paid to the duration of Bluetooth headset use in daily life to reduce the potential risk of thyroid nodules. Future research should investigate the biological mechanisms underlying this relationship and consider additional potential influencing factors to offer more comprehensive health guidance and preventive measures.
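The abstract reports that PSM reduced 600 questionnaires to 96 matched cases but does not describe the matching procedure. The sketch below makes three assumptions:

* Propensity scores have already been estimated, typically with a logistic regression of exposure on baseline covariates.
* Matching is greedy 1:1 nearest-neighbour matching without replacement.
* The caliper is 0.05.

The caliper value, the function name `greedy_match`, and the scores are hypothetical.

```python
# Sketch of 1:1 greedy nearest-neighbour propensity score matching.
# Assumptions (not stated in the abstract): scores are already estimated,
# matching is without replacement, and the caliper is 0.05 on the score scale.


def greedy_match(treated, control, caliper=0.05):
    """Pair each treated subject with the closest unused control within `caliper`.

    treated, control: dicts mapping subject id -> propensity score.
    Returns a list of (treated_id, control_id) pairs; treated subjects with no
    control inside the caliper are left unmatched and drop out of the analysis.
    """
    available = dict(control)
    pairs = []
    # Process treated subjects in ascending score order for reproducibility.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement
    return pairs


if __name__ == "__main__":
    # Hypothetical scores, e.g. from a logistic regression on baseline covariates.
    exposed = {"t1": 0.30, "t2": 0.62, "t3": 0.90}
    unexposed = {"c1": 0.28, "c2": 0.60, "c3": 0.65, "c4": 0.10}
    print(greedy_match(exposed, unexposed))  # [('t1', 'c1'), ('t2', 'c2')]
```

Unmatched subjects are discarded, so matching can shrink a sample considerably. The trade-off is better covariate balance at the cost of statistical power.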

  • Article type: Journal Article
    Recent advances in autonomous driving bring an associated cybersecurity risk: networks of autonomous vehicles (AVs) can be compromised, which motivates the use of AI models to detect anomalies on these networks. In this context, explainable AI (XAI) is crucial for explaining the behavior of these anomaly detection models. This work introduces a comprehensive framework for assessing black-box XAI techniques for anomaly detection in AVs. The framework covers both global and local XAI methods and examines the explanations they produce for AI models that classify anomalous AV behavior. Using six evaluation metrics (descriptive accuracy, sparsity, stability, efficiency, robustness, and completeness), the framework evaluates two well-known black-box XAI techniques, SHAP and LIME. It first applies these techniques to identify the primary features crucial for anomaly classification. It then assesses SHAP and LIME across the six metrics in extensive experiments on two prevalent autonomous driving datasets, VeReMi and Sensor. This study advances the deployment of black-box XAI methods for real-world anomaly detection in autonomous driving systems and contributes insights into the strengths and limitations of current black-box XAI methods in this critical domain.
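SHAP attributions are based on Shapley values. For a model with only a few features, the exact values can be computed by enumerating every coalition of features. This makes the efficiency property concrete: the attributions sum to f(x) minus the model's value at the baseline.

This is a minimal sketch, and several parts are assumptions for illustration:

* the baseline-replacement value function;
* the hypothetical `toy_model`;
* the helper names.

They are not the estimators used by the SHAP library, nor the paper's definition of its completeness metric.

```python
# Exact Shapley values by brute-force coalition enumeration (O(n * 2**n) model calls).
# Assumption: a feature "absent" from a coalition is replaced by a fixed baseline value.
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Return one Shapley value per feature for model `f` at input `x`."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their real value, the rest the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: one additive term plus an interaction between features 1 and 2.
    return 2 * z[0] + z[1] * z[2]


if __name__ == "__main__":
    x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
    phi = shapley_values(toy_model, x, base)
    print([round(p, 6) for p in phi])  # [2.0, 3.0, 3.0]
    # Efficiency: attributions sum to f(x) - f(baseline), here 8.0.
    print(round(sum(phi), 6), toy_model(x) - toy_model(base))
```

The interaction term z1*z2 is split equally between features 1 and 2. The cost grows exponentially with the number of features, which is why practical SHAP explainers rely on approximations or model-specific algorithms.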

  • Article type: Journal Article
    Owing to their bandgap characteristics, halide perovskite materials have broad application prospects in fields such as solar cells, LED devices, photodetectors, fluorescence labeling, bioimaging, and photocatalysis. This study compiled experimental data from the published literature and analyzed the bandgaps of halide perovskite compounds with ensemble learning models, chosen for their strong predictive capability, low risk of overfitting, and robustness. The results demonstrate the effectiveness of ensemble decision tree models, especially the gradient boosting decision tree model. That model achieved a root mean square error of 0.090 eV, a mean absolute error of 0.053 eV, and a coefficient of determination (R²) of 93.11%. Analysis of features computed as element ratios normalized by molar quantity indicates that ions at the X and B sites significantly influence the bandgap. Additionally, doping with iodine atoms can effectively reduce the intrinsic bandgap, and hybridization of the s and p orbitals of tin atoms can also decrease it. The model's accuracy was validated by predicting the bandgap of the photovoltaic material MASn₁₋ₓPbₓI₃. In conclusion, this study highlights the positive impact of machine learning on materials development, especially for predicting the bandgaps of halide perovskite compounds, where ensemble learning methods show significant advantages.
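The abstract reports RMSE, MAE, and R² for the gradient boosting model. The sketch below shows how these three metrics are computed from paired observed and predicted bandgaps. The values are hypothetical, in eV, and are not taken from the study.

```python
# Sketch of the three regression metrics quoted in the abstract (RMSE, MAE, R^2).
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for paired observed and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    rmse = sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


if __name__ == "__main__":
    # Hypothetical bandgaps in eV, not values from the study.
    observed = [1.5, 2.0, 2.5, 3.0]
    predicted = [1.6, 1.9, 2.5, 3.2]
    rmse, mae, r2 = regression_metrics(observed, predicted)
    print(f"RMSE={rmse:.3f} eV, MAE={mae:.3f} eV, R2={r2:.3f}")
```

RMSE weights large errors more heavily than MAE, so an RMSE well above the MAE (0.090 vs 0.053 eV in the abstract) hints that a few compounds carry larger errors. An R² of 93.11% corresponds to 0.9311 on the scale used here.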

  • Article type: Journal Article
    BACKGROUND: Health care-associated infections due to multidrug-resistant organisms (MDROs), such as methicillin-resistant Staphylococcus aureus (MRSA) and Clostridioides difficile infection (CDI), place a significant burden on our health care infrastructure.
    OBJECTIVE: Screening for MDROs is an important mechanism for preventing spread but is resource intensive. The objective of this study was to develop automated tools that use electronic health record (EHR) data to predict colonization or infection risk, providing useful information to aid infection control and guide empiric antibiotic coverage.
    METHODS: We retrospectively developed a machine learning model to detect MRSA colonization and infection in undifferentiated hospitalized patients at the University of Virginia Hospital at the time of sample collection. We built the model using clinical and nonclinical features derived from on-admission and throughout-stay information in the patients' EHR data. In addition, we used a class of features derived from contact networks in the EHR data. These network features capture patients' contacts with providers and other patients, improving model interpretability and accuracy in predicting the outcome of MRSA surveillance tests. Finally, we explored heterogeneous models for different patient subpopulations, for example, patients admitted to an intensive care unit or emergency department or those with specific testing histories, and found that these models perform better.
    RESULTS: Penalized logistic regression performed better than the other methods. Its area under the receiver operating characteristic curve (AUROC) improved by nearly 11% when we used a polynomial (second-degree) transformation of the features. Significant features for predicting MDRO risk include antibiotic use, surgery, use of devices, dialysis, the patient's comorbidities, and network features. Among these, network features add the most value, improving the model's performance by at least 15%. The penalized logistic regression model with the same feature transformation also outperformed the other models for specific patient subpopulations.
    CONCLUSIONS: Our study shows that machine learning methods using clinical and nonclinical features derived from EHR data can predict MRSA risk quite effectively. Network features are the most predictive and provide a significant improvement over prior methods. Furthermore, heterogeneous prediction models for different patient subpopulations enhance model performance.
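The RESULTS credit a second-degree polynomial feature transformation with a gain of nearly 11% in AUROC. The sketch below assumes the transformation includes squares and pairwise interaction terms, which the abstract does not spell out. The feature meanings in the example are hypothetical.

With p input features, this expansion adds p(p+1)/2 second-degree terms. That growth is one reason a penalized model such as the study's penalized logistic regression suits it.

```python
# Sketch of a second-degree polynomial feature expansion.
# Assumption: "polynomial (second-degree) transformation" includes squares and
# pairwise interaction terms; the abstract does not list the exact terms.
from itertools import combinations_with_replacement


def polynomial_features(row, degree=2, include_bias=True):
    """Expand one feature vector into all monomials of total degree <= `degree`."""
    out = [1.0] if include_bias else []
    for d in range(1, degree + 1):
        # Each combination of feature indices (with repeats) is one monomial.
        for combo in combinations_with_replacement(range(len(row)), d):
            term = 1.0
            for idx in combo:
                term *= row[idx]
            out.append(term)
    return out


if __name__ == "__main__":
    # Hypothetical row, e.g. [antibiotic_days, num_devices].
    print(polynomial_features([2, 3]))  # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
```

The interaction term (here 2 * 3 = 6) lets a linear classifier respond to combinations of risk factors, not just each factor alone.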

  • Article type: Journal Article
    OBJECTIVE: The purpose of this retrospective study was to establish a combined model based on ultrasound (US) radiomics and clinical factors to identify patients with stage I cervical cancer (CC) before surgery.
    METHODS: A total of 209 CC patients with cervical lesions detected by transvaginal sonography (TVS) at the First Affiliated Hospital of Anhui Medical University were retrospectively reviewed and divided into a training set (n = 146) and an internal validation set (n = 63). A further 52 CC patients from Anhui Provincial Maternity and Child Health Hospital and Nanchong Central Hospital formed the external validation set. Independent clinical predictors were selected by univariate and multivariate logistic regression analyses. US-radiomics features were extracted from US images. The most significant features were selected by univariate analysis, Spearman's correlation analysis, and the least absolute shrinkage and selection operator (LASSO) algorithm, and six machine learning (ML) algorithms were then used to build the radiomics model. Next, the clinical, US-radiomics, and combined clinical US-radiomics models were compared for their ability to diagnose stage I CC. Finally, the Shapley additive explanations (SHAP) method was used to explain the contribution of each feature.
    RESULTS: The long diameter of the cervical lesion (L) and squamous cell carcinoma-associated antigen (SCCa) were independent clinical predictors of stage I CC. The eXtreme Gradient Boosting (XGBoost) model performed best among the six ML radiomics models, with area under the curve (AUC) values of 0.778, 0.751, and 0.751 in the training, internal validation, and external validation sets, respectively. Among the three final models, the combined model based on clinical features and the rad-score showed good discriminative power, with AUC values of 0.837, 0.828, and 0.839 in the training, internal validation, and external validation sets, respectively. Decision curve analysis validated the clinical utility of the combined nomogram. The SHAP algorithm illustrated the contribution of each feature in the combined model.
    CONCLUSIONS: We established an interpretable combined model to predict stage I CC. This non-invasive prediction method may be used for the preoperative identification of patients with stage I CC.
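One step in the METHODS is a Spearman correlation analysis between univariate screening and LASSO. A common use of that step is to drop one feature from each highly correlated pair. The sketch below makes two assumptions that the abstract does not give:

* a |rho| threshold of 0.9;
* a simple first-come-first-kept order for deciding which feature of a pair survives.

Feature names and values are hypothetical.

```python
# Sketch of a Spearman-correlation redundancy filter applied before LASSO.
# Assumptions: a |rho| threshold of 0.9 and first-come-first-kept ordering;
# the abstract gives neither.


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5  # assumes neither feature is constant


def spearman(a, b):
    # Spearman's rho is Pearson's r computed on (tie-averaged) ranks.
    return pearson(average_ranks(a), average_ranks(b))


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |rho| <= threshold against every feature already kept.

    features: dict mapping feature name -> values for the same patients.
    """
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical radiomics features for five patients.
    feats = {
        "f1": [1, 2, 3, 4, 5],
        "f2": [2, 4, 6, 8, 11],  # monotone in f1, so rho = 1 and it is dropped
        "f3": [5, 3, 4, 1, 2],
    }
    print(drop_correlated(feats))  # ['f1', 'f3']
```

Spearman's rho measures monotone rather than strictly linear association. As a result, f2 counts as fully redundant with f1 even though their relationship is not exactly linear.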

  • Article type: Journal Article
    BACKGROUND: The prevalence of cognitive impairment and dementia in the older population is increasing, so early detection of cognitive decline is essential for effective intervention.
    METHODS: This study included 2,288 participants with normal cognitive function from the Ma'anshan Healthy Aging Cohort Study. Forty-two potential predictors were selected based on clinical importance and previous research. They covered demographic characteristics, chronic diseases, lifestyle factors, anthropometric indices, physical function, and baseline cognitive function. The dataset was partitioned into training (60%), validation (20%), and test (20%) sets. Recursive feature elimination was used for feature selection, and six machine learning algorithms were then used for model development. Model performance was evaluated using the area under the curve (AUC), specificity, sensitivity, and accuracy. Moreover, SHapley Additive exPlanations (SHAP) was used to assess the interpretability of the final selected model and to gain insights into the impact of features on the predictions. SHAP force plots were constructed to illustrate the application of the prediction model at the individual level.
    RESULTS: The final predictive model, based on the Naive Bayes algorithm, achieved an AUC of 0.820 (95% CI, 0.773-0.887) on the test set, outperforming the other algorithms. The ten most influential features in the model were baseline Mini-Mental State Examination (MMSE) score, education, self-reported economic status, collective or social activities, Pittsburgh Sleep Quality Index (PSQI), body mass index, systolic blood pressure, diastolic blood pressure, instrumental activities of daily living, and age. The model showed potential for identifying older adults at higher risk of developing cognitive impairment within 3 years.
    CONCLUSIONS: The predictive model developed in this study contributes to the early detection of cognitive impairment in older adults by primary healthcare staff in community settings.
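The METHODS evaluate the models with AUC, specificity, sensitivity, and accuracy. The sketch below computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half. It computes the other three metrics at an assumed decision threshold of 0.5. The labels, scores, and threshold are hypothetical.

```python
# Sketch of the evaluation metrics named in METHODS: AUC, sensitivity,
# specificity, and accuracy. The 0.5 threshold is an assumption.


def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity, and accuracy when predicting positive at score >= threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }


if __name__ == "__main__":
    # Hypothetical labels (1 = cognitive impairment at follow-up) and model scores.
    y = [1, 1, 0, 0, 1, 0]
    s = [0.9, 0.7, 0.4, 0.7, 0.3, 0.1]
    print(round(roc_auc(y, s), 4))  # 0.7222
    print(threshold_metrics(y, s))
```

AUC is threshold-free, whereas sensitivity, specificity, and accuracy all depend on the chosen cutoff. That is why studies usually report both kinds of metric. The pairwise loop is quadratic in sample size and is meant only to make the definition explicit.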

  • Article type: Journal Article
    BACKGROUND: Suicide is the second-leading cause of death among adolescents and is associated with clusters of suicides. Despite numerous studies on this preventable cause of death, the focus has primarily been on single nations and traditional statistical methods.
    OBJECTIVE: This study aims to develop a predictive model for adolescent suicidal thinking using multinational data sets and machine learning (ML).
    METHODS: We used data from the Korea Youth Risk Behavior Web-based Survey covering 566,875 adolescents aged 13 to 18 years. External validation used the Youth Risk Behavior Survey (103,874 adolescents) and Norway's University National General Survey (19,574 adolescents). Several tree-based ML models were developed, and feature importance and Shapley additive explanations values were analyzed to identify risk factors for adolescent suicidal thinking.
    RESULTS: When trained on the Korea Youth Risk Behavior Web-based Survey data from South Korea, the XGBoost model achieved an area under the receiver operating characteristic curve (AUROC) of 90.06% (95% CI 89.97-90.16), outperforming the other models. In external validation using the Youth Risk Behavior Survey data from the United States and the University National General Survey from Norway, the XGBoost model achieved AUROCs of 83.09% and 81.27%, respectively. Across all data sets, XGBoost consistently achieved the highest AUROC and was selected as the optimal model. Among predictors of suicidal thinking, feelings of sadness and despair were the most influential, accounting for 57.4% of the impact. They were followed by stress status (19.8%), age (5.7%), household income (4%), academic achievement (3.4%), sex (2.1%), and other factors, each contributing less than 2%.
    CONCLUSIONS: This study applied ML to diverse data sets from 3 countries to address adolescent suicide. The findings highlight the important role of emotional health indicators in predicting suicidal thinking among adolescents. Specifically, sadness and despair were identified as the most significant predictors, followed by stressful conditions and age. These findings emphasize the critical need for early diagnosis and prevention of mental health issues during adolescence.
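The RESULTS express predictor influence as percentages, for example 57.4% for sadness and despair. The abstract does not state how those shares were computed. One common convention is to average the absolute SHAP value of each feature across respondents and divide by the total, and the sketch below assumes that convention. The SHAP values and feature names are hypothetical.

```python
# Sketch: turning a matrix of SHAP values into percentage importance shares.
# Assumption: shares are mean |SHAP| per feature divided by the total; the
# abstract does not state how its percentages were computed.


def importance_shares(shap_rows, feature_names):
    """Return {feature: percent of total mean |SHAP|}, largest first."""
    n = len(shap_rows)
    mean_abs = [
        sum(abs(row[j]) for row in shap_rows) / n
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    shares = {name: 100.0 * m / total for name, m in zip(feature_names, mean_abs)}
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical SHAP values for two respondents and three features.
    rows = [[0.6, -0.2, 0.1], [-0.4, 0.2, -0.1]]
    names = ["sadness_despair", "stress", "age"]
    for feature, pct in importance_shares(rows, names).items():
        print(f"{feature}: {pct:.1f}%")
```

Taking absolute values means a feature that pushes some predictions up and others down still counts as influential. The sign information is deliberately discarded in this summary.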

  • Article type: Journal Article
    BACKGROUND: This study aimed to develop and compare machine learning models based on triphasic contrast-enhanced CT (CECT) for distinguishing between benign and malignant renal tumors.
    METHODS: In total, 427 patients were enrolled from two medical centers: Center 1 (the training set) and Center 2 (the external validation set). First, 1781 radiomic features were extracted separately from the corticomedullary phase (CP), nephrographic phase (NP), and excretory phase (EP) CECT images, after which 10 features were selected using the minimum redundancy maximum relevance method. Second, random forest (RF) models were constructed from single-phase features (CP, NP, and EP) and from the combination of features from all three phases (TP). Third, the RF models were assessed in the training and external validation sets. Finally, the internal prediction mechanisms of the models were explained with the SHapley Additive exPlanations (SHAP) approach.
    RESULTS: A total of 266 patients with renal tumors from Center 1 and 161 patients from Center 2 were included. In the training set, the AUCs of the RF models constructed from the CP, NP, EP, and TP features were 0.886, 0.912, 0.930, and 0.944, respectively. In the external validation set, the models achieved AUCs of 0.860, 0.821, 0.921, and 0.908, respectively. According to the SHAP method, the "original_shape_Flatness" feature played the most important role in the predictions of the RF model based on EP features.
    CONCLUSIONS: The four RF models efficiently differentiated benign from malignant solid renal tumors, with the EP feature-based RF model displaying the best performance.
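The METHODS reduce the radiomic features to 10 with the minimum redundancy maximum relevance (mRMR) method. The greedy idea is to pick, at each step, the feature whose relevance to the label minus its average redundancy with the already-selected features is largest.

This sketch swaps in |Pearson r| for both relevance and redundancy; classic mRMR uses mutual information, and the abstract does not say which variant was used. The feature names and values are hypothetical.

```python
# Sketch of greedy mRMR feature selection.
# Swapped-in variant: relevance and redundancy use |Pearson r| instead of the
# mutual information used in classic mRMR; feature names are hypothetical.


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5  # assumes non-constant inputs


def mrmr_score(name, features, relevance, selected):
    """Relevance minus mean redundancy with the already-selected features."""
    if not selected:
        return relevance[name]
    redundancy = sum(abs(pearson(features[name], features[s])) for s in selected)
    return relevance[name] - redundancy / len(selected)


def mrmr_select(features, target, k):
    """Greedily pick `k` features maximising relevance minus redundancy."""
    relevance = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    selected, candidates = [], list(features)
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda c: mrmr_score(c, features, relevance, selected))
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical data: 1 = malignant, 0 = benign.
    label = [0, 0, 0, 1, 1, 1]
    feats = {
        "x1": [1, 2, 3, 4, 5, 6],
        "x1_dup": [1, 2, 3, 4, 5, 7],  # nearly a copy of x1
        "x2": [2, 1, 2, 3, 1, 3],
    }
    print(mrmr_select(feats, label, k=2))  # ['x1', 'x2']
```

Here `x1_dup` is more relevant to the label than `x2` (|r| of about 0.85 vs 0.41). It is still passed over at the second step because it is almost perfectly correlated with `x1`, which is the redundancy penalty at work.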