GBDT: Yiyun Literature (医云文献) search results

Literature (16 articles)

1 GBDT Method Integrating Feature-Enhancement and Active-Learning Strategies-Sea Ice Thickness Inversion in Beaufort Sea.

Impact index: 3.847
Published: 29 Apr 2024
Journal: Sensors (Basel); PMID: 38732944

DOI: 10.3390/s24092836
Article type: Journal Article

Sea ice is an important component of the Earth's ecosystem, and its thickness has a profound impact on global climate and human activities, so the inversion of sea ice thickness has important research significance. Due to environmental and equipment-related limitations, the number of samples available for remote sensing inversion is currently insufficient. At high spatial resolutions, remote sensing data contain limited information and noise interference, which seriously affect the accuracy of sea ice thickness inversion. In response to these issues, we conducted experiments using ice draft data from the Beaufort Sea and designed an improved GBDT method that integrates feature-enhancement and active-learning strategies (IFEAL-GBDT). In this method, the incident angle and time series are used to perform spatiotemporal correction of the data, reducing both temporal and spatial impacts. Meanwhile, based on the original polarization information, effective multi-attribute features are generated to expand the information content and improve the separability of sea ice with different thicknesses. Taking into account the growth cycle and age of sea ice, attributes were added for month and seawater temperature. In addition, we studied an active learning strategy based on the maximum standard deviation to select more informative and representative samples and improve the model's generalization ability. The improved GBDT model was used for training and prediction, offering advantages in dealing with nonlinearity, high-dimensional data, and data noise, further extending the effectiveness of the feature-enhancement and active-learning strategies. Compared with other methods, the proposed method achieves the best inversion accuracy: IFEAL-GBDT has a mean absolute error of 8 cm, a root mean square error of 13.7 cm, and a correlation coefficient of 0.912. This research demonstrates the effectiveness of our method, which is suitable for the high-precision inversion of sea ice thickness using Sentinel-1 data.
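
The abstract mentions an active learning strategy based on the maximum standard deviation but gives no details. One common reading, offered here only as an assumption, is to score each unlabeled sample by how much a set of models disagree on it and to label the most contested samples first. The function name `select_by_max_std` and the toy predictions below are hypothetical.

```python
from statistics import pstdev


def select_by_max_std(ensemble_predictions, n_select):
    """Pick the samples on which the ensemble members disagree most.

    ensemble_predictions[m][i] is member m's prediction for sample i.
    Returns the indices of the n_select samples with the largest
    standard deviation across members (an assumed selection rule).
    """
    n_samples = len(ensemble_predictions[0])
    spread = [
        pstdev([member[i] for member in ensemble_predictions])
        for i in range(n_samples)
    ]
    ranked = sorted(range(n_samples), key=lambda i: spread[i], reverse=True)
    return ranked[:n_select]


# Hypothetical thickness predictions (metres) from three models, four samples.
preds = [
    [0.50, 1.20, 2.00, 0.80],
    [0.52, 1.90, 2.05, 0.79],
    [0.49, 0.60, 1.95, 0.81],
]
chosen = select_by_max_std(preds, n_select=2)
print(chosen)
```

Samples selected this way would then be labeled and added to the training set before the next round of training.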

2 Transformer fault diagnosis method based on SMOTE and NGO-GBDT.

Impact index: 4.996
Published: 26 Mar 2024
Journal: Sci Rep; PMID: 38531936

DOI: 10.1038/s41598-024-57509-w
Article type: Journal Article

To improve the accuracy of transformer fault diagnosis and to mitigate the low identification accuracy that results from insufficient model training on imbalanced samples, this paper proposes a transformer fault diagnosis method based on SMOTE and NGO-GBDT. First, the Synthetic Minority Over-sampling Technique (SMOTE) was used to expand the minority samples. Second, the non-coding ratio method was used to construct multi-dimensional feature parameters, and a Light Gradient Boosting Machine (LightGBM) feature optimization strategy was introduced to screen the optimal feature subset. Finally, the Northern Goshawk Optimization (NGO) algorithm was used to optimize the parameters of the Gradient Boosting Decision Tree (GBDT), and transformer fault diagnosis was then carried out. The results show that the proposed method can reduce the misjudgment of minority samples. Compared with other integrated models, the proposed method has high fault identification accuracy, a low misjudgment rate, and stable performance.
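
The core idea of SMOTE, expanding the minority class by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration rather than the paper's pipeline; the function name `smote_like_oversample`, the parameter defaults, and the toy points are assumptions.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points on segments between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(a + lam * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote_like_oversample(minority, n_new=5)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies.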

3 Predicting Risk of Bullying Victimization among Primary and Secondary School Students: Based on a Machine Learning Model.

Impact index: 2.286
Published: 20 Jan 2024
Journal: Behav Sci (Basel); PMID: 38275356

DOI: 10.3390/bs14010073
Article type: Journal Article

School bullying among primary and secondary school students has received increasing attention, and identifying relevant factors is a crucial way to reduce the risk of bullying victimization. Machine learning methods can help researchers predict and identify individual risk behaviors. Through a machine learning approach (i.e., the gradient boosting decision tree model, GBDT), the present longitudinal study aims to systematically examine individual, family, and school environment factors that can predict the risk of bullying victimization among primary and secondary school students a year later. A total of 2767 participants (2065 secondary school students, 702 primary school students, 55.20% female students, mean age at T1 was 12.22) completed measures of 24 predictors at the first wave, including individual factors (e.g., self-control, gender, grade), family factors (family cohesion, parental control, parenting style), peer factors (peer relationship), and school factors (teacher-student relationship, learning capacity). A year later (i.e., T2), they completed the Olweus Bullying Questionnaire. The GBDT model predicted whether primary and secondary school students would be exposed to school bullying after one year by training a series of base learners and outputting the importance ranking of predictors. The GBDT model performed well. It yielded the top 6 predictors: teacher-student relationship, peer relationship, family cohesion, negative affect, anxiety, and denying parenting style. The protective factors (i.e., teacher-student relationship, peer relationship, and family cohesion) and risk factors (i.e., negative affect, anxiety, and denying parenting style) associated with the risk of bullying victimization a year later among primary and secondary school students were identified using a machine learning approach. The GBDT model can be used as a tool to predict the future risk of bullying victimization for children and adolescents and to help improve the effectiveness of school bullying interventions.
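
The phrase "training a series of base learners and outputting the importance ranking of predictors" can be illustrated with a minimal gradient boosting sketch. It is not the study's model: it assumes squared-error loss, depth-1 trees (stumps), and a gain-based importance score, whereas the study predicts a binary outcome. All names and the toy data are hypothetical.

```python
def fit_stump(X, residuals):
    """Return (sse, feature, threshold, left_mean, right_mean) of the best split."""
    best = None
    for j in range(len(X[0])):
        # Every distinct value except the largest is a candidate threshold.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left)
            sse += sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best


def fit_gbdt(X, y, n_rounds=20, learning_rate=0.3):
    """Fit stumps sequentially, each one to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        sse, j, t, lm, rm = fit_stump(X, residuals)
        # Importance: total squared-error reduction credited to feature j.
        importance[j] += sum(r * r for r in residuals) - sse
        lv, rv = learning_rate * lm, learning_rate * rm
        stumps.append((j, t, lv, rv))
        pred = [p + (lv if row[j] <= t else rv) for row, p in zip(X, pred)]
    return base, stumps, importance


def predict(model, row):
    base, stumps, _ = model
    return base + sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)


# Toy data: feature 0 determines the target, feature 1 is noise.
X = [(0.1, 5.0), (0.2, 1.0), (0.3, 4.0), (0.7, 2.0), (0.8, 5.0), (0.9, 1.0)]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
model = fit_gbdt(X, y)
importance = model[2]
print(importance, predict(model, (0.25, 3.0)), predict(model, (0.85, 3.0)))
```

Sorting the features by `importance` gives a ranking analogous in spirit to the predictor ranking the abstract describes.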

4 Machine learning approach for carbon disclosure in the Korean market: The role of environmental performance.

Impact index: 1.512
Published: Jan-Mar 2024
Journal: Sci Prog; PMID: 38234092

DOI: 10.1177/00368504231220766
Article type: Journal Article

Over the past few decades, scholars have employed a wide range of methodologies to determine the factors influencing firms' voluntary carbon disclosure. Most of these studies have been conducted in advanced markets. This article examines the trend of voluntary carbon disclosure in the Korean financial market using machine learning models such as Random Forest and Gradient Boosted Decision Tree. Based on a set of hand-collected carbon disclosure data, we first demonstrate significantly better performance of machine learning models compared to the traditional logistic model. Regarding the factors influencing disclosure, we consistently find that environmental scores are important, which emphasizes the role of the emerging mega-trend of ESG management practices in disclosure decisions. However, in contrast to recent studies, we do not find that the unique Korean governance structure, the chaebol, has any significantly different implications in terms of prediction performance and variable importance in carbon disclosure decisions.

5 GBDT_KgluSite: An improved computational prediction model for lysine glutarylation sites based on feature fusion and GBDT classifier.

Impact index: 4.547
Published: 11 Dec 2023
Journal: BMC Genomics; PMID: 38082413

DOI: 10.1186/s12864-023-09834-z
Article type: Journal Article

BACKGROUND: Lysine glutarylation (Kglu) is one of the most important post-translational modifications (PTMs) and plays significant roles in various cellular functions, including metabolism, mitochondrial processes, and translation. Therefore, accurate identification of Kglu sites is important for elucidating protein molecular function. Because traditional biological experiments are time-consuming and expensive, computational Kglu site prediction is gaining more and more attention.
RESULTS: In this paper, we propose GBDT_KgluSite, a novel Kglu site prediction model based on GBDT and appropriate feature combinations, which achieved satisfactory performance. Specifically, seven features, including sequence-based features, physicochemical property-based features, structure-based features, and evolutionary-derived features, were used to characterize proteins. NearMiss-3 and Elastic Net were applied to address data imbalance and feature redundancy, respectively. The experimental results show that GBDT_KgluSite has good robustness and generalization ability, with accuracy and AUC values of 93.73% and 98.14% on five-fold cross-validation and 90.11% and 96.75% on the independent test dataset, respectively.
CONCLUSIONS: GBDT_KgluSite is an effective computational method for identifying Kglu sites in protein sequences. It has good stability and generalization ability and could be useful for the identification of new Kglu sites in the future. The relevant code and dataset are available at https://github.com/flyinsky6/GBDT_KgluSite.

6 Prediction of diabetic kidney disease risk using machine learning models: A population-based cohort study of Asian adults.

Impact index: 8.713
Published: 14 Sep 2023
Journal: Elife; PMID: 37706530

DOI: 10.7554/eLife.81878
Article type: Journal Article

BACKGROUND: Machine learning (ML) techniques improve disease prediction by identifying the most relevant features in multidimensional data. We compared the accuracy of ML algorithms for predicting incident diabetic kidney disease (DKD).
METHODS: We utilized longitudinal data from 1365 Chinese, Malay, and Indian participants aged 40-80 y with diabetes but free of DKD who participated in the baseline and 6-year follow-up visit of the Singapore Epidemiology of Eye Diseases Study (2004-2017). Incident DKD (11.9%) was defined as an estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m² with at least a 25% decrease in eGFR at follow-up from baseline. A total of 339 features, including participant characteristics, retinal imaging, and genetic and blood metabolites, were used as predictors. The performances of several ML models were compared to each other and to a logistic regression (LR) model based on established features of DKD (age, sex, ethnicity, duration of diabetes, systolic blood pressure, HbA1c, and body mass index) using the area under the receiver operating characteristic curve (AUC).
RESULTS: The ML model Elastic Net (EN) had the best AUC (95% CI) of 0.851 (0.847-0.856), a relative improvement of 7.0% over LR at 0.795 (0.790-0.801). Sensitivity and specificity were 88.2% and 65.9% for EN vs. 73.0% and 72.8% for LR. The top 15 predictors included age, ethnicity, antidiabetic medication, hypertension, diabetic retinopathy, systolic blood pressure, HbA1c, eGFR, and metabolites related to lipids, lipoproteins, fatty acids, and ketone bodies.
CONCLUSIONS: Our results showed that ML, together with feature selection, improves prediction accuracy of DKD risk in an asymptomatic stable population and identifies novel risk factors, including metabolites.
FUNDING: This study was supported by the National Medical Research Council, NMRC/OFLCG/001/2017 and NMRC/HCSAINV/MOH-001019-00. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
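
The 7.0% figure in the results is a relative difference in AUC, not an absolute one. A short check using only the two AUC values quoted in the abstract (the helper name `relative_gain` is hypothetical):

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


auc_en, auc_lr = 0.851, 0.795  # AUC values quoted in the abstract
absolute_diff = auc_en - auc_lr
gain_pct = 100 * relative_gain(auc_en, auc_lr)
print(f"absolute: {absolute_diff:.3f}, relative: {gain_pct:.1f}%")
```

The absolute gap is 0.056 AUC points; dividing by the LR baseline of 0.795 gives the relative figure.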

7 Bayesian model averaging for predicting factors associated with length of COVID-19 hospitalization.

Impact index: 4.612
Published: 6 Jul 2023
Journal: BMC Med Res Methodol; PMID: 37415112

DOI: 10.1186/s12874-023-01981-x
Article type: Journal Article

BACKGROUND: The length of hospital stay (LOHS) caused by COVID-19 has imposed a financial burden and costs on the healthcare system and a high psychological burden on patients and health workers. The purpose of this study is to adopt Bayesian model averaging (BMA) based on linear regression models and to determine the predictors of the LOHS of COVID-19.
METHODS: In this historical cohort study, of 5100 COVID-19 patients registered in the hospital database, 4996 were eligible to enter the study. The data included demographic, clinical, and biomarker variables and LOHS. Factors affecting the LOHS were fitted in six models: the stepwise, AIC, and BIC methods in classical linear regression models; two BMA models using Occam's Window and Markov Chain Monte Carlo (MCMC) methods; and the GBDT algorithm, a newer machine learning method.
RESULTS: The average length of hospitalization was 6.7 ± 5.7 days. In fitting classical linear models, both the stepwise and AIC methods (R² = 0.168 and adjusted R² = 0.165) performed better than BIC (R² = 0.160 and adjusted R² = 0.158). In fitting the BMA, the Occam's Window model performed better than MCMC, with R² = 0.174. The GBDT method, with R² = 0.64, performed worse than the BMA in the testing dataset but not in the training dataset. Based on the six fitted models, ICU hospitalization, respiratory distress, age, diabetes, CRP, PO2, WBC, AST, BUN, and NLR were significantly associated with predicting the LOHS of COVID-19.
CONCLUSIONS: The BMA with the Occam's Window method has a better fit and better performance than the other models in predicting the factors affecting LOHS in the testing dataset.

8 An objective model for diagnosing comorbid cognitive impairment in patients with epilepsy based on the clinical-EEG functional connectivity features.

Impact index: 5.152
Published: 2022
Journal: Front Neurosci; PMID: 36711136

DOI: 10.3389/fnins.2022.1060814
Article type: Journal Article

Cognitive impairment (CI) is a common disorder in patients with epilepsy (PWEs). An objective assessment method for diagnosing CI in PWEs would be beneficial in practice. This study proposed to construct a diagnostic model for CI in PWEs using clinical features and the phase locking value (PLV) functional connectivity features of the electroencephalogram (EEG).
PWEs who met the inclusion and exclusion criteria were divided into a cognitively normal (CON) group (n = 55) and a CI group (n = 76). The 23 clinical features and 684 PLV EEG features at the time of patient visit were screened and ranked using the Fisher score. Adaptive Boosting (AdaBoost) and Gradient Boosting Decision Tree (GBDT) were used as algorithms to construct diagnostic models of CI in PWEs with pure clinical features, pure PLV EEG features, or combined clinical and PLV EEG features. The performance of these models was assessed using five-fold cross-validation.
The GBDT-built model with combined clinical and PLV EEG features performed best, with accuracy, precision, recall, and F1-score of 90.11%, 93.40%, 89.50%, and 91.39%, respectively, and an area under the curve (AUC) of 0.95. The top 5 features found to influence model performance based on the Fisher scores were abnormal findings on magnetic resonance imaging (MRI) of the head, educational attainment, PLV EEG in the beta (β)-band C3-F4, seizure frequency, and PLV EEG in the theta (θ)-band Fp1-Fz. A total of 12 of the top 5% of features were statistically different PLV EEG features, eight of which were in the θ band.
The model constructed from the combined clinical and PLV EEG features could effectively identify CI in PWEs and has potential as a useful objective evaluation method. The PLV EEG in the θ band could be a potential biomarker for the complementary diagnosis of CI comorbid with epilepsy.
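
For readers unfamiliar with the phase locking value, its standard definition is the magnitude of the mean unit phasor of the phase difference between two channels, PLV = |mean over t of exp(i(φ_a(t) - φ_b(t)))|. The sketch below assumes the instantaneous phases have already been extracted (for example from band-filtered signals); that step, the function name, and the toy phases are assumptions and are not described in the abstract.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """PLV between two phase series (radians); 1 = locked, 0 = no locking."""
    phasors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(phasors) / len(phasors))


# Constant phase lag of 0.5 rad between two channels: fully locked.
chan_a = [0.1 * t for t in range(50)]
chan_b = [p - 0.5 for p in chan_a]
locked = phase_locking_value(chan_a, chan_b)

# Phase differences spread evenly around the circle: no locking.
spread_a = [2 * math.pi * k / 8 for k in range(8)]
spread_b = [0.0] * 8
unlocked = phase_locking_value(spread_a, spread_b)
print(locked, unlocked)
```

One such value per channel pair and frequency band yields a connectivity feature set of the kind the abstract describes.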

9 CED: A case-level explainable paramedical diagnosis via AdaGBDT.

Impact index: 6.698
Published: Feb 2023
Journal: Comput Biol Med; PMID: 36592608

DOI: 10.1016/j.compbiomed.2022.106500
Article type: Journal Article

OBJECTIVE: The rapid growth of medical data has greatly promoted the wide use of machine learning for paramedical diagnosis. Inversely proportional to their performance, most machine learning models generally suffer from a lack of explainability, especially local explainability, that is, case-specific explainability.
METHODS: In this paper, we propose a GBDT (Gradient Boosting Decision Tree)-based explainable model for case-specific paramedical diagnosis and make the following main contributions: (1) an adaptive gradient boosting decision tree (AdaGBDT) model is proposed to boost decision path mining effectively; (2) to learn a case-specific feature importance embedding for a specific patient, bi-side mutual information is applied to characterize the backtracking on the decision path; (3) through collaborative decision-making by the globally explainable AdaGBDT with case-based reasoning (CBR) in the case-specific metric space, some hard cases can be identified by means of visualized interpretation. The performance of our model is evaluated on the Wisconsin diagnostic breast cancer dataset and the UCI heart disease dataset.
RESULTS: Experiments conducted on the two datasets show that our AdaGBDT achieves the best performance, with F1 values of 0.9647 and 0.8405, respectively. Moreover, a series of experimental analyses and case studies further illustrate the excellent performance of the feature importance embedding.
CONCLUSIONS: The proposed case-specific explainable paramedical diagnosis via AdaGBDT has excellent predictive performance, with both promising case-level and consistent global explainability.

10 Epilepsy Seizures Prediction Based on Nonlinear Features of EEG Signal and Gradient Boosting Decision Tree.

Impact index: 4.614
Published: 9 Sep 2022
Journal: Int J Environ Res Public Health; PMID: 36141613

DOI: 10.3390/ijerph191811326
Article type: Journal Article

Epilepsy is a common neurological disorder with sudden and recurrent seizures. Early prediction of seizures and effective intervention can significantly reduce the harm suffered by patients. In this paper, a method based on nonlinear features of the EEG signal and a gradient boosting decision tree (GBDT) is proposed for early prediction of epileptic seizures. First, the EEG signals were divided into two categories: those followed by seizure onset within a period of time (denoted InT) and those that were not. Second, the noise in the EEG was removed using complementary ensemble empirical mode decomposition (CEEMD) and wavelet threshold denoising. Third, the nonlinear features of the two categories of EEG were extracted, including approximate entropy, sample entropy, permutation entropy, spectral entropy, and wavelet entropy. Fourth, a GBDT classifier with random forest as the initial result was designed to distinguish the two categories of EEG. Fifth, a two-step "k of n" method was used to reduce the number of false alarms. The proposed method was evaluated on the EEG data of 13 patients from the CHB-MIT Scalp EEG Database. Based on ten-fold cross-validation, the average accuracy was 91.76% when InT was set to 30 min, and 38 out of 39 seizures were successfully predicted. When InT was set to 40 min, the average accuracy was 92.50% and all 42 selected seizures were successfully predicted. The results indicate the effectiveness of the proposed method for predicting epileptic seizures.
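
The abstract does not spell out its two-step "k of n" rule. As a rough illustration of the general idea only, a single-step version raises an alarm when at least k of the most recent n window-level predictions are positive, so isolated positives are ignored. The function name and the example flags are hypothetical.

```python
from collections import deque


def k_of_n_alarms(window_flags, k, n):
    """Return one alarm decision per window using a sliding k-of-n vote."""
    recent = deque(maxlen=n)  # holds at most the last n window predictions
    alarms = []
    for flag in window_flags:
        recent.append(bool(flag))
        alarms.append(sum(recent) >= k)
    return alarms


# Hypothetical classifier outputs per EEG window (1 = pre-seizure predicted).
flags = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
alarms = k_of_n_alarms(flags, k=3, n=4)
print(alarms)
```

In this example the lone positive in the second window never triggers an alarm; the alarm only turns on once three of the last four windows are positive.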
