ensemble machine learning

  • Article type: Journal Article
    OBJECTIVE: This study aims to develop an ensemble machine learning-based (EML-based) risk prediction model for radiation dermatitis (RD) in patients with head and neck cancer undergoing proton radiotherapy, with the goal of achieving superior predictive performance compared to traditional models.
    METHODS: Data from 57 head and neck cancer patients treated with intensity-modulated proton therapy at Kaohsiung Chang Gung Memorial Hospital were analyzed. The study incorporated 11 clinical and 9 dosimetric parameters. Pearson's correlation was used to eliminate highly correlated variables, followed by feature selection via LASSO to focus on potential RD predictors. Model training involved traditional logistic regression (LR) and advanced ensemble methods such as Random Forest and XGBoost, which were optimized through hyperparameter tuning.
    RESULTS: Feature selection identified six key predictors, including smoking history and specific dosimetric parameters. Ensemble machine learning models, particularly XGBoost, demonstrated superior performance, achieving the highest AUC of 0.890. Feature importance was assessed using SHAP (SHapley Additive exPlanations) values, which underscored the relevance of various clinical and dosimetric factors in predicting RD.
    CONCLUSIONS: The study confirms that EML methods, especially XGBoost with its boosting algorithm, provide superior predictive accuracy, enhanced feature selection, and improved data handling compared to traditional LR. While LR offers greater interpretability, the precision and broader applicability of EML make it more suitable for complex medical prediction tasks, such as predicting radiation dermatitis. Given these advantages, EML is highly recommended for further research and application in clinical settings.
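
    The correlation-filtering step described in the METHODS above can be sketched roughly as follows. This is only an illustration under stated assumptions: the 0.8 cutoff, the greedy keep-first rule, and the feature names and values are all hypothetical and do not come from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.8):
    """Greedily keep a feature only if |r| with every kept feature is below threshold.

    The threshold value is an assumption; the abstract does not report one.
    """
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: names and values are invented for illustration.
features = {
    "skin_dmax": [60.0, 62.0, 65.0, 70.0, 58.0],
    "skin_dmean": [30.0, 31.0, 32.5, 35.0, 29.0],  # exactly half of skin_dmax
    "age": [51.0, 67.0, 45.0, 59.0, 72.0],
}
selected = drop_correlated(features)
print(selected)
```

    In a pipeline like the one described, the surviving features would then go to LASSO for further selection before model training.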

  • Article type: Journal Article
    Source and raw water quality may deteriorate due to rainfall and river flow events that occur in watersheds. The effects on raw water quality are normally detected in drinking water treatment plants (DWTPs) with a time lag after these events in the watersheds. Early warning systems (EWSs) in DWTPs require models with high accuracy in order to anticipate changes in raw water quality parameters. Ensemble machine learning (EML) techniques have recently been used for water quality modeling to improve accuracy and decrease variance in the outcomes. We used three decision-tree-based EML models (random forest [RF], gradient boosting [GB], and eXtreme Gradient Boosting [XGB]) to predict two critical parameters for DWTPs, raw water turbidity and UV absorbance (UV254), using rainfall and river flow time series as predictors. When modeling raw water turbidity, the three EML models (r² = 0.87 for RF, 0.80 for GB, and 0.81 for XGB) showed very good performance metrics. For raw water UV254, the three models (r² = 0.89 for RF, 0.85 for GB, and 0.88 for XGB) again showed very good performance metrics. Results from this study suggest that EML approaches could be used in EWSs to anticipate changes in the quality parameters of raw water and enhance decision-making in DWTPs.
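
    One common way to let a tree-based model see the time lag between watershed events and raw water quality is to feed it lagged copies of the predictor series. The sketch below shows that idea only; the abstract does not say how the authors encoded the lag, and the function name, lag depth, and sample values are hypothetical.

```python
def lagged_rows(rainfall, flow, target, max_lag=2):
    """Pair each target value with rainfall and flow at the same step and at earlier steps.

    Returns a list of (features, target) tuples. Features are ordered
    [rain(t), flow(t), rain(t-1), flow(t-1), ...] up to max_lag.
    """
    rows = []
    for t in range(max_lag, len(target)):
        x = []
        for lag in range(max_lag + 1):
            x.append(rainfall[t - lag])
            x.append(flow[t - lag])
        rows.append((x, target[t]))
    return rows

# Hypothetical daily series, invented for illustration.
rain = [0.0, 5.0, 12.0, 3.0, 0.0]
flow = [1.0, 1.2, 2.5, 2.0, 1.4]
turbidity = [2.0, 2.1, 4.0, 6.5, 3.0]

rows = lagged_rows(rain, flow, turbidity, max_lag=2)
for x, y in rows:
    print(x, "->", y)
```

    Each row could then serve as one training example for a regressor such as a random forest.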

  • Article type: Journal Article
    This study proposes an intelligent model for predicting abiotic stress-responsive microRNAs in plants. MicroRNAs (miRNAs) are short RNA molecules that regulate the stress response of genes. Experimental methods are costly and time-consuming compared with in-silico prediction. To address this gap, the study seeks to develop an efficient computational model for predicting plant stress response. Two benchmark datasets, one of miRNA and one of Pre-miRNA sequences, were acquired. Four ensemble approaches (bagging, boosting, stacking, and blending) were employed, with Random Forest (RF), Extra Trees (ET), AdaBoost (ADB), Light Gradient Boosting Machine (LGBM), and Support Vector Machine (SVM) as classifiers. Stacking and blending used all of these classifiers as base learners and Logistic Regression (LR) as the meta-classifier. Four types of testing were used: independent set, self-consistency, cross-validation with 5 and 10 folds, and jackknife. The evaluation metrics were accuracy, specificity, sensitivity, Matthews correlation coefficient (MCC), and AUC. The proposed methodology outperformed the existing state-of-the-art study on both datasets in independent-set testing. The SVM-based approach reached an accuracy of 0.659 on the miRNA dataset, better than the previous study, and the ET classifier reached 0.67 on the Pre-miRNA dataset, surpassing the existing benchmark study. The proposed method can be used in future research to predict abiotic stresses in plants.
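
    The evaluation metrics named above (accuracy, sensitivity, specificity, MCC) all follow from the four cells of a binary confusion matrix. A minimal sketch, using invented labels rather than any data from the study:

```python
from math import sqrt

def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and MCC for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        # MCC is conventionally reported as 0 when the denominator vanishes.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Hypothetical labels, invented for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

    Unlike accuracy, MCC uses all four cells symmetrically, which is why it is often reported alongside accuracy on imbalanced data.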

  • Article type: Journal Article
    The aim of the study was to estimate future groundwater potential zones based on machine learning algorithms and climate change scenarios. Fourteen parameters (i.e., curvature, drainage density, slope, roughness, rainfall, temperature, relative humidity, lineament density, land use and land cover, general soil types, geology, geomorphology, topographic position index (TPI), and topographic wetness index (TWI)) were used in developing the machine learning algorithms. Three machine learning algorithms (i.e., artificial neural network (ANN), logistic model tree (LMT), and logistic regression (LR)) were applied to identify groundwater potential zones. The best-fit model was selected based on the ROC curve. Precipitation under the representative concentration pathway (RCP) 2.6, 4.5, 6.0, and 8.5 climate scenarios was used for modeling future climate change. Finally, future groundwater potential zones were identified for 2025, 2030, 2035, and 2040 based on the best machine learning model and future RCP models. According to the findings, ANN showed better accuracy than the other two models (AUC: 0.875). The ANN model predicted that 23.10 percent of the land was in very high groundwater potential zones, whereas 33.50 percent was in extremely high groundwater potential zones. The study forecasts precipitation values under different climate change scenarios (RCP2.6, RCP4.5, RCP6, and RCP8.5) for 2025, 2030, 2035, and 2040 using an ANN model and shows spatial distribution maps for each scenario. Finally, sixteen scenarios were generated for future groundwater potential zones. Government officials may utilize the study's results to inform evidence-based choices on water management and planning at the national level.
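
    Selecting the best-fit model by ROC curve usually comes down to comparing the area under the curve (AUC). The sketch below computes AUC from its rank interpretation and picks the highest-scoring candidate. The model names and scores are hypothetical, and the study's own ROC procedure may differ.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical validation labels and model scores, invented for illustration.
labels = [1, 1, 0, 1, 0, 0]
candidate_scores = {
    "model_a": [0.9, 0.8, 0.3, 0.6, 0.4, 0.2],
    "model_b": [0.7, 0.4, 0.5, 0.6, 0.3, 0.8],
}

aucs = {name: auc(labels, s) for name, s in candidate_scores.items()}
best = max(aucs, key=aucs.get)
print(aucs, best)
```

    The pairwise comparison is quadratic in sample size, which is fine for a sketch but slow on large validation sets.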

  • Article type: Journal Article
    Our objectives were to (1) employ ensemble machine learning algorithms utilizing real-world clinical data to predict 90-day prognosis, including dialysis dependence and mortality, following the first hospitalized dialysis and (2) identify the significant factors associated with overall outcomes.
    We identified hospitalized patients with acute kidney injury requiring dialysis (AKI-D) from a dataset of the Taipei Medical University Clinical Research Database (TMUCRD) from January 2008 to December 2020. The extracted data comprise demographics, comorbidities, medications, and laboratory parameters. Ensemble machine learning models were developed utilizing real-world clinical data through the Google Cloud Platform.
    The study analyzed 1080 patients in the dialysis-dependent module, of whom 616 received regular dialysis after 90 days. Our ensemble model, consisting of 25 feedforward neural network models, demonstrated the best performance, with an AUROC of 0.846. We identified the baseline creatinine value, assessed at least 90 days before the initial dialysis, as the most crucial factor. We selected 2358 patients, 984 of whom were deceased after 90 days, for the survival module. The ensemble model, comprising 15 feedforward neural network models and 10 gradient-boosted decision tree models, achieved superior performance with an AUROC of 0.865. The pre-dialysis creatinine value, tested within 90 days prior to the initial dialysis, was identified as the most significant factor.
    Compared with the existing literature, ensemble machine learning models outperformed logistic regression models in predicting AKI-D outcomes. Our study, which includes a large sample from three different hospitals, supports the significance of the creatinine value tested before the first hospitalized dialysis in determining overall prognosis. Healthcare providers could benefit from utilizing our validated prediction model to improve clinical decision-making and enhance patient care for the high-risk population.
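
    An ensemble built from many member models, as described above, needs a rule for combining their outputs. The abstract does not state the rule used, so the sketch below assumes the simplest option, an unweighted average of predicted probabilities followed by a 0.5 threshold. All numbers are invented.

```python
def average_ensemble(member_probs):
    """Average per-patient probabilities across ensemble members.

    member_probs holds one list per member model, each with one probability per patient.
    """
    n_members = len(member_probs)
    return [sum(col) / n_members for col in zip(*member_probs)]

# Hypothetical outputs of three member models for three patients.
member_probs = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.5],
    [0.8, 0.3, 0.1],
]

ensemble = average_ensemble(member_probs)
predicted = [int(p >= 0.5) for p in ensemble]  # assumed decision threshold
print(ensemble, predicted)
```

    Averaging tends to reduce variance when member errors are not strongly correlated, which is the usual motivation for ensembles of this kind.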

  • Article type: Journal Article
    Due to the intricate relationship between small non-coding ribonucleic acid (miRNA) sequences, the classification of miRNA species, namely Human, Gorilla, Rat, and Mouse, is challenging. Previous methods are neither robust nor accurate. In this study, we present AtheroPoint's GeneAI 3.0, a powerful, novel, and generalized method for extracting features from the fixed patterns of purines and pyrimidines in each miRNA sequence, within ensemble paradigms in machine learning (EML) and convolutional neural network (CNN)-based deep learning (EDL) frameworks. GeneAI 3.0 used five conventional features (Entropy, Dissimilarity, Energy, Homogeneity, and Contrast) and three contemporary features (Shannon entropy, Hurst exponent, Fractal dimension) to generate a composite feature set from given miRNA sequences, which was then passed into our ML and DL classification framework. A set of 11 new classifiers, consisting of 5 EML and 6 EDL models, was designed for binary/multiclass classification. It was benchmarked against 9 solo ML (SML), 6 solo DL (SDL), and 12 hybrid DL (HDL) models, for a total of 11 + 27 = 38 models. Four hypotheses were formulated and validated using explainable AI (XAI) as well as reliability/statistical tests. The order of the mean performance, using accuracy (ACC)/area under the curve (AUC), of the 24 DL classifiers was: EDL > HDL > SDL. The mean performance of EDL models with CNN layers was superior to that of EDL models without CNN layers by 0.73%/0.92%. The mean performance of EML models was superior to that of SML models, with ACC/AUC improvements of 6.24%/6.46%. EDL models performed significantly better than EML models, with a mean increase in ACC/AUC of 7.09%/6.96%. The GeneAI 3.0 tool produced the expected XAI feature plots, and the statistical tests showed significant p-values. Ensemble models with composite features are highly effective and generalized models for classifying miRNA sequences.
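
    Two of the ingredients mentioned above, the purine/pyrimidine pattern of a sequence and its Shannon entropy, are easy to illustrate. The sketch below is not the GeneAI 3.0 feature extractor, and the example sequence is made up. It only shows what such features look like for a sequence containing A, C, G, and U.

```python
from collections import Counter
from math import log2

PURINES = {"A", "G"}

def purine_pyrimidine(seq):
    """Map an RNA sequence to R (purine: A, G) or Y (pyrimidine: C, U).

    Assumes the sequence contains only A, C, G, and U.
    """
    return "".join("R" if base in PURINES else "Y" for base in seq.upper())

def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the symbol frequencies in a string."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())

seq = "AUGCAUGC"  # hypothetical sequence, not a real miRNA
ry = purine_pyrimidine(seq)
print(ry, shannon_entropy(seq), shannon_entropy(ry))
```

    Collapsing four bases into two classes halves the maximum possible entropy, from 2 bits to 1 bit.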

  • Article type: Journal Article
    Combining the results of base models to create a meta-model is one of the ensemble approaches known as stacking. In this study, stacking of five base learners, including eXtreme gradient boosting, random forest, feed-forward neural networks, generalized linear models with Lasso or Elastic Net regularization, and support vector machines, was used to study the spatial variation of Mn, Cd, Pb, and nitrate in the Qom-Kahak aquifers, Iran. The stacking strategy proved to be an effective substitute for existing machine learning approaches due to its high accuracy and stability compared with individual learners. In contrast, no single base model performed best for all of the parameters involved. For instance, in the case of cadmium, random forest produced the best results among the base learners, with adjusted R² and RMSE of 0.108 and 0.014, as opposed to 0.337 and 0.013 obtained by the stacking method. Mn and Cd were tightly linked with phosphate in the redundancy analysis (RDA). This demonstrates the effect of phosphate fertilizers on agricultural operations. To analyze the causes of groundwater pollution, spatial methodologies can be used with multivariate analytic techniques, such as RDA, to help uncover hidden sources of contamination that would otherwise go undetected. Lead poses a larger health risk than nitrate, according to the probabilistic health risk assessment, which found that 34.4% and 6.3% of the simulated values for children and adults, respectively, were higher than HQ = 1. Furthermore, cadmium exposure risk affected 84% of children and 47% of adults in the research area.
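
    The stacking idea in the first sentence can be reduced to a very small example: a meta-model that learns how to weight two base models' predictions. The sketch below deliberately simplifies the study's setup, which used five base learners and an unspecified meta-learner. The function names and data are hypothetical.

```python
from math import sqrt

def fit_blend_weight(pred_a, pred_b, y):
    """Least-squares weight w for w*pred_a + (1 - w)*pred_b, clipped to [0, 1]."""
    num = sum((t - b) * (a - b) for a, b, t in zip(pred_a, pred_b, y))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0:  # identical base predictions: any weight gives the same blend
        return 0.5
    return min(1.0, max(0.0, num / den))

def rmse(pred, y):
    """Root mean squared error."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))

# Hypothetical held-out targets and base-model predictions.
y = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.5, 2.5, 3.5, 4.5]  # base model that overestimates by 0.5
pred_b = [0.5, 1.5, 2.5, 3.5]  # base model that underestimates by 0.5

w = fit_blend_weight(pred_a, pred_b, y)
stacked = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
print(w, rmse(pred_a, y), rmse(pred_b, y), rmse(stacked, y))
```

    In practice the meta-model is fitted on out-of-fold predictions so that it does not simply reward base learners that overfit the training data.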

  • Article type: Journal Article
    No abstract available.

  • Article type: Journal Article
    BACKGROUND: Ticks are an important driver of veterinary health care, causing irritation and sometimes infection to their hosts. We explored epidemiological and geo-referenced data from > 7 million electronic health records (EHRs) from cats and dogs collected by the Small Animal Veterinary Surveillance Network (SAVSNET) in Great Britain (GB) between 2014 and 2021 to assess the factors affecting tick attachment in an individual and at a spatiotemporal level.
    METHODS: EHRs in which ticks were mentioned were identified by text mining; domain experts confirmed those with ticks on the animal. Tick presence/absence records were overlaid with a spatiotemporal series of climate, environment, anthropogenic and host distribution factors to produce a spatiotemporal regression matrix. An ensemble machine learning spatiotemporal model was used to fine-tune hyperparameters for Random Forest, Gradient-boosted Trees and Generalized Linear Model regression algorithms, which were then used to produce a final ensemble meta-learner to predict the probability of tick attachment across GB at a monthly interval and averaged long-term through 2014-2021 at a spatial resolution of 1 km. Individual host factors associated with tick attachment were also assessed by conditional logistic regression on a matched case-control dataset.
    RESULTS: In total, 11,741 consultations were identified in which a tick was recorded. The frequency of tick records was low (0.16% EHRs), suggesting an underestimation of risk. That said, increased odds for tick attachment in cats and dogs were associated with younger adult ages, longer coat length, crossbreeds and unclassified breeds. In cats, males and entire animals had significantly increased odds of recorded tick attachment. The key variables controlling the spatiotemporal risk for tick attachment were climatic (precipitation and temperature) and vegetation type (Enhanced Vegetation Index). Suitable areas for tick attachment were predicted across GB, especially in forests and grassland areas, mainly during summer, particularly in June.
    CONCLUSIONS: Our results can inform targeted health messages to owners and veterinary practitioners, identifying those animals, seasons and areas of higher risk for tick attachment and allowing for more tailored prophylaxis to reduce tick burden, inappropriate parasiticide treatment and potentially TBDs in companion animals and humans. Sentinel networks like SAVSNET represent a novel complementary data source to improve our understanding of tick attachment risk for companion animals and as a proxy of risk to humans.
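
    For intuition about the conditional logistic regression on matched case-control data mentioned in the METHODS, consider the simplest special case: 1:1 matching and a single binary exposure. There the conditional maximum-likelihood odds ratio is the ratio of the two kinds of discordant pairs. The matching ratio and covariates used in the study are not stated in the abstract, so this is only an illustrative sketch with invented data.

```python
def matched_pair_odds_ratio(pairs):
    """Odds ratio from 1:1 matched pairs given as (case_exposed, control_exposed) booleans.

    Only discordant pairs carry information. Concordant pairs cancel out.
    """
    case_only = sum(1 for case, control in pairs if case and not control)
    control_only = sum(1 for case, control in pairs if control and not case)
    if control_only == 0:
        raise ValueError("no pairs in which only the control is exposed")
    return case_only / control_only

# Hypothetical pairs, e.g. exposure = "long coat", invented for illustration.
pairs = (
    [(True, False)] * 6    # only the case is exposed
    + [(False, True)] * 2  # only the control is exposed
    + [(True, True)] * 5   # concordant pairs
    + [(False, False)] * 7
)
odds_ratio = matched_pair_odds_ratio(pairs)
print(odds_ratio)
```

    With several covariates or other matching ratios, the estimate would come from fitting the full conditional likelihood instead of this closed form.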

  • Article type: Journal Article
    Comprehensively projecting global fertilizer consumption is essential for providing critical datasets in related fields such as earth system simulation, the fertilizer industry, and agricultural sciences. However, since previous studies have not fully considered the socioeconomic factors affecting fertilizer consumption, huge uncertainties may remain in fertilizer consumption projections. Here, an approach that ensembles six machine learning algorithms is proposed to predict global fertilizer consumption from 2020 to 2100 by considering the impact of socioeconomic factors under shared socioeconomic pathway (SSP) scenarios. The results indicate that the proposed approach provides a rational and reliable framework for fertilizer consumption prediction that consistently outperforms the single algorithms, with relatively high accuracy (Nash-Sutcliffe efficiency of 0.93, Kling-Gupta efficiency of 0.89, and mean absolute percentage error of 10.97%). We found that global N and P fertilizer consumption may decrease from 2020 to 2100, while K fertilizer may buck the trend. N fertilizer consumption showed a declining trend of -1%, -17.13%, and -3.43% under the SSP1, SSP2, and SSP3 scenarios in 2100, respectively. For P fertilizer, the corresponding values were -0.68%, -9.68%, and -2.03%. In contrast, global K fertilizer consumption may increase by 18.03%, 9.18%, and 6.74%, respectively. On average, N, P, and K fertilizer consumption is highest in China and lowest in Kazakhstan. However, the hotspots of N fertilizer consumption may shift from China to Latin America and the Caribbean. This study highlights that the ensemble machine learning (EML) approach could be a robust method for predicting future fertilizer consumption. Our prediction product will not only contribute to a better understanding of global fertilizer consumption trends and dynamics but also provide flexible and accurate key data/parameters for related research. The Projected Global Fertilizers Consumption Datasets are available at https://doi.org/10.5281/zenodo.8195593 (Gao et al., 2023).
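
    The three accuracy measures quoted above have short closed-form definitions. The sketch below uses the standard Nash-Sutcliffe formula, the original (2009) Kling-Gupta formulation, and the usual mean absolute percentage error. Which Kling-Gupta variant the authors used is not stated, and the observed and simulated values here are invented.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error relative to variance around the observed mean."""
    mo = mean(obs)
    return 1 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / sum((o - mo) ** 2 for o in obs)

def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    mo, ms = mean(obs), mean(sim)
    so = sqrt(mean([(o - mo) ** 2 for o in obs]))
    ss = sqrt(mean([(s - ms) ** 2 for s in sim]))
    r = mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)]) / (so * ss)
    return 1 - sqrt((r - 1) ** 2 + (ss / so - 1) ** 2 + (ms / mo - 1) ** 2)

def mape(obs, sim):
    """Mean absolute percentage error, in percent. Observed values must be nonzero."""
    return 100 * mean([abs((o - s) / o) for o, s in zip(obs, sim)])

# Hypothetical observed and simulated consumption values.
obs = [100.0, 120.0, 140.0, 160.0]
sim = [110.0, 115.0, 150.0, 155.0]
print(nse(obs, sim), kge(obs, sim), mape(obs, sim))
```

    NSE and KGE both equal 1 for a perfect fit, while MAPE equals 0, so higher is better for the first two and lower is better for the third.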