Ensemble methods

  • Article type: Journal Article
    This study employs machine learning techniques to identify factors that influence extended Emergency Department (ED) length of stay (LOS) and derives transparent decision rules to complement the results. On a comprehensive dataset, Gradient Boosting showed marginally better predictive performance than Random Forest for LOS classification. Notably, variables such as triage acuity and the Elixhauser Comorbidity Index (ECI) emerged as robust predictors. The extracted rules optimize LOS stratification and resource allocation, demonstrating the critical role of data-driven methods in improving ED workflow efficiency and patient care delivery.
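Random Forest is built from decision trees, and if/then rules of the kind this study extracts can be read off tree splits. A minimal sketch of that building block, assuming a single split is enough to illustrate it: a greedy "decision stump" search scored by size-weighted Gini impurity (the criterion Random Forest classifiers commonly use; Gradient Boosting trees usually fit residuals with a squared-error criterion instead), printed as a readable rule. This is not the study's rule-extraction procedure, and the feature names, values, and labels are hypothetical.

```python
# Hypothetical sketch: greedy one-split search (a "decision stump") scored by
# size-weighted Gini impurity, printed as an IF/THEN rule. Feature names,
# values, and labels are invented toy data, not the study's dataset.

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels):
    """Majority class of 0/1 labels (ties go to class 0)."""
    return 1 if sum(labels) * 2 > len(labels) else 0

def best_stump(rows, labels):
    """Return (feature_index, threshold, left_class, right_class) minimizing
    the size-weighted Gini impurity of the split `x[j] <= threshold`."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    if best is None:
        raise ValueError("no valid split: every feature is constant")
    return best[1:]

def as_rule(stump, feature_names):
    """Render a stump as a human-readable decision rule."""
    j, t, left_cls, right_cls = stump
    return f"IF {feature_names[j]} <= {t} THEN class {left_cls} ELSE class {right_cls}"

# Toy records: [triage_acuity, eci_score]; label 1 = prolonged LOS (hypothetical).
names = ["triage_acuity", "eci_score"]
rows = [[1, 8], [2, 6], [2, 1], [3, 0], [4, 2], [5, 0]]
labels = [1, 1, 1, 0, 0, 0]
stump = best_stump(rows, labels)
print(as_rule(stump, names))  # IF triage_acuity <= 2 THEN class 1 ELSE class 0
```

A full tree repeats this search recursively on each side of the split; each root-to-leaf path then becomes one rule.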

  • Article type: Journal Article
    Pneumonia is a serious health concern, particularly for vulnerable groups, and requires early and accurate classification for optimal treatment. This study addresses the use of deep learning combined with machine learning classifiers (DLxMLCs) for pneumonia classification from chest X-ray (CXR) images. We deployed modified VGG19, ResNet50V2, and DenseNet121 models for feature extraction, followed by five machine learning classifiers (logistic regression, support vector machine, decision tree, random forest, and artificial neural network). The proposed approach achieved remarkable accuracy: VGG19 and DenseNet121 reached 99.98% when combined with random forest or decision tree classifiers, and ResNet50V2 reached 99.25% with random forest. These results illustrate the advantages of combining deep learning models with machine learning classifiers for rapid and accurate identification of pneumonia. The study underlines the potential of DLxMLC systems to improve diagnostic accuracy and efficiency. By integrating these models into clinical practice, healthcare practitioners could substantially improve patient care and outcomes. Future research should focus on refining these models, exploring their application to other medical imaging tasks, and incorporating explainability methods to better understand their decision-making processes and build trust in their clinical use. This technique holds promise for advances in medical imaging and patient management.
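The DLxMLC pattern splits the work into two stages: a CNN backbone turns an image into feature maps, the maps are reduced to a fixed-length vector, and a classical classifier is trained on those vectors. A minimal sketch of the second half, under stated assumptions: global average pooling is assumed as the reduction step (the paper's "modified" backbones may use a different head), the 2-channel 2x2 "feature maps" are invented stand-ins for real CNN output, and logistic regression (one of the five classifiers listed) is trained by plain batch gradient descent.

```python
import math

# Hypothetical sketch of the "deep features -> classical classifier" pattern.
# The toy 2x2 "feature maps" stand in for a CNN backbone's output; they are
# invented, and global average pooling is an assumed feature-extraction head.

def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a length-C vector."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj / n for g, xj in zip(grad_w, xi)]
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(model, x):
    """Class 1 if the predicted probability is at least 0.5, else class 0."""
    w, b = model
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Two channels per map: channel 0 is high for the (hypothetical) pneumonia class.
maps = [
    [[[0.9, 1.0], [0.8, 1.1]], [[0.1, 0.0], [0.2, 0.1]]],  # label 1
    [[[0.7, 0.9], [1.0, 0.8]], [[0.2, 0.1], [0.0, 0.3]]],  # label 1
    [[[0.1, 0.2], [0.0, 0.1]], [[0.9, 0.8], [1.0, 0.7]]],  # label 0
    [[[0.2, 0.0], [0.1, 0.1]], [[0.6, 0.9], [0.8, 1.1]]],  # label 0
]
labels = [1, 1, 0, 0]
features = [global_average_pool(m) for m in maps]
model = train_logistic_regression(features, labels)
print([predict(model, f) for f in features])
```

Swapping the classifier (for example, a random forest) only changes the second stage; the pooled feature vectors stay the same, which is what makes this combination convenient to compare across backbones.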

  • Article type: Journal Article
    Coffee breeding programs have traditionally relied on observing plant characteristics over many years, a slow and costly process. Genomic selection (GS) offers a DNA-based alternative for faster selection of superior cultivars. Stacking Ensemble Learning (SEL) combines multiple models for potentially even more accurate selection. This study explores the potential of SEL in coffee breeding, aiming to improve prediction accuracy for important traits [yield (YL), total number of fruits (NF), leaf miner infestation (LM), and cercosporiosis incidence (Cer)] in Coffea arabica. We analyzed data from 195 individuals genotyped for 21,211 single-nucleotide polymorphism (SNP) markers. To comprehensively assess model performance, we employed a cross-validation (CV) scheme. Genomic Best Linear Unbiased Prediction (GBLUP), multivariate adaptive regression splines (MARS), Quantile Random Forest (QRF), and Random Forest (RF) served as base learners. For the meta-learner within the SEL framework, several options were explored, including Ridge Regression, RF, GBLUP, and Single Average. SEL was able to predict important traits of Coffea arabica, and its predictive ability (PA) was higher than that of every base learner. Relative to GBLUP, the gains in PA were 87.44%, 37.83%, 199.82%, and 14.59% for YL, NF, LM, and Cer, respectively, computed from the ratio between the PA of the best stacking model and that of GBLUP. Overall, SEL is a promising approach for GS: by combining predictions from multiple models, it can potentially enhance the PA of GS for complex traits.
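In stacking, each base learner (GBLUP, MARS, QRF, RF) first produces predictions, ideally out-of-fold from the CV scheme so that the meta-learner never sees predictions made on training data, and a meta-learner then combines them. A minimal sketch of two of the meta-learner options named above, under stated assumptions: ridge regression without an intercept (assuming centered phenotypes), a plain equal-weight average as a reading of "Single Average", and PA taken as the Pearson correlation between predicted and observed values, as is common in GS. The base-learner predictions in the usage lines are hypothetical.

```python
import math

# Hypothetical sketch of a stacking meta-learner. Each row of `base_preds` holds
# the (ideally out-of-fold) predictions of k base learners for one individual.
# Assumes centered phenotypes, so the ridge meta-learner has no intercept.

def pearson(a, b):
    """Pearson correlation; used here as predictive ability (PA)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def solve(A, rhs):
    """Solve A w = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_ridge_meta(base_preds, y, lam=1.0):
    """Ridge weights w = (P^T P + lam * I)^-1 P^T y over the k base learners."""
    k = len(base_preds[0])
    A = [[sum(p[a] * p[c] for p in base_preds) + (lam if a == c else 0.0)
          for c in range(k)] for a in range(k)]
    rhs = [sum(p[a] * yi for p, yi in zip(base_preds, y)) for a in range(k)]
    return solve(A, rhs)

def stack_predict(weights, preds):
    """Meta-learner output for one individual: weighted sum of base predictions."""
    return sum(w * p for w, p in zip(weights, preds))

def simple_average(preds):
    """Equal-weight alternative to a fitted meta-learner."""
    return sum(preds) / len(preds)

# Toy example: two base learners whose predictions combine exactly as 0.3/0.7.
P = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [0.0, 1.0]]
y = [0.3 * a + 0.7 * b for a, b in P]
w = fit_ridge_meta(P, y, lam=0.0)  # lam=0 reduces ridge to least squares
print(w, pearson([stack_predict(w, p) for p in P], y))
```

With `lam > 0` the weights shrink toward zero, which stabilizes the meta-learner when base-learner predictions are highly correlated, as they often are in GS.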

  • Article type: Journal Article
    In today's digital world, with a growing population and increasing pollution, unhealthy lifestyle habits such as irregular eating, junk food consumption, and lack of exercise are becoming more common, leading to various health problems, including kidney disorders. These factors directly affect kidney health. Addressing this requires early detection techniques that rely on text data. Text data contains detailed information about a patient's medical history, symptoms, test results, and treatment plans, giving a complete picture of kidney health and enabling timely intervention. In this paper, we propose a range of models: Gradient Boosting Classifier, LightGBM, CatBoost, Support Vector Classifier (SVC), Random Boost, Logistic Regression, XGBoost, a Deep Neural Network (DNN), and an Improved DNN. The Improved DNN performed best, with an accuracy of 90%, precision of 89%, recall of 90%, and an F1-score of 89.5%. By combining traditional machine learning and deep neural networks, this integrative approach can identify intricate patterns in the data. The model's data-driven training continually updates its internal parameters, providing flexibility to adapt to evolving healthcare settings. This research is a notable step toward more detailed and individualized diagnosis of kidney stones, which could lead to better clinical outcomes and patient treatment.
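The four reported metrics are linked: F1 is the harmonic mean of precision and recall, and 2 × 0.89 × 0.90 / (0.89 + 0.90) ≈ 0.895, so the reported 89.5% F1 is consistent with the reported precision and recall. A minimal sketch of how the metrics follow from a binary confusion matrix; the counts in the second usage line are hypothetical.

```python
# Minimal sketch of the reported metrics. The confusion-matrix counts in the
# usage lines are hypothetical; only precision 0.89 / recall 0.90 come from
# the abstract.

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def classification_metrics(tp, fp, fn, tn):
    """Binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }

print(round(100 * f1_score(0.89, 0.90), 1))  # 89.5
print(classification_metrics(tp=9, fp=1, fn=1, tn=9))
```

If the paper's metrics are macro-averaged over several classes, F1 would be averaged per class rather than computed from the averaged precision and recall, so the match above is a consistency check rather than a reconstruction.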

  • Article type: Journal Article
    This research proposes a novel three-tier, AI-based scheme for allocating carbon-neutral mobility hubs. First, it identified optimal sites using a genetic algorithm that optimized travel times, achieving a high fitness value of 77,000,000. Second, it performed an ensemble-based suitability analysis of the pinpointed locations, using factors such as land-use mix, population and employment densities, and proximity to parking, biking, and transit. Each factor is weighted by its contribution to carbon emissions and then incorporated into a suitability analysis model, which generates scores that guide the final selection of the most suitable mobility hub sites. The final step employs a traffic assignment model to evaluate the environmental and economic impacts of these sites, including reductions in vehicle kilometers traveled and other cost savings. Addressing Sustainable Development Goals (SDGs) 11 and 9, this study leverages advanced techniques to enhance transportation planning policies. The ensemble model demonstrated strong predictive accuracy, with an R-squared of 95% in training and 53% in testing. The identified hub sites reduced daily vehicle travel by 771,074 km, leading to annual savings of 225.5 million USD. By integrating carbon-focused analyses with post-assessment evaluation, this approach offers a comprehensive framework for sustainable mobility hub planning.
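The second tier combines several site factors into one suitability score. A minimal sketch of that kind of weighted scoring, under stated assumptions: each factor is min-max normalized across candidate sites, weights are rescaled to sum to 1, and the score is the weighted sum. The factor names, raw values, and weights are hypothetical; the study derives its weights from each factor's carbon emissions contribution, which is not reproduced here.

```python
# Hypothetical sketch of weighted suitability scoring. Factor names, raw values,
# and weights are invented; the study's carbon-based weights are not reproduced.

def min_max(values):
    """Rescale values to [0, 1]; a constant factor contributes 0 everywhere."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def suitability_scores(sites, weights):
    """sites: {site: {factor: raw value}}, where a higher raw value = more suitable.
    weights: {factor: weight}. Returns {site: score in [0, 1]}."""
    names = list(sites)
    total = sum(weights.values())
    norm = {f: min_max([sites[n][f] for n in names]) for f in weights}
    return {n: sum(weights[f] / total * norm[f][i] for f in weights)
            for i, n in enumerate(names)}

sites = {
    "A": {"land_use_mix": 0.8, "pop_density": 9000, "job_density": 5000},
    "B": {"land_use_mix": 0.5, "pop_density": 4000, "job_density": 3000},
    "C": {"land_use_mix": 0.2, "pop_density": 1000, "job_density": 500},
}
weights = {"land_use_mix": 0.5, "pop_density": 0.3, "job_density": 0.2}
scores = suitability_scores(sites, weights)
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C']
```

Distance-type factors, such as proximity to parking or transit measured in meters, would need to be inverted (for example, `max_distance - distance`) before scoring, because this sketch treats larger raw values as better.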

  • Article type: Journal Article
    Obstructive sleep apnea/hypopnea syndrome (OSAHS) is a condition linked to severe cardiovascular and neuropsychological consequences. It is characterized by recurrent episodes of partial or complete upper airway obstruction during sleep, which lead to compromised ventilation, hypoxemia, and micro-arousals. Polysomnography (PSG) is the gold standard for confirming OSAHS, yet its long duration, high cost, and limited availability pose significant challenges. In this paper, we apply a range of machine learning techniques, including Neural Networks, Decision Trees, Random Forests, and Extra Trees, to OSAHS diagnosis, with the aim of making the diagnostic process both more accessible and more efficient. The dataset consists of records from 601 adults assessed between 2014 and 2016 at a specialized sleep medical center in Colombia. The results underscore the efficacy of ensemble methods: Random Forests and Extra Trees achieved areas under the Receiver Operating Characteristic (ROC) curve of 89.2% and 89.6%, respectively. Additionally, we developed a web application that integrates the optimal model and allows qualified medical practitioners to register patients, enter 18 variables, and use the Random Forests model for OSAHS screening to support informed decisions.
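The area under the ROC curve has a direct probabilistic reading: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that pairwise computation; the scores are hypothetical stand-ins for a model's predicted OSAHS probabilities.

```python
# Minimal sketch: ROC AUC as the probability that a randomly chosen positive
# case scores higher than a randomly chosen negative one (ties count 1/2).
# The scores below are hypothetical stand-ins for predicted OSAHS probabilities.

def roc_auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Under this reading, an AUC of 89.6% means the model ranks a random OSAHS case above a random non-OSAHS case about 9 times in 10. The double loop costs O(positives × negatives), which is trivial for 601 records; rank-based formulas scale better for large datasets.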

  • Article type: Journal Article
    Forecasting is very important in the field of renewable energy because it tells us how much energy can be produced and thus allows energy sources to be managed efficiently. However, determining which prediction system is most adequate is very complex, as each energy infrastructure is different. This work studies the influence of several variables when making predictions with ensemble methods for different locations. In particular, the proposal analyzes three aspects: variation in the sampling frequency of the solar panel systems, the type of neural network architecture, and the number of ensemble method blocks in each model. Following comprehensive experimentation across multiple locations, our study identified the most effective solar energy prediction model for the specific conditions of each energy infrastructure. The results offer a decisive framework for selecting the optimal system for accurate and efficient energy forecasting. The key factor is the use of short time intervals, regardless of the type of prediction model or ensemble method.
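Two of the studied variables can be made concrete: changing the sampling frequency amounts to aggregating a high-frequency series into coarser time steps, and an ensemble combines several member forecasts into one. A minimal sketch under stated assumptions: downsampling by block averaging and combining members by a simple point-wise mean. The study's actual ensemble blocks and network architectures are not reproduced, and the series are hypothetical.

```python
# Hypothetical sketch: coarsen a series (lower sampling frequency) by block
# averaging, and combine member forecasts by a point-wise mean. This is not
# the study's ensemble architecture; the numbers are toy values.

def downsample_mean(series, factor):
    """Aggregate consecutive blocks of `factor` samples into their mean."""
    n = len(series) // factor * factor  # drop an incomplete trailing block
    return [sum(series[i:i + factor]) / factor for i in range(0, n, factor)]

def ensemble_mean(forecasts):
    """Average member forecasts point by point (all members the same length)."""
    return [sum(vals) / len(vals) for vals in zip(*forecasts)]

def mae(pred, actual):
    """Mean absolute error between a forecast and observations."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

# e.g. 15-minute readings aggregated to hourly values (factor 4)
readings = [0.0, 0.2, 0.4, 0.6, 1.0, 1.2, 1.4, 1.6]
hourly = downsample_mean(readings, 4)
members = [[0.25, 1.4], [0.35, 1.2]]
print(hourly, mae(ensemble_mean(members), hourly))
```

Comparing the error of the same ensemble at different `factor` values is one simple way to probe the sampling-frequency effect the abstract reports.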

  • Article type: Journal Article
    Recent developments in quantum technology have opened up new opportunities for machine learning algorithms to assist the healthcare industry in diagnosing complex health disorders such as heart disease. In this work, we evaluate the effectiveness of QuEML in heart disease prediction. To compare QuEML with traditional machine learning algorithms, we used the Kaggle heart disease dataset, which contains 1190 samples, 53% of them labeled positive and the remaining 47% labeled negative. QuEML was compared with traditional machine learning algorithms in terms of accuracy, precision, recall, specificity, F1 score, and training time. In the experiments, the proposed quantum approaches predicted around 50.03% of positive samples as positive and, on average, 44.65% of negative samples as negative, whereas traditional machine learning approaches predicted around 49.78% of positive samples as positive and 44.31% of negative samples as negative. Furthermore, QuEML took an average of 670 µs to train, whereas traditional machine learning algorithms took an average of 862.5 µs. Hence, QuEML was found to be a promising approach to heart disease prediction, with 0.6% higher accuracy and 192.5 µs faster training than traditional machine learning approaches.
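The headline numbers can be reproduced from the reported percentages. A small worked check, under one stated assumption: the "positive" and "negative" figures are read as shares of all 1190 samples (each stays below the 53% / 47% class shares), so their sum is the accuracy. Under that reading the gap is 94.68% versus 94.09%, that is 0.59 points, which rounds to the stated 0.6%, and the training-time gap is 862.5 − 670 = 192.5 µs.

```python
# Worked check of the abstract's arithmetic. Assumption: the reported
# percentages are shares of ALL 1190 samples, so correct-positive share plus
# correct-negative share gives the accuracy.

quantum = {"tp_share": 50.03, "tn_share": 44.65, "train_us": 670.0}
classical = {"tp_share": 49.78, "tn_share": 44.31, "train_us": 862.5}

def accuracy_pct(result):
    """Accuracy = share correctly predicted positive + share correctly predicted negative."""
    return result["tp_share"] + result["tn_share"]

acc_gap = accuracy_pct(quantum) - accuracy_pct(classical)
time_gap = classical["train_us"] - quantum["train_us"]
print(f"accuracy: {accuracy_pct(quantum):.2f}% vs {accuracy_pct(classical):.2f}%")  # accuracy: 94.68% vs 94.09%
print(f"gap: {acc_gap:.2f} points, {time_gap:.1f} us faster training")  # gap: 0.59 points, 192.5 us faster training
```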

  • Article type: Journal Article
    Kinetic process models are widely applied in science and engineering, including atmospheric, physiological, and technical chemistry, reactor design, and process optimization. These models rely on numerous kinetic parameters such as reaction rates, diffusion coefficients, and partitioning coefficients. Determining these properties experimentally can be challenging, especially for multiphase systems, and researchers often have to select experimental conditions intuitively in the hope of obtaining insightful results. We developed a numerical compass (NC) method that integrates computational models, global optimization, ensemble methods, and machine learning to identify the experimental conditions with the greatest potential to constrain model parameters. The approach is based on quantifying the variance of model outputs across an ensemble of solutions that agree with experimental data. The utility of the NC method is demonstrated for the parameters of a multi-layer model describing the heterogeneous ozonolysis of oleic acid aerosols. We show how neural network surrogate models of the multiphase chemical reaction system can accelerate the NC, enabling comprehensive mapping and analysis of experimental conditions. The NC can also be applied to uncertainty quantification for quantitative structure-activity relationship (QSAR) models. We show that the uncertainty calculated for molecules used to extend the training data correlates with the reduction in QSAR model error. The code is openly available as the Julia package KineticCompass.
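The core NC idea, picking the experimental condition where an ensemble of data-consistent solutions disagrees most, can be illustrated in a few lines. A minimal sketch under stated assumptions: a toy first-order decay model, an ensemble of three rate constants taken as "consistent with past data", and measurement time as the only experimental condition. This is written in Python for illustration and is not the KineticCompass API; the model and numbers are hypothetical.

```python
import math
from statistics import pvariance

# Hypothetical sketch of the "numerical compass" idea: given an ensemble of
# parameter values that all fit existing data, pick the candidate experimental
# condition where the ensemble's predictions disagree most. The first-order
# decay model and rate constants are invented; this is not the KineticCompass API.

def model(k, t):
    """Toy kinetic model: fraction of reactant remaining after time t."""
    return math.exp(-k * t)

def output_variance(param_ensemble, condition):
    """Spread of model outputs across the ensemble at one condition."""
    return pvariance([model(k, condition) for k in param_ensemble])

def most_informative_condition(param_ensemble, candidates):
    """Condition with the largest output variance, i.e. the one whose
    measurement would best discriminate between the parameter sets."""
    return max(candidates, key=lambda c: output_variance(param_ensemble, c))

ensemble = [0.5, 1.0, 2.0]          # rate constants consistent with past data
candidates = [0.0, 0.5, 1.0, 5.0]   # possible measurement times
print(most_informative_condition(ensemble, candidates))  # 1.0
```

At t = 0 every ensemble member predicts the same value, and at t = 5 all predictions have decayed close to zero, so neither measurement would constrain k; the intermediate time does. A neural network surrogate, as in the abstract, would replace `model` when the real kinetic model is too expensive to evaluate over many conditions.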

  • Article type: Journal Article
    The rising global incidence of human Mpox cases necessitates prompt and accurate identification for effective disease control. While previous studies have predominantly explored traditional ensemble methods for detection, we introduce a novel approach that leverages a metaheuristic-based ensemble framework. In this research, we present a CGO-Ensemble framework designed to improve the accuracy of detecting Mpox infection in patients. First, we employ five transfer learning base models that incorporate feature integration layers and residual blocks. These components play a crucial role in capturing significant features from skin images, thereby enhancing the models' efficacy. Next, we employ a weighted averaging scheme to consolidate the predictions of the individual models. To optimally allocate the weight of each base model in the ensemble, we use the Chaos Game Optimization (CGO) algorithm. This strategic weight assignment considerably improves classification outcomes over randomly assigned weights, and the resulting ensemble is notably more accurate than the individual models. We evaluate the proposed approach through comprehensive experiments on two widely recognized benchmark datasets: the Mpox Skin Lesion Dataset (MSLD) and the Mpox Skin Image Dataset (MSID). To gain insight into the decision-making process of the base models, we performed Gradient-weighted Class Activation Mapping (Grad-CAM) analysis. The experimental results show the outstanding performance of the CGO-Ensemble, which achieves an accuracy of 100% on MSLD and 94.16% on MSID. Our approach significantly outperforms other state-of-the-art optimization algorithms, traditional ensemble methods, and existing techniques for Mpox detection on these datasets. These findings underscore the effectiveness of the CGO-Ensemble in accurately identifying Mpox cases, highlighting its potential for disease detection and classification.
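The ensemble step described above is a weighted average of each base model's class probabilities, with the weights chosen by an optimizer. A minimal sketch of that structure, under stated assumptions: plain random search is swapped in for Chaos Game Optimization (CGO is not implemented here), the candidate pool also includes each single model and equal weights, and the class probabilities for three models on four samples are toy values.

```python
import random

# Hypothetical sketch of a weighted-averaging ensemble whose weights are tuned
# by search. Plain random search is swapped in for Chaos Game Optimization
# (CGO), which is NOT implemented here; the probabilities are toy values.
# prob_sets[m][i][c] = model m's probability of class c for sample i.

def weighted_average(prob_sets, weights):
    """Combine per-model class probabilities with normalized weights."""
    total = sum(weights)
    n_samples, n_classes = len(prob_sets[0]), len(prob_sets[0][0])
    return [[sum(w * probs[i][c] for w, probs in zip(weights, prob_sets)) / total
             for c in range(n_classes)] for i in range(n_samples)]

def accuracy(probs, labels):
    """Fraction of samples whose arg-max class matches the label."""
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def search_weights(prob_sets, labels, iters=200, seed=0):
    """Evaluate each single model, equal weights, and `iters` random weight
    vectors; return the best (weights, accuracy). Because single models and
    equal weights are candidates, the result is never worse than either."""
    m = len(prob_sets)
    rng = random.Random(seed)
    candidates = [[1.0 if j == i else 0.0 for j in range(m)] for i in range(m)]
    candidates.append([1.0] * m)
    candidates += [[rng.random() for _ in range(m)] for _ in range(iters)]

    def score(w):
        return accuracy(weighted_average(prob_sets, w), labels)

    best = max(candidates, key=score)
    return best, score(best)

labels = [0, 1, 1, 0]
prob_sets = [
    [[0.6, 0.4], [0.4, 0.6], [0.55, 0.45], [0.45, 0.55]],  # model A
    [[0.4, 0.6], [0.3, 0.7], [0.2, 0.8], [0.7, 0.3]],      # model B
    [[0.9, 0.1], [0.6, 0.4], [0.4, 0.6], [0.8, 0.2]],      # model C
]
weights, acc = search_weights(prob_sets, labels)
print(weights, acc)
```

Any weight optimizer, CGO included, slots into `search_weights` in place of the random candidates. The weights should be tuned on a validation split; accuracy measured on the same samples used to choose the weights overstates performance on new images.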