Stacking

  • Article type: Journal Article
    Heart disease remains a complex and critical health issue, necessitating accurate and timely detection methods.
    In this research, we present an advanced machine learning system designed for efficient and precise diagnosis of cardiac disease. Our approach integrates Random Forest and AdaBoost classifiers with data pre-processing techniques such as standard scaling and Recursive Feature Elimination (RFE) for feature selection. By leveraging the ensemble learning technique of stacking, we enhance the model's predictive performance by combining the strengths of multiple classifiers.
    The evaluation results show an accuracy of 99.25%, demonstrating the effectiveness of the proposed system compared to baseline models.
    Furthermore, the utilization of this system within IoT-enabled healthcare systems shows promising potential for improving heart disease diagnosis and ultimately enhancing patient outcomes.
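    The pipeline described above (standard scaling, RFE, then a stack of Random Forest and AdaBoost) could be sketched with scikit-learn roughly as follows. This is a minimal illustration and not the authors' implementation: the logistic-regression meta-learner, the logistic-regression ranker inside RFE, the number of selected features, and the synthetic data are all assumptions, since the abstract does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_stacked_model(n_features_to_select=8, random_state=0):
    """Scaling -> RFE -> stacking of Random Forest and AdaBoost."""
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=random_state)),
    ]
    # Assumption: a logistic-regression meta-learner trained on
    # 5-fold out-of-fold predictions of the base learners.
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    # Assumption: RFE ranks features with a logistic-regression model.
    selector = RFE(
        LogisticRegression(max_iter=1000),
        n_features_to_select=n_features_to_select,
    )
    return make_pipeline(StandardScaler(), selector, stack)


# Synthetic stand-in data; the study's heart disease dataset is not used here.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = build_stacked_model().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

    Keeping the scaler and the feature selector inside the pipeline means they are refit on each training split, which avoids leaking test information into preprocessing.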

  • Article type: Journal Article
    BACKGROUND: Lung cancer is the second most common cancer worldwide, with over two million new cases per year. Early identification would allow healthcare practitioners to handle it more effectively. The advancement of computer-aided detection systems significantly impacted clinical analysis and decision-making on human disease. Towards this, machine learning and deep learning techniques are successfully being applied. Due to several advantages, transfer learning has become popular for disease detection based on image data.
    METHODS: In this work, we build a novel transfer learning model (VER-Net) by stacking three different transfer learning models to detect lung cancer using lung CT scan images. The model is trained to map the CT scan images to four lung cancer classes. Various measures, such as image preprocessing, data augmentation, and hyperparameter tuning, are taken to improve the efficacy of VER-Net. All the models are trained and evaluated using multiclass chest CT images.
    RESULTS: The experimental results confirm that VER-Net outperformed the eight other transfer learning models it was compared with. VER-Net scored 91%, 92%, 91%, and 91.3% for accuracy, precision, recall, and F1-score, respectively. Compared to the state of the art, VER-Net has better accuracy.
    CONCLUSIONS: VER-Net is not only effectively used for lung cancer detection but may also be useful for other diseases for which CT scan images are available.
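    For readers unfamiliar with how accuracy, precision, recall, and F1-score are computed in a multiclass setting such as the four-class problem above, the following sketch may help. It assumes macro averaging (an unweighted mean over classes); the abstract does not state which averaging scheme was used, and the labels below are made up for illustration.

```python
def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    n = len(classes)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels for three classes (not data from the study).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
accuracy, precision, recall, f1 = macro_scores(y_true, y_pred)
print(accuracy, precision, recall, f1)
```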

  • Article type: Journal Article
    Recently, π-π stacked antiaromatic π-systems have received considerable attention because they can exhibit stacked-ring aromaticity due to substantial intermolecular orbital interactions. Here, we report three antiaromatic norcorrole dimers that self-assemble to form supramolecular architectures through chiral self-sorting. A 2,2'-linked norcorrole dimer with 3,5-di-tert-butylphenyl groups forms a π-stacked dimer both in solid and solution states via homochiral self-sorting. Its association constant in solution is (3.6 ± 1.7) × 10⁵ M⁻¹ at 20 °C. In the solid state, 3,3'-linked norcorrole dimers with 3,5-di-tert-butylphenyl and phenyl groups afford macrocyclic and helical supramolecular assemblies via heterochiral and homochiral self-sorting, respectively. Notably, the subtle modification in the substituent resulted in a complete change in the structure of the aggregates and the chiral self-sorting mode. The present findings demonstrate that structural manipulation in antiaromatic monomer units leads to the formation of various supramolecular assemblies on the basis of the attractive interactions between antiaromatic π-systems.

  • Article type: Journal Article
    Estimation and prediction of subnational mortality rates for small areas are essential planning tools for studying health inequalities. Standard methods do not perform well when data are noisy, a typical behavior of subnational datasets. Thus, reliable estimates are difficult to obtain. I present a Bayesian hierarchical model framework for prediction of mortality rates at a small or subnational level. By combining ideas from demography and epidemiology, the classical mortality modeling framework is extended to include an additional spatial component capturing regional heterogeneity. Information is pooled across neighboring regions and smoothed over time and age. To make predictions more robust and address the issue of model selection, a Bayesian version of stacking is considered using leave-future-out validation. I apply this method to forecast mortality rates for 96 regions in Bavaria, Germany, disaggregated by age and sex. Uncertainty surrounding the forecasts is provided in terms of prediction intervals. Using posterior predictive checks, I show that the models capture the essential features and are suitable to forecast the data at hand. On held-out data, my predictions outperform those of standard models lacking a regional component.
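    One common way to realize stacking of predictive distributions is to choose model weights that maximize the summed log of the weighted predictive densities on held-out points. The sketch below illustrates that idea only; it is not the author's implementation. It assumes the densities were already obtained by leave-future-out validation (fit on data up to time t, score the next observation), uses a simple fixed-point update for the weights, and all numbers are hypothetical.

```python
import math


def stacking_weights(densities, n_iter=500):
    """Weights on the simplex that maximize the stacked log score.

    densities[i][k] is the (strictly positive) predictive density that
    model k assigned to held-out observation i.
    """
    n_points = len(densities)
    n_models = len(densities[0])
    weights = [1.0 / n_models] * n_models
    for _ in range(n_iter):
        updated = [0.0] * n_models
        for row in densities:
            mix = sum(w * p for w, p in zip(weights, row))
            for k in range(n_models):
                # Share of the mixture density contributed by model k.
                updated[k] += weights[k] * row[k] / mix
        weights = [value / n_points for value in updated]
    return weights


def log_score(densities, weights):
    """Summed log density of the weighted mixture on the held-out points."""
    return sum(
        math.log(sum(w * p for w, p in zip(weights, row))) for row in densities
    )


# Hypothetical leave-future-out densities for two candidate models.
held_out_densities = [[0.5, 0.1], [0.6, 0.2], [0.4, 0.1]]
weights = stacking_weights(held_out_densities)
print(weights, log_score(held_out_densities, weights))
```

    Because the first hypothetical model assigns higher density to every held-out point, nearly all weight moves to it; when models excel on different points, the weights stay mixed.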

  • Article type: Journal Article
    BACKGROUND: Infectious diarrhea remains a major public health problem worldwide. This study used a stacking ensemble to develop a predictive model for the incidence of infectious diarrhea, aiming to achieve better prediction performance.
    METHODS: Based on the surveillance data of infectious diarrhea cases, relevant symptoms and meteorological factors of Guangzhou from 2016 to 2021, we developed four base prediction models using artificial neural networks (ANN), Long Short-Term Memory networks (LSTM), support vector regression (SVR) and extreme gradient boosting regression trees (XGBoost), which were then ensembled using stacking to obtain the final prediction model. All the models were evaluated with three metrics: mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE).
    RESULTS: Base models that incorporated symptom surveillance data and the weekly number of infectious diarrhea cases achieved lower RMSEs, MAEs, and MAPEs than models that incorporated meteorological data and the weekly number of infectious diarrhea cases. The LSTM had the best prediction performance among the four base models, with an RMSE, MAE, and MAPE of 84.85, 57.50, and 15.92%, respectively. The stacking ensemble model outperformed the four base models, with an RMSE, MAE, and MAPE of 75.82, 55.93, and 15.70%, respectively.
    CONCLUSIONS: The incorporation of symptom surveillance data could improve the predictive accuracy of infectious diarrhea prediction models, and symptom surveillance data were more effective than meteorological data in enhancing model performance. Using stacking to combine multiple prediction models alleviated the difficulty of selecting the optimal model and could yield a model with better performance than the base models.
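    The three evaluation metrics named above follow standard definitions, sketched here in plain Python. The weekly case counts in the example are hypothetical and are not taken from the study.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


# Hypothetical weekly case counts and forecasts.
actual_cases = [100, 200]
forecast_cases = [110, 180]
print(mae(actual_cases, forecast_cases))
print(rmse(actual_cases, forecast_cases))
print(mape(actual_cases, forecast_cases))
```

    MAE and RMSE are expressed in the units of the target (cases per week), while MAPE is scale-free, which is why the abstract reports it as a percentage.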

  • Article type: Journal Article
    BACKGROUND: Methylmalonic acidemia (MMA) is a disorder of autosomal recessive inheritance, with an estimated prevalence of 1:50,000. First-tier clinical diagnostic tests often return many false positives [five false positive (FP): one true positive (TP)]. In this work, our goal was to refine a classification model that can minimize the number of false positives, currently an unmet need in the upstream diagnostics of MMA.
    METHODS: We developed machine learning multivariable screening models for MMA with utility as a secondary-tier tool for false-positive reduction. We utilized mass spectrometry-based features consisting of 11 amino acids and 31 carnitines derived from dried blood samples of neonatal patients, followed by additional ratio feature construction. Feature selection strategies (selection by filter, recursive feature elimination, and learned vector quantization) were used to determine the input set for evaluating the performance of 14 classification models and to identify a candidate model set for ensemble model development.
    RESULTS: Our work identified computational models that explore metabolic analytes to reduce the number of false positives without compromising sensitivity. The best results [area under the receiver operating characteristic curve (AUROC) of 97%, sensitivity of 92%, and specificity of 95%] were obtained utilizing an ensemble of the algorithms random forest, C5.0, sparse linear discriminant analysis, and autoencoder deep neural network stacked with the algorithm stochastic gradient boosting as the supervisor. The model achieved a good performance trade-off for a screening application with 6% false-positive rate (FPR) at 95% sensitivity, 35% FPR at 99% sensitivity, and 39% FPR at 100% sensitivity.
    CONCLUSIONS: The classification results and approach of this research can be utilized by clinicians globally, to improve the overall discovery of MMA in pediatric patients. The improved method, when adjusted to 100% precision, can be used to further inform the diagnostic process journey of MMA and help reduce the burden for patients and their families.
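    The trade-off reported above (false-positive rate at a fixed sensitivity) can be read off a set of classifier scores by lowering the decision threshold until the required share of true cases is captured. The sketch below shows one way to do this; the function name, scores, and labels are hypothetical and do not come from the study.

```python
import math


def fpr_at_sensitivity(scores, labels, target_sensitivity):
    """Return (false-positive rate, threshold) at the highest threshold
    whose sensitivity is at least target_sensitivity.

    labels: 1 for affected, 0 for unaffected; higher scores mean more suspicious.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # Number of true cases that must score at or above the threshold.
    needed = max(1, math.ceil(target_sensitivity * len(positives) - 1e-9))
    threshold = positives[needed - 1]
    false_positives = sum(1 for s in negatives if s >= threshold)
    return false_positives / len(negatives), threshold


# Hypothetical screening scores: three affected and four unaffected samples.
scores = [0.9, 0.8, 0.4, 0.7, 0.5, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]

fpr_full, threshold_full = fpr_at_sensitivity(scores, labels, 1.0)
fpr_partial, threshold_partial = fpr_at_sensitivity(scores, labels, 0.66)
print(fpr_full, threshold_full)
print(fpr_partial, threshold_partial)
```

    In this toy example, catching every affected sample forces the threshold down to 0.4 and flags half of the unaffected samples, which mirrors the pattern in the abstract where FPR rises as the required sensitivity approaches 100%.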

  • Article type: Journal Article
    Plant-allergenic proteins (PAPs) have the potential to induce allergic reactions in certain individuals. While these proteins are generally innocuous for the majority of people, they can elicit an immune response in those with particular sensitivities. Thus, screening and prioritizing the allergenic potential of plant proteins is indispensable for the development of diagnostic tools, therapeutic interventions or medications to treat allergic reactions. However, investigating the allergenic potential of plant proteins based on experimental methods is costly and labour-intensive. Therefore, we developed StackPAP, a three-layer stacking ensemble framework for accurate large-scale identification of PAPs. In StackPAP, at the first layer, we conducted a comprehensive analysis of an extensive set of feature descriptors. Subsequently, we selected and fused five potential sequence-based feature descriptors, including amphiphilic pseudo-amino acid composition, dipeptide deviation from expected mean, amino acid composition, pseudo amino acid composition and dipeptide composition. Additionally, we applied an efficient genetic algorithm (GA-SAR) to determine informative feature sets. In the second layer, 12 powerful machine learning (ML) methods, in combination with all the informative feature sets, were employed to construct a pool of base classifiers. Finally, 13 potential base classifiers were selected using the GA-SAR method and combined to develop the final meta-classifier. Our experimental results revealed the promising prediction performance of StackPAP, with an accuracy, Matthews correlation coefficient and AUC of 0.984, 0.969 and 0.993, respectively, as judged by the independent test dataset. In conclusion, both cross-validation and independent test results indicated the superior performance of StackPAP compared with several ML-based classifiers. To accelerate the identification of the allergenicity of plant proteins, we developed a user-friendly web server for StackPAP (https://pmlabqsar.pythonanywhere.com/StackPAP). We anticipate that StackPAP will be an efficient and useful tool for rapidly screening PAPs from a vast number of plant proteins. Communicated by Ramaswamy H. Sarma.
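    The Matthews correlation coefficient reported above summarizes a binary confusion matrix in a single value between -1 and 1. A minimal sketch of the standard formula follows; the confusion-matrix counts are hypothetical and are not StackPAP results.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for an allergen / non-allergen test set.
mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(round(mcc, 4))
```

    Unlike accuracy, the coefficient stays near zero for a classifier that guesses at random, even when the two classes are imbalanced.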

  • Article type: Journal Article
    Janus base nanotubes are novel, self-assembled nanomaterials. Their original designs were inspired by DNA base pairs, and today a variety of chemistries have been developed, distinguishing them as a new family of materials separate from DNA origami, carbon nanotubes, polymers, and lipids. This review article covers the principal examples of self-assembled Janus base nanotubes, which are driven by hydrogen-bond and π-π stacking interactions in aqueous environments. Specifically, self-complementary hydrogen bonds organize molecules into ordered arrays, forming macrocycles, while π-π interactions stack these structures to create tubular forms. This review elucidates the molecular interactions that govern the assembly of nanotubes and advances our understanding of nanoscale self-assembly in water.

  • Article type: Journal Article
    Combining the results of base models to create a meta-model is an ensemble approach known as stacking. In this study, stacking of five base learners (eXtreme gradient boosting, random forest, feed-forward neural networks, generalized linear models with Lasso or Elastic Net regularization, and support vector machines) was used to study the spatial variation of Mn, Cd, Pb, and nitrate in the Qom-Kahak aquifers, Iran. The stacking strategy proved to be an effective alternative predictor to existing machine learning approaches because of its high accuracy and stability compared with individual learners. In contrast, no single base model performed best for all of the parameters involved. For instance, in the case of cadmium, random forest produced the best results among the base models, with an adjusted R² and RMSE of 0.108 and 0.014, as opposed to 0.337 and 0.013 obtained by the stacking method. Redundancy analysis (RDA) showed that Mn and Cd were tightly linked with phosphate. This demonstrates the effect of phosphate fertilizers on agricultural operations. To analyze the causes of groundwater pollution, spatial methodologies can be used together with multivariate analytic techniques, such as RDA, to help uncover hidden sources of contamination that would otherwise go undetected. Lead poses a larger health risk than nitrate, according to the probabilistic health risk assessment, which found that 34.4% and 6.3% of the simulated values for children and adults, respectively, were higher than HQ = 1. Furthermore, cadmium exposure risk affected 84% of children and 47% of adults in the research area.
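    The adjusted R² values quoted above penalize the ordinary coefficient of determination for the number of predictors. The sketch below uses the standard formula; the sample size, predictor count, and data are hypothetical, since the abstract does not report them.

```python
def r_squared(y_true, y_pred):
    """Ordinary coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjust R² for model size; requires n_samples > n_predictors + 1."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical example: R² of 0.5 from 11 samples and 2 predictors.
example_adjusted = adjusted_r_squared(0.5, n_samples=11, n_predictors=2)
print(example_adjusted)
```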

  • Article type: Journal Article
    Human-machine interface technology is fundamentally constrained by the dexterity of motion decoding. Simultaneous and proportional control can greatly improve the flexibility and dexterity of smart prostheses. In this research, a new model using ensemble learning to solve the angle decoding problem is proposed. In total, seven models for angle decoding from surface electromyography (sEMG) signals are designed. The kinematics of five angles of the metacarpophalangeal (MCP) joints are estimated using the sEMG recorded during functional tasks. The estimation performance was evaluated through the Pearson correlation coefficient (CC). The comprehensive model, which combines CatBoost and LightGBM, is the best model for this task, with an average CC and RMSE of 0.897 and 7.09, respectively. The mean CC and mean RMSE across all test scenarios of the subjects' dataset outperform the results of the Gaussian process model, with significant differences. Moreover, the research proposes a complete pipeline that uses ensemble learning to build a high-performance angle decoding system for the hand motion recognition task. Researchers or engineers in this field can use this process to quickly find the most suitable ensemble learning model for angle decoding, with fewer parameters and lower training data requirements than traditional deep learning models. In conclusion, the proposed ensemble learning approach has the potential for simultaneous and proportional control (SPC) of future hand prostheses.
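    The Pearson correlation coefficient used above to score decoded joint angles can be computed per angle and then averaged. The sketch below assumes a plain unweighted mean across angle channels; the abstract does not say how the average was formed, and the angle traces are hypothetical.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def mean_cc(true_channels, predicted_channels):
    """Unweighted mean of the per-channel correlation coefficients."""
    ccs = [pearson_cc(t, p) for t, p in zip(true_channels, predicted_channels)]
    return sum(ccs) / len(ccs)


# Hypothetical measured and decoded angles (degrees) for two joint angles.
measured = [[10.0, 20.0, 30.0, 40.0], [5.0, 15.0, 10.0, 20.0]]
decoded = [[12.0, 19.0, 33.0, 38.0], [6.0, 13.0, 12.0, 18.0]]
print(mean_cc(measured, decoded))
```

    A CC close to 1 means the decoded trajectory follows the shape of the measured one, but it says nothing about offset or scale, which is why an error measure such as RMSE is reported alongside it.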