Ensemble learning

  • Article type: Journal Article
    5-formylcytidine (f5C) is a unique post-transcriptional RNA modification found in mRNA and tRNA at the wobble site. It plays a crucial role in mitochondrial protein synthesis and may contribute to the regulation of translation. Recent studies have shown that f5C modifications may drive mitochondrial mRNA translation to power cancer metastasis. Accurate identification of f5C sites is essential for further unraveling their molecular functions and regulatory mechanisms, but no computational methods are currently available for predicting their locations. In this study, we introduce an ensemble approach that enables the computational recognition of f5C in Saccharomyces cerevisiae. We conducted a comprehensive model selection process involving multiple basic machine learning and deep learning algorithms, such as recurrent neural networks, convolutional neural networks, and Transformer-based models. When trained on sequence information only, these individual models achieved AUROCs ranging from 0.7104 to 0.7492. After integrating 32 novel domain-derived genomic features, the AUROCs of the individual models improved to between 0.7309 and 0.8076. To further enhance accuracy and robustness, we then constructed ensembles of these individual models in different combinations. The best ensemble model reached an AUROC of 0.8391. Shapley additive explanations were used to explain the contributions of the genomic features, providing insights into the putative distribution of f5C across various topological regions and potentially paving the way for revealing its functional relevance within distinct genomic contexts. A freely accessible web server that allows real-time analysis of user-uploaded sites is available at: www.rnamd.org/Resf5C-Pred.
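
The abstract does not state how the individual models were combined, so the following is only a minimal sketch under the simplest assumption: the ensemble averages the per-site probabilities of its member models, and each model is scored by AUROC. The model names and all scores are made-up toy values, not data from the study.

```python
from statistics import mean


def auroc(labels, scores):
    """AUROC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_ensemble(model_scores):
    """Average the per-site probabilities across models (assumed combination rule)."""
    return [mean(site) for site in zip(*model_scores)]


# Toy labels (1 = modified site) and toy probabilities from three hypothetical models.
labels = [1, 1, 1, 0, 0, 0]
toy_scores = {
    "cnn": [0.9, 0.4, 0.7, 0.5, 0.2, 0.3],
    "rnn": [0.6, 0.8, 0.3, 0.4, 0.5, 0.1],
    "transformer": [0.7, 0.6, 0.6, 0.2, 0.6, 0.4],
}

ensemble = average_ensemble(list(toy_scores.values()))
for name, scores in toy_scores.items():
    print(name, round(auroc(labels, scores), 4))
print("ensemble", round(auroc(labels, ensemble), 4))
```

On this toy data the averaged scores separate the two classes better than any single model does, which illustrates why combining models with different error patterns can help. It says nothing about the combination rule or the gains reported in the study.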

  • Article type: Journal Article
    BACKGROUND: This study aimed to assess the efficacy of various supervised longitudinal learning approaches, comparing traditional statistical models and machine learning algorithms for prediction with longitudinal data. The primary objective was to evaluate the predictive performance of different supervised longitudinal learning methods for low birth weight (LBW) and very low birth weight (VLBW) based on prenatal ultrasound measurements. Additionally, the study sought to extract interpretable risk features for disease prediction.
    METHODS: The evaluation benchmarked the performance of longitudinal models against conventional machine learning methods. Classification accuracy for LBW and VLBW at birth, as well as prediction accuracy for birth weight using prenatal ultrasound measurements, were assessed.
    RESULTS: Among the learning approaches investigated in this study, the longitudinal machine learning approach, specifically the mixed effect random forest (MERF), delivered the best overall performance in predicting birth weight and classifying LBW/VLBW disease status.
    CONCLUSIONS: The MERF combined the power of advanced machine learning algorithms with the ability to accommodate the within-individual dependence inherent in the observed data, delivering satisfactory performance in predicting birth weight and classifying LBW/VLBW disease status. The study emphasized the importance of incorporating previous ultrasound measurements and accounting for correlations between repeated measurements for accurate prediction. The interpretable trees algorithm used for risk feature extraction proved reliable and applicable to other learning algorithms. These findings underscored the potential of longitudinal learning methods for improving birth weight prediction and showed that the extracted risk features were consistent with the established literature.

  • Article type: Journal Article
    Link prediction (LP) is the task of identifying potential, missing, and spurious links in complex networks. Protein-protein interaction (PPI) networks are important for understanding the underlying biological mechanisms of diseases. Many complex networks have been constructed using LP methods; however, only a limited number of studies focus on disease-related gene prediction and evaluate the predicted genes using several evaluation criteria. The main objective of this study is to investigate the effect of a simple ensemble method on disease-related gene prediction. Disease-related gene predictions based on local similarity indices (LSIs) were integrated by a simple ensemble decision method, simple majority voting (SMV), on the PPI network to detect disease-related genes accurately. The human PPI network was used to discover potential disease-related genes with four LSIs. The LSIs discovered potential links between disease-related genes, which were obtained from the OMIM database for gastric, colorectal, breast, prostate, and lung cancers. The LSI-based disease-related genes were ranked by their LSI scores in descending order to retrieve the top 10, 50, and 100 disease-related genes. SMV integrated the four LSI-based predictions to obtain the SMV-based top 10, 50, and 100 disease-related genes. The performance of the LSI-based and SMV-based genes was evaluated separately by overlap analyses, which were performed with the GeneCard disease-gene relation dataset and Gene Ontology (GO) terms. The GO terms were used for biological assessment of the gene lists inferred by the LSIs and SMV for all cancer types. The Adamic-Adar (AA), Resource Allocation Index (RAI), and SMV-based gene lists generally achieved good performance for all cancers in both overlap analyses. SMV also outperformed the other methods on breast cancer data. Increasing the number of top-ranked disease-related genes selected also improved the performance of SMV.
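
The ranking-and-voting procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract names only two of the four LSIs (Adamic-Adar and Resource Allocation), so common neighbours and the Jaccard index are used here as stand-ins for the other two. The voting rule (keep a gene that appears in more than half of the top-k lists) is also an assumed reading of "simple majority voting", and the graph and node names are toy values.

```python
import math
from collections import Counter


def neighbors(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, x, y):
    return float(len(adj[x] & adj[y]))


def jaccard(adj, x, y):
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def adamic_adar(adj, x, y):
    # A shared neighbour is linked to both x and y, so its degree is >= 2
    # and log(degree) is strictly positive.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])


def resource_allocation(adj, x, y):
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])


def top_k(adj, seed, index, k):
    """Rank nodes not yet linked to the seed by one index, highest score first."""
    candidates = [g for g in adj if g != seed and g not in adj[seed]]
    ranked = sorted(candidates, key=lambda g: (-index(adj, seed, g), g))
    return ranked[:k]


def simple_majority_vote(rankings):
    """Keep items that appear in more than half of the top-k lists."""
    votes = Counter(g for ranking in rankings for g in set(ranking))
    return sorted(g for g, n in votes.items() if n > len(rankings) / 2)


# Toy network: "S" plays the role of a known disease gene.
edges = [
    ("S", "A"), ("S", "B"), ("S", "C"),
    ("A", "X"), ("B", "X"), ("C", "X"),
    ("A", "Y"), ("B", "Y"),
    ("C", "Z"), ("Z", "W"),
]
adj = neighbors(edges)
indices = [common_neighbors, jaccard, adamic_adar, resource_allocation]
rankings = [top_k(adj, "S", index, k=2) for index in indices]
print(simple_majority_vote(rankings))
```

With four voters, a gene needs at least three votes to pass under this rule. A study could equally break ties differently or weight the indices, so the threshold should be treated as a tunable choice.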

  • Article type: Journal Article
    Cassava is one of the most important carbohydrate foods consumed in many African and Asian countries. Cassava leaf disease is the major issue affecting production. Automatic early detection of cassava leaf disease with deep learning and transfer learning models has been used for multiclass classification with different approaches. Existing approaches work with imbalanced datasets when predicting the classes. This research develops an approach based on a hybrid ensemble deep transfer model for early leaf disease detection. Data augmentation was applied to the raw data to balance the dataset. Three distinct new hybrid models were developed, namely Ensemble(InceptionV3 + DenseNet-BC-121-32 + Xception), Ensemble(ResNet50V2 + DenseNet-BC-121-32), and Ensemble(ResNet50V2 + ResNet50). The proposed models show high performance. They were compared broadly with a custom Convolutional Neural Network and with pre-trained models. The highest accuracies, 88.83% for five-class classification and 97.89% for two-class classification, were obtained with the ensemble that combined InceptionV3, Xception, and DenseNet-BC-121-32.

  • Article type: Journal Article
    An adaptive interpretable ensemble model based on a three-dimensional Convolutional Neural Network (3DCNN) and a Genetic Algorithm (GA), i.e., 3DCNN+EL+GA, was proposed to differentiate subjects with Alzheimer's Disease (AD) or Mild Cognitive Impairment (MCI) and to identify, in a data-driven way, the discriminative brain regions that contribute significantly to the classifications. In addition, the discriminative brain sub-regions were located at the voxel level within these identified brain regions, using a gradient-based attribution method designed for CNNs. Besides revealing the discriminative brain sub-regions, the test results on datasets from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Open Access Series of Imaging Studies (OASIS) indicated that 3DCNN+EL+GA outperformed other state-of-the-art deep learning algorithms and that the identified discriminative brain regions (e.g., the rostral hippocampus, caudal hippocampus, and medial amygdala) were linked to emotion, memory, language, and other essential brain functions impaired early in the AD process. Future research is needed to examine whether the proposed method and ideas generalize to identifying discriminative brain regions for other brain disorders, such as severe depression, schizophrenia, autism, and cerebrovascular diseases, using neuroimaging.

  • Article type: Journal Article
    Diabetes mellitus is a prevalent chronic disease that requires timely identification for effective management. This paper introduces a reliable, straightforward, and efficient method for the minimally invasive identification of diabetes mellitus through nanosecond pulsed laser-induced breakdown spectroscopy (LIBS), integrated with a state-of-the-art machine learning approach. LIBS spectra were collected from urine samples of diabetic and healthy individuals. Principal component analysis and an ensemble learning classification model were used to identify significant changes in LIBS peak intensity between the diseased and normal urine samples. The model, which integrates six distinct classifiers and cross-validation techniques, exhibited high accuracy (96.5%) in predicting diabetes mellitus. Our findings emphasize the potential of LIBS for identifying diabetes mellitus in urine samples. This technique may also prove useful for diagnosing other health conditions.

  • Article type: Journal Article
    BACKGROUND: Machine learning (ML) is extensively employed for forecasting the outcome of various illnesses. The objective of the study was to develop ML-based classifiers using a stacking ensemble strategy to predict the Japanese Orthopedic Association (JOA) recovery rate for patients with degenerative cervical myelopathy (DCM).
    METHODS: A total of 672 patients with DCM were included in the study and labeled with their JOA recovery rate at 1-year follow-up. All data were collected during 2012-2023 and were randomly divided into training and testing sub-datasets (8:2). A total of 91 initial ML classifiers were developed, and the top 3 initial classifiers with the best performance were further stacked into an ensemble classifier with a support vector machine (SVM) classifier. The area under the curve (AUC) was the main indicator used to assess the prediction performance of all classifiers. The primary predicted outcome was the JOA recovery rate.
    RESULTS: By applying an ensemble learning strategy (stacking), the accuracy of the ML classifier improved after combining three widely used ML models (RFE-SVM, EmbeddingLR-LR, and RFE-AdaBoost). Decision curve analysis showed the merits of the ensemble classifier, as the curves of the top 3 initial classifiers varied considerably in predicting the JOA recovery rate in DCM patients.
    CONCLUSIONS: The ensemble classifier successfully predicted the JOA recovery rate in DCM patients, showing high potential for assisting physicians in managing DCM patients and making full use of medical resources.
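
The stacking design described in the METHODS (three base classifiers combined by an SVM, evaluated by AUC on an 8:2 split) could look roughly like the sketch below, assuming scikit-learn is available. The data are synthetic, and the base learners are plain stand-ins: the feature-selection steps implied by names such as RFE-SVM are omitted, and none of the hyperparameters come from the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the patient table (672 rows, matching the cohort size).
X, y = make_classification(
    n_samples=672, n_features=20, n_informative=6, random_state=0
)

# 8:2 split into training and testing subsets, stratified by label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Three base learners (simplified stand-ins for the top 3 initial classifiers).
base_learners = [
    ("svm", SVC(kernel="linear", random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("ada", AdaBoostClassifier(random_state=0)),
]

# The meta-learner is an SVM trained on cross-validated base-learner outputs.
stack = StackingClassifier(estimators=base_learners, final_estimator=SVC(), cv=5)
stack.fit(X_train, y_train)

# AUC is computed from the meta-learner's continuous decision scores.
auc = roc_auc_score(y_test, stack.decision_function(X_test))
print(f"stacked AUC on the held-out 20%: {auc:.3f}")
```

The `cv=5` argument makes the meta-learner train on out-of-fold predictions, which keeps it from simply memorising base-learner outputs on data those learners have already seen.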

  • Article type: Journal Article
    As an important biomarker of neural aging, brain age reflects the integrity and health of the human brain. Accurate prediction of brain age could help in understanding the underlying mechanism of neural aging. In this study, a cross-stratified ensemble learning algorithm with a stacking strategy was proposed to obtain brain age and the derived predicted age difference (PAD) from T1-weighted magnetic resonance imaging (MRI) data. The approach comprises two modules: the first consists of three base learners (3D-DenseNet, 3D-ResNeXt, and 3D-Inception-v4), and the second consists of 14 secondary learners based on linear regression. To evaluate performance, our method was compared with single base learners, regular ensemble learning algorithms, and state-of-the-art (SOTA) methods. The results demonstrated that the proposed model outperformed the other models, with a mean absolute error (MAE) of 2.9405 years, a root mean-squared error (RMSE) of 3.9458 years, and a coefficient of determination (R²) of 0.9597. Furthermore, there were significant differences in PAD among the three groups of normal control (NC), mild cognitive impairment (MCI), and Alzheimer's disease (AD), with an increasing trend across NC, MCI, and AD. It was concluded that the proposed algorithm can be used effectively to compute brain age and PAD, and that it offers potential for the early diagnosis and assessment of normal brain aging and AD.
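
For reference, the three reported metrics and the PAD can be computed as in the sketch below. The ages are made-up toy values, and PAD is taken as predicted age minus chronological age, which is the usual convention but is an assumption here because the abstract does not define it.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean-squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def predicted_age_difference(chronological, predicted):
    """PAD per subject, assumed to be predicted age minus chronological age."""
    return [p - c for c, p in zip(chronological, predicted)]


# Toy chronological ages and toy model predictions, in years.
ages = [60.0, 65.0, 70.0, 75.0, 80.0]
predicted = [62.0, 64.0, 73.0, 74.0, 82.0]

print("MAE:", mae(ages, predicted))
print("RMSE:", round(rmse(ages, predicted), 4))
print("R2:", round(r2(ages, predicted), 4))
print("PAD:", predicted_age_difference(ages, predicted))
```

Under this convention a positive PAD means the brain appears older than the subject's chronological age, which matches the increasing trend the abstract reports from NC to MCI to AD.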

  • Article type: Journal Article
    There has long been strong interest in predicting the solder joint fatigue life in advanced packaging with high accuracy and efficiency. Artificial Intelligence Plus (AI+) is becoming increasingly popular as computational facilities continue to develop. This study introduces machine learning, a core component of AI. With machine learning, metamodels that approximate the attributes of systems or functions are created to predict the fatigue life of advanced packaging. However, the prediction ability depends heavily on the size and distribution of the training data. Increasing the amount of training data is the most intuitive way to improve prediction performance, but it implies a higher computational cost. In this research, adaptive sampling methods are applied to build the machine learning model with a small dataset sampled from an existing database. The performance of the model is visualized using predefined criteria. Moreover, ensemble learning can be used to improve the performance of AI models after they have been fully trained.

  • Article type: Journal Article
    Background: Pes planus, commonly known as flatfoot, is a condition in which the medial arch of the foot is abnormally low or absent, so that the inner part of the foot has less curvature than normal. Symptom recognition and errors in diagnosis are problems encountered in daily practice. Therefore, it is important to improve how a diagnosis is made. With the availability of large datasets, deep neural networks have shown promising capabilities in recognizing foot structures and accurately identifying pes planus. Methods: In this study, we developed a novel fusion model by combining the VGG16 convolutional neural network (CNN) model with the vision transformer ViT-B/16 to enhance the detection of pes planus. This fusion model leverages the strengths of both the CNN and ViT architectures, resulting in improved performance compared with results reported in the literature. Additionally, ensemble learning techniques were employed to ensure the robustness of the model. Results: In 10-fold cross-validation, the model demonstrated high sensitivity, specificity, and F1 score values of 97.4%, 96.4%, and 96.8%, respectively. These results highlight the effectiveness of the proposed model in quickly and accurately diagnosing pes planus, making it suitable for deployment in clinics or healthcare centers. Conclusions: By facilitating early diagnosis, the model can contribute to better management of treatment processes, ultimately leading to an improved quality of life for patients.
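
The three reported metrics follow directly from the confusion-matrix counts. The sketch below shows the standard definitions on made-up toy labels (1 = pes planus, 0 = normal); it is not the study's evaluation code.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Share of true positives that the model detects (recall)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of true negatives that the model correctly rejects."""
    return tn / (tn + fp)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


# Toy ground truth and toy predictions for ten images.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", round(specificity(tn, fp), 4))
print("F1:", f1_score(tp, fp, fn))
```

In a 10-fold cross-validation these values would be computed on each held-out fold and then averaged or pooled; the abstract does not say which.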