Stacking ensemble model

  • Article type: Journal Article
    The traditional complications of diabetes are well known and continue to pose a considerable burden to millions of people with diabetes mellitus (DM). With the continuous accumulation of medical data and technological advances, artificial intelligence has shown great potential in the prediction, diagnosis, and treatment of DM. Because a physician's subjective judgment and choice of diagnostic method can influence the diagnosis of DM, using artificial intelligence for fast and effective early prediction of DM can provide decision support to doctors and enable more accurate and timely treatment for patients, which is of considerable clinical and practical significance. In this paper, an adaptive Stacking ensemble model based on the theory of "error-ambiguity decomposition" is proposed, which adaptively selects its base classifiers from a set of pre-selected models. The adaptive Stacking ensemble model is compared with KNN, SVM, RF, LR, DT, GBDT, XGBoost, LightGBM, CatBoost, MLP, and traditional Stacking ensemble models. The results showed that the adaptive Stacking ensemble model achieved the best performance on five evaluation metrics: accuracy, precision, recall, F1 value, and AUC value, which were 0.7559, 0.7286, 0.8132, 0.7686, and 0.8436, respectively. The model can effectively predict DM and provide a reference for clinical DM screening and diagnosis.
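
    For background, the classical error-ambiguity decomposition (due to Krogh and Vedelsby) for a weighted-average ensemble under squared error can be written as below. The abstract does not give the exact form used in the paper or say how it is adapted to selecting classifiers, so the symbols here (base predictions f_i, weights w_i, target y) are assumptions for illustration only.

```latex
\begin{align}
\bar{f}(x) &= \sum_{i} w_i f_i(x), \qquad w_i \ge 0, \quad \sum_{i} w_i = 1, \\
\bigl(\bar{f}(x) - y\bigr)^2
  &= \underbrace{\sum_{i} w_i \bigl(f_i(x) - y\bigr)^2}_{\text{weighted average error}}
   - \underbrace{\sum_{i} w_i \bigl(f_i(x) - \bar{f}(x)\bigr)^2}_{\text{ambiguity}} .
\end{align}
```

    Read this way, the ensemble error falls when the base models are individually accurate (small first term) and disagree with one another (large second term), which is one plausible motivation for choosing base classifiers adaptively.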

  • Article type: Journal Article
    BACKGROUND: There is a lack of literature discussing the utilization of the stacking ensemble algorithm for predicting depression in patients with heart failure (HF).
    OBJECTIVE: To create a stacking model for predicting depression in patients with HF.
    METHODS: This study analyzed data on 1084 HF patients from the National Health and Nutrition Examination Survey database spanning from 2005 to 2018. Through univariate analysis and the use of an artificial neural network algorithm, predictors significantly linked to depression were identified. These predictors were utilized to create a stacking model employing tree-based learners. The performances of both the individual models and the stacking model were assessed by using the test dataset. Furthermore, the SHapley additive exPlanations (SHAP) model was applied to interpret the stacking model.
    RESULTS: The models included five predictors. Among these models, the stacking model demonstrated the highest performance, achieving an area under the curve of 0.77 (95% CI: 0.71-0.84), a sensitivity of 0.71, and a specificity of 0.68. The calibration curve supported the reliability of the models, and decision curve analysis confirmed their clinical value. The SHAP plot demonstrated that age had the most significant impact on the stacking model's output.
    CONCLUSIONS: The stacking model demonstrated strong predictive performance. Clinicians can utilize this model to identify high-risk depression patients with HF, thus enabling early provision of psychological interventions.

  • Article type: Journal Article
    OBJECTIVE: We identified predictive factors and developed a novel machine learning (ML) model for predicting mortality risk in patients with sepsis-associated encephalopathy (SAE).
    METHODS: In this retrospective cohort study, data from the Medical Information Mart for Intensive Care IV (MIMIC-IV) and eICU Collaborative Research Database were used for model development and external validation. The primary outcome was the in-hospital mortality rate among patients with SAE; the observed in-hospital mortality rate was 14.74% (MIMIC IV: 1112, eICU: 594). Using the least absolute shrinkage and selection operator (LASSO), we built nine ML models and a stacking ensemble model and determined the optimal model based on the area under the receiver operating characteristic curve (AUC). We used the Shapley additive explanations (SHAP) algorithm to determine the optimal model.
    RESULTS: The study included 9943 patients. LASSO identified 15 variables. The stacking ensemble model achieved the highest AUC on the test set (0.807) and 0.671 on external validation. SHAP analysis highlighted Glasgow Coma Scale (GCS) and age as key variables. The model (https://sic1.shinyapps.io/SSAAEE/) can predict in-hospital mortality risk for patients with SAE.
    CONCLUSIONS: We developed a stacked ensemble model with enhanced generalization capabilities using novel data to predict mortality risk in patients with SAE.

  • Article type: Journal Article
    With the ongoing process of industrialization, declining air quality is becoming a critical concern. Accurate prediction of the Air Quality Index (AQI), an all-inclusive measure of the extent of pollutants present in the atmosphere, is of paramount importance. This study introduces a novel methodology that combines stacking ensemble and error correction to improve AQI prediction. Additionally, the reptile search algorithm (RSA) is employed to optimize model parameters. AQI datasets from four distinct regions, containing 34864 data samples, are collected. Initially, we perform cross-validation on ten commonly used single models to obtain prediction results. Then, based on evaluation indices, five models are selected for the ensemble. The results show that the proposed model achieves an improvement of around 10% in accuracy compared with the conventional model. Thus, the model introduced in this study offers a more scientifically grounded approach to tackling air pollution.

  • Article type: Journal Article
    The Eastern Cooperative Oncology Group performance status (ECOG PS) is a widely recognized measure used to assess the functional abilities of cancer patients and predict their prognosis. It plays a crucial role in guiding physicians' treatment decisions. This study aimed to build a stacking ensemble-based prognosis prediction model for the ECOG PS of liver cancer patients undergoing treatment.
    We used Light Gradient Boosting Machine (LightGBM) as the meta-model and five base models: Random Forest (RF), Extra Trees (ET), AdaBoost (Ada), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost). After preprocessing the data and applying a feature selection method, the stacking ensemble model was trained using data from 1622 liver cancer patients and 46 variables. We also integrated the stacking ensemble model with a LIME-based explainable model to make its predictions explainable.
    The best combination for the stacking ensemble model was ET + XGBoost + RF + GBM + Ada + LightGBM, which achieved a ROC AUC of 0.9826 on the training set and 0.9675 on the test set.
    This explainable stacking ensemble model can become a helpful tool for objectively predicting ECOG PS in liver cancer patients and for helping healthcare practitioners adapt their treatment approach more effectively.

  • Article type: Journal Article
    This research applies machine learning to extensive data from a prominent Korean life insurance company to substantiate the insurance demand theory, which posits that insurance demand increases with risk aversion. We quantitatively delineate the traits of risk-averse individuals. Our study focuses on a cohort of 94,306 individuals who have filed insurance claims due to illness. To forecast prospective insurance consumers inclined toward additional purchases, we construct a predictive model using a machine learning algorithm. This model incorporates 19 demographic and socioeconomic factors as independent variables, with additional insurance acquisition as the dependent variable. Consequently, we uncover the distinctive characteristics of consumers predicted to acquire supplementary insurance products. Our findings reveal a significant association between the independent variables and the likelihood of purchasing additional insurance. Notably, 10 of the 19 independent variables exert a substantial influence on additional insurance acquisition. These characteristics encompass residence in rural areas, a higher likelihood of being female, advanced age, increased assets, a higher likelihood of being a blue-collar worker, lower education levels, a greater likelihood of being married or divorced/separated, a history of cancer, and, among existing policyholders, prior subscription to actual loss insurance or substantial insurance contract amounts. Our study holds academic significance by addressing limitations observed in prior research, which predominantly relied on questionnaires to qualitatively assess risk aversion. Instead, we offer specific insights into individual characteristics associated with risk aversion. Moreover, we anticipate that Korean insurance companies can leverage these insights to attract new clientele while retaining existing members through predictive risk aversion analysis. These findings also offer valuable insights related to individuals' risk preferences and behaviors across a spectrum of disciplines, including business administration, psychology, education, sociology, and sales/marketing.

  • Article type: Journal Article
    Indoleamine 2,3-dioxygenase 1 (IDO1) is viewed as an extremely promising target for cancer immunotherapy. Here, we propose a two-layer stacking ensemble model, IDO1Stack, that can efficiently predict IDO1 inhibitors. First, we constructed a series of classification models based on five machine learning algorithms and eight molecular characterization methods. Then, a stacking ensemble model was built using the top five models as the base classifiers and logistic regression as the meta-classifier. The areas under the receiver operating characteristic curve (AUC) of IDO1Stack on the test set and external validation set were 0.952 and 0.918, respectively. Furthermore, we computed the applicability domain and privileged substructures of the model and interpreted the model using SHapley Additive exPlanations (SHAP). IDO1Stack is expected to help in studying the interaction between target and ligand, providing practitioners with a reliable tool for rapid screening and discovery of IDO1 inhibitors.
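
    The two-layer design described in this abstract (base classifiers feeding a logistic-regression meta-classifier) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the IDO1Stack implementation: the abstract does not name the base models, so the two toy base learners, every function name, and the made-up two-cluster data are assumptions. The meta-classifier is trained on out-of-fold base-model scores, which is the usual way to keep it from seeing scores produced on training rows.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_knn(X, y, k=3):
    """Toy base learner 1: fraction of positives among the k nearest training points."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
        return sum(y[i] for i in nearest) / k
    return predict

def fit_centroid(X, y):
    """Toy base learner 2: squashed difference of distances to the class centroids."""
    def centroid(label):
        pts = [x for x, t in zip(X, y) if t == label]
        return tuple(sum(col) / len(pts) for col in zip(*pts))
    c0, c1 = centroid(0), centroid(1)
    return lambda x: sigmoid(math.dist(x, c0) - math.dist(x, c1))

def out_of_fold(X, y, fitters, n_folds=5):
    """Meta-features: each row is scored by base models that never saw that row."""
    meta = [[0.0] * len(fitters) for _ in X]
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        held = [i for i in range(len(X)) if i % n_folds == fold]
        X_train, y_train = [X[i] for i in train], [y[i] for i in train]
        for j, fit in enumerate(fitters):
            model = fit(X_train, y_train)
            for i in held:
                meta[i][j] = model(X[i])
    return meta

def fit_logistic(Z, y, lr=0.5, steps=500):
    """Meta-classifier: logistic regression trained by batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - t
            grad_b += err
            for j, zj in enumerate(z):
                grad_w[j] += err * zj
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(Z)
    return lambda z: sigmoid(b + sum(wi * zi for wi, zi in zip(w, z)))

def fit_stacking(X, y, fitters):
    """Train the meta-classifier on out-of-fold scores, then refit base models on all data."""
    meta_model = fit_logistic(out_of_fold(X, y, fitters), y)
    base_models = [fit(X, y) for fit in fitters]
    return lambda x: meta_model([m(x) for m in base_models])

# Made-up toy data: two well-separated clusters.
X = [(2 + 0.1 * i, 2 - 0.1 * i) for i in range(10)]
X += [(-2 - 0.1 * i, -2 + 0.1 * i) for i in range(10)]
y = [1] * 10 + [0] * 10
stack = fit_stacking(X, y, [fit_knn, fit_centroid])
```

    Calling `stack((2.0, 2.0))` returns the stacked probability of the positive class for a new point. In practice a library implementation with stronger base models would replace these toy learners.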

  • Article type: Journal Article
    BACKGROUND: With the prevalence of cerebrovascular disease (CD) and the increasing strain on healthcare resources, forecasting the healthcare demands of cerebrovascular patients has significant implications for optimizing medical resources.
    METHODS: In this study, a stacking ensemble model comprising four base learners (ridge regression, random forest, gradient boosting decision tree, and artificial neural network) and a meta learner (elastic net) was proposed for predicting the daily number of hospital admissions (HAs) for CD using historical HAs data, air quality data, and meteorological data in Chengdu, China, from 2015 to 2018. To solve the label imbalance problem, a re-weighting method based on label distribution smoothing was integrated into the meta learner. We trained the model using the data from 2015 to 2017 and evaluated its predictive ability using the data from 2018 based on four metrics: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R2). In addition, the SHapley Additive exPlanations (SHAP) framework was applied to explain the predictions of our stacking model.
    RESULTS: Our proposed model outperformed all the base learners and long short-term memory (LSTM) on two datasets. In particular, compared with the best results obtained by individual models, the MAE, RMSE, and MAPE of the stacking model decreased by 13.9%, 12.7%, and 5.8%, respectively, and the R2 improved by 6.8% on the CD dataset. The model explanation demonstrated that environmental features played a role in further improving the model performance and identified that high temperature and high concentrations of gaseous air pollutants might be strongly associated with an increased risk of CD.
    CONCLUSIONS: Our stacking model considering environmental exposure is efficient in predicting daily HAs for CD and has practical value in early warning and healthcare resource allocation.
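
    The four regression metrics named in this abstract follow standard definitions, sketched below in plain Python. The function name and the sample values are made up for illustration; they are not taken from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in percent), and R2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when any true value is zero (e.g. a day with no admissions).
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}

# Hypothetical daily admission counts and predictions.
daily_admissions = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
metrics = regression_metrics(daily_admissions, predicted)
```

    Lower MAE, RMSE, and MAPE indicate better fit, while R2 is better when higher, which is why the abstract reports decreases for the first three and an improvement for the last.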

  • Article type: Journal Article
    BACKGROUND: Numerous studies have identified risk factors for physical restraint (PR) use in older adults in long-term care facilities. Nevertheless, there is a lack of predictive tools to identify high-risk individuals.
    OBJECTIVE: We aimed to develop machine learning (ML)-based models to predict the risk of PR in older adults.
    METHODS: This study conducted a cross-sectional secondary data analysis based on 1026 older adults from 6 long-term care facilities in Chongqing, China, from July 2019 to November 2019. The primary outcome was the use of PR (yes or no), identified through direct observation by 2 collectors. A total of 15 candidate predictors (older adults' demographic and clinical factors) that could be commonly and easily collected from clinical practice were used to build 9 independent ML models: Gaussian Naïve Bayesian (GNB), k-nearest neighbor (KNN), decision tree (DT), logistic regression (LR), support vector machine (SVM), random forest (RF), multilayer perceptron (MLP), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM), as well as a stacking ensemble ML model. Performance was evaluated using accuracy, precision, recall, an F score, a comprehensive evaluation indicator (CEI) weighted by the above indicators, and the area under the receiver operating characteristic curve (AUC). A net benefit approach using decision curve analysis (DCA) was performed to evaluate the clinical utility of the best model. Models were tested via 10-fold cross-validation. Feature importance was interpreted using Shapley Additive Explanations (SHAP).
    RESULTS: A total of 1026 older adults (mean 83.5, SD 7.6 years; n=586, 57.1% male older adults) and 265 restrained older adults were included in the study. All ML models performed well, with an AUC above 0.905 and an F score above 0.900. The 2 best independent models were RF (AUC 0.938, 95% CI 0.914-0.947) and SVM (AUC 0.949, 95% CI 0.911-0.953). The DCA demonstrated that the RF model displayed better clinical utility than other models. The stacking model combining SVM, RF, and MLP performed best, with AUC (0.950) and CEI (0.943) values, and its DCA curve indicated the best clinical utility. The SHAP plots demonstrated that the significant contributors to model performance were related to cognitive impairment, care dependency, mobility decline, physical agitation, and an indwelling tube.
    CONCLUSIONS: The RF and stacking models had high performance and clinical utility. ML prediction models for predicting the probability of PR in older adults could offer clinical screening and decision support, which could help medical staff in the early identification and PR management of older adults.
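
    The threshold-based classification metrics listed in this abstract can be computed from a confusion matrix as in the sketch below. The function name and labels are made up for illustration. The CEI is left out because the abstract does not define its weighting, and AUC is left out because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, specificity, and F1 for binary 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    # Each ratio falls back to 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0       # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

# Hypothetical labels: 1 = restrained, 0 = not restrained.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
scores = classification_metrics(observed, predicted)
```

    With 10-fold cross-validation, as used in the study, such metrics would typically be computed on each held-out fold and then averaged.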

  • Article type: Journal Article
    BACKGROUND: According to cumulative evidence, many long non-coding RNAs (lncRNAs) have key roles in different human biological processes and are closely linked to numerous human diseases. Predicting potential lncRNA-disease associations can help to detect disease biomarkers and perform disease analysis and prevention. Establishing effective computational methods for lncRNA-disease association prediction is critical.
    RESULTS: In this paper, we propose a novel model named MAGCNSE to predict potential lncRNA-disease associations. We first obtain multiple feature matrices from the multi-view similarity graphs of lncRNAs and diseases using a graph convolutional network. Then, weights are adaptively assigned to the different feature matrices of lncRNAs and diseases using an attention mechanism. Next, the final representations of lncRNAs and diseases are acquired by further extracting features from the multi-channel feature matrices of lncRNAs and diseases using a convolutional neural network. Finally, we employ a stacking ensemble classifier, consisting of multiple traditional machine learning classifiers, to make the final prediction. The results of ablation studies on both the representation learning methods and the classification methods demonstrate the validity of each module. Furthermore, we compare the overall performance of MAGCNSE with that of six other state-of-the-art models, and the results show that it outperforms the other methods. Moreover, we verify the effectiveness of using multi-view data of lncRNAs and diseases. Case studies further reveal the outstanding ability of MAGCNSE in the identification of potential lncRNA-disease associations.
    CONCLUSIONS: The experimental results indicate that MAGCNSE is a useful approach for predicting potential lncRNA-disease associations.