Stacked ensemble learning

  • Article type: Journal Article
    Vascular endothelial growth factor (VEGF) is involved in the development and progression of various diseases, including cancer, diabetic retinopathy, macular degeneration and arthritis. Understanding the role of VEGF in these disorders has led to effective treatments, including anti-VEGF drugs, which have significantly improved therapy. Accurate VEGF identification is critical, yet experimental identification is expensive and time-consuming. This study presents Deep-VEGF, a novel computational model for VEGF prediction based on deep stacked ensemble learning. We formulated two datasets using primary sequences. A novel feature descriptor named K-Space Tri Slicing-Bigram position-specific scoring matrix (KSTS-BPSSM) is constructed to extract numerical features from primary sequences. The model is trained with deep learning techniques, including a gated recurrent unit (GRU), a generative adversarial network (GAN) and a convolutional neural network (CNN). The GRU and CNN are ensembled using a stacking learning approach. The KSTS-BPSSM-based ensemble model achieved the most accurate predictions, surpassing other competitive predictors on both the training and testing datasets. This demonstrates the potential of deep learning for accurate VEGF prediction as a powerful tool to accelerate research, streamline drug discovery and uncover novel therapeutic targets. This approach holds promise for expanding our knowledge of VEGF's role in health and disease. Communicated by Ramaswamy H. Sarma.
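The abstract does not define KSTS-BPSSM, so the following is only a rough sketch of the widely used bigram-PSSM idea with an added gap parameter, not the authors' descriptor. The function name `k_spaced_bigram_pssm`, the gap parameter `k`, and the toy matrix are assumptions made for illustration, and the "Tri Slicing" step is left out because the source gives no details about it.

```python
from typing import List


def k_spaced_bigram_pssm(pssm: List[List[float]], k: int = 0) -> List[float]:
    """Flattened bigram features from an L x A PSSM with gap k (hypothetical helper).

    Entry (m, n) sums pssm[i][m] * pssm[i + k + 1][n] over all valid rows i,
    so k = 0 pairs adjacent sequence positions.
    """
    length = len(pssm)
    alphabet = len(pssm[0]) if pssm else 0
    features = [0.0] * (alphabet * alphabet)
    for i in range(length - k - 1):
        row_a, row_b = pssm[i], pssm[i + k + 1]
        for m in range(alphabet):
            for n in range(alphabet):
                features[m * alphabet + n] += row_a[m] * row_b[n]
    return features


# Toy 3 x 2 matrix; a real PSSM has 20 columns, one per amino acid.
toy_pssm = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
adjacent = k_spaced_bigram_pssm(toy_pssm, k=0)  # pairs rows (0,1) and (1,2)
gapped = k_spaced_bigram_pssm(toy_pssm, k=1)    # pairs rows (0,2) only
```

With a 20-column PSSM the result is a fixed-length vector of 400 values regardless of sequence length, which is what makes this kind of transform convenient as input to a GRU, CNN, or any other fixed-input learner.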
  • Article type: Journal Article
    Skin irritation is an adverse effect associated with various substances, including chemicals, drugs, and natural products. Dipterocarpol, extracted from Dipterocarpus alatus, offers several skin benefits, notably anticancer, wound-healing, and antibacterial properties. However, the skin irritation potential of dipterocarpol remains unassessed. Quantitative structure-activity relationship (QSAR) modeling is a recommended tool for toxicity assessment that requires less time, money, and animal testing to obtain otherwise unavailable acute toxicity data. Therefore, our study aimed to develop a highly accurate machine learning-based QSAR model for predicting skin irritation. We built a stacked ensemble learning model using 1064 chemicals, following the OECD recommendations for QSAR validation. Subsequently, we used the proposed model to explore the cytotoxicity of dipterocarpol on keratinocytes. Our findings indicate that the model displayed promising statistical quality in terms of accuracy, precision, and recall in both 10-fold cross-validation and on the test dataset. Moreover, the model predicted that dipterocarpol does not cause skin irritation, which was confirmed by the cell-based assay. In conclusion, our proposed model can be applied to the risk assessment of skin irritation for untested compounds that fall within its applicability domain. The web application of this model is available at https://qsarlabs.com/#stackhacat.
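The abstract does not say which base learners or meta-learner were stacked, so the sketch below only illustrates the general stacking recipe: base models produce out-of-fold scores, and a meta-learner is trained on those scores. Every name here (`fit_centroid`, `out_of_fold`, `fit_stack`), the nearest-centroid learners, and the ten toy samples are assumptions chosen to keep the example dependency-free; they are not the QSAR model described above.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]


def fit_centroid(X: List[Vector], y: List[int], cols: Sequence[int]) -> Model:
    """Nearest-centroid scorer on the given columns; score > 0 favours class 1."""
    def mean(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(r[c] for r in rows) / len(rows) for c in cols]

    c0, c1 = mean(0), mean(1)

    def score(x: Vector) -> float:
        d0 = sum((x[c] - m) ** 2 for c, m in zip(cols, c0))
        d1 = sum((x[c] - m) ** 2 for c, m in zip(cols, c1))
        return d0 - d1

    return score


def out_of_fold(X: List[Vector], y: List[int], cols: Sequence[int],
                n_folds: int) -> List[float]:
    """Score each sample with a model that never saw it during fitting."""
    scores = [0.0] * len(X)
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        model = fit_centroid([X[i] for i in train], [y[i] for i in train], cols)
        for i in range(fold, len(X), n_folds):
            scores[i] = model(X[i])
    return scores


def fit_stack(X: List[Vector], y: List[int],
              base_cols: List[Sequence[int]], n_folds: int = 5) -> Callable[[Vector], int]:
    # Level 0: out-of-fold scores of each base learner become meta-features.
    oof = [out_of_fold(X, y, cols, n_folds) for cols in base_cols]
    meta_X = [list(row) for row in zip(*oof)]
    # Level 1: the meta-learner is fit on the meta-features only.
    meta = fit_centroid(meta_X, y, range(len(base_cols)))
    # Base learners are refit on all data for use at prediction time.
    bases = [fit_centroid(X, y, cols) for cols in base_cols]

    def predict(x: Vector) -> int:
        return int(meta([b(x) for b in bases]) > 0)

    return predict


# Toy data: class 0 clusters near (0, 0), class 1 near (1, 1).
y = [i % 2 for i in range(10)]
X = [[0.1 * (i % 3) + (i % 2), 0.1 * (i % 4) + (i % 2)] for i in range(10)]
stack = fit_stack(X, y, base_cols=[[0], [1]], n_folds=5)
train_predictions = [stack(x) for x in X]
```

Using out-of-fold scores rather than in-sample scores keeps the meta-learner from simply rewarding whichever base learner overfits the training data most.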
  • Article type: Journal Article
    Long non-coding RNAs (lncRNAs) are a class of RNAs longer than 200 base pairs (bp) that do not encode proteins; nevertheless, lncRNAs have many vital biological functions. The development of high-throughput sequencing technology has led to the discovery of a large number of novel transcripts, so computational methods for lncRNA prediction are in great demand. In this paper, we consider global sequence features and propose a stacked ensemble learning-based method to predict lncRNAs from transcripts, abbreviated as PredLnc-GFStack. We extract the critical features from the candidate feature list using a genetic algorithm (GA) and then employ the stacked ensemble learning method to construct the PredLnc-GFStack model. Computational experiments show that PredLnc-GFStack outperforms several state-of-the-art methods for lncRNA prediction. Furthermore, PredLnc-GFStack demonstrates an outstanding ability for cross-species ncRNA prediction.
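The abstract does not give the GA settings used for PredLnc-GFStack, so the following is a generic sketch of genetic-algorithm feature selection over binary feature masks. The function `ga_select`, its default parameters, and the toy fitness function are all assumptions; in practice the fitness would typically be a cross-validated score of a classifier trained on the selected features.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]


def ga_select(n_features: int, fitness: Callable[[Mask], float],
              pop_size: int = 20, generations: int = 30,
              mutation_rate: float = 0.05, seed: int = 0) -> Tuple[Mask, List[float]]:
    """Evolve 0/1 feature masks; returns the best mask and the best score per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with a small per-bit probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


# Hypothetical fitness: features 0, 3 and 5 are informative, and every
# selected feature costs a small penalty to favour compact subsets.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask: Mask) -> float:
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask, best_history = ga_select(8, toy_fitness)
```

Because the best individual is copied unchanged into each new generation, the best score recorded in `best_history` can never decrease from one generation to the next.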