Ensemble learning

  • Article type: Journal Article
    Nanoliposomal formulations, utilizing lipid bilayers to encapsulate therapeutic agents, hold promise for targeted drug delivery. Recent studies have explored the application of machine learning (ML) techniques in this field. This study aims to elucidate the motivations behind integrating ML into liposomal formulations, providing a nuanced understanding of its applications and highlighting potential advantages. The review begins with an overview of liposomal formulations and their role in targeted drug delivery. It then systematically progresses through current research on ML in this area, discussing the principles guiding ML adaptation for liposomal preparation and characterization. Additionally, the review proposes a conceptual model for effective ML incorporation. The review explores popular ML techniques, including ensemble learning, decision trees, instance-based learning, and neural networks. It discusses feature extraction and selection, emphasizing the influence of dataset nature and ML method choice on technique relevance. The review underscores the importance of supervised learning models for structured liposomal formulations, where labeled data is essential. It acknowledges the merits of K-fold cross-validation but notes the prevalent use of single train/test splits in liposomal formulation studies. This practice facilitates the visualization of results through 3D plots for practical interpretation. While highlighting the mean absolute error as a crucial metric, the review emphasizes consistency between predicted and actual values. It clearly demonstrates ML techniques' effectiveness in optimizing critical formulation parameters such as encapsulation efficiency, particle size, drug loading efficiency, polydispersity index, and liposomal flux. In conclusion, the review navigates the nuances of various ML algorithms, illustrating ML's role as a decision support system for liposomal formulation development. It proposes a structured framework involving experimentation, physicochemical analysis, and iterative ML model refinement through human-centered evaluation, guiding future studies. Emphasizing meticulous experimentation, interdisciplinary collaboration, and continuous validation, the review advocates seamless ML integration into liposomal drug delivery research for robust advancements. Future endeavors are encouraged to uphold these principles.
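The contrast the abstract draws between K-fold cross-validation and a single train/test split, scored by mean absolute error (MAE), can be illustrated with a minimal sketch. This is not the review's own code: the one-feature least-squares model, the function names (`fit_line`, `mae`, `kfold_mae`), and the toy "lipid-to-drug ratio vs. encapsulation efficiency" data are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def kfold_mae(xs, ys, k=5, seed=0):
    """Return one held-out MAE per fold; every sample is tested exactly once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a * xs[i] + b for i in held_out]
        scores.append(mae([ys[i] for i in held_out], preds))
    return scores

# Hypothetical toy data: lipid-to-drug ratio -> encapsulation efficiency (%).
rng = random.Random(42)
ratios = [1 + 0.5 * i for i in range(20)]
efficiency = [40 + 2.5 * r + rng.uniform(-1.0, 1.0) for r in ratios]

fold_scores = kfold_mae(ratios, efficiency, k=5)
print("per-fold MAE:", fold_scores)
print("mean MAE:", mean(fold_scores))
```

A single train/test split corresponds to looking at only one entry of `fold_scores`; averaging over all folds gives an error estimate that depends less on which samples happened to be held out.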

  • Article type: Journal Article
    Numerous research efforts have been made worldwide to combat the coronavirus disease 2019 (COVID-19) pandemic. In this regard, some researchers have focused on deep learning and machine learning approaches to discover more about this disease. Many articles have been published on using ensemble learning methods for COVID-19 detection. Still, there seems to be no scientometric analysis or brief review of these studies. Hence, a combined method of scientometric analysis and brief review was used to study the published articles that employed an ensemble learning approach to detect COVID-19. This research used both methods to overcome their individual limitations, leading to enhanced and more reliable outcomes. The related articles were retrieved from the Scopus database. Then a two-step procedure was employed: first, a concise review of the collected articles was conducted; then they underwent scientometric and bibliometric analyses. The findings revealed that the convolutional neural network (CNN) is the most frequently employed algorithm, while support vector machine (SVM), random forest, ResNet, DenseNet, and visual geometry group (VGG) were also frequently used. Additionally, China has had a significant presence in numerous top-ranking categories of this field of research. Both study phases yielded valuable results and rankings.

  • Article type: Review
    BACKGROUND: Lung diseases, both infectious and non-infectious, are the most prevalent cause of mortality worldwide. Medical research has identified pneumonia, lung cancer, and coronavirus disease 2019 (COVID-19) as prominent lung diseases prioritized over others. Imaging modalities, including X-rays, computed tomography (CT) scans, magnetic resonance imaging (MRI), positron emission tomography (PET) scans, and others, are primarily employed in medical assessments because they provide computed data that can be used as input datasets for computer-assisted diagnostic systems. Imaging datasets are used to develop and evaluate machine learning (ML) methods to analyze and predict prominent lung diseases.
    OBJECTIVE: This review analyzes ML paradigms, the utilization of imaging modalities, and recent developments for prominent lung diseases. Furthermore, it explores various publicly available datasets that are being used for prominent lung diseases.
    METHODS: Well-known databases of peer-reviewed academic studies, namely ScienceDirect, arXiv, IEEE Xplore, MDPI, and many more, were used to search for relevant articles. Keywords and their combinations were applied in the search, with primary consideration given to terms such as pneumonia, lung cancer, COVID-19, various imaging modalities, ML, convolutional neural networks (CNNs), transfer learning, and ensemble learning.
    RESULTS: The findings indicate that X-ray datasets are preferred for detecting pneumonia, while CT scan datasets are predominantly favored for detecting lung cancer. Furthermore, in COVID-19 detection, X-ray datasets are prioritized over CT scan datasets. The analysis reveals that X-rays and CT scans have surpassed all other imaging techniques. It has been observed that using CNNs yields a high degree of accuracy and practicability in identifying prominent lung diseases. Transfer learning and ensemble learning are complementary techniques to CNNs that facilitate analysis. Furthermore, accuracy is the most favored metric for assessment.

  • Article type: Journal Article
    Mobile app stores, such as Google Play, have become popular platforms for practically all types of software and services for mobile phone users. Users may browse and download apps via app stores, which also help developers monitor their apps by allowing users to rate and review them. App reviews may contain the user's experience, bug details, requests for additional features, or a textual rating of the app. These ratings can frequently be biased due to inadequate votes. Moreover, there are significant discrepancies between the numerical ratings and the user reviews. This study uses a transfer learning approach to predict the numerical ratings of Google apps. It uses user-provided numeric ratings of apps as the training data and provides authentic ratings of mobile apps by analyzing users' reviews. A transfer learning-based model, ELMo, which is based on the word vector feature representation technique, is proposed for this purpose. The performance of the proposed model is compared with three other transfer learning models and five machine learning models. The dataset is scraped from the Google Play store and covers 14 different categories of apps. First, biased and unbiased user ratings are segregated using TextBlob analysis to formulate the ground truth, and then the classifiers' prediction accuracy is evaluated. Results demonstrate that the ELMo classifier has a high potential to predict authentic numeric ratings from users' actual reviews.

  • Article type: Journal Article
    Accurately and rapidly distinguishing long noncoding RNAs (lncRNAs) from transcripts is a prerequisite for exploring their biological functions. In recent years, many computational methods have been developed to predict lncRNAs from transcripts, but there is no systematic review of these computational methods. In this review, we introduce the databases and features involved in the development of computational prediction models, and subsequently summarize existing state-of-the-art computational methods, including methods based on binary classifiers, deep learning, and ensemble learning. However, a user-friendly way of employing existing state-of-the-art computational methods is in demand. Therefore, we develop a Python package, ezLncPred, which provides a pragmatic command-line implementation for using nine state-of-the-art lncRNA prediction methods. Finally, we discuss the challenges of lncRNA prediction and future directions.

  • Article type: Journal Article
    Anti-cancer peptides (ACPs) are known as potential therapeutics for cancer. Due to their unique ability to target cancer cells without affecting healthy cells directly, they have been extensively studied. Many peptide-based drugs are currently being evaluated in preclinical and clinical trials. Accurate identification of ACPs has received considerable attention in recent years; as such, a number of machine learning-based methods for in silico identification of ACPs have been developed. These methods promote research on the mechanisms of ACP therapeutics against cancer to some extent. The methods differ vastly in terms of their training/testing datasets, machine learning algorithms, feature encoding schemes, feature selection methods, and evaluation strategies. Therefore, it is desirable to summarize the advantages and disadvantages of the existing methods and to provide useful insights and suggestions for the development and improvement of novel computational tools to characterize and identify ACPs. With this in mind, we first comprehensively investigate 16 state-of-the-art predictors for ACPs in terms of their core algorithms, feature encoding schemes, performance evaluation metrics, and webserver/software usability. Then, a comprehensive performance assessment is conducted to evaluate the robustness and scalability of the existing predictors using a well-prepared benchmark dataset. We provide potential strategies for improving model performance. Moreover, we propose a novel ensemble learning framework, termed ACPredStackL, for the accurate identification of ACPs. ACPredStackL is developed based on the stacking ensemble strategy combined with SVM, Naïve Bayesian, lightGBM and KNN. Empirical benchmarking experiments against the state-of-the-art methods demonstrate that ACPredStackL achieves a comparative performance for predicting ACPs. The webserver and source code of ACPredStackL are freely available at http://bigdata.biocie.cn/ACPredStackL/ and https://github.com/liangxiaoq/ACPredStackL, respectively.
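The stacking ensemble strategy mentioned in this abstract can be sketched generically as follows. This is not ACPredStackL's implementation: as a simplifying assumption, the sketch swaps in two tiny pure-Python base learners (nearest centroid and k-nearest neighbours) in place of SVM, Naïve Bayes, lightGBM and KNN, uses a small logistic-regression meta-learner, and runs on hypothetical 2-D toy points rather than peptide features. All function names are illustrative.

```python
import math

def fit_centroid(X, y):
    """Nearest-centroid base learner: predict the class whose mean is closest."""
    cents = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        cents[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return lambda x: min(cents, key=lambda c: math.dist(x, cents[c]))

def fit_knn(X, y, k=3):
    """k-nearest-neighbour base learner with majority vote."""
    def predict(x):
        nearest = sorted(zip(X, y), key=lambda p: math.dist(x, p[0]))[:k]
        votes = [t for _, t in nearest]
        return max(set(votes), key=votes.count)
    return predict

def fit_logistic(Z, y, lr=0.5, epochs=300):
    """Tiny logistic-regression meta-learner trained by batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            p = 1 / (1 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))
            gb += p - t
            gw = [g + (p - t) * zi for g, zi in zip(gw, z)]
        w = [wi - lr * g / len(Z) for wi, g in zip(w, gw)]
        b -= lr * gb / len(Z)
    return lambda z: int(sum(wi * zi for wi, zi in zip(w, z)) + b > 0)

def fit_stacking(X, y, base_fitters, k=3):
    """Train the meta-learner on out-of-fold base predictions, then refit bases."""
    n = len(X)
    Z = [[0] * len(base_fitters) for _ in range(n)]
    for f in range(k):
        held = list(range(f, n, k))
        train = [i for i in range(n) if i % k != f]
        for j, fit in enumerate(base_fitters):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in held:
                Z[i][j] = model(X[i])  # prediction for a sample the model never saw
    meta = fit_logistic(Z, y)
    bases = [fit(X, y) for fit in base_fitters]  # final base models use all data
    return lambda x: meta([m(x) for m in bases])

# Hypothetical toy data: two well-separated 2-D clusters with labels 0 and 1.
X = [(0.0, 0.2), (0.3, -0.1), (-0.2, 0.1), (0.1, 0.4), (-0.3, -0.2), (0.4, 0.0),
     (5.0, 5.2), (5.3, 4.9), (4.8, 5.1), (5.1, 5.4), (4.7, 4.8), (5.4, 5.0)]
y = [0] * 6 + [1] * 6

stacked = fit_stacking(X, y, [fit_centroid, fit_knn])
print(stacked((0.1, 0.1)), stacked((5.2, 5.0)))
```

The key design point is that the meta-learner only ever sees base predictions made on held-out folds, so it learns how far to trust each base learner without being misled by their fit to their own training data.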

  • Article type: Journal Article
    Segmenting tumors from brain MRI images is one of the most actively studied areas in medical science, as MRI is a noninvasive imaging modality. This paper presents a thorough literature review of recent methods for brain tumor segmentation from brain MRI images. It includes performance and quantitative analyses of state-of-the-art methods. Different image segmentation methods are briefly explained, along with recent contributions from various researchers. Here, an effort is made to open new dimensions for readers to explore in this area of research. Throughout the review process, it was observed that combining a Conditional Random Field (CRF) with a Fully Convolutional Neural Network (FCNN), or a CRF with DeepMedic or an ensemble, is more effective for segmenting tumors from brain MRI images.

  • Article type: Journal Article
    The majority of the human genome consists of non-coding regions that have been called junk DNA. However, recent studies have revealed that these regions contain cis-regulatory elements, such as promoters, enhancers, silencers, and insulators. These regulatory elements can play crucial roles in controlling gene expression in specific cell types, conditions, and developmental stages. Disruption of these regions could contribute to phenotypic changes. Precisely identifying regulatory elements is key to deciphering the mechanisms underlying transcriptional regulation. Cis-regulatory events are complex processes that involve chromatin accessibility, transcription factor binding, DNA methylation, histone modifications, and the interactions between them. The development of next-generation sequencing techniques has allowed us to capture these genomic features in depth. Applied analysis of genome sequences for clinical genetics has increased the urgency of detecting these regions. However, the complexity of cis-regulatory events and the deluge of sequencing data require accurate and efficient computational approaches, in particular machine learning techniques. In this review, we describe machine learning approaches for predicting transcription factor binding sites, enhancers, and promoters, primarily driven by next-generation sequencing data. Data sources are provided to facilitate the testing of novel methods. The purpose of this review is to attract computational experts and data scientists to advance this field.