Ensemble

  • Article type: Journal Article
    Epigenetic modifications, particularly RNA methylation and histone alterations, play a crucial role in heredity, development, and disease. Among these, RNA 5-methylcytosine (m5C) is the most prevalent RNA modification in mammalian cells, essential for processes such as ribosome synthesis, translational fidelity, mRNA nuclear export, turnover, and translation. The increasing volume of nucleotide sequences has led to the development of machine learning-based predictors for m5C site prediction. However, these predictors often face challenges related to training data limitations and overfitting due to insufficient external validation. This study introduces m5C-Seq, an ensemble learning approach for RNA modification profiling, designed to address these issues. m5C-Seq employs a meta-classifier that integrates 15 probabilities generated from a novel, large dataset using systematic encoding methods to make final predictions. Demonstrating superior performance compared to existing predictors, m5C-Seq represents a significant advancement in accurate RNA modification profiling. The code and the newly established datasets are made available through GitHub at https://github.com/Z-Abbas/m5C-Seq.
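    The stacking design described above, in which a meta-classifier is trained on probabilities emitted by encoding-specific base predictors, can be sketched as follows. Everything here is an illustrative stand-in, not the authors' implementation: the data are synthetic and a plain logistic-regression meta-learner is assumed.

```python
# Illustrative sketch of stacking: 15 base predictors each emit a probability
# per sequence, and a logistic-regression meta-classifier is trained on those
# probabilities to make the final prediction. All data here are synthetic.
import math
import random

random.seed(0)

def synthetic_meta_features(n_samples, n_bases=15):
    """Labels plus base-model probabilities (informative but noisy)."""
    X, y = [], []
    for _ in range(n_samples):
        label = random.randint(0, 1)
        probs = [min(1.0, max(0.0, 0.3 + 0.4 * label + random.uniform(-0.2, 0.2)))
                 for _ in range(n_bases)]
        X.append(probs)
        y.append(label)
    return X, y

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_meta_classifier(X, y, lr=0.5, epochs=100):
    """Plain stochastic gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = p - yi  # gradient of the log-loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

X, y = synthetic_meta_features(200)
w, b = train_meta_classifier(X, y)
preds = [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) > 0.5) for xi in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
```

    In real use, the base probabilities would come from held-out predictions of the 15 encoding-specific models rather than from a synthetic generator, so the meta-classifier learns how much to trust each encoding.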

  • Article type: Journal Article
    Over recent decades, natural and artificial colloids, as well as nanoparticles, have been increasingly used in various applications. Consequently, with this rising consumption, surface and subsurface environments are more exposed to these particles. The presence of these particles and the colloid-facilitated transport of microorganisms, the interactions between dissolved contaminants and mobile colloids in porous media, and the fate and transport of colloids through groundwater-one of the primary sources of water supply for human societies-have attracted extensive research. This study investigates the performance of several image processing methods in the field of colloid detection, which is a prerequisite for the subsequent steps in porous media research. We employed four different categories of image processing approaches on microscopy images-segmentation-based methods, background-detection-based methods, filter-based methods, and morphology-based methods-to conduct the detection process of colloids. Eight methods were applied and subsequently analyzed in terms of their drawbacks and advantages to determine the best ones in this domain. Finally, we proposed an ensemble approach that leverages the strengths of the three best methods using a majority vote to detect colloids more accurately. In experiments, Precision, Recall, F-measure, and TCR criteria were considered as evaluation tools. Experimental results demonstrate the high accuracy of image processing methods in recognizing colloids. Among all these methods, morphology-based methods were the most successful, achieving the best detection performance and improving the limited distinguishing features of small colloids. Moreover, our ensemble approach, achieving perfect scores across all evaluation criteria, highlights its superiority compared with other detection methods.
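    The majority vote at the core of the proposed ensemble can be sketched as a pixel-wise vote over binary detection masks; the masks below are toy stand-ins for the outputs of the three best methods, not real detections.

```python
# Toy sketch of the majority-vote ensemble: each of the three best detection
# methods yields a binary mask (1 = colloid pixel), and a pixel is accepted
# when at least two of the three methods agree.

def majority_vote(masks):
    """Pixel-wise majority vote over an odd number of binary masks."""
    threshold = len(masks) // 2 + 1
    return [int(sum(pixels) >= threshold) for pixels in zip(*masks)]

# Three hypothetical detector outputs over a flattened 2x3 image patch.
segmentation = [1, 1, 0, 0, 1, 0]
background   = [1, 0, 1, 0, 1, 0]
morphology   = [1, 1, 1, 0, 0, 0]

ensemble_mask = majority_vote([segmentation, background, morphology])  # [1, 1, 1, 0, 1, 0]
```

    Voting over an odd number of detectors avoids ties, and a pixel flagged by only one method is discarded as likely noise.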

  • Article type: Journal Article
    Complex systems pose significant challenges to traditional scientific and statistical methods due to their inherent unpredictability and resistance to simplification. Accurately detecting complex behavior and the uncertainty that comes with it is therefore essential. Building on previous studies, we introduce a new information-theoretic measure, termed "incoherence". By using an adapted Jensen-Shannon divergence across an ensemble of outcomes, we quantify the aleatoric uncertainty of the system. First, we compare this measure to established statistical tests using both continuous and discrete data. We then demonstrate how incoherence can be applied to identify key characteristics of complex systems, including sensitivity to initial conditions, criticality, and response to perturbations.
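    The divergence underlying the measure can be illustrated with a standard multi-distribution Jensen-Shannon divergence; the paper's adapted version and its exact normalization are not reproduced here, so treat this as a generic sketch.

```python
# Generic sketch of a Jensen-Shannon divergence across an ensemble of discrete
# outcome distributions (in bits). Identical ensemble members give 0; fully
# disjoint members give the maximum, log2(n).
import math

def entropy_bits(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0.0)

def jsd_bits(distributions):
    """Equal-weight JSD: entropy of the mixture minus mean member entropy."""
    n = len(distributions)
    mixture = [sum(col) / n for col in zip(*distributions)]
    return entropy_bits(mixture) - sum(entropy_bits(d) for d in distributions) / n

agreeing = [[0.5, 0.5], [0.5, 0.5]]   # ensemble members identical -> JSD 0
disjoint = [[1.0, 0.0], [0.0, 1.0]]   # members never overlap -> JSD 1 bit
```

    A deterministic system whose ensemble members agree scores zero, while irreducible run-to-run spread pushes the score toward its maximum, which is why such a divergence can serve as a proxy for aleatoric uncertainty.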

  • Article type: Journal Article
    BACKGROUND: Cancers are complex multi-genetic diseases that should be tackled in multi-target drug discovery scenarios. Computational methods are of great importance to accelerate the discovery of multi-target anticancer agents. Here, we employed a ligand-based approach by combining a perturbation-theory machine learning model derived from an ensemble of multilayer perceptron networks (PTML-EL-MLP) with the Fragment-Based Topological Design (FBTD) approach to rationally design and predict triple-target inhibitors against the cancer-related proteins named Tropomyosin Receptor Kinase A (TRKA), poly[ADP-ribose] polymerase 1 (PARP-1), and Insulin-like Growth Factor 1 Receptor (IGF1R).
    METHODS: We extracted the chemical and biological data from ChEMBL. We applied the Box-Jenkins approach to generate multi-label topological indices and subsequently created the PTML-EL-MLP model.
    RESULTS: Our PTML-EL-MLP model exhibited an accuracy of around 80%. The application of FBTD permitted the physicochemical and structural interpretation of the PTML-EL-MLP model, thus enabling a) the chemistry-driven analysis of different molecular fragments with a positive influence on the multi-target activity and b) the use of those favorable fragments as building blocks to virtually design four new drug-like molecules. The designed molecules were predicted as triple-target inhibitors against the aforementioned cancer-related proteins.
    CONCLUSIONS: Our study envisages the capabilities of combining PTML modeling with FBTD for the generation of new chemical diversity for multi-target drug discovery in oncology research and beyond.
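    The Box-Jenkins operators mentioned in the methods are, in PTML modeling, moving-average-style deviations of a molecular descriptor from its mean under a given experimental condition. A minimal sketch of that centering step, with hypothetical descriptor values, could look like this.

```python
# Hedged sketch of a Box-Jenkins-style perturbation-theory operator: each
# molecular descriptor is centered on the mean of the compounds measured under
# the same experimental condition (here, the same protein target). The values
# below are hypothetical, not taken from ChEMBL.
from collections import defaultdict

def ptml_operators(records):
    """records: (condition, descriptor_value) pairs -> list of deviations."""
    grouped = defaultdict(list)
    for condition, value in records:
        grouped[condition].append(value)
    means = {c: sum(v) / len(v) for c, v in grouped.items()}
    return [value - means[condition] for condition, value in records]

# Hypothetical descriptor values grouped by target protein.
data = [("TRKA", 2.0), ("TRKA", 4.0), ("PARP-1", 10.0), ("PARP-1", 14.0)]
deviations = ptml_operators(data)  # [-1.0, 1.0, -2.0, 2.0]
```

    Centering against condition means is what lets a single model compare a compound's behavior across several targets, which is the premise of the multi-label setup described above.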

  • Article type: Journal Article
    Postacute sequelae of COVID-19 (PASC), also known as long COVID, is a broad grouping of a range of long-term symptoms following acute COVID-19. These symptoms can occur across a range of biological systems, leading to challenges in determining risk factors for PASC and the causal etiology of this disorder. An understanding of characteristics that are predictive of future PASC is valuable, as this can inform the identification of high-risk individuals and future preventative efforts. However, current knowledge regarding PASC risk factors is limited.
    Using a sample of 55,257 patients (at a ratio of 1 patient with PASC to 4 matched controls) from the National COVID Cohort Collaborative, as part of the National Institutes of Health Long COVID Computational Challenge, we sought to predict individual risk of PASC diagnosis from a curated set of clinically informed covariates. The National COVID Cohort Collaborative includes electronic health records for more than 22 million patients from 84 sites across the United States.
    We predicted individual PASC status, given covariate information, using Super Learner (an ensemble machine learning algorithm also known as stacking) to learn the optimal combination of gradient boosting and random forest algorithms to maximize the area under the receiver operator curve. We evaluated variable importance (Shapley values) based on 3 levels: individual features, temporal windows, and clinical domains. We externally validated these findings using a holdout set of randomly selected study sites.
    We were able to predict individual PASC diagnoses accurately (area under the curve 0.874). The individual features of the length of observation period, number of health care interactions during acute COVID-19, and viral lower respiratory infection were the most predictive of subsequent PASC diagnosis. Temporally, we found that baseline characteristics were the most predictive of future PASC diagnosis, compared with characteristics immediately before, during, or after acute COVID-19. We found that the clinical domains of health care use, demographics or anthropometry, and respiratory factors were the most predictive of PASC diagnosis.
    The methods outlined here provide an open-source, applied example of using Super Learner to predict PASC status using electronic health record data, which can be replicated across a variety of settings. Across individual predictors and clinical domains, we consistently found that factors related to health care use were the strongest predictors of PASC diagnosis. This indicates that any observational studies using PASC diagnosis as a primary outcome must rigorously account for heterogeneous health care use. Our temporal findings support the hypothesis that clinicians may be able to accurately assess the risk of PASC in patients before acute COVID-19 diagnosis, which could improve early interventions and preventive care. Our findings also highlight the importance of respiratory characteristics in PASC risk assessment.
    RR2-10.1101/2023.07.27.23293272.
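    Super Learner's key step, choosing the combination of base learners that maximizes cross-validated AUC, can be sketched with a grid search over convex weights. The base-learner scores below are synthetic stand-ins; real use relies on out-of-fold predictions from the fitted gradient boosting and random forest models.

```python
# Simplified sketch of the Super Learner combination step: given held-out
# scores from two base learners (e.g., gradient boosting and random forest),
# pick the convex weight that maximizes AUC. Scores here are synthetic.

def auc(scores, labels):
    """Rank-based AUC: probability that a positive outranks a negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
gbm_scores = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1]  # hypothetical base learner 1
rf_scores  = [0.7, 0.8, 0.6, 0.3, 0.4, 0.2]  # hypothetical base learner 2

best_w, best_auc = max(
    ((w / 10, auc([w / 10 * g + (1 - w / 10) * r
                   for g, r in zip(gbm_scores, rf_scores)], labels))
     for w in range(11)),
    key=lambda t: t[1],
)
```

    The actual algorithm evaluates the candidate weights on cross-validated predictions rather than the training data, so the selected combination generalizes instead of overfitting to one learner's in-sample scores.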

  • Article type: Journal Article
    Synaptic connectivity defines groups of neurons that engage in correlated activity during specific functional tasks. These co-active groups of neurons form ensembles, the operational units involved in, for example, sensory perception, motor coordination and memory (then called an engram). Traditionally, ensemble formation has been thought to occur via strengthening of synaptic connections via long-term potentiation (LTP) as a plasticity mechanism. This synaptic theory of memory arises from the learning rules formulated by Hebb and is consistent with many experimental observations. Here, we propose, as an alternative, that the intrinsic excitability of neurons and its plasticity constitute a second, non-synaptic mechanism that could be important for the initial formation of ensembles. Indeed, enhanced neural excitability is widely observed in multiple brain areas subsequent to behavioral learning. In cortical structures and the amygdala, excitability changes are often reported as transient, even though they can last tens of minutes to a few days. Perhaps it is for this reason that they have been traditionally considered as modulatory, merely supporting ensemble formation by facilitating LTP induction, without further involvement in memory function (memory allocation hypothesis). We here suggest-based on two lines of evidence-that beyond modulating LTP allocation, enhanced excitability plays a more fundamental role in learning. First, enhanced excitability constitutes a signature of active ensembles and, due to it, subthreshold synaptic connections become suprathreshold in the absence of synaptic plasticity (iceberg model). Second, enhanced excitability promotes the propagation of dendritic potentials toward the soma and allows for enhanced coupling of EPSP amplitude (LTP) to the spike output (and thus ensemble participation). 
    This permissive gate model describes a need for permanently increased excitability, which seems at odds with its traditional consideration as a short-lived mechanism. We propose that longer modifications in excitability are made possible by a low threshold for intrinsic plasticity induction, suggesting that excitability might be on/off-modulated at short intervals. Consistent with this, in cerebellar Purkinje cells, excitability lasts days to weeks, which shows that in some circuits the duration of the phenomenon is not a limiting factor in the first place. In our model, synaptic plasticity defines the information content received by neurons through the connectivity network that they are embedded in. However, the plasticity of cell-autonomous excitability could dynamically regulate the ensemble participation of individual neurons as well as the overall activity state of an ensemble.

  • Article type: Journal Article
    Extreme weather events, such as those associated with winds and precipitation, result in billions of euros in damages annually. While changes in extreme precipitation due to global warming have already been detected at sub-continental scales, their complex characteristics make them a challenge to assess at more regional scales. Extreme winds present an even greater challenge as the varying dynamical response to global warming exhibits high levels of uncertainty. This situation is complicated by local-scale interactions with orography, cities, land-sea contrasts, etc. The dataset presented here attempts to address these challenges and provide information that will allow robust assessment of extreme winds and precipitation (maximum five-day precipitation). We achieve this by leveraging a large ensemble (52 members) of high-resolution (12 km) EURO-CORDEX simulations. The dataset will be of value not only to the scientific community, but also to practitioners in the public (e.g., municipal planners, government agencies) and private sectors (e.g., insurers and reinsurers).

  • Article type: Journal Article
    BACKGROUND: Analyzing genomic sequences plays a crucial role in understanding biological diversity and classifying bamboo species. Existing methods for genomic sequence analysis suffer from limitations such as complexity, low accuracy, and the need for constant reconfiguration in response to evolving genomic datasets.
    OBJECTIVE: This study addresses these limitations by introducing a novel Dual Heuristic Feature Selection-based Ensemble Classification Model (DHFS-ECM) for the precise identification of bamboo species from genomic sequences.
    METHODS: The proposed DHFS-ECM method employs a Genetic Algorithm to perform dual heuristic feature selection. This process maximizes inter-class variance, leading to the selection of informative N-gram feature sets. Subsequently, intra-class variance levels are used to create optimal training and validation sets, ensuring comprehensive coverage of class-specific features. The selected features are then processed through an ensemble classification layer, combining multiple stratification models for species-specific categorization.
    RESULTS: Comparative analysis with state-of-the-art methods demonstrates that DHFS-ECM achieves remarkable improvements in accuracy (9.5%), precision (5.9%), recall (8.5%), and AUC performance (4.5%). Importantly, the model maintains its performance even with an increased number of species classes due to the continuous learning facilitated by the Dual Heuristic Genetic Algorithm Model.
    CONCLUSIONS: DHFS-ECM offers several key advantages, including efficient feature extraction, reduced model complexity, enhanced interpretability, and increased robustness and accuracy through the ensemble classification layer. These attributes make DHFS-ECM a promising tool for real-time clinical applications and a valuable contribution to the field of genomic sequence analysis.
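    The first stage described in the methods, scoring N-gram features by inter-class variance, can be sketched as follows. The genetic-algorithm search wrapper is omitted, and a direct variance ranking over toy sequences stands in for it.

```python
# Toy sketch of inter-class-variance N-gram scoring: compute per-sequence
# 2-gram frequencies, average them per class, and rank features by the
# variance of the class means. The GA wrapper of DHFS-ECM is omitted.
from collections import Counter

def ngram_freqs(seq, n=2):
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def interclass_variance(classes, n=2):
    """classes: dict label -> list of sequences; returns feature -> variance."""
    class_means, features = {}, set()
    for label, seqs in classes.items():
        freqs = [ngram_freqs(s, n) for s in seqs]
        keys = {k for f in freqs for k in f}
        class_means[label] = {k: sum(f.get(k, 0.0) for f in freqs) / len(freqs)
                              for k in keys}
        features |= keys
    scores = {}
    for feat in features:
        vals = [class_means[label].get(feat, 0.0) for label in classes]
        mean = sum(vals) / len(vals)
        scores[feat] = sum((v - mean) ** 2 for v in vals) / len(vals)
    return scores

# Hypothetical sequences for two species classes.
toy = {"speciesA": ["ATATAT", "ATATTA"], "speciesB": ["GCGCGC", "GCGGCG"]}
scores = interclass_variance(toy)
ranked = sorted(scores, key=scores.get, reverse=True)
```

    Features whose class means differ most (here the AT- and GC-rich 2-grams) rise to the top, which is exactly the property a discriminative feature set needs before classification.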

  • Article type: Journal Article
    During the 2022-2023 unprecedented mpox epidemic, near real-time short-term forecasts of the epidemic's trajectory were essential in intervention implementation and guiding policy. However, as case levels have significantly decreased, evaluating model performance is vital to advancing the field of epidemic forecasting. Using laboratory-confirmed mpox case data from the Centers for Disease Control and Prevention and Our World in Data teams, we generated retrospective sequential weekly forecasts for Brazil, Canada, France, Germany, Spain, the United Kingdom, the United States and at the global scale using an auto-regressive integrated moving average (ARIMA) model, generalized additive model, simple linear regression, Facebook's Prophet model, as well as the sub-epidemic wave and n-sub-epidemic modelling frameworks. We assessed forecast performance using average mean squared error, mean absolute error, weighted interval scores, 95% prediction interval coverage, skill scores and Winkler scores. Overall, the n-sub-epidemic modelling framework outcompeted other models across most locations and forecasting horizons, with the unweighted ensemble model performing best most frequently. The n-sub-epidemic and spatial-wave frameworks considerably improved in average forecasting performance relative to the ARIMA model (greater than 10%) for all performance metrics. Findings further support sub-epidemic frameworks for short-term forecasting epidemics of emerging and re-emerging infectious diseases.
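    One of the evaluation tools named above, the weighted interval score, is built from the interval score of a central (1 − α) prediction interval; a minimal sketch of that building block is shown here, with the caveat that the full WIS averages this quantity (plus an absolute-error term) over several α levels.

```python
# Minimal sketch of the interval score for a central (1 - alpha) prediction
# interval: the interval width, plus penalties scaled by 2/alpha when the
# observation falls outside. The weighted interval score used in epidemic
# forecast evaluation averages this over several alpha levels.

def interval_score(lower, upper, observed, alpha):
    width = upper - lower
    below = (2.0 / alpha) * (lower - observed) if observed < lower else 0.0
    above = (2.0 / alpha) * (observed - upper) if observed > upper else 0.0
    return width + below + above

# A hypothetical 95% interval (alpha = 0.05) of [10, 50] weekly cases:
inside  = interval_score(10, 50, 30, 0.05)  # covered: score = width only
outside = interval_score(10, 50, 60, 0.05)  # missed high: width + penalty
```

    The 2/α scaling makes misses of a nominal 95% interval far more expensive than misses of an 80% interval, so sharp but poorly calibrated forecasts are penalized heavily.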

  • Article type: Journal Article
    Despite achieving numerous successes with surface defect inspection based on deep learning, the industry still faces challenges in conducting packaging defect inspections that include critical information such as ingredient lists. In particular, while previous achievements primarily focus on defect inspection in high-quality images, they do not consider defect inspection in low-quality images such as those containing image blur. To address this issue, we propose a novel inference technique named temporal-quality ensemble (TQE), which combines temporal and quality weights. Temporal weighting assigns weights to input images by considering the timing in relation to the observed image. Quality weighting prioritizes high-quality images to ensure the inference process emphasizes clear and reliable input images. These two weights improve both the accuracy and reliability of the inference process on low-quality images. In addition, to experimentally evaluate the general applicability of TQE, we adopt widely used convolutional neural networks (CNNs) such as ResNet-34, EfficientNet, ECAEfficientNet, GoogLeNet, and ShuffleNetV2 as the backbone network. In conclusion, considering cases where at least one low-quality image is included, TQE has an F1-score approximately 17.64% to 22.41% higher than using single CNN models and about 1.86% to 2.06% higher than an average voting ensemble.
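    The combination of temporal and quality weights described above can be sketched as a weighted average of per-frame defect probabilities. The weighting functions below (exponential recency decay times a quality score) are illustrative guesses, not the paper's definitions.

```python
# Illustrative sketch of a temporal-quality ensemble: each frame's defect
# probability is weighted by recency and by an image-quality score, and the
# normalized weighted average is the final prediction. The exact weighting
# functions of TQE are not reproduced here.

def tqe_predict(frames, decay=0.5):
    """frames: list of (age, quality, probability); newer/sharper count more."""
    weights = [(decay ** age) * quality for age, quality, _ in frames]
    total = sum(weights)
    return sum(w * p for w, (_, _, p) in zip(weights, frames)) / total

# Hypothetical frames: (age in steps, quality in [0, 1], defect probability).
frames = [
    (0, 0.9, 0.80),  # newest, sharp
    (1, 0.2, 0.30),  # older, blurry
    (2, 0.8, 0.70),
]
score = tqe_predict(frames)
```

    The blurry frame's vote is down-weighted twice, once for age and once for quality, so a single unreliable image cannot dominate the ensemble's final call.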
