attention mechanism

  • Article type: Journal Article
    Organ segmentation is a crucial task in various medical imaging applications. Many deep learning models have been developed for this task, but they are slow and require substantial computational resources. To address this, attention mechanisms are used that can locate important objects of interest within medical images, allowing a model to segment them accurately even in the presence of noise or artifacts. By attending to specific anatomical regions, the model becomes better at segmentation. Medical images carry unique anatomical information that distinguishes them from natural images. Unfortunately, most deep learning methods either ignore this information or do not use it effectively and explicitly. Combining natural intelligence with artificial intelligence, known as hybrid intelligence, has shown promising results in medical image segmentation, making models more robust and able to perform well in challenging situations. In this paper, we propose several methods and models that use non-deep-learning techniques to find attention regions in medical images for deep learning-based segmentation. We developed these models and trained them using hybrid intelligence concepts. To evaluate their performance, we tested the models on unique test data and analyzed metrics including the false-negative quotient and false-positive quotient. Our findings demonstrate that variations in object shape and layout can be explicitly learned to create computational models suited to each anatomical object. This work opens new possibilities for advances in medical image segmentation and analysis.
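    To make the idea concrete — attention regions computed outside the network steering a deep segmentation model — the following minimal PyTorch sketch gates backbone feature maps with a precomputed anatomical prior. The mask source and the residual gating scheme are illustrative assumptions; the paper's actual region models are built from learned object shape and layout statistics.

```python
import torch
import torch.nn.functional as F

def apply_attention_prior(features: torch.Tensor, prior_mask: torch.Tensor) -> torch.Tensor:
    """Gate CNN feature maps with a precomputed anatomical attention mask.

    features:   (B, C, H, W) feature maps from a segmentation backbone.
    prior_mask: (B, 1, h, w) soft mask in [0, 1] produced by a
                non-deep-learning region model (hypothetical here).
    """
    # Resize the prior to the feature resolution, then gate the features.
    mask = F.interpolate(prior_mask, size=features.shape[-2:],
                         mode="bilinear", align_corners=False)
    # Residual gating amplifies attended regions without zeroing the rest.
    return features * (1.0 + mask)

feats = torch.randn(2, 64, 32, 32)   # backbone features
prior = torch.rand(2, 1, 128, 128)   # coarse organ-region prior
print(apply_attention_prior(feats, prior).shape)  # torch.Size([2, 64, 32, 32])
```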

  • Article type: Journal Article
    Cracks in tunnel lining structures are a common and serious problem that jeopardizes traffic safety and the durability of the tunnel. The similarity between lining seams and cracks in strength and morphological characteristics makes the detection of cracks in tunnel lining structures challenging. To address this issue, a new deep learning-based method for crack detection in tunnel lining structures is proposed. First, an improved attention mechanism is introduced to handle the morphological features of lining seams; it aggregates not only global spatial information but also features along two dimensions, height and width, to mine more long-range feature information. Furthermore, a mixed strip convolution module leveraging strip convolutions in four different directions is proposed. This module captures long-range contextual information from various angles to avoid interference from background pixels. To evaluate the proposed approach, the two modules are integrated into a U-shaped network, and experiments are conducted on Tunnel200, a tunnel lining crack dataset, as well as the publicly available crack datasets Crack500 and DeepCrack. The results show that the approach outperforms existing methods and achieves superior performance on these datasets.
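    The height-and-width aggregation described above is reminiscent of coordinate attention. As a hedged sketch of that idea (not the paper's exact module, and omitting the four-direction strip convolutions, whose diagonal variants need custom kernels), the block below pools global context along each spatial axis separately and re-weights the feature map:

```python
import torch
import torch.nn as nn

class HWAttention(nn.Module):
    """Coordinate-attention-style block: aggregates context along height and
    width separately, then re-weights the input feature map."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        mid = max(channels // reduction, 4)
        self.squeeze = nn.Sequential(nn.Conv2d(channels, mid, 1), nn.ReLU(inplace=True))
        self.excite_h = nn.Conv2d(mid, channels, 1)
        self.excite_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)   # (B, C, H, 1): per-row context
        x_w = x.mean(dim=2, keepdim=True)   # (B, C, 1, W): per-column context
        y = self.squeeze(torch.cat([x_h, x_w.transpose(2, 3)], dim=2))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.excite_h(y_h))                  # (B, C, H, 1)
        a_w = torch.sigmoid(self.excite_w(y_w.transpose(2, 3)))  # (B, C, 1, W)
        return x * a_h * a_w

x = torch.randn(1, 32, 64, 64)
print(HWAttention(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```

    The horizontal and vertical members of the mixed strip convolution module could be sketched in the same way with (1, k) and (k, 1) kernels.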

  • Article type: Journal Article
    Characteristics such as low contrast and significant organ shape variations are often exhibited in medical images. The improvement of segmentation performance in medical imaging is limited by the generally insufficient adaptive capabilities of existing attention mechanisms. This paper proposes an efficient Channel Prior Convolutional Attention (CPCA) method that supports the dynamic distribution of attention weights across both the channel and spatial dimensions. Spatial relationships are effectively extracted, while the channel prior is preserved, by employing a multi-scale depth-wise convolutional module. CPCA thus gains the ability to focus on informative channels and important regions. A segmentation network for medical images, CPCANet, is then built on CPCA. CPCANet is validated on two publicly available datasets. In comparisons with state-of-the-art algorithms, CPCANet achieves improved segmentation performance while requiring fewer computational resources. Our code is publicly available at https://github.com/Cuthbert-Huang/CPCANet.
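    A simplified sketch of the CPCA idea follows: a channel prior from a globally pooled descriptor, then multi-scale depth-wise convolutions that turn it into per-channel spatial attention. Kernel sizes and the reduction ratio are guesses; the linked repository holds the reference implementation.

```python
import torch
import torch.nn as nn

class CPCASketch(nn.Module):
    """Channel prior (SE-style gating) followed by multi-scale depth-wise
    spatial attention. A sketch, not the released CPCA module."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = max(channels // reduction, 4)
        self.channel_mlp = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1),
        )
        # Depth-wise convolutions at several kernel sizes (multi-scale).
        self.dw5 = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.dw7 = nn.Conv2d(channels, channels, 7, padding=3, groups=channels)
        self.dw11 = nn.Conv2d(channels, channels, 11, padding=5, groups=channels)
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel prior: per-channel weights from a global average descriptor.
        x = x * torch.sigmoid(self.channel_mlp(x.mean((2, 3), keepdim=True)))
        # Spatial attention computed per channel, preserving the channel prior.
        sa = self.proj(self.dw5(x) + self.dw7(x) + self.dw11(x))
        return x * torch.sigmoid(sa)

x = torch.randn(1, 32, 48, 48)
print(CPCASketch(32)(x).shape)  # torch.Size([1, 32, 48, 48])
```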

  • Article type: Journal Article
    The precise segmentation of retinal vasculature is crucial for the early screening of various eye diseases, such as diabetic retinopathy and hypertensive retinopathy. Given the complex and variable overall structure of retinal vessels and their delicate, minute local features, the accurate extraction of fine vessels and edge pixels remains a technical challenge in current research. To enhance the ability to extract thin vessels, this paper incorporates a pyramid channel attention module into a U-shaped network. This allows for more effective capture of information at different levels and increased attention to vessel-related channels, thereby improving model performance. Simultaneously, to prevent overfitting, this paper replaces the standard convolutional block in the U-Net with a pre-activated residual dropout convolution block, thus improving the model's generalization ability. The model is evaluated on three benchmark retinal datasets: DRIVE, CHASE_DB1, and STARE. Experimental results demonstrate that, compared to the baseline model, the proposed model improves the sensitivity (Sen) score by 7.12%, 9.65%, and 5.36% on these three datasets, respectively, proving its strong ability to extract fine vessels.
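    The pre-activated residual dropout block can be read as a BN-ReLU-Conv ordering with 2-D dropout inside the residual branch. The sketch below is one plausible interpretation of the abstract, intended as a drop-in replacement for the U-Net double-conv block:

```python
import torch
import torch.nn as nn

class PreActResDropBlock(nn.Module):
    """Pre-activated residual convolution block with spatial dropout,
    replacing the standard U-Net double-conv block (an interpretation)."""

    def __init__(self, in_ch: int, out_ch: int, p_drop: float = 0.2):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.Dropout2d(p_drop),   # drops whole feature maps to regularize
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        # 1x1 projection so the skip connection matches the output width.
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + self.skip(x)

x = torch.randn(1, 32, 64, 64)
print(PreActResDropBlock(32, 64)(x).shape)  # torch.Size([1, 64, 64, 64])
```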

  • Article type: Journal Article
    Joint source-channel coding (JSCC) based on deep learning has shown significant advancements in image transmission tasks. However, previous channel-adaptive JSCC methods often rely on the signal-to-noise ratio (SNR) of the current channel for encoding, which overlooks the neural network's self-adaptive capability across varying SNRs. This paper investigates the self-adaptive capability of deep learning-based JSCC models to dynamically changing channels and introduces a novel method named Channel-Blind JSCC (CBJSCC). CBJSCC leverages the intrinsic learning capability of neural networks to self-adapt to dynamic channels and diverse SNRs without relying on external SNR information. This approach is advantageous, as it is not affected by channel estimation errors and can be applied to one-to-many wireless communication scenarios. To enhance the performance of JSCC tasks, the CBJSCC model employs a specially designed encoder-decoder. Experimental results show that CBJSCC outperforms existing channel-adaptive JSCC methods that depend on SNR estimation and feedback, both in additive white Gaussian noise environments and under slow Rayleigh fading channel conditions. Through a comprehensive analysis of the model's performance, we further validate the robustness and adaptability of this strategy across different application scenarios, with the experimental results providing strong evidence to support this claim.
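    The channel-blind training idea reduces to passing the encoder's latent through an AWGN channel at a randomly drawn SNR that is never given to the network, so the model must absorb channel variation on its own. The sketch below uses toy convolutional stand-ins for the paper's specially designed encoder-decoder:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def awgn(z: torch.Tensor, snr_db: torch.Tensor) -> torch.Tensor:
    """AWGN channel at a per-sample SNR (dB); the SNR is never fed to the model."""
    sig_power = z.pow(2).mean(dim=tuple(range(1, z.dim())), keepdim=True)
    noise_power = sig_power / 10.0 ** (snr_db.view(-1, *[1] * (z.dim() - 1)) / 10.0)
    return z + torch.randn_like(z) * noise_power.sqrt()

# Toy stand-ins, not the CBJSCC architecture.
enc = nn.Conv2d(3, 16, 3, stride=2, padding=1)
dec = nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1)

x = torch.rand(4, 3, 32, 32)
snr_db = torch.empty(4).uniform_(0.0, 20.0)   # random channel state per sample
x_hat = dec(awgn(enc(x), snr_db))             # neither net ever sees snr_db
loss = F.mse_loss(x_hat, x)
loss.backward()
print(f"reconstruction loss: {loss.item():.4f}")
```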

  • Article type: Journal Article
    A new algorithm, Yolov8n-FADS, is proposed with the aim of improving the accuracy of miners' helmet detection in complex underground environments. By replacing the head with Attentional Sequence Fusion (ASF) and introducing a P2 detection layer, the ASF-P2 structure can comprehensively extract the global and local feature information of the image, while the improvement to the backbone captures spatially sparsely distributed features more efficiently, improving the model's ability to perceive complex patterns. The improved detection head, SEAMHead, built from the SEAM module, handles occlusion more effectively. The Focal Loss module improves the model's ability to detect rare target categories by adjusting the weights of positive and negative samples. This study shows that, compared with the original model, the improved model achieves a 29% reduction in memory footprint, a 36.7% reduction in parameter count, and a 4.9% improvement in detection accuracy, which can effectively improve the detection of underground helmet wearers, reduce the workload of underground video surveillance personnel, and improve monitoring efficiency.
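    The Focal Loss component is a standard, well-documented piece and can be sketched directly: it down-weights easy examples so rare positive targets carry more of the gradient. The alpha and gamma values below are the conventional defaults, not values reported by the paper.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss: (1 - p_t)^gamma scales down well-classified samples."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # positive/negative weights
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.randn(8, 1)
targets = torch.randint(0, 2, (8, 1)).float()
print(focal_loss(logits, targets).item())
```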

  • Article type: Journal Article
    Scene text detection is an important research field in computer vision, playing a crucial role in various application scenarios. However, existing scene text detection methods often fail to achieve satisfactory results when faced with text instances of different sizes and shapes and with complex backgrounds. To address the challenge of detecting diverse text in natural scenes, this paper proposes a multi-scale natural scene text detection method based on attention feature extraction and cascaded feature fusion. The method combines global and local attention through an improved attention feature fusion module (DSAF) to capture text features at different scales, enhancing the network's perception of text regions and improving its feature extraction capability. Simultaneously, an improved cascaded feature fusion module (PFFM) is used to fully integrate the extracted feature maps, expanding the receptive field of the features and enriching the expressive ability of the feature maps. Finally, to process the cascaded feature maps, a lightweight subspace attention module (SAM) is introduced to partition the concatenated feature maps into several subspace feature maps, facilitating spatial information interaction among features of different scales. Comparative experiments are conducted on the ICDAR2015, Total-Text, and MSRA-TD500 datasets against several existing scene text detection methods. The results show that the proposed method achieves good performance in terms of precision, recall, and F-score, verifying its effectiveness and practicality.
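    One plausible reading of the lightweight subspace attention module is group-wise channel attention over the concatenated feature map, sketched below; the group count and the gating form are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class SubspaceAttention(nn.Module):
    """Splits the concatenated feature map into channel groups ("subspaces")
    and re-weights each group with its own channel attention."""

    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        g = channels // groups
        self.gates = nn.ModuleList(
            nn.Sequential(nn.Conv2d(g, g, 1), nn.Sigmoid()) for _ in range(groups)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = torch.chunk(x, self.groups, dim=1)
        out = [c * gate(c.mean((2, 3), keepdim=True))   # per-subspace channel gating
               for c, gate in zip(chunks, self.gates)]
        return torch.cat(out, dim=1)

x = torch.randn(1, 64, 32, 32)
print(SubspaceAttention(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```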

  • Article type: Journal Article
    The vector-transmitted Citrus Greening (CG) disease, also called Huanglongbing, is one of the most destructive diseases of citrus. Since no measures for directly controlling this disease are available at present, current disease management integrates several measures, such as vector control, the use of disease-free trees, and the removal of diseased trees. The most essential issue in integrated management is how CG-infected trees can be detected efficiently. For CG detection, digital image analyses using deep learning algorithms have attracted much interest from both researchers and growers. Models using transfer learning with the Faster R-CNN architecture were constructed and compared across two pre-trained Convolutional Neural Network (CNN) backbones, VGGNet and ResNet. Their efficiency was further examined by integrating the Convolutional Block Attention Module (CBAM) into their feature extraction stages to create VGGNet+CBAM and ResNet+CBAM variants. The ResNet-based models performed best. Moreover, the integration of CBAM notably improved CG disease detection precision and the overall performance of the models. Efficient models with transfer learning using Faster R-CNN were loaded onto web applications to facilitate access for real-time diagnosis by farmers using in-field images. The practical ability of the applications to detect CG disease is discussed.
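    CBAM itself is well documented, so a minimal version can be shown without much guesswork: channel attention from pooled descriptors, followed by spatial attention from channel-wise statistics, insertable after any backbone stage.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Minimal Convolutional Block Attention Module: channel attention,
    then spatial attention, applied sequentially to a feature map."""

    def __init__(self, channels: int, reduction: int = 16, k: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.spatial = nn.Conv2d(2, 1, k, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention from average- and max-pooled descriptors.
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # Spatial attention from channel-wise average and max maps.
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa

x = torch.randn(1, 64, 32, 32)
print(CBAM(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```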

  • Article type: Journal Article
    Total volatile basic nitrogen (TVB-N) and total viable count (TVC) are important freshness indicators of meat. Hyperspectral imaging combined with chemometrics has been proven effective in meat quality detection. However, a challenge with chemometrics is the lack of a universally applicable processing combination, requiring trial-and-error experiments on different datasets. This study proposes an end-to-end deep learning model, the pyramid attention features fusion model (PAFFM), integrating a CNN, an attention mechanism, and a pyramid structure. PAFFM fuses raw visible and near-infrared range (VNIR) and shortwave near-infrared range (SWIR) spectral data to predict TVB-N and TVC in chicken breasts. Compared with the CNN and chemometric models, PAFFM obtains excellent results without a complicated combinatorial optimization of processing steps. Important wavelengths that contribute significantly to PAFFM's performance are visualized and interpreted. This study offers valuable references and technical support for the market application of spectral detection, benefiting related research and practical fields.
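    The fusion of VNIR and SWIR spectra can be illustrated with a toy two-branch 1-D CNN whose concatenated features are gated by a learned attention vector before a two-output regression head (TVB-N and TVC). Band counts, branch design, and the gating are placeholders; PAFFM's pyramid attention is considerably more elaborate.

```python
import torch
import torch.nn as nn

class TwoBranchSpectralFusion(nn.Module):
    """Toy VNIR/SWIR fusion: one 1-D CNN branch per range, attention-gated
    concatenation, and a regression head for TVB-N and TVC."""

    def __init__(self, feat: int = 32, pooled: int = 16):
        super().__init__()
        def branch() -> nn.Sequential:
            return nn.Sequential(
                nn.Conv1d(1, feat, 7, padding=3), nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool1d(pooled),
            )
        self.vnir, self.swir = branch(), branch()
        dim = 2 * feat * pooled
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.head = nn.Linear(dim, 2)   # two freshness targets: TVB-N, TVC

    def forward(self, vnir: torch.Tensor, swir: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.vnir(vnir), self.swir(swir)], dim=1).flatten(1)
        return self.head(z * self.gate(z))   # attention-weighted fused features

vnir = torch.randn(4, 1, 400)   # hypothetical: 400 VNIR bands
swir = torch.randn(4, 1, 250)   # hypothetical: 250 SWIR bands
print(TwoBranchSpectralFusion()(vnir, swir).shape)  # torch.Size([4, 2])
```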

  • Article type: Journal Article
    To address challenges related to the inadequate representation and inaccurate discrimination of pedestrian attributes, we propose a novel method for person re-identification, which leverages global feature learning and classification optimization. Specifically, this approach integrates a Normalization-based Channel Attention Module into the fundamental ResNet50 backbone, utilizing a scaling factor to prioritize and enhance key pedestrian feature information. Furthermore, dynamic activation functions are employed to adaptively modulate the parameters of ReLU based on the input convolutional feature maps, thereby bolstering the nonlinear expression capabilities of the network model. By incorporating Arcface loss into the cross-entropy loss, the supervised model is trained to learn pedestrian features that exhibit significant inter-class variance while maintaining tight intra-class coherence. The evaluation of the enhanced model on two popular datasets, Market1501 and DukeMTMC-ReID, reveals improvements in Rank-1 accuracy by 1.28% and 1.4%, respectively, along with corresponding gains in the mean average precision (mAP) of 1.93% and 1.84%. These findings indicate that the proposed model is capable of extracting more robust pedestrian features, enhancing feature discriminability, and ultimately achieving superior recognition accuracy.
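    The normalization-based channel attention can be sketched in the spirit of NAM: batch-norm scale factors serve directly as channel-importance weights, so salient feature channels are amplified without extra pooling or fully connected layers. This is a sketch of the mechanism, not necessarily the paper's exact module.

```python
import torch
import torch.nn as nn

class NormChannelAttention(nn.Module):
    """Normalization-based channel attention: the BatchNorm scaling factors
    (gamma) measure per-channel importance and drive a sigmoid gate."""

    def __init__(self, channels: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.bn(x)
        gamma = self.bn.weight.abs()
        w = gamma / gamma.sum()   # normalized scale factors as channel weights
        return x * torch.sigmoid(y * w.view(1, -1, 1, 1))

x = torch.randn(4, 32, 16, 16)
print(NormChannelAttention(32)(x).shape)  # torch.Size([4, 32, 16, 16])
```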