attention mechanism

  • Article type: Journal Article
    Addressing the challenges in detecting surface defects on ceramic disks, such as difficulty in detecting small defects, variations in defect sizes, and inaccurate defect localization, we propose an enhanced YOLOv5s algorithm. Firstly, we improve the anchor frame structure of the YOLOv5s model to enhance its generalization ability, enabling robust defect detection for objects of varying sizes. Secondly, we introduce the ECA attention mechanism to improve the model's accuracy in detecting small targets. Under identical experimental conditions, our enhanced YOLOv5s algorithm demonstrates significant improvements, with precision, F1 scores, and mAP values increasing by 3.1%, 3%, and 4.5% respectively. Moreover, the accuracy in detecting crack, damage, slag, and spot defects increases by 0.2%, 4.7%, 5.4%, and 1.9% respectively. Notably, the detection speed improves from 232 frames/s to 256 frames/s. Comparative analysis with other algorithms reveals superior performance over YOLOv3 and YOLOv4 models, showcasing enhanced capability in identifying small target defects and achieving real-time detection.
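
    ECA (Efficient Channel Attention) is a published, self-contained module, so its core is easy to sketch: channel weights come from a 1D convolution over the globally pooled feature descriptor, with no dimensionality reduction. The sketch below follows the original ECA design, assuming its standard adaptive kernel-size rule; where exactly the authors insert it into YOLOv5s is not stated in the abstract.

```python
import math
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: per-channel weights from a 1D conv
    over the globally pooled descriptor, with no dimensionality reduction."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # Adaptive kernel size from the channel count (rule from the ECA paper).
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                  # x: (N, C, H, W)
        y = self.pool(x)                                   # (N, C, 1, 1)
        y = y.squeeze(-1).transpose(1, 2)                  # (N, 1, C)
        y = self.conv(y)                                   # local cross-channel interaction
        y = self.sigmoid(y).transpose(1, 2).unsqueeze(-1)  # (N, C, 1, 1)
        return x * y                                       # reweight channels
```

    In a YOLOv5-style model, one plausible placement is after a backbone stage, so small-object features are reweighted before reaching the detection heads.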

  • Article type: Journal Article
    Protein-peptide interactions (PPepIs) are vital to understanding cellular functions, which can facilitate the design of novel drugs. As an essential component in forming a PPepI, protein-peptide binding sites are the basis for understanding the mechanisms involved in PPepIs. Therefore, accurately identifying protein-peptide binding sites becomes a critical task. The traditional experimental methods for researching these binding sites are labor-intensive and time-consuming, and some computational tools have been invented to supplement them. However, these computational tools have limitations in generality or accuracy due to the need for ligand information, complex feature construction, or their reliance on modeling based on amino acid residues. To deal with the drawbacks of these computational algorithms, we describe a geometric attention-based network for peptide binding site identification (GAPS) in this work. The proposed model utilizes geometric feature engineering to construct atom representations and incorporates multiple attention mechanisms to update relevant biological features. In addition, a transfer learning strategy is implemented to leverage protein-protein binding-site information and enhance protein-peptide binding-site recognition, taking into account the common structure and biological bias between proteins and peptides. Consequently, GAPS demonstrates state-of-the-art performance and excellent robustness in this task. Moreover, our model exhibits exceptional performance across several extended experiments, including predicting the apo protein-peptide, protein-cyclic peptide, and AlphaFold-predicted protein-peptide binding sites. These results confirm that the GAPS model is a powerful, versatile, and stable method suitable for diverse binding site predictions.
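
    The abstract's two key ingredients are atom-level geometric features and transfer from the data-rich protein-protein binding-site task. The sketch below illustrates only the transfer-learning recipe, with a deliberately simplified stand-in backbone; BindingSiteNet, the file name, and both learning rates are hypothetical, and the real GAPS encoder is geometric and attention-based rather than a plain MLP.

```python
import torch
import torch.nn as nn

class BindingSiteNet(nn.Module):
    """Hypothetical stand-in for the GAPS backbone: per-atom features in,
    per-atom binding-site logits out."""
    def __init__(self, in_dim=32, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, 1)  # binding site vs. not

    def forward(self, atom_feats):        # (num_atoms, in_dim)
        return self.head(self.encoder(atom_feats)).squeeze(-1)

# Stage 1: pretrain on the larger protein-protein binding-site task, then save.
model = BindingSiteNet()
# ... training loop on protein-protein data goes here ...
torch.save(model.state_dict(), "ppi_pretrained.pt")

# Stage 2: warm-start from those weights for protein-peptide binding sites,
# updating the shared encoder gently and the task head at full speed.
finetune = BindingSiteNet()
finetune.load_state_dict(torch.load("ppi_pretrained.pt"))
optimizer = torch.optim.Adam([
    {"params": finetune.encoder.parameters(), "lr": 1e-5},
    {"params": finetune.head.parameters(),    "lr": 1e-3},
])
```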

  • Article type: Journal Article
    Amid the wave of globalization, cultural amalgamation has surged in frequency, highlighting the challenges inherent in cross-cultural communication. To address these challenges, contemporary research has shifted its focus to human-computer dialogue. Especially in the educational paradigm of human-computer dialogue, analysing emotion recognition in user dialogues is particularly important, since accurately identifying and understanding users' emotional tendencies shapes the efficiency and experience of human-computer interaction. This study aims to improve the capability of language emotion recognition in human-computer dialogue. It proposes a hybrid model (BCBA) based on bidirectional encoder representations from transformers (BERT), convolutional neural networks (CNN), bidirectional gated recurrent units (BiGRU), and the attention mechanism. The model leverages BERT to extract semantic and syntactic features from the text. Simultaneously, it integrates CNN and BiGRU networks to delve deeper into textual features, enhancing the model's proficiency in nuanced sentiment recognition. Furthermore, by introducing the attention mechanism, the model can assign different weights to words based on their emotional tendencies, enabling it to prioritize words with discernible emotional inclinations for more precise sentiment analysis. The BCBA model has achieved remarkable results in emotion recognition and classification tasks through experimental validation on two datasets, significantly improving both accuracy and F1 score, with an average accuracy of 0.84 and an average F1 score of 0.8. Confusion matrix analysis reveals a minimal classification error rate for this model. Additionally, as the number of iterations increases, the model's recall rate stabilizes at approximately 0.7. This accomplishment demonstrates the model's robust capabilities in semantic understanding and sentiment analysis and showcases its advantages in handling emotional characteristics in language expressions within a cross-cultural context. The BCBA model proposed in this study provides effective technical support for emotion recognition in human-computer dialogue, which is of great significance for building more intelligent and user-friendly human-computer interaction systems. In the future, we will continue to optimize the model's structure, improve its capability in handling complex emotions and cross-lingual emotion recognition, and explore applying the model to more practical scenarios to further promote the development and application of human-computer dialogue technology.
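
    The described pipeline (BERT embeddings, then CNN and BiGRU feature extraction, then attention-weighted pooling) maps naturally onto a short PyTorch sketch. The hidden sizes, convolution kernel width, and the bert-base-uncased checkpoint below are assumptions; the abstract does not give these details.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BCBA(nn.Module):
    """Sketch of the BERT + CNN + BiGRU + attention pipeline described above."""
    def __init__(self, num_classes, hidden=128):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")    # assumed checkpoint
        dim = self.bert.config.hidden_size                            # 768
        self.conv = nn.Conv1d(dim, hidden, kernel_size=3, padding=1)  # local n-gram features
        self.bigru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)      # scalar relevance score per token
        self.cls = nn.Linear(2 * hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        h = torch.relu(self.conv(h.transpose(1, 2))).transpose(1, 2)
        h, _ = self.bigru(h)                     # (N, T, 2 * hidden)
        scores = self.att(h).masked_fill(attention_mask.unsqueeze(-1) == 0, -1e9)
        weights = torch.softmax(scores, dim=1)   # heavier weights on emotion-bearing tokens
        return self.cls((weights * h).sum(dim=1))
```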

  • Article type: Journal Article
    As the diversity and volume of images continue to grow, the demand for efficient fine-grained image retrieval has surged across numerous fields. However, current deep learning-based approaches to fine-grained image retrieval often concentrate solely on top-layer features, neglecting the relevant information carried in the middle layers, even though this information contains more fine-grained identification content. Moreover, these methods typically employ a uniform weighting strategy during hash code mapping, risking the loss of critical region mappings, an irreversible detriment to fine-grained retrieval tasks. To address these problems, we propose a novel method for fine-grained image retrieval that leverages feature fusion and hash mapping techniques. Our approach harnesses a multi-level feature cascade, emphasizing not just top-layer but also intermediate-layer image features, and integrates a feature fusion module at each level to enhance the extraction of discriminative information. In addition, we introduce an agent self-attention architecture, marking its first application in this context, which steers the model to prioritize long-range features and further avoids the loss of critical regions in the mapping. Finally, our proposed model significantly outperforms the existing state of the art, improving retrieval accuracy by an average of 40% for 12-bit, 22% for 24-bit, 16% for 32-bit, and 11% for 48-bit hash codes across five publicly available fine-grained datasets. We also validate the generalization ability and performance stability of the proposed method on another five datasets and with statistical significance tests. Our code can be downloaded from https://github.com/BJFU-CS2012/MuiltNet.git.
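
    Agent self-attention is a published idea: a small set of pooled "agent" tokens mediates between queries and keys, so long-range context is aggregated at near-linear rather than quadratic cost. A minimal single-head sketch consistent with that design is shown below; the number of agents and the pooling choice are assumptions, not details from this abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AgentAttention(nn.Module):
    """Single-head sketch of agent self-attention: pooled agent tokens
    mediate between queries and keys for cheap long-range interaction."""
    def __init__(self, dim, num_agents=49):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.num_agents = num_agents
        self.scale = dim ** -0.5

    def forward(self, x):                        # x: (N, T, dim), a flattened feature map
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Agents are pooled queries: (N, A, dim) with A << T.
        a = F.adaptive_avg_pool1d(q.transpose(1, 2), self.num_agents).transpose(1, 2)
        # Agents first gather global information from keys/values ...
        agent_ctx = torch.softmax(a @ k.transpose(1, 2) * self.scale, dim=-1) @ v
        # ... then broadcast it back to every query position.
        out = torch.softmax(q @ a.transpose(1, 2) * self.scale, dim=-1) @ agent_ctx
        return self.proj(out)
```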

  • Article type: Journal Article
    Automated segmentation of multiple organs and tumors from 3D medical images such as magnetic resonance imaging (MRI) and computed tomography (CT) scans using deep learning methods can aid in diagnosing and treating cancer. However, organs often overlap and are complexly connected, characterized by extensive anatomical variation and low contrast. In addition, the diversity of tumor shape, location, and appearance, coupled with the dominance of background voxels, makes accurate 3D medical image segmentation difficult. In this paper, a novel 3D large-kernel (LK) attention module is proposed to address these problems and achieve accurate multi-organ and tumor segmentation. The proposed LK attention module combines the advantages of biologically inspired self-attention and convolution, including local contextual information, long-range dependencies, and channel adaptation. The module also decomposes the LK convolution to optimize the computational cost and can be easily incorporated into CNNs such as U-Net. Comprehensive ablation experiments demonstrated the feasibility of convolutional decomposition and explored the most efficient and effective network design. Among them, the best Mid-type 3D LK attention-based U-Net network was evaluated on the CT-ORG and BraTS 2020 datasets, achieving state-of-the-art segmentation performance compared to leading CNN- and Transformer-based methods for medical image segmentation. The performance improvement due to the proposed 3D LK attention module was statistically validated.
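
    The decomposition mentioned above follows the large-kernel attention pattern: a large 3D kernel is factored into a depth-wise convolution, a depth-wise dilated convolution, and a pointwise convolution, whose output multiplicatively gates the input. A sketch under those assumptions is below; the specific kernel sizes approximate a roughly 21x21x21 receptive field and may differ from the authors' choices.

```python
import torch.nn as nn

class LKAttention3D(nn.Module):
    """Sketch of a decomposed 3D large-kernel attention block: the large
    kernel is factored into depth-wise, depth-wise dilated, and pointwise
    convolutions, and the result multiplicatively gates the input."""
    def __init__(self, channels):
        super().__init__()
        # 5x5x5 depth-wise + 7x7x7 depth-wise dilated (rate 3) + 1x1x1 pointwise
        # approximate a ~21x21x21 receptive field at a fraction of the cost.
        self.dw = nn.Conv3d(channels, channels, 5, padding=2, groups=channels)
        self.dw_dilated = nn.Conv3d(channels, channels, 7, padding=9,
                                    dilation=3, groups=channels)
        self.pw = nn.Conv3d(channels, channels, 1)

    def forward(self, x):                 # x: (N, C, D, H, W)
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn                   # position- and channel-wise gating
```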

  • Article type: Journal Article
    Organ segmentation is a crucial task in various medical imaging applications. Many deep learning models have been developed to do this, but they are slow and require a lot of computational resources. To solve this problem, attention mechanisms are used that can locate important objects of interest within medical images, allowing the model to segment them accurately even in the presence of noise or artifacts. By paying attention to specific anatomical regions, the model becomes better at segmentation. Medical images have unique features in the form of anatomical information, which makes them different from natural images. Unfortunately, most deep learning methods either ignore this information or do not use it effectively and explicitly. Combining natural intelligence with artificial intelligence, known as hybrid intelligence, has shown promising results in medical image segmentation, making models more robust and able to perform well in challenging situations. In this paper, we propose several methods and models to find attention regions in medical images for deep learning-based segmentation via non-deep-learning methods. We developed these models and trained them using hybrid intelligence concepts. To evaluate their performance, we tested the models on unique test data and analyzed metrics including the false-negative quotient and the false-positive quotient. Our findings demonstrate that object shape and layout variations can be explicitly learned to create computational models that are suitable for each anatomic object. This work opens new possibilities for advancements in medical image segmentation and analysis.
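
    As one hypothetical illustration of a non-deep-learning attention finder in this spirit (the abstract does not specify the authors' methods), anatomical priors such as a typical intensity range and an expected object size can yield a binary attention mask with classical operations alone:

```python
import numpy as np
from scipy import ndimage

def attention_region(volume, intensity_range, min_voxels=500):
    """Hypothetical non-deep-learning attention finder: threshold a volume
    to an organ's typical intensity range, clean it up morphologically, and
    keep the largest connected component as the region of interest."""
    low, high = intensity_range
    mask = (volume >= low) & (volume <= high)          # anatomical intensity prior
    mask = ndimage.binary_opening(mask, iterations=2)  # drop speckle noise
    labels, n = ndimage.label(mask)                    # connected components
    if n == 0:
        return np.zeros_like(mask)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    region = labels == (int(np.argmax(sizes)) + 1)
    # Size prior: an implausibly small component is no attention region.
    return region if region.sum() >= min_voxels else np.zeros_like(mask)
```

    The resulting binary mask can then gate the input or an intermediate feature map of a segmentation network so that it attends to one anatomic object at a time.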

  • Article type: Journal Article
    Cracks in tunnel lining structures constitute a common and serious problem that jeopardizes the safety of traffic and the durability of the tunnel. The similarity between lining seams and cracks in terms of intensity and morphological characteristics renders the detection of cracks in tunnel lining structures challenging. To address this issue, a new deep learning-based method for crack detection in tunnel lining structures is proposed. First, an improved attention mechanism is introduced for the morphological features of lining seams, which aggregates not only global spatial information but also features along two dimensions, height and width, to mine more long-range feature information. Furthermore, a mixed strip convolution module leveraging strip convolutions in four different directions is proposed. This module captures remote contextual information from various angles to avoid interference from background pixels. To evaluate the proposed approach, the two modules are integrated into a U-shaped network, and experiments are conducted on Tunnel200, a tunnel lining crack dataset, as well as the publicly available crack datasets Crack500 and DeepCrack. The results show that the approach outperforms existing methods and achieves superior performance on these datasets.
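
    A strip convolution uses a thin, elongated kernel (1 x k or k x 1) that follows crack-like structures while touching few background pixels. The sketch below shows the horizontal and vertical branches with an assumed residual fusion; the paper's two diagonal branches would follow the same pattern and are omitted here, and the kernel length k is an assumption.

```python
import torch
import torch.nn as nn

class MixedStripConv(nn.Module):
    """Sketch of a mixed strip-convolution module: thin, elongated kernels
    capture long-range context along crack-like structures while avoiding
    interference from background pixels."""
    def __init__(self, channels, k=9):
        super().__init__()
        self.horizontal = nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2))
        self.vertical = nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0))
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        h = self.horizontal(x)    # long-range context along rows
        v = self.vertical(x)      # long-range context along columns
        return self.fuse(torch.cat([h, v], dim=1)) + x  # assumed residual fusion
```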

  • Article type: Journal Article
    The precise segmentation of retinal vasculature is crucial for the early screening of various eye diseases, such as diabetic retinopathy and hypertensive retinopathy. Given the complex and variable overall structure of retinal vessels and their delicate, minute local features, the accurate extraction of fine vessels and edge pixels remains a technical challenge in the current research. To enhance the ability to extract thin vessels, this paper incorporates a pyramid channel attention module into a U-shaped network. This allows for more effective capture of information at different levels and increased attention to vessel-related channels, thereby improving model performance. Simultaneously, to prevent overfitting, this paper optimizes the standard convolutional block in the U-Net with a pre-activated residual dropout convolution block, thus improving the model's generalization ability. The model is evaluated on three benchmark retinal datasets: DRIVE, CHASE_DB1, and STARE. Experimental results demonstrate that, compared to the baseline model, the proposed model achieves improvements in sensitivity (Sen) scores of 7.12%, 9.65%, and 5.36% on these three datasets, respectively, proving its strong ability to extract fine vessels.
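
    A pre-activated residual block places normalization and activation before each convolution and adds a skip connection; with dropout between the convolutions, it serves as the regularized replacement for U-Net's plain double-conv block described above. The dropout rate and exact layer ordering below are assumptions:

```python
import torch.nn as nn

class PreActResidualDropBlock(nn.Module):
    """Sketch of a pre-activated residual convolution block with dropout,
    as a drop-in replacement for U-Net's plain double-conv block."""
    def __init__(self, in_ch, out_ch, p=0.2):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.Dropout2d(p),                      # regularizes against overfitting
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        # 1x1 shortcut when the channel count changes.
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return self.body(x) + self.skip(x)
```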

  • Article type: Journal Article
    Joint source-channel coding (JSCC) based on deep learning has shown significant advancements in image transmission tasks. However, previous channel-adaptive JSCC methods often rely on the signal-to-noise ratio (SNR) of the current channel for encoding, which overlooks the neural network's self-adaptive capability across varying SNRs. This paper investigates the self-adaptive capability of deep learning-based JSCC models to dynamically changing channels and introduces a novel method named Channel-Blind JSCC (CBJSCC). CBJSCC leverages the intrinsic learning capability of neural networks to self-adapt to dynamic channels and diverse SNRs without relying on external SNR information. This approach is advantageous, as it is not affected by channel estimation errors and can be applied to one-to-many wireless communication scenarios. To enhance the performance of JSCC tasks, the CBJSCC model employs a specially designed encoder-decoder. Experimental results show that CBJSCC outperforms existing channel-adaptive JSCC methods that depend on SNR estimation and feedback, both in additive white Gaussian noise environments and under slow Rayleigh fading channel conditions. Through a comprehensive analysis of the model's performance, we further validate the robustness and adaptability of this strategy across different application scenarios, with the experimental results providing strong evidence to support this claim.
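
    The channel-blind idea can be made concrete with a differentiable AWGN channel whose SNR is drawn at random during training, so the encoder and decoder never receive SNR as an input and must adapt implicitly. The SNR range below is an assumption, and the authors' specially designed encoder-decoder is not reproduced here.

```python
import torch
import torch.nn as nn

class AWGNChannel(nn.Module):
    """Differentiable AWGN channel. The SNR is drawn at random per batch
    during training, so the codec never receives explicit SNR information
    and must adapt to channel conditions implicitly."""
    def __init__(self, snr_db_range=(0.0, 20.0)):  # assumed training range
        super().__init__()
        self.snr_db_range = snr_db_range

    def forward(self, z):                          # z: encoded symbols
        z = z / z.pow(2).mean().sqrt().clamp(min=1e-12)  # unit average power
        snr_db = torch.empty(1, device=z.device).uniform_(*self.snr_db_range)
        noise_power = 10.0 ** (-snr_db / 10.0)     # signal power is 1
        return z + noise_power.sqrt() * torch.randn_like(z)

# Training step sketch, with any image encoder/decoder pair:
#   z = encoder(img); img_hat = decoder(AWGNChannel()(z)); loss = mse(img_hat, img)
```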

  • Article type: Journal Article
    A new algorithm, Yolov8n-FADS, is proposed with the aim of improving the accuracy of miners' helmet detection in complex underground environments. By replacing the head part with Attentional Sequence Fusion (ASF) and introducing the P2 detection layer, the ASF-P2 structure is able to comprehensively extract the global and local feature information of the image, and the improvements to the backbone are able to capture spatially sparsely distributed features more efficiently, which improves the model's ability to perceive complex patterns. The improved detection head, SEAMHead, built on the SEAM module, can handle occlusion more effectively. The Focal Loss module can improve the model's ability to detect rare target categories by adjusting the weights of positive and negative samples. This study shows that, compared with the original model, the improved model compresses memory by 29%, reduces the parameter count by 36.7%, and improves detection accuracy by 4.9%, which can effectively improve the detection accuracy for underground helmet wearers, reduce the workload of underground video surveillance personnel, and improve monitoring efficiency.
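
    Focal loss itself is standard: it scales the cross-entropy of each example by (1 - p_t)^gamma, so well-classified, abundant examples contribute little and rare categories dominate the gradient. A minimal binary sketch with the common defaults alpha = 0.25 and gamma = 2 follows; the abstract does not give the values used.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Standard binary focal loss: scales cross-entropy by (1 - p_t)^gamma,
    so easy, abundant examples contribute little and rare target categories
    dominate the gradient."""
    def __init__(self, alpha=0.25, gamma=2.0):     # common defaults, assumed here
        super().__init__()
        self.alpha, self.gamma = alpha, gamma

    def forward(self, logits, targets):
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p = torch.sigmoid(logits)
        p_t = targets * p + (1 - targets) * (1 - p)             # prob. of the true class
        alpha_t = targets * self.alpha + (1 - targets) * (1 - self.alpha)
        return (alpha_t * (1 - p_t) ** self.gamma * bce).mean()
```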
