Keywords: Attention mechanism, Deep hashing, Feature fusion, Fine-grained hashing

Source: DOI: 10.7717/peerj-cs.2025 | PDF (PubMed)

Abstract:
As the diversity and volume of images continue to grow, the demand for efficient fine-grained image retrieval has surged across numerous fields. However, current deep learning-based approaches to fine-grained image retrieval often concentrate solely on top-layer features, neglecting the relevant information carried in the middle layers, even though this information contains more fine-grained identification content. Moreover, these methods typically employ a uniform weighting strategy during hash code mapping, risking the loss of critical region mappings, an irreversible detriment to fine-grained retrieval tasks. To address these problems, we propose a novel method for fine-grained image retrieval that leverages feature fusion and hash mapping techniques. Our approach harnesses a multi-level feature cascade, emphasizing not just top-layer but also intermediate-layer image features, and integrates a feature fusion module at each level to enhance the extraction of discriminative information. In addition, we introduce an agent self-attention architecture, marking its first application in this context, which steers the model to prioritize long-range features and further avoids the loss of critical regions in the mapping. Finally, our proposed model significantly outperforms existing state-of-the-art methods across five publicly available fine-grained datasets, improving retrieval accuracy by an average of 40% for 12-bit hash codes, 22% for 24-bit codes, 16% for 32-bit codes, and 11% for 48-bit codes. We also validate the generalization ability and performance stability of our proposed method on another five datasets and with statistical significance tests. Our code can be downloaded from https://github.com/BJFU-CS2012/MuiltNet.git.
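To make the pipeline described above concrete, here is a minimal PyTorch-style sketch of a multi-level feature cascade with a simplified agent self-attention module feeding a hash head. It is an illustrative assumption, not the authors' implementation (which is in the linked repository): the ResNet-50 backbone, channel dimensions, number of agent tokens, and all class names are choices made only for this sketch.

# A minimal sketch (not the authors' implementation) of the described pipeline:
# multi-level features from a CNN backbone, a per-level reduction step, a simplified
# agent self-attention re-weighting, and a hash head producing a relaxed k-bit code.
# Backbone choice, dimensions, and module names are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50


class AgentAttention(nn.Module):
    """Simplified agent self-attention: a small set of learned agent tokens mediates
    the query-key interaction, so long-range regions can be weighted without full
    O(N^2) attention. Illustrative only."""

    def __init__(self, dim, num_agents=16):
        super().__init__()
        self.agents = nn.Parameter(torch.randn(num_agents, dim) * 0.02)
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):                                    # x: (B, N, C) spatial tokens
        q, k, v = self.q(x), self.k(x), self.v(x)
        a = self.agents.unsqueeze(0).expand(x.size(0), -1, -1)            # (B, A, C)
        scale = x.size(-1) ** 0.5
        agent_ctx = F.softmax(a @ k.transpose(1, 2) / scale, dim=-1) @ v  # agents pool the map
        out = F.softmax(q @ a.transpose(1, 2) / scale, dim=-1) @ agent_ctx  # broadcast back
        return out + x                                       # residual connection


class MultiLevelHashNet(nn.Module):
    """Multi-level cascade: mid-, high-, and top-level ResNet features are each reduced,
    re-weighted by agent attention, pooled, concatenated, and mapped to a hash code."""

    def __init__(self, code_bits=48):
        super().__init__()
        backbone = resnet50(weights=None)
        self.stem = nn.Sequential(*list(backbone.children())[:6])    # up to layer2 (512 ch)
        self.layer3, self.layer4 = backbone.layer3, backbone.layer4  # 1024 ch, 2048 ch
        self.reduce = nn.ModuleDict({
            "mid": nn.Conv2d(512, 256, 1),
            "high": nn.Conv2d(1024, 256, 1),
            "top": nn.Conv2d(2048, 256, 1),
        })
        self.attn = AgentAttention(dim=256)
        self.hash_head = nn.Linear(256 * 3, code_bits)

    def forward(self, x):
        f_mid = self.stem(x)
        f_high = self.layer3(f_mid)
        f_top = self.layer4(f_high)
        pooled = []
        for name, f in zip(("mid", "high", "top"), (f_mid, f_high, f_top)):
            tokens = self.reduce[name](f).flatten(2).transpose(1, 2)  # (B, N, 256)
            pooled.append(self.attn(tokens).mean(dim=1))              # (B, 256)
        fused = torch.cat(pooled, dim=1)                              # (B, 768)
        return torch.tanh(self.hash_head(fused))   # relaxed code in (-1, 1); sign() at query time


# Usage: codes = MultiLevelHashNet(code_bits=32)(torch.randn(2, 3, 224, 224))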