Keywords: attention mechanism; cascaded feature fusion; deep learning; text detection

Source: DOI:10.3390/s24123758

Abstract:
Scene text detection is an important research area in computer vision and plays a crucial role in many applications. However, existing methods often fail to achieve satisfactory results on text instances of varying sizes and shapes or against complex backgrounds. To address the challenge of detecting diverse text in natural scenes, this paper proposes a multi-scale natural scene text detection method based on attention feature extraction and cascaded feature fusion. The method combines global and local attention through an improved attention feature fusion module (DSAF) to capture text features at different scales, which enhances the network's perception of text regions and improves its feature extraction capability. In addition, an improved cascaded feature fusion module (PFFM) fully integrates the extracted feature maps, expanding the receptive field and enriching the expressive power of the feature maps. Finally, to process the cascaded feature maps, a lightweight subspace attention module (SAM) is introduced that partitions the concatenated feature maps into several subspace feature maps, facilitating spatial information interaction among features of different scales. Comparative experiments against existing scene text detection methods are conducted on the ICDAR2015, Total-Text, and MSRA-TD500 datasets. The results show that the proposed method achieves good performance in terms of accuracy, recall, and F-score, verifying its effectiveness and practicality.
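
The abstract does not spell out the internals of SAM, so the following is only a rough sketch of the general idea of subspace attention: split the channels of a concatenated feature map into groups, derive one spatial attention map per group, and reweight that group's channels with it. The function name `subspace_attention`, the `num_groups` parameter, the mean-over-channels scoring, and the residual reweighting are all assumptions made for illustration and are not taken from the paper.

```python
import math


def subspace_attention(feature_map, num_groups):
    """Illustrative subspace attention (hypothetical, not the paper's SAM).

    feature_map: list of C channels, each an H x W list of lists of floats.
    Returns a new feature map with the same shape.
    """
    channels = len(feature_map)
    if channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    group_size = channels // num_groups
    height, width = len(feature_map[0]), len(feature_map[0][0])

    output = []
    for g in range(num_groups):
        group = feature_map[g * group_size:(g + 1) * group_size]
        # Assumption: score each position by the mean over the group's channels.
        scores = [[sum(ch[i][j] for ch in group) / group_size
                   for j in range(width)] for i in range(height)]
        # Softmax over all spatial positions gives one attention map per group.
        peak = max(max(row) for row in scores)
        exps = [[math.exp(s - peak) for s in row] for row in scores]
        total = sum(sum(row) for row in exps)
        attn = [[e / total for e in row] for row in exps]
        # Reweight each channel in the group, keeping a residual connection.
        for ch in group:
            output.append([[ch[i][j] * (1.0 + attn[i][j])
                            for j in range(width)] for i in range(height)])
    return output
```

Because each group gets its own attention map, different channel subsets can emphasize different spatial regions, which is one plausible way to read the "spatial information interaction among features of different scales" described above.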