text detection

  • Article Type: Journal Article
    Scene text detection is an important research field in computer vision, playing a crucial role in various application scenarios. However, existing scene text detection methods often fail to achieve satisfactory results when faced with text instances of different sizes and shapes and with complex backgrounds. To address the challenge of detecting diverse texts in natural scenes, this paper proposes a multi-scale natural scene text detection method based on attention feature extraction and cascaded feature fusion. The method combines global and local attention through an improved attention feature fusion module (DSAF) to capture text features at different scales, enhancing the network's perception of text regions and improving its feature extraction capability. Meanwhile, an improved cascaded feature fusion module (PFFM) is used to fully integrate the extracted feature maps, expanding the receptive field and enriching the expressive ability of the feature maps. Finally, to process the concatenated feature maps, a lightweight subspace attention module (SAM) is introduced to partition them into several subspace feature maps, facilitating spatial information interaction among features of different scales. Comparative experiments are conducted on the ICDAR2015, Total-Text, and MSRA-TD500 datasets against several existing scene text detection methods. The results show that the proposed method performs well in terms of precision, recall, and F-score, verifying its effectiveness and practicality.
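    As a rough illustration of the subspace-attention idea described above (splitting a concatenated feature map into channel groups and attending within each group), here is a minimal PyTorch-style sketch; the grouping scheme and the per-group 1x1 attention convolution are assumptions, not the paper's exact SAM design.

        import torch
        import torch.nn as nn

        class SubspaceAttention(nn.Module):
            """Split channels into groups; apply spatial attention within each group."""
            def __init__(self, channels, groups=4):
                super().__init__()
                assert channels % groups == 0
                self.groups = groups
                gc = channels // groups
                # One 1x1 conv per subspace, producing a single-channel attention map.
                self.att = nn.ModuleList([nn.Conv2d(gc, 1, kernel_size=1) for _ in range(groups)])

            def forward(self, x):
                chunks = torch.chunk(x, self.groups, dim=1)
                out = [c * torch.sigmoid(a(c)) for c, a in zip(chunks, self.att)]
                return torch.cat(out, dim=1)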

  • Article Type: Journal Article
    AI Chat Bots such as ChatGPT are revolutionizing our AI capabilities, especially in text generation, to help expedite many tasks, but they introduce new dilemmas. The detection of AI-generated text has become a subject of great debate considering the AI text detector's known and unexpected limitations. Thus far, much research in this area has focused on the detection of AI-generated text; however, the goal of this study was to evaluate the opposite scenario, an AI-text detection tool's ability to discriminate human-generated text. Thousands of abstracts from several of the most well-known scientific journals were used to test the predictive capabilities of these detection tools, assessing abstracts from 1980 to 2023. We found that the AI text detector erroneously identified up to 8% of the known real abstracts as AI-generated text. This further highlights the current limitations of such detection tools and argues for novel detectors or combined approaches that can address this shortcoming and minimize its unanticipated consequences as we navigate this new AI landscape.
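    The study's headline number is a false-positive rate over known human-written abstracts. As a sketch, assuming a hypothetical detector callable that returns True when a text is flagged as AI-generated, the measurement reduces to:

        def false_positive_rate(abstracts, detect):
            """Fraction of known human-written abstracts flagged as AI-generated.

            `detect` is a hypothetical callable returning True for flagged text.
            """
            flagged = sum(1 for text in abstracts if detect(text))
            return flagged / len(abstracts)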

  • Article Type: Journal Article
    Detecting irregular or arbitrarily shaped text in natural scene images is a challenging task that has recently attracted considerable attention from research communities. However, limited by the receptive field of CNNs, existing methods cannot directly capture relations between distant component regions with local convolutional operators. In this paper, we propose a novel method that can effectively and robustly detect irregular text in natural scene images. First, we employ a fully convolutional network architecture based on VGG16_BN to generate text components via estimated character center points, which ensures a high text component detection recall rate and fewer non-character text components. Second, text line grouping is treated as a problem of inferring the adjacency relations of text components with a graph convolution network (GCN). Finally, to evaluate our algorithm, we compare it with other existing algorithms through experiments on three public datasets: ICDAR2013, CTW-1500, and MSRA-TD500. The results show that the proposed method handles irregular scene text well and achieves promising results on these three public datasets.
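    The grouping step treats text components as graph nodes whose adjacency is inferred with a GCN. A minimal single-layer graph convolution over component features might look like the following sketch; the mean-aggregation normalization and feature dimensions are assumptions, not the paper's exact network.

        import torch
        import torch.nn as nn

        class GraphConv(nn.Module):
            """One graph-convolution layer: mean aggregation over neighbors + linear map."""
            def __init__(self, in_dim, out_dim):
                super().__init__()
                self.linear = nn.Linear(in_dim, out_dim)

            def forward(self, x, adj):
                # x: (N, in_dim) text-component features; adj: (N, N) float 0/1 adjacency.
                deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
                return torch.relu(self.linear(adj @ x / deg))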

  • Article Type: Journal Article
    Text detection and recognition is a hot topic in computer vision and is considered a further development of traditional optical character recognition (OCR) technology. With the rapid development of machine vision systems and the wide application of deep learning algorithms, text recognition has achieved excellent performance. In contrast, detecting text blocks in complex natural scenes is still a challenging task. Many advanced natural scene text detection algorithms have been proposed, but most of them run slowly due to the complexity of the detection pipeline and cannot be applied in industrial settings. In this paper, we propose a CCD-based machine vision system for real-time text detection in invoice images. In this system, we apply optimizations in several aspects, including the optical system, the hardware architecture, and the deep learning algorithm, to improve the speed of the machine vision system. The experimental data confirm that the optimization methods significantly improve the running speed of the machine vision system and allow it to meet the real-time text detection requirements of industrial scenarios.
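    Since the contribution here is speed, the relevant measurement is end-to-end throughput. A minimal probe, assuming a hypothetical `pipeline` callable that processes one frame, could be:

        import time

        def measure_fps(pipeline, frames):
            """Average frames per second over a batch of frames."""
            start = time.perf_counter()
            for frame in frames:
                pipeline(frame)  # run detection on a single invoice image
            return len(frames) / (time.perf_counter() - start)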

  • Article Type: Journal Article
    There is a growing interest in scene text detection for arbitrary shapes. The effectiveness of text detection has also evolved from horizontal text detection to the ability to detect text in multiple orientations and arbitrary shapes. However, scene text detection is still a challenging task due to significant differences in size and aspect ratio, diversity in shape and orientation, coarse annotations, and other factors. Regression-based methods are inspired by object detection and, by the nature of their approach, have limitations in fitting the edges of arbitrarily shaped text. Segmentation-based methods, on the other hand, perform prediction at the pixel level and can therefore fit arbitrarily shaped text better. However, the inaccuracy of image text annotations and the distribution characteristics of text pixels, which include a large number of background pixels and misclassified pixels, degrade the performance of segmentation-based text detection methods to some extent. Whether a pixel belongs to a text region depends strongly on the strength of its semantic information and on its position within the text area. Based on these two points, we propose an innovative and robust method for scene text detection combining position and semantic information. First, we add position information to the images using a position encoding module (PosEM) to help the model learn the implicit feature relationships associated with position. Second, we use a semantic enhancement module (SEM) to enhance the model's focus on the semantic information in the image during feature extraction. Then, to minimize the effect of noise arising from inaccurate image text annotations and the distribution characteristics of text pixels, we convert the detection results into a probability map that can represent the text distribution more reasonably. Finally, we reconstruct and filter the text instances with a post-processing algorithm to reduce false positives. The experimental results show that our model improves significantly on the Total-Text, MSRA-TD500, and CTW1500 datasets, outperforming most previous state-of-the-art algorithms.
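    One common way to inject the kind of position information PosEM provides is to append normalized coordinate channels to the feature map (CoordConv-style). The sketch below assumes that formulation and is not the paper's exact module.

        import torch

        def add_position_channels(x):
            """Append normalized (x, y) coordinate channels to a (B, C, H, W) tensor."""
            b, _, h, w = x.shape
            ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
            xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
            return torch.cat([x, xs, ys], dim=1)  # two extra channels carrying position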

  • Article Type: Journal Article
    Scene text detection refers to locating text regions in a scene image and marking them with text boxes. With the rapid development of the mobile Internet and the increasing popularity of mobile terminal devices such as smartphones, research on scene text detection technology has been highly valued and widely applied. In recent years, with the rise of deep learning represented by convolutional neural networks, research on scene text detection has made new progress. However, scene text detection is still a very challenging task due to two factors. First, images of natural scenes often have complex backgrounds, which can easily interfere with detection. Second, text in natural scenes is very diverse: horizontal, skewed, straight, and curved text may all appear in the same scene. When convolutional neural networks extract features, convolutional layers with a limited receptive field cannot model global semantic information well. This paper therefore proposes a scene text detection algorithm based on dual-branch feature extraction. It enlarges the receptive field by means of a residual correction branch (RCB) to capture richer contextual information. At the same time, to use the features more efficiently, a two-branch attentional feature fusion (TB-AFF) module built on FPN is proposed to combine global and local attention to pinpoint text regions, enhance the network's sensitivity to text regions, and accurately detect text locations in natural scenes. Several sets of comparative experiments were conducted against current mainstream text detection methods, and the proposed method achieved better results throughout, verifying the effectiveness of the improvements.
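    The two-branch fusion combines a global (pooled) and a local (pointwise) attention signal to weight the features being merged. A minimal sketch under that reading follows; the branch layouts are assumptions, not the paper's exact TB-AFF design.

        import torch
        import torch.nn as nn

        class TwoBranchFusion(nn.Module):
            """Fuse two feature maps with global (pooled) + local (pointwise) attention."""
            def __init__(self, channels):
                super().__init__()
                self.local_att = nn.Conv2d(channels, channels, kernel_size=1)
                self.global_att = nn.Sequential(
                    nn.AdaptiveAvgPool2d(1),
                    nn.Conv2d(channels, channels, kernel_size=1),
                )

            def forward(self, a, b):
                s = a + b
                # Global branch broadcasts over H and W; sigmoid gives fusion weights.
                w = torch.sigmoid(self.local_att(s) + self.global_att(s))
                return w * a + (1 - w) * b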

  • Article Type: Journal Article
    Background: Complete electronic health records (EHRs) are often unavailable because information barriers arise from differences in the level of informatization and in the type of EHR system. We therefore aimed to develop a deep learning system (DeepSSR) for structured recognition of text images from unstructured paper-based medical reports (UPBMRs) to help physicians solve the data-sharing problem.
    Methods: UPBMR images were first preprocessed through binarization, image correction, and image segmentation. Next, the table area was detected with a lightweight network (the proposed YOLOv3-MobileNet model). The text in the table area was then detected and recognized with a model based on differentiable binarization (DB) and a convolutional recurrent neural network (CRNN). Finally, the recognized text was structured according to its row and column coordinates (a minimal sketch of this structuring step follows the abstract). DeepSSR was trained and validated on our dataset of 4,221 UPBMR images, randomly split into training, validation, and testing sets at a ratio of 8:1:1.
    Results: DeepSSR achieved a high accuracy of 91.10% at a speed of 0.668 s per image. Within the system, the proposed YOLOv3-MobileNet model for table detection achieved a precision of 97.8% at a speed of 0.006 s per image.
    Conclusions: DeepSSR offers high accuracy and fast speed in the structured recognition of text from UPBMR images. This system may help solve the data-sharing problem caused by information barriers between hospitals with different EHR systems.
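    The final structuring step, grouping recognized cells by row and column coordinates, can be sketched as follows; the (x, y, text) cell format and the pixel tolerance are illustrative assumptions, not DeepSSR's actual data layout.

        def structure_cells(cells, row_tol=10):
            """Group (x, y, text) cells into rows by y, then order each row by x."""
            rows = []
            for x, y, text in sorted(cells, key=lambda c: c[1]):
                if rows and abs(rows[-1][0][1] - y) <= row_tol:
                    rows[-1].append((x, y, text))  # same table row within tolerance
                else:
                    rows.append([(x, y, text)])    # start a new table row
            return [[t for _, _, t in sorted(row)] for row in rows]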

  • Article Type: Journal Article
    With humanity entering the age of intelligence, text detection technology has gradually been applied in industry. However, text detection against complex backgrounds is still a challenging problem for researchers to overcome. Most current algorithms are not robust enough in locating text regions, and the problem of misdetecting adjacent text instances still exists. To solve these problems, this paper proposes a multi-level residual feature pyramid network (MR-FPN) based on self-attention, which can accurately separate adjacent text instances. Specifically, the framework uses ResNet50 as the backbone network and improves upon the feature pyramid network (FPN). A self-attention module (SAM) is introduced to capture pixel-level relations, increase context connections, and obtain efficient features. At the same time, a multi-scale enhancement module (MEM) improves the expressive ability of text information, extracting strong semantic information and integrating the multi-scale features generated by the feature pyramid. In addition, information from the upper features is lost as the feature pyramid is passed down level by level, and multi-level residuals effectively address this problem. The proposed model effectively improves the fusion ability of the feature pyramid, provides more refined features for text detection, and improves the robustness of text detection. The model was evaluated on the CTW1500, Total-Text, ICDAR2015, and MSRA-TD500 datasets and achieved varying degrees of improvement. Notably, the F-measure of 83.31% obtained on the Total-Text dataset exceeds that of the baseline system by 5%.
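    The SAM described here captures pixel-level relations. A minimal non-local-style self-attention over spatial positions is one standard realization of that idea; the reduction ratio and residual form in this sketch are assumptions, not the paper's exact module.

        import torch
        import torch.nn as nn

        class PixelSelfAttention(nn.Module):
            """Non-local-style self-attention across all spatial positions."""
            def __init__(self, channels, reduction=8):
                super().__init__()
                self.q = nn.Conv2d(channels, channels // reduction, 1)
                self.k = nn.Conv2d(channels, channels // reduction, 1)
                self.v = nn.Conv2d(channels, channels, 1)

            def forward(self, x):
                b, c, h, w = x.shape
                q = self.q(x).flatten(2).transpose(1, 2)   # (B, HW, C')
                k = self.k(x).flatten(2)                   # (B, C', HW)
                attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
                v = self.v(x).flatten(2).transpose(1, 2)   # (B, HW, C)
                y = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
                return x + y  # residual connection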

  • Article Type: Journal Article
    Text detection in natural scene images for content analysis is an interesting task. The research community has seen some great developments in English/Mandarin text detection. However, Urdu text extraction from natural scene images is a task not well addressed. In this work, firstly, a new dataset is introduced for Urdu text in natural scene images. The dataset comprises 500 standalone images acquired from real scenes. Secondly, the channel-enhanced Maximally Stable Extremal Region (MSER) method is applied to extract Urdu text regions as candidates in an image. A two-stage filtering mechanism is applied to eliminate non-candidate regions. In the first stage, text and noise are classified based on their geometric properties. In the second stage, a support vector machine classifier is trained to discard non-text candidate regions. After this, text candidate regions are linked using centroid-based vertical and horizontal distances. Text lines are further analyzed by a different classifier based on HOG features to remove non-text regions. Extensive experimentation is performed on the locally developed dataset to evaluate performance. The experimental results show good performance on test set images. The dataset will be made available for research use. To the best of our knowledge, this work is the first of its kind for the Urdu language; it provides a good dataset for free research use and serves as a performance baseline for the task of Urdu text extraction.
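    The first stage, MSER extraction followed by geometric filtering, can be approximated with OpenCV; the aspect-ratio and size thresholds below are illustrative assumptions rather than the paper's tuned values, and the SVM and HOG stages would follow downstream.

        import cv2

        def text_candidates(gray):
            """Extract MSER regions from a grayscale image and drop implausible ones."""
            mser = cv2.MSER_create()
            regions, _ = mser.detectRegions(gray)
            boxes = [cv2.boundingRect(r) for r in regions]
            # Crude geometric filter on aspect ratio and height (assumed thresholds).
            return [(x, y, w, h) for x, y, w, h in boxes
                    if 0.1 < w / h < 10 and h > 8]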

  • Article Type: Journal Article
    Recently, various object detection frameworks have been applied to text detection tasks and have achieved good performance in the final detection. With the further expansion of text detection application scenarios, the research value of text detection topics has gradually increased. Text detection in natural scenes is challenging both for horizontal text based on quadrilateral detection boxes and for curved text of arbitrary shape. Most networks balance target samples well in text detection, but handling small targets and extremely unbalanced data remains difficult; we continue to use PSENet to deal with such problems in this work. We also address the fact that most existing scene text detection methods use ResNet and FPN as the feature extraction backbone, and we improve the ResNet and FPN parts of PSENet to make them more conducive to combining features in the early stage. An anchor-free, one-stage SEMPANet framework is proposed to implement a lightweight model, reflected in a training time of about 24 h. Finally, we selected the two most representative datasets for oriented and curved text for our experiments. On ICDAR2015, the improved network's latest results further verify its effectiveness, with an F-measure gain of 1.01% over PSENet-1s. On CTW1500, the improved network performed better than the original network on average.
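    The comparisons above are reported in F-measure, the harmonic mean of precision and recall:

        def f_measure(precision, recall):
            """Harmonic mean of precision and recall (the F-score used above)."""
            return 2 * precision * recall / (precision + recall)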