instance segmentation

  • Article type: Journal Article
    Detecting and controlling tea pests promptly are crucial for safeguarding tea production quality. Because of the insufficient feature extraction ability of traditional CNN-based methods, they struggle with inaccurate and inefficient pest detection in dense and mimicry scenarios. This study proposes an end-to-end tea pest detection and segmentation framework, TeaPest-Transfiner (TP-Transfiner), based on Mask Transfiner to address the challenge of detecting and segmenting pests in mimicry and dense scenarios. To address the weak feature extraction ability and accuracy of traditional convolution modules, this study proposes three strategies. Firstly, a deformable attention block is integrated into the model, which combines deformable convolution with self-attention that uses only the key-content term. Secondly, the FPN architecture in the backbone network is improved with a more effective feature-aligned pyramid network (FaPN). Lastly, focal loss is employed to balance positive and negative samples during training, with its parameters adapted to the dataset distribution. Furthermore, to address the lack of tea pest images, a dataset called TeaPestDataset is constructed, which contains 1,752 images and 29 species of tea pests. Experimental results on the TeaPestDataset show that the proposed TP-Transfiner model achieves state-of-the-art performance compared with other models, attaining a detection precision (AP50) of 87.211% and a segmentation performance of 87.381%. Notably, the model shows a significant improvement in segmentation average precision (mAP) by 9.4% and a reduction in model size by 30% compared to the state-of-the-art CNN-based model Mask R-CNN. Simultaneously, TP-Transfiner's lightweight module fusion maintains fast inference speeds and a compact model size, demonstrating practical potential for pest control in tea gardens, especially in dense and mimicry scenarios.
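
    Since focal loss is a standard component, a minimal sketch may help. The PyTorch version below is the generic binary focal loss (Lin et al.), not the authors' code; alpha and gamma are the parameters one would adapt to the dataset distribution as described.

```python
# Generic binary focal loss; a sketch, not the TP-Transfiner implementation.
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """alpha weights the positive class; gamma down-weights easy examples,
    so the abundant easy negatives contribute little to the total loss."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)             # prob. of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Confident easy predictions barely move the loss; hard ones dominate.
logits = torch.tensor([3.0, -4.0, -5.0, 0.1])
targets = torch.tensor([1.0, 0.0, 0.0, 1.0])
print(focal_loss(logits, targets))
```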

  • Article type: Journal Article
    Investigating the interaction between influent particles and biomass is fundamental to biological wastewater treatment. Micro-level methods allow for this, such as microscope image analysis with the conventional ImageJ processing software. However, these methods are costly and time-consuming, and require a large amount of manual parameter tuning. To deal with this problem, we proposed a deep learning (DL) method to automatically detect and quantify microparticles free from biomass and entrapped in biomass from microscope images. Firstly, we introduced a "TU Delft-Interaction between Particles and Biomass" dataset containing labeled microscope images. Then, we built DL models using this dataset with seven state-of-the-art model architectures for an instance segmentation task, such as Mask R-CNN, Cascade Mask R-CNN, YOLACT, and YOLOv8. The results show that the Cascade Mask R-CNN with a ResNet50 backbone achieves promising detection accuracy, with a mAP50box and mAP50mask of 90.6% on the test set. Then, we benchmarked our results against the conventional ImageJ processing method. The results show that the DL method significantly outperforms the ImageJ processing method in terms of detection accuracy and processing cost, with a 13.8% improvement in micro-average precision and a 21.7% improvement in micro-average recall. Moreover, the DL method can process 70 images within 1 min, while the ImageJ method takes at least 6 h. This promising performance allows our method to offer an affordable alternative for examining the interaction between microparticles and biomass in biological wastewater treatment processes, providing more useful insights into the treatment process and enabling further investigation of microparticle transfer in biological treatment systems.
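
    For reference, the micro-averaged precision and recall quoted above pool raw counts across all classes before dividing, so frequent classes dominate the score. A small illustrative sketch with hypothetical counts:

```python
# Micro-averaged precision/recall; illustrative only, with made-up counts.
def micro_precision_recall(per_class_counts):
    """per_class_counts: iterable of (tp, fp, fn) tuples, one per class."""
    tp = sum(c[0] for c in per_class_counts)
    fp = sum(c[1] for c in per_class_counts)
    fn = sum(c[2] for c in per_class_counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts for the two classes: particles free from biomass
# and particles entrapped in biomass.
counts = [(80, 10, 5), (40, 15, 20)]
print(micro_precision_recall(counts))   # (0.8276..., 0.8276...)
```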

  • Article type: Journal Article
    OBJECTIVE: Wood comprises different cell types, such as fibers, tracheids, and vessels, that define its properties. Studying cells' shape, size, and arrangement in microscopy images is crucial for understanding wood characteristics. Typically, this involves macerating (soaking) samples in a solution to separate the cells, then spreading them on slides for imaging with a microscope covering a wide area, capturing thousands of cells. However, these cells often cluster and overlap in images, making segmentation difficult and time-consuming with standard image-processing methods.
    RESULTS: In this work, we developed an automatic deep learning segmentation approach that utilizes the one-stage YOLOv8 model for fast and accurate segmentation and characterization of macerated fibers and vessels from aspen trees in microscopy images. The model can analyze images of 32,640 × 25,920 pixels and demonstrates effective cell detection and segmentation, achieving a mAP@0.5-0.95 of 78%. To assess the model's robustness, we examined fibers from a genetically modified tree line known for longer fibers. The outcomes were comparable to previous manual measurements. Additionally, we created a user-friendly web application for image analysis and provided the code for use on Google Colab.
    CONCLUSIONS: By leveraging YOLOv8's advances, this work provides a deep learning solution that enables efficient quantification and analysis of wood cells suitable for practical applications.
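
    For orientation, fine-tuning the off-the-shelf Ultralytics YOLOv8 segmentation model follows the pattern sketched below; the checkpoint name is a stock pretrained file, while the dataset YAML and image path are placeholders rather than the authors' released artifacts.

```python
# Usage sketch of the stock Ultralytics YOLOv8-seg API; paths are placeholders.
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")                 # pretrained segmentation weights

# Fine-tune on a custom dataset described by a YOLO-format YAML file
# (hypothetical path); very large micrographs are typically tiled or
# resized to a fixed imgsz before training.
model.train(data="wood_cells.yaml", epochs=100, imgsz=1024)

# Inference: each result carries per-instance boxes and masks.
results = model("macerated_cells.png")
for r in results:
    n = len(r.boxes)
    print(n, r.masks.data.shape if r.masks is not None else None)
```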

  • Article type: Journal Article
    Real-world understanding serves as a medium that bridges the information world and the physical world, enabling virtual-real mapping and interaction. However, scene understanding based solely on 2D images faces problems such as a lack of geometric information and limited robustness against occlusion. Depth sensors bring new opportunities, but challenges remain in fusing depth with geometric and semantic priors. To address these concerns, our method exploits the repeatability of video stream data and the sparsity of newly generated data. We introduce a sparsely correlated network architecture (SCN) designed explicitly for online RGBD instance segmentation. Additionally, we leverage the power of object-level RGB-D SLAM systems, thereby transcending the limitations of conventional approaches that emphasize only geometry or semantics. We establish correlation over time and use it to develop rules and generate sparse data. We thoroughly evaluate the system's performance on the NYU Depth V2 and ScanNet V2 datasets, demonstrating that incorporating frame-to-frame correlation leads to significantly improved accuracy and consistency in instance segmentation compared with existing state-of-the-art alternatives. Moreover, using sparse data reduces data complexity while meeting the real-time requirement of 18 fps. Furthermore, by utilizing prior knowledge of object layout, we showcase a promising augmented reality application, demonstrating the approach's potential and practicality.
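
    The SCN's temporal correlation mechanism is specific to the paper, but the underlying idea of reusing results across a video stream can be illustrated with a generic mask-IoU matcher; the sketch below is that generic baseline, not the SCN itself.

```python
# Generic frame-to-frame instance association by mask IoU (illustration only).
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def associate(prev_masks, curr_masks, thresh: float = 0.5):
    """Greedily match current masks to previous ones by IoU.
    Unmatched current masks are the sparse newly generated data
    that still needs full processing."""
    matches, used = [], set()
    for j, cm in enumerate(curr_masks):
        best, best_iou = None, thresh
        for i, pm in enumerate(prev_masks):
            if i not in used and (iou := mask_iou(pm, cm)) > best_iou:
                best, best_iou = i, iou
        if best is not None:
            used.add(best)
        matches.append((best, j))
    return matches
```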

  • Article type: Journal Article
    Pig posture is closely linked with livestock health and welfare. There has been significant interest among researchers in using deep learning techniques for pig posture detection. However, this task is challenging due to variations in image angles and times, as well as the presence of multiple pigs in a single image. In this study, we explore an object detection and segmentation algorithm based on instance segmentation scoring to detect different pig postures (sternal lying, lateral lying, walking, and sitting) and segment pig areas in group images, thereby enabling the identification of individual pig postures within a group. The algorithm combines a 50-layer residual network with a feature pyramid network to extract feature maps from input images. These feature maps are then used to generate regions of interest (RoIs) with a region proposal network. For each RoI, the algorithm performs regression to determine the location, classification, and segmentation of each pig posture. To address challenges such as missed targets and erroneous detections among overlapping pigs in group housing, non-maximum suppression (NMS) is used with a threshold of 0.7. Extensive hyperparameter analysis shows that a learning rate of 0.01, a batch size of 512, and 4 images per batch offer superior performance, with accuracy surpassing 96%; under these settings, the mean average precision (mAP) exceeds 83% for both object detection and instance segmentation. Additionally, we compare the method with the Faster R-CNN object detection model, and we analyze execution times on different processing units under various hyperparameters and iteration counts.
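
    The NMS step itself is standard; a compact sketch with torchvision's implementation and the paper's 0.7 IoU threshold:

```python
# Non-maximum suppression at IoU 0.7, using torchvision's built-in op.
import torch
from torchvision.ops import nms

boxes = torch.tensor([[10., 10., 60., 60.],
                      [12., 12., 62., 62.],     # IoU ~0.85 with box 0
                      [100., 100., 150., 150.]])
scores = torch.tensor([0.9, 0.8, 0.7])

keep = nms(boxes, scores, iou_threshold=0.7)
print(keep)   # tensor([0, 2]); box 1 is suppressed by the higher-scoring box 0
```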

  • Article type: Journal Article
    Quickly and accurately assessing the damage level of buildings is a challenging task in post-disaster emergency response. Most existing research adopts semantic segmentation and object detection methods, which have yielded good results. However, for high-resolution Unmanned Aerial Vehicle (UAV) imagery, these methods may assign multiple damage categories within a single building and fail to accurately extract building edges, thus hindering post-disaster rescue and fine-grained assessment. To address this issue, we proposed an improved instance segmentation model that enhances classification accuracy by incorporating a Mixed Local Channel Attention (MLCA) mechanism in the backbone and improves small-object segmentation accuracy by refining the neck. The method was tested on UAV images of the Yangbi earthquake. The experimental results indicated that the modified model outperformed the original model by 1.07% and 1.11% on the two mean Average Precision (mAP) evaluation metrics, mAPbbox50 and mAPseg50, respectively. Importantly, the classification accuracy of the intact category improved by 2.73% on both metrics, while the collapse category improved by 2.58% and 2.14%. In addition, the proposed method was compared with state-of-the-art instance segmentation models, e.g., Mask R-CNN and YOLOv9-Seg. The results demonstrated that the proposed model has advantages in both accuracy and efficiency; specifically, it runs three times faster than other models with similar accuracy. The proposed method can provide a valuable solution for fine-grained building damage evaluation.
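
    The exact MLCA module is defined in the cited work; purely to show the kind of channel recalibration being added to the backbone, here is a generic SE-style channel-attention block, offered as an illustrative stand-in rather than the MLCA itself.

```python
# Generic SE-style channel attention; a stand-in illustration, not MLCA.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))       # global average pool -> weights
        return x * w.view(b, c, 1, 1)         # recalibrate feature channels

x = torch.randn(2, 64, 32, 32)
print(ChannelAttention(64)(x).shape)          # torch.Size([2, 64, 32, 32])
```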

  • Article type: Journal Article
    The accurate instance segmentation of individual crop plants is crucial for achieving a high-throughput phenotypic analysis of seedlings and smart field management in agriculture. Current crop monitoring techniques employing remote sensing predominantly focus on population analysis, thereby lacking precise estimations for individual plants. This study concentrates on maize, a critical staple crop, and leverages multispectral remote sensing data sourced from unmanned aerial vehicles (UAVs). A large-scale SAM image segmentation model is employed to efficiently annotate maize plant instances, thereby constructing a dataset for maize seedling instance segmentation. The study evaluates the experimental accuracy of six instance segmentation algorithms: Mask R-CNN, Cascade Mask R-CNN, PointRend, YOLOv5, Mask Scoring R-CNN, and YOLOv8, employing various combinations of multispectral bands for a comparative analysis. The experimental findings indicate that the YOLOv8 model exhibits exceptional segmentation accuracy, notably in the NRG band, with bbox_mAP50 and segm_mAP50 accuracies reaching 95.2% and 94%, respectively, surpassing other models. Furthermore, YOLOv8 demonstrates robust performance in generalization experiments, indicating its adaptability across diverse environments and conditions. Additionally, this study simulates and analyzes the impact of different resolutions on the model's segmentation accuracy. The findings reveal that the YOLOv8 model sustains high segmentation accuracy even at reduced resolutions (1.333 cm/px), meeting the phenotypic analysis and field management criteria.
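
    The NRG input mentioned above stacks the near-infrared, red, and green bands into a three-channel composite before it is fed to the segmentation model; a minimal sketch with random arrays standing in for real UAV band rasters:

```python
# Build an NRG (NIR-red-green) composite from single-band rasters.
import numpy as np

def nrg_composite(nir: np.ndarray, red: np.ndarray, green: np.ndarray) -> np.ndarray:
    """Stack NIR/R/G into an HxWx3 uint8 image, min-max scaling each band."""
    def to_uint8(band: np.ndarray) -> np.ndarray:
        band = band.astype(np.float32)
        lo, hi = band.min(), band.max()
        return np.uint8(255 * (band - lo) / (hi - lo + 1e-6))
    return np.dstack([to_uint8(nir), to_uint8(red), to_uint8(green)])

# Random arrays in place of real multispectral UAV bands.
nir, red, green = (np.random.rand(128, 128) for _ in range(3))
print(nrg_composite(nir, red, green).shape)   # (128, 128, 3)
```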

  • Article type: Journal Article
    Instance segmentation of biological cells is important in medical image analysis for identifying and segmenting individual cells, and quantitative measurement of subcellular structures requires further cell-level subcellular part segmentation. Subcellular structure measurements are critical for cell phenotyping and quality analysis. For these purposes, an instance-aware part segmentation network is first introduced to distinguish individual cells and segment subcellular structures for each detected cell. The approach is demonstrated on human sperm cells, since the World Health Organization has established quantitative standards for sperm quality assessment. Specifically, a novel Cell Parsing Net (CP-Net) is proposed for accurate instance-level cell parsing. An attention-based feature fusion module is designed to alleviate contour misalignment for irregularly shaped cells by using instance masks as spatial cues, instead of as strict constraints, to differentiate instances. A coarse-to-fine segmentation module is developed to effectively segment tiny subcellular structures within a cell through hierarchical whole-to-part segmentation, instead of directly segmenting each cell part. Moreover, a sperm parsing dataset is built comprising 320 annotated sperm images with five semantic subcellular part labels. Extensive experiments on the collected dataset demonstrate that the proposed CP-Net outperforms state-of-the-art instance-aware part segmentation networks.
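
    One way to read the "spatial cue rather than strict constraint" idea is soft, mask-conditioned attention in place of hard gating, so context near imprecise contours is not discarded. The toy module below is an interpretation of that idea, not the CP-Net module.

```python
# Toy soft-mask fusion: the instance mask conditions attention instead of
# hard-masking the features. An interpretation, not CP-Net's module.
import torch
import torch.nn as nn

class SoftMaskFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Conv2d(channels + 1, channels, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor, inst_mask: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); inst_mask: (B, 1, H, W) with values in [0, 1]
        attn = torch.sigmoid(self.gate(torch.cat([feat, inst_mask], dim=1)))
        return feat + feat * attn               # residual soft modulation

feat = torch.randn(1, 32, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()
print(SoftMaskFusion(32)(feat, mask).shape)     # torch.Size([1, 32, 64, 64])
```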

  • Article type: English Abstract
    There are several problems in positron emission tomography/computed tomography (PET/CT) lung images, such as limited feature-pixel information in lesion regions, complex and diverse lesion shapes, and blurred boundaries between lesions and surrounding tissues, which lead to inadequate extraction of tumor lesion features by the model. To solve these problems, this paper proposes a dense interactive feature fusion Mask RCNN (DIF-Mask RCNN) model. Firstly, a feature extraction network with a cross-scale backbone and auxiliary structures was designed to extract lesion features at different scales. Then, a dense interactive feature enhancement network was designed to enhance lesion detail information in the deep feature map by interactively fusing the shallowest lesion features with neighboring and current features in the form of dense connections. Finally, a dense interactive feature fusion feature pyramid network (FPN) was constructed, in which shallow information is added to the deep features one by one along the bottom-up path with dense connections to further enhance the model's perception of weak features in the lesion region. Ablation and comparison experiments were conducted on a clinical PET/CT lung image dataset. The results showed that the APdet, APseg, APdet_s, and APseg_s indices of the proposed model were 67.16%, 68.12%, 34.97%, and 37.68%, respectively. Compared with Mask RCNN (ResNet50), the APdet and APseg indices increased by 7.11% and 5.14%, respectively. The DIF-Mask RCNN model can effectively detect and segment tumor lesions, providing important reference value and an evaluation basis for computer-aided diagnosis of lung cancer.
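
    The dense bottom-up fusion can be sketched schematically: each deeper pyramid level also receives downsampled copies of all shallower levels through dense connections. The channel counts and the additive fusion below are assumptions for illustration, not the DIF-FPN code.

```python
# Schematic dense bottom-up pyramid fusion; an illustrative assumption,
# not the DIF-Mask RCNN implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseBottomUp(nn.Module):
    def __init__(self, channels: int, num_levels: int = 4):
        super().__init__()
        self.smooth = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_levels)
        )

    def forward(self, feats):
        # feats: list of (B, C, H_i, W_i) maps, shallowest (highest-res) first
        outs = [feats[0]]
        for i in range(1, len(feats)):
            fused = feats[i]
            for shallow in outs:                 # dense: every shallower level
                fused = fused + F.adaptive_avg_pool2d(shallow, fused.shape[-2:])
            outs.append(self.smooth[i](fused))
        return outs

feats = [torch.randn(1, 64, s, s) for s in (64, 32, 16, 8)]
print([f.shape[-1] for f in DenseBottomUp(64)(feats)])   # [64, 32, 16, 8]
```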

  • Article type: Journal Article
    Given the complex structure of Chinese characters, particularly the connections and intersections between strokes, Chinese character stroke extraction and recognition suffer from low accuracy and unclear segmentation. This study builds upon the YOLOv8n-seg model to propose the YOLOv8n-seg-CAA-BiFPN Chinese character stroke fine segmentation model. The proposed Coordinate-Aware Attention (CAA) mechanism divides the backbone network's input feature map into four parts, applying different weights for horizontal, vertical, and channel attention to compute and fuse key information, thus capturing the contextual regularity of closely arranged stroke positions. The network's neck integrates an enhanced weighted bi-directional feature pyramid network (BiFPN), improving the fusion of stroke features of various sizes. The Shape-IoU loss function is adopted in place of the traditional CIoU loss function, focusing on the shape and scale of stroke bounding boxes to optimize the bounding-box regression process. Finally, the Grad-CAM++ technique is used to generate heatmaps of segmentation predictions, facilitating the visualization of effective features and a deeper understanding of the model's focus areas. Trained and tested on the public Chinese character stroke datasets CCSE-Kai and CCSE-HW, the model achieves an average accuracy of 84.71%, an average recall rate of 83.65%, and a mean average precision of 80.11%. Compared with the original YOLOv8n-seg and existing mainstream segmentation models such as SegFormer, BiSeNetV2, and Mask R-CNN, the average accuracy improved by 3.50%, 4.35%, 10.56%, and 22.05%, respectively; the average recall rate improved by 4.42%, 9.32%, 15.64%, and 24.92%, respectively; and the mean average precision improved by 3.11%, 4.15%, 8.02%, and 19.33%, respectively. The results demonstrate that the YOLOv8n-seg-CAA-BiFPN network can accurately perform Chinese character stroke segmentation.
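
    The weighted fusion at the heart of BiFPN ("fast normalized fusion" from EfficientDet) is easy to show in isolation; the sketch below covers only that mechanism, not the paper's full neck design.

```python
# BiFPN-style fast normalized weighted fusion of same-shape feature maps.
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))   # learnable weights
        self.eps = eps

    def forward(self, feats):
        w = torch.relu(self.w)                  # keep weights non-negative
        w = w / (w.sum() + self.eps)            # normalize without softmax
        return sum(wi * f for wi, f in zip(w, feats))

a, b = torch.randn(1, 32, 40, 40), torch.randn(1, 32, 40, 40)
print(WeightedFusion(2)([a, b]).shape)          # torch.Size([1, 32, 40, 40])
```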