Keywords: deep learning; polarization; semantic segmentation; urban perception

Source: DOI: 10.3390/s24154893

Abstract:
Intelligent urban perception is an active research topic. Most previous urban perception models based on semantic segmentation used RGB images as unimodal inputs. In natural urban scenes, however, the interplay of light and shadow often produces ambiguous RGB features, which diminishes the model's perception ability. Multimodal polarization data carry information beyond RGB and can enhance the representation of shadow regions, serving as auxiliary input. In addition, transformers have achieved outstanding performance in visual tasks in recent years, and their large effective receptive field can provide more discriminative cues for shadow regions. For these reasons, this study proposes a novel semantic segmentation model called MixImages, which can incorporate polarization data for pixel-level perception. We conducted comprehensive experiments on a polarization dataset of urban scenes. The results showed that the proposed MixImages achieves an accuracy advantage of 3.43% over the control group model using only RGB images in the unimodal benchmark, and gains a performance improvement of 4.29% in the multimodal benchmark. To provide a reference for specific downstream tasks, we also tested the impact of different combinations of polarization types on overall segmentation accuracy. The proposed MixImages offers a new option for urban scene perception tasks.