Keywords: deep learning, spatial-temporal, texture converter, transformer, video inpainting

Source: DOI:10.3389/fnbot.2022.1002453 · PDF (PubMed)

Abstract:
We study video inpainting, which aims to recover realistic textures in damaged frames. Recent progress has been made by taking other frames as references so that relevant textures can be transferred to the damaged frames. However, existing video inpainting approaches neglect the model's ability to extract information and reconstruct content, so the textures that should be transferred cannot be reconstructed accurately. In this paper, we propose a novel and effective spatial-temporal texture transformer network (STTTN) for video inpainting. STTTN consists of six closely related modules optimized for the video inpainting task: a feature similarity measure for more accurate frame pre-repair, an encoder with strong information-extraction ability, an embedding module for finding correlations, coarse low-frequency feature transfer, refined high-frequency feature transfer, and a decoder with accurate content-reconstruction ability. This design encourages joint feature learning across the input and reference frames. To demonstrate the superiority and effectiveness of the proposed model, we conduct comprehensive ablation studies and qualitative and quantitative experiments on multiple datasets, using both standard stationary masks and more realistic moving-object masks. The results confirm the fidelity and reliability of STTTN.
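To make the six-module pipeline concrete, below is a minimal PyTorch sketch of how such a reference-based texture-transfer network could be wired together. All layer shapes, module names, and the single-reference-frame simplification are our assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STTTN(nn.Module):
    """Minimal sketch of the six-module pipeline described in the
    abstract. Hypothetical layer sizes; not the paper's code."""

    def __init__(self, channels=64):
        super().__init__()
        # Encoder with strong information-extraction ability, shared
        # between the damaged input frame and the reference frame.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Coarse low-frequency and refined high-frequency transfer
        # branches, each fusing transferred textures with input features.
        self.coarse = nn.Conv2d(channels * 2, channels, 3, padding=1)
        self.refine = nn.Conv2d(channels * 2, channels, 3, padding=1)
        # Decoder with accurate content-reconstruction ability.
        self.decoder = nn.Conv2d(channels, 3, 3, padding=1)

    def attention_transfer(self, q, k, v):
        """Embedding module: dot-product feature similarity between the
        input (query) and reference (key) selects which reference
        textures (values) are transferred to each damaged location."""
        b, c, h, w = q.shape
        q = q.flatten(2).transpose(1, 2)            # (B, HW, C)
        k = k.flatten(2)                            # (B, C, HW)
        attn = torch.softmax(q @ k / c ** 0.5, -1)  # (B, HW, HW)
        v = v.flatten(2).transpose(1, 2)            # (B, HW, C)
        return (attn @ v).transpose(1, 2).reshape(b, c, h, w)

    def forward(self, damaged, reference):
        f_in = self.encoder(damaged)      # features of the damaged frame
        f_ref = self.encoder(reference)   # features of the reference frame
        # Transfer relevant reference textures onto damaged locations.
        transferred = self.attention_transfer(f_in, f_ref, f_ref)
        # Coarse low-frequency fusion, then high-frequency refinement.
        f = F.relu(self.coarse(torch.cat([f_in, transferred], 1)))
        f = F.relu(self.refine(torch.cat([f, transferred], 1)))
        return self.decoder(f)            # reconstructed frame
```

A call such as `STTTN()(damaged, reference)` with two `(B, 3, H, W)` tensors returns a reconstructed `(B, 3, H, W)` frame; the actual model would additionally consume masks and multiple reference frames, and the frame pre-repair step driven by the feature similarity measure is omitted here.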