Keywords: artwork analysis; class activation maps; convolutional neural network; explainability; iconography

Source: DOI:10.3390/jimaging7070106

Abstract:
Iconography studies the visual content of artworks by considering the themes portrayed in them and their representation. Computer vision has been used to identify iconographic subjects in paintings, and Convolutional Neural Networks (CNNs) have enabled the effective classification of characters in Christian art paintings. However, it remains to be demonstrated whether the classification results obtained by CNNs rely on the same iconographic properties that human experts exploit when studying iconography, and whether the architecture of a classifier trained on whole artwork images can be exploited to support the much harder task of object detection. A suitable approach for exposing the classification process of neural models relies on Class Activation Maps, which emphasize the areas of an image that contribute the most to the classification. This work compares state-of-the-art algorithms (CAM, Grad-CAM, Grad-CAM++, and Smooth Grad-CAM++) in terms of their capacity to identify the iconographic attributes that determine the classification of characters in Christian art paintings. Quantitative and qualitative analyses show that Grad-CAM, Grad-CAM++, and Smooth Grad-CAM++ perform similarly, while CAM is less effective. Smooth Grad-CAM++ isolates multiple disconnected image regions that identify small iconographic symbols well. Grad-CAM produces wider and more contiguous areas that cover large iconographic symbols better. The salient image areas computed by the CAM algorithms have been used to estimate object-level bounding boxes, and a quantitative analysis shows that the boxes estimated with Grad-CAM reach 55% average IoU, 61% GT-known localization, and 31% mAP. The obtained results are a step towards the computer-aided study of variations in the positioning and mutual relations of iconographic elements in artworks, and they open the way to the automatic creation of bounding boxes for training detectors of iconographic symbols in Christian art images.
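To make the bounding-box estimation and the IoU metric mentioned above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a class activation map is already available as a 2D array, and it makes two simplifying assumptions that the abstract does not state: a fixed threshold of 0.5 on the min-max normalized map, and a single box enclosing all above-threshold pixels. The function names `cam_to_box` and `iou` are hypothetical.

```python
import numpy as np


def cam_to_box(cam, threshold=0.5):
    """Estimate one bounding box from a class activation map.

    Assumption: pixels whose min-max normalized activation is >= threshold
    are salient, and a single box encloses all of them.
    Returns (x_min, y_min, x_max, y_max) with exclusive max coordinates,
    or None when the map is constant.
    """
    cam = np.asarray(cam, dtype=float)
    span = cam.max() - cam.min()
    if span == 0:
        return None  # a flat map carries no localization signal
    norm = (cam - cam.min()) / span
    ys, xs = np.nonzero(norm >= threshold)
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1


def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


# Toy example: a synthetic 8x8 map with one active rectangle.
toy_cam = np.zeros((8, 8))
toy_cam[2:5, 3:7] = 1.0
estimated = cam_to_box(toy_cam)      # (3, 2, 7, 5)
ground_truth = (3, 2, 7, 6)          # hypothetical annotation
print(estimated, iou(estimated, ground_truth))  # IoU is 0.75 here
```

Under the common convention that a box counts as correctly localized when its IoU with the ground truth is at least 0.5, the toy box above would count as a hit. A method that isolates several disconnected regions, as described for Smooth Grad-CAM++, would more naturally yield one box per connected region rather than the single enclosing box used in this sketch.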