Keywords: GRU GoogLeNet emotion multimodal robot music

Source: DOI:10.3389/fnbot.2023.1324831   PDF (PubMed)

Abstract:
The field of multimodal robotic musical performing arts has garnered significant interest due to its innovative potential. Conventional robots face limitations in understanding emotions and artistic expression in musical performances. Therefore, this paper explores the application of multimodal robots that integrate visual and auditory perception to enhance the quality and artistic expression in music performance. Our approach involves integrating GRU (Gated Recurrent Unit) and GoogLeNet models for sentiment analysis. The GRU model processes audio data and captures the temporal dynamics of musical elements, including long-term dependencies, to extract emotional information. The GoogLeNet model excels in image processing, extracting complex visual details and aesthetic features. This synergy deepens the understanding of musical and visual elements, aiming to produce more emotionally resonant and interactive robot performances. Experimental results demonstrate the effectiveness of our approach, showing significant improvements in music performance by multimodal robots. These robots, equipped with our method, deliver high-quality, artistic performances that effectively evoke emotional engagement from the audience. Multimodal robots that merge audio-visual perception in music performance enrich the art form and offer diverse human-machine interactions. This research demonstrates the potential of multimodal robots in music performance, promoting the integration of technology and art. It opens new realms in performing arts and human-robot interactions, offering a unique and innovative experience. Our findings provide valuable insights for the development of multimodal robots in the performing arts sector.
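The fusion described above — a GRU summarizing the temporal dynamics of audio, concatenated with a visual embedding for emotion classification — can be illustrated with a minimal numpy sketch. This is an assumption-laden toy, not the authors' implementation: the weights are random stand-ins for trained parameters, `visual_feature` stands in for a GoogLeNet embedding, and simple late fusion by concatenation is assumed, since the abstract does not specify the fusion mechanism.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell: update gate z, reset gate r, candidate state."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        def w(*shape):  # small random weights (stand-in for trained parameters)
            return rng.standard_normal(shape) * 0.1
        self.Wz, self.Uz, self.bz = w(hidden_dim, input_dim), w(hidden_dim, hidden_dim), np.zeros(hidden_dim)
        self.Wr, self.Ur, self.br = w(hidden_dim, input_dim), w(hidden_dim, hidden_dim), np.zeros(hidden_dim)
        self.Wh, self.Uh, self.bh = w(hidden_dim, input_dim), w(hidden_dim, hidden_dim), np.zeros(hidden_dim)

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h + self.bz)   # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h + self.br)   # reset gate
        h_cand = np.tanh(self.Wh @ x + self.Uh @ (r * h) + self.bh)
        return (1 - z) * h + z * h_cand                    # blend old and candidate state

def fuse_and_classify(audio_frames, visual_feature, gru, W_out):
    """Run the GRU over audio frames, concatenate the final hidden state
    with a visual embedding, and map to emotion-class probabilities."""
    h = np.zeros(gru.Wz.shape[0])
    for x in audio_frames:                     # capture temporal dynamics of the audio
        h = gru.step(x, h)
    fused = np.concatenate([h, visual_feature])  # assumed late fusion by concatenation
    logits = W_out @ fused
    e = np.exp(logits - logits.max())            # numerically stable softmax
    return e / e.sum()
```

For example, with 8-dimensional audio frames, a 16-unit GRU, a 32-dimensional visual embedding, and 4 emotion classes, `fuse_and_classify` returns a 4-way probability distribution. In the paper's setting the GRU input would be audio features and `visual_feature` a GoogLeNet activation; both are randomized here purely for illustration.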