multimodal dataset

  • Article type: Journal Article
    Visual object tracking, pivotal for applications like earth observation and environmental monitoring, encounters challenges under adverse conditions such as low light and complex backgrounds. Traditional tracking technologies often falter, especially when tracking dynamic objects like aircraft amidst rapid movements and environmental disturbances. This study introduces an innovative adaptive multimodal image object-tracking model that harnesses the capabilities of multispectral image sensors, combining infrared and visible light imagery to significantly enhance tracking accuracy and robustness. By employing the advanced vision transformer architecture and integrating token spatial filtering (TSF) and crossmodal compensation (CMC), our model dynamically adjusts to diverse tracking scenarios. Comprehensive experiments conducted on a private dataset and various public datasets demonstrate the model's superior performance under extreme conditions, affirming its adaptability to rapid environmental changes and sensor limitations. This research not only advances visual tracking technology but also offers extensive insights into multisource image fusion and adaptive tracking strategies, establishing a robust foundation for future enhancements in sensor-based tracking systems.

  • Article type: Journal Article
    Decoding an individual's hidden brain states in response to musical stimuli under various cognitive loads can unleash the potential of developing a non-invasive closed-loop brain-machine interface (CLBMI). To perform a pilot study and investigate the brain response in the context of CLBMI, we collect multimodal physiological signals and behavioral data within a working memory experiment in the presence of personalized musical stimuli.
    Participants perform a working memory experiment called the n-back task in the presence of calming music and exciting music. Utilizing the skin conductance signal and behavioral data, we decode the brain's cognitive arousal and performance states, respectively. We determine the association of oxygenated hemoglobin (HbO) data with the performance state. Furthermore, we evaluate the total hemoglobin (HbT) signal energy over each music session.
    A relatively low arousal variation was observed with respect to task difficulty, while the arousal baseline changed considerably with respect to the type of music. Overall, the performance index was enhanced within the exciting session. The highest positive correlation between HbO concentration and performance was observed within the higher cognitive loads (3-back task) for all of the participants. Also, the HbT signal energy peak occurred within the exciting session.
    The findings may underline the potential of using music as an intervention to regulate brain cognitive states. Additionally, the experiment provides a diverse array of data encompassing multiple physiological signals that can be used in a brain-state decoder paradigm to shed light on human-in-the-loop experiments and to understand the network-level mechanisms of auditory stimulation.

  • Article type: Journal Article
    Urban environments are undergoing significant transformations, with pedestrian areas emerging as complex hubs of diverse mobility modes. This shift demands a more nuanced approach to urban planning and navigation technologies, highlighting the limitations of traditional, road-centric datasets in capturing the detailed dynamics of pedestrian spaces. In response, we introduce the DELTA dataset, designed to improve the analysis and mapping of pedestrian zones, thereby filling the critical need for sidewalk-centric multimodal datasets. The DELTA dataset was collected in a single urban setting using a custom-designed modular multi-sensing e-scooter platform encompassing high-resolution and synchronized audio, visual, LiDAR, and GNSS/IMU data. This assembly provides a detailed, contextually varied view of urban pedestrian environments. We developed three distinct pedestrian route segmentation models for various sensors (the 4K camera, stereo camera, and LiDAR), each optimized to capitalize on the unique strengths and characteristics of the respective sensor. These models have demonstrated strong performance, with Mean Intersection over Union (IoU) values of 0.84 for the reflectivity channel, 0.96 for the 4K camera, and 0.92 for the stereo camera, underscoring their effectiveness in ensuring precise pedestrian route identification across different resolutions and sensor types. Further, we explored audio event-based classification to connect unique soundscapes with specific geolocations, enriching the spatial understanding of urban environments by associating distinctive auditory signatures with their precise geographical origins. We also discuss potential use cases for the DELTA dataset and the limitations and future possibilities of our research, aiming to expand our understanding of pedestrian environments.
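    The segmentation models above are scored with Mean Intersection over Union (IoU). A minimal sketch of the per-class mean-IoU computation, on small hypothetical label masks rather than the DELTA evaluation code, might look like:

    ```python
    import numpy as np

    def mean_iou(pred, target, num_classes):
        """Mean Intersection over Union across classes.

        pred, target: integer class-label arrays of the same shape.
        Classes absent from both prediction and target are skipped.
        """
        ious = []
        for c in range(num_classes):
            pred_c = pred == c
            target_c = target == c
            union = np.logical_or(pred_c, target_c).sum()
            if union == 0:
                continue  # class appears in neither mask
            intersection = np.logical_and(pred_c, target_c).sum()
            ious.append(intersection / union)
        return float(np.mean(ious))

    # Toy 2x2 masks with two classes.
    pred = np.array([[0, 1], [1, 1]])
    target = np.array([[0, 1], [0, 1]])
    # class 0: intersection 1 / union 2; class 1: intersection 2 / union 3
    print(mean_iou(pred, target, num_classes=2))
    ```

    Averaging over classes rather than pixels keeps rare classes (e.g. narrow sidewalk strips) from being drowned out by the dominant background class.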

  • Article type: Journal Article
    To research the emotional interaction between customers and service staff, single-modal stimuli have typically been used to activate subjects' emotions, while more efficient multimodal emotion stimuli are often neglected. This study aims to construct a multimodal emotion stimuli database (CCSIAS) with video records of the real work status of 29 service staff and audio clips of interactions between customers and service staff, gathered by setting up wide-angle cameras and searching in the company's Ocean Engine for 15 consecutive days. First, we developed a tool to assess the emotional statuses of customers and service staff in Study 1. Second, 40 Master's and PhD students were invited to assess the audio and video data to evaluate the emotional states of customers and service staff in Study 2, using the tools developed in Study 1. Third, 118 participants were recruited to test the results from Study 2 to ensure the stability of the derived data. The results showed that 139 sets of stable emotional audio and video data were constructed (26 sets were high, 59 sets were medium, and 54 sets were low). The amount of emotional information is important for the effective activation of participants' emotional states, and the degree of emotional activation of video data is significantly higher than that of the audio data. Overall, it was shown that the study of emotional interaction phenomena requires a multimodal dataset. The CCSIAS (https://osf.io/muc86/) can extend the depth and breadth of emotional interaction research and can be applied to different emotional states between customers and service staff in the fields of organizational behavior and psychology.

  • Article type: Journal Article
    Driver monitoring systems play an important role in lower- to mid-level autonomous vehicles. Our work focuses on the detection of cognitive load as a component of driver-state estimation to improve traffic safety. By inducing single- and dual-task workloads of increasing intensity on 51 subjects, while continuously measuring signals from multiple modalities, based on physiological measurements such as ECG, EDA, EMG, PPG, respiration rate, skin temperature and eye tracker data, as well as behavioral measurements such as action units extracted from facial videos, performance metrics like reaction time, and subjective feedback using questionnaires, we create ADABase (Autonomous Driving Cognitive Load Assessment Database). As a reference method to induce cognitive load onto subjects, we use the well-established n-back test, in addition to our novel simulator-based k-drive test, motivated by real-world semi-autonomous vehicles. We extract expert features of all measurements and find significant changes in multiple modalities. Ultimately, we train and evaluate machine learning algorithms using single and multimodal inputs to distinguish cognitive load levels. We carefully evaluate model behavior and study feature importance. In summary, we introduce a novel cognitive load test, create a cognitive load database, validate changes using statistical tests, introduce novel classification and regression tasks for machine learning, and train and evaluate machine learning models.
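    The n-back test used above as a cognitive-load reference is straightforward to reproduce. A minimal sketch of stimulus generation and response scoring, illustrative only and not the ADABase protocol, could be:

    ```python
    import random

    def generate_nback_stream(n, length, alphabet="ABCD", target_rate=0.3, seed=0):
        """Generate a letter stream where roughly target_rate of the items
        repeat the item shown n positions earlier (the n-back 'targets')."""
        rng = random.Random(seed)
        stream = [rng.choice(alphabet) for _ in range(n)]
        for _ in range(length - n):
            if rng.random() < target_rate:
                stream.append(stream[-n])          # force an n-back match
            else:
                stream.append(rng.choice(alphabet))
        return stream

    def score_responses(stream, n, responses):
        """responses[i] is True if the subject pressed 'match' at position i.
        Returns (hits, false_alarms) over positions i >= n."""
        hits = false_alarms = 0
        for i in range(n, len(stream)):
            is_target = stream[i] == stream[i - n]
            if responses[i] and is_target:
                hits += 1
            elif responses[i] and not is_target:
                false_alarms += 1
        return hits, false_alarms

    stream = generate_nback_stream(n=2, length=20)
    # A 'perfect subject' that responds exactly on the true targets.
    perfect = [False] * 2 + [stream[i] == stream[i - 2] for i in range(2, 20)]
    print(score_responses(stream, 2, perfect))
    ```

    Increasing n (1-back, 2-back, 3-back) raises working-memory demand, which is how such tasks grade cognitive load.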

  • Article type: Journal Article
    Emotions play a decisive and central role in the workplace, especially in service-oriented enterprises. Due to the highly participatory and interactive nature of the service process, employees' emotions are usually highly volatile during the service delivery process, which can have a negative impact on business performance. Therefore, it is important to effectively judge the emotional states of customer service staff.
    We collected data on real-life work situations of call center employees in a large company. Three consecutive studies were conducted: first, the emotional states of 29 customer service staff were videotaped by wide-angle cameras. In Study 1, we constructed scoring criteria and auxiliary tools of picture-type scales through a free association test. In Study 2, two groups of experts were invited to evaluate the emotional states of customer service staff. In Study 3, based on the results of Study 2 and a multimodal emotion recognition method, a multimodal dataset was constructed to explore how each modality conveys the emotions of customer service staff in the workplace.
    Through the scoring by 2 groups of experts and 1 group of volunteers, we first developed a set of scoring criteria and picture-type scales, combined with the SAM scale, for judging the emotional state of customer service staff. Then we constructed 99 (out of 297) sets of stable multimodal emotion datasets. Based on the comparison among the datasets, we found that voice conveys emotional valence in the workplace more significantly, and that facial expressions have a more prominent connection with emotional arousal.
    Theoretically, this study enriches the way in which emotion data is collected and can provide a basis for the subsequent development of multimodal emotion datasets. Practically, it can provide guidance for the effective judgment of employee emotions in the workplace.

  • Article type: Journal Article
    This paper presents our latest extension of the Brno Urban Dataset (BUD), the Winter Extension (WE). The dataset contains data from sensors commonly used in the automotive industry: four RGB cameras, a single IR camera, three 3D LiDARs, a differential RTK GNSS receiver with heading estimation, an IMU, and an FMCW radar. Data from all sensors are precisely timestamped for future offline interpretation and data fusion. The most significant gain of the dataset is its focus on winter conditions in snow-covered environments; only a few public datasets deal with these kinds of conditions. We recorded the dataset during February 2021 in Brno, Czechia, when fresh snow covered the entire city and the surrounding countryside. The dataset contains situations from the city center, suburbs, and highways, as well as the countryside. Overall, the new extension adds three hours of real-life traffic situations from the mid-size city to the existing 10 h of original records. Additionally, we provide precalculated YOLO neural network object detection annotations for all five cameras, covering both the original data and the new recordings. The dataset is suitable for developing mapping and navigation algorithms as well as collision and object detection pipelines. The entire dataset is available as open source under the MIT license.

  • Article type: Journal Article
    A multimodal wound image database was created to allow fast development of computer-aided approaches for wound healing monitoring. The developed system, with parallel camera optical axes, enables multimodal images of the wound area to be acquired: photo, thermal, stereo, and depth map. As a result of using this system, a multimodal database of chronic wound images is introduced. It contains 188 image sets of photographs, thermal images, and 3D meshes of the surfaces of chronic wounds acquired during 79 patient visits. Manual wound outlines delineated by an expert are also included in the dataset. All images of each case are additionally coregistered, and both the numerical registration parameters and the transformed images are covered in the database. The presented database is publicly available for the research community at https://chronicwounddatabase.eu. It is the first publicly available database for evaluation and comparison of new image-based algorithms in the wound healing monitoring process with coregistered photographs, thermal maps, and 3D models of the wound area. This easily available database of coregistered multimodal data, together with the raw data, allows faster development of algorithms devoted to wound healing analysis and monitoring.

  • Article type: Journal Article
    Upper limb and hand functionality is critical to many activities of daily living, and the amputation of one can lead to significant functionality loss for individuals. From this perspective, advanced prosthetic hands of the future are anticipated to benefit from improved shared control between a robotic hand and its human user, but more importantly from an improved capability to infer human intent from multimodal sensor data, providing the robotic hand with perception abilities regarding the operational context. Such multimodal sensor data may include various environment sensors, including vision, as well as human physiology and behavior sensors, including electromyography (EMG) and inertial measurement units (IMUs). A fusion methodology for environmental state and human intent estimation can combine these sources of evidence in order to help prosthetic hand motion planning and control. In this paper, we present a dataset of this type that was gathered in anticipation of cameras being built into prosthetic hands, where computer vision methods will need to assess this hand-view visual evidence in order to estimate human intent. Specifically, paired images from the human eye-view and hand-view of various objects placed at different orientations were captured at the initial state of grasping trials, followed by paired video, EMG, and IMU recordings from the arm of the human during a grasp, lift, put-down, and retract style trial structure. For each trial, based on eye-view images of the scene showing the hand and object on a table, multiple humans were asked to sort, in decreasing order of preference, five grasp types appropriate for the object in its given configuration relative to the hand. The potential utility of paired eye-view and hand-view images was illustrated by training a convolutional neural network to process hand-view images in order to predict the eye-view labels assigned by humans.

  • Article type: Journal Article
    Falls, especially in elderly persons, are an important health problem worldwide. Reliable fall detection systems can mitigate the negative consequences of falls. Among the important challenges and issues reported in the literature is the difficulty of fair comparison between fall detection systems and machine learning techniques for detection. In this paper, we present the UP-Fall Detection Dataset. The dataset comprises raw and feature sets retrieved from 17 healthy young individuals without any impairment who performed 11 activities and falls, with three attempts each. The dataset also comprises more than 850 GB of information from wearable sensors, ambient sensors, and vision devices. Two experimental use cases are shown. The aim of our dataset is to help the human activity recognition and machine learning research communities fairly compare their fall detection solutions. It also provides many experimental possibilities for the signal recognition, vision, and machine learning communities.
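    The "feature sets" such a dataset provides are typically computed by sliding a window over the raw sensor streams. A minimal sketch of windowed feature extraction from a wearable accelerometer signal, with hypothetical window parameters rather than the UP-Fall preprocessing, might be:

    ```python
    import numpy as np

    def window_features(signal, fs, win_s=1.0, step_s=0.5):
        """Slide a window over a 1-D accelerometer-magnitude signal and
        extract simple statistical features per window, a common first
        step in fall-detection pipelines."""
        win = int(win_s * fs)
        step = int(step_s * fs)
        feats = []
        for start in range(0, len(signal) - win + 1, step):
            w = signal[start:start + win]
            feats.append([w.mean(), w.std(), w.min(), w.max()])
        return np.array(feats)

    # 3 s of synthetic 100 Hz data with a spike simulating an impact.
    fs = 100
    sig = np.ones(3 * fs)
    sig[150] = 8.0  # brief high-acceleration event
    f = window_features(sig, fs)
    print(f.shape)  # one row of [mean, std, min, max] per window
    ```

    A classifier trained on such per-window features can then flag windows whose peak magnitude and variance resemble an impact, which is what makes fair benchmarking across systems depend on a shared dataset like this one.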
