visual features

  • Article type: Editorial
    As psychological research embraces more naturalistic questions and large-scale analytic methods, drawing has emerged as an exciting tool for studying cognition. Drawing provides rich information about how we view the world, ranging from largely veridical perceptual representations to abstracted meta-cognitive representations. Drawing also requires the integration of multiple processes (e.g., vision, memory, motor learning), and experience with drawing can have an impact on such processes. As a result, drawing presents several interesting cognitive questions, while also providing a way to gain insight into a multitude of others. This Special Issue features 25 cutting-edge studies that use drawing to reveal discoveries spanning fields across psychology. These diverse studies investigate drawing across children, young adults, older adults, and special populations such as individuals with blindness, anterograde amnesia, apraxia, and semantic dementia. These studies detail new discoveries about the mechanisms underlying memory, attention, mathematical reasoning, and other cognitive processes. They employ a range of methods including psychophysical experiments, deep learning, and neuroimaging. Finally, many of these studies address the impact of drawing as a process on other cognitive processes, including how drawing expertise affects processes such as visual memory or spatial abilities. Overall, this collection of studies paves the way for an exciting future of drawing as a commonplace tool used by psychologists to understand complex phenomena.

  • Article type: Journal Article
    To interpret our surroundings, the brain uses a visual categorization process. Current theories and models suggest that this process comprises a hierarchy of different computations that transforms complex, high-dimensional inputs into lower-dimensional representations (i.e., manifolds) in support of multiple categorization behaviors. Here, we tested this hypothesis by analyzing these transformations reflected in dynamic MEG source activity while individual participants actively categorized the same stimuli according to different tasks: face expression, face gender, pedestrian gender, and vehicle type. Results reveal three transformation stages guided by the prefrontal cortex. At stage 1 (high-dimensional, 50-120 ms), occipital sources represent both task-relevant and task-irrelevant stimulus features; task-relevant features advance into higher ventral/dorsal regions, whereas task-irrelevant features halt at the occipital-temporal junction. At stage 2 (121-150 ms), stimulus feature representations reduce to lower-dimensional manifolds, which then transform into the task-relevant features underlying categorization behavior over stage 3 (161-350 ms). Our findings shed light on how the brain's network mechanisms transform high-dimensional inputs into specific feature manifolds that support multiple categorization behaviors.

  • Article type: Journal Article
    The cognitive state of a person can be categorized using the circumplex model of emotional states, a continuous model of two dimensions: arousal and valence. The purpose of this research is to select one or more machine learning models to be integrated into a virtual reality (VR) system that runs cognitive remediation exercises for people with mental health disorders. As such, the prediction of emotional states is essential to customize treatments for those individuals. We exploit the Remote Collaborative and Affective Interactions (RECOLA) database to predict arousal and valence values using machine learning techniques. RECOLA includes audio, video, and physiological recordings of interactions between human participants. To allow learners to focus on the most relevant data, features are extracted from raw data. Such features can be predesigned, learned, or extracted implicitly using deep learners. Our previous work on video recordings focused on predesigned and learned visual features. In this paper, we extend our work to deep visual features. Our deep visual features are extracted using the MobileNet-v2 convolutional neural network (CNN) that we previously trained on RECOLA's video frames of full/half faces. As the final purpose of our work is to integrate our solution into a practical VR application using head-mounted displays, we experimented with half faces as a proof of concept. The extracted deep features were then used to predict arousal and valence values via optimizable ensemble regression. We also fused the extracted visual features with the predesigned visual features and predicted arousal and valence values using the combined feature set. In an attempt to enhance our prediction performance, we further fused the predictions of the optimizable ensemble model with the predictions of the MobileNet-v2 model. After decision fusion, we achieved a root mean squared error (RMSE) of 0.1140, a Pearson's correlation coefficient (PCC) of 0.8000, and a concordance correlation coefficient (CCC) of 0.7868 on arousal predictions. We achieved an RMSE of 0.0790, a PCC of 0.7904, and a CCC of 0.7645 on valence predictions.
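
    The three metrics reported in this abstract (RMSE, PCC, CCC) can be computed as in the minimal sketch below. It uses the standard textbook definitions with population variances; the function names and the toy values are assumptions for illustration and are not taken from the paper or from RECOLA.

```python
import math


def _moments(y_true, y_pred):
    """Return means, population variances and covariance of two sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return mean_t, mean_p, var_t, var_p, cov


def rmse(y_true, y_pred):
    """Root mean squared error: typical size of the prediction error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pcc(y_true, y_pred):
    """Pearson's correlation: linear association, blind to offset and scale."""
    _, _, var_t, var_p, cov = _moments(y_true, y_pred)
    return cov / math.sqrt(var_t * var_p)


def ccc(y_true, y_pred):
    """Concordance correlation: like PCC, but penalizes offset and scale mismatch."""
    mean_t, mean_p, var_t, var_p, cov = _moments(y_true, y_pred)
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


if __name__ == "__main__":
    # Toy values (made up): predictions that track the target with a constant offset.
    target = [0.1, 0.2, 0.3, 0.4]
    shifted = [0.2, 0.3, 0.4, 0.5]
    print(rmse(target, shifted), pcc(target, shifted), ccc(target, shifted))
```

    A constant offset leaves PCC at 1 but lowers CCC, which is one reason the two are often reported together for continuous arousal and valence prediction.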

  • Article type: Journal Article
    Low-level features are typically continuous (e.g., the gamut between two colors), but semantic information is often categorical (there is no corresponding gradient between dog and turtle) and hierarchical (animals live on land, in water, or in the air). To determine the impact of these differences on cognitive representations, we characterized the geometry of perceptual spaces of five domains: a domain dominated by semantic information (animal names presented as words), a domain dominated by low-level features (colored textures), and three intermediate domains (animal images, lightly texturized animal images that were easy to recognize, and heavily texturized animal images that were difficult to recognize). Each domain had 37 stimuli derived from the same animal names. From 13 participants (9F), we gathered similarity judgments in each domain via an efficient psychophysical ranking paradigm. We then built geometric models of each domain for each participant, in which distances between stimuli accounted for participants' similarity judgments and intrinsic uncertainty. Remarkably, the five domains had similar global properties: each required 5-7 dimensions, and a modest amount of spherical curvature provided the best fit. However, the arrangement of the stimuli within these embeddings depended on the level of semantic information: dendrograms derived from semantic domains (word, image, and lightly texturized images) were more "tree-like" than those from feature-dominated domains (heavily texturized images and textures). Thus, the perceptual spaces of domains along this feature-dominated to semantic-dominated gradient shift to a tree-like organization when semantic information dominates, while retaining a similar global geometry.

  • Article type: Journal Article
    Eye movements are often directed toward stimuli with specific features. Decades of neurophysiological research have determined that this behavior is subserved by a feature-reweighting of the neural activation encoding potential eye movements. Despite the considerable body of research examining feature-based target selection, no comprehensive theoretical account of the feature-reweighting mechanism has yet been proposed. Given that such a theory is fundamental to our understanding of the nature of oculomotor processing, we propose an oculomotor feature-reweighting mechanism here. We first summarize the considerable anatomical and functional evidence suggesting that oculomotor substrates that encode potential eye movements rely on the visual cortices for feature information. Next, we highlight the results from our recent behavioral experiments demonstrating that feature information manifests in the oculomotor system in order of featural complexity, regardless of whether the feature information is task-relevant. Based on the available evidence, we propose an oculomotor feature-reweighting mechanism whereby (1) visual information is projected into the oculomotor system only after a visual representation manifests in the highest stage of the cortical visual processing hierarchy necessary to represent the relevant features and (2) these dynamically recruited cortical module(s) then perform feature discrimination via shifting neural feature representations, while also maintaining parity between the feature representations in cortical and oculomotor substrates by dynamically reweighting oculomotor vectors. Finally, we discuss how our behavioral experiments may extend to other areas in vision science and their possible clinical applications.

  • Article type: Journal Article
    The contrast sensitivity function (CSF) is a fundamental signature of the visual system that has been measured extensively in several species. It is defined by the visibility threshold for sinusoidal gratings at all spatial frequencies. Here, we investigated the CSF in deep neural networks using the same 2AFC contrast detection paradigm as in human psychophysics. We examined 240 networks pretrained on several tasks. To obtain their corresponding CSFs, we trained a linear classifier on top of the extracted features from frozen pretrained networks. The linear classifier is exclusively trained on a contrast discrimination task with natural images. It has to find which of the two input images has higher contrast. The network's CSF is measured by detecting which one of two images contains a sinusoidal grating of varying orientation and spatial frequency. Our results demonstrate that characteristics of the human CSF are manifested in deep networks both in the luminance channel (a band-limited inverted U-shaped function) and in the chromatic channels (two low-pass functions of similar properties). The exact shape of the networks' CSF appears to be task-dependent. The human CSF is better captured by networks trained on low-level visual tasks such as image denoising or autoencoding. However, human-like CSF also emerges in mid- and high-level tasks such as edge detection and object recognition. Our analysis shows that human-like CSF appears in all architectures but at different depths of processing, some at early layers, while others in intermediate and final layers. Overall, these results suggest that (i) deep networks model the human CSF faithfully, making them suitable candidates for applications of image quality and compression, (ii) efficient/purposeful processing of the natural world drives the CSF shape, and (iii) visual representations from all levels of the visual hierarchy contribute to the tuning curve of the CSF, in turn implying that a function which we intuitively think of as modulated by low-level visual features may arise as a consequence of pooling from a larger set of neurons at all levels of the visual system.
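
    The stimuli and the two-alternative forced choice (2AFC) described in this abstract can be illustrated with a small sketch. It generates a sinusoidal grating of a given spatial frequency, orientation, and contrast, and decides between two images by comparing Michelson contrast. The contrast comparison is an assumed stand-in for the paper's linear classifier on frozen network features, and all function names and parameter values are hypothetical.

```python
import math


def sinusoidal_grating(size, cycles, orientation_deg, contrast, mean_lum=0.5):
    """Return a size x size luminance image (list of rows) containing a sine grating.

    cycles is the spatial frequency in cycles per image width; contrast is the
    modulation depth around mean_lum (0 gives a uniform image).
    """
    theta = math.radians(orientation_deg)
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            # Position along the modulation axis, in units of image width.
            u = (x * math.cos(theta) + y * math.sin(theta)) / size
            row.append(mean_lum * (1.0 + contrast * math.sin(2.0 * math.pi * cycles * u)))
        image.append(row)
    return image


def michelson_contrast(image):
    """(Lmax - Lmin) / (Lmax + Lmin) over all pixels of the image."""
    values = [v for row in image for v in row]
    lo, hi = min(values), max(values)
    return (hi - lo) / (hi + lo)


def two_afc_choice(image_a, image_b):
    """Return 0 if image_a is judged to have the higher contrast, else 1."""
    return 0 if michelson_contrast(image_a) >= michelson_contrast(image_b) else 1


if __name__ == "__main__":
    blank = sinusoidal_grating(64, 4, 0.0, 0.0)     # uniform field
    grating = sinusoidal_grating(64, 4, 0.0, 0.3)   # 4 cycles/image, 30% contrast
    print(two_afc_choice(blank, grating))
```

    In a CSF measurement of this kind, the contrast would be lowered at each spatial frequency until the choice falls to chance; the reciprocal of that threshold contrast is the sensitivity at that frequency.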

  • Article type: Journal Article
    Ceramic tiles are popular because of their various forms, and they are often used to decorate the environment. However, few studies have applied objective methods to explore the implicit preference and visual attention of people toward ceramic tile features. Using event-related potential technology can provide neurophysiological evidence for the study and applications of tiles.
    This study explored the influence of pattern, lightness, and color system factors of ceramic tiles on the preferences of people using a combination of subjective questionnaires and event-related potential (ERP) technology. Twelve different conditions of tiles (2 × 3 × 2) were used as stimuli. EEG data were collected from 20 participants while they watched the stimuli. Subjective preference scores and average ERPs were analyzed using analysis of variance and correlation analysis.
    (1) Pattern, lightness, and color system factors significantly affected the subjective preference scores for tiles; the unpatterned tiles, light-toned tiles, and warm-colored tiles received higher preference scores. (2) The preferences of people for different features of tiles moderated ERP amplitudes. (3) The light-toned tiles with a high preference score caused a greater N100 amplitude than the medium-toned and dark-toned tiles; and the patterned tiles and warm-colored tiles with low preference scores induced greater P200 and N200 amplitudes.
    In the early stage of visual processing, light-toned tiles attracted more attention, possibly because of the positive emotional effects related to the preference. The greater P200 and N200 elicited by the patterned and neutral-colored tiles in the middle stage of visual processing indicate that patterned and neutral-colored tiles attracted more attention. This may be due to negativity bias, where more attention is allocated to negative stimuli that people strongly dislike. From the perspective of cognitive processes, the results indicate that the lightness of ceramic tiles is the factor that people first detect, and the visual processing of pattern and color system factors of ceramic tiles belongs to a higher level of visual processing. This study provides a new perspective and relevant information for assessing the visual characteristics of tiles for environmental designers and marketers involved in the ceramic tiles industry.

  • Article type: Journal Article
    Brain decoding is a process of decoding human cognitive contents from brain activities. However, improving the accuracy of brain decoding remains difficult due to the unique characteristics of the brain, such as the small sample size and high dimensionality of brain activities. Therefore, this paper proposes a method that effectively uses multi-subject brain activities to improve brain decoding accuracy. Specifically, we distinguish between the shared information common to multi-subject brain activities and the individual information based on each subject's brain activities, and both types of information are used to decode human visual cognition. Both types of information are extracted as features belonging to a latent space using a probabilistic generative model. In the experiment, a publicly available dataset and five subjects were used, and the estimation accuracy was validated on the basis of a confidence score ranging from 0 to 1, where a larger value indicates better performance. The proposed method achieved a confidence score of 0.867 for the best subject and an average of 0.813 for the five subjects, which was the best compared to other methods. The experimental results show that the proposed method can decode visual cognition more accurately than existing methods in which the shared information is not distinguished from the individual information.

  • Article type: Journal Article
    When the command and control information of a command and control cabin is displayed in mixed reality, the large amount of real-time and static information it contains forms a dynamic situation that changes all the time. This places a great burden on the system operator's cognition, decision-making and operation. In order to solve this problem, this paper studies the three-dimensional spatial layout of holographic command cabin information display in a mixed reality environment. A total of 15 people participated in the experiment, of whom 10 were experimental subjects and 5 were auxiliary experimental staff. The ten subjects used the HoloLens 2 to conduct visual characteristics and cognitive load experiments, and the subjects' task completion time, error rate, eye movement, EEG and subjective evaluation data were collected and analyzed. Through the analysis of experimental data, the laws of visual and cognitive features of three-dimensional space in a mixed reality environment can be obtained. This paper systematically explores the effects of three key attributes of information distribution in a 3D space (depth distance, number of information layers, and relative depth distance of the target position) on visual search performance and on cognitive load. The experimental results showed that the optimal depth distance ranges for information display in the mixed reality environment are: 0.6-1.0 m for operation interactions, 2.4-2.8 m for accurate identification, and 3.4-3.6 m for overall situational awareness. Under a given angle of view, the number of information layers in the space should be as small as possible and should not exceed five. The relative depth distance between the information layers in space ranges from 0.2 m to 0.35 m. Based on these findings, information layout in a 3D space can achieve a faster and more accurate visual search in a mixed reality environment and effectively reduce the cognitive load.

  • Article type: Journal Article
    Deep Reinforcement Learning (RL) is often criticised for being data inefficient and inflexible to changes in task structure. Part of the reason for these issues is that Deep RL typically learns end-to-end using backpropagation, which results in task-specific representations. One approach for circumventing these problems is to apply Deep RL to existing representations that have been learned in a more task-agnostic fashion. However, this only partially solves the problem as the Deep RL algorithm learns a function of all pre-existing representations and is therefore still susceptible to data inefficiency and a lack of flexibility. Biological agents appear to solve this problem by forming internal representations over many tasks and only selecting a subset of these features for decision-making based on the task at hand; a process commonly referred to as selective attention. We take inspiration from selective attention in biological agents and propose a novel algorithm called Selective Particle Attention (SPA), which selects subsets of existing representations for Deep RL. Crucially, these subsets are not learned through backpropagation, which is slow and prone to overfitting, but instead via a particle filter that rapidly and flexibly identifies key subsets of features using only reward feedback. We evaluate SPA on two tasks that involve raw pixel input and dynamic changes to the task structure, and show that it greatly increases the efficiency and flexibility of downstream Deep RL algorithms.
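
    The idea of selecting feature subsets with a particle filter driven only by reward can be sketched generically as below. This is not the SPA algorithm from the paper: it is a plain weight, resample, and mutate loop over binary feature masks, and every name, the softmax weighting, the bit-flip mutation, and the toy reward are assumptions made for illustration.

```python
import math
import random


def normalized_weights(rewards, temperature=1.0):
    """Softmax over rewards: higher-reward particles receive larger weights."""
    peak = max(rewards)  # subtract the maximum for numerical stability
    exps = [math.exp((r - peak) / temperature) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def resample(particles, weights, rng):
    """Draw a new population with replacement, in proportion to the weights."""
    drawn = rng.choices(particles, weights=weights, k=len(particles))
    return [list(mask) for mask in drawn]


def mutate(mask, flip_prob, rng):
    """Flip each bit with a small probability so the population keeps exploring."""
    return [1 - bit if rng.random() < flip_prob else bit for bit in mask]


def particle_filter_step(particles, reward_fn, rng, temperature=1.0, flip_prob=0.05):
    """One update: score each feature mask by reward, resample, then mutate."""
    rewards = [reward_fn(mask) for mask in particles]
    weights = normalized_weights(rewards, temperature)
    survivors = resample(particles, weights, rng)
    return [mutate(mask, flip_prob, rng) for mask in survivors]


def toy_reward(mask):
    """Hypothetical reward: 1 point per position that agrees with a fixed target mask."""
    target = [1, 0, 1, 0]
    return sum(1.0 for m, t in zip(mask, target) if m == t)


rng = random.Random(0)
particles = [[rng.randint(0, 1) for _ in range(4)] for _ in range(50)]
for _ in range(20):
    particles = particle_filter_step(particles, toy_reward, rng, temperature=0.5)
```

    Each particle is a binary mask over candidate features. No gradients are involved, so the population can shift to a different subset within a few updates if the reward function changes, which is the flexibility the abstract attributes to reward-driven selection.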