  • Article type: Editorial
    As psychological research embraces more naturalistic questions and large-scale analytic methods, drawing has emerged as an exciting tool for studying cognition. Drawing provides rich information about how we view the world, ranging from largely veridical perceptual representations to abstracted meta-cognitive representations. Drawing also requires the integration of multiple processes (e.g., vision, memory, motor learning), and experience with drawing can have an impact on such processes. As a result, drawing presents several interesting cognitive questions, while also providing a way to gain insight into a multitude of others. This Special Issue features 25 cutting-edge studies that use drawing to reveal discoveries spanning fields across psychology. These diverse studies investigate drawing across children, young adults, older adults, and special populations such as individuals with blindness, anterograde amnesia, apraxia, and semantic dementia. These studies detail new discoveries about the mechanisms underlying memory, attention, mathematical reasoning, and other cognitive processes. They employ a range of methods including psychophysical experiments, deep learning, and neuroimaging. Finally, many of these studies examine how drawing as a process affects other cognitive processes, including how drawing expertise affects visual memory or spatial abilities. Overall, this collection of studies paves the way for an exciting future of drawing as a commonplace tool used by psychologists to understand complex phenomena.






  • Article type: Journal Article
    To interpret our surroundings, the brain uses a visual categorization process. Current theories and models suggest that this process comprises a hierarchy of different computations that transforms complex, high-dimensional inputs into lower-dimensional representations (i.e., manifolds) in support of multiple categorization behaviors. Here, we tested this hypothesis by analyzing these transformations reflected in dynamic MEG source activity while individual participants actively categorized the same stimuli according to different tasks: face expression, face gender, pedestrian gender, and vehicle type. Results reveal three transformation stages guided by the pre-frontal cortex. At stage 1 (high-dimensional, 50-120 ms), occipital sources represent both task-relevant and task-irrelevant stimulus features; task-relevant features advance into higher ventral/dorsal regions, whereas task-irrelevant features halt at the occipital-temporal junction. At stage 2 (121-150 ms), stimulus feature representations reduce to lower-dimensional manifolds, which then transform into the task-relevant features underlying categorization behavior over stage 3 (161-350 ms). Our findings shed light on how the brain's network mechanisms transform high-dimensional inputs into specific feature manifolds that support multiple categorization behaviors.






  • Article type: Journal Article
    The cognitive state of a person can be categorized using the circumplex model of emotional states, a continuous model with two dimensions: arousal and valence. The purpose of this research is to select one or more machine learning models to be integrated into a virtual reality (VR) system that runs cognitive remediation exercises for people with mental health disorders. Predicting emotional states is therefore essential to customizing treatments for those individuals. We use the Remote Collaborative and Affective Interactions (RECOLA) database to predict arousal and valence values using machine learning techniques. RECOLA includes audio, video, and physiological recordings of interactions between human participants. To allow learners to focus on the most relevant data, features are extracted from raw data. Such features can be predesigned, learned, or extracted implicitly using deep learners. Our previous work on video recordings focused on predesigned and learned visual features. In this paper, we extend our work to deep visual features. Our deep visual features are extracted using the MobileNet-v2 convolutional neural network (CNN) that we previously trained on RECOLA's video frames of full/half faces. As the final purpose of our work is to integrate our solution into a practical VR application using head-mounted displays, we experimented with half faces as a proof of concept. The extracted deep features were then used to predict arousal and valence values via optimizable ensemble regression. We also fused the extracted visual features with the predesigned visual features and predicted arousal and valence values using the combined feature set. In an attempt to enhance our prediction performance, we further fused the predictions of the optimizable ensemble model with the predictions of the MobileNet-v2 model. After decision fusion, we achieved a root mean squared error (RMSE) of 0.1140, a Pearson's correlation coefficient (PCC) of 0.8000, and a concordance correlation coefficient (CCC) of 0.7868 on arousal predictions. We achieved an RMSE of 0.0790, a PCC of 0.7904, and a CCC of 0.7645 on valence predictions.
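    The three reported metrics can be illustrated with a minimal Python sketch. The function names are hypothetical, and the weighted-average `fuse` helper is only an assumption about what decision fusion might look like; the abstract does not say how the two models' predictions were combined. The example also shows why CCC is stricter than PCC: a constant offset leaves PCC at 1 but lowers CCC.

```python
import math


def _moments(x, y):
    """Return means, population variances, and covariance of two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, vx, vy, cov


def rmse(pred, gold):
    """Root mean squared error between predictions and gold labels."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold))


def pcc(pred, gold):
    """Pearson's correlation coefficient."""
    _, _, vx, vy, cov = _moments(pred, gold)
    return cov / math.sqrt(vx * vy)


def ccc(pred, gold):
    """Concordance correlation coefficient: penalizes bias as well as scatter."""
    mx, my, vx, vy, cov = _moments(pred, gold)
    return 2 * cov / (vx + vy + (mx - my) ** 2)


def fuse(pred_a, pred_b, weight_a=0.5):
    """Hypothetical decision fusion: weighted average of two prediction lists."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(pred_a, pred_b)]


if __name__ == "__main__":
    gold = [0.1, 0.2, 0.3, 0.4]
    biased = [g + 0.1 for g in gold]  # perfectly correlated but offset
    print(rmse(biased, gold), pcc(biased, gold), ccc(biased, gold))
```

    Because CCC includes the squared difference of the means in its denominator, a model that tracks the trend but is systematically biased scores lower on CCC than on PCC.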






  • Article type: Journal Article
    Low-level features are typically continuous (e.g., the gamut between two colors), but semantic information is often categorical (there is no corresponding gradient between dog and turtle) and hierarchical (animals live in land, water, or air). To determine the impact of these differences on cognitive representations, we characterized the geometry of perceptual spaces of five domains: a domain dominated by semantic information (animal names presented as words), a domain dominated by low-level features (colored textures), and three intermediate domains (animal images, lightly texturized animal images that were easy to recognize, and heavily texturized animal images that were difficult to recognize). Each domain had 37 stimuli derived from the same animal names. From 13 participants (9F), we gathered similarity judgments in each domain via an efficient psychophysical ranking paradigm. We then built geometric models of each domain for each participant, in which distances between stimuli accounted for participants' similarity judgments and intrinsic uncertainty. Remarkably, the five domains had similar global properties: each required 5-7 dimensions, and a modest amount of spherical curvature provided the best fit. However, the arrangement of the stimuli within these embeddings depended on the level of semantic information: dendrograms derived from semantic domains (word, image, and lightly texturized images) were more "tree-like" than those from feature-dominated domains (heavily texturized images and textures). Thus, the perceptual spaces of domains along this feature-dominated to semantic-dominated gradient shift to a tree-like organization when semantic information dominates, while retaining a similar global geometry.






  • Article type: Journal Article
    Eye movements are often directed toward stimuli with specific features. Decades of neurophysiological research have determined that this behavior is subserved by a feature-reweighting of the neural activation encoding potential eye movements. Despite the considerable body of research examining feature-based target selection, no comprehensive theoretical account of the feature-reweighting mechanism has yet been proposed. Given that such a theory is fundamental to our understanding of the nature of oculomotor processing, we propose an oculomotor feature-reweighting mechanism here. We first summarize the considerable anatomical and functional evidence suggesting that oculomotor substrates that encode potential eye movements rely on the visual cortices for feature information. Next, we highlight the results from our recent behavioral experiments demonstrating that feature information manifests in the oculomotor system in order of featural complexity, regardless of whether the feature information is task-relevant. Based on the available evidence, we propose an oculomotor feature-reweighting mechanism whereby (1) visual information is projected into the oculomotor system only after a visual representation manifests in the highest stage of the cortical visual processing hierarchy necessary to represent the relevant features and (2) these dynamically recruited cortical module(s) then perform feature discrimination via shifting neural feature representations, while also maintaining parity between the feature representations in cortical and oculomotor substrates by dynamically reweighting oculomotor vectors. Finally, we discuss how our behavioral experiments may extend to other areas in vision science and their possible clinical applications.






  • Article type: Journal Article
    The contrast sensitivity function (CSF) is a fundamental signature of the visual system that has been measured extensively in several species. It is defined by the visibility threshold for sinusoidal gratings at all spatial frequencies. Here, we investigated the CSF in deep neural networks using the same 2AFC contrast detection paradigm as in human psychophysics. We examined 240 networks pretrained on several tasks. To obtain their corresponding CSFs, we trained a linear classifier on top of the extracted features from frozen pretrained networks. The linear classifier is trained exclusively on a contrast discrimination task with natural images: it has to find which of two input images has higher contrast. The network's CSF is then measured by having it detect which one of two images contains a sinusoidal grating of varying orientation and spatial frequency. Our results demonstrate that characteristics of the human CSF are manifested in deep networks both in the luminance channel (a band-limited inverted U-shaped function) and in the chromatic channels (two low-pass functions of similar properties). The exact shape of the networks' CSF appears to be task-dependent. The human CSF is better captured by networks trained on low-level visual tasks such as image-denoising or autoencoding. However, human-like CSF also emerges in mid- and high-level tasks such as edge detection and object recognition. Our analysis shows that human-like CSF appears in all architectures but at different depths of processing: in some at early layers, in others at intermediate and final layers. Overall, these results suggest that (i) deep networks model the human CSF faithfully, making them suitable candidates for applications of image quality and compression, (ii) efficient/purposeful processing of the natural world drives the CSF shape, and (iii) visual representations from all levels of the visual hierarchy contribute to the tuning curve of the CSF, in turn implying that a function which we intuitively think of as modulated by low-level visual features may arise as a consequence of pooling from a larger set of neurons at all levels of the visual system.
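    The stimulus side of a 2AFC grating-detection trial can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' stimulus code: the function names, the image size, the mean luminance of 0.5, and the use of Michelson contrast as the contrast measure are all assumptions introduced here.

```python
import math
import random


def grating(size, cycles_per_image, contrast, orientation_deg=0.0, mean_lum=0.5):
    """Return a size x size sinusoidal grating as nested lists of luminance values."""
    theta = math.radians(orientation_deg)
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            # Position along the modulation axis, as a fraction of image width.
            u = (x * math.cos(theta) + y * math.sin(theta)) / size
            phase = 2 * math.pi * cycles_per_image * u
            row.append(mean_lum * (1 + contrast * math.sin(phase)))
        image.append(row)
    return image


def michelson_contrast(image):
    """(Lmax - Lmin) / (Lmax + Lmin) over all pixels."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    return (hi - lo) / (hi + lo)


def make_2afc_trial(rng, size, cycles_per_image, contrast):
    """Build one trial: a uniform image and a grating, in random order.

    Returns the image pair and the index (0 or 1) of the grating.
    """
    target = grating(size, cycles_per_image, contrast)
    blank = grating(size, cycles_per_image, 0.0)  # zero contrast: uniform field
    target_index = rng.randrange(2)
    pair = [blank, blank]
    pair[target_index] = target
    return pair, target_index


if __name__ == "__main__":
    rng = random.Random(0)
    pair, idx = make_2afc_trial(rng, size=64, cycles_per_image=4, contrast=0.2)
    print("grating is in interval", idx)
    print("contrast of grating image:", michelson_contrast(pair[idx]))
```

    In such a paradigm, the lowest contrast at which the observer (human or network plus classifier) still picks the grating reliably is the threshold at that spatial frequency, and sensitivity is conventionally its reciprocal.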






  • Article type: Journal Article
    Ceramic tiles are popular because of their various forms, and they are often used to decorate the environment. However, few studies have applied objective methods to explore the implicit preference and visual attention of people toward ceramic tile features. Using event-related potential technology can provide neurophysiological evidence for the study and applications of tiles.
    This study explored the influence of pattern, lightness, and color system factors of ceramic tiles on the preferences of people using a combination of subjective questionnaires and event-related potential (ERP) technology. Twelve different conditions of tiles (2 × 3 × 2) were used as stimuli. EEG data were collected from 20 participants while they watched the stimuli. Subjective preference scores and average ERPs were analyzed using analysis of variance and correlation analysis.
    (1) Pattern, lightness, and color system factors significantly affected the subjective preference scores for tiles; the unpatterned tiles, light-toned tiles, and warm-colored tiles received higher preference scores. (2) The preferences of people for different features of tiles moderated ERP amplitudes. (3) The light-toned tiles with a high preference score caused a greater N100 amplitude than the medium-toned and dark-toned tiles; and the patterned tiles and warm-colored tiles with low preference scores induced greater P200 and N200 amplitudes.
    In the early stage of visual processing, light-toned tiles attracted more attention, possibly because of the positive emotional effects related to the preference. The greater P200 and N200 elicited by the patterned and neutral-colored tiles in the middle stage of visual processing indicate that patterned and neutral-colored tiles attracted more attention. This may be due to negativity bias, where more attention is allocated to negative stimuli that people strongly dislike. From the perspective of cognitive processes, the results indicate that the lightness of ceramic tiles is the factor that people detect first, and that the processing of pattern and color system factors belongs to a higher level of visual processing. This study provides a new perspective and relevant information for assessing the visual characteristics of tiles for environmental designers and marketers involved in the ceramic tile industry.






  • Article type: Journal Article
    Brain decoding is the process of decoding human cognitive contents from brain activities. However, improving the accuracy of brain decoding remains difficult because of characteristics of brain activity data, such as small sample sizes and high dimensionality. Therefore, this paper proposes a method that effectively uses multi-subject brain activities to improve brain decoding accuracy. Specifically, we distinguish between the shared information common to multi-subject brain activities and the individual information based on each subject's brain activities, and both types of information are used to decode human visual cognition. Both types of information are extracted as features belonging to a latent space using a probabilistic generative model. In the experiment, a publicly available dataset with five subjects was used, and the estimation accuracy was validated on the basis of a confidence score ranging from 0 to 1, with larger values indicating better performance. The proposed method achieved a confidence score of 0.867 for the best subject and an average of 0.813 across the five subjects, the best among the compared methods. The experimental results show that the proposed method decodes visual cognition more accurately than existing methods in which the shared information is not distinguished from the individual information.






  • Article type: Journal Article
    When the command and control information of a command and control cabin is displayed in mixed reality, the large amount of real-time and static information it contains forms a dynamic situation that changes constantly. This places a heavy burden on the system operator's cognition, decision-making, and operation. To address this problem, this paper studies the three-dimensional spatial layout of information display for a holographic command cabin in a mixed reality environment. A total of 15 people took part in the experiment: 10 subjects and 5 auxiliary staff. The ten subjects used the HoloLens 2 to perform visual characteristics and cognitive load experiments, and their task completion time, error rate, eye movement, EEG, and subjective evaluation data were collected and analyzed. From the analysis of the experimental data, the laws governing the visual and cognitive features of three-dimensional space in a mixed reality environment can be derived. This paper systematically explores the effects of three key attributes of information distribution in 3D space (depth distance, number of information layers, and relative position depth distance of the target) on visual search performance and cognitive load. The experimental results showed the following optimal depth distance ranges for information display in the mixed reality environment: 0.6-1.0 m for operation interactions, 2.4-2.8 m for accurate identification, and 3.4-3.6 m for overall situational awareness. For a given angle of view, the number of information layers in the space should be as small as possible and should not exceed five. The relative position depth distance between information layers in space should range from 0.2 m to 0.35 m. Based on these results, information layout in a 3D space can achieve a faster and more accurate visual search in a mixed reality environment and effectively reduce the cognitive load.






  • Article type: Journal Article
    Deep Reinforcement Learning (RL) is often criticised for being data inefficient and inflexible to changes in task structure. Part of the reason for these issues is that Deep RL typically learns end-to-end using backpropagation, which results in task-specific representations. One approach for circumventing these problems is to apply Deep RL to existing representations that have been learned in a more task-agnostic fashion. However, this only partially solves the problem as the Deep RL algorithm learns a function of all pre-existing representations and is therefore still susceptible to data inefficiency and a lack of flexibility. Biological agents appear to solve this problem by forming internal representations over many tasks and only selecting a subset of these features for decision-making based on the task at hand; a process commonly referred to as selective attention. We take inspiration from selective attention in biological agents and propose a novel algorithm called Selective Particle Attention (SPA), which selects subsets of existing representations for Deep RL. Crucially, these subsets are not learned through backpropagation, which is slow and prone to overfitting, but instead via a particle filter that rapidly and flexibly identifies key subsets of features using only reward feedback. We evaluate SPA on two tasks that involve raw pixel input and dynamic changes to the task structure, and show that it greatly increases the efficiency and flexibility of downstream Deep RL algorithms.
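    The general idea of selecting feature subsets with a particle filter driven only by reward can be sketched as follows. This is a generic resample-and-perturb illustration, not the authors' SPA algorithm: the binary-mask particle representation, the function names, the bit-flip perturbation, and the toy reward (in which features 0 and 1 are useful and all others are costly) are assumptions made for the example.

```python
import math
import random


def toy_reward(mask):
    """Hypothetical reward: features 0 and 1 help, every other selected feature hurts."""
    useful = mask[0] + mask[1]
    distractors = sum(mask[2:])
    return math.exp(2 * useful - distractors)


def select_features(reward_fn, n_features, n_particles=200, n_steps=40,
                    flip_prob=0.05, seed=0):
    """Estimate how often each feature belongs in a rewarding subset.

    Each particle is a boolean mask over features. At every step, particles
    are weighted by the reward their mask earns, resampled in proportion to
    those weights, and then perturbed by random bit flips. No gradients are
    involved; reward is the only learning signal.
    """
    rng = random.Random(seed)
    particles = [[rng.random() < 0.5 for _ in range(n_features)]
                 for _ in range(n_particles)]
    for _ in range(n_steps):
        # Weight: non-negative reward obtained with each candidate subset.
        weights = [reward_fn(p) for p in particles]
        # Resample: rewarding subsets are duplicated, poor ones die out.
        particles = [list(p) for p in
                     rng.choices(particles, weights=weights, k=n_particles)]
        # Perturb: small random changes keep exploring nearby subsets.
        for p in particles:
            for i in range(n_features):
                if rng.random() < flip_prob:
                    p[i] = not p[i]
    # Fraction of particles that include each feature.
    return [sum(p[i] for p in particles) / n_particles
            for i in range(n_features)]


if __name__ == "__main__":
    inclusion = select_features(toy_reward, n_features=6)
    print([round(f, 2) for f in inclusion])
```

    Because the particle population is re-weighted at every step, the same loop would shift toward a different subset if the reward function changed partway through, which is the kind of flexibility the abstract attributes to reward-driven selection.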





