Natural images

    Human pose, defined as the spatial relationships between body parts, carries instrumental information supporting the understanding of motion and action of a person. A substantial body of previous work has identified cortical areas responsive to images of bodies and different body parts. However, the neural basis underlying the visual perception of body part relationships has received less attention. To broaden our understanding of body perception, we analyzed high-resolution fMRI responses to a wide range of poses from over 4,000 complex natural scenes. Using ground-truth annotations and an application of three-dimensional (3D) pose reconstruction algorithms, we compared similarity patterns of cortical activity with similarity patterns built from human pose models with different levels of depth availability and viewpoint dependency. Targeting the challenge of explaining variance in complex natural image responses with interpretable models, we achieved statistically significant correlations between pose models and cortical activity patterns (though performance levels are substantially lower than the noise ceiling). We found that the 3D view-independent pose model, compared with two-dimensional models, better captures the activation from distinct cortical areas, including the right posterior superior temporal sulcus (pSTS). These areas, together with other pose-selective regions in the LOTC, form a broader, distributed cortical network with greater view-tolerance in more anterior patches. We interpret these findings in light of the computational complexity of natural body images, the wide range of visual tasks supported by pose structures, and possible shared principles for view-invariant processing between articulated objects and ordinary, rigid objects.
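
    Comparing similarity patterns of cortical activity with similarity patterns built from pose models is commonly done with representational similarity analysis. The sketch below illustrates that general idea only. The abstract does not state the distance measure or the correlation statistic, so the Euclidean distance, the Spearman rank correlation, and the toy data are all assumptions, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rdm_upper(patterns):
    """Upper triangle of the dissimilarity matrix over all image pairs."""
    return [euclidean(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)]


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def rsa_score(model_features, brain_patterns):
    """Spearman correlation between model and brain dissimilarities."""
    return pearson(ranks(rdm_upper(model_features)),
                   ranks(rdm_upper(brain_patterns)))


# Toy data (hypothetical): four images, two features per image.
pose_model = [[0, 0], [1, 0], [0, 2], [3, 3]]
# Hypothetical response patterns that preserve the model's geometry.
voxel_patterns = [[0, 0], [2, 0], [0, 4], [6, 6]]
score = rsa_score(pose_model, voxel_patterns)
```

    Because the toy response patterns are a scaled copy of the model features, the pairwise dissimilarities have the same rank order and the score is 1. With real data, the score would be compared across pose models (for example, 2D versus 3D view-independent) within each cortical region.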






    Pareidolia are perceptions of recognizable images or meaningful patterns where none exist. In recent years, this phenomenon has been increasingly studied in healthy subjects and patients with neurological or psychiatric diseases. The current study examined pareidolia production in a group of 53 stroke patients and 82 neurologically healthy controls who performed a natural images task. We found a significant reduction of absolute pareidolia production in left- and right-hemispheric stroke patients, with right-hemispheric patients producing the fewest pareidolic responses overall. Responses were classified into 28 distinct categories, with 'Animal', 'Human', 'Face', and 'Body parts' being the most common, accounting for 72% of all pareidolia. Regarding the relative share of each category, the percentage of 'Body parts' pareidolia was significantly reduced in the left-hemispheric patient group compared with the control group, whereas it was not significantly reduced in right-hemispheric patients. These results support the hypothesis that pareidolia production may be influenced by local-global visual processing, with the left hemisphere being involved to a greater extent in local, detailed, analytical visual processing. Accordingly, a lesion to the right hemisphere, which is believed to be critical for global visual processing, might explain why right-hemispheric patients produced the fewest pareidolic responses overall.






    Human photoreceptors consist of cones, rods, and melanopsin-expressing intrinsically photosensitive retinal ganglion cells (ipRGCs). First studied in circadian regulation and pupillary control, ipRGCs project to a variety of brain centers, suggesting an involvement that extends beyond non-visual functions. IpRGC responses are stable and long-lasting, and they encode photoreceptor signals in a particular way. Compared with the transient and adaptive nature of cone and rod signals, ipRGC signaling might provide an ecological advantage to different attributes of color vision. Previous studies have indicated melanopsin's influence on visual responses, yet its contribution to color perception in humans remains debated. We summarize evidence and hypotheses (from physiology, psychophysics, and natural image statistics) about the direct and indirect involvement of ipRGCs in human color vision, first briefly assessing current knowledge about the role of melanopsin and ipRGCs in vision and in the encoding of spectral signals. We then address the question of whether melanopsin activation elicits a color percept, discussing studies that use the silent substitution method. Finally, we explore various avenues through which ipRGCs might affect color perception indirectly, such as involvement in peripheral color matching, post-receptoral pathways, color constancy, long-term chromatic adaptation, and chromatic induction. While there is consensus about the role of ipRGCs in brightness perception, confirming their direct contribution to human color perception requires further investigation. We propose potential approaches for future research, emphasizing the need for empirical validation and methodological thoroughness to elucidate the exact role of ipRGCs in human color vision.






    The cortical visual area V4 has been considered to code contours that contribute to the intermediate-level representation of objects. Neural responses to the complex contour features intrinsic to natural contours are expected to clarify the essence of this representation. To approach the cortical coding of natural contours, we investigated the simultaneous coding of multiple contour features in monkey (Macaca fuscata) V4 neurons and their population-level representation. A substantial number of neurons showed significant tuning for two or more features, such as curvature and closure, indicating that they simultaneously code multiple contour features. A large portion of the neurons responded vigorously to acutely curved contours that surrounded the center of the classical receptive field, suggesting that V4 neurons tend to code prominent features of object contours. An analysis of the mutual information (MI) between the neural responses and each contour feature showed that most neurons exhibited similar magnitudes for each type of MI, indicating that the responses of many neurons depended on multiple contour features. We next examined the population-level representation using multidimensional scaling analysis. The neural preference for multiple contour features and the preference for natural stimuli over silhouette stimuli increased along the primary and secondary axes, respectively, indicating that both the multiple contour features and surface textures contribute to the population responses. Our analyses suggest that V4 neurons simultaneously code multiple contour features in natural images and represent contour and surface properties at the population level.
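
    The mutual information between a neuron's responses and a contour feature can be illustrated with a simple plug-in estimate over discrete values. This is a minimal sketch, not the study's estimator: the abstract does not say how responses were binned or whether a bias correction was applied, and the feature labels and response bins below are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(feature_labels, response_bins):
    """Plug-in estimate of I(feature; response) in bits.

    Both inputs are equal-length sequences of discrete values, one entry
    per trial. The estimate is biased upward for small trial counts.
    """
    n = len(feature_labels)
    joint = Counter(zip(feature_labels, response_bins))
    feature_counts = Counter(feature_labels)
    response_counts = Counter(response_bins)
    mi = 0.0
    for (f, r), count in joint.items():
        p_joint = count / n
        p_feature = feature_counts[f] / n
        p_response = response_counts[r] / n
        mi += p_joint * log2(p_joint / (p_feature * p_response))
    return mi


# Hypothetical trials: the response bin fully determines the curvature label.
curvature = ["acute", "shallow"] * 4
spikes_informative = [1, 0] * 4
mi_informative = mutual_information(curvature, spikes_informative)

# Hypothetical trials: the response bin is independent of the closure label.
closure = ["closed", "closed", "open", "open"]
spikes_independent = [0, 1, 0, 1]
mi_independent = mutual_information(closure, spikes_independent)
```

    In the first toy case the estimate is 1 bit, the entropy of a balanced binary feature; in the second it is 0. Computing one such value per feature for the same neuron gives the per-feature MI magnitudes that the abstract compares.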






    Constructing computational decoding models to account for the cortical representation of semantic information plays a crucial role in understanding visual perception. The human visual system processes interactive relationships among different objects when perceiving the semantic content of natural scenes. However, existing semantic decoding models commonly treat categories as completely separate and independent, both visually and semantically, and rarely consider the relationships available from prior information. In this work, a novel semantic graph learning model was proposed to decode multiple semantic categories of perceived natural images from brain activity. The proposed model was validated on functional magnetic resonance imaging data collected from five normal subjects while they viewed 2750 natural images comprising 52 semantic categories. The results showed that the Graph Neural Network-based decoding model achieved higher accuracies than other deep neural network models. Moreover, the co-occurrence probability among semantic categories showed a significant correlation with the decoding accuracy. Additionally, the results suggested that semantic content organized in a hierarchical way with higher visual areas was more closely related to the internal visual experience. Together, this study provides a superior computational framework for multi-semantic decoding that supports the visual integration mechanism of semantic processing.
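
    The co-occurrence probability among semantic categories can be estimated directly from multi-label image annotations. The sketch below is one plausible reading, assuming the quantity is the conditional probability that category b appears in an image given that category a does; the abstract does not define it, and the category names and function name are hypothetical.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_probability(image_labels):
    """Return {(a, b): P(b in image | a in image)} for co-occurring pairs.

    image_labels is a list with one collection of category names per image.
    Pairs that never co-occur are omitted from the result.
    """
    single = Counter()
    pair = Counter()
    for labels in image_labels:
        unique = set(labels)
        single.update(unique)
        # Ordered pairs, because P(b | a) and P(a | b) generally differ.
        pair.update(permutations(sorted(unique), 2))
    return {(a, b): count / single[a] for (a, b), count in pair.items()}


# Hypothetical annotations for four images.
annotations = [
    {"person", "dog"},
    {"person", "car"},
    {"person", "dog"},
    {"car"},
]
probs = cooccurrence_probability(annotations)
```

    Here `probs[("dog", "person")]` is 1.0 because every image containing a dog also contains a person, while `probs[("person", "dog")]` is 2/3. A matrix of such values is one way to supply prior relationships between categories as edge weights in a graph-based decoder.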






    Despite the natural occurrence of global and local daylight changes in natural scenes, the human visual system typically adapts well to these changes and develops stable colour perception. In a previous study, the influence of daylight, characterized by its Correlated Colour Temperature (CCT), on different chromatic descriptors was analysed (Ojeda et al., 2017). The results showed that chromatic information is almost constant for CCT values above 14,000 K, with local extremes occurring in the range of low CCTs. The aim of this work is to extend the analysis of CCT dependence to descriptors that consider the spatio-chromatic structure, including second order descriptors (gradients, spectral slope, spectral signature, and PCA) and higher order descriptors (kurtosis, skewness, and number of relevant colours). Our results show that most of the descriptors exhibit horizontal asymptotic behaviour for CCTs above 15,000 K and local extremes in the range of 3,900 K-9,600 K. For those descriptors that could be analysed in CIELAB space, sufficient statistical evidence was obtained to consider skewness, kurtosis, and the independent spectral slopes of the L* channel as equal in the range of CCTs used. However, the slight variations in spectral signatures and the directions of the principal components when applying PCA to image patches are not statistically significant and cannot be considered equal under different illuminants. The number of relevant colours (NRC) exhibits sensitivity to temperature variations and behaves similarly to the other descriptors, due to its small number.
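
    The higher order descriptors named above, skewness and kurtosis, are standardized central moments of a channel's value distribution. The following sketch computes them for a list of pixel values, assuming the population (uncorrected) definitions and non-excess kurtosis; the abstract does not state which definitions were used, and the sample values are hypothetical.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment; 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth standardized moment (non-excess; 3 for a Gaussian)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical L* values of one image rendered under two illuminants.
lightness_low_cct = [1.0, 2.0, 3.0, 4.0, 5.0]
lightness_high_cct = [1.0, 1.0, 1.0, 10.0]

skew_low = skewness(lightness_low_cct)
kurt_low = kurtosis(lightness_low_cct)
skew_high = skewness(lightness_high_cct)
```

    Evaluating these functions on the same scene rendered at each CCT yields one curve per descriptor, which is the kind of CCT dependence the abstract describes as flattening at high colour temperatures.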






    Binocular disparity is an important cue to three-dimensional shape. We assessed the contribution of this cue to the reliability and consistency of depth in stereoscopic photographs of natural scenes. Observers viewed photographs of cluttered scenes while adjusting a gauge figure to indicate the apparent three-dimensional orientation of the surfaces of objects. The gauge figure was positioned on the surfaces of objects at multiple points in the scene, and settings were made under monocular and binocular, stereoscopic viewing. Settings were used to create a depth relief map, indicating the apparent three-dimensional structure of the scene. We found that binocular cues increased the magnitude of apparent depth, the reliability of settings across repeated measures, and the consistency of perceived depth across participants. These results show that binocular cues make an important contribution to the precise and accurate perception of depth in natural scenes that contain multiple pictorial cues.






    The primary visual cortex signals the onset of light and dark stimuli with ON and OFF cortical pathways. Here, we demonstrate that both pathways generate similar response increments to large homogeneous surfaces and their response average increases with surface brightness. We show that, in cat visual cortex, response dominance from ON or OFF pathways is bimodally distributed when stimuli are smaller than one receptive field center but unimodally distributed when they are larger. Moreover, whereas small bright stimuli drive opposite responses from ON and OFF pathways (increased versus suppressed activity), large bright surfaces drive similar response increments. We show that this size-brightness relation emerges because strong illumination increases the size of light surfaces in nature and both ON and OFF cortical neurons receive input from ON thalamic pathways. We conclude that visual scenes are perceived as brighter when the average response increments from ON and OFF cortical pathways become stronger.






    Neural mechanisms of face perception are predominantly studied in well-controlled experimental settings that involve random stimulus sequences and fixed eye positions. Although powerful, these paradigms are far from what constitutes natural vision. Here, we demonstrate the feasibility of ecologically more valid experimental paradigms using natural viewing behaviour, by combining a free viewing paradigm on natural scenes, free of photographer bias, with advanced data processing techniques that correct for overlap effects and co-varying non-linear dependencies of multiple eye movement parameters. We validate this approach by replicating classic N170 effects in neural responses triggered by fixation onsets (fixation event-related potentials [fERPs]). Importantly, besides finding a strong correlation between the two experiments, our more natural stimulus paradigm yielded smaller variability between subjects than the classic setup. Moving beyond classic temporal and spatial effect locations, our experiment furthermore revealed previously unknown signatures of face processing: these include category-specific modulation of the event-related potential (ERP) amplitude even before fixation onset, as well as adaptation effects across subsequent fixations depending on their history.






    Perceiving 3D structure in natural images is an immense computational challenge for the visual system. While many previous studies focused on the perception of rigid 3D objects, we applied a novel method to a common set of non-rigid objects: static images of the human body in the natural world. We investigated to what extent the human ability to interpret 3D poses in natural images depends on the typicality of the underlying 3D pose and the informativeness of the viewpoint. Using a novel 2AFC pose matching task, we measured how well subjects were able to match a target natural pose image with one of two synthetic comparison body images shown from a different viewpoint: one was rendered with the same 3D pose parameters as the target, while the other was a distractor rendered with noise added to the joint angles. We found that performance for typical poses was measurably better than for atypical poses; however, we found no significant difference between informative and less informative viewpoints. Further comparisons of 2D and 3D pose matching models on the same task showed that 3D body knowledge is particularly important when interpreting images of atypical poses. These results suggest that the human ability to interpret 3D poses depends on pose typicality but not on viewpoint informativeness, and that humans probably use prior knowledge of 3D pose structures.





