Natural scenes

  • Article type: Journal Article
    This study investigates human expectations towards naturalistic colour changes under varying illuminations. Understanding colour expectations is key to both scientific research on colour constancy and applications of colour and lighting in art and industry. We reanalysed data from asymmetric colour matches of a previous study and found that colour adjustments tended to align with illuminant-induced colour shifts predicted by naturalistic, rather than artificial, illuminants and reflectances. We conducted three experiments using hyperspectral images of naturalistic scenes to test if participants judged colour changes based on naturalistic illuminant and reflectance spectra as more plausible than artificial ones, which contradicted their expectations. When we consistently manipulated the illuminant (Experiment 1) and reflectance (Experiment 2) spectra across the whole scene, observers chose the naturalistic renderings significantly above the chance level (>25%) but barely more often than any of the three artificial ones, collectively (>50%). However, when we manipulated only one object/area's reflectance (Experiment 3), observers more reliably identified the version in which the object had a naturalistic reflectance like the rest of the scene. Results from Experiments 2-3 and additional analyses suggested that relational colour constancy strongly contributed to observer expectations, and stable cone-excitation ratios are not limited to naturalistic illuminants and reflectances but also occur for our artificial renderings. Our findings indicate that relational colour constancy and prior knowledge about surface colour shifts help to disambiguate surface colour identity under illumination changes, enabling human observers to recognise surface colours reliably in naturalistic conditions. Additionally, relational colour constancy may even be effective in many artificial conditions.

  • Article type: Journal Article
    The primary challenges in tea production under multiple stress exposures have negatively affected its global market sustainability, so an in-field rapid technique for monitoring stress in tea leaves is urgently needed. Therefore, this study aimed to propose an efficient method for the detection of stress symptoms based on a portable smartphone with deep learning models. Firstly, a database containing over 10,000 images of tea garden canopies in complex natural scenes was developed, which included healthy (no stress) and three types of stress (tea anthracnose (TA), tea blister blight (TB) and sunburn (SB)). Then, YOLOv5m and YOLOv8m algorithms were adapted to discriminate the four types of stress symptoms; the YOLOv8m algorithm achieved better performance in the identification of healthy leaves (98%), TA (92.0%), TB (68.4%) and SB (75.5%). Furthermore, the YOLOv8m algorithm was used to construct a model for differentiating the disease severity of TA, and a satisfactory result was obtained, with accuracies for mild, moderate, and severe TA infections of 94%, 96%, and 91%, respectively. In addition, we found that the CNN kernels of YOLOv8m could efficiently extract the texture characteristics of the images at layer 2, and that these characteristics can clearly distinguish different types of stress symptoms. This contributes greatly to the YOLOv8m model's high-precision differentiation of the four types of stress symptoms. In conclusion, our study provided an effective system to achieve low-cost, high-precision, fast, in-field diagnosis of tea stress symptoms in complex natural scenes based on a smartphone and deep learning algorithms.

  • Article type: Journal Article
    Theories of auditory and visual scene analysis suggest the perception of scenes relies on the identification and segregation of objects within them, resembling a detail-oriented processing style. However, a more global process may occur while analyzing scenes, which has been evidenced in the visual domain. It is our understanding that a similar line of research has not been explored in the auditory domain; therefore, we evaluated the contributions of high-level global and low-level acoustic information to auditory scene perception. An additional aim was to increase the field's ecological validity by using and making available a new collection of high-quality auditory scenes. Participants rated scenes on 8 global properties (e.g., open vs. enclosed) and an acoustic analysis evaluated which low-level features predicted the ratings. We submitted the acoustic measures and average ratings of the global properties to separate exploratory factor analyses (EFAs). The EFA of the acoustic measures revealed a seven-factor structure explaining 57% of the variance in the data, while the EFA of the global property measures revealed a two-factor structure explaining 64% of the variance in the data. Regression analyses revealed each global property was predicted by at least one acoustic variable (R² = 0.33-0.87). These findings were extended using deep neural network models where we examined correlations between human ratings of global properties and deep embeddings of two computational models: an object-based model and a scene-based model. The results support that participants' ratings are more strongly explained by a global analysis of the scene setting, though the relationship between scene perception and auditory perception is multifaceted, with differing correlation patterns evident between the two models. Taken together, our results provide evidence for the ability to perceive auditory scenes from a global perspective. Some of the acoustic measures predicted ratings of global scene perception, suggesting representations of auditory objects may be transformed through many stages of processing in the ventral auditory stream, similar to what has been proposed in the ventral visual stream. These findings and the open availability of our scene collection will make future studies on perception, attention, and memory for natural auditory scenes possible.

  • Article type: Journal Article
    It is challenging to quantify the accuracy and precision of scene memory because it is unclear what 'space' scenes occupy (how can we quantify error when misremembering a natural scene?). To address this, we exploited the ecologically valid, metric space in which scenes occur and are represented: routes. In a delayed estimation task, participants briefly saw a target scene drawn from a video of an outdoor 'route loop', then used a continuous report wheel of the route to pinpoint the scene. Accuracy was high and unbiased, indicating there was no net boundary extension/contraction. Interestingly, precision was higher for routes that were more self-similar (as characterized by the half-life, in meters, of a route's Multiscale Structural Similarity index), consistent with previous work finding a 'similarity advantage' where memory precision is regulated according to task demands. Overall, scenes were remembered to within a few meters of their actual location.
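The abstract characterizes route self-similarity by a half-life in meters but does not say how that half-life was estimated. As a rough sketch only, one could assume that similarity between frames decays exponentially with the distance separating them along the route, fit the decay rate, and convert it to a half-life. The function name `similarity_half_life`, the exponential model, and the synthetic values below are illustrative assumptions, not the authors' procedure or data.

```python
import math

def similarity_half_life(distances_m, similarities):
    """Fit ln(s) = -lam * d by least squares through the origin.

    Assumes an exponential decay of similarity with distance (an assumption
    made for this sketch) and returns the half-life ln(2) / lam in meters.
    """
    num = sum(d * math.log(s) for d, s in zip(distances_m, similarities))
    den = sum(d * d for d in distances_m)
    lam = -num / den
    return math.log(2) / lam

# Synthetic similarities (not data from the study), built to halve every 8 m.
distances = [1.0, 2.0, 4.0, 8.0, 16.0]
sims = [0.5 ** (d / 8.0) for d in distances]
half_life = similarity_half_life(distances, sims)
print(f"estimated half-life: {half_life:.2f} m")
```

Under this reading, a longer half-life means that frames far apart on the route still look alike, which is the sense in which a route is "more self-similar".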

  • Article type: Journal Article
    To create coherent visual experiences, the brain spatially integrates the complex and dynamic information it receives from the environment. We previously demonstrated that feedback-related alpha activity carries stimulus-specific information when two spatially and temporally coherent naturalistic inputs can be integrated into a unified percept. In this study, we sought to determine whether such integration-related alpha dynamics are triggered by categorical coherence in visual inputs. In an EEG experiment, we manipulated the degree of coherence by presenting pairs of videos from the same or different categories through two apertures in the left and right visual hemifields. Critically, video pairs could be video-level coherent (i.e., stem from the same video), coherent in their basic-level category, coherent in their superordinate category, or incoherent (i.e., stem from videos from two entirely different categories). We conducted multivariate classification analyses on rhythmic EEG responses to decode between the video stimuli in each condition. As the key result, we significantly decoded the video-level coherent and basic-level coherent stimuli, but not the superordinate coherent and incoherent stimuli, from cortical alpha rhythms. This suggests that alpha dynamics play a critical role in integrating information across space, and that cortical integration processes are flexible enough to accommodate information from different exemplars of the same basic-level category.
    NEW & NOTEWORTHY: Our brain integrates dynamic inputs across the visual field to create coherent visual experiences. Such integration processes have previously been linked to cortical alpha dynamics. In this study, the integration-related alpha activity was observed not only when snippets from the same video were presented, but also when different video snippets from the same basic-level category were presented, highlighting the flexibility of neural integration processes.

  • Article type: Journal Article
    Most vertebrates use head and eye movements to quickly change gaze orientation and sample different portions of the environment with periods of stable fixation. Visual information must be integrated across fixations to construct a complete perspective of the visual environment. In concert with this sampling strategy, neurons adapt to unchanging input to conserve energy and ensure that only novel information from each fixation is processed. We demonstrate how adaptation recovery times and saccade properties interact and thus shape spatiotemporal tradeoffs observed in the motor and visual systems of mice, cats, marmosets, macaques, and humans. These tradeoffs predict that in order to achieve similar visual coverage over time, animals with smaller receptive field sizes require faster saccade rates. Indeed, we find comparable sampling of the visual environment by neuronal populations across mammals when integrating measurements of saccadic behavior with receptive field sizes and V1 neuronal density. We propose that these mammals share a common statistically driven strategy of maintaining coverage of their visual environment over time calibrated to their respective visual system characteristics.
    SIGNIFICANCE STATEMENT: Mammals rapidly move their eyes to sample their visual environment over successive fixations, but they use different spatial and temporal strategies to do so. We show that these different strategies achieve similar neuronal receptive field coverage over time. Because mammals have different sensory receptive field sizes and neuronal densities for sampling and processing information, they require different eye movement strategies to encode natural scenes.

  • Article type: Journal Article
    The colours of surfaces in a scene may not appear constant with a change in the colour of the illumination. Yet even when colour constancy fails, human observers can usually discriminate changes in lighting from changes in surface reflecting properties. This operational ability has been attributed to the constancy of perceived colour relations between surfaces under illuminant changes, in turn based on approximately invariant spatial ratios of cone photoreceptor excitations. Natural deviations in these ratios may, however, lead to illuminant changes being misidentified. The aim of this work was to test whether such misidentifications occur with natural scenes and whether they are due to failures in relational colour constancy. Pairs of scene images from hyperspectral data were presented side-by-side on a computer-controlled display. On one side, the scene underwent illuminant changes and on the other side, it underwent the same changes but with images corrected for any residual deviations in spatial ratios. Observers systematically misidentified the corrected images as being due to illuminant changes. The frequency of errors increased with the size of the deviations, which were closely correlated with the estimated failures in relational colour constancy.
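To make the notion of "approximately invariant spatial ratios of cone photoreceptor excitations" concrete, here is a minimal sketch. For each cone class it takes the ratio of excitations between two surfaces, then measures how much that ratio shifts when the illuminant changes. The function names and the (L, M, S) excitation values are invented for illustration and are not taken from the study.

```python
def cone_ratios(surface_a, surface_b):
    """Per-cone-class ratio of excitations between two surfaces."""
    return tuple(a / b for a, b in zip(surface_a, surface_b))

def relative_deviation(ratios_1, ratios_2):
    """Relative change in each ratio from illuminant 1 to illuminant 2."""
    return tuple(abs(r2 - r1) / r1 for r1, r2 in zip(ratios_1, ratios_2))

# Hypothetical (L, M, S) cone excitations for two surfaces, A and B,
# under two illuminants. Values are invented for illustration.
a_first, b_first = (0.60, 0.50, 0.20), (0.30, 0.40, 0.40)
a_second, b_second = (0.78, 0.55, 0.10), (0.39, 0.45, 0.20)

dev = relative_deviation(
    cone_ratios(a_first, b_first),
    cone_ratios(a_second, b_second),
)
print("relative deviation per cone class (L, M, S):", dev)
```

In this toy case the L and S ratios are preserved across the illuminant change, while the M ratio drifts by roughly 2%. Small residual deviations of this kind are what the experiment corrected for in one of the two images.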

  • Article type: Journal Article
    Fixational eye movements alter the number and timing of spikes transmitted from the retina to the brain, but whether these changes enhance or degrade the retinal signal is unclear. To quantify this, we developed a Bayesian method for reconstructing natural images from the recorded spikes of hundreds of retinal ganglion cells (RGCs) in the macaque retina (male), combining a likelihood model for RGC light responses with the natural image prior implicitly embedded in an artificial neural network optimized for denoising. The method matched or surpassed the performance of previous reconstruction algorithms, and provides an interpretable framework for characterizing the retinal signal. Reconstructions were improved with artificial stimulus jitter that emulated fixational eye movements, even when the eye movement trajectory was assumed to be unknown and had to be inferred from retinal spikes. Reconstructions were degraded by small artificial perturbations of spike times, revealing more precise temporal encoding than suggested by previous studies. Finally, reconstructions were substantially degraded when derived from a model that ignored cell-to-cell interactions, indicating the importance of stimulus-evoked correlations. Thus, fixational eye movements enhance the precision of the retinal representation.
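The combination of a likelihood model with an image prior can be written compactly with Bayes' rule. The notation here is an assumption made for illustration: $x$ denotes the image and $r$ the recorded RGC spikes. The abstract does not say whether the reconstruction is a posterior mode, a posterior mean, or a posterior sample, so the maximum a posteriori form is shown only as one common choice.

```latex
p(x \mid r) \;\propto\; p(r \mid x)\, p(x),
\qquad
\hat{x}_{\mathrm{MAP}} \;=\; \arg\max_{x} \,\bigl[\, \log p(r \mid x) + \log p(x) \,\bigr]
```

Here $p(r \mid x)$ plays the role of the likelihood model for RGC light responses, and $p(x)$ is the natural image prior that the abstract describes as implicitly embedded in a denoising network.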

  • Article type: Journal Article
    Much of the neural machinery of the early visual cortex, from the extraction of local orientations to contextual modulations through lateral interactions, is thought to have developed to provide a sparse encoding of contour in natural scenes, allowing the brain to process efficiently most of the visual scenes we are exposed to. Certain visual stimuli, however, cause visual stress, a set of adverse effects ranging from simple discomfort to migraine attacks, and epileptic seizures in the extreme, all phenomena linked with an excessive metabolic demand. The theory of efficient coding suggests a link between excessive metabolic demand and images that deviate from natural statistics. Yet, the mechanisms linking energy demand and image spatial content in discomfort remain elusive. Here, we used theories of visual coding that link image spatial structure and brain activation to characterize the response to images observers reported as uncomfortable in a biologically based neurodynamic model of the early visual cortex that included excitatory and inhibitory layers to implement contextual influences. We found three clear markers of aversive images: a larger overall activation in the model, a less sparse response, and a more unbalanced distribution of activity across spatial orientations. When the ratio of excitation over inhibition was increased in the model, a phenomenon hypothesised to underlie interindividual differences in susceptibility to visual discomfort, the three markers of discomfort progressively shifted toward values typical of the response to uncomfortable stimuli. Overall, these findings propose a unifying mechanistic explanation for why there are differences between images and between observers, suggesting how visual input and idiosyncratic hyperexcitability give rise to abnormal brain responses that result in visual stress.

  • Article type: Journal Article
    Understanding the circuit mechanisms of the visual code for natural scenes is a central goal of sensory neuroscience. We show that a three-layer network model predicts retinal natural scene responses with an accuracy nearing experimental limits. The model's internal structure is interpretable, as interneurons recorded separately and not modeled directly are highly correlated with model interneurons. Models fitted only to natural scenes reproduce a diverse set of phenomena related to motion encoding, adaptation, and predictive coding, establishing their ethological relevance to natural visual computation. A new approach decomposes the computations of model ganglion cells into the contributions of model interneurons, allowing automatic generation of new hypotheses for how interneurons with different spatiotemporal responses are combined to generate retinal computations, including predictive phenomena currently lacking an explanation. Our results demonstrate a unified and general approach to study the circuit mechanisms of ethological retinal computations under natural visual scenes.