voice processing

  • Article type: Journal Article
    Many animals can extract useful information from the vocalizations of other species. Neuroimaging studies have evidenced areas sensitive to conspecific vocalizations in the cerebral cortex of primates, but how these areas process heterospecific vocalizations remains unclear. Using fMRI-guided electrophysiology, we recorded the spiking activity of individual neurons in the anterior temporal voice patches of two macaques while they listened to complex sounds including vocalizations from several species. In addition to cells selective for conspecific macaque vocalizations, we identified an unsuspected subpopulation of neurons with strong selectivity for human voice, not merely explained by spectral or temporal structure of the sounds. The auditory representational geometry implemented by these neurons was strongly related to that measured in the human voice areas with neuroimaging and only weakly to low-level acoustical structure. These findings provide new insights into the neural mechanisms involved in auditory expertise and the evolution of communication systems in primates.

  • Article type: Journal Article
    Recent research has examined the extent to which face and voice processing are associated by virtue of the fact that both tap into a common person perception system. However, existing findings do not yet fully clarify the role of familiarity in this association. Given this, two experiments are presented that examine face-voice correlations for unfamiliar stimuli (Experiment 1) and for familiar stimuli (Experiment 2). With care being taken to use tasks that avoid floor and ceiling effects and that use realistic speech-based voice clips, the results suggested a significant positive but small-sized correlation between face and voice processing when recognizing unfamiliar individuals. In contrast, the correlation when matching familiar individuals was significant and positive, but much larger. The results supported the existing literature suggesting that face and voice processing are aligned as constituents of an overarching person perception system. However, the difference in magnitude of their association here reinforced the view that familiar and unfamiliar stimuli are processed in different ways. This likely reflects the importance of a pre-existing mental representation and cross-talk within the neural architectures when processing familiar faces and voices, and yet the reliance on more superficial stimulus-based and modality-specific analysis when processing unfamiliar faces and voices.

  • Article type: Journal Article
    The visual system is not fully mature at birth and continues to develop throughout infancy until it reaches adult levels through late childhood and adolescence. Disruption of vision during this postnatal period and prior to visual maturation results in deficits of visual processing and in turn may affect the development of complementary senses. Studying people who have had one eye surgically removed during early postnatal development is a useful model for understanding timelines of sensory development and the role of binocularity in visual system maturation. Adaptive auditory and audiovisual plasticity following the loss of one eye early in life has been observed for both low- and high-level visual stimuli. Notably, people who have had one eye removed early in life perceive the McGurk effect much less than binocular controls.
    The current study investigates whether multisensory compensatory mechanisms are also present in people who had one eye removed late in life, after postnatal visual system maturation, by measuring whether they perceive the McGurk effect compared to binocular controls and people who have had one eye removed early in life.
    People who had one eye removed late in life perceived the McGurk effect similarly to binocular viewing controls, unlike those who had one eye removed early in life.
    This suggests differences in multisensory compensatory mechanisms based on age at surgical eye removal. These results indicate that cross-modal adaptations for the loss of binocularity may be dependent on plasticity levels during cortical development.

  • Article type: Journal Article
    The recognition of human speakers by their voices is a remarkable cognitive ability. Previous research has established a voice area in the right temporal cortex involved in the integration of speaker-specific acoustic features. This integration appears to occur rapidly, especially in case of familiar voices. However, the exact time course of this process is less well understood. To this end, we here investigated the automatic change detection response of the human brain while listening to the famous voice of German chancellor Angela Merkel, embedded in the context of acoustically matched voices. A classic passive oddball paradigm contrasted short word stimuli uttered by Merkel with word stimuli uttered by two unfamiliar female speakers. Electrophysiological voice processing indices from 21 participants were quantified as mismatch negativities (MMNs) and P3a differences. Cortical sources were approximated by variable resolution electromagnetic tomography. The results showed amplitude and latency effects for both MMN and P3a: The famous (familiar) voice elicited a smaller but earlier MMN than the unfamiliar voices. The P3a, by contrast, was both larger and later for the familiar than for the unfamiliar voices. Familiar-voice MMNs originated from right-hemispheric regions in temporal cortex, overlapping with the temporal voice area, while unfamiliar-voice MMNs stemmed from left superior temporal gyrus. These results suggest that the processing of a very famous voice relies on pre-attentive right temporal processing within the first 150 ms of the acoustic signal. The findings further our understanding of the neural dynamics underlying familiar voice processing.

  • Article type: Journal Article
    This study investigated the impact of the speaker's identity generated by the voice on sentence processing. We examined the relation between ERP components associated with the processing of the voice (N100 and P200) from voice onset and those associated with sentence processing (N400 and late positivity) from critical word onset. We presented Dutch native speakers with sentences containing true (and known) information, unknown (but true) information or information violating world knowledge and had them perform a truth evaluation task. Sentences were spoken either in a native or a foreign accent. Truth evaluation judgments were not different for statements spoken by the native-accented and the foreign-accented speakers. Reduced N100 and P200 were observed in response to the foreign speaker's voice compared to the native speaker's. While statements containing unknown information or world knowledge violations generated a larger N400 than true statements in the native condition, they were not significantly different in the foreign condition, suggesting shallower processing of foreign-accented speech. The N100 was a significant predictor for the N400 in that the reduced N100 observed for the foreign speaker compared to the native speaker was related to a smaller N400 effect. These findings suggest that the impression of the speaker that listeners rapidly form from the voice affects semantic processing, which confirms that the speaker's identity and language comprehension cannot be dissociated.

  • Article type: Journal Article
    Prader-Willi syndrome (PWS) is a rare and complex neurodevelopmental disorder of genetic origin. It manifests itself in endocrine and cognitive problems, including highly pronounced hyperphagia and severe obesity. In many cases, impaired acquisition of social and communication skills leads to autism spectrum features, and individuals with this syndrome are occasionally diagnosed with autism spectrum disorder (ASD) using specific scales. Given that communicational skills are largely based on vocal communication, it is important to study human voice processing in PWS. We were able to examine a large number of participants with PWS (N = 61) recruited from France's national reference center for PWS and other hospitals. We tested their voice and nonvoice recognition abilities, as well as their ability to distinguish between voices and nonvoices in a free choice task. We applied the hierarchical drift diffusion model (HDDM) with Bayesian estimation to compare decision-making in participants with PWS and controls.
    We found that PWS participants were impaired on both voice and nonvoice processing, but displayed a compensatory ability to perceive voices. Participants with uniparental disomy had poorer voice and nonvoice perception than participants with a deletion on chromosome 15. The HDDM allowed us to demonstrate that participants with PWS need to accumulate more information in order to make a decision, are slower at decision-making, and are predisposed to voice perception, albeit to a lesser extent than controls.
    The categorization of voices and nonvoices is generally preserved in participants with PWS, though this may not be the case for the lowest IQ.
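    The drift diffusion idea behind the HDDM analysis above can be illustrated with a small forward simulation. This is only a sketch of the basic (non-hierarchical) drift diffusion process, not the authors' Bayesian estimation procedure: the function names, parameter values, and time step are all illustrative assumptions. It shows how a wider decision boundary, one way of modelling the need to "accumulate more information", tends to lengthen decision times.

```python
import random


def simulate_ddm(drift, boundary, start_bias=0.5, noise=1.0,
                 dt=0.001, max_time=5.0, rng=None):
    """Simulate one trial of a basic drift diffusion process.

    Evidence starts at start_bias * boundary and drifts with added
    Gaussian noise until it reaches `boundary` (returns "upper") or
    0 (returns "lower"). Returns (choice, decision_time); choice is
    None if no boundary is reached before max_time.
    All parameter values are hypothetical, chosen for illustration.
    """
    rng = rng or random
    x = start_bias * boundary
    t = 0.0
    sqrt_dt = dt ** 0.5
    while t < max_time:
        # Euler-Maruyama step: deterministic drift plus scaled noise.
        x += drift * dt + noise * sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
        if x >= boundary:
            return "upper", t
        if x <= 0.0:
            return "lower", t
    return None, t


def mean_decision_time(drift, boundary, n_trials=200, seed=0):
    """Average decision time over n_trials simulated trials."""
    rng = random.Random(seed)
    times = [simulate_ddm(drift, boundary, rng=rng)[1]
             for _ in range(n_trials)]
    return sum(times) / len(times)


if __name__ == "__main__":
    # Same drift rate, two boundary separations (hypothetical values).
    print("narrow boundary:", mean_decision_time(1.0, 1.0))
    print("wide boundary:  ", mean_decision_time(1.0, 2.0))
```

    In this toy setting a group with a larger `boundary` value is slower on average, and `start_bias` above 0.5 would tilt choices toward the upper response. Estimating such parameters from real data, as the study did, requires a fitting procedure that this sketch does not attempt.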

  • Article type: Journal Article
    In Brazil, the recognition of speakers for forensic purposes still relies on a subjectivity-based decision-making process through a results analysis of untrustworthy techniques. Owing to the lack of a voice database, speaker verification is currently applied to samples specifically collected for confrontation. However, speaker comparative analysis via contested discourse requires the collection of an excessive amount of voice samples for a series of individuals. Further, the recognition system must inform who is the most compatible with the contested voice from pre-selected individuals. Accordingly, this paper proposes using a combination of linear predictive coding (LPC) and ordinary least squares (OLS) as a speaker verification tool for forensic analysis. The proposed recognition technique establishes confidence and similarity upon which to base forensic reports, indicating verification of the speaker of the contested discourse. Therefore, in this paper, an accurate, quick, alternative method to help verify the speaker is contributed. After running seven different tests, this study preliminarily achieved a hit rate of 100% considering a limited dataset (Brazilian Portuguese). Furthermore, the developed method extracts a larger number of formants, which are indispensable for statistical comparisons via OLS. The proposed framework is robust at certain levels of noise, for sentences with the suppression of word changes, and with different quality or even meaningful audio time differences.
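    The abstract names LPC and OLS but does not describe how they are combined, so the following is only a minimal sketch under stated assumptions. It estimates LPC coefficients with the autocorrelation method (Levinson-Durbin recursion) and then fits an ordinary least squares line between two coefficient vectors, using R² as a rough similarity score. The function names, the LPC order, and the use of R² as a similarity measure are illustrative choices, not details taken from the paper.

```python
import random


def autocorrelation(signal, max_lag):
    """Return autocorrelation values r[0..max_lag] of a 1-D signal."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(signal, order):
    """Estimate LPC coefficients via the Levinson-Durbin recursion.

    Returns [a1, ..., a_order] such that s[n] is approximated by
    sum(a_k * s[n - k]) for k = 1..order.
    """
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error == 0:
            break
        # Reflection coefficient for this recursion step.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:]


def ols_fit(x, y):
    """Simple OLS of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("OLS needs non-constant inputs")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Synthetic stand-in for a speech frame: an AR(2) process with
    # hypothetical coefficients 1.3 and -0.6 plus Gaussian noise.
    rng = random.Random(0)
    frame = [0.0, 0.0]
    for _ in range(2000):
        frame.append(1.3 * frame[-1] - 0.6 * frame[-2] + rng.gauss(0, 1))
    print("estimated LPC coefficients:", lpc(frame, 2))
```

    In a real forensic setting the inputs would be windowed frames of recorded speech, and any decision threshold on the similarity score would need validation that this sketch does not provide.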

  • Article type: Journal Article
    Several studies report sex differences in sensitivity to gendered stimuli. We assume many of these to reflect differences as to the sex to which one feels attracted rather than to biological sex per se. Investigating voice perception, a function of high social relevance, we show that the behavioural and neural (BOLD) responses to male and female voices are mediated by sex and sexual orientation. In heterosexual men and women, we found an opposite-sex effect, reflected in higher classification accuracy for and a response bias towards voices of the other sex, while the effect became apparent as same-sex effect in homosexual men and women. Overall, sexual orientation had a greater impact in women than in men and homosexual women were closer to men in their behavioural responses to female voices. The activation patterns were similar for hetero- and homosexual men, both groups showing increased activation in response to male compared to female voices in regions distributed across the temporo-parietal and insular cortex. In contrast, women had increased activation in response to voices of the desired sex. It appears that both sex and sexual orientation impact on a function as basal as voice perception. Our results underline the need to assess sexual orientation in study participants if conclusions on sex differences shall be drawn. Many of the reported sex differences in behaviour and brain function might be mediated by sexual orientation and we encourage further research into the interplay between sex and sexual orientation.

  • Article type: Journal Article
    Previously, we have shown that people who have had one eye surgically removed early in life during visual development have enhanced sound localization [1] and lack visual dominance, commonly observed in binocular and monocular (eye-patched) viewing controls [2]. Despite these changes, people with one eye integrate auditory and visual components of multisensory events optimally [3]. The current study investigates how people with one eye perceive the McGurk effect, an audiovisual illusion where a new syllable is perceived when visual lip movements do not match the corresponding sound [4]. We compared individuals with one eye to binocular and monocular viewing controls and found that they have a significantly smaller McGurk effect compared to binocular controls. Additionally, monocular controls tended to perceive the McGurk effect less often than binocular controls suggesting a small transient modulation of the McGurk effect. These results suggest altered weighting of the auditory and visual modalities with both short and long-term monocular viewing. These results indicate the presence of permanent adaptive perceptual accommodations in people who have lost one eye early in life that may serve to mitigate the loss of binocularity during early brain development.

  • Article type: Journal Article
    Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have had to be replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with a set of features derived from deconstructing the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description of the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on a mobile phone network database recorded under non-controlled acoustic conditions.