voice processing

  • Article type: Journal Article
    Many animals can extract useful information from the vocalizations of other species. Neuroimaging studies have identified areas sensitive to conspecific vocalizations in the cerebral cortex of primates, but how these areas process heterospecific vocalizations remains unclear. Using fMRI-guided electrophysiology, we recorded the spiking activity of individual neurons in the anterior temporal voice patches of two macaques while they listened to complex sounds, including vocalizations from several species. In addition to cells selective for conspecific macaque vocalizations, we identified an unsuspected subpopulation of neurons with strong selectivity for the human voice that could not be explained merely by the spectral or temporal structure of the sounds. The auditory representational geometry implemented by these neurons was strongly related to that measured with neuroimaging in the human voice areas, and only weakly related to low-level acoustical structure. These findings provide new insights into the neural mechanisms involved in auditory expertise and into the evolution of communication systems in primates.

  • Article type: Journal Article
    Recent research has examined the extent to which face and voice processing are associated because both tap into a common person perception system. However, existing findings do not yet fully clarify the role of familiarity in this association. Two experiments are therefore presented that examine face-voice correlations for unfamiliar stimuli (Experiment 1) and for familiar stimuli (Experiment 2). Tasks were chosen to avoid floor and ceiling effects and used realistic speech-based voice clips. The results revealed a significant, positive, but small correlation between face and voice processing when recognizing unfamiliar individuals. In contrast, the correlation when matching familiar individuals was also significant and positive, but much larger. The results supported the existing literature suggesting that face and voice processing are aligned as constituents of an overarching person perception system. However, the difference in the magnitude of their association reinforced the view that familiar and unfamiliar stimuli are processed in different ways. This likely reflects the importance of a pre-existing mental representation and of cross-talk within the neural architectures when processing familiar faces and voices, whereas processing unfamiliar faces and voices relies on more superficial, stimulus-based, and modality-specific analysis.

  • Article type: Journal Article
    The visual system is not fully mature at birth and continues to develop throughout infancy and childhood, reaching adult levels in late childhood and adolescence. Disruption of vision during this postnatal period, before visual maturation, results in deficits of visual processing and in turn may affect the development of complementary senses. Studying people who have had one eye surgically removed during early postnatal development is a useful model for understanding timelines of sensory development and the role of binocularity in visual system maturation. Adaptive auditory and audiovisual plasticity following the loss of one eye early in life has been observed for both low- and high-level visual stimuli. Notably, people who have had one eye removed early in life perceive the McGurk effect much less than binocular controls.
    The current study investigates whether multisensory compensatory mechanisms are also present in people who had one eye removed late in life, after postnatal visual system maturation, by comparing their perception of the McGurk effect with that of binocular controls and of people who had one eye removed early in life.
    People who had one eye removed late in life perceived the McGurk effect to a similar extent as binocular viewing controls, unlike those who had one eye removed early in life.
    This suggests that multisensory compensatory mechanisms differ depending on age at surgical eye removal. These results indicate that cross-modal adaptations to the loss of binocularity may depend on plasticity levels during cortical development.

  • Article type: Journal Article
    The recognition of human speakers by their voices is a remarkable cognitive ability. Previous research has established a voice area in the right temporal cortex involved in the integration of speaker-specific acoustic features. This integration appears to occur rapidly, especially in the case of familiar voices. However, the exact time course of this process is less well understood. To address this, we investigated the automatic change detection response of the human brain while participants listened to the famous voice of German Chancellor Angela Merkel, embedded in the context of acoustically matched voices. A classic passive oddball paradigm contrasted short word stimuli uttered by Merkel with word stimuli uttered by two unfamiliar female speakers. Electrophysiological voice processing indices from 21 participants were quantified as mismatch negativities (MMNs) and P3a differences. Cortical sources were estimated with variable resolution electromagnetic tomography. The results showed amplitude and latency effects for both MMN and P3a: the famous (familiar) voice elicited a smaller but earlier MMN than the unfamiliar voices, whereas the P3a was both larger and later for the familiar than for the unfamiliar voices. Familiar-voice MMNs originated from right-hemispheric regions in temporal cortex, overlapping with the temporal voice area, while unfamiliar-voice MMNs stemmed from the left superior temporal gyrus. These results suggest that the processing of a very famous voice relies on pre-attentive right temporal processing within the first 150 ms of the acoustic signal. The findings further our understanding of the neural dynamics underlying familiar voice processing.
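    MMN and P3a effects of this kind are typically read off a deviant-minus-standard difference wave: the MMN as the most negative deflection in an early window, the P3a as the most positive deflection in a later one, each with an amplitude and a latency. The sketch below assumes already-averaged ERPs at one electrode; the time grid, the search windows, and the toy voltages are illustrative assumptions, not values from the study.

```python
"""Sketch: MMN and P3a peak measures from averaged ERPs.

Assumptions (not taken from the study): ERPs are already averaged per
condition at one electrode, sampled on the grid `times`, and the search
windows used below are illustrative only.
"""


def difference_wave(deviant, standard):
    """Deviant-minus-standard difference wave, sample by sample."""
    if len(deviant) != len(standard):
        raise ValueError("ERPs must have the same length")
    return [d - s for d, s in zip(deviant, standard)]


def peak(wave, times_ms, window_ms, polarity):
    """Return (amplitude, latency_ms) of the extreme point inside window_ms.

    polarity=-1 picks the most negative point (MMN), +1 the most positive (P3a).
    On ties, the earliest sample wins.
    """
    lo, hi = window_ms
    inside = [(v, t) for v, t in zip(wave, times_ms) if lo <= t <= hi]
    if not inside:
        raise ValueError("no samples inside the window")
    return max(inside, key=lambda vt: polarity * vt[0])


times = [0, 50, 100, 150, 200, 250, 300, 350]         # ms after word onset
standard = [0.0, 0.2, 0.5, 0.3, 0.0, 0.4, 0.2, 0.0]   # toy values (microvolts)
deviant = [0.0, -0.3, -1.5, -0.7, 0.5, 3.4, 1.2, 0.0]
diff = difference_wave(deviant, standard)
mmn = peak(diff, times, (100, 250), polarity=-1)      # early negative peak
p3a = peak(diff, times, (200, 350), polarity=+1)      # later positive peak
```

    Comparing the resulting (amplitude, latency) pairs between conditions is what statements such as "a smaller but earlier MMN" refer to.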

  • Article type: Journal Article
    This study investigated the impact on sentence processing of the speaker identity conveyed by the voice. We examined the relation between ERP components associated with voice processing (N100 and P200), measured from voice onset, and those associated with sentence processing (N400 and late positivity), measured from critical-word onset. We presented native Dutch speakers with sentences containing true (and known) information, unknown (but true) information, or information violating world knowledge, and had them perform a truth evaluation task. Sentences were spoken with either a native or a foreign accent. Truth evaluation judgments did not differ between statements spoken by the native-accented and the foreign-accented speakers. Reduced N100 and P200 amplitudes were observed in response to the foreign speaker's voice compared with the native speaker's. While statements containing unknown information or world knowledge violations generated a larger N400 than true statements in the native condition, they did not differ significantly in the foreign condition, suggesting shallower processing of foreign-accented speech. The N100 was a significant predictor of the N400: the reduced N100 observed for the foreign speaker compared with the native speaker was related to a smaller N400 effect. These findings suggest that the impression of the speaker that listeners rapidly form from the voice affects semantic processing, confirming that speaker identity and language comprehension cannot be dissociated.

  • Article type: Journal Article
    Prader-Willi syndrome (PWS) is a rare and complex neurodevelopmental disorder of genetic origin. It manifests as endocrine and cognitive problems, including highly pronounced hyperphagia and severe obesity. In many cases, impaired acquisition of social and communication skills leads to autism spectrum features, and individuals with this syndrome are occasionally diagnosed with autism spectrum disorder (ASD) using specific scales. Given that communication skills are largely based on vocal communication, it is important to study human voice processing in PWS. We were able to examine a large number of participants with PWS (N = 61) recruited from France's national reference center for PWS and other hospitals. We tested their voice and nonvoice recognition abilities, as well as their ability to distinguish between voices and nonvoices in a free choice task. We applied the hierarchical drift diffusion model (HDDM) with Bayesian estimation to compare decision-making in participants with PWS and controls.
    We found that participants with PWS were impaired on both voice and nonvoice processing, but displayed a compensatory ability to perceive voices. Participants with uniparental disomy had poorer voice and nonvoice perception than participants with a deletion on chromosome 15. The HDDM allowed us to demonstrate that participants with PWS need to accumulate more information to make a decision, are slower at decision-making, and are biased toward perceiving voices, albeit to a lesser extent than controls.
    The categorization of voices and nonvoices is generally preserved in participants with PWS, though this may not be the case for those with the lowest IQ.
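    The DDM parameters referred to above can be made concrete with a forward simulation: evidence starts at a point z*a between two boundaries, drifts at rate v with Gaussian noise, and a response is made when it reaches 0 or a; a non-decision time t0 is added to the reaction time. Wider boundary separation (needing more information) slows decisions, and a starting point above 0.5 biases responses toward the upper boundary. This sketch only simulates the model; HDDM estimates these parameters hierarchically with Bayesian methods, which is not shown, and mapping the upper boundary to a "voice" response is an assumption.

```python
"""Forward simulation of a two-boundary drift diffusion model (DDM).

Illustrates what the parameters mean; it does not fit them (HDDM does that).
Treating the upper boundary as the "voice" response is an assumption.
"""
import math
import random


def simulate_trial(v, a, z, t0, rng, dt=0.001, sigma=1.0, max_t=10.0):
    """One trial: evidence starts at z*a and drifts at rate v until it hits 0 or a.

    Returns (choice, rt): choice 1 = upper boundary, 0 = lower, None = timeout.
    rt includes the non-decision time t0 (seconds).
    """
    x, t = z * a, 0.0
    step_sd = sigma * math.sqrt(dt)  # Euler step of a Wiener process
    while 0.0 < x < a and t < max_t:
        x += v * dt + rng.gauss(0.0, step_sd)
        t += dt
    if x >= a:
        return 1, t + t0
    if x <= 0.0:
        return 0, t + t0
    return None, t + t0


def summarize(v, a, z, t0, n=500, seed=0):
    """Proportion of upper-boundary responses and mean RT over n simulated trials."""
    rng = random.Random(seed)
    trials = [simulate_trial(v, a, z, t0, rng) for _ in range(n)]
    done = [(c, rt) for c, rt in trials if c is not None]  # drop timeouts
    p_upper = sum(c for c, _ in done) / len(done)
    mean_rt = sum(rt for _, rt in done) / len(done)
    return p_upper, mean_rt


# Wider boundary separation (a): more evidence needed, so slower decisions.
print(summarize(v=1.0, a=1.0, z=0.5, t0=0.3))
print(summarize(v=1.0, a=2.0, z=0.5, t0=0.3))
# Starting point z above 0.5: bias toward the upper ("voice") response.
print(summarize(v=0.0, a=1.0, z=0.7, t0=0.3))
```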

  • Article type: Journal Article
    In Brazil, forensic speaker recognition still relies on a subjective decision-making process based on the analysis of results obtained with unreliable techniques. Owing to the lack of a voice database, speaker verification is currently applied to samples collected specifically for comparison. However, comparative analysis of contested speech requires collecting an excessive number of voice samples from a series of individuals. Further, the recognition system must indicate which of the pre-selected individuals is most compatible with the contested voice. Accordingly, this paper proposes combining linear predictive coding (LPC) and ordinary least squares (OLS) as a speaker verification tool for forensic analysis. The proposed recognition technique establishes confidence and similarity measures on which forensic reports can be based, indicating whether the speaker of the contested speech is verified. This paper therefore contributes an accurate, fast alternative method to help verify the speaker. After seven different tests, this study achieved a preliminary hit rate of 100% on a limited dataset (Brazilian Portuguese). Furthermore, the developed method extracts a larger number of formants, which are indispensable for statistical comparisons via OLS. The proposed framework is robust to certain levels of noise, to sentences with suppressed or changed words, and to recordings of different quality or with markedly different durations.
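    The two ingredients named above can be sketched in a few lines: LPC coefficients obtained from a frame's autocorrelation with the Levinson-Durbin recursion, and an OLS fit whose R² serves as a similarity score. How the paper actually combines them is not described here, so the comparison step (regressing a contested speaker's feature vector on each reference speaker's vector and ranking by R²), the helper `rank_candidates`, and the formant-like numbers are hypothetical. Deriving formants from the roots of the LPC polynomial is omitted.

```python
"""Sketch: LPC via Levinson-Durbin, plus a hypothetical OLS similarity score."""


def autocorr(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of one frame (no windowing)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations; return a[1..p] with x[n] ~ sum a[k] x[n-k]."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # signal already perfectly predicted; higher orders stay zero
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coeff.
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # remaining prediction-error power
    return a[1:]


def ols_fit(x, y):
    """Fit y ~ b0 + b1*x by ordinary least squares; return (b0, b1, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return b0, b1, 1.0 - ss_res / ss_tot


def rank_candidates(contested, references):
    """Hypothetical helper: sort reference speakers by R^2 against the contested vector."""
    scores = {name: ols_fit(feats, contested)[2] for name, feats in references.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy formant-like vectors in Hz (invented for illustration).
contested = [500, 1500, 2500, 3500]
refs = {"A": [510, 1490, 2520, 3480], "B": [700, 1200, 2600, 3000]}
print(rank_candidates(contested, refs))
```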

  • Article type: Journal Article
    Several studies report sex differences in sensitivity to gendered stimuli. We assume that many of these reflect differences in the sex to which one feels attracted rather than biological sex per se. Investigating voice perception, a function of high social relevance, we show that behavioural and neural (BOLD) responses to male and female voices are mediated by sex and sexual orientation. In heterosexual men and women, we found an opposite-sex effect, reflected in higher classification accuracy for, and a response bias towards, voices of the other sex, whereas in homosexual men and women the effect appeared as a same-sex effect. Overall, sexual orientation had a greater impact in women than in men, and homosexual women were closer to men in their behavioural responses to female voices. The activation patterns were similar for heterosexual and homosexual men, both groups showing increased activation in response to male compared with female voices in regions distributed across the temporo-parietal and insular cortex. In contrast, women showed increased activation in response to voices of the desired sex. It thus appears that both sex and sexual orientation affect a function as basic as voice perception. Our results underline the need to assess sexual orientation in study participants if conclusions about sex differences are to be drawn. Many of the reported sex differences in behaviour and brain function might be mediated by sexual orientation, and we encourage further research into the interplay between sex and sexual orientation.

  • Article type: Journal Article
    We have previously shown that people who have had one eye surgically removed early in life, during visual development, have enhanced sound localization [1] and lack the visual dominance commonly observed in binocular and monocular (eye-patched) viewing controls [2]. Despite these changes, people with one eye integrate the auditory and visual components of multisensory events optimally [3]. The current study investigates how people with one eye perceive the McGurk effect, an audiovisual illusion in which a new syllable is perceived when visual lip movements do not match the accompanying sound [4]. We compared individuals with one eye to binocular and monocular viewing controls and found that they show a significantly smaller McGurk effect than binocular controls. Additionally, monocular controls tended to perceive the McGurk effect less often than binocular controls, suggesting a small, transient modulation of the McGurk effect. These results suggest altered weighting of the auditory and visual modalities with both short- and long-term monocular viewing. They also indicate the presence of permanent adaptive perceptual accommodations in people who lost one eye early in life, which may serve to mitigate the loss of binocularity during early brain development.

  • Article type: Journal Article
    Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have had to give way to alternative biometric characteristics such as voice, as this is sometimes the only feature available. The present study builds on advances made in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with a set of features derived from the components obtained by decomposing the voice into glottal source and vocal tract estimates, will improve recognition rates compared with classical approaches. A general description of the main hypothesis and of the methodology used to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on one recorded over a mobile phone network under non-controlled acoustic conditions.
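    The decomposition of the voice into glottal source and vocal tract can be approximated, in its simplest form, by LPC inverse filtering: if the vocal tract is modeled as an all-pole filter with known predictor coefficients, filtering the speech with the inverse filter leaves a residual that approximates the glottal excitation. Dedicated glottal-estimation methods are more elaborate, so this is a simplified stand-in rather than the paper's procedure; the coefficients and the pulse train below are hypothetical.

```python
"""Source-filter sketch: LPC inverse filtering as a crude glottal-source estimate.

Assumption: the vocal tract is an all-pole filter whose predictor
coefficients `a` are already known (e.g. from LPC analysis).
"""


def all_pole_synthesis(excitation, a):
    """Vocal tract model: x[n] = e[n] + sum_k a[k] * x[n-k], zero initial state."""
    x = []
    for n, e in enumerate(excitation):
        x.append(e + sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n - k >= 0))
    return x


def inverse_filter(x, a):
    """Residual e[n] = x[n] - sum_k a[k] * x[n-k]; undoes all_pole_synthesis."""
    return [
        x[n] - sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        for n in range(len(x))
    ]


# Toy "glottal pulses" every 4 samples through a hypothetical 2-pole vocal tract.
pulses = [1.0, 0.0, 0.0, 0.0] * 3
tract = [0.5, -0.2]
speech = all_pole_synthesis(pulses, tract)
residual = inverse_filter(speech, tract)  # recovers the pulse train up to rounding
```

    Source features would then be computed from `residual` and filter features from the coefficients; under the paper's hypothesis, both would feed a gender-dependent model, which is not shown.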
