speech intelligibility

  • Article type: Journal Article
    A speech intelligibility (SI) prediction model is proposed that includes an auditory preprocessing component based on the physiological anatomy and activity of the human ear, a hierarchical spiking neural network, and a decision back end based on correlation analysis. The auditory preprocessing component captures advanced physiological details of the auditory system, such as retrograde traveling waves, longitudinal coupling, and cochlear nonlinearity. The ability of the model to predict data from normal-hearing listeners under various additive noise conditions was evaluated. The predictions closely matched the experimental test data under all conditions. Furthermore, we developed a lumped-mass model of the middle ear fitted with a McGee stainless-steel piston to study the recovery of individuals with otosclerosis. We show that the proposed SI model accurately simulates the effect of middle-ear intervention on SI. Consequently, the model establishes a model-based relationship between objective measures of human ear damage, such as distortion product otoacoustic emissions, and speech perception. Moreover, the SI model can serve as a robust tool for optimizing parameters and for preoperative assessment of artificial stimuli, providing a valuable reference for clinical treatments of conductive hearing loss.
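    The abstract does not specify how the correlation-based back end is computed. As a minimal sketch only, assuming the back end compares per-channel internal representations of clean and degraded speech with a Pearson correlation, the idea could look like the following. The function names and the clipping of the averaged score to zero are hypothetical choices, not the authors' method.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant channel carries no correlation information
    return cov / math.sqrt(vx * vy)


def backend_score(clean_channels, degraded_channels):
    """Average per-channel correlation, floored at 0 (hypothetical mapping)."""
    rs = [pearson(c, d) for c, d in zip(clean_channels, degraded_channels)]
    return max(0.0, sum(rs) / len(rs))
```

    A score near 1 would indicate that the degraded representation closely follows the clean one; mapping such a score to percent-correct intelligibility would need a separate fitted function.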






  • Article type: Journal Article
    OBJECTIVE: The aim of this study was to present an institution's experience with cochlear reimplantation (CRI), to assess surgical challenges and post-operative outcomes, and to increase the success rate of CRI.
    METHODS: Retrospective single-institution study at a tertiary medical center. We evaluated data from 76 reimplantation cases treated between 2001 and 2022. Clinical features including etiology of hearing loss, type of failure, surgical issues, and auditory speech performance were analyzed. Categories of Auditory Performance (CAP) and Speech Intelligibility Rating (SIR) scores were used to evaluate pre- and post-CRI outcomes.
    RESULTS: The CRI population comprised 7 patients from our institute and 69 patients referred from other centers. Device failure was the most common reason for CRI (68/76, 89.5%); in addition, there were 7 medical failures, and 1 patient had both a soft device failure and a medical failure. Medical failures included flap rupture and device extrusion, magnet migration, auditory neuropathy, leukoencephalopathy, foreign-body residue, and meningitis. In 21/76 patients, the electrode technology was upgraded. The time to failure ranged from 0.58 to 13 years, with a mean of 4.97 years. The mean (± SD) CAP and SIR scores before and after CRI were 5.2 ± 1.2 versus 5.5 ± 1.1 and 3.4 ± 1.1 versus 3.5 ± 1.1, respectively. Performance was poor in six patients with severe cochlear malformation, auditory nerve dysplasia, leukoencephalopathy, and epilepsy.
    CONCLUSIONS: CRI surgery is a challenging but relatively safe procedure, and most reimplanted patients experience favorable postoperative outcomes. Medical complications and intracochlear damage are the main causes of poor postoperative results. Therefore, adequate preoperative preparation and atraumatic CRI should be carried out for optimal results.






  • Article type: Journal Article
    PURPOSE: The Chinese Emotional Speech Audiometry Project (CESAP) aims to establish a new material set for Chinese speech audiometry tests that can be used in both neutral and emotional prosody settings. As the first endeavor of CESAP, this study describes the development of the material foundation and reports its validation in neutral prosody.
    METHODS: In the development step, 40 phonetically balanced word lists, each consisting of 30 Chinese disyllabic words with neutral valence, were first generated. In a subsequent affective rating experiment, 35 word lists qualified for validation based on the familiarity and valence ratings from 30 normal-hearing (NH) participants. For validation, performance-intensity functions of each word list were fitted to responses from 60 NH subjects at six presentation levels (-1, 3, 5, 7, 11, and 20 dB HL). The final material set was determined by the intelligibility scores at each decibel level and the mean slopes.
    RESULTS: First, 35 lists satisfied the criteria of phonetic balance, limited repetitions, high familiarity, and neutral valence and were selected for validation. Second, 15 lists were compiled into the final material set based on the pairwise differences in intelligibility scores and the fitted 20%-80% slopes. The established material set had high reliability and validity and was sensitive to intelligibility changes (50% slope: 6.20%/dB; 20%-80% slope: 5.45%/dB), with small coefficients of variation for thresholds (15%), 50% slope (12%), and 20%-80% slope (12%).
    CONCLUSIONS: Our final material set of 15 word lists takes the initiative to control the emotional aspect of audiometry tests, which enriches the available Mandarin speech recognition materials and warrants future assessments in emotional prosody among populations with hearing impairments.
    SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.25742814.
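    The abstract does not state which function was fitted to the performance-intensity data. As an illustrative sketch, assuming a two-parameter logistic function with midpoint l50 (dB HL) and steepness k (per dB), both hypothetical names, the 50% slope and the 20%-80% slope follow directly from k:

```python
import math


def logistic_pi(level_db, l50, k):
    """Proportion correct at a presentation level for a logistic PI function."""
    return 1.0 / (1.0 + math.exp(-k * (level_db - l50)))


def level_for(p, l50, k):
    """Presentation level (dB) at which the logistic function reaches proportion p."""
    return l50 + math.log(p / (1.0 - p)) / k


def slope_at_50(k):
    """Slope at the 50% point in percentage points per dB (dp/dL = k/4)."""
    return 100.0 * k / 4.0


def slope_20_80(k):
    """Average slope between the 20% and 80% points, in percentage points per dB."""
    # The 20% and 80% points lie at l50 -/+ ln(4)/k, so they are 2*ln(4)/k dB apart.
    span_db = 2.0 * math.log(4.0) / k
    return 60.0 / span_db
```

    Under this assumption, a steepness of k = 0.248 gives a 50% slope of 6.20%/dB and a 20%-80% slope of roughly 5.4%/dB. That is close to, but not the same as, the reported 5.45%/dB, which is a mean across lists and may come from a different fitting function.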






  • Article type: Journal Article
    Spatial separation and fundamental frequency (F0) separation are effective cues for improving the intelligibility of target speech in multi-talker scenarios. Previous studies predominantly focused on spatial configurations within the frontal hemifield, overlooking the ipsilateral side and the entire median plane, where localization confusion often occurs. This study investigated the impact of spatial and F0 separation on intelligibility in these underexplored spatial configurations. Speech reception thresholds were measured in three experiments for scenarios involving two to four talkers, either in the ipsilateral horizontal plane or in the entire median plane, using monotonized speech with varying F0s as stimuli. The results revealed that spatial separation in symmetrical positions (front-back symmetry in the ipsilateral horizontal plane, or front-back and up-down symmetry in the median plane) contributes positively to intelligibility. Both target direction and relative target-masker separation influence the masking release attributed to spatial separation. When the number of talkers exceeds two, the masking release from spatial separation diminishes. Nevertheless, F0 separation remains a remarkably effective cue and could even facilitate spatial separation in improving intelligibility. Further analysis indicated that current intelligibility models have difficulty accurately predicting intelligibility in the scenarios explored in this study.






  • Article type: Journal Article
    OBJECTIVE: To investigate the clinical effects and surgical procedure of the Hogan posterior pharyngeal flap in the treatment of older patients with velopharyngeal insufficiency (VPI) after cleft palate repair.
    METHODS: A total of 33 patients (aged 10-35 years; average of 20.4 years) with VPI secondary to cleft palate were included. They underwent Hogan posterior pharyngeal flap surgery to improve velopharyngeal closure function. The clinical efficacy of the operation was evaluated with a Chinese speech intelligibility test and nasopharyngeal fiberscope (NPF), and the velopharyngeal closure was graded. The average follow-up time was 13.3 months.
    RESULTS: The wounds of all patients healed by first intention. Speech assessment showed that consonant articulation improved and the rates of hypernasality and nasal emission decreased significantly (P<0.05). NPF examination showed that postoperative velopharyngeal closure function improved significantly: 30 cases (91%) were grade Ⅰ and 3 cases (9%) were grade Ⅱ.
    CONCLUSIONS: The Hogan posterior pharyngeal flap can significantly improve velopharyngeal closure in older patients with VPI secondary to cleft palate and reduce nasal emission and hypernasality.






  • Article type: Journal Article
    This study investigates the effect of spatial release from masking (SRM) in bilateral bone conduction (BC) stimulation at the mastoid. Nine adults with normal hearing were tested to determine SRM based on speech recognition thresholds (SRTs) in simulated spatial configurations ranging from 0° to 180°. These configurations were based on nonindividualized head-related transfer functions. The participants were subjected to sound stimulation through either air conduction (AC) via headphones or BC. The results indicated that both the angular separation between the target and the masker and the modality of sound stimulation significantly influenced speech recognition performance. As the angular separation between the target and the masker increased up to 150°, both BC and AC SRTs decreased, indicating improved performance. However, performance slightly deteriorated when the angular separation exceeded 150°. For spatial separations less than 75°, BC stimulation provided greater spatial benefits than AC, although this difference was not statistically significant. For separations greater than 75°, AC stimulation offered significantly more spatial benefits than BC. When speech and noise originated from the same side of the head, the "better ear effect" did not significantly contribute to SRM. However, when speech and noise were located on opposite sides of the head, this effect became dominant in SRM.
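    SRM is conventionally expressed as the SRT in the co-located condition minus the SRT in the spatially separated condition, so that a positive value means separation helped. A small sketch of that arithmetic follows; the function names are hypothetical, and any SRT values passed in would be the reader's own data, not values from the study.

```python
def spatial_release_from_masking(srt_colocated_db, srt_separated_db):
    """SRM in dB: positive values mean separation lowered (improved) the SRT."""
    return srt_colocated_db - srt_separated_db


def srm_by_angle(srts_db):
    """Map each separation angle to its SRM relative to the 0-degree SRT.

    srts_db: dict of {angle_in_degrees: SRT_in_dB}; it must contain the
    co-located reference under key 0.
    """
    reference = srts_db[0]
    return {
        angle: spatial_release_from_masking(reference, srt)
        for angle, srt in sorted(srts_db.items())
    }
```

    For instance, made-up SRTs of -4.0 dB at 0° and -9.5 dB at 90° would give an SRM of 5.5 dB at 90°.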






  • Article type: Journal Article
    Speech recognition crucially relies on slow temporal modulations (<16 Hz) in speech. Recent studies, however, have demonstrated that long-delay echoes, which are common during online conferencing, can eliminate crucial temporal modulations in speech without affecting speech intelligibility. Here, we investigated the underlying neural mechanisms. MEG experiments demonstrated that cortical activity can effectively track the temporal modulations eliminated by an echo, which cannot be fully explained by basic neural adaptation mechanisms. Furthermore, cortical responses to echoic speech can be better explained by a model that segregates speech from its echo than by a model that encodes echoic speech as a whole. The speech segregation effect was observed even when attention was diverted but disappeared when segregation cues, i.e., speech fine structure, were removed. These results strongly suggest that, through mechanisms such as stream segregation, the auditory system can build an echo-insensitive representation of the speech envelope, which can support reliable speech recognition.
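    One way to see why an echo can remove specific modulation frequencies is to treat it as a comb filter. The sketch below makes a simplifying assumption, namely that the delayed copy adds linearly to the envelope, and evaluates the gain |1 + g·exp(-j2πfτ)| at a modulation frequency f for an echo of delay τ and relative gain g. The function names and the example delay are illustrative, not taken from the study.

```python
import cmath
import math


def echo_modulation_gain(mod_freq_hz, delay_s, echo_gain=1.0):
    """Gain applied to a modulation frequency by adding one delayed copy.

    Evaluates |1 + g * exp(-j * 2 * pi * f * tau)| under the linear-envelope
    assumption stated above.
    """
    phase = -2.0 * math.pi * mod_freq_hz * delay_s
    return abs(1.0 + echo_gain * cmath.exp(1j * phase))


def first_null_hz(delay_s):
    """Lowest modulation frequency cancelled by an equal-amplitude echo."""
    return 1.0 / (2.0 * delay_s)
```

    With an equal-amplitude echo delayed by 0.125 s, the first null falls at 4 Hz, inside the slow modulation range the abstract describes, while modulations at 8 Hz are doubled.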






  • Article type: Journal Article
    In indoor environments, reverberation often distorts clean speech. Although deep learning-based speech dereverberation approaches have shown much better performance than traditional ones, the inferior quality of the dereverberated speech caused by magnitude distortion and limited phase recovery is still a serious problem for practical applications. This paper improves the performance of deep learning-based speech dereverberation from the perspectives of both network design and mapping target optimization. Specifically, on the one hand, a bifurcated-and-fusion network and its guidance loss functions were designed to help reduce the magnitude distortion while enhancing the phase recovery. On the other hand, the time boundary between the early and late reflections in the mapped speech was investigated, so as to strike a balance between the reverberation tailing effect and the difficulty of magnitude/phase recovery. Mathematical derivations were provided to show the rationality of the specially designed loss functions. Geometric illustrations were given to explain the importance of preserving early reflections in reducing the difficulty of phase recovery. Ablation study results confirmed the validity of the proposed network topology and the importance of preserving 20 ms of early reflections in the mapped speech. Objective and subjective test results showed that the proposed system outperformed other baselines in the speech dereverberation task.
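    The early/late boundary can be illustrated by splitting a room impulse response (RIR) at a fixed time after the direct-path arrival. This is a sketch under stated assumptions rather than the paper's procedure: the direct path is taken to be the largest-magnitude tap, and split_rir and its parameters are hypothetical names.

```python
def split_rir(rir, sample_rate_hz, early_ms=20.0):
    """Split a room impulse response into (early, late) parts of equal length.

    The early part keeps everything up to early_ms after the direct path and
    zeros the rest; the late part is the complement, so early + late == rir.
    """
    if not rir:
        raise ValueError("empty impulse response")
    # Assumption: the largest-magnitude tap marks the direct-path arrival.
    direct = max(range(len(rir)), key=lambda i: abs(rir[i]))
    offset = int(round(early_ms * sample_rate_hz / 1000.0))
    boundary = min(len(rir), direct + offset)
    early = [h if i < boundary else 0.0 for i, h in enumerate(rir)]
    late = [h if i >= boundary else 0.0 for i, h in enumerate(rir)]
    return early, late
```

    Convolving clean speech with the early part alone would give a training target that retains early reflections, which is the kind of mapping target the abstract discusses.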






  • Article type: Journal Article
    Long short-term memory (LSTM) has been effectively used to represent sequential data in recent years. However, LSTM still struggles to capture long-term temporal dependencies. In this paper, we propose an hourglass-shaped LSTM that is able to capture long-term temporal correlations by reducing the feature resolutions without data loss. We use skip connections between non-adjacent layers to avoid gradient decay. In addition, an attention process is incorporated into the skip connections to emphasize the essential spectral features and spectral regions. The proposed LSTM model is applied to speech enhancement and recognition applications. It uses no future information, resulting in a causal system suitable for real-time processing. Combined spectral feature sets are used to train the LSTM model for improved performance, and the ideal ratio mask (IRM) is estimated as the training objective. Experimental evaluations using short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) demonstrated that the proposed model with robust feature representation obtained higher speech intelligibility and perceptual quality. With the TIMIT, LibriSpeech, and VoiceBank datasets, the proposed model improved STOI by 16.21%, 16.41%, and 18.33% over noisy speech, whereas PESQ improved by 31.1%, 32.9%, and 32%. In seen and unseen noisy situations, the proposed model outperformed existing deep neural networks (DNNs), including baseline LSTM, feedforward neural network (FDNN), convolutional neural network (CNN), and generative adversarial network (GAN). With the Kaldi toolkit for automatic speech recognition (ASR), the proposed model significantly reduced the word error rates (WERs) and reached an average WER of 15.13% in noisy backgrounds.
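    For readers unfamiliar with the training objective, the ideal ratio mask is commonly defined per time-frequency bin as (S² / (S² + N²))^β, where S and N are the speech and noise magnitudes and β is usually 0.5. The sketch below uses that common definition; the abstract does not state which variant or exponent the authors used, so beta=0.5 is an assumption.

```python
def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """IRM per time-frequency bin: (S^2 / (S^2 + N^2)) ** beta.

    speech_mag, noise_mag: equal-length sequences of spectral magnitudes.
    Bins where both magnitudes are zero get a mask value of 0.
    """
    mask = []
    for s, n in zip(speech_mag, noise_mag):
        denom = s * s + n * n
        mask.append((s * s / denom) ** beta if denom > 0 else 0.0)
    return mask
```

    Multiplying the noisy magnitude spectrum by the estimated mask bin by bin yields the enhanced magnitude; values lie between 0 (noise only) and 1 (speech only).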






  • Article type: Journal Article
    OBJECTIVE: To evaluate the mental health of paediatric cochlear implant users and analyse the relationship between six dimensions (movements, cognitive ability, emotion and will, sociality, living habits and language) and hearing and speech rehabilitation.
    METHODS: Eighty-two cochlear implant users were assessed using the Mental Health Survey Questionnaire. Age at implantation, time of implant use and listening modes were investigated. Categories of Auditory Performance and the Speech Intelligibility Rating Scale were used to score hearing and speech abilities.
    RESULTS: More recipients scored lower in cognitive ability and language. Age at implantation was statistically significant (p < 0.05) for movements, cognitive ability, emotion and will, and language. The time of implant usage and listening mode indicated statistical significance (p < 0.05) in cognitive ability, sociality and language.
    CONCLUSIONS: Timely attention should be paid to the mental health of paediatric cochlear implant users, and corresponding psychological interventions should be implemented to make personalised rehabilitation plans.





