speech intelligibility

  • Article type: Journal Article
    A speech intelligibility (SI) prediction model is proposed that comprises an auditory preprocessing component based on the physiological anatomy and activity of the human ear, a hierarchical spiking neural network, and a decision back-end based on correlation analysis. The auditory preprocessing component captures advanced physiological details of the auditory system, such as retrograde traveling waves, longitudinal coupling, and cochlear nonlinearity. The model's ability to predict data from normal-hearing listeners under various additive noise conditions was evaluated, and the predictions closely matched the experimental test data under all conditions. Furthermore, we developed a lumped mass model of a McGee stainless-steel piston coupled with the middle ear to study recovery in individuals with otosclerosis. We show that the proposed SI model accurately simulates the effect of middle-ear intervention on SI. Consequently, the model establishes a model-based relationship between objective measures of human ear damage, such as distortion product otoacoustic emissions, and speech perception. Moreover, the SI model can serve as a robust tool for optimizing parameters and for preoperative assessment of artificial stimuli, providing a valuable reference for the clinical treatment of conductive hearing loss.
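
The abstract does not say how the correlation-based back-end is computed. As a rough sketch only, one simple reading is a Pearson correlation between the model's internal representation of clean speech and that of degraded speech. The function name `correlation_score`, the array shapes, and the synthetic data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def correlation_score(template, test):
    """Pearson correlation between two equally shaped internal representations."""
    t = np.asarray(template, dtype=float).ravel()
    x = np.asarray(test, dtype=float).ravel()
    t = t - t.mean()
    x = x - x.mean()
    denom = np.linalg.norm(t) * np.linalg.norm(x)
    # Guard against constant inputs, for which the correlation is undefined.
    return float(t @ x / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)
clean = rng.random((50, 16))  # hypothetical: 50 time frames x 16 channels
noisy_mild = clean + 0.1 * rng.standard_normal(clean.shape)
noisy_heavy = clean + 1.0 * rng.standard_normal(clean.shape)

score_mild = correlation_score(clean, noisy_mild)
score_heavy = correlation_score(clean, noisy_heavy)
print(score_mild, score_heavy)
```

In this toy setup the score falls as the noise grows, which is the behaviour a back-end of this kind relies on. Mapping the score to percent-correct intelligibility would need a further calibration step that the abstract does not describe.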

  • Article type: Journal Article
    OBJECTIVE: To present one institution's experience with cochlear reimplantation (CRI), assess surgical challenges and postoperative outcomes, and improve the success rate of CRI.
    STUDY DESIGN: Retrospective single-institution study.
    SETTING: Tertiary medical center.
    METHODS: We retrospectively evaluated data from 76 reimplantation cases treated in a tertiary center between 2001 and 2022. Clinical features, including etiology of hearing loss, type of failure, surgical issues, and auditory speech performance, were analyzed. Categories of Auditory Performance (CAP) and Speech Intelligibility Rating (SIR) scores were used to evaluate pre- and post-CRI outcomes.
    RESULTS: The CRI population comprised 7 patients from our institute and 69 patients referred from other centers. Device failure was the most common reason for CRI (68/76, 89.5%); in addition, there were 7 medical failures, and 1 case also involved soft device failure. Medical failures included flap rupture and device extrusion, magnet migration, auditory neuropathy, leukoencephalopathy, foreign-body residue, and meningitis. In 21/76 patients, the electrode technology was upgraded. The time to failure ranged from 0.58 to 13 years, with a mean of 4.97 years. The mean (± SD) CAP scores before and after CRI were 5.2 ± 1.2 and 5.5 ± 1.1, and the mean SIR scores were 3.4 ± 1.1 and 3.5 ± 1.1, respectively. Performance was poor in six patients with severe cochlear malformation, auditory nerve dysplasia, leukoencephalopathy, and epilepsy.
    CONCLUSIONS: CRI surgery is a challenging but relatively safe procedure, and most reimplanted patients experience favorable postoperative outcomes. Medical complications and intracochlear damage are the main causes of poor postoperative results. Therefore, adequate preoperative preparation and atraumatic CRI are needed for optimal results.

  • Article type: Journal Article
    OBJECTIVE: The Chinese Emotional Speech Audiometry Project (CESAP) aims to establish a new material set for Chinese speech audiometry tests that can be used in both neutral and emotional prosody settings. As the first endeavor of CESAP, this study describes the development of the material foundation and reports its validation in neutral prosody.
    METHODS: In the development step, 40 phonetically balanced word lists, each consisting of 30 Chinese disyllabic words with neutral valence, were first generated. In a subsequent affective rating experiment, 35 word lists qualified for validation based on familiarity and valence ratings from 30 normal-hearing (NH) participants. For validation, performance-intensity functions of each word list were fitted to responses from 60 NH subjects at six presentation levels (-1, 3, 5, 7, 11, and 20 dB HL). The final material set was determined by the intelligibility scores at each decibel level and the mean slopes.
    RESULTS: First, 35 lists satisfied the criteria of phonetic balance, limited repetitions, high familiarity, and neutral valence and were selected for validation. Second, 15 lists were compiled into the final material set based on the pairwise differences in intelligibility scores and the fitted 20%-80% slopes. The established material set had high reliability and validity and was sensitive to intelligibility changes (50% slope: 6.20%/dB; 20%-80% slope: 5.45%/dB), with small coefficients of variation for thresholds (15%), 50% slope (12%), and 20%-80% slope (12%).
    CONCLUSIONS: Our final material set of 15 word lists takes the initiative to control the emotional aspect of audiometry tests, which enriches the available Mandarin speech recognition materials and warrants future assessments of emotional prosody in populations with hearing impairments.
    SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.25742814.
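
The two slope measures reported above can be illustrated with a worked example. The sketch below assumes a logistic performance-intensity function, p(L) = 1 / (1 + exp(-k (L - L50))); the abstract does not state which function was fitted, and the parameter values here are hypothetical. Under that assumption the slope at the 50% point is k/4, and the 20%-80% slope is 0.6 divided by the level difference between the 20% and 80% points, which is 2 ln(4) / k.

```python
import math

def logistic_pi(level_db, l50_db, k):
    """Logistic performance-intensity function: proportion correct at a level."""
    return 1.0 / (1.0 + math.exp(-k * (level_db - l50_db)))

def slope_at_50(k):
    """Slope at the 50% point, in proportion correct per dB."""
    return k / 4.0

def slope_20_80(k):
    """Average slope between the 20% and 80% points, in proportion per dB."""
    level_span_db = 2.0 * math.log(4.0) / k  # distance between the two points
    return 0.6 / level_span_db

k = 0.248     # hypothetical steepness, chosen to give a 6.2 %/dB midpoint slope
l50_db = 5.0  # hypothetical threshold in dB HL

print(f"50% slope:     {100 * slope_at_50(k):.2f} %/dB")
print(f"20%-80% slope: {100 * slope_20_80(k):.2f} %/dB")
```

The values reported in the abstract are means over word lists, so they need not satisfy this single-curve relationship exactly.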

  • Article type: Journal Article
    Spatial separation and fundamental frequency (F0) separation are effective cues for improving the intelligibility of target speech in multi-talker scenarios. Previous studies focused predominantly on spatial configurations within the frontal hemifield, overlooking the ipsilateral side and the entire median plane, where localization confusion often occurs. This study investigated the impact of spatial and F0 separation on intelligibility in these underexplored spatial configurations. Speech reception thresholds were measured in three experiments involving two to four talkers, located either in the ipsilateral horizontal plane or in the entire median plane, using monotonized speech with varying F0s as stimuli. The results revealed that spatial separation in symmetrical positions (front-back symmetry in the ipsilateral horizontal plane, or front-back and up-down symmetry in the median plane) contributes positively to intelligibility. Both target direction and relative target-masker separation influence the masking release attributed to spatial separation. When the number of talkers exceeds two, the masking release from spatial separation diminishes. Nevertheless, F0 separation remains a remarkably effective cue and can even facilitate spatial separation in improving intelligibility. Further analysis indicated that current intelligibility models have difficulty accurately predicting intelligibility in the scenarios explored in this study.

  • Article type: Journal Article
    OBJECTIVE: To investigate the clinical effects and surgical procedures of the Hogan posterior pharyngeal flap in the treatment of older patients with velopharyngeal insufficiency (VPI) after cleft palate repair.
    METHODS: A total of 33 patients (aged 10-35 years; mean 20.4 years) with VPI secondary to cleft palate were included. They underwent Hogan posterior pharyngeal flap surgery to improve velopharyngeal closure function. The clinical efficacy of the operation was evaluated with Chinese speech clarity measurement and nasopharyngeal fiberscope (NPF) examination, and velopharyngeal closure was graded. The average follow-up time was 13.3 months.
    RESULTS: The wounds of all patients healed by first intention. Speech assessment showed that consonant articulation increased and that the rates of hypernasality and nasal emission decreased significantly (P<0.05). NPF examination showed that postoperative velopharyngeal closure function improved significantly: 30 cases (91%) were grade Ⅰ and 3 cases (9%) were grade Ⅱ.
    CONCLUSIONS: The Hogan posterior pharyngeal flap for VPI secondary to cleft palate can significantly improve velopharyngeal closure in older patients and reduce nasal emission and hypernasality.

  • Article type: Journal Article
    This study investigates spatial release from masking (SRM) in bilateral bone conduction (BC) stimulation at the mastoid. Nine adults with normal hearing were tested to determine SRM based on speech recognition thresholds (SRTs) in simulated spatial configurations ranging from 0 to 180 degrees. These configurations were based on nonindividualized head-related transfer functions. The participants received sound stimulation through either air conduction (AC) via headphones or BC. The results indicated that both the angular separation between the target and the masker and the modality of sound stimulation significantly influenced speech recognition performance. As the angular separation between the target and the masker increased up to 150°, both BC and AC SRTs decreased, indicating improved performance. However, performance deteriorated slightly when the angular separation exceeded 150°. For spatial separations of less than 75°, BC stimulation provided greater spatial benefit than AC, although this difference was not statistically significant. For separations greater than 75°, AC stimulation offered significantly more spatial benefit than BC. When speech and noise originated from the same side of the head, the "better ear effect" did not contribute significantly to SRM. However, when speech and noise were located on opposite sides of the head, this effect became dominant in SRM.
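
SRM is conventionally expressed as the SRT with target and masker co-located minus the SRT with them separated, so a positive value means that separation helped. The short sketch below applies that convention to made-up SRT values; the numbers and the function name are hypothetical and are not data from the study.

```python
def spatial_release_from_masking(srt_colocated_db, srt_separated_db):
    """SRM in dB: positive when spatial separation lowers (improves) the SRT."""
    return srt_colocated_db - srt_separated_db

# Hypothetical SRTs in dB SNR, keyed by target-masker separation in degrees.
srts_db = {0: -4.0, 75: -8.5, 150: -11.0, 180: -10.0}

srm_db = {
    angle: spatial_release_from_masking(srts_db[0], srt)
    for angle, srt in srts_db.items()
}
best_angle = max(srm_db, key=srm_db.get)
print(srm_db, best_angle)
```

The hypothetical values were chosen to mimic the reported pattern, with the benefit growing up to 150° and shrinking slightly beyond it.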

  • Article type: Journal Article
    Speech recognition crucially relies on slow temporal modulations (<16 Hz) in speech. Recent studies, however, have demonstrated that long-delay echoes, which are common during online conferencing, can eliminate crucial temporal modulations in speech without affecting speech intelligibility. Here, we investigated the underlying neural mechanisms. MEG experiments demonstrated that cortical activity can effectively track the temporal modulations eliminated by an echo, which cannot be fully explained by basic neural adaptation mechanisms. Furthermore, cortical responses to echoic speech are better explained by a model that segregates speech from its echo than by a model that encodes echoic speech as a whole. The speech segregation effect was observed even when attention was diverted, but it disappeared when segregation cues, i.e., speech fine structure, were removed. These results strongly suggest that, through mechanisms such as stream segregation, the auditory system can build an echo-insensitive representation of the speech envelope, which can support reliable speech recognition.
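
A simple comb-filter argument shows why an echo can remove a modulation frequency. If the echo is approximated as a delayed copy added to the envelope with gain g and delay τ, a modulation at frequency f is scaled by |1 + g·exp(-j2πfτ)|, which falls to zero at f = 1/(2τ) when g = 1. The sketch below evaluates that gain. The additive-envelope approximation, the 0.125 s delay, and the function name are assumptions for illustration and are not taken from the abstract.

```python
import numpy as np

def echo_modulation_gain(mod_freq_hz, delay_s, echo_gain=1.0):
    """Gain applied to an envelope modulation when a delayed copy is added."""
    f = np.asarray(mod_freq_hz, dtype=float)
    return np.abs(1.0 + echo_gain * np.exp(-2j * np.pi * f * delay_s))

delay_s = 0.125  # hypothetical echo delay in seconds
freqs_hz = np.array([1.0, 2.0, 4.0, 8.0])
gains = echo_modulation_gain(freqs_hz, delay_s)

for f, g in zip(freqs_hz, gains):
    print(f"{f:4.1f} Hz modulation -> gain {g:.3f}")
```

With this delay the 4 Hz modulation is cancelled while the 8 Hz modulation is doubled, so the loss is confined to narrow bands of the modulation spectrum.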

  • Article type: Journal Article
    In indoor environments, reverberation often distorts clean speech. Although deep learning-based speech dereverberation approaches perform much better than traditional ones, the inferior quality of the dereverberated speech, caused by magnitude distortion and limited phase recovery, remains a serious problem for practical applications. This paper improves the performance of deep learning-based speech dereverberation from the perspectives of both network design and mapping target optimization. Specifically, on the one hand, a bifurcated-and-fusion network and its guidance loss functions were designed to help reduce magnitude distortion while enhancing phase recovery. On the other hand, the time boundary between the early and late reflections in the mapped speech was investigated, so as to strike a balance between the reverberation tailing effect and the difficulty of magnitude/phase recovery. Mathematical derivations are provided to show the rationality of the specially designed loss functions. Geometric illustrations are given to explain the importance of preserving early reflections in reducing the difficulty of phase recovery. Ablation study results confirmed the validity of the proposed network topology and the importance of preserving 20 ms of early reflections in the mapped speech. Objective and subjective test results showed that the proposed system outperformed other baselines in the speech dereverberation task.
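
One way to build a mapping target that preserves early reflections is to truncate the room impulse response (RIR) a fixed time after the direct path and convolve the clean speech with the truncated response. The sketch below does this with a synthetic RIR. Measuring the 20 ms window from the direct-path peak, the synthetic signals, and the name `early_rir` are assumptions for illustration; the abstract does not describe how the authors constructed their target.

```python
import numpy as np

def early_rir(rir, fs_hz, early_ms=20.0):
    """Keep the RIR up to `early_ms` after the direct-path peak; zero the rest."""
    rir = np.asarray(rir, dtype=float)
    direct_idx = int(np.argmax(np.abs(rir)))  # assume the largest tap is the direct path
    cutoff = direct_idx + int(round(early_ms * 1e-3 * fs_hz))
    early = np.zeros_like(rir)
    early[:cutoff + 1] = rir[:cutoff + 1]
    return early

fs_hz = 16000
rng = np.random.default_rng(0)
t = np.arange(fs_hz // 2) / fs_hz
rir = rng.standard_normal(t.size) * np.exp(-t / 0.05)  # synthetic decaying tail
rir[:40] = 0.0   # propagation delay before the direct path
rir[40] = 10.0   # direct path

clean = rng.standard_normal(fs_hz)                   # stand-in for 1 s of speech
reverberant = np.convolve(clean, rir)                # network input
target = np.convolve(clean, early_rir(rir, fs_hz))   # target with early reflections kept
print(reverberant.shape, target.shape)
```

Keeping the truncated response the same length as the full RIR makes the input and the target sample-aligned, which simplifies training on paired data.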

  • Article type: Journal Article
    Long short-term memory (LSTM) has been used effectively to represent sequential data in recent years. However, LSTM still struggles to capture long-term temporal dependencies. In this paper, we propose an hourglass-shaped LSTM that captures long-term temporal correlations by reducing the feature resolution without data loss. Skip connections between non-adjacent layers are used to avoid gradient decay. In addition, an attention process is incorporated into the skip connections to emphasize the essential spectral features and spectral regions. The proposed LSTM model is applied to speech enhancement and recognition. It uses no future information, resulting in a causal system suitable for real-time processing. Combined spectral feature sets are used to train the model for improved performance, and the ideal ratio mask (IRM) is estimated as the training objective. Experimental evaluations using short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) demonstrated that the proposed model, with its robust feature representation, obtained higher speech intelligibility and perceptual quality. On the TIMIT, LibriSpeech, and VoiceBank datasets, the proposed model improved STOI by 16.21%, 16.41%, and 18.33% over noisy speech, and improved PESQ by 31.1%, 32.9%, and 32%, respectively. In seen and unseen noisy situations, the proposed model outperformed existing deep neural networks (DNNs), including a baseline LSTM, a feedforward neural network (FDNN), a convolutional neural network (CNN), and a generative adversarial network (GAN). With the Kaldi toolkit for automatic speech recognition (ASR), the proposed model significantly reduced word error rates (WERs) and reached an average WER of 15.13% in noisy backgrounds.
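
The IRM used as the training objective can be illustrated on a toy spectrogram. A common definition takes the square root of the ratio of speech energy to the sum of speech and noise energy in each time-frequency bin. The sketch below uses that form; the abstract does not state which variant or exponent the authors used, and the magnitudes are made up.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """Square-root IRM per time-frequency bin; values lie in [0, 1]."""
    s2 = np.asarray(speech_mag, dtype=float) ** 2
    n2 = np.asarray(noise_mag, dtype=float) ** 2
    return np.sqrt(s2 / (s2 + n2 + eps))  # eps avoids division by zero

# Hypothetical magnitude spectrograms (2 frames x 2 frequency bins).
speech_mag = np.array([[1.0, 0.0], [3.0, 2.0]])
noise_mag = np.array([[0.0, 1.0], [4.0, 2.0]])

mask = ideal_ratio_mask(speech_mag, noise_mag)
enhanced_mag = mask * np.hypot(speech_mag, noise_mag)  # mask applied to a mixture magnitude
print(mask)
```

Bins dominated by speech receive a mask value near 1 and bins dominated by noise a value near 0. At inference time a network estimates the mask from noisy features, since the separate speech and noise magnitudes are not available.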

  • Article type: Journal Article
    OBJECTIVE: To evaluate the mental health of paediatric cochlear implant users and analyse the relationship between six dimensions (movements, cognitive ability, emotion and will, sociality, living habits and language) and hearing and speech rehabilitation.
    METHODS: Eighty-two cochlear implant users were assessed using the Mental Health Survey Questionnaire. Age at implantation, duration of implant use and listening mode were investigated. The Categories of Auditory Performance and the Speech Intelligibility Rating Scale were used to score hearing and speech abilities.
    RESULTS: More recipients scored lower in cognitive ability and language. Age at implantation had a statistically significant effect (p < 0.05) on movements, cognitive ability, emotion and will, and language. Duration of implant use and listening mode had a statistically significant effect (p < 0.05) on cognitive ability, sociality and language.
    CONCLUSIONS: Timely attention should be paid to the mental health of paediatric cochlear implant users, and corresponding psychological interventions should be implemented as part of personalised rehabilitation plans.