whisper

  • Article type: Journal Article
    OBJECTIVE: Sound pressure and exhaled flow have been identified as important factors associated with higher particle emissions. The aim of this study was to assess how different vocalizations affect particle generation independently of other factors.
    STUDY DESIGN: Experimental study.
    METHODS: Thirty-three experienced singers repeated two different sentences at normal loudness and in a whisper. The first sentence consisted mainly of consonants such as /k/ and /t/ as well as open vowels, while the second sentence also included the /s/ sound and contained primarily closed vowels. Particle emission was measured using a condensation particle counter (CPC, 3775 TSI Inc.) and an aerodynamic particle sizer (APS, 3321 TSI Inc.). The CPC measured particle number concentration for particles larger than 4 nm and mainly reflects the number of particles smaller than 0.5 µm, since these particles dominate the total number concentration. The APS measured particle size distribution and number concentration in the size range of 0.5-10 µm, and the data were divided into >1 µm and <1 µm particle size ranges. Generalized linear mixed-effects models were constructed to assess the factors affecting particle generation.
    RESULTS: Whispering produced more particles than speaking, and sentence 1 produced more particles than sentence 2 while speaking. Sound pressure level had an effect on particle production independently of vocalization. The effect of exhaled airflow was not statistically significant.
    CONCLUSIONS: Based on our results, the type of vocalization has a significant effect on particle production independently of other factors such as sound pressure level.
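    The mixed-effects analysis described above can be sketched in Python. As an approximation, a linear mixed model on log-transformed particle counts (statsmodels' MixedLM) stands in for the study's log-link GLMM; all variable names and numbers below are illustrative, not study data.

```python
# Illustrative sketch, not the study's analysis: a linear mixed model on
# log particle counts approximates a log-link GLMM. Data are synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for singer in range(33):                        # 33 singers, repeated measures
    singer_effect = rng.normal(0, 0.3)          # random intercept per singer
    for _ in range(4):
        whisper = int(rng.integers(0, 2))       # 0 = speaking, 1 = whispering
        spl = rng.normal(70 - 20 * whisper, 5)  # sound pressure level (dB)
        log_particles = (2.0 + 0.8 * whisper + 0.02 * spl
                         + singer_effect + rng.normal(0, 0.2))
        rows.append(dict(singer=singer, whisper=whisper,
                         spl=spl, log_particles=log_particles))
df = pd.DataFrame(rows)

# Fixed effects for vocalization and SPL; the random intercept per singer
# accounts for repeated measurements on the same person.
fit = smf.mixedlm("log_particles ~ whisper + spl", df,
                  groups=df["singer"]).fit()
print(fit.params["whisper"])  # positive: whispering increases emission
```

    With SPL in the model, the coefficient on whisper estimates the vocalization effect separately from loudness, mirroring how the study disentangles the two factors.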

  • Article type: Journal Article
    BACKGROUND: Verbatim transcription of qualitative audio data is a cornerstone of analytic quality and rigor, yet the time and energy required for such transcription can drain resources, delay analysis, and hinder the timely dissemination of qualitative insights. In recent years, software programs have presented a promising mechanism to accelerate transcription, but the broad application of such programs has been constrained by expensive licensing or "per-minute" fees, data protection concerns, and limited availability of such programs in many languages. In this article, we outline our process of adapting a free, open-source, speech-to-text algorithm (Whisper by OpenAI) into a usable and accessible tool for qualitative transcription. Our program, which we have dubbed "Vink" for voice to ink, is available under a permissive open-source license (and thus free of cost).
    RESULTS: We conducted a proof-of-principle assessment of Vink's performance in transcribing authentic interview audio data in 14 languages. A majority of pilot testers evaluated the software's performance positively and indicated that they were likely to use the tool in their future research. Our usability assessment indicates that Vink is easy to use, and we made further refinements based on pilot-tester feedback to increase user-friendliness.
    CONCLUSIONS: With Vink, we hope to facilitate rigorous qualitative research processes globally by reducing the time and costs associated with transcription and by expanding free-of-cost transcription software availability to more languages. Because Vink runs on standalone computers, the data privacy issues that arise with many other solutions do not apply.

  • Article type: Journal Article
    PURPOSE: Automatic recognition of stutters (ARS) from speech recordings can facilitate objective assessment and intervention for people who stutter. However, the performance of ARS systems may depend on how the speech data are segmented and labelled for training and testing. This study compared two segmentation methods: event-based, which delimits speech segments by their fluency status, and interval-based, which uses fixed-length segments regardless of fluency.
    METHODS: Machine learning models were trained and evaluated on interval-based and event-based stuttered speech corpora. The models used acoustic and linguistic features extracted from the speech signal and from transcriptions generated by a state-of-the-art automatic speech recognition system.
    RESULTS: Event-based segmentation led to better ARS performance than interval-based segmentation, as measured by the area under the curve (AUC) of the receiver operating characteristic. The results suggest differences in the quality and quantity of the data arising from the segmentation method. The inclusion of linguistic features improved the detection of whole-word repetitions, but not of other types of stutters.
    CONCLUSIONS: The findings suggest that event-based segmentation is more suitable for ARS than interval-based segmentation, as it preserves the exact boundaries and types of stutters. Linguistic features provide useful information for separating supra-lexical disfluencies from fluent speech but may not capture the acoustic characteristics of stutters. Future work should explore more robust and diverse features, as well as larger and more representative datasets, for developing effective ARS systems.
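    The AUC comparison reported above can be illustrated with a small synthetic experiment; the data, features, and noise levels below are invented for illustration, with higher feature noise standing in for interval-based segments that mix fluent and stuttered speech.

```python
# Synthetic illustration of comparing segmentation schemes by ROC AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_corpus(n: int, noise: float):
    """Toy segments: label 1 = stuttered, with 4 noisy acoustic features."""
    y = rng.integers(0, 2, n)
    X = y[:, None] + rng.normal(0, noise, (n, 4))
    return X, y

aucs = {}
for name, noise in [("event-based", 1.0), ("interval-based", 2.5)]:
    X, y = make_corpus(2000, noise)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    clf = LogisticRegression().fit(Xtr, ytr)
    aucs[name] = roc_auc_score(yte, clf.predict_proba(Xte)[:, 1])
    print(f"{name}: AUC = {aucs[name]:.2f}")
```

    On this toy setup, the cleaner event-based labels yield a higher AUC, the same direction of effect the study reports.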

  • Article type: Journal Article
    The growth in online child exploitation material is a significant challenge for European Law Enforcement Agencies (LEAs). One of the most important sources of such online information corresponds to audio material that needs to be analyzed to find evidence in a timely and practical manner. That is why LEAs require a next-generation AI-powered platform to process audio data from online sources. We propose the use of speech recognition and keyword spotting to transcribe audiovisual data and to detect the presence of keywords related to child abuse. The considered models are based on two of the most accurate neural-based architectures to date: Wav2vec2.0 and Whisper. The systems were tested under an extensive set of scenarios in different languages. Additionally, keeping in mind that data obtained from LEAs are highly sensitive, we explore the use of federated learning to provide more robust systems for the addressed application while maintaining the privacy of the LEAs' data. The considered models achieved a word error rate between 11% and 25%, depending on the language. In addition, the systems are able to recognize a set of spotted words with true-positive rates between 82% and 98%, depending on the language. Finally, federated learning strategies show that they can maintain and even improve the performance of the systems when compared to centrally trained models. The proposed systems provide the basis for an AI-powered platform for the automatic analysis of audio in forensic applications involving child abuse. The use of federated learning is also promising for the addressed scenario, where data privacy is an important issue to be managed.
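    Word error rate, the headline metric here, is the word-level Levenshtein distance divided by the reference length. A minimal stdlib-only implementation:

```python
# Word error rate: edit distance over words, normalized by reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the quick brown fox", "the quick brown box"))  # 0.25
```

    A WER of 11% to 25%, as reported, means roughly one word in four to nine must be substituted, inserted, or deleted to recover the reference transcript.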

  • Article type: Journal Article
    The study objective was to determine whether cattle health and performance under a targeted bovine respiratory disease (BRD) control program based on individualized risk prediction generated by a novel technology (Whisper On Arrival) were superior to a negative control (no metaphylaxis) yet no different from a positive control (conventional BRD control; 100% application). Across four study sites, auction market-derived beef calves were randomly allocated to one of four BRD control treatment groups: 1) Negative control (Saline), 2) Positive control (Tildipirosin [TIL] to 100% of the group), 3) Whisper-high (±TIL based on a conservative algorithm threshold), and 4) Whisper-low (±TIL based on an aggressive algorithm threshold). Within either Whisper On Arrival group, only calves predicted by the technology to be above the algorithm threshold (determined a priori) were administered TIL, leaving the remainder untreated. Cattle were followed to either a short-term timepoint (50 or 60 d; health outcomes, all sites; feed performance outcomes, two sites) or to closeout (two sites). Data were analyzed as a completely randomized block design separately at each site. Across all sites, BRD control antibiotic use was reduced by 11% to 43% in the two Whisper On Arrival treatment groups compared to the positive control. The positive control and both Whisper On Arrival groups reduced (P ≤ 0.05) BRD morbidity compared to negative controls at the short-term timepoint at three of the four sites and at closeout at one of two sites. The positive control and both Whisper-managed groups had improved (P ≤ 0.05) average daily gain (ADG), dry-matter intake (DMI), and feed efficiency compared to negative controls at the short-term timepoint at one of two sites. At closeout, the positive control and both Whisper-managed groups improved (P ≤ 0.05) ADG (deads-in) compared to the negative control at one of the two sites. At one of two sites, the positive control and the Whisper-high group displayed an improvement (P ≤ 0.05) in hot carcass weight compared to the negative control. The Whisper On Arrival technology maintained the benefits of a conventional BRD control program while reducing BRD control antibiotic use by 11% to 43%, lowering antibiotic costs to the producer and supporting judicious antimicrobial use.

  • Article type: Journal Article
    Many transwomen seek voice and communication therapy to support their transition from their gender assigned at birth to their gender identity. This has led to an increased need to examine the perception of gender and femininity/masculinity to develop evidence-based intervention practices. In this study, we explore the auditory perception of femininity/masculinity in normally phonated and whispered speech. Transwomen, ciswomen, and cismen were recorded producing /hVd/ words. Naïve listeners rated femininity/masculinity of a speaker's voice using a visual analog scale, rather than completing a binary gender identification task. The results revealed that listeners rated speakers more ambiguously in whispered speech than normally phonated speech. An analysis of speaker and token characteristics revealed that in the normally phonated condition listeners consistently use f0 to rate femininity/masculinity. In addition, some evidence was found for possible contributions of formant frequencies, particularly F2, and duration. Taken together, this provides additional evidence for the salience of f0 and F2 for voice and communication intervention among transwomen.
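    f0, the cue listeners relied on most in the normally phonated condition, can be estimated by autocorrelation. A minimal numpy sketch on a synthetic harmonic "vowel" (the 220 Hz fundamental and 16 kHz rate are arbitrary choices, not study parameters):

```python
# Autocorrelation-based f0 estimation on a synthetic harmonic signal.
import numpy as np

sr = 16000
t = np.arange(0, 0.2, 1 / sr)
f0_true = 220.0  # illustrative fundamental, Hz
# Sum of the first four harmonics with 1/k amplitudes, a crude vowel model.
signal = sum(np.sin(2 * np.pi * f0_true * k * t) / k for k in range(1, 5))

# Autocorrelation peaks at lags equal to multiples of the pitch period.
ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
lag_min, lag_max = int(sr / 400), int(sr / 80)  # search 80-400 Hz range
peak = lag_min + int(np.argmax(ac[lag_min:lag_max]))
f0_est = sr / peak
print(round(f0_est, 1))
```

    In whispered speech the vocal folds do not vibrate, so no such periodicity peak exists; this is one reason listener ratings become more ambiguous when f0 is unavailable.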

  • Article type: Journal Article
    Software Defined Networking (SDN) centralizes network control to improve network programmability and flexibility. Unlike in wired settings, it is unclear how to support SDN in low power and lossy networks like typical Internet of Things (IoT) ones. Challenges encompass providing reliable in-band connectivity between the centralized controller and out-of-range nodes, and coping with the physical limitations of highly resource-constrained IoT devices. In this work, we present Whisper, an enabler for SDN in low power and lossy networks. The centralized Whisper controller of a network remotely controls nodes' forwarding and cell allocation. To do so, the controller sends carefully computed routing and scheduling messages that are fully compatible with the protocols run in the network. This mechanism ensures the best possible in-band connectivity between the controller and all network nodes, capitalizing on an interface which is already supported by network devices. Whisper's internal algorithms further reduce the number of messages sent by the controller, to make the exerted control as lightweight as possible for the devices. Beyond detailing Whisper's design, we discuss compelling use cases that Whisper unlocks, including rerouting around low-battery devices and providing runtime defense against jamming attacks. We also describe how to implement Whisper in current IoT open standards (RPL and 6TiSCH) without modifying IoT devices' firmware. This shows that Whisper can implement SDN-like control for distributed low power networks with no specific support for SDN, from legacy to next-generation IoT devices. Our testbed experiments show that Whisper successfully controls the network in both the scheduling and routing planes, with significantly less overhead than other SDN-IoT solutions, no additional latency, and no packet loss.

  • Article type: Comparative Study
    OBJECTIVE: This study compared whispering attempts by adults using tracheoesophageal (TE) speech with those by adults with a larynx. Comparisons were based on listener judgments, visual-perceptual assessment of spectrograms, and measures of the acoustic signal.
    STUDY DESIGN: This was a prospective, cross-sectional study.
    METHODS: Seventeen TE and 10 laryngeal speakers produced sentences in a whisper and in their spoken voice. Listeners judged sentences as whispered or spoken. Judges signal-typed the spectrograms based on the presence or absence of a "voicing bar." Speaking rate, articulation rate, percent pause, and dB sound pressure level were measured.
    RESULTS: Twenty-nine percent of TE speakers were perceived to be whispering on whisper attempts; most others were perceived to be using spoken voice while attempting to whisper. Spectrograms of TE whispering were most often categorized as "mostly voiced." Speaking and articulation rates were slower for TE speakers. There was a significantly greater reduction in speaking rate from spoken to whisper for the TE group. Percent pause did not differ significantly between groups and speaking modes. TE speakers had a significantly smaller difference in dB sound pressure level between spoken and whisper modes.
    CONCLUSIONS: Some individuals using TE speech can whisper based on auditory-perceptual judgment, but most were perceived to be speaking during these attempts. The fact that some TE participants could whisper indicates the behavior is possible and might be considered a therapeutic target if it is of importance to an individual. The percentage of TE speakers who can learn to whisper, and the optimal training approach, are yet to be determined.

  • Article type: Comparative Study
    BACKGROUND: Whisper is known to be produced differently by different speakers, especially with respect to the glottal configuration that influences glottal aerodynamics. Differences in whisper production and phonation types convey important linguistic information in many languages, are identified in vocal pathologies, are used to communicate mood and emotion, and are used in vocal performance.
    OBJECTIVE: The present study focused on investigating the aerodynamic differences between whisper and phonation at different loudness and adduction levels.
    METHODS: Three men and five women between 20 and 40 years of age participated in the study. Smooth syllable strings of the syllable /baep:/ were whispered and phonated at three different loudness levels (soft, medium, and loud) and three voice qualities (breathy, normal, and pressed). The voice qualities are associated with different adduction levels. This resulted in 18 treatment combinations (three adduction levels × three loudness levels × two sexes).
    RESULTS: A regression analysis was performed using the PROC MIXED procedure in SAS statistical software. Under similar production conditions, subglottal pressure was significantly lower in whisper than in phonation in 10 of 18 combinations, mean glottal airflow was significantly higher in whisper than in phonation in 13 of 18 combinations, and flow resistance was significantly lower in whisper than in phonation in 14 of 18 combinations, with the female subjects demonstrating these trends more frequently than the male subjects. Importantly, compared with phonation under similar production conditions, whisper is not always accompanied by lower subglottal pressure and higher airflow.
    CONCLUSIONS: Results from this study suggest that the typical finding of lower subglottal pressure, higher glottal airflow, and decreased flow resistance in whisper compared with phonation cannot be generalized to all individuals and depends on the "whisper type." The nine basic production conditions (three loudness levels and three adduction levels) resulted in data that may help explain the wide range of variation of whisper production reported in earlier studies.
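    Flow resistance, the third measure above, is the ratio of subglottal pressure to mean glottal airflow. A minimal sketch with illustrative (non-study) values showing the typical pattern the abstract describes, lower pressure and higher flow in whisper yielding lower resistance:

```python
# Laryngeal flow resistance = subglottal pressure / mean glottal airflow.
# The numbers below are illustrative, not data from the study.
def flow_resistance(p_sub_cmH2O: float, flow_L_per_s: float) -> float:
    return p_sub_cmH2O / flow_L_per_s

phonation = flow_resistance(8.0, 0.15)  # higher pressure, lower flow
whisper = flow_resistance(6.0, 0.25)    # lower pressure, higher flow
print(phonation > whisper)  # True: whisper has lower flow resistance
```

    The study's point is that this textbook pattern held in only 14 of 18 condition combinations, so the direction of the comparison depends on the speaker's whisper type.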
