    OBJECTIVE: Sound pressure and exhaled flow have been identified as important factors associated with higher particle emissions. The aim of this study was to assess how different vocalizations affect particle generation independently of other factors.
    METHODS: Experimental study.
    METHODS: Thirty-three experienced singers repeated two different sentences in normal loudness and whispering. The first sentence consisted mainly of consonants like /k/ and /t/ as well as open vowels, while the second sentence also included the /s/ sound and contained primarily closed vowels. Particle emission was measured using a condensation particle counter (CPC, 3775 TSI Inc.) and an aerodynamic particle sizer (APS, 3321 TSI Inc.). The CPC measured particle number concentration for particles larger than 4 nm and mainly reflects the number of particles smaller than 0.5 µm, since these particles dominate total number concentration. The APS measured particle size distribution and number concentration in the size range of 0.5-10 µm, and the data were divided into >1 µm and <1 µm particle size ranges. Generalized linear mixed-effects models were constructed to assess the factors affecting particle generation.
    RESULTS: Whispering produced more particles than speaking, and sentence 1 produced more particles than sentence 2 while speaking. Sound pressure level had an effect on particle production independently of vocalization. The effect of exhaled airflow was not statistically significant.
    CONCLUSIONS: Based on our results, the type of vocalization has a significant effect on particle production independently of other factors such as sound pressure level.
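The division of APS data into <1 µm and >1 µm ranges described in the methods can be illustrated with a minimal sketch. The bin layout, the example values, and the function name `split_by_size` are hypothetical, and assigning a bin at exactly 1 µm to the upper group is an assumption the abstract does not state.

```python
def split_by_size(bins, cutoff_um=1.0):
    """Sum number concentrations below and at-or-above a diameter cutoff.

    bins: iterable of (diameter_um, concentration) pairs (hypothetical layout).
    A bin exactly at the cutoff goes to the upper group (an assumption).
    """
    below = sum(conc for diameter, conc in bins if diameter < cutoff_um)
    above = sum(conc for diameter, conc in bins if diameter >= cutoff_um)
    return below, above


# Made-up size bins within the 0.5-10 µm APS range, for illustration only.
example_bins = [(0.6, 2.0), (0.9, 1.0), (1.5, 0.5), (3.0, 0.25)]
small, large = split_by_size(example_bins)
print(small, large)
```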






    BACKGROUND: Verbatim transcription of qualitative audio data is a cornerstone of analytic quality and rigor, yet the time and energy required for such transcription can drain resources, delay analysis, and hinder the timely dissemination of qualitative insights. In recent years, software programs have presented a promising mechanism to accelerate transcription, but the broad application of such programs has been constrained due to expensive licensing or "per-minute" fees, data protection concerns, and limited availability of such programs in many languages. In this article, we outline our process of adapting a free, open-source, speech-to-text algorithm (Whisper by OpenAI) into a usable and accessible tool for qualitative transcription. Our program, which we have dubbed "Vink" for voice to ink, is available under a permissive open-source license (and thus free of cost).
    RESULTS: We conducted a proof-of-principle assessment of Vink's performance in transcribing authentic interview audio data in 14 languages. A majority of pilot-testers evaluated the software performance positively and indicated that they were likely to use the tool in their future research. Our usability assessment indicates that Vink is easy to use, and we performed further refinements based on pilot-tester feedback to increase user-friendliness.
    CONCLUSIONS: With Vink, we hope to contribute to facilitating rigorous qualitative research processes globally by reducing time and costs associated with transcription and by expanding free-of-cost transcription software availability to more languages. With Vink running on standalone computers, data privacy issues arising within many other solutions do not apply.






    Automatic recognition of stutters (ARS) from speech recordings can facilitate objective assessment and intervention for people who stutter. However, the performance of ARS systems may depend on how the speech data are segmented and labelled for training and testing. This study compared two segmentation methods: event-based, which delimits speech segments by their fluency status, and interval-based, which uses fixed-length segments regardless of fluency.
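The contrast between the two segmentation methods can be sketched as follows. Event-based segments are given directly as (start, end, label) tuples, and the function derives fixed-length intervals from them. The function name, the example timings, and the rule that an interval counts as stuttered if it overlaps any stuttered event are all assumptions for illustration, not the labelling scheme used in the study.

```python
def interval_segments(events, window_s):
    """Cut a recording into fixed-length windows and label each one.

    events: list of (start_s, end_s, label), label in {"fluent", "stutter"}.
    A window is labelled "stutter" if it overlaps any stuttered event
    (an assumed rule, chosen only for this sketch).
    """
    total = max(end for _, end, _ in events)
    segments = []
    start = 0.0
    while start < total:
        end = min(start + window_s, total)
        stuttered = any(
            label == "stutter" and ev_start < end and ev_end > start
            for ev_start, ev_end, label in events
        )
        segments.append((start, end, "stutter" if stuttered else "fluent"))
        start = end
    return segments


# Event-based annotation: boundaries follow the fluency status exactly.
events = [(0.0, 2.5, "fluent"), (2.5, 3.2, "stutter"), (3.2, 6.0, "fluent")]

# Interval-based view of the same recording with 2-second windows.
for segment in interval_segments(events, 2.0):
    print(segment)
```

In this toy case the 0.7 s stutter is absorbed into a 2 s window, so its exact boundaries are no longer visible in the interval-based labels.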
    Machine learning models were trained and evaluated on interval-based and event-based stuttered speech corpora. The models used acoustic and linguistic features extracted from the speech signal and the transcriptions generated by a state-of-the-art automatic speech recognition system.
    The results showed that event-based segmentation led to better ARS performance than interval-based segmentation, as measured by the area under the curve (AUC) of the receiver operating characteristic. The results suggest that the segmentation method leads to differences in the quality and quantity of the data. The inclusion of linguistic features improved the detection of whole-word repetitions, but not other types of stutters.
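As a reminder of what the AUC measures, the sketch below computes it as the probability that a randomly chosen stuttered segment receives a higher score than a randomly chosen fluent one, counting ties as one half. The labels and scores are invented for illustration, and `roc_auc` is a hypothetical helper, not code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class (e.g. stuttered), 0 for the negative.
    scores: classifier scores, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up example: three of the four positive/negative pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```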
    The findings suggest that event-based segmentation is more suitable for ARS than interval-based segmentation, as it preserves the exact boundaries and types of stutters. The linguistic features provide useful information for separating supra-lexical disfluencies from fluent speech but may not capture the acoustic characteristics of stutters. Future work should explore more robust and diverse features, as well as larger and more representative datasets, for developing effective ARS systems.






    The growth in online child exploitation material is a significant challenge for European Law Enforcement Agencies (LEAs). One of the most important sources of such online information is audio material that needs to be analyzed to find evidence in a timely and practical manner. That is why LEAs require a next-generation AI-powered platform to process audio data from online sources. We propose the use of speech recognition and keyword spotting to transcribe audiovisual data and to detect the presence of keywords related to child abuse. The considered models are based on two of the most accurate neural-based architectures to date: Wav2vec2.0 and Whisper. The systems were tested under an extensive set of scenarios in different languages. Additionally, keeping in mind that data obtained from LEAs are very sensitive, we explore the use of federated learning to provide more robust systems for the addressed application, while maintaining the privacy of the data from LEAs. The considered models achieved a word error rate between 11% and 25%, depending on the language. In addition, the systems are able to recognize a set of spotted words with true-positive rates between 82% and 98%, depending on the language. Finally, federated learning strategies show that they can maintain and even improve the performance of the systems when compared to centrally trained models. The proposed systems set the basis for an AI-powered platform for automatic analysis of audio in the context of forensic applications of child abuse. The use of federated learning is also promising for the addressed scenario, where data privacy is an important issue to be managed.
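For readers unfamiliar with the word error rate reported above, the following is a minimal sketch of its usual definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the system output, divided by the number of reference words. The example sentences are invented, and any text normalization the authors applied before scoring is not described in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One word dropped out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```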






    The study objective was to determine whether cattle health and performance under a targeted bovine respiratory disease (BRD) control program based on individualized risk prediction generated by a novel technology (Whisper On Arrival) were superior to a negative control (no metaphylaxis) yet no different from a positive control (conventional BRD control; 100% application). Across four study sites, auction market-derived beef calves were randomly allocated to one of four BRD control treatment groups: 1) Negative control (Saline), 2) Positive control (Tildipirosin [TIL] to 100% of the group), 3) Whisper-high (±TIL based on conservative algorithm threshold), and 4) Whisper-low (±TIL based on aggressive algorithm threshold). Within either Whisper On Arrival group, only calves predicted to be above the algorithm threshold by the technology (determined a priori) were administered TIL, leaving the remainder untreated. Cattle were followed to either a short-term timepoint (50 or 60 d; health outcomes, all sites; feed performance outcomes, two sites) or to closeout (two sites). Data were analyzed as a completely randomized block design separately at each site. Across all sites, BRD control antibiotic use was reduced by 11% to 43% across the two Whisper On Arrival treatment groups compared to the positive control. The positive control and both Whisper On Arrival groups reduced (P ≤ 0.05) BRD morbidity compared to negative controls at the short-term timepoint at three of the four sites and at closeout at one of two sites. The positive control and both Whisper-managed groups had improved (P ≤ 0.05) average daily gain (ADG), dry-matter intake (DMI), and feed efficiency compared to negative controls at the short-term timepoint at one of two sites. At closeout, the positive control and both Whisper-managed groups improved (P ≤ 0.05) ADG (deads-in) compared to the negative control at one of the two sites. At one of two sites, the positive control and the Whisper-high group displayed an improvement (P ≤ 0.05) in hot carcass weight compared to the negative control. The Whisper On Arrival technology maintained the benefits of a conventional BRD control program while reducing BRD control antibiotic use by 11% to 43%, lowering antibiotic costs to the producer and supporting judicious antimicrobial use.
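The threshold-based allocation described above can be sketched in a few lines. The risk scores, the threshold values, and both function names are hypothetical; the abstract does not describe how Whisper On Arrival computes its predictions, so this only illustrates the idea of treating calves above a preset threshold and comparing antibiotic use against a 100% positive control.

```python
def select_for_treatment(risk_scores, threshold):
    """Flag calves whose predicted risk is above the threshold."""
    return [score > threshold for score in risk_scores]


def antibiotic_reduction(treated_flags):
    """Fractional reduction in doses relative to treating every calf."""
    if not treated_flags:
        raise ValueError("at least one calf is required")
    return 1.0 - sum(treated_flags) / len(treated_flags)


# Made-up risk scores for five calves and two hypothetical thresholds.
scores = [0.1, 0.8, 0.4, 0.9, 0.3]
for name, threshold in [("higher threshold", 0.5), ("lower threshold", 0.2)]:
    flags = select_for_treatment(scores, threshold)
    print(name, flags, antibiotic_reduction(flags))
```

A lower threshold treats more calves and therefore gives a smaller reduction in antibiotic use than a higher one.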







    Many transwomen seek voice and communication therapy to support their transition from their gender assigned at birth to their gender identity. This has led to an increased need to examine the perception of gender and femininity/masculinity to develop evidence-based intervention practices. In this study, we explore the auditory perception of femininity/masculinity in normally phonated and whispered speech. Transwomen, ciswomen, and cismen were recorded producing /hVd/ words. Naïve listeners rated femininity/masculinity of a speaker's voice using a visual analog scale, rather than completing a binary gender identification task. The results revealed that listeners rated speakers more ambiguously in whispered speech than normally phonated speech. An analysis of speaker and token characteristics revealed that in the normally phonated condition listeners consistently use f0 to rate femininity/masculinity. In addition, some evidence was found for possible contributions of formant frequencies, particularly F2, and duration. Taken together, this provides additional evidence for the salience of f0 and F2 for voice and communication intervention among transwomen.






    Software Defined Networking (SDN) centralizes network control to improve network programmability and flexibility. Contrary to wired settings, it is unclear how to support SDN in low power and lossy networks like typical Internet of Things (IoT) ones. Challenges encompass providing reliable in-band connectivity between the centralized controller and out-of-range nodes, and coping with physical limitations of the highly resource-constrained IoT devices. In this work, we present Whisper, an enabler for SDN in low power and lossy networks. The centralized Whisper controller of a network remotely controls nodes' forwarding and cell allocation. To do so, the controller sends carefully computed routing and scheduling messages that are fully compatible with the protocols run in the network. This mechanism ensures the best possible in-band connectivity between the controller and all network nodes, capitalizing on an interface which is already supported by network devices. Whisper's internal algorithms further reduce the number of messages sent by the controller, to make the exerted control as lightweight as possible for the devices. Beyond detailing Whisper's design, we discuss compelling use cases that Whisper unlocks, including rerouting around low-battery devices and providing runtime defense to jamming attacks. We also describe how to implement Whisper in current IoT open standards (RPL and 6TiSCH) without modifying IoT devices' firmware. This shows that Whisper can implement an SDN-like control for distributed low power networks with no specific support for SDN, from legacy to next generation IoT devices. Our testbed experiments show that Whisper successfully controls the network in both the scheduling and routing plane, with significantly less overhead than other SDN-IoT solutions, no additional latency and no packet loss.







    OBJECTIVE: This study compared whispering attempts by adults using tracheoesophageal (TE) speech with those by adults with a larynx. Comparisons were based on listener judgments, visual-perceptual assessment of spectrograms, and measures of the acoustic signal.
    METHODS: This was a prospective, cross-sectional study.
    METHODS: Seventeen TE and 10 laryngeal speakers produced sentences in a whisper and in their spoken voice. Listeners judged sentences as whispered or spoken. Judges signal-typed the spectrograms based on presence-absence of a "voicing bar." Speaking rate, articulation rate, percent pause, and dB sound pressure level were measured.
    RESULTS: Twenty-nine percent of TE speakers were perceived to be whispering on whisper attempts; most others were perceived to be using spoken voice while attempting to whisper. Spectrograms of TE whispering were most often categorized as "mostly voiced." Speaking and articulation rates were slower for TE speakers. There was a significantly greater reduction in speaking rate from spoken to whisper for the TE group. Percent pause did not differ significantly between groups and speaking mode. TE speakers had a significantly smaller difference in dB sound pressure level between spoken and whisper modes.
    CONCLUSIONS: Some individuals using TE speech can whisper based on auditory-perceptual judgment, but most were perceived to be speaking during these attempts. The fact that some TE participants could whisper indicates the behavior is possible and might be considered a therapeutic target if it is of importance to an individual. The percentage of TE speakers who can learn to whisper, and the optimal training approach, are yet to be determined.
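The three timing measures named in the methods can be related to one another with a short sketch. It assumes commonly used definitions (speaking rate includes pause time, articulation rate excludes it, and percent pause is pause time as a share of total time); the abstract does not state the exact definitions or units the authors used, and the function name and numbers are hypothetical.

```python
def timing_measures(syllables, total_s, pause_s):
    """Return (speaking rate, articulation rate, percent pause).

    Rates are in syllables per second (an assumed unit).
    """
    if total_s <= 0 or not 0 <= pause_s < total_s:
        raise ValueError("need 0 <= pause_s < total_s and total_s > 0")
    speaking_rate = syllables / total_s
    articulation_rate = syllables / (total_s - pause_s)
    percent_pause = 100.0 * pause_s / total_s
    return speaking_rate, articulation_rate, percent_pause


# Made-up sentence: 20 syllables in 10 s, of which 2 s are pauses.
print(timing_measures(20, 10.0, 2.0))
```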






    BACKGROUND: Whisper is known to be produced by different speakers differently, especially with respect to glottal configuration that influences glottal aerodynamics. Differences in whisper production and phonation types imply important linguistic information in many languages, are identified in vocal pathologies, are used to communicate mood and emotion, and are used in vocal performance.
    OBJECTIVE: The present study focused on investigating the aerodynamic differences between whisper and phonation at different loudness and adduction levels.
    METHODS: Three men and five women between 20 and 40 years of age participated in the study. Smooth syllable strings of the syllable /baep:/ were whispered and phonated at three different loudness levels (soft, medium, and loud) and three voice qualities (breathy, normal, and pressed). The voice qualities are associated with different adduction levels. This resulted in 18 treatment combinations (three adduction levels × three loudness levels × two sexes).
    RESULTS: A regression analysis was performed using a PROC MIXED procedure with SAS statistical software. Under similar production conditions, subglottal pressure was significantly lower in whisper than in phonation in 10 of 18 combinations, mean glottal airflow was significantly higher in whisper than in phonation in 13 of 18 combinations, and flow resistance was significantly lower in whisper than in phonation in 14 of 18 combinations, with the female subjects demonstrating these trends more frequently than the male subjects do. Of importance, in general, compared with phonation under similar production conditions, whisper is not always accompanied by lower subglottal pressure and higher airflows.
    CONCLUSIONS: Results from this study suggest that the typical finding of lower subglottal pressure, higher glottal airflow, and decreased flow resistance in whisper compared with phonation cannot be generalized to all individuals and depends on the "whisper type." The nine basic production conditions (three loudness levels and three adduction levels) resulted in data that may help explain the wide range of variation of whisper production reported in earlier studies.
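The 18 treatment combinations and the flow resistance measure can be made concrete with a small sketch. The enumeration follows the design stated in the methods (three adduction levels × three loudness levels × two sexes). Flow resistance is computed here as subglottal pressure divided by mean glottal airflow, which is the usual definition but is an assumption about this study; the units and example values are illustrative only.

```python
from itertools import product

ADDUCTION = ("breathy", "normal", "pressed")
LOUDNESS = ("soft", "medium", "loud")
SEX = ("female", "male")

# Every adduction x loudness x sex cell of the design: 3 * 3 * 2 = 18.
combinations = list(product(ADDUCTION, LOUDNESS, SEX))
print(len(combinations))


def flow_resistance(subglottal_pressure, mean_airflow):
    """Pressure divided by flow (e.g. cmH2O per L/s, assumed units)."""
    if mean_airflow <= 0:
        raise ValueError("mean airflow must be positive")
    return subglottal_pressure / mean_airflow


# Made-up values: at equal pressure, a higher airflow gives a lower resistance.
print(flow_resistance(8.0, 0.25), flow_resistance(8.0, 0.5))
```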





