Voice

  • Article Type: Journal Article
    OBJECTIVE: Characterization of psychotherapy as the "talking cure" de-emphasizes the importance of an active listener on the curative effect of talking. We test whether the working alliance and its benefits emerge from expression of voice, per se, or whether active listening is needed. We examine the role of listening in a social identity model of working alliance.
    METHODS: University student participants in a laboratory experiment spoke about stress management to another person (a confederate student) who either did or did not engage in active listening. Participants reported their perceptions of alliance, key social-psychological variables, and well-being.
    RESULTS: Active listening led to significantly higher ratings of alliance, procedural justice, social identification, and identity leadership, compared to no active listening. Active listening also led to greater positive affect and satisfaction. Ultimately, an explanatory path model was supported in which active listening predicted working alliance through social identification, identity leadership, and procedural justice.
    CONCLUSIONS: Listening quality enhances alliance and well-being in a manner consistent with a social identity model of working alliance, and is a strategy for facilitating alliance in therapy.
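    As a rough illustration of the explanatory path model reported above, the sketch below estimates indirect effects of active listening through three parallel mediators using ordinary least-squares regressions on synthetic data. All column names, sample size, and effect sizes are hypothetical, and the authors' actual modeling approach may differ.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic stand-in data; in practice this would be the trial data.
    rng = np.random.default_rng(0)
    n = 120
    listening = rng.integers(0, 2, n)  # 0 = no active listening, 1 = active listening
    df = pd.DataFrame({
        "listening": listening,
        "social_identification": 0.5 * listening + rng.normal(size=n),
        "identity_leadership": 0.4 * listening + rng.normal(size=n),
        "procedural_justice": 0.6 * listening + rng.normal(size=n),
    })
    df["alliance"] = (0.3 * df["social_identification"]
                      + 0.3 * df["identity_leadership"]
                      + 0.2 * df["procedural_justice"]
                      + rng.normal(size=n))

    mediators = ["social_identification", "identity_leadership", "procedural_justice"]

    # a-paths: experimental condition -> each mediator
    a = {m: smf.ols(f"{m} ~ listening", data=df).fit().params["listening"]
         for m in mediators}

    # b-paths and direct effect: mediators + condition -> working alliance
    full = smf.ols("alliance ~ listening + " + " + ".join(mediators), data=df).fit()
    for m in mediators:
        print(f"{m}: indirect effect ~ {a[m] * full.params[m]:.3f}")
    print(f"direct effect of listening: {full.params['listening']:.3f}")
    ```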

  • Article Type: Journal Article
    OBJECTIVE: To compare and correlate musical performance anxiety (MPA) and vocal self-perception among amateur evangelical singers, focusing on the interaction between anxiety and aspects of performance in this sample.
    METHODS: This study employed a cross-sectional and quantitative approach, involving 75 amateur gospel singers from evangelical churches, aged between 18 and 59 years. Data collection included the administration of a sample identification and characterization questionnaire, the Brazilian Portuguese version of the Kenny Music Performance Anxiety Inventory (K-MPAI), and the Singing Voice Handicap Index (S-VHI). The descriptive analysis used absolute and relative frequencies, measures of central tendency, and dispersion (mean and standard deviation [SD]). To compare the vocal self-assessment protocols and performance aspects, the Kruskal-Wallis test was applied. Spearman's correlation test was used for correlation analysis. All analyses were conducted with a significance level set at 5% (P < 0.05).
    RESULTS: Vocal warm-up and cool-down activities, vocal discomfort after performance, and vocal self-assessment were significantly associated with scores on S-VHI, and the variable "instruments louder than voices" was associated with the K-MPAI score. Participants exhibited a mean K-MPAI score of 85.12 points (SD ± 36.6), and the vocal handicap of the sample had a mean score of 45.22 (SD ± 32.3). There was no statistically significant correlation between the protocols.
    CONCLUSIONS: Incorporating vocal warm-up and cool-down activities was significantly associated with lower scores on S-VHI. Conversely, those experiencing postperformance vocal discomfort exhibited higher scores on S-VHI. Moreover, the absence of correlation between the assessment protocols suggests that while significant levels of voice handicap were observed, a direct link to MPA cannot be definitively established. Overall, these findings contribute to a nuanced understanding of the multifaceted factors shaping vocal health and performance among amateur evangelical singers, thereby guiding future research and interventions in this field.
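    The Kruskal-Wallis comparison and Spearman correlation named in METHODS map directly onto SciPy calls; the sketch below runs both on synthetic scores with invented group means, not the study's data.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    # Hypothetical S-VHI scores grouped by a performance variable
    # (e.g., whether the singer warms up: never / sometimes / always).
    never, sometimes, always = (rng.normal(m, 15, 25) for m in (55, 45, 35))
    h, p = stats.kruskal(never, sometimes, always)
    print(f"Kruskal-Wallis: H = {h:.2f}, P = {p:.4f} (significant if P < 0.05)")

    # Spearman correlation between K-MPAI and S-VHI totals.
    kmpai = rng.normal(85, 36, 75)
    svhi = rng.normal(45, 32, 75)
    rho, p = stats.spearmanr(kmpai, svhi)
    print(f"Spearman: rho = {rho:.2f}, P = {p:.4f}")
    ```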

  • Article Type: Journal Article
    The developmental trajectory of emotion recognition (ER) skills is thought to vary by nonverbal modality, with vocal ER becoming mature later than facial ER. To investigate potential neural mechanisms contributing to this dissociation at a behavioural level, the current study examined whether youth's neural functional connectivity during vocal and facial ER tasks showed differential developmental change across time. Youth ages 8-19 (n = 41) completed facial and vocal ER tasks while undergoing functional magnetic resonance imaging, at two timepoints (1 year apart; n = 36 for behavioural data, n = 28 for neural data). Partial least squares analyses revealed that functional connectivity during ER is both distinguishable by modality (with different patterns of connectivity for facial vs. vocal ER) and across time-with changes in connectivity being particularly pronounced for vocal ER. ER accuracy was greater for faces than voices, and positively associated with age; although task performance did not change appreciably across a 1-year period, changes in latent functional connectivity patterns across time predicted participants' ER accuracy at Time 2. Taken together, these results suggest that vocal and facial ER are supported by distinguishable neural correlates that may undergo different developmental trajectories. Our findings are also preliminary evidence that changes in network integration may support the development of ER skills in childhood and adolescence.
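    The study's partial least squares analysis is of the neuroimaging (task-PLS) variety; as a loose, regression-flavored stand-in, the sketch below uses scikit-learn's PLSRegression to relate latent connectivity patterns to later ER accuracy. All dimensions and variable names are assumptions on synthetic data.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(2)
    n_subj, n_edges = 28, 300  # hypothetical: 28 youths, 300 connectivity edges
    conn_change = rng.normal(size=(n_subj, n_edges))  # Time 2 minus Time 1 connectivity
    # Synthetic accuracy weakly driven by a few edges, plus noise.
    er_accuracy_t2 = conn_change[:, :5].mean(axis=1) + rng.normal(scale=0.5, size=n_subj)

    pls = PLSRegression(n_components=2)
    pls.fit(conn_change, er_accuracy_t2)
    scores = pls.transform(conn_change)  # latent connectivity pattern per subject
    r = np.corrcoef(scores[:, 0], er_accuracy_t2)[0, 1]
    print(f"latent-score / accuracy correlation: {r:.2f}")
    ```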

  • Article Type: Journal Article
    Currently, there is no consensus on the characterization of the human voice. The objective of the present study is to describe the myoelectric behavior of the extrinsic musculature of the larynx in 146 people with normal voice (Spanish speakers), aged between 20 and 50 years old. Different vocal tasks were recorded using a surface electromyograph (SEMG). In all vocal tasks, it was observed that women had higher activation (µV) in the suprahyoid and sternocleidomastoid muscles than men, while men had higher activation in the infrahyoid muscles. SEMG is a valid procedure to help define normal vocal characteristics in the studied population, providing reference values during clinical examination. However, it is necessary to adopt a universal system of assessment tasks and standardized measurement techniques to allow for comparisons with future studies.
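    The abstract does not name the test behind the men-versus-women comparison, so the sketch below simply assumes a nonparametric comparison of activation amplitudes; all values are synthetic.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    # Hypothetical RMS amplitudes (µV) for one vocal task.
    women = rng.normal(60, 12, 73)  # suprahyoid activation, women
    men = rng.normal(52, 12, 73)    # suprahyoid activation, men
    u, p = stats.mannwhitneyu(women, men, alternative="two-sided")
    print(f"Mann-Whitney U = {u:.0f}, P = {p:.4f}")
    ```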

  • Article Type: Journal Article
    People often interact with groups (i.e., ensembles) during social interactions. Given that group-level information is important in navigating social environments, we expect perceptual sensitivity to aspects of groups that are relevant for personal threat as well as social belonging. Most ensemble perception research has focused on visual ensembles, with little research looking at auditory or vocal ensembles. Across four studies, we present evidence that (i) perceivers accurately extract the sex composition of a group from voices alone, (ii) judgments of threat increase concomitantly with the number of men, and (iii) listeners' sense of belonging depends on the number of same-sex others in the group. This work advances our understanding of social cognition, interpersonal communication, and ensemble coding to include auditory information, and reveals people's ability to extract relevant social information from brief exposures to vocalizing groups.

  • Article Type: Journal Article
    OBJECTIVE: To study the effects of playing mother's recorded voice to preterm infants in the NICU on their mothers' mental health as measured by the Depression, Anxiety and Stress Scale-21 (DASS-21) questionnaire.
    METHODS: This was a pilot single-center prospective randomized controlled trial done at a level IV NICU. The trial was registered at clinicaltrials.gov (NCT04559620). Inclusion criteria were mothers of preterm infants with gestational ages between 26 and 30 weeks. The DASS-21 questionnaire was administered to all the enrolled mothers in the first week after birth, followed by recording of their voice by the music therapists. In the interventional group, the recorded maternal voice was played into the infant incubator between 15 and 21 days of life. A second DASS-21 was administered between 21 and 23 days of life. The Wilcoxon rank-sum test was used to compare DASS-21 scores between the two groups, and the Wilcoxon signed-rank test was used to compare the pre- and post-intervention DASS-21 scores.
    RESULTS: Forty eligible mothers were randomized: 20 to the intervention group and 20 to the control group. The baseline maternal and neonatal characteristics were similar between the two groups. There was no significant difference in the DASS-21 scores between the two groups at baseline or after the study intervention. There was no difference in the pre- and post-interventional DASS-21 scores or its individual components in the experimental group. There was a significant decrease in the total DASS-21 score and the anxiety component of the DASS-21 between weeks 1 and 4 in the control group.
    CONCLUSIONS: In this pilot randomized controlled study, recorded maternal voice played into the preterm infant's incubator did not have any effect on maternal mental health as measured by the DASS-21 questionnaire. Data obtained in this pilot study are useful for future RCTs (randomized controlled trials) addressing this important issue.
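    The two Wilcoxon tests named in METHODS correspond directly to SciPy calls; the sketch below runs both on synthetic DASS-21 totals. Group sizes match the study, but every score is invented.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    # Hypothetical DASS-21 totals for 20 mothers per arm.
    intervention_w4 = rng.normal(20, 8, 20)
    control_w4 = rng.normal(22, 8, 20)

    # Between groups: Wilcoxon rank-sum test.
    stat, p = stats.ranksums(intervention_w4, control_w4)
    print(f"rank-sum: stat = {stat:.2f}, P = {p:.3f}")

    # Within a group, week 1 vs. week 4: Wilcoxon signed-rank test.
    control_w1 = control_w4 + rng.normal(3, 4, 20)  # pretend week-1 scores
    stat, p = stats.wilcoxon(control_w1, control_w4)
    print(f"signed-rank: stat = {stat:.2f}, P = {p:.3f}")
    ```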

  • Article Type: Journal Article
    When listeners hear a voice, they rapidly form a complex first impression of who the person behind that voice might be. We characterize how these multivariate first impressions from voices emerge over time across different levels of abstraction using electroencephalography and representational similarity analysis. We find that for eight perceived physical (gender, age, and health), trait (attractiveness, dominance, and trustworthiness), and social characteristics (educatedness and professionalism), representations emerge early (~80 ms after stimulus onset), with voice acoustics contributing to those representations between ~100 ms and 400 ms. While impressions of person characteristics are highly correlated, we can find evidence for highly abstracted, independent representations of individual person characteristics. These abstracted representations emerge gradually over time. That is, representations of physical characteristics (age, gender) arise early (from ~120 ms), while representations of some trait and social characteristics emerge later (~360 ms onward). The findings align with recent theoretical models and shed light on the computations underpinning person perception from voices.
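    Time-resolved representational similarity analysis of the kind described here can be sketched by correlating a model dissimilarity matrix with the neural dissimilarity matrix at each timepoint. Everything below (array shapes, the use of perceived-age ratings as the model) is an assumption for illustration, not the authors' pipeline.

    ```python
    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    rng = np.random.default_rng(5)
    n_stim, n_chan, n_times = 30, 64, 200  # hypothetical: stimuli x channels x samples
    eeg = rng.normal(size=(n_stim, n_chan, n_times))
    perceived_age = rng.uniform(20, 70, n_stim)  # hypothetical ratings

    # Model RDM: pairwise differences in perceived age.
    model_rdm = pdist(perceived_age[:, None])

    # Correlate the model RDM with the neural RDM at every timepoint.
    rsa_timecourse = np.array([
        spearmanr(model_rdm, pdist(eeg[:, :, t])).correlation
        for t in range(n_times)
    ])
    print("peak model-neural correlation at sample", rsa_timecourse.argmax())
    ```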

  • Article Type: Journal Article
    Singing is socially important but constrains voice acoustics, potentially masking certain aspects of vocal identity. Little is known about how well listeners extract talker details from sung speech or identify talkers across the sung and spoken modalities. Here, listeners (n = 149) were trained to recognize sung or spoken voices and then tested on their identification of these voices in both modalities. Learning vocal identities was initially easier through speech than song. At test, cross-modality voice recognition was above chance, but weaker than within-modality recognition. We conclude that talker information is accessible in sung speech, despite acoustic constraints in song.
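    An "above chance" result of this kind is commonly checked with a binomial test; the sketch below shows the shape of such a check. The trial counts and chance level are invented, as the abstract does not report them.

    ```python
    from scipy.stats import binomtest

    n_trials = 48   # hypothetical identification trials per listener
    n_correct = 22  # hypothetical correct cross-modality identifications
    chance = 1 / 4  # assumed 4-alternative forced choice

    result = binomtest(n_correct, n_trials, p=chance, alternative="greater")
    print(f"accuracy = {n_correct / n_trials:.2f}, P = {result.pvalue:.4f}")
    ```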

  • Article Type: Journal Article
    Parkinson's Disease (PD) is a prevalent neurological condition characterized by motor and cognitive impairments, typically manifesting around the age of 50 and presenting symptoms such as gait difficulties and speech impairments. Although a cure remains elusive, symptom management through medication is possible. Timely detection is pivotal for effective disease management. In this study, we leverage Machine Learning (ML) and Deep Learning (DL) techniques, specifically K-Nearest Neighbor (KNN) and Feed-forward Neural Network (FNN) models, to differentiate between individuals with PD and healthy individuals based on voice signal characteristics. Our dataset, sourced from the University of California at Irvine (UCI), comprises 195 voice recordings collected from 31 patients. To optimize model performance, we employ various strategies including Synthetic Minority Over-sampling Technique (SMOTE) for addressing class imbalance, Feature Selection to identify the most relevant features, and hyperparameter tuning using RandomizedSearchCV. Our experimentation reveals that the FNN and KSVM models, trained on an 80-20 split of the dataset for training and testing respectively, yield the most promising results. The FNN model achieves an impressive overall accuracy of 99.11%, with 98.78% recall, 99.96% precision, and a 99.23% f1-score. Similarly, the KSVM model demonstrates strong performance with an overall accuracy of 95.89%, recall of 96.88%, precision of 98.71%, and an f1-score of 97.62%. Overall, our study showcases the efficacy of ML and DL techniques in accurately identifying PD from voice signals, underscoring the potential for these approaches to contribute significantly to early diagnosis and intervention strategies for Parkinson's Disease.
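    A pipeline combining SMOTE, RandomizedSearchCV hyperparameter tuning, and a KNN classifier, as described above, might be sketched as follows with scikit-learn and imbalanced-learn. The data here are a synthetic stand-in for the UCI recordings, and the paper's FNN and exact feature set are not reproduced.

    ```python
    import numpy as np
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.model_selection import RandomizedSearchCV, train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(6)
    # Synthetic stand-in: 195 recordings x 22 acoustic features, imbalanced labels.
    X = rng.normal(size=(195, 22))
    y = (rng.random(195) < 0.75).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)

    # SMOTE sits inside the pipeline so oversampling happens only on training folds.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("smote", SMOTE(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ])
    search = RandomizedSearchCV(
        pipe,
        {"knn__n_neighbors": range(1, 20), "knn__weights": ["uniform", "distance"]},
        n_iter=10, cv=5, random_state=0,
    )
    search.fit(X_tr, y_tr)
    print("best params:", search.best_params_)
    print("held-out accuracy:", search.score(X_te, y_te))
    ```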

  • Article Type: Journal Article
    BACKGROUND: The digital era has witnessed an escalating dependence on digital platforms for news and information, coupled with the advent of "deepfake" technology. Deepfakes, leveraging deep learning models on extensive data sets of voice recordings and images, pose substantial threats to media authenticity, potentially leading to unethical misuse such as impersonation and the dissemination of false information.
    OBJECTIVE: To counteract this challenge, this study aims to introduce the concept of innate biological processes to discern between authentic human voices and cloned voices. We propose that the presence or absence of certain perceptual features, such as pauses in speech, can effectively distinguish between cloned and authentic audio.
    METHODS: A total of 49 adult participants representing diverse ethnic backgrounds and accents were recruited. Each participant contributed voice samples for the training of up to 3 distinct voice cloning text-to-speech models and 3 control paragraphs. Subsequently, the cloning models generated synthetic versions of the control paragraphs, resulting in a data set consisting of up to 9 cloned audio samples and 3 control samples per participant. We analyzed the speech pauses caused by biological actions such as respiration, swallowing, and cognitive processes. Five audio features corresponding to speech pause profiles were calculated. Differences between authentic and cloned audio for these features were assessed, and 5 classical machine learning algorithms were implemented using these features to create a prediction model. The generalization capability of the optimal model was evaluated through testing on unseen data, incorporating a model-naive generator, a model-naive paragraph, and model-naive participants.
    RESULTS: Cloned audio exhibited significantly increased time between pauses (P<.001), decreased variation in speech segment length (P=.003), increased overall proportion of time speaking (P=.04), and decreased rates of micro- and macropauses in speech (both P=.01). Five machine learning models were implemented using these features, with the AdaBoost model demonstrating the highest performance, achieving a 5-fold cross-validation balanced accuracy of 0.81 (SD 0.05). Other models included support vector machine (balanced accuracy 0.79, SD 0.03), random forest (balanced accuracy 0.78, SD 0.04), logistic regression, and decision tree (balanced accuracies 0.76, SD 0.10 and 0.72, SD 0.06). When evaluating the optimal AdaBoost model, it achieved an overall test accuracy of 0.79 when predicting unseen data.
    CONCLUSIONS: The incorporation of perceptual, biological features into machine learning models demonstrates promising results in distinguishing between authentic human voices and cloned audio.
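    Cross-validated balanced accuracy for an AdaBoost classifier over pause-profile features, as reported in RESULTS, can be sketched as below. The five features and the class separation are synthetic stand-ins, not the study's data.

    ```python
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(7)
    # Synthetic stand-in: 5 pause-profile features per clip (e.g., time between
    # pauses, speech-segment-length variation, proportion of time speaking,
    # micro- and macropause rates).
    n_clips = 400
    X = rng.normal(size=(n_clips, 5))
    y = rng.integers(0, 2, n_clips)  # 0 = authentic, 1 = cloned
    X[y == 1, 0] += 1.0              # cloned audio: longer gaps between pauses

    clf = AdaBoostClassifier(n_estimators=200, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")
    print(f"5-fold balanced accuracy: {scores.mean():.2f} (SD {scores.std():.2f})")
    ```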
