Speech

  • Article type: Journal Article
    OBJECTIVE: To compare and correlate musical performance anxiety (MPA) and vocal self-perception among amateur evangelical singers, focusing on the interaction between anxiety and aspects of performance in this sample.
    METHODS: This study employed a cross-sectional and quantitative approach, involving 75 amateur gospel singers from evangelical churches, aged between 18 and 59 years. Data collection included the administration of a sample identification and characterization questionnaire, the Brazilian Portuguese version of the Kenny Music Performance Anxiety Inventory (K-MPAI), and the Singing Voice Handicap Index (S-VHI). The descriptive analysis used absolute and relative frequencies, measures of central tendency, and dispersion (mean and standard deviation [SD]). To compare the vocal self-assessment protocols and performance aspects, the Kruskal-Wallis test was applied. Spearman's correlation test was used for correlation analysis. All analyses were conducted with a significance level set at 5% (P < 0.05).
    RESULTS: Vocal warm-up and cool-down activities, vocal discomfort after performance, and vocal self-assessment were significantly associated with scores on S-VHI, and the variable "instruments louder than voices" was associated with the K-MPAI score. Participants exhibited a mean K-MPAI score of 85.12 points (SD ± 36.6), and the vocal handicap of the sample had a mean score of 45.22 (SD ± 32.3). There was no statistically significant correlation between the protocols.
    CONCLUSIONS: Incorporating vocal warm-up and cool-down activities was significantly associated with lower scores on S-VHI. Conversely, those experiencing postperformance vocal discomfort exhibited higher scores on S-VHI. Moreover, the absence of correlation between the assessment protocols suggests that while significant levels of voice handicap were observed, a direct link to MPA cannot be definitively established. Overall, these findings contribute to a nuanced understanding of the multifaceted factors shaping vocal health and performance among amateur evangelical singers, thereby guiding future research and interventions in this field.
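The two statistics named in the METHODS above can be sketched from scratch. The scores below are hypothetical illustrations, not the study's data, and tie corrections and p-values are omitted for brevity.

```python
# From-scratch sketch of Kruskal-Wallis H and Spearman's rho on
# hypothetical scores (not the study's data); no tie correction, no p-values.

def ranks(values):
    """Average ranks (1-based), handling ties by midranking."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # midrank for a run of tied values
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def kruskal_wallis_h(groups):
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    r = ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(r[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * h - 3 * (n + 1)

def spearman_rho(x, y):
    # Pearson correlation computed on the ranks
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical S-VHI scores for three vocal self-assessment groups
h = kruskal_wallis_h([[12, 18, 25], [40, 55, 47], [70, 65, 80]])
# Hypothetical paired K-MPAI and S-VHI scores
rho = spearman_rho([85, 90, 60, 120, 75], [40, 55, 30, 70, 45])
print(f"H = {h:.2f}, rho = {rho:.2f}")  # → H = 7.20, rho = 0.90
```

In practice one would compare H and rho against their reference distributions to obtain the p-values the abstract reports at the 5% level.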

  • Article type: Journal Article
    Research has indicated unique challenges in audiovisual integration of speech among autistic individuals, although methodological differences have led to divergent findings. We conducted a systematic literature search to identify studies that measured audiovisual speech integration among both autistic and non-autistic individuals. Across the 18 identified studies (combined N = 952), autistic individuals showed impaired audiovisual integration compared to their non-autistic peers (g = 0.69, 95% CI [0.53, 0.85], p < .001). This difference was not found to be influenced by participants' mean ages, studies' sample sizes, risk-of-bias scores, or paradigms investigated. However, a subgroup analysis suggested that child studies may show larger between-group differences than adult ones. The prevailing pattern of impaired audiovisual speech integration in autism may have cascading effects on communicative and social behavior. However, small samples and inconsistency in design/analysis translated into considerable heterogeneity in findings and opacity regarding the influence of underlying unisensory and attentional factors. We recommend three key directions for future research: larger samples, more research with adults, and standardization of methodology and analytical approaches.
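A pooled effect size like the g = 0.69 [0.53, 0.85] above is typically an inverse-variance-weighted combination of study-level effects. The sketch below shows fixed-effect pooling on four invented study results, not the 18 studies actually meta-analyzed.

```python
import math

# Inverse-variance (fixed-effect) pooling of study-level Hedges' g values.
# The (g, variance) pairs are hypothetical, not the meta-analysis data.
studies = [
    (0.80, 0.04),
    (0.55, 0.09),
    (0.70, 0.02),
    (0.60, 0.06),
]

weights = [1 / v for _, v in studies]                       # w_i = 1 / var_i
pooled_g = sum(w * g for (g, _), w in zip(studies, weights)) / sum(weights)
se = math.sqrt(1 / sum(weights))                            # SE of pooled effect
ci_low, ci_high = pooled_g - 1.96 * se, pooled_g + 1.96 * se
print(f"pooled g = {pooled_g:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

A random-effects model (as is common when heterogeneity is high, which the abstract notes) would additionally add a between-study variance term to each study's variance before weighting.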

  • Article type: Journal Article
    BACKGROUND: Respiratory and bulbar dysfunctions (including swallowing, feeding, and speech functions) are key symptoms of spinal muscular atrophy (SMA), especially in its most severe forms. Demonstrating the long-term efficacy of disease-modifying therapies (DMTs) necessitates an understanding of SMA natural history.
    OBJECTIVE: This study summarizes published natural history data on respiratory, swallowing, feeding, and speech functions in patients with SMA not receiving DMTs.
    METHODS: Electronic databases (Embase, MEDLINE, and Evidence-Based Medicine Reviews) were searched from database inception to June 27, 2022, for studies reporting data on respiratory and/or bulbar function outcomes in Types 1-3 SMA. Data were extracted into a predefined template and a descriptive summary of these data was provided.
    RESULTS: Ninety-one publications were included: 43 reported data on respiratory, swallowing, feeding, and/or speech function outcomes. Data highlighted early loss of respiratory function for patients with Type 1 SMA, with ventilatory support typically required by 12 months of age. Patients with Type 2 or 3 SMA were at risk of losing respiratory function over time, with ventilatory support initiated between the first and fifth decades of life. Swallowing and feeding difficulties, including choking, chewing problems, and aspiration, were reported in patients across the SMA spectrum. Swallowing and feeding difficulties, and a need for non-oral nutritional support, were reported before 1 year of age in Type 1 SMA, and before 10 years of age in Type 2 SMA. Limited data relating to other bulbar functions were collated.
    CONCLUSIONS: Natural history data demonstrate that untreated patients with SMA experience respiratory and bulbar function deterioration, with a more rapid decline associated with greater disease severity. This study provides a comprehensive repository of natural history data on bulbar function in SMA, and it highlights that consistent assessment of outcomes in this area is necessary to benefit understanding and approval of new treatments.

  • Article type: Dataset
    The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up and help beat coronavirus' digital survey alongside demographic, symptom and self-reported respiratory condition data. Digital survey submissions were linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,565 of 72,999 participants and 24,105 of 25,706 positive cases. Respiratory symptoms were reported by 45.6% of participants. This dataset has additional potential uses for bioacoustics research, with 11.3% of participants self-reporting asthma, and 27.2% with linked influenza PCR test results.
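The linkage counts reported above imply the following coverage rates, computed here directly from the abstract's figures:

```python
# Coverage percentages implied by the linkage counts in the abstract
participants, linked = 72_999, 70_565      # all participants vs PCR-linked
positives, linked_pos = 25_706, 24_105     # positive cases vs PCR-linked

print(f"PCR linkage: {linked / participants:.1%}")            # → 96.7%
print(f"Positive-case linkage: {linked_pos / positives:.1%}")  # → 93.8%
```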

  • Article type: Journal Article
    There is no room for pragmatic expectations about communicative interactions in core cognition. Spelke takes the combinatorial power of the human language faculty to overcome the limits of core cognition. The question is: Why should the combinatorial power of the human language faculty support infants' pragmatic expectations not merely about speech, but also about nonverbal communicative interactions?

  • Article type: Journal Article
    Existing end-to-end speech recognition methods typically employ hybrid decoders based on CTC and Transformer. However, the issue of error accumulation in these hybrid decoders hinders further improvements in accuracy. Additionally, most existing models are built upon Transformer architecture, which tends to be complex and unfriendly to small datasets. Hence, we propose a Nonlinear Regularization Decoding Method for Speech Recognition. Firstly, we introduce the nonlinear Transformer decoder, breaking away from traditional left-to-right or right-to-left decoding orders and enabling associations between any characters, mitigating the limitations of Transformer architectures on small datasets. Secondly, we propose a novel regularization attention module to optimize the attention score matrix, reducing the impact of early errors on later outputs. Finally, we introduce a tiny model to address the challenge of overly large model parameters. The experimental results indicate that our model demonstrates good performance. Compared to the baseline, our model achieves recognition improvements of 0.12%, 0.54%, 0.51%, and 1.2% on the Aishell1, Primewords, Free ST Chinese Corpus, and Uyghur Common Voice 16.1 datasets, respectively.

  • Article type: Journal Article
    Accommodating talker variability is a complex and multi-layered cognitive process. It involves shifting attention to the vocal characteristics of the talker as well as the linguistic content of their speech. Due to an interdependence between voice and phonological processing, multi-talker environments typically incur additional processing costs compared to single-talker environments. A failure or inability to efficiently distribute attention over multiple acoustic cues in the speech signal may have detrimental language learning consequences. Yet, no studies have examined effects of multi-talker processing in populations with atypical perceptual, social and language processing for communication, including autistic people. Employing a classic word-monitoring task, we investigated effects of talker variability in Australian English autistic (n = 24) and non-autistic (n = 28) adults. Listeners responded to target words (e.g., apple, duck, corn) in randomised sequences of words. Half of the sequences were spoken by a single talker and the other half by multiple talkers. Results revealed that autistic participants' sensitivity scores to accurately-spotted target words did not differ to those of non-autistic participants, regardless of whether they were spoken by a single or multiple talkers. As expected, the non-autistic group showed the well-established processing cost associated with talker variability (e.g., slower response times). Remarkably, autistic listeners' response times did not differ across single- or multi-talker conditions, indicating they did not show perceptual processing costs when accommodating talker variability. The present findings have implications for theories of autistic perception and speech and language processing.

  • Article type: Journal Article
    The present paper examines how English native speakers produce scopally ambiguous sentences and how they make use of gestures and prosody for disambiguation. As a case in point, the participants in the present study produced English negative quantifiers. These appear in two different positions, as in (1) The election of no candidate was a surprise (a: 'for those elected, none of them was a surprise'; b: 'no candidate was elected, and that was a surprise') and (2) No candidate's election was a surprise (a: 'for those elected, none of them was a surprise'; b: # 'no candidate was elected, and that was a surprise'). We were able to investigate the gesture production and the prosodic patterns of the positional effects (i.e., the a-interpretation is available at two different positions in 1 and 2) and the interpretation effects (i.e., two different interpretations are available in the same position in 1). We discovered that the participants tended to launch more head shakes in the (a) interpretation despite the different positions, but more head nods/beats in the (b) interpretation. While there is no difference in the prosody of no in the (a) and (b) interpretations in (1), there are pitch and durational differences between the (a) interpretations in (1) and (2). This study points out abstract similarities in gestural movements across languages such as Catalan and Spanish (Prieto et al. in Lingua 131:136-150, 2013. 10.1016/j.lingua.2013.02.008; Tubau et al. in Linguist Rev 32(1):115-142, 2015. 10.1515/tlr-2014-0016), showing that meaning is crucial for gesture patterns. We emphasize that gesture patterns disambiguate ambiguous interpretations when prosody cannot do so.

  • Article type: Journal Article
    Lip language recognition urgently needs wearable and easy-to-use interfaces for interference-free and high-fidelity lip-reading acquisition, along with accompanying data-efficient decoder-modeling methods. Existing solutions suffer from unreliable lip reading, are data hungry, and exhibit poor generalization. Here, we propose a wearable lip language decoding technology that enables interference-free and high-fidelity acquisition of lip movements and data-efficient recognition of fluent lip language based on wearable motion capture and continuous lip speech movement reconstruction. The method allows us to artificially generate any desired continuous speech dataset from a very limited corpus of word samples from users. By using these artificial datasets to train the decoder, we achieve an average accuracy of 92.0% across individuals (n = 7) for actual continuous and fluent lip speech recognition of 93 English sentences, with no training burden on users because all training datasets are artificially generated. Our method greatly minimizes users' training/learning load and presents a data-efficient and easy-to-use paradigm for lip language recognition.

  • Article type: Journal Article
    Experiments on visually grounded, definite reference production often manipulate simple visual scenes in the form of grids filled with objects, for example, to test how speakers are affected by the number of objects that are visible. Regarding the latter, it was found that speech onset times increase along with domain size, at least when speakers refer to nonsalient target objects that do not pop out of the visual domain. This finding suggests that even in the case of many distractors, speakers perform object-by-object scans of the visual scene. The current study investigates whether this systematic processing strategy can be explained by the simplified nature of the scenes that were used, and if different strategies can be identified for photo-realistic visual scenes. In doing so, we conducted a preregistered experiment that manipulated domain size and saturation; replicated the measures of speech onset times; and recorded eye movements to measure speakers' viewing strategies more directly. Using controlled photo-realistic scenes, we find (1) that speech onset times increase linearly as more distractors are present; (2) that larger domains elicit relatively fewer fixation switches back and forth between the target and its distractors, mainly before speech onset; and (3) that speakers fixate the target relatively less often in larger domains, mainly after speech onset. We conclude that careful object-by-object scans remain the dominant strategy in our photo-realistic scenes, to a limited extent combined with low-level saliency mechanisms. A relevant direction for future research would be to employ less controlled photo-realistic stimuli that do allow for interpretation based on context.
