MeSH: Aged; Humans; Middle Aged; California; Cognition; Speech; Speech Perception; Speech Recognition Software

Source: DOI:10.1002/alz.067887

Abstract:
BACKGROUND: Recent reports have investigated the use of automatic speech recognition (ASR) to analyze and score verbal responses in cognitive tests. ASR scoring is objective, permits the efficient computerized administration of verbal tests, and generates timestamps that enable the detailed temporal analysis of responses. However, ASR transcription accuracy varies by engine, task, and participant, and ASR can incorrectly score responses from participants with atypical speech patterns. Here we describe the speech-transcription pipeline of the California Cognitive Assessment Battery (CCAB), which incorporates consensus ASR (CASR) to produce more accurate transcripts than is possible with any single ASR engine. We also developed a Transcript Review Tool (TRT), which facilitates the manual correction of mis-transcribed words for participants whose speech is poorly transcribed.
METHODS: Figure 1 shows the CCAB speech-transcription pipeline. Real-time ASR transcriptions are obtained, along with transcriptions of the digital recordings of responses produced by six cloud-based ASR engines (e.g., Google). Individual transcripts are then combined to produce a "consensus" transcript and a transcription confidence measure based primarily on the agreement between ASR engines (Figure 2). If needed, "consensus" transcripts can be manually corrected using the Transcript Review Tool, which enables the review of all words or of only those words below a predefined CASR confidence threshold (Figure 3).
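The consensus step can be illustrated with a minimal Python sketch. The abstract does not specify how CCAB aligns or combines transcripts, so everything here is an assumption made for illustration: the function names `consensus_transcript` and `words_needing_review` are hypothetical, the per-engine transcripts are assumed to be already aligned into equal-length word slots, the consensus word is chosen by simple majority vote, and confidence is taken to be the fraction of engines that agree with the chosen word.

```python
from collections import Counter


def consensus_transcript(aligned):
    """Majority-vote consensus over pre-aligned transcripts (illustrative only).

    aligned: list of per-engine word lists, all of the same length.
    Returns a list of (word, confidence) pairs, where confidence is the
    fraction of engines that produced the winning word for that slot.
    """
    if not aligned or len({len(t) for t in aligned}) != 1:
        raise ValueError("expected equal-length, pre-aligned transcripts")
    n_engines = len(aligned)
    result = []
    for slot in zip(*aligned):
        # Count case-insensitive votes for each candidate word in this slot.
        counts = Counter(word.lower() for word in slot)
        word, votes = counts.most_common(1)[0]
        result.append((word, votes / n_engines))
    return result


def words_needing_review(consensus, threshold=0.6):
    """Return (index, word, confidence) for words below the threshold."""
    return [(i, w, c) for i, (w, c) in enumerate(consensus) if c < threshold]


if __name__ == "__main__":
    # Six hypothetical engine outputs for a three-digit span response.
    engines = [
        ["three", "nine", "five"],
        ["three", "nine", "five"],
        ["three", "nine", "five"],
        ["three", "nine", "fine"],
        ["three", "mine", "fine"],
        ["tree", "nine", "hive"],
    ]
    consensus = consensus_transcript(engines)
    for index, word, conf in words_needing_review(consensus, threshold=0.6):
        print(f"review word {index}: {word!r} (confidence {conf:.2f})")
```

In this toy example only the third word, on which just three of the six engines agree, falls below the assumed 0.6 threshold and would be flagged for manual review. A real pipeline would also need to align transcripts of different lengths before voting.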
RESULTS: ASR transcriptions were obtained from 442 healthy adults (mean age = 65.1 ± 14.4) who each underwent three days of cognitive testing that included 25 verbal tests. In all, approximately 276 hours of speech were transcribed. Preliminary analyses show that CASR transcription accuracy surpassed 99% for tests with limited response sets (e.g., digit span, verbal list learning, and face-name binding) and exceeded 95% for discursive speech tests (e.g., picture description and logical memory).
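The abstract does not state how transcription accuracy was computed. One common convention, assumed here purely for illustration, is word-level accuracy defined as one minus the word error rate, where the error count is the edit distance (substitutions, insertions, and deletions) between a reference transcript and the ASR output. The function names below are hypothetical.

```python
def word_edit_distance(ref, hyp):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy as 1 - WER, floored at 0 (an assumed definition)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return max(0.0, 1.0 - word_edit_distance(ref, hyp) / len(ref))


if __name__ == "__main__":
    reference = "the boy is reaching for the cookie jar"
    hypothesis = "the boy is reaching for a cookie jar"
    print(word_accuracy(reference, hypothesis))
```

Under this definition, one substituted word in an eight-word reference gives an accuracy of 0.875.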
CONCLUSIONS: CASR transcription is more accurate than that of any single ASR engine. When combined with the TRT, "consensus" ASR can produce error-free, timestamped transcripts that enable the detailed analysis of verbal responses from older individuals at risk of cognitive decline.