BACKGROUND: Vocal biomarkers, derived from acoustic analysis of vocal characteristics, offer noninvasive avenues for medical screening, diagnostics, and monitoring. Previous research demonstrated the feasibility of predicting type 2 diabetes mellitus through acoustic analysis of smartphone-recorded speech. Building upon this work, this study explores the impact of audio data compression on acoustic vocal biomarker development, which is critical for broader applicability in health care.
OBJECTIVE: The objective of this research is to analyze how common audio compression algorithms (MP3, M4A, and WMA) applied by 3 different conversion tools at 2 bitrates affect features crucial for vocal biomarker detection.
METHODS: The impact of audio data compression on acoustic vocal biomarker development was investigated using uncompressed voice samples converted into MP3, M4A, and WMA formats at 2 bitrates (320 and 128 kbps) with MediaHuman (MH) Audio Converter, WonderShare (WS) UniConverter, and Fast Forward Moving Picture Experts Group (FFmpeg). The data set comprised recordings from 505 participants, totaling 17,298 audio files, collected using a smartphone. Participants recorded a fixed English sentence up to 6 times daily for up to 14 days. Feature extraction, including pitch, jitter, intensity, and Mel-frequency cepstral coefficients (MFCCs), was conducted using Python and Parselmouth. The Wilcoxon signed rank test and the Bonferroni correction for multiple comparisons were used for statistical analysis.
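The FFmpeg arm of the conversion step can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the encoder chosen for each format (`libmp3lame`, `aac`, `wmav2`), the output naming scheme, and the helper names `ffmpeg_command` and `all_commands` are assumptions, since the source does not report the exact FFmpeg options used.

```python
from itertools import product
from pathlib import Path

# Assumed encoder for each target format; the source does not say which were used.
CODECS = {"mp3": "libmp3lame", "m4a": "aac", "wma": "wmav2"}
# The two bitrates named in the methods, in kbps.
BITRATES_KBPS = (320, 128)


def ffmpeg_command(src, fmt, kbps, out_dir="converted"):
    """Build (but do not run) an FFmpeg command for one format and bitrate."""
    src = Path(src)
    # Hypothetical naming scheme: <stem>_<bitrate>k.<format>
    dst = Path(out_dir) / f"{src.stem}_{kbps}k.{fmt}"
    return [
        "ffmpeg", "-y",          # overwrite existing output
        "-i", str(src),          # uncompressed input recording
        "-c:a", CODECS[fmt],     # audio encoder (assumption)
        "-b:a", f"{kbps}k",      # target audio bitrate
        str(dst),
    ]


def all_commands(src):
    """One command per format and bitrate combination (3 formats x 2 bitrates)."""
    return [ffmpeg_command(src, fmt, kbps)
            for fmt, kbps in product(CODECS, BITRATES_KBPS)]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the output directory exists. Whether a given encoder produces a constant or variable bitrate at these settings depends on the encoder and its defaults, which is one reason results can differ between conversion tools.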
RESULTS: In this study, 36,970 audio files were initially recorded from 505 participants, with 17,298 recordings meeting the fixed sentence criteria after screening. Differences between the audio conversion software, MH, WS, and FFmpeg, were notable, impacting compression outcomes such as constant or variable bitrates. Analysis encompassed diverse data compression formats and a wide array of voice features and MFCCs. Wilcoxon signed rank tests yielded P values, with those below the Bonferroni-corrected significance level indicating significant alterations due to compression. The results indicated feature-specific impacts of compression across formats and bitrates. MH-converted files exhibited greater resilience compared to WS-converted files. Bitrate also influenced feature stability, with 38 cases affected uniquely by a single bitrate. Notably, voice features showed greater stability than MFCCs across conversion methods.
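The paired comparison described above can be illustrated with a short sketch. It assumes each feature is measured on the same recordings before and after compression, and it computes the Wilcoxon signed rank P value with the large-sample normal approximation using only the standard library. That is a simplification chosen for self-containment; the source does not state which implementation was used, and `scipy.stats.wilcoxon` would be a typical choice in practice. The function names and the default `alpha=0.05` are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def wilcoxon_signed_rank_p(original, compressed):
    """Two-sided P value, normal approximation, no continuity correction."""
    # Paired differences; zero differences are discarded.
    diffs = [c - o for o, c in zip(original, compressed) if c != o]
    n = len(diffs)
    if n == 0:
        return 1.0  # no recording changed at all

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    # Sum of ranks for positive differences, compared with its null distribution.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / sqrt(var)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bonferroni_flags(p_values, alpha=0.05):
    """Flag features whose P value falls below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return {name: p < threshold for name, p in p_values.items()}
```

With this setup, a feature is reported as significantly altered by a given format and bitrate only when its P value is below the corrected threshold, so the more features and conditions are tested, the stricter the per-test criterion becomes.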
CONCLUSIONS: Compression effects were found to be feature specific, with MH and FFmpeg showing greater resilience. Some features were consistently affected, emphasizing the importance of understanding feature resilience for diagnostic applications. Considering the implementation of vocal biomarkers in health care, finding features that remain consistent through compression for data storage or transmission purposes is valuable. Focused on specific features and formats, future research could broaden the scope to include diverse features, real-time compression algorithms, and various recording methods. This study enhances our understanding of audio compression's influence on voice features and MFCCs, providing insights for developing applications across fields. The research underscores the significance of feature stability in working with compressed audio data, laying a foundation for informed voice data use in evolving technological landscapes.