Keywords: Acoustic communication; Affinity propagation; Fuzzy clustering; Graded signals; Machine learning; Supervised classification; Support vector machines; Uniform manifold approximation and projection (UMAP); Unsupervised clustering; Vocal repertoire

MeSH terms: Animals; Vocalization, Animal / physiology; Male; Pongo pygmaeus / physiology; Reproducibility of Results; Machine Learning; Acoustics; Sound Spectrography; Borneo

Source: DOI: 10.7717/peerj.17320

Abstract:
Vocal complexity is central to many evolutionary hypotheses about animal communication. Yet quantifying and comparing complexity remains a challenge, particularly when vocal types are highly graded. Male Bornean orangutans (Pongo pygmaeus wurmbii) produce complex and variable "long call" vocalizations comprising multiple sound types that vary within and among individuals. Previous studies described six distinct call (or pulse) types within these complex vocalizations, but none quantified their discreteness or the ability of human observers to reliably classify them. We studied the long calls of 13 individuals to: (1) evaluate and quantify the reliability of audio-visual classification by three well-trained observers, (2) distinguish among call types using supervised classification and unsupervised clustering, and (3) compare the performance of different feature sets. Using 46 acoustic features, we applied machine learning methods (support vector machines, affinity propagation, and fuzzy c-means) to identify call types and assess their discreteness. We also used Uniform Manifold Approximation and Projection (UMAP) to visualize how well pulses separate, based on both extracted features and spectrogram representations. Supervised approaches showed low inter-observer reliability and poor classification accuracy, indicating that pulse types were not discrete. We propose an updated pulse classification approach that is highly reproducible across observers and achieves strong classification accuracy with support vector machines. Although the low number of call types suggests that long calls are fairly simple, the continuous gradation of sounds appears to greatly increase the complexity of this system. This work responds to calls for more quantitative research to define call types and quantify gradedness in animal vocal systems, and it highlights the need for a more comprehensive framework for studying vocal complexity vis-à-vis graded repertoires.
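The abstract does not state which statistic was used to quantify agreement among the three observers. As a hedged illustration only, one common choice when three or more raters assign categorical labels to the same items is Fleiss' kappa, sketched below in plain Python. The function name `fleiss_kappa` and the pulse labels `type_1`, `type_2`, `type_3` are hypothetical and are not taken from the study or its data.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that each received the same number of categorical labels.

    ratings: list of sequences; ratings[i] holds the labels that the observers
    assigned to pulse i (one label per observer, same observer count for every pulse).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # How many observers placed each pulse in each category.
    counts = [Counter(r) for r in ratings]
    categories = set().union(*counts)

    # Observed agreement: share of agreeing observer pairs, averaged over pulses.
    p_bar = sum(
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ) / n_items

    # Chance agreement, from the overall proportion of labels in each category.
    total = n_items * n_raters
    p_e = sum((sum(cnt[cat] for cnt in counts) / total) ** 2 for cat in categories)

    if p_e == 1:  # only one category ever used, so kappa is undefined
        raise ValueError("kappa is undefined when a single category is used")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical labels from three observers for four pulses (not data from the study).
observer_labels = [
    ("type_1", "type_1", "type_1"),
    ("type_2", "type_2", "type_1"),
    ("type_3", "type_3", "type_3"),
    ("type_1", "type_2", "type_2"),
]
print(round(fleiss_kappa(observer_labels), 3))  # 0.489
```

Kappa near 1 indicates near-perfect agreement, and kappa near 0 indicates agreement no better than chance; consistently low values across observers would fit the graded, non-discrete pulse types the abstract reports.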