Primate Subsegment Sorting
Reference
- Kölle, M., Illium, S., Zorn, M., Nüßlein, J., Suchostawski, P., and Linnhoff-Popien, C. 2023. Improving Primate Sounds Classification using Binary Presorting for Deep Learning. Springer CCIS Series.
Automated acoustic classification plays a vital role in wildlife monitoring and bioacoustics research. This study introduces a pre-processing and training strategy that improves the accuracy of multi-class audio classification, specifically the identification of different primate species from field recordings.
A key challenge in bioacoustics is dealing with datasets that have weak labels (where the calls of interest occupy only part of a labeled segment), varying segment lengths, and poor signal-to-noise ratios (SNR). Our approach addresses these issues by:
- Subsegment Analysis: Representing audio recordings as mel spectrograms and processing them as shorter subsegments.
- Refined Labeling: Meticulously relabeling subsegments within the spectrograms. This “binary presorting” step effectively identifies and isolates the actual vocalizations of interest within longer, weakly labeled recordings.
- CNN Training: Training Convolutional Neural Networks (CNNs) on these refined, higher-quality subsegment inputs.
- Data Augmentation: Applying data augmentation techniques suited to spectrogram data to further improve model robustness.
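
The subsegmenting, presorting, and augmentation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a spectrogram is a list of rows (one per mel band), the window `width` and `hop` are arbitrary, and the hypothetical `mean_energy` scorer stands in for the trained binary (call vs. background) model that would decide which subsegments to keep. The `time_freq_mask` helper shows SpecAugment-style time and frequency masking, one common spectrogram augmentation; it is an assumed example and not necessarily the augmentation used in the paper.

```python
from typing import Callable, List, Tuple

# Assumed layout: spec[band][frame], i.e. rows are mel bands, columns are time frames.
Spectrogram = List[List[float]]


def split_into_subsegments(spec: Spectrogram, width: int, hop: int) -> List[Spectrogram]:
    """Cut a spectrogram into fixed-width windows along the time axis."""
    n_frames = len(spec[0])
    return [
        [row[start:start + width] for row in spec]
        for start in range(0, n_frames - width + 1, hop)
    ]


def mean_energy(segment: Spectrogram) -> float:
    """Hypothetical stand-in for a trained binary (call vs. background) classifier."""
    values = [v for row in segment for v in row]
    return sum(values) / len(values)


def presort(
    spec: Spectrogram,
    label: str,
    width: int,
    hop: int,
    score: Callable[[Spectrogram], float] = mean_energy,
    threshold: float = 0.5,
) -> List[Tuple[Spectrogram, str]]:
    """Keep only subsegments the binary scorer flags as containing a call.

    Each kept subsegment inherits the (weak) label of its parent recording,
    so a multi-class model can then be trained on the refined subsegments.
    """
    return [
        (segment, label)
        for segment in split_into_subsegments(spec, width, hop)
        if score(segment) >= threshold
    ]


def time_freq_mask(
    spec: Spectrogram, f0: int, f_width: int, t0: int, t_width: int, fill: float = 0.0
) -> Spectrogram:
    """SpecAugment-style masking (an assumed example augmentation).

    Overwrites mel bands [f0, f0 + f_width) and frames [t0, t0 + t_width)
    with `fill`, returning a copy and leaving the input unchanged.
    """
    out = [row[:] for row in spec]
    for f, row in enumerate(out):
        for t in range(len(row)):
            if f0 <= f < f0 + f_width or t0 <= t < t0 + t_width:
                row[t] = fill
    return out


if __name__ == "__main__":
    # Toy recording: 2 mel bands x 8 frames, silent first half, "call" in the second half.
    spec = [[0.0] * 4 + [1.0] * 4 for _ in range(2)]
    kept = presort(spec, "chimpanzee", width=4, hop=2, threshold=0.75)
    print(len(kept), kept[0][1])  # 1 chimpanzee
    print(time_freq_mask(kept[0][0], f0=1, f_width=1, t0=0, t_width=2))
    # [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
```

With `threshold=0.75`, only the last window of the toy recording survives presorting, and it carries the recording's label. In a real pipeline the scorer would be a trained CNN and the mask positions would typically be drawn at random for each training example.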

The method was evaluated on the ComParE 2021 Primate dataset. It achieved higher accuracy and Unweighted Average Recall (UAR) than the existing baseline methods.
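
UAR is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has; plain accuracy, by contrast, is dominated by the largest classes. A minimal sketch of both metrics (the class names and counts in the example are illustrative only, not taken from the dataset):

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_average_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class recalls over the classes present in y_true."""
    totals = Counter(y_true)  # samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Imbalanced toy set: 8 background clips, 2 call clips; the model always says "background".
    y_true = ["background"] * 8 + ["call"] * 2
    y_pred = ["background"] * 10
    print(accuracy(y_true, y_pred))                   # 0.8
    print(unweighted_average_recall(y_true, y_pred))  # 0.5
```

A classifier that ignores the minority class still reaches 0.8 accuracy but only 0.5 UAR, which is why UAR is the more informative score on imbalanced data. scikit-learn reports the same quantity as `balanced_accuracy_score`.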

This work shows how careful data refinement before training a deep learning model can substantially improve classification results on difficult, real-world bioacoustic data. [Kölle et al. 2023]