Evaluating and Interpreting Pooling Techniques in Spectrogram-Based Audio Analysis Using Diverse Metrics

Bibliographic Details
Published in: International Journal of Advanced Computer Science and Applications vol. 16, no. 7 (2025)
Main Author: PDF
Publication: Science and Information (SAI) Organization Limited
Online Access: Citation/Abstract; Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3240918339
003 UK-CbPIL
022 |a 2158-107X 
022 |a 2156-5570 
024 7 |a 10.14569/IJACSA.2025.0160795  |2 doi 
035 |a 3240918339 
045 2 |b d20250101  |b d20251231 
100 1 |a PDF 
245 1 |a Evaluating and Interpreting Pooling Techniques in Spectrogram-Based Audio Analysis Using Diverse Metrics 
260 |b Science and Information (SAI) Organization Limited  |c 2025 
513 |a Journal Article 
520 3 |a Audio analysis is a rapidly advancing field that spans various domains, including speech, music, and environmental sound data. Using spectrograms with Convolutional Neural Networks (CNNs) enables the visualization and extraction of critical audio features by combining time-frequency representations with deep learning. Pooling plays a crucial role in this process, as it reduces dimensionality while retaining essential information. However, existing evaluations of pooling methods primarily emphasize downstream task performance, such as classification accuracy, often overlooking their effectiveness in preserving critical signal features. To address this gap, we use 17 distinct metrics, categorized into four domains, to comprehensively assess various pooling operations. Furthermore, we explore the underexamined relationship between specific pooling techniques and their impact on feature retention across diverse audio applications. Our analysis encompasses spectrograms from three audio domains (speech, music, and environmental sound), identifying their key characteristics, and grouping them accordingly. Using this setup, we evaluate the performance of 12 pooling methods across these applications. By investigating the features critical to each task and evaluating how well different pooling techniques preserve them, we give insights into their suitability for specific applications. This work aims to guide researchers in selecting the most appropriate pooling strategies for their applications, enabling more granular evaluations, improving explainability, and thereby advancing the precision and efficiency of audio analysis pipelines. 
653 |a Background noise 
653 |a Speech 
653 |a Music 
653 |a Performance evaluation 
653 |a Machine learning 
653 |a Spectrograms 
653 |a Artificial neural networks 
653 |a Accuracy 
653 |a Deep learning 
653 |a Computer science 
653 |a Fourier transforms 
653 |a Larynx 
653 |a Neural networks 
653 |a Signal processing 
653 |a Sound 
773 0 |t International Journal of Advanced Computer Science and Applications  |g vol. 16, no. 7 (2025) 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3240918339/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3240918339/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch