Deep Learning-Based Speech Enhancement for Robust Sound Classification in Security Systems

Saved in:
Bibliographic Details
Published in: Electronics vol. 14, no. 13 (2025), p. 2643-2668
Main Author: Mensah, Samuel Yaw
Other Authors: Zhang, Tao; Mahmud, Nahid Al; Geng, Yanzhang
Published:
MDPI AG
Subjects:
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3229142959
003 UK-CbPIL
022 |a 2079-9292 
024 7 |a 10.3390/electronics14132643  |2 doi 
035 |a 3229142959 
045 2 |b d20250101  |b d20251231 
084 |a 231458  |2 nlm 
100 1 |a Mensah, Samuel Yaw  |u School of Information Engineering, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China 
245 1 |a Deep Learning-Based Speech Enhancement for Robust Sound Classification in Security Systems 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a Deep learning has emerged as a powerful technique for speech enhancement, particularly in security systems where audio signals are often degraded by non-stationary noise. Traditional signal processing methods struggle in such conditions, making it difficult to detect critical sounds like gunshots, alarms, and unauthorized speech. This study investigates a hybrid deep learning framework that combines Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GANs) to enhance speech quality and improve sound classification accuracy in noisy security environments. The proposed model is trained and validated using real-world datasets containing diverse noise distortions, including VoxCeleb for benchmarking speech enhancement and UrbanSound8K and ESC-50 for sound classification. Performance is evaluated using industry-standard metrics such as Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), and Signal-to-Noise Ratio (SNR). The architecture includes multi-layered neural networks, residual connections, and dropout regularization to ensure robustness and generalizability. Additionally, the paper addresses key challenges in deploying deep learning models for security applications, such as computational complexity, latency, and vulnerability to adversarial attacks. Experimental results demonstrate that the proposed DNN + GAN-based approach significantly improves speech intelligibility and classification performance in high-interference scenarios, offering a scalable solution for enhancing the reliability of audio-based security systems. 
653 |a Mean square errors 
653 |a Accuracy 
653 |a Datasets 
653 |a Deep learning 
653 |a Performance evaluation 
653 |a Classification 
653 |a Multilayers 
653 |a Artificial neural networks 
653 |a Real time 
653 |a Signal processing 
653 |a Generative adversarial networks 
653 |a Speech processing 
653 |a Audio recordings 
653 |a Machine learning 
653 |a Access control 
653 |a Sound 
653 |a Statistical analysis 
653 |a Regularization 
653 |a Artificial intelligence 
653 |a Security systems 
653 |a Fourier transforms 
653 |a Signal to noise ratio 
653 |a Intelligibility 
653 |a Neural networks 
653 |a Decision making 
653 |a Network latency 
653 |a Recurrent neural networks 
653 |a Methods 
653 |a Audio signals 
653 |a Surveillance 
653 |a Kalman filters 
653 |a Speech 
700 1 |a Zhang, Tao  |u Digital Signal Processing Laboratory, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China; zhangtao@tju.edu.cn (T.Z.); gregory@tju.edu.cn (Y.G.) 
700 1 |a Mahmud, Nahid Al  |u School of Electrical & Information Engineering, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China; nahidalmahmud@tju.edu.cn 
700 1 |a Geng, Yanzhang  |u Digital Signal Processing Laboratory, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China; zhangtao@tju.edu.cn (T.Z.); gregory@tju.edu.cn (Y.G.) 
773 0 |t Electronics  |g vol. 14, no. 13 (2025), p. 2643-2668 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3229142959/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3229142959/fulltextwithgraphics/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3229142959/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch