Adversarial Attacks and Robustness in Deep Neural Networks for Sound Event Detection

Bibliographic Details
Published in: PQDT - Global (2025)
Main Author: Alexandropoulos, Ilias
Published: ProQuest Dissertations & Theses
Subjects: Neural networks, Signal processing, Computer science, Artificial intelligence
Online Access: Citation/Abstract
Full Text - PDF
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3252774381
003 UK-CbPIL
020 |a 9798290661575 
035 |a 3252774381 
045 2 |b d20250101  |b d20251231 
084 |a 189128  |2 nlm 
100 1 |a Alexandropoulos, Ilias 
245 1 |a Adversarial Attacks and Robustness in Deep Neural Networks for Sound Event Detection 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a As the use of Sound Event Detection (SED) systems expands into real-world and safety-critical applications, ensuring their robustness against malicious manipulation is becoming increasingly important. This thesis explores the vulnerability of deep learning models employed in SED to black-box adversarial attacks and examines strategies to enhance their robustness. From the attacker’s perspective, two optimization-based attacks, Particle Swarm Optimization (PSO) and Differential Evolution (DE), are employed to generate adversarial audio samples. To maintain imperceptibility and control the additive noise, regularization terms are applied and experiments are performed under varying signal-to-noise ratios (SNRs). The attacks were evaluated across a broad spectrum of model architectures, including convolutional neural networks (CNNs) with and without Global Average Pooling, ResNet-based models such as AudioCLIP, and transformer-based architectures such as PaSST. Fine-tuning was applied to adapt pre-trained models such as AudioCLIP to the specific distributions of UrbanSound8K and ESC-50, allowing consistent evaluation across datasets. Experimental results show that the fine-tuned AudioCLIP model is highly susceptible to attacks, while transformer-based models such as PaSST demonstrate greater robustness. To mitigate the effectiveness of the attacks, a denoising autoencoder is employed and integrated into each model’s head. This technique is also used to detect adversarial examples before they are passed through the models: by analyzing the divergences and distances between the original and reconstructed inputs, it is possible to determine whether a sample has been manipulated. The results demonstrate that the most effective attacks were achieved using the PSO algorithm, reaching a maximum success rate of 76% on the fine-tuned AudioCLIP model at a target SNR of 5 dB. As the SNR constraint increased to 15–20 dB, making perturbations less perceptible to human listeners, the attack success rates dropped, stabilizing around 40–50% for vulnerable models and falling below 20% for more robust ones, confirming the trade-off between adversarial effectiveness and imperceptibility. The evaluation of the autoencoder-based defense showed a consistent reduction of 5–10% in the attack success rate across all models, without noticeably affecting the models’ original classification accuracy on clean inputs, making it a simple yet effective defensive approach. Additionally, the detection experiment based on prediction consistency before and after autoencoder denoising achieved a precision of 1.0 but a recall of approximately 34%, indicating that flagged samples are reliably adversarial, although a substantial portion of attacks goes undetected, suggesting the need for future improvements to increase sensitivity. These findings highlight the urgent need to enhance the robustness of neural networks, particularly for safety-critical applications where adversarial manipulation could have serious consequences. The integration of a denoising autoencoder proved effective, consistently reducing attack success rates without degrading model performance, with noticeable benefits across both CNN-based models and transformer-based architectures such as PaSST. Overall, the results emphasize the crucial role of designing inherently robust model architectures and employing strategic preprocessing techniques to strengthen SED systems against adversarial threats.
653 |a Neural networks 
653 |a Signal processing 
653 |a Computer science 
653 |a Artificial intelligence 
773 0 |t PQDT - Global  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3252774381/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3252774381/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u https://dione.lib.unipi.gr/xmlui/handle/unipi/17807
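
Note on the SNR constraint described in the abstract (field 520): the attacks are reported at target SNRs of 5, 15 and 20 dB between the clean audio and the additive perturbation. As an illustrative aside that is not drawn from the thesis itself, the following minimal Python sketch shows how such an SNR term is commonly computed and how a perturbation can be rescaled to meet a target SNR; the function names and the exact formulation are assumptions for illustration only.

    import numpy as np

    def snr_db(clean: np.ndarray, perturbation: np.ndarray) -> float:
        """Signal-to-noise ratio in dB between a clean waveform and an additive perturbation."""
        signal_power = np.mean(clean ** 2)
        noise_power = np.mean(perturbation ** 2) + 1e-12  # guard against division by zero
        return 10.0 * np.log10(signal_power / noise_power)

    def scale_to_target_snr(clean: np.ndarray, perturbation: np.ndarray, target_snr_db: float) -> np.ndarray:
        """Rescale the perturbation so that clean + perturbation meets a target SNR (e.g. 5, 15 or 20 dB)."""
        current = snr_db(clean, perturbation)
        # Each dB of SNR corresponds to a factor of 10**(dB/20) in amplitude.
        factor = 10.0 ** ((current - target_snr_db) / 20.0)
        return perturbation * factor

Raising the target SNR shrinks the allowed perturbation, which is consistent with the reported drop in attack success rates from 76% at 5 dB to below 20–50% at 15–20 dB.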
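Note on the detection experiment described in the abstract: adversarial inputs are flagged by comparing predictions before and after denoising-autoencoder reconstruction and by measuring the distance between the original and reconstructed inputs. The sketch below is a hedged illustration of that idea, assuming a trained PyTorch classifier and autoencoder are available; the decision rule, the MSE threshold and all names are hypothetical and not taken from the record.

    import torch

    def is_adversarial(x, classifier, autoencoder, mse_threshold=0.01):
        """Flag an input as adversarial if denoising changes the prediction
        or the reconstruction error is unusually large (threshold is a hypothetical value)."""
        with torch.no_grad():
            x_denoised = autoencoder(x)
            pred_original = classifier(x).argmax(dim=-1)
            pred_denoised = classifier(x_denoised).argmax(dim=-1)
            reconstruction_error = torch.mean((x - x_denoised) ** 2)
        prediction_changed = bool((pred_original != pred_denoised).any())
        return prediction_changed or reconstruction_error.item() > mse_threshold

A rule of this kind matches the reported behaviour of the thesis experiment, high precision (flagged samples are genuinely adversarial) but limited recall, since attacks that survive denoising with an unchanged prediction and a small reconstruction error go undetected.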