Towards the Advancement of Violence Recognition in Security Footage With Explainable Neural Networks

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Her, Paris
Published: ProQuest Dissertations & Theses
Subjects: Computer engineering; Electrical engineering; Computer science; Information technology
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3188204236
003 UK-CbPIL
020 |a 9798310352438 
035 |a 3188204236 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Her, Paris 
245 1 |a Towards the Advancement of Violence Recognition in Security Footage With Explainable Neural Networks 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a This dissertation investigates the problem of violence recognition in surveillance footage using computer vision and machine learning techniques. Because violence recognition is a sensitive task, our goal is to build deep learning models that are interpretable and explainable. We first propose to perform violence recognition with a 3D convolutional neural network through intuitive hyperparameter tuning and transfer learning, building on a state-of-the-art 3D model for general activity recognition that is lightweight and adjustable. Alongside this, we introduce a data augmentation technique called "resize-within", which uses interpolation, rather than cropping, to resize the original input video to a new width and height during model training. Using this as the base model, we then provide a means for model explainability using class activation maps: during training, the proposed approach compares the salient regions the model uses to make its prediction with the active regions of the individuals "involved" in the violent act. This forces the model to focus on the regions related to violence, reducing ambiguity about which parts of the frame drive its decision. To the best of our knowledge, this is the first work to provide bounding box labels for involved individuals and saliency evaluation in a violence recognition dataset. Finally, we introduce a deep learning model with built-in interpretability through case-based reasoning over prototypical examples. This approach takes the input latent space, i.e., the input feature maps, and compares it with learned prototypical feature maps; the nearest prototype feature maps are then concatenated with the input latent space, and both are used together to make the prediction. Because the model predicts from multiple sources of information, this improves performance while adding example-based interpretability to its predictions. Since the prototypes are feature maps, we can also show the active regions the model associates with the input and its nearest prototype. This work demonstrates that deep learning models can simultaneously improve performance and gain interpretability. The proposed methods are evaluated on publicly available benchmark violence recognition datasets (RWF-2000, SCFD, and ViolentFlows). (Illustrative code sketches of the resize-within augmentation and the prototype comparison follow this record.) 
653 |a Computer engineering 
653 |a Electrical engineering 
653 |a Computer science 
653 |a Information technology 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3188204236/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3188204236/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
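
The 520 abstract above describes a "resize-within" augmentation that interpolates each training video to a new width and height instead of cropping it. Below is a minimal Python sketch of that idea; the function name, scale range, per-clip sampling, and use of OpenCV are all assumptions made for illustration, not the dissertation's actual implementation.

# Hypothetical sketch of the "resize-within" augmentation described in the
# abstract. All names and parameters here are assumptions; the dissertation
# defines the exact procedure.
import numpy as np
import cv2  # OpenCV, used here for frame interpolation

def resize_within(clip: np.ndarray, scale_range=(0.75, 1.0), rng=None) -> np.ndarray:
    """Resize every frame of a (T, H, W, C) uint8 clip to a randomly sampled
    new width/height via interpolation, instead of cropping.

    One scale factor is drawn per clip (an assumption) so all frames in the
    temporal dimension stay spatially consistent.
    """
    rng = rng or np.random.default_rng()
    t, h, w, c = clip.shape
    scale = rng.uniform(*scale_range)
    new_w, new_h = max(1, int(w * scale)), max(1, int(h * scale))
    return np.stack(
        [cv2.resize(frame, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
         for frame in clip]
    )

# Example: augment a random 16-frame RGB clip during training.
clip = np.random.randint(0, 256, size=(16, 224, 224, 3), dtype=np.uint8)
augmented = resize_within(clip)
print(augmented.shape)  # e.g. (16, 190, 190, 3), depending on the sampled scale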
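The abstract also describes a prototype-based head that compares the input latent space with learned prototype feature maps and concatenates the nearest prototype with the input before predicting. The PyTorch sketch below shows one way such a head could look; the use of pooled vector features, the L2 distance, and all layer sizes are assumptions (the dissertation works with full feature maps, whose spatial structure also supports the visualizations it describes).

# Hypothetical sketch of the prototype-concatenation head described in the
# abstract. Shapes, distance metric, and layer sizes are assumptions, not the
# dissertation's exact design.
import torch
import torch.nn as nn

class PrototypeHead(nn.Module):
    def __init__(self, feat_dim=256, num_prototypes=10, num_classes=2):
        super().__init__()
        # Learned prototype features (here pooled to vectors for simplicity).
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, feat_dim))
        # Classifier consumes input features concatenated with nearest prototype.
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, feats):  # feats: (B, feat_dim), e.g. pooled 3D-CNN output
        # L2 distance from each input to each prototype: (B, num_prototypes).
        dists = torch.cdist(feats, self.prototypes)
        # Hard nearest-prototype selection; the selection itself is
        # non-differentiable, though gradients still reach the chosen prototype.
        nearest = dists.argmin(dim=1)
        matched = self.prototypes[nearest]             # (B, feat_dim)
        combined = torch.cat([feats, matched], dim=1)  # (B, 2 * feat_dim)
        return self.classifier(combined), nearest

# Example: classify pooled features from a batch of 4 clips and report which
# prototype each input was matched to (the basis of example-based explanations).
head = PrototypeHead()
logits, proto_idx = head(torch.randn(4, 256))
print(logits.shape, proto_idx.tolist())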