Describe: Towards the Advancement of Violence Recognition in Security Footage With Explainable Neural Networks