Efficient Human Violence Recognition for Surveillance in Real Time

Herwin Alayn Huillcen Baca, Flor de Luz Palomino Valdivia, Juan Carlos Gutierrez Caceres

Research output: Contribution to journal › Article › peer-review


Human violence recognition is an area of great interest in the scientific community due to its broad spectrum of applications, especially in video surveillance systems, because detecting violence in real time can prevent criminal acts and save lives. Most existing proposals and studies focus on the precision of their results and neglect efficiency and practical implementation. In this work, we therefore propose a model that is both effective and efficient in recognizing human violence in real time. The proposed model consists of three modules: the Spatial Motion Extractor (SME) module, which extracts regions of interest from a frame; the Short Temporal Extractor (STE) module, which extracts temporal characteristics of rapid movements; and the Global Temporal Extractor (GTE) module, which is responsible for identifying long-lasting temporal features and fine-tuning the model. The proposal was evaluated for its efficiency, effectiveness, and ability to operate in real time. The results obtained on the Hockey, Movies, and RWF-2000 datasets demonstrated that this approach is highly efficient compared to various alternatives. In addition, to validate the real-time applicability of the model, we created the VioPeru dataset, which contains violent and non-violent videos captured by real video surveillance cameras in Peru. When tested on this dataset, our model was more effective than the best existing models.
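The three-stage flow named in the abstract (spatial motion, short-term temporal, global temporal) can be illustrated with a toy sketch. This is not the authors' model: the abstract gives no implementation details, so every function name, the frame-differencing step, the threshold, and the averaging used here are hypothetical, hand-crafted stand-ins for what are presumably learned modules. It only shows how data might pass from one stage to the next.

```python
from typing import List

# A grayscale frame as rows of pixel intensities in [0, 1] (assumed format).
Frame = List[List[float]]


def spatial_motion_mask(prev: Frame, curr: Frame, threshold: float = 0.1) -> Frame:
    """Hypothetical SME stand-in: keep only pixels of `curr` whose intensity
    changed by more than `threshold` since `prev`; zero out the rest."""
    return [
        [c if abs(c - p) > threshold else 0.0 for p, c in zip(prow, crow)]
        for prow, crow in zip(prev, curr)
    ]


def short_temporal_energy(frames: List[Frame], threshold: float = 0.1) -> List[float]:
    """Hypothetical STE stand-in: one motion-energy value per pair of
    consecutive frames (mean intensity of the motion-masked frame)."""
    energies = []
    for prev, curr in zip(frames, frames[1:]):
        masked = spatial_motion_mask(prev, curr, threshold)
        total = sum(sum(row) for row in masked)
        count = sum(len(row) for row in masked)
        energies.append(total / count)
    return energies


def global_temporal_score(energies: List[float]) -> float:
    """Hypothetical GTE stand-in: summarize the whole clip by averaging
    the short-term energies (0.0 for an empty clip)."""
    return sum(energies) / len(energies) if energies else 0.0


if __name__ == "__main__":
    still = [[0.0, 0.0], [0.0, 0.0]]
    moved = [[1.0, 0.0], [0.0, 0.0]]
    print(global_temporal_score(short_temporal_energy([still, moved])))
```

A real system would replace each stand-in with a trained network and a classifier on top; the sketch only makes the frame-to-clip data flow concrete.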

Original language: English
Article number: 668
Issue number: 2
State: Published - Jan 2024

Bibliographical note

Publisher Copyright:
© 2024 by the authors.


Keywords

  • global temporal extractor
  • human violence recognition
  • real time
  • short temporal extractor
  • spatial attention
  • spatial motion extractor
  • video surveillance
  • VioPeru

