Computer vision tools to support LEA operators in monitoring mass gathering events

By Giorgio Fumera, PRA Lab, University of Cagliari.

In a previous article, we provided an overview of the application of computer vision research to support Law Enforcement Agency (LEA) operators.

In this article we describe in more detail the intelligent video surveillance (also known as video analytics) software tools developed by the PRA Lab, University of Cagliari, within the LETSCROWD project.

 

How does computer vision research contribute to LETSCROWD?

Two kinds of monitoring and investigation tasks carried out by LEAs in the context of mass gathering events have been considered in LETSCROWD:

  • during the event, estimating the number of people in a given region of the event venue, either directly by LEA operators in the field or from real-time videos monitored in a control room or command post;
  • during post-event forensic investigations, scanning a large number of videos acquired during a mass gathering event by a video surveillance camera network, to search for individuals of interest, such as suspects.

Manually performing the above tasks is time-consuming and tedious. Specific software tools, which can be deployed as components of software suites used by LEAs to manage their video surveillance systems, can support LEA operators and forensic investigators by reducing the time required to perform such tasks.

Three specific prototype tools have been developed as part of the LETSCROWD project: appearance-based person re-identification, attribute-based people search, and crowd density estimation.

 

Appearance-based person re-identification

Face recognition systems are now widely used by LEAs to search for suspects in videos acquired by CCTV systems. However, in unconstrained settings the face may not be visible (e.g., due to the person's pose or to occlusions), or, because of the camera distance, it may cover too small an image region to be useful for automatic face recognition. In such cases, appearance-based person re-identification systems can support LEA operators and forensic investigators by automatically retrieving images of an individual of interest, based on the similarity of clothing appearance to a query image of the same individual that the user has manually selected from a video frame. From each retrieved image the user can access the video from which it was extracted, and can thus analyse the behaviour of the corresponding individual. Even when the face is visible, appearance-based person re-identification can be used to complement face recognition systems.
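
The retrieval step described above can be illustrated with a minimal Python sketch. It is not the LETSCROWD implementation: it assumes that each person image has already been converted into a numeric clothing-appearance descriptor by some feature extractor (not shown), and it ranks a gallery of stored images by cosine similarity to the query descriptor. The function names, the dictionary keys and the toy three-dimensional descriptors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero descriptor carries no appearance information
    return dot / (norm_a * norm_b)


def rank_gallery(query_descriptor, gallery, top_k=3):
    """Return the top_k (score, entry) pairs most similar to the query.

    Each gallery entry is a dict with hypothetical keys:
    'descriptor' (list of floats), 'video' (source file), 'frame' (index).
    """
    scored = [
        (cosine_similarity(query_descriptor, entry["descriptor"]), entry)
        for entry in gallery
    ]
    # Highest similarity first; sort on the score only
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy descriptors standing in for real clothing-appearance features
gallery = [
    {"descriptor": [0.9, 0.1, 0.0], "video": "cam1.mp4", "frame": 120},
    {"descriptor": [0.0, 1.0, 0.0], "video": "cam2.mp4", "frame": 45},
    {"descriptor": [0.8, 0.2, 0.1], "video": "cam3.mp4", "frame": 300},
]
query = [1.0, 0.0, 0.0]

results = rank_gallery(query, gallery, top_k=2)
for score, entry in results:
    print(f"{entry['video']} frame {entry['frame']}: similarity {score:.3f}")
```

Keeping the source video and frame index with each descriptor is what lets the user jump from a retrieved image back to the original footage.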

 

Attribute-based people search

Often, in forensic investigations, a description of a suspect is provided by eyewitnesses, and investigators have to search for similar-looking individuals in videos acquired by a CCTV system. The description may involve clothing appearance, as well as other attributes, such as gender and carried items (e.g., bags or backpacks). Attribute-based people search systems allow the user to input the attribute profile of an individual of interest in terms of a predefined set of attributes, such as the colours of upper and lower body clothing, gender, and carried items, and then automatically retrieve images of individuals exhibiting a similar attribute profile.
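
As a rough illustration of this kind of search, the following Python sketch scores each stored image by the fraction of queried attributes it matches and returns the best matches first. It is a simplification under stated assumptions: the attribute values are taken as already predicted for each image (in practice they would come from classifiers with some uncertainty), and the attribute names, the scoring rule and the `min_score` cut-off are hypothetical choices, not those of the LETSCROWD tool.

```python
def attribute_match_score(profile, candidate):
    """Fraction of the queried attributes that the candidate matches exactly."""
    if not profile:
        return 0.0
    matches = sum(
        1 for name, value in profile.items() if candidate.get(name) == value
    )
    return matches / len(profile)


def search_by_attributes(profile, gallery, min_score=0.5):
    """Return (score, item) pairs with score >= min_score, best match first."""
    scored = [
        (attribute_match_score(profile, item["attributes"]), item)
        for item in gallery
    ]
    hits = [pair for pair in scored if pair[0] >= min_score]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits


# Hypothetical gallery: attributes assumed to be already extracted per image
gallery = [
    {"id": "img_001", "attributes": {
        "upper_colour": "red", "lower_colour": "blue",
        "gender": "male", "backpack": True}},
    {"id": "img_002", "attributes": {
        "upper_colour": "red", "lower_colour": "blue",
        "gender": "male", "backpack": False}},
    {"id": "img_003", "attributes": {
        "upper_colour": "green", "lower_colour": "blue",
        "gender": "female", "backpack": False}},
]

# Eyewitness description: red top, blue trousers, carrying a backpack
profile = {"upper_colour": "red", "lower_colour": "blue", "backpack": True}

hits = search_by_attributes(profile, gallery)
for score, item in hits:
    print(f"{item['id']}: {score:.2f}")
```

Attributes left out of the profile (here, gender) do not affect the score, which mirrors the fact that an eyewitness description is often incomplete.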

 

Crowd density estimation

During mass gathering events, LEA operators monitor the size of a crowd and estimate the number of people in a given region of the event venue largely manually. These tasks are also time-consuming, especially when many video cameras have to be monitored simultaneously.
Computer vision tools for automatically counting the number of people in a scene already exist, but their effectiveness is limited to non-crowded scenes in which there is almost no overlap between people and almost no occlusion by objects in the scene. Automatically estimating the number of people in crowded scenes with significant overlap or occlusion, known as crowd density estimation in the computer vision field, is still a very challenging task. In the LETSCROWD project a crowd density estimation tool has been developed to provide a real-time estimate of the number of people in a scene, or in a user-defined region of interest inside a scene. This kind of tool can also raise alerts when the estimated number of people exceeds a user-defined threshold, and when anomalous and potentially dangerous behaviours are automatically detected, such as a sudden increase or decrease in the estimated number of people.
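
The region-of-interest count and the two kinds of alert can be sketched in a few lines of Python. The sketch assumes, as is common in crowd counting but not confirmed here for the LETSCROWD tool, that the estimator outputs a density map whose values sum to the number of people, so that the count in a rectangular region is the sum of the map over that region. The function names, the rectangle convention and the threshold values are hypothetical.

```python
def count_in_region(density_map, region):
    """Sum a density map over a rectangle (top, left, bottom, right).

    'bottom' and 'right' are exclusive, as in Python slicing.
    """
    top, left, bottom, right = region
    return sum(sum(row[left:right]) for row in density_map[top:bottom])


def check_alerts(counts, max_people, max_change):
    """Return (frame_index, kind) alerts for a sequence of count estimates.

    'threshold_exceeded': the estimate is above max_people.
    'sudden_change': the estimate differs from the previous one
    by more than max_change, in either direction.
    """
    alerts = []
    for i, count in enumerate(counts):
        if count > max_people:
            alerts.append((i, "threshold_exceeded"))
        if i > 0 and abs(count - counts[i - 1]) > max_change:
            alerts.append((i, "sudden_change"))
    return alerts


# Toy 3x4 density map; a real one would have one value per pixel or cell
density_map = [
    [0.5, 0.5, 0.0, 0.0],
    [1.0, 0.5, 0.25, 0.0],
    [0.0, 0.25, 0.0, 0.0],
]
roi_count = count_in_region(density_map, (0, 0, 2, 2))
print(f"Estimated people in region: {roi_count}")

# Hypothetical per-frame estimates for one camera
counts = [40, 42, 45, 80, 78, 30]
alerts = check_alerts(counts, max_people=70, max_change=20)
for frame_index, kind in alerts:
    print(f"frame {frame_index}: {kind}")
```

In a deployed system the frame-to-frame comparison would probably be smoothed over a time window to avoid alerts caused by noisy single-frame estimates; the direct comparison is used here only to keep the example short.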