Intelligent video surveillance for the security of mass gathering events

By Rita Delussu and Giorgio Fumera, PRA Lab, University of Cagliari.

Video surveillance systems are now widely used for security purposes in many contexts, both outdoor (e.g., monitoring public areas) and indoor (e.g., airport halls and banks).

Due to the huge amount of video data acquired by camera networks, automatic tools based on computer vision algorithms are becoming necessary to support human operators in monitoring and analyzing such data. This is the subject of a multidisciplinary research area named intelligent video surveillance, which involves both hardware and software aspects: sensors, networks and interfaces, as well as the signal processing, pattern recognition and machine learning algorithms that enable advanced computer vision capabilities.

Research efforts over the past decades have led to many computer vision methods that implement functionalities of interest in intelligent video surveillance, such as object and person detection and tracking, detection of events of interest (e.g., anomalous behaviors in road traffic or in a crowd), and re-identification of objects (e.g., cars) or individuals across non-overlapping cameras.

However, the recognition capability of computer vision algorithms in real-world application scenarios (like re-identifying individuals of interest in videos acquired by different cameras) has not yet reached a human level, except for very specific tasks with constrained settings. On the other hand, machines can process a huge amount of data at a much higher speed than humans. Moreover, the performance of human operators decreases as the amount of data to be analyzed increases (e.g., when videos from several cameras have to be monitored, or simply when the monitoring activity has to be carried out for a long time), and also depends on factors such as the operator's experience, psycho-physical state and working conditions. To leverage the complementary capabilities of humans and machines, computer vision solutions can be effectively used as tools that support human operators in carrying out complex monitoring and recognition tasks, rather than as fully automatic systems.

One of the goals of the LETSCROWD project is to develop semi-automatic computer vision tools capable of supporting human operators in analyzing videos acquired by camera networks, either in real time (e.g., for monitoring a crowd during a mass gathering event) or off-line (e.g., for forensic purposes). In particular, the following functionalities will be considered: person re-identification, which is the problem of recognizing an individual in videos acquired by non-overlapping cameras, using an image as a query; people search by textual description, i.e., retrieving videos showing individuals that match a given description of their appearance; and crowd monitoring, e.g., to estimate crowd density or detect anomalous behaviors.
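To make the semi-automatic idea concrete, person re-identification is often framed as a retrieval task: the query image and the images from other cameras are turned into feature vectors, and the system returns a ranked shortlist that the operator then inspects. The following is a minimal sketch under that assumption, not the LETSCROWD implementation. The function names, the image identifiers and the three-dimensional toy descriptors are all hypothetical; a real system would compute descriptors with a dedicated appearance model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero descriptor matches nothing
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery, top_k=3):
    """Return the top_k gallery entries most similar to the query.

    gallery maps a (hypothetical) image identifier to its descriptor.
    The result is a list of (image_id, similarity) pairs, best first.
    """
    scored = [(cosine_similarity(query, feat), image_id)
              for image_id, feat in gallery.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(image_id, score) for score, image_id in scored[:top_k]]


# Hypothetical toy descriptors, made up for illustration only.
query = [0.9, 0.1, 0.3]
gallery = {
    "cam2_frame0412": [0.8, 0.2, 0.4],
    "cam3_frame0077": [0.1, 0.9, 0.2],
    "cam5_frame1290": [0.9, 0.1, 0.2],
}

# The operator reviews this shortlist instead of every frame.
shortlist = rank_gallery(query, gallery, top_k=2)
for image_id, score in shortlist:
    print(f"{image_id}: {score:.3f}")
```

The final decision stays with the operator: the ranking only narrows down which images are worth looking at, which matches the supporting role described above.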

In the next article, the person re-identification functionality will be discussed.