Spy Robot and Crime Detection System

Terrorists often resort to hostage-taking and exploit it to negotiate with security forces and/or governments. In such emergencies, the lack of reliable information about the crisis scene, such as the number and positions of the terrorists, hinders security forces from taking appropriate action. A minor mistake on their part may cost many innocent lives. The main objective of this project is to develop a robotic swarm surveillance system that gives security forces access to real-time crisis scene information while maintaining stealth. The system comprises a swarm of lizard-like robots. The lizard-like structure of the robots ensures stealth while enabling locomotion on the ceilings and walls at the crisis scene. The robots carry onboard computational resources which, together with a computationally powerful remote server deployed in the control unit, are organized in a client-server architecture. The onboard computers perform relatively light computation, such as data acquisition, preprocessing, and secure communication of the data to the server, whereas the server is responsible for heavy computation, including crisis site intelligence extraction, robot swarm formation control, and motion planning of individual robots.
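The client-server split described above can be illustrated with a small sketch of the robot-to-server hand-off. This is not the project's actual protocol: the function names, the JSON envelope, and the use of a pre-shared key with HMAC-SHA256 are all assumptions made for illustration. Note also that an HMAC tag only provides integrity and authenticity; a real deployment would additionally need encryption for confidentiality.

```python
import hashlib
import hmac
import json


def package_reading(reading: dict, key: bytes) -> bytes:
    """Robot side (hypothetical): serialize a preprocessed reading and tag it."""
    body = json.dumps(reading, sort_keys=True).encode("utf-8")
    # The tag lets the server detect tampering or messages from unknown senders.
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    envelope = {"body": body.decode("utf-8"), "tag": tag}
    return json.dumps(envelope).encode("utf-8")


def verify_reading(message: bytes, key: bytes) -> dict:
    """Server side (hypothetical): verify the tag before using the reading."""
    envelope = json.loads(message.decode("utf-8"))
    body = envelope["body"].encode("utf-8")
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, envelope["tag"]):
        raise ValueError("authentication tag mismatch")
    return json.loads(body.decode("utf-8"))
```

In this sketch the robot does only cheap work (serialization and one hash), which matches the idea of keeping heavy computation on the server.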

The key components of the overall system are:

- Automated Lizard-like Robot: For stealth and agile movement
- Sensing: To capture real-time hostage site information
- Intelligence Gathering: To extract intelligence from sensor data
- Communication: Low-power secure communication

To this end, we integrate intelligence into the system to automatically detect suspicious activity by analysing the perceptual information as well as the audio of the scene. We develop a weapon detection system for resource-constrained devices without compromising detection performance. Subsequently, we utilize a deep-learning-based auto-mask encoder to analyze human actions and determine whether a given frame sequence involves a hostage situation. Training deep learning models requires a dataset that captures real-world challenges so that the models perform reliably in practice. For this purpose, we introduce two datasets, IITP-W and IITP Hostage, which address the limitations of existing datasets.
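Models that classify a frame sequence usually take a fixed number of frames per clip. The following is a minimal sketch of uniform frame sampling, a common preprocessing step; it is an assumption for illustration and not necessarily how the project prepares its input. The function name `sample_frame_indices` and the parameter `clip_len` are hypothetical.

```python
def sample_frame_indices(num_frames: int, clip_len: int) -> list[int]:
    """Pick clip_len frame indices spread evenly over a video of num_frames frames."""
    if num_frames <= 0 or clip_len <= 0:
        raise ValueError("num_frames and clip_len must be positive")
    if clip_len == 1:
        return [0]
    # Integer arithmetic keeps the first index at 0 and the last at num_frames - 1.
    return [(i * (num_frames - 1)) // (clip_len - 1) for i in range(clip_len)]
```

If a video has fewer frames than `clip_len`, some indices repeat, which simply duplicates frames in the sampled clip.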

1. IITP-W Dataset

Weapon detection is a pressing need today. It plays a crucial role in many applications, such as hostage scenes, surveillance of sensitive areas, and anti-terrorist operations. To make weapon detection models more effective, we introduce a weapon dataset, named IITP-W, with the following properties: a) images depicting real-world scenarios, including complex backgrounds, diverse lighting conditions, object occlusions, and varying image resolutions, b) images containing both large and small weapons, c) no images sharing identical information, and d) no synthetic images. The dataset includes three types of weapons: (1) short gun, (2) long gun, and (3) knife. The IITP-W dataset consists of 4292 instances of short guns, 1047 instances of knives, and 5447 instances of long guns with complex backgrounds, varied sizes, different lighting conditions, and different resolutions. The short gun category includes images of real guns of 30 different types in different firing statuses. Similarly, the long gun category includes images of real guns of 61 different types. Figure 2 depicts images from existing and proposed datasets, highlighting the differences. Additionally, Table 1 furnishes details such as data size and the number of images with plain backgrounds and of synthetic images for both existing and proposed datasets. Visit the download section to download the dataset.
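The instance counts above imply a noticeable class imbalance, with knives being the smallest class. The sketch below computes each class's share of the instances and, as one common way to compensate during training, inverse-frequency class weights. The weighting scheme is an illustrative assumption; the source does not say whether or how the imbalance is handled, and the dictionary keys are hypothetical label names.

```python
# Instance counts as reported for the IITP-W dataset.
instance_counts = {"short_gun": 4292, "knife": 1047, "long_gun": 5447}


def class_shares(counts: dict) -> dict:
    """Fraction of all instances that belongs to each class."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def inverse_frequency_weights(counts: dict) -> dict:
    """Weights proportional to 1/frequency; a perfectly balanced set gives 1.0 each."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


shares = class_shares(instance_counts)
weights = inverse_frequency_weights(instance_counts)
```

With these counts the total is 10786 instances, and the knife class receives the largest weight because it has the fewest instances.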

Count of images at different resolutions. The x-axis shows ranges of image resolution, and each bar shows the count of images within a particular range. The graph shows that 60% of the data has an image resolution between 0.01 and 1 megapixel.
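A statistic like the one in this caption can be computed from image dimensions alone. The sketch below is an assumption about how such a figure might be derived, not the authors' script; `fraction_in_range` is a hypothetical helper, and resolution is taken as width times height in pixels, with 1 megapixel equal to 1,000,000 pixels.

```python
def fraction_in_range(sizes, low_px=10_000, high_px=1_000_000):
    """Share of images whose pixel count lies in [low_px, high_px].

    sizes is a list of (width, height) tuples; the defaults correspond to
    0.01 and 1 megapixel. Bounds are kept in whole pixels so the comparison
    does not depend on floating-point rounding.
    """
    if not sizes:
        raise ValueError("sizes must not be empty")
    inside = sum(1 for w, h in sizes if low_px <= w * h <= high_px)
    return inside / len(sizes)
```

For example, among the hypothetical sizes `[(640, 480), (1920, 1080), (100, 100), (50, 50)]`, the first and third fall inside the range, so the function returns 0.5.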

The figure shows the categorization of images based on the number of weapons present in each image.
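A categorization of this kind amounts to a histogram over per-image weapon counts. The sketch below assumes a hypothetical annotation layout, a mapping from image file name to the list of weapon labels in that image; the dataset's real annotation format is not specified in the source.

```python
from collections import Counter


def weapons_per_image_histogram(annotations: dict) -> Counter:
    """Map 'number of weapons in an image' to 'number of images with that count'."""
    return Counter(len(labels) for labels in annotations.values())


# Hypothetical example annotations, for illustration only.
example = {
    "a.jpg": ["knife"],
    "b.jpg": ["short_gun", "long_gun"],
    "c.jpg": ["long_gun"],
}
histogram = weapons_per_image_histogram(example)
```

Here two images contain one weapon and one image contains two, so `histogram` equals `Counter({1: 2, 2: 1})`.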

2. IITP Hostage Dataset

output_video.mp4

Group activity recognition (GAR) in video is a problem of critical importance, given its broad applications in video analysis, surveillance systems, and the analysis of social behaviour. However, existing GAR datasets do not include crime scenes. While the UCF-Crime dataset is commonly used for crime and anomaly detection in videos, it has several limitations. These include a limited number of videos per class, extremely low-quality footage due to its age, the inclusion of non-human elements in crime scenes, and a lack of appropriate labelling for direct use in existing GAR models. The proposed IITP Hostage dataset is designed to detect hostage scenes based on the group activities of hostages and hostage-takers. The dataset comprises 923 videos in two categories, hostage and non-hostage. It features 137 actors, a significant increase over existing datasets, which typically include only 20-30 actors. This expanded diversity enhances the dataset's ability to generalize across various real-world scenarios. IITP Hostage was created by staging mock hostage attacks in various scenarios and extracting clips from movie scenes. In contrast, the non-hostage category encompasses a variety of group activities such as walking, talking, and sitting, which makes the non-hostage category more challenging. A sample video is shown on the left-hand side. Please visit the download section to download the dataset.