M. Siala¹, N. Khlifa¹, F. Bremond², K. Hamrouni¹
1. Research Unit in Signal Processing, Image Processing and Pattern Recognition (ENIT, Tunisia)
2. PULSAR Team (INRIA, Sophia Antipolis, Nice)
Abstract—Pedestrian detection in real scenes is an important application for video surveillance systems. This paper presents our contribution to improving the method of Viola and Jones, which was originally designed to detect faces. That method uses a cascade of AdaBoost classifiers built on Haar features. We improve its learning step by including a decision tree that represents the different poses and possible occlusions. The method has been tested on real, complex sequences and gave good detection results despite occlusions and pose variation.
In this paper, we focus on the problem of detecting people in video data. Such a system could be used in surveillance, driver assistance, and image indexing. Detecting people in images is more challenging than detecting many other objects for several reasons. First, appearance varies widely because of changes in clothing and camera position. Second, and most difficult, crowded scenes contain a large amount of occlusion among people, which makes detection very hard.
The cascade of classifiers based on AdaBoost is a robust method for characterization and detection, but it has some limitations in complex scenes. We therefore study this method and try to improve its performance for detecting people in complex environments.
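For readers unfamiliar with the cascade structure, the following is a minimal sketch of its early-rejection idea, not the implementation used in this paper. The stage functions and the toy score are hypothetical, and `overall_rates` assumes that the stages behave independently, in which case the per-stage detection and false-positive rates simply multiply.

```python
from typing import Callable, Sequence, Tuple

# A stage is any function that accepts or rejects a candidate window.
# Here a window is reduced to one toy score (an illustrative assumption).
Stage = Callable[[float], bool]


def run_cascade(stages: Sequence[Stage], window_score: float) -> Tuple[bool, int]:
    """Return (accepted, number_of_stages_evaluated).

    Evaluation stops at the first stage that rejects the window, so most
    background windows cost only one or two stage evaluations.
    """
    for index, stage in enumerate(stages):
        if not stage(window_score):
            return False, index + 1
    return True, len(stages)


def overall_rates(detection_rates: Sequence[float],
                  false_positive_rates: Sequence[float]) -> Tuple[float, float]:
    """Multiply per-stage rates, assuming the stages are independent."""
    detection, false_positive = 1.0, 1.0
    for d, f in zip(detection_rates, false_positive_rates):
        detection *= d
        false_positive *= f
    return detection, false_positive


# Hypothetical stages with increasingly strict thresholds.
stages = [lambda s: s > 0.1, lambda s: s > 0.4, lambda s: s > 0.7]

easy_background = run_cascade(stages, 0.05)  # rejected by the first stage
hard_background = run_cascade(stages, 0.5)   # rejected by the third stage
person = run_cascade(stages, 0.9)            # passes every stage

# Ten stages, each keeping 99% of people and 50% of non-people.
total_detection, total_false_positive = overall_rates([0.99] * 10, [0.5] * 10)
```

Under the independence assumption, ten such stages still keep about 90% of the true detections while letting through roughly one false positive in a thousand, which is why each stage can tolerate a high false alarm rate on its own.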
The paper is organized as follows: Section 2 summarizes some important related work. The next section briefly presents the cascade of AdaBoost method and explains our proposal for improving its training phase. The last section gives some results.
Several techniques have been proposed in the literature to address the problem of people detection. Here, we present only a few of the more recent ones.
Papageorgiou et al. have successfully employed example-based learning techniques to detect people in complex static scenes without assuming any a priori scene structure or using any motion information. Their system detects the full body of a person. Haar wavelets are used to represent the images, and a Support Vector Machine (SVM) is used to classify the patterns.
Regarding person detectors that incorporate motion descriptors, Haar features, first introduced by Viola and Jones for face detection, have also been used for people detection by Viola et al., and an extension of these features has been proposed by Lienhart et al.
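To make the notion of a Haar feature concrete, here is a minimal sketch of a two-rectangle feature computed from an integral image (summed-area table), the standard trick that makes any rectangle sum cost four lookups. The function names and the toy 4x4 image are illustrative assumptions and are not taken from the cited works.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table with a zero first row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_horizontal(ii, x, y, w, h):
    """Left half minus right half of the rectangle; w is assumed to be even."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Toy image with a bright left half and a dark right half (a vertical edge).
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(image)
edge_response = two_rect_horizontal(ii, 0, 0, 4, 4)  # 72 - 8 = 64
flat_response = two_rect_horizontal(ii, 0, 0, 2, 4)  # uniform region gives 0
```

A large absolute response indicates a strong intensity contrast between the two halves, while a uniform region yields zero. A boosted classifier thresholds many such responses taken at different positions and scales.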
Recently, Dalal and Triggs have further developed the idea of histograms of oriented gradients and have achieved excellent recognition rates for human detection in images.
Leibe et al. developed an effective static-image pedestrian detector for crowded scenes by coding local image patches against a learned codebook and combining the resulting bottom-up labels with top-down refinement.
Mikolajczyk et al. use position-orientation histograms of binary image edges as image features, combining seven part detectors to build a static-image detector that is robust to occlusions.
Yao et al. present a fast method to detect humans in videos captured in surveillance applications. It is based on a cascade of LogitBoost classifiers relying on features mapped from the Riemannian manifold of region covariance matrices computed from input image features.
However, most of the methods cited above do not specifically address occlusion. In addition, the cascade of AdaBoost method has difficulty detecting people in complex scenes. We therefore try to improve its detection rate in real scenes.
The Improved Cascade of AdaBoost Method
Very often in crowded scenes, people are only partially visible to the camera. Hence, approaches that attempt to detect the full body fail in most cases. To tackle this problem, we have adopted a coarse-to-fine strategy that divides the entire body space into smaller and smaller subspaces. We learn several human body poses separately using AdaBoost and obtain a detector for each of these poses. During training, annotated data for each body pose are fed separately to the AdaBoost algorithm, which uses Haar features to generate a reliable classifier for the corresponding pose, as in Fig. 1. During the training phase, we typically tune each classifier to obtain a high detection rate, even at the cost of a higher false alarm rate. Nevertheless, the proposed algorithm is able to reliably detect humans and reject false alarms despite the higher false alarm rates of the initial classifiers.
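The coarse-to-fine organization described above can be sketched as a small tree of pose detectors. This is only an illustration under stated assumptions: the node names, the tree shape, and the stub detectors (plain lambdas standing in for AdaBoost-trained classifiers) are hypothetical, as is the rule that a window accepted by a coarse node but by none of its children is discarded.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Window = Dict[str, object]  # toy stand-in for the features of an image window


@dataclass
class PoseNode:
    name: str
    detector: Callable[[Window], bool]  # placeholder for a trained classifier
    children: List["PoseNode"] = field(default_factory=list)


def detect_poses(node: PoseNode, window: Window) -> List[str]:
    """Descend from coarse to fine and return the leaf poses that accept the window."""
    if not node.detector(window):
        return []                      # rejected at this level: prune the subtree
    if not node.children:
        return [node.name]             # a leaf pose detector fired
    matches: List[str] = []
    for child in node.children:
        matches.extend(detect_poses(child, window))
    return matches                     # empty if no finer detector confirms


# Hypothetical tree: a permissive root, then views, then occlusion cases.
tree = PoseNode("person", lambda w: bool(w["person_like"]), [
    PoseNode("frontal", lambda w: w["view"] == "frontal", [
        PoseNode("frontal_full_body", lambda w: bool(w["legs_visible"])),
        PoseNode("frontal_upper_body", lambda w: not w["legs_visible"]),
    ]),
    PoseNode("side", lambda w: w["view"] == "side"),
])

occluded = {"person_like": True, "view": "frontal", "legs_visible": False}
clutter = {"person_like": True, "view": "none", "legs_visible": False}

occluded_result = detect_poses(tree, occluded)
clutter_result = detect_poses(tree, clutter)
```

In this sketch the partially occluded person is still reported through the upper-body leaf, while the clutter window that fooled the permissive root is dropped because no finer detector confirms it.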
In the next subsection, we introduce and define the AdaBoost algorithm before presenting our proposed improvement.