Mcecs bot People detection, tracking, and following using Kinect

Download 90.93 Kb.
Date conversion02.07.2018
Size90.93 Kb.

People detection, tracking, and following using Kinect.

Final Report, ECE 478


Peter Depeche

  1. Accomplishments.

    a) Using Kinect sensor, v.2 (aka Xbox 360), robot detects people and “follows” closest to the robot person by rotation in place only, aka in "1D" plane, by rotating around z-axis only.

    Demo video has been made.

    b) 99% accomplished: Following a person in 2D plane, along x and y coordinate. Code have been written and tested in simulation, but code could not tested on actual robot. I have no access to the lab: card-key has been deactivated.

  1. KINECT. Advanced Image Processing for Human-Robot interaction: Face recognition, KINECT specific.

    Kinect's RGB (color) camera image and depth (infrared) three-dimensional images are used for people detection and tracking.

    Kinect uses following algorithms:
    Hough transform to find and establish a groundplane.
    Viola-Jones face detection with adoption of Haar wavelets.
    Leg detection.
    AdaBoost (Adaptive Boost) algorithms and cascades.
    Bayes filters.
    Kalman filter.

    a) Hough transform.

    The Hough transform is a feature extraction technique. It finds imperfect instances of objects within a certain class of shapes by a voting procedure. (In our case, “class” is a “plane”). This voting procedure is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by the algorithm for computing the Hough transform (to z = ax + by + d).

    b) Viola-Jones face detection with adoption of Haar wavelets.

    A Haar-like feature considers adjacent rectangular regions at a specific location in a detection window, sums up the pixel intensities in each region and calculates the difference between these sums. This difference is then used to categorize subsections of an image. For example, let us say we have an image database with human faces. It is a common observation that among all faces the region of the eyes is darker than the region of the cheeks. Therefore a common Haar feature for face detection is a set of two adjacent rectangles that lie above the eye and the cheek region. The position of these rectangles is defined relative to a detection window that acts like a bounding box to the target object (the face in this case).

    In the detection phase of the Viola-Jones object detection framework, a window of the target size is moved over the input image, and for each subsection of the image the Haar-like feature is calculated. This difference is then compared to a learned threshold that separates non-objects from objects. Because such a Haar-like feature is only a weak learner or classifier (its detection quality is slightly better than random guessing) a large number of Haar-like features are necessary to describe an object with sufficient accuracy. In the Viola–Jones object detection framework, the Haar-like features are therefore organized in something called a classifier cascade to form a strong learner or classifier.

    The key advantage of a Haar-like feature over most other features is its calculation speed.

    c) Adaptive Boost (aka "AdaBoost").

    Adaptive Boosting is a machine learning meta-algorithm. Boosting is a general method that can be used for improving the accuracy of a given learning algorithm. It is based on the principle that a highly accurate or ‘strong’ classifier can be produced through the linear combination of many inaccurate or ‘weak’ classifiers. The efficiency of the final classifier is increased further by organizing the weak classifiers into a collection of cascaded layers. Such design consists of a set of layers with an increasing number of weak classifiers, where each layer acts as a non-body-part rejector with increasing complexity. An input image is first passed to the simplest top layer for consideration, and is only moved to the next layer if it is classified as true by the current layer.
    AdaBoost separately trains three different body part detectors using their respective image databases and a ground plane detector.

    d) Bayes filter

    A Bayes filter is an algorithm used for calculating the probabilities of multiple beliefs to allow a robot to infer its position and orientation. Essentially, Bayes filters allow robots to continuously update their most likely position within a coordinate system, based on the most recently acquired sensor data. This is a recursive algorithm. It consists of two parts: prediction and innovation. If the variables are linear and normally distributed the Bayes filter becomes equal to the Kalman filter.
    In a simple example, a robot moving throughout a grid may have several different sensors that provide it with information about its surroundings. The robot may start out with certainty that it is at position (0,0). However, as it moves farther and farther from its original position, the robot has continuously less certainty about its position; using a Bayes filter, a probability can be assigned to the robot's belief about its current position, and that probability can be continuously updated from additional sensor information.
    e) Kalman filter.
    Kalman filter for tracking of a detected person. Kalman filter is an algorithm that uses a series of measurements observed over time, containing noise (random variations) and other inaccuracies, and produces estimates of unknown variables that tend to be more precise than those based on a single measurement alone. In other words, the Kalman filter operates recursively on streams of noisy input data to produce a statistically optimal estimate of the underlying system state.

  2. Technical documentation of the robot.

    1. Components and subsystems of Kinect's people detection and tracking.

      At the time of this writing, the ROS's people detection packages, named "Perception", was not available for required version of ROS, Indigo.

      ROS "operating system". It is an application that is installed on Ubuntu linux.
      Navigation Stack packages, from

      In order to detect and track people by Kinect, the following software is used:

      i) "open_ptrack" package is used for detection and tracking of people.

      ii) Avin2 drivers are used as Kinect drivers.

    2. Technical difficulties and solutions.

      Most of the problems were finding the working software packages. Many of them are not available for the required version of ROS, Indigo. Even though ROS Indigo claims to hav e Kinect drivers for Indigo, they do not work. OpenNI (drivers) been purchased by Apple and is guaranteed to never work again and it does not work anymore, of course, despite being advertised as "working" on "freenect" drivers also do not work.

      One of the solutions was to find good and working drivers such as Avin2 Kinect drivers (link to the software is given below).

Ultimate solution is to use older, stable and reliable packages (software) and avoid the hype, fad, elusive and evanescent trends.

    1. Complete “troubleshooting manual” for the next users.

      i) Online, on permanent and never-disappearing link which is guaranteed to exists for the duration of PSU existence:

      ii) Same manual is also available on private, editable and constantly upgraded by students link at:
Manager of this editable copy of the manual is Richard Armstrong.

In general, the troubleshooting in ROS and in this project is done by running RVIZ graphical development environment (interface) to the robot as described in tutorials.

Also, detailed and extensive troubleshooting procedures and methods are described in the above cited MCECS Bot's Manual.

Another tool used for troubleshooting is rqt_graph which is also part of ROS's graphical development interface. It allows to see the nodes and topics and how they are related to each other. A snapshot of output of rqt_graph is very large graphic and could not be fit here. It is attached as a separate file named rosgraph.png: rosgraph.png

    1. Personal advice to the next students about the project like this.

      i) Work should begin by finding and installing available and working people detection and tracking packages (software), and experiencing the look and feel of recognition system, just playing with it. In the case of this author, about 50% project time was spent on finding, installing and testing the packages. Many times they were either not available or did not work.

      ii) Where should they find information? All information about this project is avaiilable on internet.

      iii) Meetings should be set up just like previous meetings which a re managed by Richard Armstrong. Richard Armstrong's usefullness, help, knowledge and pedagogic value is well above of any teaching assistant. Richard is always ahead of lectures, preparing students for what's coming, familiarizing with lecture topics, and often providing scientific knowledge which is not in the lecture slides. One is very lucky, indeed, to be given a chance to work on Richard's team.

      iv) Useful webpages and books:

      Lecture slides.
      ROS's tutorials and wiki pages.
      v) What should be added to class slides that would help in this project?
      Class already has abundance and wealth of material.

  1. Technical documentation to software of robot.

    The source code is located at:
This link is permanent and is guaranteed to exists as long as PSU exists.

Two files required to follow a closest detected person are detector_array_listener.cpp and kinect_nav_goal_listener.cpp. The source is quoted below. Files should be placed in catkin-ized directories as described in tutorials. Created by catkin_create_package catkin source directories should contain the usual CMakeLists.txt and package.xml files contents of which is quoted below.

============== Beginning of File "detector_markers_array_listener.cpp: =========================

// detector_markers_array_listener.cpp

// Subsribes to /detector/markers_array topic which is published by open_ptrack

// Calculates the position, x and y coords, of closest detected person.

// Publishes "x" and "y" coords of closest detected person.

// Prints to stdout, if desired, person's position coorditates x, y, z,

// and angle from center.

// 2014.12.15

#include "ros/ros.h"

#include "std_msgs/String.h"

#include "geometry_msgs/Twist.h"


#include "geometry_msgs/Pose.h"

#include "geometry_msgs/PoseStamped.h"

#include "visualization_msgs/MarkerArray.h"

#define PRINT_OUTPUT 0 // enables stdout debugging output if set to "1"

class DetectorMarkersArrayListener




void callBack(const visualization_msgs::MarkerArray::ConstPtr& msg);


ros::NodeHandle nh;

ros::Subscriber sub;

ros::Publisher pub;




pub = nh.advertise("/detected_person", 1);

// pub = nh.advertise("/cmd_vel", 10);

sub = nh.subscribe

("/tracker/markers_array", 1,

&DetectorMarkersArrayListener::callBack, this);


void DetectorMarkersArrayListener::callBack(const visualization_msgs::MarkerArray::ConstPtr& msg)


int detections_count;

int index;

double x, y, z;

int minArrayIndex = 0;

double minGlobalVal = 999.999; // init to arbitrary large distance

double minLocalVal = 999.999;

double minXglobalVal, minYglobalVal; // coordinates of closest person

geometry_msgs::Twist msg_out;

geometry_msgs::PoseStamped pose_stamped_msg;

double x_out, y_out; // goal coordinates

const double dist_min_const = 0.5; // meters, dist betw robot & person

const double delta_const = 1.0; // in meters, range for which robot

// will not start approaching again.

double dist_min_coeff = 0.0; // intermediate coefficient

double distance; // current actual distance between

// person and robot

detections_count = msg->markers.size();

if (detections_count > 0) {

for( index = 0; index < detections_count; index++)


// do not dereference position coords inside of ROS_INFO macro,

// else it will seg fault, do NOT do:

// ROS_INFO("x: %lf", msg->markers[0].pose.position.x);

x = msg->markers[index].pose.position.x;

y = msg->markers[index].pose.position.y;

z = msg->markers[index].pose.position.z;

// There can be several people in the Field Of View.

// We could pick closest to the robot.


// Distances are in meters, angles are in radians:

// X: forward, Y: left, Z: up

// Pick a closest person: min(x,y), for each element of array:

if (x < y) minLocalVal = x;

else minLocalVal = y;

if ( index == 0) {

minGlobalVal = minLocalVal;


else {

if (minLocalVal < minGlobalVal) {

minGlobalVal = minLocalVal;

minArrayIndex = index;




ROS_INFO("seq %i: x: % .12f y: % .12f z: % .12f",

index, x, y, z);


minXglobalVal = msg->markers[minArrayIndex].pose.position.x;

minYglobalVal = msg->markers[minArrayIndex].pose.position.y;


ROS_INFO("closest seq: %i x: % .12f y: % .12f\n",

minArrayIndex, minXglobalVal, minYglobalVal);


if (minXglobalVal == 0.0 || minYglobalVal == 0.0)

msg_out.angular.z = 0.0;


msg_out.angular.z = atan(minYglobalVal/minXglobalVal); // yaw


ROS_INFO("angular.z: %f", msg_out.angular.z);
distance = sqrt( powl(minXglobalVal, 2) + powl(minYglobalVal, 2) );

// Check if robot is already within the dist_min:

if ((minXglobalVal == 0.0 && minYglobalVal == 0.0) ||

( distance <= dist_min_const ))


msg_out.linear.x = 0.0;

msg_out.linear.y = 0.0;


else {

dist_min_coeff = 1 - dist_min_const/distance;

msg_out.linear.x = minXglobalVal * dist_min_coeff;

msg_out.linear.y = minYglobalVal * dist_min_coeff;




int main(int argc, char** argv)


ros::init(argc, argv, "DetectorMarkersArrayListener");

DetectorMarkersArrayListener dmal;


ROS_INFO("Spinning node detector_markers_array_listener");


return 0;


======================== end of file "detector_markers_array_listener.cpp ================

The code of "detector_markers_array_listener" simply calculates the target x and y coordinates for robot to use, based on detected person's coordinates, as follows:

================= Beginning of file "kinect_nav_goal_listener.cpp: ============================

// kinect_nav_goal_listener.cpp

// 2014.12.16

// Subscribes to "/detected_person" topic published

// by "detector_markers_array_listener" node

// Transforms closest detected person's coordinates to "map" frame from

// "lazer" frame. For now, "lazer" frame is assumed to have same transform

// as non-existing yet "kinect" frame. Lazer and kinect are positioned at

// approximately the same x and y coordinates relative to robot's

// base_link frame.

// Sends transformed coordinates to ActionServer of ROS's Navigation Stack.

// Navigation Stack is required to run this file.





#include "geometry_msgs/Twist.h"

#include "geometry_msgs/Pose.h"


#include "tf/transform_listener.h"


#define PRINT_OUTPUT 1

#define RUN_WITH_HARDWARE_ATTACHED 1 // set to "0" for debugging

class KinectNavGoalListener




void detectionCallBack(const geometry_msgs::Twist::ConstPtr& msg);

void initialPoseCallback(const

geometry_msgs::PoseWithCovarianceStamped::ConstPtr& msg);

void amclPoseCallback(const

geometry_msgs::PoseWithCovarianceStamped::ConstPtr& msg);

void poseStampedCallBack(const

geometry_msgs::PoseWithCovarianceStamped::ConstPtr& msg);

void runActionClient(void);

void runConcurrencyTest(void); // test


ros::NodeHandle nh;

ros::Subscriber sub_target, sub_initialpose, sub_amcl_pose;

ros∷Subscriber sub_pose_stamped;

ros::Publisher pub_move_base_simple;

geometry_msgs::Twist saved_msg;

volatile double saved_lin_x, saved_lin_y, saved_ang_z;

actionlib::SimpleActionClient *ac;

move_base_msgs::MoveBaseGoal goal;

geometry_msgs::Quaternion q_msg_in, q_msg_out; // temp vars

tf::Quaternion current_pose_q, delta_z_q;

geometry_msgs::PoseWithCovarianceStamped robot_pose_estimate;

geometry_msgs::PoseStamped msg_to_publish;

bool initialPosePublished;

geometry_msgs::PointStamped laser_point, map_point;

tf::TransformListener tfListener;

tf::StampedTransform tfLazerToMap;

double lazerPositionX, lazerPositionY;

tf::Stamped person_global; // transformed point into

// "global", or "map" frame


// This function is called when the initialpose topic is published

// (e.g. by RVIZ user input):

void KinectNavGoalListener::initialPoseCallback(

const geometry_msgs::PoseWithCovarianceStamped::ConstPtr& msg) {

// TODO: Only checks if "initialpose" is published. We might

// need to check if a given confidence value is reached.

initialPosePublished = true;


// This function is called when a pose estimation is published by AMCL node:

void KinectNavGoalListener::amclPoseCallback(

const geometry_msgs::PoseWithCovarianceStamped::ConstPtr& msg) {

// this works when viewed by 'rostopic echo /amcl_pose':

robot_pose_estimate = geometry_msgs::PoseWithCovarianceStamped(*msg);


KinectNavGoalListener::KinectNavGoalListener() // constructor


initialPosePublished = false;

sub_target = nh.subscribe

("/detected_person", 1,

&KinectNavGoalListener::detectionCallBack, this);

sub_initialpose = nh.subscribe

("initialpose", 1,

&KinectNavGoalListener::initialPoseCallback, this);

sub_amcl_pose = nh.subscribe

("amcl_pose", 1, &KinectNavGoalListener::amclPoseCallback, this);

pub_move_base_simple = nh.advertise

("move_base_simple/goal", 1);

sub_pose_stamped = nh.subscribe

("pose_stamped", 1,

&KinectNavGoalListener::poseStampedCallBack, this);

// MoveBase SimpleActionClient: "true/false: spin a thread by default

// "move_base" is not a topic here, but it is a namespace used by Server

// Server "move_base" uses "/move_base_simple/goal" to send

// messages to base

ac = new actionlib::SimpleActionClient

("move_base", true);

runActionClient(); // first, run without Kinect, manually set target


void KinectNavGoalListener::runActionClient(void){

while( ros::ok()) {



ROS_INFO("checking for ActionServer.");

// ActionServer will not come up if hardware is not detected

// (not attached):

while(!ac->waitForServer(ros::Duration(3.0) ) && nh.ok() ){


"Waiting for the move_base action server to come up ...");


ROS_INFO("\"move_base\" action server is running.");


try {

tfListener.lookupTransform("map", "lazer", ros::Time(0), tfLazerToMap);

current_pose_q = tfLazerToMap.getRotation();

lazerPositionX = tfLazerToMap.getOrigin().x(); // current robot

// (lazer) position

lazerPositionY = tfLazerToMap.getOrigin().y();


catch (tf::TransformException &ex) {

ROS_ERROR ("%s", ex.what());


// angle of person in current frame ("laser" or "kinect"):

delta_z_q = tf::createQuaternionFromYaw(saved_msg.angular.z);

current_pose_q *= delta_z_q; // rotate quarternion by delta

tf::Stamped person_local( tf::Pose(current_pose_q,

tf::Vector3(saved_msg.linear.x, saved_msg.linear.y, 0.0)), ros::Time(0), "laser" );

tfListener.transformPose("map", person_local, person_global);
// convert "tf::Stamped person_local" to

// "move_base_msgs::MoveBaseGoal goal"

tf::Quaternion person_global_q = person_global.getRotation();

tf::quaternionTFToMsg(person_global_q, q_msg_out);

goal.target_pose.pose.orientation = q_msg_out;

goal.target_pose.pose.position.x = person_global.getOrigin().x();

goal.target_pose.pose.position.y = person_global.getOrigin().y();

goal.target_pose.pose.position.z = 0.0;

goal.target_pose.header.frame_id = "map";

goal.target_pose.header.stamp = ros::Time::now();


bool finished_before_timeout = ac->waitForResult(ros::Duration(30.0));

actionlib::SimpleClientGoalState state = ac->getState();

if (finished_before_timeout) {

ROS_INFO("\t\tFINISHED: %s\n\n", state.toString().c_str());



ROS_INFO("\t\tFAILED: %s\n\n", state.toString().c_str());



void KinectNavGoalListener::poseStampedCallBack(

const geometry_msgs::PoseWithCovarianceStamped::ConstPtr& msg)


// robot_pose_estimate = geometry_msgs::PoseWithCovarianceStamped(*msg);


void KinectNavGoalListener::detectionCallBack(const geometry_msgs::Twist::ConstPtr& msg)


saved_ang_z = saved_msg.angular.z = msg->angular.z;

saved_lin_x = saved_msg.linear.x = msg->linear.x;

saved_lin_y = saved_msg.linear.y = msg->linear.y;


int main(int argc, char** argv)


ros::init(argc, argv, "kinect_nav_goal_listener");

KinectNavGoalListener kngl;


ROS_INFO("spinning node \"kinect_nav_goal_listener\"");

return 0;


============= end of file "kinect_nav_goal_listener.cpp =================================

File CMakeLists.txt and package.xml of "kinect_nav_goal_listener" catkin (ROS) package should contain dependencies:







Packages are started (run) by:

roslaunch tracker detection_and_tracking.launch

rosrun detector_markers_array_listener detector_markers_array_listener

rosrun kinect_nav_goal_listener kinect_nav_goal_listener

  1. Relation of this work to other work in this project.

    Kinect tracking and person following uses existing "jeeves" service set up by Richard Armstrong. Jeeves service uses lazer scanner to scan for obstacles, map the surroundings of the robot, and scan for obstacles. Jeeves service starts the roscore and the Navigation Stacks of ROS.

    The "/detected_person" topic is used by robot's face to move the eyes in the direction of tracked person.

  2. Future plans.

    Kinects's detection algorithms are not perfect. They sometimes erroneously "detect" inanimate objects that resembles people such as office plants, furniture, etc. This "noise" might cause the robot to approach the object and try to interact with it. Further noise filtering is required to eliminate such "noise". This can be done by detection of movement, temperature of a person, and other methods.

  3. Step-by-step guideline such as “INSTRUCTABLES” to the robot .

    Instructions on how to install the people detection and tracking packages are given in the MCECS Bot's Manual link to which is referenced previously. There will be several errors during installation. They are software bugs. The workarounds to these errors are described in Bot's Manual. They are fixed by re-running the parts of the installation scripts.

    The robot is started up when it is connected to battery power as described by the manual referenced above.

    Whe n Kinect's nodes are started up by ROS's "roslaunch" and "rosrun" commands as described above, the robot will detect and follow a person.
    The detection range is between 6 tf (1.8 m ) to 40 ft (12 m). The person's height must fit into the view of Kinect. Kinect's algorithms will first try to establish a groundplane (floor). These details are described in the MCECS Bot's Manual references (links) to which are cited previously.

    When "roslaunch tracking detection_and_tracking.launch" is started, the stdout output will produce error message in the "tracking" launch terminal window:

    “[ERROR] [ros time stamp]: transform exception: “world” passed to lookupTransform argument target_frame does not exist.”

    This is not an error message but a bug. This message means that algorithms are looking for a groundplane or for a person. Once a person is detected, there will be messages such as "track created" and when detection or track is lost, the message "track deleted" will appear.

    This is described in details in the MCECS Bot's Manual referenced previously.

The database is protected by copyright © 2016
send message

    Main page