webcam-head-tracker

A library for webcam-based head tracking

Introduction

This library provides head tracking based on the webcam typically found in laptop computers or on top of desktop monitors.

Head tracking means that you get both the position and the orientation of the user's head, i.e. tracking with six degrees of freedom (6DOF).

Under good conditions, you can get new values at the typical webcam frame rate of 30 frames per second.

How do I use this?

  • Create an instance of the WebcamHeadTracker class.
  • Call WebcamHeadTracker::initWebcam() to initialize the webcam.
  • Call WebcamHeadTracker::initPoseEstimator() to initialize the head pose estimator.
  • While WebcamHeadTracker::isReady() returns true:
    • Acquire a new webcam frame with WebcamHeadTracker::getNewFrame().
    • Compute a new head pose with WebcamHeadTracker::computeHeadPose().
    • Get the latest known pose with WebcamHeadTracker::getHeadPosition() and WebcamHeadTracker::getHeadOrientation().
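
The steps above can be sketched roughly as follows. This is a minimal illustration, not the library's own example code: the method names are taken from the list above, but their return types and parameters (for instance, that the getters fill a caller-provided float array, with three values for position and four for a quaternion orientation) are assumptions, so check the library header for the real signatures. `FakeHeadTracker` is a hypothetical stand-in that lets the sketch compile without a webcam; `runTrackingLoop` is likewise a hypothetical helper.

```cpp
#include <cstdio>

// Hypothetical stand-in that mimics the method names listed above.
// All signatures here are ASSUMPTIONS; the real class is WebcamHeadTracker.
class FakeHeadTracker {
public:
    bool initWebcam() { return true; }
    bool initPoseEstimator() { return true; }
    bool isReady() const { return framesLeft_ > 0; }
    void getNewFrame() { --framesLeft_; }
    bool computeHeadPose() { z_ += 0.01f; return true; }
    void getHeadPosition(float* pos) const {
        pos[0] = 0.0f; pos[1] = 0.0f; pos[2] = z_;
    }
    void getHeadOrientation(float* quat) const {
        quat[0] = 0.0f; quat[1] = 0.0f; quat[2] = 0.0f; quat[3] = 1.0f;
    }
private:
    int framesLeft_ = 3;   // pretend the webcam delivers three frames
    float z_ = 0.5f;       // pretend distance of the head from the camera
};

// Runs the loop described above on any tracker type that offers the listed
// methods. Returns the number of successfully computed poses, or -1 if
// initialization failed.
template <typename Tracker>
int runTrackingLoop(Tracker& tracker) {
    if (!tracker.initWebcam()) return -1;
    if (!tracker.initPoseEstimator()) return -1;

    int poses = 0;
    while (tracker.isReady()) {
        tracker.getNewFrame();
        if (tracker.computeHeadPose()) ++poses;

        // The getters report the latest known pose, even if the most
        // recent computation did not succeed.
        float pos[3];
        float quat[4];
        tracker.getHeadPosition(pos);
        tracker.getHeadOrientation(quat);
        std::printf("pos = (%g, %g, %g), quat = (%g, %g, %g, %g)\n",
                    pos[0], pos[1], pos[2],
                    quat[0], quat[1], quat[2], quat[3]);
    }
    return poses;
}
```

To try it, create a `FakeHeadTracker` and pass it to `runTrackingLoop`. If the real class matches the assumed signatures, a `WebcamHeadTracker` instance can be passed instead.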

How does it work?

We mainly use OpenCV and dlib functionality.

These ideas were borrowed from various sources, including screenReality, eyeLike, gazr, this OpenCV tutorial, and this paper. We ended up using an approach similar to gazr, but faster, independent of ROS, and with better filtering.

Limitations

  • Both the face detector and the face landmark detector work best for frontal faces. They fail early if you tilt your head too far.
  • The pose estimate is very noisy before filtering, so extensive filtering is required, which leads to swimming artefacts. With a less noisy pose estimate, the filter parameters could be tweaked to reduce this effect.
  • This is all just an approximation. Do not expect the resulting values to stay within any guaranteed error bounds.
  • The library uses crude guesses for the camera intrinsic parameters and distortion coefficients. This seems to work surprisingly well most of the time. However, you can also properly calibrate your webcam and use the correct values (see the comments in webcam-snapshot.cpp).
  • Under bad lighting, the webcam will have trouble delivering reasonably good images, and the detectors/estimators will have trouble with noisy data that is very unlike the data they were trained on.
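
To illustrate what a crude guess for the camera parameters can look like, here is a sketch of one common heuristic: take the focal length in pixels to be roughly the image width, put the principal point at the image center, and assume no lens distortion. This is an assumption for illustration only and not necessarily the guess this library makes; `CameraGuess` and `guessIntrinsics` are hypothetical names. Properly calibrated values should be preferred when available.

```cpp
#include <array>
#include <cassert>

// Hypothetical container for guessed camera parameters.
struct CameraGuess {
    std::array<double, 9> K;     // 3x3 intrinsic matrix, row-major
    std::array<double, 5> dist;  // distortion coefficients k1, k2, p1, p2, k3
};

// Crude guess from the image size alone (an assumed heuristic, not the
// library's documented behavior).
CameraGuess guessIntrinsics(int width, int height) {
    assert(width > 0 && height > 0);
    const double f  = static_cast<double>(width);  // focal length in pixels
    const double cx = width / 2.0;                 // principal point x
    const double cy = height / 2.0;                // principal point y

    CameraGuess g{};
    g.K = {f,   0.0, cx,
           0.0, f,   cy,
           0.0, 0.0, 1.0};
    g.dist = {0.0, 0.0, 0.0, 0.0, 0.0};            // assume no distortion
    return g;
}
```

For a 640x480 webcam image, this yields a focal length of 640 pixels and a principal point at (320, 240).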