Nicholas Dowson at the University of Surrey



Mutual Info.







The aim is to track a feature through a video sequence as its appearance changes, in real time and without a pre-learned model. Appearance changes can arise from variations in pose, variations in lighting, occlusions (the feature being covered) and image noise.


This is often achieved by keeping an example of the feature (a template). In each new frame, an algorithm searches for the image region that best matches the template.
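The search for the best-matching region can be sketched as follows. This is only an illustration under simplifying assumptions: the warp is a pure integer translation, the similarity measure is the sum of squared differences, and the search is exhaustive. The function names `ssd` and `best_match` and the tiny test image are hypothetical.

```python
def ssd(template, image, top, left):
    """Sum of squared differences between the template and an image patch."""
    total = 0.0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            diff = image[top + r][left + c] - value
            total += diff * diff
    return total


def best_match(template, image):
    """Return ((top, left), score) of the patch with the lowest SSD."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_pos, best_score = None, float("inf")
    # Exhaustive search over every placement that fits inside the image.
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            score = ssd(template, image, top, left)
            if score < best_score:
                best_pos, best_score = (top, left), score
    return best_pos, best_score


# Made-up 4x5 frame containing the 2x2 template at row 1, column 2.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 6, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 6]]
position, score = best_match(template, image)
```

Every candidate placement costs one full evaluation of the similarity measure, which is why an exhaustive search becomes expensive once rotation or scale are added to the warp.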

This is possible because, for a given similarity measure, each setting of the warp parameters yields a similarity value. The similarity values over a range of warps describe a function surface, as shown here for rotation and y translation.
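As an example of a similarity measure, mutual information can be estimated from the joint histogram of two patches. The sketch below is a minimal illustration, assuming discrete intensities and patches flattened into equal-length lists; the function name `mutual_information` and the sample values are hypothetical and not taken from the papers.

```python
import math
from collections import Counter


def mutual_information(patch_a, patch_b):
    """Mutual information (in bits) of two equally sized intensity lists."""
    n = len(patch_a)
    joint = Counter(zip(patch_a, patch_b))  # joint histogram
    marg_a = Counter(patch_a)               # marginal histogram of patch_a
    marg_b = Counter(patch_b)               # marginal histogram of patch_b
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts.
        ratio = count * n / (marg_a[a] * marg_b[b])
        mi += p_ab * math.log2(ratio)
    return mi


reference = [0, 0, 1, 1]
remapped = [5, 5, 9, 9]    # same structure, different intensities
unrelated = [5, 9, 5, 9]   # statistically independent of the reference

mi_remapped = mutual_information(reference, remapped)
mi_unrelated = mutual_information(reference, unrelated)
```

In this toy case the remapped patch scores one bit and the unrelated patch scores zero, even though neither matches the reference intensities exactly. Evaluating such a measure at each warp gives one point on the function surface.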

One could find the minimum or maximum value by performing a brute-force search, as shown in the above AVI, but this is too expensive for practical use. Many so-called optimisation algorithms exist to locate a maximum or minimum using as few function evaluations as possible. We tend to favour the Levenberg-Marquardt algorithm. This algorithm updates the warp parameters using the Hessian and Jacobian at the current warp position, shifting between a steepest-descent approach and a Newton approach. An example of this is shown here for a translation warp.
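A Levenberg-Marquardt update for a translation warp might look like the sketch below. It is an illustration under stated assumptions rather than the implementation used here: the frame and template are hypothetical analytic Gaussian blobs (so no interpolation is needed), the cost is the sum of squared differences, the Jacobian is taken by central differences, and the Hessian is replaced by the Gauss-Newton approximation J^T J with Marquardt's diagonal damping.

```python
import math


def image(x, y):
    """Hypothetical smooth frame: a Gaussian blob centred at (6.0, 4.5)."""
    return math.exp(-((x - 6.0) ** 2 + (y - 4.5) ** 2) / 8.0)


def template(x, y):
    """Hypothetical template: the same blob centred at (5.0, 5.0)."""
    return math.exp(-((x - 5.0) ** 2 + (y - 5.0) ** 2) / 8.0)


POINTS = [(x, y) for x in range(11) for y in range(11)]


def residuals(tx, ty):
    """Translated frame minus template at each sample point."""
    return [image(x + tx, y + ty) - template(x, y) for x, y in POINTS]


def cost(tx, ty):
    return sum(r * r for r in residuals(tx, ty))


def lm_translation(tx=0.0, ty=0.0, lam=1e-2, iterations=50, eps=1e-4):
    for _ in range(iterations):
        r = residuals(tx, ty)
        # Jacobian columns by central differences.
        plus_x, minus_x = residuals(tx + eps, ty), residuals(tx - eps, ty)
        plus_y, minus_y = residuals(tx, ty + eps), residuals(tx, ty - eps)
        jx = [(p - m) / (2 * eps) for p, m in zip(plus_x, minus_x)]
        jy = [(p - m) / (2 * eps) for p, m in zip(plus_y, minus_y)]
        # Gradient J^T r and Gauss-Newton approximation of the Hessian, J^T J.
        gx = sum(j * e for j, e in zip(jx, r))
        gy = sum(j * e for j, e in zip(jy, r))
        hxx = sum(j * j for j in jx)
        hyy = sum(j * j for j in jy)
        hxy = sum(p * q for p, q in zip(jx, jy))
        # Solve (H + lam * diag(H)) * delta = -g by Cramer's rule.
        a, b, d = hxx * (1 + lam), hxy, hyy * (1 + lam)
        det = a * d - b * b
        if det == 0.0:
            break
        dx = (b * gy - d * gx) / det
        dy = (b * gx - a * gy) / det
        if cost(tx + dx, ty + dy) < cost(tx, ty):
            # Step accepted: trust the quadratic model more (Newton-like).
            tx, ty, lam = tx + dx, ty + dy, lam / 10.0
        else:
            # Step rejected: damp more heavily (steepest-descent-like).
            lam *= 10.0
    return tx, ty


tx, ty = lm_translation()
```

With these made-up blobs the recovered translation should approach (1.0, -0.5), the offset between the two blob centres. The damping factor `lam` is what moves the update between the two regimes.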


There is also the problem of deciding when to update the template. If the template is never updated, tracking will only work for as long as the feature resembles its original appearance. This is seldom the case for long, and tracking then fails because the template misrepresents the feature.

One alternative is to update the template every frame. However, alignment of the template and the feature in the image is never perfect. The sub-pixel errors from every alignment accumulate, and the tracking algorithm drifts off the feature.
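The drift argument can be illustrated with a few lines of arithmetic. The per-frame alignment errors below are made-up values chosen only for illustration, and the variable names are hypothetical.

```python
from itertools import accumulate

# Hypothetical per-frame alignment errors in pixels (made-up values).
alignment_errors = [0.1, -0.05, 0.2, 0.1, 0.15]

# Fixed template: each frame is aligned against the original exemplar,
# so the position error in a frame is just that frame's alignment error.
fixed_template_error = list(alignment_errors)

# Template replaced every frame: each new template inherits the previous
# misalignment, so the errors add up along the sequence.
naive_update_error = list(accumulate(alignment_errors))
```

After five frames the naive update has drifted by about half a pixel in this example, while the fixed template is off only by the latest frame's error. The fixed template, however, remains valid only while the feature still resembles it.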

Our Approach: Simultaneous Modeling and Tracking (SMAT)

The approach we use is to build up a more sophisticated model of appearance, by storing every exemplar and fitting a multi-component appearance model to the exemplars. This is done on the fly, hence the name Simultaneous Modelling and Tracking (SMAT). Tracking in this way still runs in real time, but is more robust than existing methods. For more information, read the papers on the first implementation of SMAT (M3I) and on the second implementation of SMAT, which used structure models. This was extended to use N-tier hierarchical models of structure with multiple warps: so-called N-SMAT.
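The idea of storing exemplars and grouping them into components on the fly can be illustrated with a toy model. This is not the published SMAT algorithm: the class `ToyAppearanceModel`, its distance threshold and its running-mean update are assumptions made purely for illustration, standing in for whatever model fitting the papers describe.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyAppearanceModel:
    """Hypothetical stand-in for a multi-component appearance model."""

    def __init__(self, new_component_threshold):
        self.threshold = new_component_threshold
        self.exemplars = []  # every exemplar seen so far
        self.means = []      # one mean patch per component
        self.counts = []     # number of exemplars per component

    def nearest(self, patch):
        """Return (index, squared distance) of the closest component mean."""
        distances = [squared_distance(patch, m) for m in self.means]
        index = min(range(len(distances)), key=distances.__getitem__)
        return index, distances[index]

    def add(self, patch):
        """Store the exemplar and update, or create, its component."""
        self.exemplars.append(list(patch))
        if self.means:
            index, dist = self.nearest(patch)
            if dist <= self.threshold:
                # Incremental mean update for the matched component.
                self.counts[index] += 1
                n = self.counts[index]
                old = self.means[index]
                self.means[index] = [m + (p - m) / n for m, p in zip(old, patch)]
                return index
        # No component is close enough: start a new one.
        self.means.append(list(patch))
        self.counts.append(1)
        return len(self.means) - 1


model = ToyAppearanceModel(new_component_threshold=4.0)
patches = ([10, 10, 10], [11, 10, 9], [0, 1, 0], [10, 11, 10])
labels = [model.add(p) for p in patches]
```

Here the third patch looks very different from the first two, so it starts a second component instead of overwriting the first, and the fourth patch is matched back to the original appearance.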


Tracking amorphous objects despite difficult lighting conditions, for a hand and a person in a lobby.
Comparison of tracking using Sum of Squared Differences and Mutual Information.
Comparison of tracking using the strategic update model and Simultaneous Modelling and Tracking (SMAT).
Comparison of tracking with no, thin and thick structure models.
Comparing tracking with and without a structure model for a signing lady (347k) and a rhino (264k).
Comparing tracking with a pre-learned and a post-learned structure model for a signing lady (334k) and a rhino (258k).
Comparing various update methods (none, naive, strategic and SMAT) for a newscaster (135k).
Example of tracking the torso of a pedestrian (265k) using SMAT.


The XviD codec is required to play the AVI sequences on this webpage. It is available in the Downloads section.