This project aims to develop a generic visual recognition system capable of handling large numbers of scene and object categories in unconstrained indoor and outdoor environments, e.g. traffic scenes (vehicles, pedestrians, human faces, street signs, etc.), urban scenes (buildings, rooms) and natural scenes containing various rigid and articulated objects as well as textures (landscapes, animals, vegetation). It will address extremely challenging problems in visual recognition: the simultaneous recognition, localization and segmentation of objects and scenes independently of viewing conditions and in the presence of background clutter and occlusion. The human eye and brain have an outstanding ability to deal with these problems. Unfortunately, existing recognition systems are still far from this level of performance. One of the main limiting factors is the unlimited and unpredictable variability in the appearance of objects, even among those with the same semantic meaning. Coping with this variability requires large amounts of training data, compact image representations and efficient search techniques. Recent results from the PASCAL VOC Challenge indicate that similar techniques can be applied to scene classification and object category detection, but none of the methods was able to perform those tasks simultaneously. One of the main objectives is therefore to develop new techniques for the simultaneous representation of object classes and scenes.
Goal
The main goal of this project is to advance the state of the art in visual recognition so that large numbers of scenes can be classified, and object categories detected and segmented, in still images or video frames. The project focuses on the development of:
Novel image representations suitable for simultaneous modeling of scenes and object categories.
New methods for extracting local features robust to viewpoint change and background clutter.
Data structures, clustering and search techniques for efficient recognition.
A generic recognition system capable of dealing with hundreds of scene and object categories.
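As an illustration of how clustering can yield a compact image representation, the sketch below quantizes local feature descriptors into a small vocabulary with k-means and describes an image by a normalized histogram over that vocabulary. This is only one possible reading of the clustering item above: the project description does not say which clustering method or representation will be used, and the names `kmeans` and `word_histogram` are hypothetical. Descriptors are assumed to be plain tuples of floats.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(centers: Sequence[Vector], v: Vector) -> int:
    """Index of the cluster center closest to v."""
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i], v))


def kmeans(data: Sequence[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain k-means (Lloyd's algorithm); an assumed stand-in for the
    project's unspecified clustering technique."""
    rng = random.Random(seed)
    centers = rng.sample(list(data), k)
    for _ in range(iters):
        # Assignment step: group every vector with its nearest center.
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for v in data:
            groups[nearest_index(centers, v)].append(v)
        # Update step: move each center to the mean of its group;
        # an empty group keeps its previous center.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def word_histogram(descriptors: Sequence[Vector], centers: Sequence[Vector]) -> List[float]:
    """Normalized histogram of nearest-center assignments for one image."""
    counts = [0] * len(centers)
    for d in descriptors:
        counts[nearest_index(centers, d)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]
```

The histogram has a fixed length regardless of how many local features an image contains, which is what makes such a representation compact and comparable across images.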
Approaches
Build a representation of categories with multiple hierarchical models of appearance and structure.
Introduce bottom-up image segmentation methods for extracting robust features.
Investigate tree structures for efficient search in high dimensional spaces.
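To make the last approach concrete, the following is a minimal sketch of nearest-neighbour search with a k-d tree, one well-known tree structure for vector search. The choice of a k-d tree is an assumption made for illustration; the project description does not name a specific structure, and the names `Node`, `build_kdtree` and `nearest` are hypothetical. Exact k-d tree search also tends to lose its advantage as dimensionality grows, which is part of why this is a research question.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point                     # splitting point stored at this node
    axis: int                        # dimension used for the split
    left: Optional["Node"] = None    # points with a smaller or equal coordinate on axis
    right: Optional["Node"] = None   # points with a larger or equal coordinate on axis


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Recursively split the points at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(
        ordered[mid],
        axis,
        build_kdtree(ordered[:mid], depth + 1),
        build_kdtree(ordered[mid + 1:], depth + 1),
    )


def nearest(node: Optional[Node], query: Point, best: Optional[Point] = None) -> Optional[Point]:
    """Exact nearest neighbour of query; subtrees that cannot contain a
    closer point than the current best are skipped."""
    if node is None:
        return best
    if best is None or sq_dist(query, node.point) < sq_dist(query, best):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < sq_dist(query, best):
        best = nearest(far, query, best)
    return best
```

A typical use would be to build the tree once over a set of stored descriptors with `build_kdtree(descriptors)` and then call `nearest(tree, query)` for each query descriptor. The pruning test is what avoids a full linear scan when the data are low-dimensional or well clustered.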