Non-verbal Communication

What is NVC?

[Figure: Examples of NVC]

When people communicate, information is sent and received through both the words themselves and the way those words are expressed. Non-verbal communication (NVC) is all communication apart from the words that are said. This includes facial expressions, gestures, body positioning and sounds that are not part of the spoken words. Non-verbal information is necessary for understanding many types of social situations.

Why is it useful in computer vision?

We use communication skills in everyday life with little effort. However, communicating with a computer currently requires a very different set of skills and usually a greater degree of effort, because the interface is tailored to the computer's systems rather than designed to suit the human user. "Human-centric" computing, by contrast, needs to be usable with skills humans already possess. Given that non-verbal communication is required to understand other humans, it would be useful for computers to understand it as well. The first forays into this type of interface are already visible in software and games that use gesture interfaces in place of button-based designs. As this technology matures, computers should become easier to use.

Tracking of Facial Features

As people change expression and move parts of their body, the positions of those parts relative to one another change over time. This information is useful for computer vision because it provides a way to model the body and face for further processing and understanding. Automatically estimating the position of a given visual feature across the frames of a video is called "tracking". However, natural conversation contains a very wide range of expressions, body poses and fast motions, as well as occasions when the view of certain features is blocked by obstructions (occlusions). This makes tracking in natural conversation very challenging.
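One simple way to picture how relative positions change over time is sketched below. It assumes that a tracker has already produced named (x, y) feature points for each frame. The function names and the feature names used in the usage note are hypothetical, chosen only for illustration, and are not taken from the system described here.

```python
from itertools import combinations
from math import dist


def pairwise_distances(points):
    """Distances between every pair of named feature points in one frame."""
    return {
        (a, b): dist(points[a], points[b])
        for a, b in combinations(sorted(points), 2)
    }


def distance_changes(track):
    """Frame-to-frame change in each pairwise distance along a track.

    `track` is a list of frames; each frame maps a feature name to (x, y).
    """
    per_frame = [pairwise_distances(points) for points in track]
    return [
        {pair: cur[pair] - prev[pair] for pair in cur}
        for prev, cur in zip(per_frame, per_frame[1:])
    ]
```

For example, if the two hypothetical mouth corners "mouth_left" and "mouth_right" move from 4 pixels apart to 6 pixels apart between consecutive frames, `distance_changes` reports a change of 2.0 for that pair. A later processing stage could treat such a change as a cue for a widening mouth.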

To improve robustness to head pose changes, we have developed a way to track the face and incrementally learn new appearances of the face as it turns away from the camera. This adaptation to new face appearances is an example of on-line learning.
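The general idea of on-line appearance adaptation can be sketched with a very simple template tracker. This is an illustrative assumption, not the method developed in this work: the tracker searches near the previous position for the patch that best matches a stored template (by sum of squared differences), then blends the matched patch into the template so that it follows gradual appearance changes. All function names and the `rate` and `radius` parameters are hypothetical.

```python
def patch(image, top, left, size):
    """Return a size x size sub-image as a flat list of pixel values."""
    return [image[top + r][left + c] for r in range(size) for c in range(size)]


def ssd(a, b):
    """Sum of squared differences between two equal-length patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def track_step(image, template, prev, size, radius=2):
    """Find the patch within `radius` pixels of `prev` that best matches."""
    rows, cols = len(image), len(image[0])
    best_pos, best_err = prev, float("inf")
    for top in range(max(0, prev[0] - radius),
                     min(rows - size, prev[0] + radius) + 1):
        for left in range(max(0, prev[1] - radius),
                          min(cols - size, prev[1] + radius) + 1):
            err = ssd(patch(image, top, left, size), template)
            if err < best_err:
                best_pos, best_err = (top, left), err
    return best_pos, best_err


def update_template(template, observed, rate=0.2):
    """Blend the newly observed patch into the template (on-line update)."""
    return [(1 - rate) * t + rate * o for t, o in zip(template, observed)]


def track(frames, start, size, rate=0.2, radius=2):
    """Track a patch through grayscale frames, adapting the template."""
    template = patch(frames[0], start[0], start[1], size)
    pos, path = start, [start]
    for frame in frames[1:]:
        pos, _ = track_step(frame, template, pos, size, radius)
        observed = patch(frame, pos[0], pos[1], size)
        template = update_template(template, observed, rate)
        path.append(pos)
    return path, template
```

Here each frame is a list of rows of grayscale values and `start` is the (row, column) of the patch's top-left corner in the first frame. A small `rate` keeps the template stable but slow to adapt, while a large `rate` adapts quickly but risks drifting onto the background. A practical system would also need to decide when not to update, for example during occlusions.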

Data Collection

Many previous studies have investigated deliberately acted emotions. Such studies have limited application to natural situations, because spontaneous emotions differ in timing and intensity from deliberate ones. In addition, the way we express emotion depends on the cultural and social context, so findings from one social situation may not apply to another. We have addressed this difficulty by recording conversations while attempting to limit the role of the experimenter.

Automatic Recognition

We have created a system that is trained and tested on natural conversation in a single social situation. The social signals it detects are human communication acts such as "I understand what you", "I agree with you", "I am thinking" and "I am asking a question". We hope these signals will be more useful because they are expressed more often than most of Ekman's six classic prototypical expressions (anger, disgust, fear, happiness, sadness and surprise).

Performance is lower than in previous studies on deliberate expressions, but natural conversation contains a much richer variety of social signals and is more challenging than earlier data sets. Further work will investigate more advanced methods for NVC recognition.

Last update: July 2009.

See Also

A. Vinciarelli, M. Pantic and H. Bourlard, "Social Signal Processing: Survey of an Emerging Domain", Image and Vision Computing, in press, 2009.