Results

Data Set

The data set used is the same 164 sign set as used by Kadir et al ^[5] and therefore a direct comparison can be made between their tracking based system and this detection based approach. The data set consists of 1640 examples (10 of each sign). Signs were chosen randomly rather than picking specic signs which are known to be easy to separate. The viseme classi ers are built using only data from the rst 4 of the 10 repetitions of each sign and the word level classier is then trained on up to 5 examples (including the 4 previously seen) leaving 5 completely unseen examples for testing purposes. Furthermore, only visemes from the rst 91 signs are used in the viseme detector learning.

Classifying using individual visemes

A breakdown of viseme classifiers combined with the second stage classifier is shown in table 1. Unfortunately this data is not available for comparison in the Kadir et al paper [5]. As can be seen, each of the first stage classifiers can achieve around 30% accuracy when combined individually with the second stage classier. This is relatively poor performance as it is not possible to distinguish 164 signs on something as simple as hands moving apart.

Stage I Classifier	Ha	Tab	Sig
Mean	33.2%	31.7 %	29.4%
Minimum	31.6%	30.7%	28.2%
Maximum	35.0%	32.2%	30.5%
Std. Deviation	0.9	0.4	0.6

Table 1 Word level results using individual visemes.

Classifiying using combined visemes

Tests were performed on the 5 unseen examples of each of the 164 signs and the results are shown in table 4 along with the results from Kadir et al^[5]. As can be seen, the detection based method is only 6.6% less accurate than the tracking used in their paper for 5 training examples.

Classifier	This system	Kadir et al^[5]
Mean	72.6%	79.2%
Minimum	68.7%	76.1%
Maximum	74.3%	82.4%
Std. Deviation	1.5	2.1

Table 2 Word level results using combined visemes as compared to Kadir et al's tracked work^[5].

It has been shown that near equivalence with tracking can be achieved using solely detection in sign language recognition. This has also been done over a large lexicon database with few training examples. It demonstrates the power of combining viseme level classifiers to create word level classifiers in order to reduce the complexity as the vocabulary of the system increases.

Results

Data Set

Classifying using individual visemes

Classifiying using combined visemes

Learning Sign From Subtitles

Large Lexicon Detection

Spatio-Temporal Features