Data Set

The data set used is the same 164 sign set as used by Kadir et al. [5], so a direct comparison can be made between their tracking based system and this detection based approach. The data set consists of 1640 examples (10 of each sign). Signs were chosen randomly rather than by picking specific signs which are known to be easy to separate. The viseme classifiers are built using only data from the first 4 of the 10 repetitions of each sign. The word level classifier is then trained on up to 5 examples (including the 4 previously seen), leaving 5 completely unseen examples for testing purposes. Furthermore, only visemes from the first 91 signs are used in the viseme detector learning.
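The partitioning described above can be sketched as follows. This is only an illustration under stated assumptions: the constant and function names are hypothetical, signs and repetitions are indexed from zero, and "first" is taken to mean the lowest indices. None of this is specified in the source beyond the counts themselves.

```python
# Counts quoted in the text; all names here are hypothetical.
NUM_SIGNS = 164            # size of the lexicon
REPS_PER_SIGN = 10         # examples recorded per sign
VISEME_TRAIN_REPS = 4      # repetitions used to build the viseme classifiers
VISEME_TRAIN_SIGNS = 91    # only these signs contribute to viseme learning
MAX_WORD_TRAIN_REPS = 5    # upper limit on word level training examples


def split_dataset(word_train_reps=MAX_WORD_TRAIN_REPS):
    """Return (all, viseme_train, word_train, test) lists of (sign, rep) pairs."""
    if not 1 <= word_train_reps <= MAX_WORD_TRAIN_REPS:
        raise ValueError("word_train_reps must be between 1 and 5")

    examples = [(s, r) for s in range(NUM_SIGNS) for r in range(REPS_PER_SIGN)]

    # Viseme detectors: first 4 repetitions of the first 91 signs only.
    viseme_train = [(s, r) for s, r in examples
                    if s < VISEME_TRAIN_SIGNS and r < VISEME_TRAIN_REPS]

    # Word level classifier: up to 5 repetitions of every sign.
    word_train = [(s, r) for s, r in examples if r < word_train_reps]

    # Test set: the 5 repetitions never used by either training stage.
    test = [(s, r) for s, r in examples if r >= MAX_WORD_TRAIN_REPS]

    return examples, viseme_train, word_train, test


examples, viseme_train, word_train, test = split_dataset()
print(len(examples), len(viseme_train), len(word_train), len(test))
```

With the default of 5 word level training examples, the training and test sets each hold half of the data and share no examples, while the viseme training data is a subset of the word level training data.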

Classifying using individual visemes

A breakdown of viseme classifiers combined with the second stage classifier is shown in Table 1. Unfortunately this data is not available for comparison in the Kadir et al. paper [5]. As can be seen, each of the first stage classifiers achieves around 30% accuracy when combined individually with the second stage classifier. This relatively poor performance is to be expected, as it is not possible to distinguish 164 signs on something as simple as the hands moving apart.

Stage I Classifier    Ha       Tab      Sig
Mean                  33.2%    31.7%    29.4%
Minimum               31.6%    30.7%    28.2%
Maximum               35.0%    32.2%    30.5%
Std. Deviation        0.9      0.4      0.6

Table 1 Word level results using individual visemes.

Classifying using combined visemes

Tests were performed on the 5 unseen examples of each of the 164 signs and the results are shown in Table 2 along with the results from Kadir et al. [5]. As can be seen, the mean accuracy of the detection based method is only 6.6 percentage points lower than that of the tracking used in their paper for 5 training examples.

Classifier        This system    Kadir et al. [5]
Mean              72.6%          79.2%
Minimum           68.7%          76.1%
Maximum           74.3%          82.4%
Std. Deviation    1.5            2.1

Table 2 Word level results using combined visemes as compared to Kadir et al.'s tracking based work [5].
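As a quick check on the comparison, the gap between the two mean accuracies can be expressed both in percentage points and relative to the tracking based result. The variable names below are hypothetical; the two figures are the means reported in Table 2.

```python
# Mean word level accuracies (in percent) reported in Table 2.
this_system_mean = 72.6   # detection based approach
kadir_mean = 79.2         # tracking based approach of Kadir et al. [5]

# Absolute gap in percentage points.
gap_points = kadir_mean - this_system_mean

# The same gap as a fraction of the tracking based accuracy, in percent.
relative_gap = 100.0 * gap_points / kadir_mean

print(f"gap: {gap_points:.1f} percentage points")
print(f"relative to tracking: {relative_gap:.1f}%")
```

The absolute gap is the figure quoted in the text; the relative figure is included only to make clear that the two measures differ.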

It has been shown that near equivalence with tracking can be achieved in sign language recognition using detection alone. This was achieved over a large lexicon with few training examples. It demonstrates the power of combining viseme level classifiers to create word level classifiers, which limits the growth in complexity as the vocabulary of the system increases.