Sign Language Recognition : Generalising to More Complex Corpora (bibtex)
by Helen Cooper
Abstract:
The aim of this thesis is to find new approaches to Sign Language Recognition (SLR) which are suited to working with the limited corpora currently available. Data available for SLR is of limited quality; low resolution and frame rates make the task of recognition even more complex. The content is rarely natural, concentrating on isolated signs and filmed under laboratory conditions. In addition, the amount of accurately labelled data is minimal. To this end, several contributions are made: tracking the hands is eschewed in favour of detection-based techniques that are more robust to noise; these are investigated both for whole signs and for linguistically-motivated sign sub-units, to make best use of the limited data sets. Finally, an algorithm is proposed to learn signs from the inset signers on TV, with the aid of the accompanying subtitles, thus increasing the corpus of data available.

Tracking fast-moving hands is a complex task even under laboratory conditions; on real-world data the challenge is greater still. When tracked data is used as a base for SLR, errors in the tracking are compounded at the classification stage. Instead, a novel sign detection method is proposed which views space-time as a 3D volume and the sign within it as an object to be located. Features are combined into strong classifiers using a novel boosting implementation designed to create optimal classifiers over sparse data sets. Using boosted volumetric features on a robust frame-differenced input, average classification rates reach 71% on seen signers and 66% on a mixture of seen and unseen signers, with individual sign classification rates reaching 95%.

A classifier-per-sign approach to SLR means that data sets need to contain numerous examples of each sign to be learnt. Instead, this thesis proposes learning classifiers to detect the common sub-units of sign; the responses of these classifiers can then be combined for recognition at the sign level. This approach requires fewer examples per sign, since the sub-unit detectors are trained on data from multiple signs. It is also faster at detection time, since there are fewer classifiers to consult: their number is limited by the linguistics of sign, not by the number of signs being detected. For this method, appearance-based boosted classifiers are introduced to distinguish the sub-units of sign. Results show that, when combined with temporal models, these novel sub-unit classifiers can outperform similar classifiers learnt on tracking results. As an added benefit, since the sub-units are linguistically derived, they can be used independently to help linguistic annotators.
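To make the sub-unit approach concrete, the sketch below shows one way detector responses could be combined at the sign level. It is an illustration only, not the thesis's implementation: the boosted detection stage is abstracted as an array of per-frame sub-unit scores, which is collapsed into a discrete sub-unit sequence and scored against a simple first-order Markov model per sign. The sub-unit inventory, scores and model parameters are invented placeholders.

import numpy as np

# Hypothetical sub-unit inventory (hand-shape / motion / location classes).
SUB_UNITS = ["hand_flat", "hand_fist", "move_up", "move_apart", "loc_chest"]

def subunit_sequence(scores):
    """Collapse per-frame detector scores (frames x sub-units) into the
    strongest sub-unit per frame, dropping immediate repetitions."""
    best = np.argmax(scores, axis=1)
    return [int(s) for i, s in enumerate(best) if i == 0 or s != best[i - 1]]

def sequence_log_likelihood(seq, start_p, trans_p):
    """Score a sub-unit sequence under a first-order Markov model."""
    ll = np.log(start_p[seq[0]])
    for prev, cur in zip(seq[:-1], seq[1:]):
        ll += np.log(trans_p[prev, cur])
    return ll

def classify(scores, sign_models):
    """Pick the sign whose Markov model best explains the detector output."""
    seq = subunit_sequence(scores)
    return max(sign_models,
               key=lambda sign: sequence_log_likelihood(seq, *sign_models[sign]))

# Toy usage: random detector responses and two invented, uniform sign models.
rng = np.random.default_rng(0)
K = len(SUB_UNITS)
scores = rng.random((20, K))                       # 20 frames of scores
uniform = (np.full(K, 1.0 / K), np.full((K, K), 1.0 / K))
sign_models = {"BOOK": uniform, "HOUSE": uniform}  # placeholders, not learnt
print(classify(scores, sign_models))

In practice the appearance-based boosted classifiers described above would supply the per-frame scores, and a richer temporal model could replace the Markov chains.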
Since sign language data sets are costly to collect and annotate, few are publicly available, and those which are tend to be constrained in content and often captured under laboratory conditions. However, in the UK the British Broadcasting Corporation (BBC) regularly produces programmes with an inset signer and corresponding subtitles. This provides a natural signer, covering a wide range of topics, in real-world conditions. While it has no ground truth, it is proposed that the translated subtitles can provide weak labels for learning signs. The final contributions of this thesis lead to an innovative approach to learning signs from these co-occurring streams of data. Using a unique, temporally constrained version of the Apriori mining algorithm, similar sections of video are identified as possible sign locations. These estimates are improved by introducing the concept of contextual negatives, which removes contextually similar noise. Combined with an iterative honing process to enhance the localisation of the target sign, 23 word/sign combinations are learnt from a 30-minute news broadcast, providing a novel method for automatic data set creation.
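The subtitle-driven mining step can be pictured with a much simplified, Apriori-flavoured sketch. The temporal constraints and the iterative honing described above are not reproduced here; the code only illustrates the core idea of keeping patterns of discretised visual features that are frequent in video windows whose subtitles contain the target word, yet rare in contextually similar windows that do not. All item names, thresholds and helper functions are hypothetical.

from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    """Tiny Apriori-style miner (no candidate pruning). Each transaction is
    the set of discretised feature 'items' seen in one candidate window."""
    n = len(transactions)

    def support(items):
        return sum(items <= t for t in transactions) / n

    singletons = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in singletons
               if support(frozenset([i])) >= min_support]
    frequent, size = list(current), 2
    while current and size <= max_size:
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == size}
        current = [c for c in candidates if support(c) >= min_support]
        frequent.extend(current)
        size += 1
    return frequent

def mine_sign_candidates(positives, contextual_negatives,
                         min_pos_support=0.6, max_neg_support=0.2):
    """Keep itemsets frequent in windows weakly labelled by the subtitles,
    but rare in contextually similar windows without the target word."""
    neg_n = max(len(contextual_negatives), 1)
    keep = []
    for items in frequent_itemsets(positives, min_pos_support):
        neg_support = sum(items <= t for t in contextual_negatives) / neg_n
        if neg_support <= max_neg_support:
            keep.append(items)
    return keep

# Toy usage: items are invented codewords for hand shape, motion and position.
positives = [{"fist", "up", "chest"}, {"fist", "up", "face"}, {"fist", "up", "chest"}]
negatives = [{"flat", "down", "chest"}, {"fist", "down", "waist"}]
print(mine_sign_candidates(positives, negatives))

A full implementation would additionally enforce temporal ordering within each window and re-estimate the sign boundaries iteratively, as described above.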
Reference:
Helen Cooper, "Sign Language Recognition : Generalising to More Complex Corpora", PhD thesis, Centre For Vision Speech and Signal Processing, University Of Surrey, 2010.
Bibtex Entry:
@PHDTHESIS{Cooper_Sign_2010b,
  author = {Helen Cooper},
  title = {Sign Language Recognition : Generalising to More Complex Corpora},
  school = {Centre for Vision, Speech and Signal Processing, University of Surrey},
  year = {2010},
  abstract = {The aim of this thesis is to find new approaches to Sign Language
    Recognition (SLR) which are suited to working with the limited corpora
    currently available. Data available for SLR is of limited quality; low
    resolution and frame rates make the task of recognition even more complex.
    The content is rarely natural, concentrating on isolated signs and filmed
    under laboratory conditions. In addition, the amount of accurately labelled
    data is minimal. To this end, several contributions are made: tracking the
    hands is eschewed in favour of detection-based techniques that are more
    robust to noise; these are investigated both for whole signs and for
    linguistically-motivated sign sub-units, to make best use of the limited
    data sets. Finally, an algorithm is proposed to learn signs from the inset
    signers on TV, with the aid of the accompanying subtitles, thus increasing
    the corpus of data available. Tracking fast-moving hands is a complex task
    even under laboratory conditions; on real-world data the challenge is
    greater still. When tracked data is used as a base for SLR, errors in the
    tracking are compounded at the classification stage. Instead, a novel sign
    detection method is proposed which views space-time as a 3D volume and the
    sign within it as an object to be located. Features are combined into
    strong classifiers using a novel boosting implementation designed to create
    optimal classifiers over sparse data sets. Using boosted volumetric
    features on a robust frame-differenced input, average classification rates
    reach 71\% on seen signers and 66\% on a mixture of seen and unseen
    signers, with individual sign classification rates reaching 95\%. A
    classifier-per-sign approach to SLR means that data sets need to contain
    numerous examples of each sign to be learnt. Instead, this thesis proposes
    learning classifiers to detect the common sub-units of sign; the responses
    of these classifiers can then be combined for recognition at the sign
    level. This approach requires fewer examples per sign, since the sub-unit
    detectors are trained on data from multiple signs. It is also faster at
    detection time, since there are fewer classifiers to consult: their number
    is limited by the linguistics of sign, not by the number of signs being
    detected. For this method, appearance-based boosted classifiers are
    introduced to distinguish the sub-units of sign. Results show that, when
    combined with temporal models, these novel sub-unit classifiers can
    outperform similar classifiers learnt on tracking results. As an added
    benefit, since the sub-units are linguistically derived, they can be used
    independently to help linguistic annotators. Since sign language data sets
    are costly to collect and annotate, few are publicly available, and those
    which are tend to be constrained in content and often captured under
    laboratory conditions. However, in the UK the British Broadcasting
    Corporation (BBC) regularly produces programmes with an inset signer and
    corresponding subtitles. This provides a natural signer, covering a wide
    range of topics, in real-world conditions. While it has no ground truth,
    it is proposed that the translated subtitles can provide weak labels for
    learning signs. The final contributions of this thesis lead to an
    innovative approach to learning signs from these co-occurring streams of
    data. Using a unique, temporally constrained version of the Apriori mining
    algorithm, similar sections of video are identified as possible sign
    locations. These estimates are improved by introducing the concept of
    contextual negatives, which removes contextually similar noise. Combined
    with an iterative honing process to enhance the localisation of the target
    sign, 23 word/sign combinations are learnt from a 30-minute news broadcast,
    providing a novel method for automatic data set creation.},
  url = {http://personal.ee.surrey.ac.uk/Personal/H.Cooper/research/papers/SLR_GeneralisingtoMoreComplexCorpora.pdf}
}