@inproceedings{SingampalliJackson_UKSpeech08,
        AUTHOR  =       "Singampalli, V. D. and Jackson, P. J. B.",
        TITLE   =       "Towards deriving compact and meaningful articulatory representations: an analysis of feature extraction techniques",
        BOOKTITLE =     "Proc.\ One-day Mtg. for Young Spch.\ Res. (UK Speech'08)",
        ADDRESS =       "Guildford, UK",
        PAGES   =       "29",
        MONTH   =       "July",
        YEAR    =       "2008",
        ABSTRACT =
"We present an analysis of linear feature extraction techniques to derive a 
compact and meaningful representation of the articulatory data. We used 
14-channel EMA (ElectroMagnetic Articulograph) data from two speakers from 
the MOCHA database [A.A. Wrench. A new resource for production modelling in 
speech technology. In Proc. Inst. of Acoust., Stratford-upon-Avon, UK, 
2001.]. As representations, we considered the registered articulator 
fleshpoint coordinates, transformed PCA (Principal Component Analysis) and 
LDA (Linear Discriminant Analysis) features. Various PCA schemes were 
considered, grouping coordinates according to correlations amongst the 
articulators. For each phone, critical dimensions were identified using the 
algorithm in [Veena D Singampalli and Philip JB Jackson. Statistical 
identification of critical, dependent and redundant articulators. In Proc. 
Interspeech, Antwerp, Belgium, pages 70-73, 2007.]: critical articulators 
with registered coordinates, and critical modes with PCA and LDA. The phone 
distributions in each representation were modelled as univariate Gaussians 
and the average number of critical dimensions was controlled using a 
threshold on the 1-D Kullback-Leibler divergence (identification 
divergence). The 14-D KL divergence (evaluation divergence) was employed to 
measure goodness of fit of the models to estimated phone distributions. 
Phone recognition experiments were performed using coordinate, PCA and LDA 
features, for comparison.

We found that, of all representations, the LDA space yielded the best fit 
between the model and phone pdfs. The full PCA representation (including 
all articulatory coordinates) gave the next best fit, closely followed by 
two other PCA representations that allowed for correlations across the 
tongue. At the threshold where the average number of critical dimensions 
matched that obtained from the IPA, the goodness of fit improved by 34% 
(22%/46% for male/female data) when LDA was used over the best PCA 
representation, and by 72% (77%/66%) over articulatory coordinates. For PCA 
and LDA, the compactness of the representation was investigated by 
discarding the least significant modes. No significant change in the 
recognition performance was found as the dimensionality was reduced from 14 
to 8 (95% confidence t-test), although accuracy deteriorated as further 
modes were discarded. Evaluation divergence also reflected this pattern. 
In recognition experiments, LDA features increased accuracy by 2% on average 
over the best PCA representation. An articulatory interpretation of the PCA 
and LDA modes is discussed. Future work focuses on articulatory trajectory 
generation in feature spaces guided by the findings of this study."
}

