Measurement of the Vocal Tract using Magnetic Resonance Imaging


The mechanism of speech production is of interest to many disciplines, including linguistics, neurology, psychology, physiology and engineering. In this project, we are interested in measuring the shape of the vocal tract, the part of the speech production system that extends from the larynx to the lips. From an engineering point of view, we can think of the vocal tract as a resonator that filters the sounds produced by different sources: the periodic sounds produced by the vocal folds, and the noisy sounds produced at the teeth, lips or elsewhere. The shape of the vocal tract, and thus the characteristics of the acoustic filter, is determined by the articulators: the lips, tongue, velum, larynx and jaw. By measuring the shape and movements of these articulators, we can provide an objective evaluation and obtain a better understanding of the elements involved in speech production.
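To give a feel for the resonator view, the sketch below computes the resonances of the simplest textbook idealization: a uniform tube closed at one end (the glottis) and open at the other (the lips), whose resonances fall at odd multiples of c/4L. The tube length of 17.5 cm and the speed of sound of 350 m/s are illustrative assumptions, not measurements from this project, and a real vocal tract is far from uniform.

```python
def tube_resonances(length_m, n_resonances=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength resonator: F_k = (2k - 1) * c / (4 * L).
    length_m and speed_of_sound are illustrative assumptions.
    """
    return [
        (2 * k - 1) * speed_of_sound / (4.0 * length_m)
        for k in range(1, n_resonances + 1)
    ]


if __name__ == "__main__":
    # Assumed 17.5 cm tract length; values are rounded for display.
    print([round(f) for f in tube_resonances(0.175)])
```

Changing the tube length shifts all resonances together; changing the shape of the tract, as the articulators do, moves them independently, which is why the shape measurements matter.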
Magnetic Resonance Imaging (MRI) is a common medical imaging technique that has been used for measuring the shape of the vocal tract. A significant drawback of using MRI in speech studies is the long scanning time. In previously reported work, the subject had to sustain each vowel or consonant for the whole period of the scan. Static measurements of the vocal tract may be imperfect because the shape of the vocal tract while the subject sustains a vowel could differ from its shape when the vowel is produced in context. In addition, static measurement provides no information about the transition period or the movements of the articulators when changing from one phoneme to another. A number of researchers have tried to overcome the timing limitations of using MRI in speech research. Foldvik et al. reported a 'movie' of volume images of the vocal tract during production of the diphthong [aI], reconstructed using the built-in processing for cardiac imaging. A different method is being developed by Masaki at ATR, Japan, but it also uses the built-in processing for cardiac imaging. In both of these methods, the subject is asked to repeat an utterance many times and to synchronize these repetitions with an audio signal. This requires a trained subject to take part in the synchronization, which results in an unnatural speech environment.
We developed a new method for increasing the temporal resolution of the MR images needed for dynamic speech studies. The method works by post-processing the collected MR images, with total control of the reconstruction procedure. The subject does not need to be phonetically trained and is not involved in the measurement procedure, which provides a more natural speech environment. By reconstructing the collected images, we were able to obtain 25 sequential frames representing the changing shape of the vocal tract during the short utterance /pasi/. These frames provided much-needed information about the dynamics of articulatory movement and demonstrated the use of MRI for dynamic speech studies.


The 25 frames represent 530 ms of speech.

Each frame represents 21 ms of speech.

Frames 1-8 represent the vowel /a/.

Frames 9-13 represent the fricative /s/.

Frames 14-21 represent the vowel /i/.

In frames 22-25, the vocal tract is getting ready for a new /pasi/.
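The frame figures above can be turned into approximate time intervals. The sketch below is illustrative only: it assumes the frames are contiguous, equally long, and that frame 1 starts at time zero, none of which is stated explicitly here. The names `FRAME_MS`, `SEGMENTS`, `frame_interval_ms` and `segment_durations_ms` are hypothetical.

```python
FRAME_MS = 21   # duration of one frame, as stated above
N_FRAMES = 25   # number of reconstructed frames

# (label, first frame, last frame), taken from the list above.
SEGMENTS = [
    ("/a/", 1, 8),
    ("/s/", 9, 13),
    ("/i/", 14, 21),
    ("preparing for next /pasi/", 22, 25),
]


def frame_interval_ms(frame):
    """Return (start, end) in ms for a 1-based frame number.

    Assumes contiguous, equal-length frames with frame 1 starting at 0 ms.
    """
    start = (frame - 1) * FRAME_MS
    return start, start + FRAME_MS


def segment_durations_ms():
    """Return the duration in ms of each labelled segment."""
    return {label: (last - first + 1) * FRAME_MS for label, first, last in SEGMENTS}


if __name__ == "__main__":
    for label, first, last in SEGMENTS:
        start, _ = frame_interval_ms(first)
        _, end = frame_interval_ms(last)
        print(f"{label}: frames {first}-{last}, {start}-{end} ms")
    print("total:", N_FRAMES * FRAME_MS, "ms")
```

Under these assumptions the total comes to 25 × 21 = 525 ms, which agrees with the stated 530 ms if the 21 ms frame duration is a rounded value.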

Published in Eurospeech '97 (see the publication list).

This is ongoing research.


  1. T. Baer, J.C. Gore, L.C. Gracco and P.W. Nye, Analysis of Vocal Tract Shape and Dimension Using MRI: Vowels, J. Acoust. Soc. Am., 1991, vol.90(1), pp.799-828.
  2. T. Baer, J.C. Gore, S. Boyce and P.W. Nye, Application of MRI to the Analysis of Speech Production, Magn. Reson. Imag., 1987, vol.5, pp.1-7.
  3. J. Dang, K. Honda and H. Suzuki, Morphological and Acoustical Analysis of the Nasal and Paranasal Cavities, J. Acoust. Soc. Am., 1994, vol.96(4), pp.2088-2100.
  4. A.K. Foldvik, U. Kristiansen, J. Kvaerness, A. Torp and H. Torp, 3-D Ultrasound and Magnetic Resonance Imaging: a New Dimension in Phonetic Research, in Proc. ICPhS, 1995, Stockholm, vol.4, pp.46-49.
  5. A.K. Foldvik, O. Husby, J. Kvaerness, I.C. Nordli and P.A. Rinck, MRI Film of Articulatory Movements, in Proc. ICSLP, 1990, pp.421-422.
  6. A.K. Foldvik, U. Kristiansen and J. Kvaerness, Evolving Three-Dimensional Vocal Tract Model By Means of Magnetic Resonance Imaging, in Eurospeech, 1993, Berlin, pp.557-558.
  7. A.R. Greenwood, C.C. Good and P.A. Martin, Measurements of the Vocal Tract Shapes Using Magnetic Resonance Imaging, in IEE Proceedings-I, Dec, 1992, vol.139(6), pp.553-560.
  8. K. Honda, Modeling Vocal Tract Organs Based on MRI and EMG Observations and Its Implication on Brain Function, Ann. Bull. RILP, 1993, vol.27, pp.37-49.
  9. A.V. Lakshminarayanan, S. Lee and M.J. McCutcheon, MR Imaging of the Vocal Tract During Vowel Production, J. Magn. Reson. Imag., 1991, vol.1, pp.71-76.
  10. S. Masaki, R. Akahane-Yamada, M. Tiede, Y. Shimada and I. Fujimoto, An MRI-Based Analysis of the English /r/ and /l/ Articulations, in Proc. ICSLP, 1996, pp.1581-1584.
  11. S. Masaki, M. Tiede, K. Honda, Y. Shimada, I. Fujimoto, Y. Nakamura and N. Ninomiya, Synchronized MRI Sampling Method for Articulatory Movement Recording (in Japanese), Proceedings of the 1997 Spring Meeting of the Acoustical Society of Japan, March 17th-19th, 1997, pp.325-326.
  12. M. Matsumura and A. Sugiura, Modeling of 3-Dimensional Vocal Tract Shapes Obtained By Magnetic Resonance Imaging for Speech Synthesis, in Proc. ICSLP, 1990, pp.425-428.
  13. M. Matsumura, Measurement of Three-Dimensional Shapes of Vocal Tract and Nasal Cavity Using Magnetic Resonance Imaging Technique, in Proc. ICSLP, 1992, pp.779-782.
  14. C.A. Moore, The Correspondence of Vocal Tract Resonance With Volumes Obtained from Magnetic Resonance Images, J. of Speech and Hearing Research, Oct, 1992, vol.35(5), pp.1009-1023.
  15. S. Narayanan, A. Alwan and K. Haker, An MRI Study of Fricative Consonants, in Proc. ICSLP, 1994, Yokohama, pp.627-630.
  16. S. Narayanan, A. Alwan and K. Haker, An Articulatory Study of Fricative Consonants Using Magnetic Resonance Imaging, J. Acoust. Soc. Am., 1995, vol.98(3), pp.1325-1347.
  17. S. Narayanan, A. Alwan and K. Haker, An Articulatory Study of Liquid Approximants in American English, in Proc. ICPhS, 1995, Stockholm, vol.3, pp.576-579.
  18. S. Narayanan, A. Alwan and K. Haker, Three-Dimensional Tongue Shapes of Sibilant Fricatives, J. Acoust. Soc. Am., 1996, vol.96(5), pp.3342A.
  19. S. Narayanan, A. Alwan and K. Haker, Toward Articulatory-Acoustic Models for Liquid Approximants Based on MRI and EPG Data. Part I. The Laterals, J. Acoust. Soc. Am., 1997, vol.101(2), pp.1064-1077.
  20. S. Narayanan, A. Alwan and K. Haker, Toward Articulatory-Acoustic Models for Liquid Approximants Based on MRI and EPG Data. Part II. The Rhotics, J. Acoust. Soc. Am., 1997, vol.101(2), pp.1078-1089.
  21. M. Rokkaku, K. Hashimoto, S. Imaizumi, S. Niimi and S. Kiritani, Measurements of the Three-Dimensional Shape of the Vocal Tract Based on the Magnetic Resonance Imaging Technique, Ann. Bull. RILP, 1986, vol.20, pp.47-54.
  22. C. Shadle, M. Tiede, S. Masaki, Y. Shimada and I. Fujimoto, J. Acoust. Soc. Am., 1996, vol.100(4), pt.2, pp.2660.
  23. C. Shadle, M. Tiede, S. Masaki, Y. Shimada and I. Fujimoto, An MRI Study of the Effects of Vowel Context on Fricatives, Proceedings of the Institute of Acoustics, 1996, vol.18(9).
  24. M. Stone, Imaging the Tongue and Vocal Tract, British Journal of Disorders of Communication, Apr, 1991, vol.26(1), pp.11-23.
  25. M. Stone, An MRI and EPG Examination of /r/ and /l/, J. Acoust. Soc. Am., 1996, vol.100(4), pt.2.
  26. B. Story, Vocal Tract Area Functions from Magnetic Resonance Imaging, J. Acoust. Soc. Am., 1996, vol.100(1), pp.537-554.
  27. A.M. Sulter, D.G. Miller, R.F. Wolf, H.K. Schutte and E.L. Mooyaart, On the Relation Between the Dimensions and Resonance Characteristics of the Vocal Tract: A Study With MRI, Magn. Reson. Imag., 1992, vol.10, pp.365-373.
  28. M.K. Tiede, Yehia and E. Vatikiotis-Bateson, A Shape-Based Approach to Vocal Tract Area Function Estimation, in 1st ESCA Tutorial and Research Workshop on Speech Production Modeling - 4th Speech Production Seminar, May, 1996, Autrans, France, pp.41-44.
  29. C. Yang and H. Kasuya, Accurate Measurements of Vocal Tract Shape from Magnetic Resonance Images of Child, Female and Male Subjects, in Proc. ICSLP, 1994, Yokohama, pp.623-626.
  30. C. Yang and H. Kasuya, Uniform and Non-Uniform Normalization of Vocal Tracts Measured By MRI Across Male, Female and Child Subjects, in IEICE Trans. Info. and Systems, 1995, vol.E780(6), pp.732-737.

For further information please contact: Mohammad Mohammad.

Image, Speech and Intelligent Systems research group
Dept. Electronics and Computer Science
University of Southampton

Email any comments or suggestions to Philip Jackson who last updated this on 6 November 2002.