Nephthys project - aspiration in speech production
Turbulence noise occurs in a large class of speech sounds, in
These pages briefly review the various definitions of aspiration, and describe some preliminary work: a method for computing the transfer functions of the vocal tract, and development of analysis tools for use on various types of aspirated, breathy, and hoarse speech.
Aspiration has been variously defined as "big breath" (Dixit RP, 1988. On defining aspiration, Proceedings of the XIIIth International Conference of Linguistics, Tokyo, Japan, pp. 606-610), as glottal friction, as a larger glottal opening with cavity friction, as turbulence noise caused by rapid airflow through the glottis, as voicing lag or a period of voicelessness after articulatory stricture, and as a time when the vocal folds are further apart than they are in regular voiced sounds. Fant (1960. Acoustic Theory of Speech Production, eds. Jakobson R & Van Schooneveld CH, Mouton, The Hague, Netherlands) uses aspiration to mean a collection of turbulent noise sources, not located at a fricative constriction, "produced with greater articulatory opening", which tends to excite the entire vocal tract.
For the purposes of this project, aspiration is considered to be turbulence noise that is not frication. The aim of the project is to discover the mechanisms by which it is generated and their characteristics. The nature of aspiration will be related to such issues as the mode of vibration of the vocal folds (for voiced aspirates) and articulatory dynamics (e.g., voice onset time, VOT).
Computing transfer functions of the vocal tract
A computer program for calculating the vocal-tract acoustics, called VOAC, has been developed by Davies POAL, McGowan RS, and Shadle CH (see the project publication list). It uses information about the area function of the vocal tract and the aerodynamic conditions, relaxing many of the traditional assumptions, to calculate frequency response functions along the tract. VOAC has been upgraded to increase its flexibility and functionality with a view to using magnetic resonance images (MRI, see Fig. 1 and Jaleel project) as input to the program, and introducing distributed sound sources in the tract.
Fig. 1. Sagittal MRI during the vowel /i/, with outlines of main anatomical features overlaid: lips; back of pharynx and larynx; front of larynx and epiglottis.
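To illustrate what computing a transfer function from an area function involves, here is a minimal sketch of the traditional textbook model. It is not VOAC: it assumes lossless plane-wave propagation, no mean flow, rigid walls, and an ideal open end at the lips, which are exactly the kind of assumptions VOAC relaxes. The function names, the air constants, and the example tube are all hypothetical choices made for the illustration. Each short tube section is represented by a 2x2 chain matrix relating pressure and volume velocity at its two ends; the product of the matrices from glottis to lips gives the volume-velocity transfer function H(f) = 1/D(f), where D is the lower-right element, so resonances appear where |D| is smallest.

```python
import math

RHO = 1.2   # air density in kg/m^3 (assumed value)
C = 350.0   # speed of sound in m/s (assumed value)


def section_matrix(area, length, freq):
    """Chain matrix of one lossless uniform tube section (SI units)."""
    k = 2.0 * math.pi * freq / C      # wavenumber
    z = RHO * C / area                # characteristic impedance
    cs, sn = math.cos(k * length), math.sin(k * length)
    return ((cs, 1j * z * sn), (1j * sn / z, cs))


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0],
         a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0],
         a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def chain_d(areas, lengths, freq):
    """Element D of the glottis-to-lips chain matrix.

    With zero pressure at the lips, U_glottis = D * U_lips, so the
    volume-velocity transfer function is H = 1 / D.
    """
    m = ((1.0, 0.0), (0.0, 1.0))
    for area, length in zip(areas, lengths):   # ordered glottis to lips
        m = matmul2(m, section_matrix(area, length, freq))
    return m[1][1]


def resonances(areas, lengths, fmax=5000.0, step=5.0):
    """Frequencies (Hz) on a grid where |D| has a local minimum."""
    freqs = [step * i for i in range(1, int(fmax / step) + 1)]
    mags = [abs(chain_d(areas, lengths, f)) for f in freqs]
    return [freqs[i] for i in range(1, len(freqs) - 1)
            if mags[i] < mags[i - 1] and mags[i] < mags[i + 1]]
```

As a sanity check, a uniform tube 17.5 cm long (for example 35 sections of 0.5 cm with any constant area) behaves as a quarter-wave resonator, so with the assumed sound speed the resonances fall near 500, 1500 and 2500 Hz. An area function measured from MRI would simply replace the constant areas with one value per section.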
Application of signal processing techniques
With the goal of separating the periodic and noise components of a speech signal, we used a comb filter, the Wiener filter and a pitch-scaled method to obtain some preliminary results. The Pitch-Scaled Harmonic Filter (PSHF) was developed during Jackson's PhD; the process and the results of testing are described in detail in his thesis (Jackson 2000) and more briefly in an IEEE Trans. Speech & Audio Processing paper (abstract). Some examples of our final decomposition and some attempts at synthesis are given below, as referred to in our publications in Proc. Spch. Prod. Sem. (abstract), Proc. ICASSP (abstract), and J. Acoust. Soc. Am. (abstract). See Jackson's publication list for further results.
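The idea behind a pitch-scaled analysis can be sketched in a few lines. This is a simplified illustration, not the PSHF as published: it handles a single frame, uses a rectangular window, and assumes the pitch period is already known and is a whole number of samples. The function name and its arguments are hypothetical. When the frame spans exactly b pitch periods, the harmonics of the fundamental fall on every b-th bin of the discrete Fourier transform, so the periodic estimate is rebuilt from those bins and the aperiodic estimate is the remainder.

```python
import cmath
import math


def harmonic_split(frame, periods):
    """Split one frame into periodic and aperiodic estimates.

    frame   : list of samples whose length is exactly `periods` pitch
              periods (the caller must ensure this; assumption of the sketch)
    periods : number of pitch periods b spanned by the frame

    Returns (voiced, unvoiced) with voiced[i] + unvoiced[i] == frame[i].
    """
    n = len(frame)
    if periods < 1 or n % periods != 0:
        raise ValueError("frame length must be a multiple of `periods`")

    voiced = [0.0] * n
    # Bins 0, b, 2b, ... hold the DC term and the harmonics of f0
    # (the upper bins are the mirrored negative frequencies).
    for k in range(0, n, periods):
        w = -2j * math.pi * k / n
        coeff = sum(x * cmath.exp(w * m) for m, x in enumerate(frame))
        for i in range(n):
            voiced[i] += (coeff * cmath.exp(-w * i)).real / n

    unvoiced = [x - v for x, v in zip(frame, voiced)]
    return voiced, unvoiced
```

With a rectangular window, keeping only these bins is equivalent to averaging the b periods in the frame, which is why this simplified form behaves like a comb filter. A practical implementation would also need pitch tracking, windowing and overlap-add across frames, none of which is attempted here.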
Examples of decomposed speech (wav-files)
A large JPEG image (475K) contains spectrograms of an example decomposition, with the following caption:
Fig. 2. Wide-band (upper half, 5 ms) and narrow-band (lower half, 43 ms) spectrograms (Hann window, 4 times zero-padded, fixed grey-scale) of [phaza] by PJ, computed (top) from the original signal s(n), (middle) from the periodic estimates of the voiced component v(n), and (bottom) from the aperiodic estimates of the unvoiced component u(n).
Here are the corresponding sound files:
Synthetic speech sounds (wav-files)
The signals below were generated according to the procedures described in our paper (Jackson PJB and Shadle CH, 2000. Aero-acoustic modelling of voiced and unvoiced fricatives based on MRI data. In Proc. 5th Spch. Prod. Sem., pp. 185-188, Seeon, FRG).