Nephthys project - aspiration in speech production

 
Contents
   About the Egyptian god
   Introduction
   Definition of aspiration
   Vocal-tract acoustics
   Speech decomposition
      - Preliminary results
      - Sound files
      - Spectrographic example
   Speech synthesis

People
   Philip Jackson [p.jackson@surrey.ac.uk]   
   Christine Shadle [chs@ecs.soton.ac.uk]   

Publications
   ePrints listing
   Jackson's listing

News...
 

The pitch-scaled harmonic filter is available online:
see the PSHF website for more details.
 
Also see the Columbo project for more information.
 


Introduction

Turbulence noise occurs in a large class of speech sounds:

  • in a constant-flow mode,
  • in combination with (periodic) voicing, and
  • in a transient form.
When the source of the turbulence is relatively localized and well understood, the resulting noise is usually called frication; otherwise it tends to be called aspiration. A better understanding of aspiration, and a model to describe it, should help to make synthetic speech sound more natural and to support the diagnosis of pathological speech.

These pages briefly review the various definitions of aspiration, and describe some preliminary work: a method for computing the transfer functions of the vocal tract, and development of analysis tools for use on various types of aspirated, breathy, and hoarse speech.

Defining aspiration

Aspiration has been variously defined as "big breath" (Dixit RP, 1988. On defining aspiration, Proceedings of the XIIIth International Conference of Linguistics, Tokyo, Japan, pp. 606-610), as glottal friction, as a larger glottal opening with cavity friction, as turbulence noise caused by rapid airflow through the glottis, as voicing lag or a period of voicelessness after articulatory stricture, and as a time when the vocal folds are further apart than they are in regular voiced sounds. Fant (1960. Acoustic Theory of Speech Production, eds. Jakobson R & Van Schooneveld CH, Mouton, The Hague, Netherlands) uses aspiration to mean a collection of turbulent noise sources, not located at a fricative constriction and "produced with greater articulatory opening", which tends to excite the entire vocal tract.

For the purposes of this project, aspiration is taken to be turbulence noise that is not frication. The aim of the project is to discover the mechanisms by which it is generated and their characteristics. The nature of aspiration will be related to issues such as the mode of vibration of the vocal folds (for voiced aspirates) and articulatory dynamics (e.g., voice onset time, VOT).

Computing transfer functions of the vocal tract

VOAC, a computer program for calculating vocal-tract acoustics, was developed by Davies POAL, McGowan RS, and Shadle CH (see the project publication list). It takes the area function of the vocal tract and the aerodynamic conditions as input and, relaxing many of the traditional assumptions, calculates frequency response functions along the tract. VOAC has been upgraded to increase its flexibility and functionality, with a view to using magnetic resonance images (MRI, see Fig. 1 and the Jaleel project) as input to the program and to introducing distributed sound sources in the tract.

Fig. 1. Sagittal MRI during the vowel /i/, with outlines of main anatomical features overlaid: lips (upper, lower); hard palate; velum; back of pharynx and larynx; front of larynx and epiglottis; tongue; lower mandible.
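As a point of reference for the kind of calculation involved, here is a minimal sketch of the traditional approach that VOAC goes beyond. It is not VOAC and does not reproduce its method. It assumes lossless plane-wave propagation through a chain of uniform tube sections, an ideal open end at the lips (zero pressure) and no flow effects. The function names tube_transfer and resonance_peaks, the CGS units and the default sound speed and air density are all assumptions made for this illustration.

```python
import numpy as np

def tube_transfer(areas_cm2, lengths_cm, freqs_hz, c=35000.0, rho=1.14e-3):
    """Volume-velocity transfer U_lips / U_glottis of a lossless tube chain.

    areas_cm2 and lengths_cm list the sections from glottis to lips.
    c is the sound speed in cm/s and rho the air density in g/cm^3
    (assumed values, not taken from the project).
    """
    k = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float) / c  # wavenumber
    # Running chain matrix [[A, B], [C, D]], one entry per frequency.
    A = np.ones_like(k, dtype=complex)
    B = np.zeros_like(A)
    C = np.zeros_like(A)
    D = np.ones_like(A)
    for area, length in zip(areas_cm2, lengths_cm):
        z = rho * c / area                  # characteristic impedance
        a = np.cos(k * length)
        b = 1j * z * np.sin(k * length)
        cc = 1j * np.sin(k * length) / z
        d = a
        # Right-multiply the running matrix by this section's matrix.
        A, B, C, D = A * a + B * cc, A * b + B * d, C * a + D * cc, C * b + D * d
    # With zero pressure at the lips, U_glottis = D * U_lips.
    return 1.0 / D

def resonance_peaks(freqs_hz, h):
    """Frequencies at which |h| has a local maximum on the given grid."""
    mag = np.abs(h)
    idx = np.where((mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:]))[0] + 1
    return np.asarray(freqs_hz)[idx]

# Uniform tube, 17.5 cm long, split into 35 sections of 0.5 cm.
freqs = np.arange(103.0, 3000.0, 10.0)
h = tube_transfer([5.0] * 35, [0.5] * 35, freqs)
print(resonance_peaks(freqs, h))
```

For a uniform tube that is closed at the glottis and open at the lips, the peaks should fall near odd multiples of c/4L, i.e. near 500, 1500 and 2500 Hz for L = 17.5 cm and c = 350 m/s. An area function estimated from MRI would replace the uniform list of areas.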

Application of signal processing techniques

With the goal of separating the periodic and noise components of a speech signal, we obtained some preliminary results using a comb filter, a Wiener filter and a pitch-scaled method. The Pitch-Scaled Harmonic Filter (PSHF) was developed during Jackson's PhD, so the process and the results of testing are described in detail in his thesis (Jackson 2000) and more briefly in an IEEE Trans. Speech & Audio Processing paper (abstract). Some examples of our final decomposition and some attempts at synthesis are given below, as referred to in our publications in Proc. Spch. Prod. Sem. (abstract), Proc. ICASSP (abstract), and J. Acoust. Soc. Am. (abstract). See Jackson's publication list for further results.
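To give a feel for the pitch-scaled idea, the toy sketch below splits a single frame into periodic and aperiodic estimates. It is not the published PSHF, which is described in the thesis and papers cited above. It assumes that the pitch period is already known, that the frame spans a whole number of periods and that a rectangular window is used. The function name pitch_scaled_split and all signal parameters are invented for this illustration.

```python
import numpy as np

def pitch_scaled_split(frame, n_periods):
    """Split a frame spanning n_periods pitch periods into two parts.

    Because the frame length is a whole number of periods, the harmonics
    of the pitch fall exactly on every n_periods-th DFT bin. Keeping only
    those bins gives the periodic estimate; the rest is the residual.
    """
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame)
    harmonic = np.zeros_like(spectrum)
    harmonic[::n_periods] = spectrum[::n_periods]   # bins 0, b, 2b, ...
    periodic = np.fft.irfft(harmonic, n=len(frame))
    aperiodic = frame - periodic
    return periodic, aperiodic

# Synthetic test frame: five harmonics of 125 Hz plus white noise.
fs = 16000                      # sampling rate in Hz (assumed)
f0 = 125.0                      # pitch in Hz (assumed known)
n_periods = 4                   # periods per frame (free choice here)
period = int(fs / f0)           # 128 samples per period
n = np.arange(n_periods * period)
voiced = sum(np.sin(2 * np.pi * h * f0 * n / fs) / h for h in range(1, 6))
rng = np.random.default_rng(0)
noise = 0.3 * rng.standard_normal(n.size)

periodic, aperiodic = pitch_scaled_split(voiced + noise, n_periods)
print("rms error of periodic estimate:", np.sqrt(np.mean((periodic - voiced) ** 2)))
print("rms of added noise:            ", np.sqrt(np.mean(noise ** 2)))
```

With a rectangular window, keeping every n_periods-th bin is equivalent to averaging the periods in the frame, so this toy version behaves like a short comb filter. The noise that survives in the periodic estimate is reduced by the averaging but not removed.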

Examples of decomposed speech (wav-files)

A large JPEG image (475K) contains spectrograms of an example decomposition, with the following caption:

Fig. 2. Wide-band (upper half, 5 ms) and narrow-band (lower half, 43 ms) spectrograms (Hann window, 4 times zero-padded, fixed grey-scale) of [phaza] by PJ, computed (top) from the original signal s(n), (middle) from the periodic estimates of the voiced component v(n), and (bottom) from the aperiodic estimates of the unvoiced component u(n).

Here are the corresponding sound files:

Utterance [phaza]:
original speech, periodic part, aperiodic part

Synthetic speech sounds (wav-files)

The signals below were generated according to the procedures described in our paper (Jackson PJB and Shadle CH, 2000. Aero-acoustic modelling of voiced and unvoiced fricatives based on MRI data. In Proc. 5th Spch. Prod. Sem., pp. 185-188, Seeon, FRG; abstract).

Plosive burst /p/:
transient

Vowel /a/:
voicing

Fricative /z/:
plain addition, simple modulation, delayed modulation

Vowel /i/:
voicing

Further information

Some further examples are given on the Columbo project pages. Please contact Philip Jackson with any comments or requests.



© maintained by Philip Jackson, last updated on 7 May 2013.