This thesis studies the production of human speech sounds through
acoustic modelling and signal analysis.
It concentrates on sounds that are not produced by voicing (although that may
be present), namely plosives, fricatives and aspiration, which all contain
noise generated by flow turbulence.
It combines the application of advanced speech analysis techniques with
acoustic flow-duct modelling of the vocal tract, and draws on dynamic magnetic
resonance image (dMRI) data of the pharyngeal and oral cavities, to relate the
sounds to physical shapes.
After vocal-tract outlines had been superimposed on three sagittal dMRI slices
of an adult male subject, a simple description of the vocal tract suitable for
acoustic modelling was derived through a sequence of transformations.
The vocal-tract acoustics program VOAC, which relaxes many of the assumptions
of conventional plane-wave models, incorporates into a one-dimensional model
the effects of net flow (viz., flow separation, increase of entropy, and
changes to resonances), as well as wall vibration and cylindrical wavefronts.
It was used for synthesis by computing transfer functions from sound sources
specified within the tract to the far field.
Being generated by a variety of aero-acoustic mechanisms, unvoiced sounds are
somewhat varied in nature.
Through analysis that was informed by acoustic modelling, resonance and
anti-resonance frequencies of ensemble-averaged plosive spectra were examined
for the same subject, and their trajectories observed during release.
The anti-resonance frequencies were used to compute the place of occlusion.
In vowels and voiced fricatives, voicing obscures the aspiration and frication.
So a method, the pitch-scaled harmonic filter (PSHF), was devised to separate
the voiced and unvoiced parts of a speech signal, and was tested
extensively on synthetic signals.
Based on a harmonic model of voicing, it outputs harmonic and
anharmonic signals appropriate for subsequent analysis as time series or
as power spectra.
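
The idea of a harmonic model of voicing can be illustrated with a minimal
sketch. It is not the PSHF as implemented in the thesis: it assumes a known,
constant pitch period, a frame spanning a whole number of pitch periods, and a
rectangular window, and the name harmonic_split and the parameter periods are
hypothetical. Under those assumptions every harmonic of the fundamental falls
exactly on a DFT bin, so the harmonic part can be read off those bins and the
anharmonic part is whatever remains.

```python
import numpy as np


def harmonic_split(frame, periods=4):
    """Split a frame into harmonic and anharmonic parts.

    Assumption (illustrative only): `frame` spans exactly `periods` pitch
    periods, so harmonic k of the fundamental lies on DFT bin k * periods.
    """
    frame = np.asarray(frame, dtype=float)
    if len(frame) % periods != 0:
        raise ValueError("frame length must be a multiple of `periods`")

    spectrum = np.fft.rfft(frame)

    # Keep only the bins that coincide with harmonics of the fundamental.
    # The DC bin (index 0) is deliberately left out of the harmonic part.
    harmonic_spectrum = np.zeros_like(spectrum)
    harmonic_spectrum[periods::periods] = spectrum[periods::periods]

    harmonic = np.fft.irfft(harmonic_spectrum, n=len(frame))
    # Everything that is not on a harmonic bin is treated as anharmonic.
    anharmonic = frame - harmonic
    return harmonic, anharmonic


# Synthetic example: a periodic "voiced" part plus noise, with a pitch
# period of 100 samples and a frame of 4 periods (400 samples).
n = np.arange(400)
voiced = np.cos(2 * np.pi * n / 100) + 0.5 * np.sin(2 * np.pi * 3 * n / 100)
noise = 0.1 * np.random.default_rng(0).standard_normal(n.size)
harmonic, anharmonic = harmonic_split(voiced + noise, periods=4)
```

In this sketch the two outputs always sum to the input frame. Noise that
happens to fall on the harmonic bins stays in the harmonic output, so the
separation is only approximate even when the pitch period is known exactly.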
By applying the PSHF to sustained voiced fricatives, we found not only
that voicing modulates the production of frication noise, but also that the
timing of pulsation cannot be explained by acoustic propagation alone.
Alongside classical investigation of voiceless speech sounds,
VOAC and the PSHF demonstrated their practical value in further
characterising plosion, frication and aspiration noise.
For the future, we discuss developing VOAC within an articulatory
synthesiser, investigating the observed flow-acoustic mechanism in a
dynamic physical model of voiced frication, and applying the PSHF more
widely in the field of speech research.