Philip Jackson    

Abstracts of my publications

The University of Surrey

 

Listing  

Journal papers  
    IET-SP, 2007  
    JASA, 2006  
    CSL, 2005  
    El. Lett., 2002  
    IEEE-SAP, 2001  
    JASA, 2000  

Conferences  
    Interspeech 2006  
    Interspeech 2005  
    AES 2005  
    OMYSR 2005  
    3DPVT 2004  
    FSTS 2004  
    ASA 2004  
    OMYSR 2004  
    Eurospeech 2003  
    ICPhS 2003  
    EC-VIP-MC 2003  
    OMYSR 2003  
    ICSLP 2002  
    ASI 2002  
    ICCBDED 2002  
    CRAC 2001  
    WISP 2001  
    ICASSP 2000  
    SPS5 2000  
    ICPhS 1999  
    ASA-EAA 1999  
    ICVPB 1999  
    ICA-ASA 1998  
    ASME 1996  

Book chapter  
    G of ED, 2005  

Doctoral thesis  
    PhD, 2000  

FTP site  


Academic Journal Papers


Russell, M.J., Zheng, X. and Jackson, P.J.B. (2007). Modelling speech signals using formant frequencies as an intermediate representation.
IET Signal Processing, Vol. 1 (1), pp. 43-50.

Abstract:

Multiple-level segmental hidden Markov models (M-SHMMs) in which the relationship between symbolic and acoustic representations of speech is regulated by a formant-based intermediate representation are considered. New TIMIT phone recognition results are presented, confirming that the theoretical upper-bound on performance is achieved provided that either the intermediate representation or the formant-to-acoustic mapping is sufficiently rich. The way in which M-SHMMs exploit formant-based information is also investigated, using singular value decomposition of the formant-to-acoustic mappings and linear discriminant analysis. The analysis shows that if the intermediate layer contains information which is linearly related to the spectral representation, that information is used in preference to explicit formant frequencies, even though the latter are useful for phone discrimination. In summary, although these results confirm the utility of M-SHMMs for automatic speech recognition, they also provide empirical evidence of the value of nonlinear formant-to-acoustic mappings.

INSPEC codes: A4370; B6130E; C5260S; A4360; A0210; A0250; B0210; B0240J; C1110; C1140J
 

More: Please contact Philip Jackson if you would like further information.
 


Pincas, J. and Jackson, P.J.B. (2006). Amplitude modulation of turbulence noise by voicing in fricatives.
Journal of the Acoustical Society of America, Vol. 120 (6), pp. 3966-3977.

Abstract:

The two principal sources of sound in speech, voicing and frication, occur simultaneously in voiced fricatives as well as at the vowel-fricative boundary in phonologically voiceless fricatives. Instead of simply overlapping, the two sources interact. This paper is an acoustic study of one such interaction effect: the amplitude modulation of the frication component when voicing is present. Corpora of sustained and fluent-speech English fricatives were recorded and analyzed using a signal-processing technique designed to extract estimates of modulation depth. Results reveal a pattern, consistent across speaking style, speakers and places of articulation, for modulation at f0 to rise at low voicing strengths and subsequently saturate. The voicing strength needed to produce saturation varied from 60 to 66 dB across subjects and experimental conditions. Modulation depths at saturation varied little across speakers but significantly with place of articulation (with [z] showing particularly strong modulation), clustering at approximately 0.4-0.5 (a 40-50% fluctuation above and below the unmodulated amplitude); spectral analysis of the modulating signals revealed weak but detectable modulation at the second and third harmonics (i.e., 2f0 and 3f0).

PACS numbers: 43.70.Bk, 43.72.Ar
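As an illustration of the kind of measurement described in the abstract, here is a minimal sketch (not the paper's implementation; the cutoff frequency and framing are assumptions) of estimating modulation depth: high-pass filter to isolate the frication noise, extract its amplitude envelope, and compare the envelope component at f0 with the mean envelope level.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def modulation_depth(x, fs, f0, hp_cutoff=2000.0):
        # High-pass filter to suppress voicing harmonics, leaving frication.
        sos = butter(4, hp_cutoff, btype='highpass', fs=fs, output='sos')
        noise = sosfiltfilt(sos, x)
        # Amplitude envelope of the noise component.
        env = np.abs(hilbert(noise))
        mean_level = env.mean()
        # Spectrum of the windowed, zero-mean envelope.
        w = np.hanning(len(env))
        spec = np.abs(np.fft.rfft((env - mean_level) * w))
        freqs = np.fft.rfftfreq(len(env), 1.0 / fs)
        k = np.argmin(np.abs(freqs - f0))
        # Amplitude of the f0 component (corrected for window gain),
        # relative to the mean level: 0 = unmodulated, ~0.5 at saturation.
        return (2.0 * spec[k] / w.sum()) / mean_level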
 


More: Please contact Philip Jackson if you would like further information.
[ preprint - PDF: raw (1.0 MB) ]
 


Russell, M.J. and Jackson, P.J.B. (2005). A multiple-level linear/linear segmental HMM with a formant-based intermediate layer.
Computer Speech and Language, Vol. 19 (2), pp. 205-225.

Abstract:

A novel multi-level segmental HMM (MSHMM) is presented in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate `articulatory' representation. Speech dynamics are characterised as linear trajectories in the articulatory space, which are transformed into the acoustic space using an articulatory-to-acoustic mapping. Recognition is then performed. The results of phonetic classification experiments are presented for monophone and triphone MSHMMs using three formant-based `articulatory' parameterisations and sets of between 1 and 49 linear articulatory-to-acoustic mappings. The NIST Matched Pair Sentence Segment (Word Error) test shows that, for a sufficiently rich combination of articulatory parameterisation and mappings, differences between these results and those obtained with an optimal classifier are not statistically significant. It is also shown that, compared with a conventional HMM, superior performance can be achieved using an MSHMM with 25% fewer parameters.
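As a toy illustration of the model structure (a sketch with made-up dimensions and mappings, not the authors' code), the generative idea is a linear trajectory in the formant-based intermediate space passed through a linear articulatory-to-acoustic mapping:

    import numpy as np

    rng = np.random.default_rng(0)
    T, d_art, d_ac = 10, 3, 13       # frames, intermediate dim, acoustic dim

    # Linear trajectory z(t) = z0 + m*t in the 'articulatory' space.
    z0 = np.array([500.0, 1500.0, 2500.0])   # formant-like start values (Hz)
    m = np.array([20.0, -30.0, 10.0])        # slope per frame
    Z = z0 + m * np.arange(T)[:, None]       # trajectory, shape (T, d_art)

    # Linear articulatory-to-acoustic mapping y = Wz + b (random stand-ins).
    W = rng.normal(size=(d_ac, d_art)) * 0.01
    b = rng.normal(size=d_ac)
    Y = Z @ W.T + b                          # predicted acoustic features

    # A segment is scored by the Gaussian log-likelihood of the observed
    # features about the predicted trajectory (constants omitted).
    obs = Y + rng.normal(scale=0.1, size=Y.shape)
    log_lik = -0.5 * np.sum(((obs - Y) / 0.1) ** 2)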


More: Please contact Philip Jackson if you would like further information.
[ preprint - PDF: raw (380 kB) ]
 


Jackson, P.J.B., Lo, B.-H. and Russell, M.J. (2002). Data-driven, non-linear, formant-to-acoustic mapping for ASR.
IEE Electronics Letters, Vol. 38 (13), pp. 667-669.

Abstract:

The underlying dynamics of speech can be captured in an automatic speech recognition system via an articulatory representation, which resides in a domain other than that of the acoustic observations. Thus, given a set of models in this hidden domain, it is essential that a mapping can be obtained to relate the intermediate representation to the acoustic domain. In this paper, two methods for mapping from formants to short-term spectra are compared: multi-layered perceptrons (MLPs) and radial-basis function (RBF) networks. Both are capable of providing non-linear transformations, and were trained using features extracted from the TIMIT database. Various schemes for dividing the frames of speech data according to their phone class were also investigated. Results showed that the RBF networks performed approximately 10% better than the MLPs in terms of the rms error, and that a classification based on discrete regions of the articulatory space gave the greatest improvements over a single network.
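For a flavour of the RBF approach (an illustrative sketch with stand-in data and a heuristic kernel width, not the paper's configuration), a Gaussian RBF network can be fitted by linear least squares once its centres are fixed:

    import numpy as np

    def train_rbf(X, Y, n_centres=50, seed=0):
        # Pick centres as a random subset of the training inputs.
        rng = np.random.default_rng(seed)
        centres = X[rng.choice(len(X), n_centres, replace=False)]
        # Heuristic kernel width: mean distance between centres.
        d = np.linalg.norm(centres[:, None] - centres[None, :], axis=-1)
        width = d[d > 0].mean()

        def design(Xq):
            d2 = ((Xq[:, None] - centres[None, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2.0 * width ** 2))

        # Output weights by linear least squares on the basis activations.
        W, *_ = np.linalg.lstsq(design(X), Y, rcond=None)
        return lambda Xq: design(Xq) @ W

    # Stand-in data: formant vectors in, short-term spectral features out.
    X = np.random.rand(500, 3) * 3000.0
    Y = np.random.rand(500, 13)
    rbf = train_rbf(X, Y)
    rms = np.sqrt(np.mean((rbf(X) - Y) ** 2))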
 


More: Please contact Philip Jackson if you would like further information.
[ preprint - PDF: raw (150 kB) ]
 


Jackson, P.J.B. and Shadle, C.H. (2001). Pitch-scaled estimation of simultaneous voiced and turbulence-noise components in speech.
IEEE Transactions on Speech and Audio Processing, Vol. 9 (7), pp. 713-726.

Abstract:

Almost all speech contains simultaneous contributions from more than one acoustic source within the speaker's vocal tract. In this paper we propose a method - the pitch-scaled harmonic filter (PSHF) - which aims to separate the voiced and turbulence-noise components of the speech signal during phonation, based on a maximum likelihood approach. The PSHF outputs periodic and aperiodic components that are estimates of the respective contributions of the different types of acoustic source. It produces four reconstructed time series signals by decomposing the original speech signal, first, according to amplitude, and then according to power of the Fourier coefficients. Thus, one pair of periodic and aperiodic signals is optimized for subsequent time-series analysis, and another pair for spectral analysis. The performance of the PSHF algorithm was tested on synthetic signals, using three forms of disturbance (jitter, shimmer and additive noise), and the results were used to predict the performance on real speech. Processing recorded speech examples elicited latent features from the signals, demonstrating the PSHF's potential for analysis of mixed-source speech.

EDICS number: 1-ANLS

Keywords: Periodic-aperiodic decomposition, speech modification, speech pre-processing.
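A much-simplified, single-frame sketch of the central idea follows (the full PSHF also windows the signal and optimizes the split separately for amplitude and power, as described in the paper): when the analysis frame spans an integer number of pitch periods, the voicing harmonics fall exactly on DFT bins, so a periodic estimate is obtained by retaining only those bins.

    import numpy as np

    def pshf_frame(x_frame, periods=4):
        # x_frame is assumed to hold exactly `periods` pitch periods.
        N = len(x_frame)
        X = np.fft.rfft(x_frame)
        # Harmonics of f0 land on bins that are multiples of `periods`.
        Xh = np.zeros_like(X)
        Xh[periods::periods] = X[periods::periods]
        periodic = np.fft.irfft(Xh, N)      # estimate of the voiced part
        aperiodic = x_frame - periodic      # remainder: noise estimate
        return periodic, aperiodic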
 


More: Please contact Philip Jackson if you would like a copy or further information.
 


Jackson, P.J.B. and Shadle, C.H. (2000). Frication noise modulated by voicing, as revealed by pitch-scaled decomposition.
Journal of the Acoustical Society of America, Vol. 108 (4), pp. 1421-1434.

Abstract:

A decomposition algorithm that uses a pitch-scaled harmonic filter was evaluated using synthetic signals and applied to mixed-source speech, spoken by three subjects, to separate the voiced and unvoiced parts. Pulsing of the noise component was observed in voiced frication, which was analyzed by complex demodulation of the signal envelope. The timing of the pulsation, represented by the phase of the anharmonic modulation coefficient, showed a step change during a vowel-fricative transition corresponding to the change in location of the sound source within the vocal tract. Analysis of fricatives /β, v, ð, z, ʒ, ʋ, ʕ/ demonstrated a relationship between steady-state phase and place, and f0 glides confirmed that the main cause was a place-dependent delay.

PACS numbers: 43.70.Bk, 43.72.Ar
 


More: Please contact Philip Jackson if you would like a copy or further information.
 


Refereed Conference Proceedings


Every, M. and Jackson, P.J.B. (2006). Enhancement of harmonic content of speech based on a dynamic programming pitch tracking algorithm.
In Proceedings of Interspeech 2006, 4pp., Pittsburgh PA.

Abstract:

For pitch tracking of a single speaker, a common requirement is to find the optimal path through a set of voiced or voiceless pitch estimates over a sequence of time frames. Dynamic programming (DP) algorithms have been applied before to this problem. Here, the pitch candidates are provided by a multi-channel autocorrelation-based estimator, and DP is extended to pitch tracking of multiple concurrent speakers. We use the resulting pitch information to enhance harmonic content in noisy speech and to obtain separations of target from interfering speech.

Index Terms: speech enhancement, dynamic programming
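The following generic Viterbi-style sketch shows the DP step (candidate generation and cost terms here are assumptions for illustration, not the paper's formulation): each frame supplies f0 candidates with local merits, and an octave-scaled penalty discourages implausible jumps between frames.

    import numpy as np

    def dp_pitch_track(candidates, merits, jump_cost=1.0):
        # candidates[t]: array of f0 candidates (Hz) for frame t;
        # merits[t]: matching array of local scores (higher is better).
        best = [np.asarray(merits[0], float)]
        back = []
        for t in range(1, len(candidates)):
            # Transition penalty proportional to the f0 change in octaves.
            jumps = np.abs(np.log2(candidates[t][:, None] /
                                   candidates[t - 1][None, :]))
            total = best[-1][None, :] - jump_cost * jumps
            back.append(total.argmax(axis=1))
            best.append(np.asarray(merits[t], float) + total.max(axis=1))
        # Trace back the optimal path of candidate indices.
        path = [int(best[-1].argmax())]
        for bp in reversed(back):
            path.append(int(bp[path[-1]]))
        path.reverse()
        return [candidates[t][i] for t, i in enumerate(path)]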
 


More: Please contact Philip Jackson if you would like further information.
[ poster - PDF: raw (280 kB) ]
 


Pincas, J. and Jackson, P.J.B. (2005b). Amplitude modulation of frication noise by voicing saturates.
In Proceedings of Interspeech 2005, 4pp., Lisbon.

Abstract:

The two distinct sound sources comprising voiced frication, voicing and frication, interact. One effect is that the periodic source at the glottis modulates the amplitude of the frication source originating in the vocal tract above the constriction. Voicing strength and modulation depth for frication noise were measured for sustained English voiced fricatives using high-pass filtering, spectral analysis in the modulation (envelope) domain, and a variable pitch compensation procedure. Results show a positive relationship between strength of the glottal source and modulation depth at voicing strengths below 66 dB SPL, at which point the modulation index was approximately 0.5 and saturation occurred. The alveolar [z] was found to be more modulated than other fricatives.


More: Please contact Philip Jackson if you would like further information.
[ poster - PDF: raw (150 kB) ]
 


Dewhirst, M., Zielinski, S., Jackson, P.J.B. and Rumsey F. (2005). Objective assessment of spatial localisation attributes of surround-sound reproduction systems.
In Proceedings of 118th Convention of the Audio Engineering Society, AES 2005, 16pp., Barcelona, Spain.

Abstract:

A mathematical model for objective assessment of perceived spatial quality was developed for comparison across the listening area of various sound reproduction systems: mono, two-channel stereo (TCS), 3/2 stereo (i.e., 5.0 surround sound), Wave Field Synthesis (WFS) and Higher Order Ambisonics (HOA). Models for mono, TCS and 3/2 stereo are based on conventional microphone techniques and loudspeaker configurations for each system. WFS and HOA models use circular arrays of thirty-two loudspeakers driven by signals derived from a virtual microphone array and the Fourier-Bessel spatial decomposition of the soundfield respectively. Directional localisation, ensemble width and ensemble envelopment of monochromatic tones, extracted from binaural signals, are analysed under a range of test conditions.


More: Please contact Philip Jackson if you would like further information.
[ poster - PDF: raw (1.7 MB) ]
 


Pincas, J. and Jackson, P.J.B. (2005a). Amplitude profiles of fricatives described by temporal moments.
In Proceedings of One-day Meeting for Young Speech Researchers, OMYSR 2005, p. 12, London.

Abstract:

As well as the rapid fluctuations in amplitude that make up the `fine structure' of noise, various degrees of slower loudness change, or envelope fluctuation, are present in fricative sounds. In voiced fricatives, noise is generally amplitude modulated by the voicing component, resulting in a periodic pulsing [Pincas and Jackson 2004, Proc. of From Sound to Sense, MIT, 73-78]. In addition, all fricatives display some build up and decay of noise power from frication onset to offset. This paper focuses on these latter amplitude changes, which we term amplitude profiles.

Frication build-up and decay for an 8-speaker corpus of intervocalic fricatives were investigated by treating their amplitude profiles as statistical distributions whose properties are fully specified by their first four standard moments: mean, standard deviation, skewness and kurtosis (`peakiness'). This is an adaptation of the spectral moments technique previously used to describe the main features of fricative spectra [Jongman et al. 2000, JASA 108(3):1252-1263].
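As a sketch of the moment computation (assuming the amplitude profile has already been extracted as a non-negative envelope), the profile is normalised to a distribution over time and its first four standard moments are taken:

    import numpy as np

    def temporal_moments(envelope, fs):
        env = np.asarray(envelope, float)
        p = env / env.sum()              # treat the profile as a distribution
        t = np.arange(len(env)) / fs     # time axis in seconds
        mean = (t * p).sum()
        sd = np.sqrt((((t - mean) ** 2) * p).sum())
        skewness = (((t - mean) ** 3) * p).sum() / sd ** 3
        kurt = (((t - mean) ** 4) * p).sum() / sd ** 4
        return mean, sd, skewness, kurt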

Analysis of these temporal moments shows that the sibilant/non-sibilant split is consistently manifested in the `flatness' of profiles, whereas voicing status has more effect on whether build-up is skewed towards the beginning or the end of the fricative. These acoustic results are examined in light of probable articulatory explanations.

The perceptual significance of amplitude profiles is also discussed. It is known, for example, that the temporal acuity of the auditory system is good enough to distinguish even very fast amplitude fluctuations [Viemeister 1990, JASA 88(3):1367-1373], but it is unclear to what extent differences in profiles could function as a linguistic cue or naturalness enhancer.


More: Please contact Philip Jackson if you would like further information.
[ poster - PDF: raw (1.1 MB) ]
 


Ypsilos, I.A., Hilton, A., Turkmani, A. and Jackson, P.J.B. (2004). Speech-driven face synthesis from 3D video.
In IEEE Proceedings of the 2nd International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT'04), pp.  58-65, Thessaloniki, Greece.

Abstract:

This paper presents a framework for speech-driven synthesis of real faces from a corpus of 3D video of a person speaking. Video-rate capture of dynamic 3D face shape and colour appearance provides the basis for a visual speech synthesis model. A displacement map representation combines face shape and colour into a 3D video. This representation is used to efficiently register and integrate shape and colour information captured from multiple views. To allow visual speech synthesis, viseme primitives are identified from the corpus using automatic speech recognition. A novel non-rigid alignment algorithm is introduced to estimate dense correspondence between 3D face shape and appearance for different visemes. The registered displacement map representation, together with a novel optical flow optimisation using both shape and colour, enables accurate and efficient non-rigid alignment. Face synthesis from speech is performed by concatenation of the corresponding viseme sequence using the non-rigid correspondence to reproduce both 3D face shape and colour appearance. Concatenative synthesis reproduces both viseme timing and co-articulation. Face capture and synthesis has been performed for a database of 51 people. Results demonstrate synthesis of 3D visual speech animation with a quality comparable to the captured video of a person.


More: Please contact Philip Jackson if you would like further information.
[ paper - PDF: raw (560 kB) ]
 


Pincas, J. and Jackson, P.J.B. (2004). Acoustic correlates of voicing-frication interaction in fricatives.
In Proceedings of From Sound to Sense, J Slifka, S Manuel and M Matthies (eds.), pp. C73-C78, Cambridge MA.

Abstract:

This paper investigates the acoustic effects of source interaction in fricative speech sounds. A range of parameters has been employed, including a measure designed specifically to describe quantitatively the amplitude modulation of frication noise by voicing, a phenomenon which has mainly been qualitatively reported. The signal processing technique to extract this measure is presented. Results suggest that fricative duration is the main determinant of how much the sources overlap at the VF boundary of voiceless fricatives and that the amount of modulation occurring in voiced fricatives is chiefly dependent on voicing strength. Furthermore, it appears that individual speakers have differing tendencies for amount of source-source overlap and degree of modulation where overlap does occur.


More: Please contact Philip Jackson if you would like further information.
[ paper - PDF: raw (410 kB) ]
 


Jackson, P.J.B., Jesus, L.M.T., Shadle, C.H. and Pincas, J. (2004). Measures of voiced frication for automatic classification.
Journal of the Acoustical Society of America, Vol. 115 (5, Pt. 2), p. 2429, New York NY (abstract).

Abstract:

As an approach to understanding the characteristics of the acoustic sources in voiced fricatives, it seems apt to draw on knowledge of vowels and voiceless fricatives, which have been relatively well studied. However, the presence of both phonation and frication in these mixed-source sounds offers the possibility of mutual interaction effects, with variations across place of articulation. This paper examines the acoustic and articulatory consequences of these interactions and explores automatic techniques for finding parametric and statistical descriptions of these phenomena. A reliable and consistent set of such acoustic cues could be used for phonetic classification or speech recognition. Following work on devoicing of European Portuguese voiced fricatives [Jesus & Shadle, In Mamede, et al. (Eds.), pp. 1-8, Berlin: Springer-Verlag, 2003] and the modulating effect of voicing on frication [Jackson & Shadle, JASA, 108(4): 1421-1434, 2000], the present study focuses on three types of information: (i) sequences and durations of acoustic events in VC transitions, (ii) temporal, spectral and modulation measures from the periodic and aperiodic components of the acoustic signal, and (iii) voicing activity derived from simultaneous EGG data. Analyses of interactions observed in British/American English and European Portuguese speech corpora will be compared, and the principal findings discussed.


More: Please contact Philip Jackson if you would like further information.
 


Russell, M.J. and Jackson, P.J.B. (2004). Regularized re-estimation of stochastic duration models.
Journal of the Acoustical Society of America, Vol. 115 (5, Pt. 2), p. 2429, New York NY (abstract).

Abstract:

Recent research has compared the performance of various distributions (uniform, boxcar, exponential, gamma, discrete) for modeling segment (state) durations in hidden semi-Markov models used for phone classification on the TIMIT database. These experiments have shown that a gamma distribution is more appropriate than exponential (which is implicit in first-order Markov models), and achieved a 3% relative reduction in phone-classification errors [Jackson, Proc. ICPhS, 1349-1352, 2003]. The parameters of these duration distributions were estimated once for each model from initial statistics of state occupation (offline), and remained unchanged during subsequent iterations of training. The present work investigates the effect of re-estimating the duration models in training (online) with respect to the phone-classification scores. First, tests were conducted on duration models re-estimated directly from statistics gathered in the previous iteration of training. It was found that the boxcar and gamma models were unstable, while the performance of the other models also tended to degrade. Secondary tests, using a scheme of annealed regularization, demonstrated that the losses could be recouped and a further 1% improvement was obtained. The results from this pilot study imply that similar gains in recognition accuracy deserve investigation, along with further optimization of the duration model re-estimation procedure.


More: Please contact Philip Jackson if you would like further information.
 


Pincas, J. and Jackson, P.J.B. (2004). Quantifying voicing-frication interaction effects in voiced and voiceless fricatives.
In Proceedings of One-day Meeting for Young Speech Researchers, OMYSR 2004, p. 27, London.

Abstract:

Although speech does not, in general, switch cleanly between periodic and aperiodic noise sources, regions of mixed source sound have received little attention: aerodynamic treatments of source production mechanisms show that interaction will result in decreased amplitude of both sources, and limited previous research has suggested some spectral modification of frication sources by voicing. In this paper, we seek to extend current knowledge of voicing-frication interaction by applying a wider range of measures suitable for quantifying interaction effects to a specially recorded corpus of /VFV/ sequences. We present data for one male and one female subject (from a total of 8). Regions of voicing-frication overlap at the onset of voiceless fricatives often show interaction effects. The extent of such overlapping source regions is investigated with durational data. We have created a measure designed to quantify the magnitude of modulation where overlap does occur, in both these areas and in fully voiced fricatives. We employ high-pass filtering and short-time smoothing to produce an envelope which characterises temporal fluctuation of the aperiodic component. Periodicity at or around the fundamental frequency is interpreted as modulation of frication by voicing, and magnitude of amplitude modulation is computed with spectral analysis of the envelope. Further statistical techniques have been employed to describe the profile of aperiodic sound generation over the course of the fricative. In addition to the above, gradients of f0 contours in VF transitions and total duration of frication are analysed. Results are compared across the voiced/voiceless distinction and place of articulation. Source overlap and interaction effects are often ignored in synthesis systems; thus findings from this paper could potentially be used to improve naturalness of synthetic speech. Planned perceptual experiments will extend the work done by establishing how significant interaction effects are to listeners.


More: Please contact Philip Jackson if you would like further information.
 


Jackson, P.J.B., Moreno, D.M., Russell, M.J. and Hernando, J. (2003). Covariation and weighting of harmonically decomposed streams for ASR.
In Proceedings of Eurospeech 2003, pp. 2321-2324, Geneva.

Abstract:

Decomposition of speech signals into simultaneous streams of periodic and aperiodic information has been successfully applied to speech analysis, enhancement, modification and recently recognition. This paper examines the effect of different weightings of the two streams in a conventional HMM system in digit recognition tests on the Aurora 2.0 database. Comparison of the results from using matched weights during training showed a small improvement of approximately 10% relative to unmatched ones, under clean test conditions. Principal component analysis of the covariation amongst the periodic and aperiodic features indicated that only 45 (51) of the 78 coefficients were required to account for 99% of the variance, for clean (multi-condition) training, which yielded an 18.4% (10.3%) absolute increase in accuracy with respect to the baseline. These findings provide further evidence of the potential for harmonically-decomposed streams to improve performance and to enhance recognition accuracy substantially in noise.

Session: OWeDc, Speech Modeling & Features 2 (oral).
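A minimal sketch of the dimensionality analysis (with random stand-in data in place of the 78-dimensional concatenated periodic and aperiodic features): count the principal components needed to account for 99% of the variance.

    import numpy as np

    def n_components_for(features, target=0.99):
        X = features - features.mean(axis=0)
        # Eigenvalues of the covariance matrix, sorted largest first.
        eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
        cum = np.cumsum(eigvals) / eigvals.sum()
        return int(np.searchsorted(cum, target) + 1)

    X = np.random.default_rng(2).normal(size=(5000, 78))  # stand-in features
    print(n_components_for(X))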
 


More: Please contact Philip Jackson if you would like further information.
[ paper - PDF: raw (128 kB)
| presentation - PPT: raw (870 kB) ]
 


Russell, M.J. and Jackson, P.J.B. (2003). The effect of an intermediate articulatory layer on the performance of a segmental HMM.
In Proceedings of Eurospeech 2003, pp. 2737-2740, Geneva.

Abstract:

We present a novel multi-level HMM in which an intermediate `articulatory' representation is included between the state and surface-acoustic levels. A potential difficulty with such a model is that advantages gained by the introduction of an articulatory layer might be compromised by limitations due to an insufficiently rich articulatory representation, or by compromises made for mathematical or computational expediency. This paper describes a simple model in which speech dynamics are modelled as linear trajectories in a formant-based `articulatory' layer, and the articulatory-to-acoustic mappings are linear. Phone classification results for TIMIT are presented for monophone and triphone systems with a phone-level syntax. The results demonstrate that provided the intermediate representation is sufficiently rich, or a sufficiently large number of phone-class-dependent articulatory-to-acoustic mappings are employed, classification performance is not compromised.

Session: PThBf, Robust Speech Recognition 3 (poster).
 


More: Please contact Philip Jackson if you would like further information.
[ paper - PDF: raw (87 kB) ]
 


Jackson, P.J.B. (2003). Improvements in phone-classification accuracy from modelling duration.
In Proceedings of the 15th International Congress of Phonetic Sciences, ICPhS 2003, pp. 1349-1352, Barcelona.

Abstract:

Durations of real speech segments do not generally exhibit exponential distributions, as modelled implicitly by the state transitions of Markov processes. Several duration models were considered for integration within a segmental-HMM recognizer: uniform, exponential, Poisson, normal, gamma and discrete. The gamma distribution fitted that measured for silence best, by an order of magnitude. Evaluations determined an appropriate weighting for duration against the acoustic models. Tests showed a reduction of 2% absolute (6+% relative) in the phone-classification error rate with gamma and discrete models; exponential ones gave approximately 1% absolute reduction, and uniform no significant improvement. These gains in performance recommend the wider application of explicit duration models.
[http://www.ee.surrey.ac.uk/Personal/P.Jackson/Balthasar/]

Session: T.3.P2, Automatic speech recognition / Auditory mechanisms (poster).
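A small sketch of fitting competing duration models (with synthetic stand-in durations; the paper fitted measured segment durations): a method-of-moments gamma fit, compared by log-likelihood against the exponential model implicit in first-order Markov processes.

    import numpy as np
    from scipy import stats

    # Stand-in segment durations in ms (replace with measured durations).
    durations = np.random.default_rng(1).gamma(3.0, 30.0, size=2000)

    # Method of moments for the gamma: mean = k*theta, var = k*theta^2.
    m, v = durations.mean(), durations.var()
    k, theta = m ** 2 / v, v / m

    # Compare total log-likelihoods of the two duration models.
    ll_gamma = stats.gamma.logpdf(durations, a=k, scale=theta).sum()
    ll_expon = stats.expon.logpdf(durations, scale=m).sum()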
 


More: Please contact Philip Jackson if you would like further information.
[ paper - PDF: raw (155 kB)
| poster - PS: gzip (574 kB) - PDF: raw (248 kB) ]
 


Moreno, D.M., Jackson, P.J.B., Hernando, J. and Russell, M.J. (2003). Improved ASR in noise using harmonic decomposition.
In Proceedings of the 15th International Congress of Phonetic Sciences, ICPhS 2003, pp. 751-754, Barcelona.

Abstract:

Application of the pitch-scaled harmonic filter (PSHF) to automatic speech recognition in noise was investigated using the Aurora 2.0 database. The PSHF decomposed the original speech into periodic and aperiodic streams. Digit-recognition tests with the extended features compared the noise robustness of various parameterisations against the standard 39 MFCCs. Separately, each stream reduced word accuracy by less than 1% absolute; together, the combined streams gave substantial increases under noisy conditions. Applying PCA to concatenated features proved better than to separate streams, and to static coefficients better than after calculation of deltas. With multi-condition training, accuracy improved by 7.8% at 5 dB SNR, thus providing resilience to corruption by noise.
[http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/]

Session: M.4.5, Automatic speech recognition I (oral).
 


More: Please contact Philip Jackson if you would like further information.
[ paper - PDF: raw (191 kB)
| presentation - PPS: raw (1.3 MB) ]
 


Russell, M.J., Jackson, P.J.B. and Wong, M.L.P. (2003). Development of articulatory-based multi-level segmental HMMs for phonetic classification in ASR.
In Proceedings of EURASIP Conference on Video/Image Processing and Multimedia Communications, EC-VIP-MC 2003, Vol. 2, pp. 655-660, Zagreb, Croatia.

Abstract:

A simple multiple-level HMM is presented in which speech dynamics are modelled as linear trajectories in an intermediate, formant-based representation and the mapping between the intermediate and acoustic data is achieved using one or more linear transformations. An upper-bound on the performance of such a system is established. Experimental results on the TIMIT corpus demonstrate that, if the dimension of the intermediate space is sufficiently high or the number of articulatory-to-acoustic mappings is sufficiently large, then this upper-bound can be achieved.

Keywords: Automatic speech recognition, Hidden Markov Models, segment models.
 


More: Please contact Philip Jackson if you would like further information.
[ paper - PDF: raw (229 kB) ]
 


Moreno, D.M. and Jackson, P.J.B. (2003). A front end using periodic and aperiodic streams for ASR.
In Proceedings of One-day Meeting for Young Speech Researchers, OMYSR 2003, p. 18, London.

Abstract:

Various acoustic mechanisms produce cues in human speech, such as voicing, frication and plosion. Automatic speech recognition (ASR) front ends often treat them alike, although studies demonstrate the dependence of their signal characteristics on the presence or absence of vocal-fold vibration. Typically, Mel-frequency cepstral coefficients (MFCCs) are used to extract features that are not strongly influenced by source characteristics. In contrast, harmonic and noise-like cues were segregated before characterisation, by separating the contribution of voicing from those of other acoustic sources to improve feature extraction for both parts. The pitch-scaled harmonic filter (PSHF) divides an input speech signal into two synchronous streams: periodic and aperiodic, respective estimates of voiced and unvoiced components of the signal at any time. In digit-recognition experiments with the Aurora 2.0 database (clean and noisy conditions, 4kHz bandwidth), features were extracted from each of the decomposed streams, then combined (by concatenation or further manipulation) into an extended feature vector. Thus, the noise robustness of our parameterisation was compared against a conventional one (39 MFCCs, deltas, delta-deltas). Each separate stream reduced recognition accuracy by less than 1% absolute, compared to the baseline on the original speech; combined, they increased accuracy under noisy conditions (by 7.8% under 5dB SNR, after multi-condition training). Voiced regions provided resilience to corruption by noise. However, no significant improvement on 99.0% baseline accuracy was achieved under clean test conditions. Principal component analysis (PCA) of concatenated features tended to perform better than of the separate streams, and PCA of static coefficients better than after calculation of deltas. With PCA of concatenated static MFCCs, plus deltas, the improvement was 5.6%, implying some redundancy between the complementary streams. Future plans to evaluate the PSHF front end for phoneme recognition with higher bandwidth could help to identify the source of these substantial performance benefits.


More: Please contact Philip Jackson if you would like further information.
[ poster - PS: raw (1.5 MB), gzip (248 kB) ]
 


Jackson, P.J.B. and Russell, M.J. (2002). Models of speech dynamics in a segmental-HMM recognizer using intermediate linear representations.
In Proceedings of the International Conference on Spoken Language Processing, ICSLP 2002, pp. 1253-1256, Denver CO.

Abstract:

A theoretical and experimental analysis of a simple multi-level segmental HMM is presented in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate (articulatory) layer, where speech dynamics are modeled using linear trajectories. Three formant-based parameterizations and measured articulatory positions are considered as intermediate representations, from the TIMIT and MOCHA corpora respectively. The articulatory-to-acoustic mapping was performed by between 1 and 49 linear transformations. Results of phone-classification experiments demonstrate that, by appropriate choice of intermediate parameterization and mappings, it is possible to achieve close to optimal performance.

Session: Acoustic modelling
 


More: Please contact Philip Jackson if you would like further information.
[ paper - PDF: raw (110 kB) - PS: raw (480 kB)
| presentation - PPT: raw (520 kB) ]
 


Jackson, P.J.B., Lo, B.-H. and Russell, M.J. (2002). Models of speech dynamics for ASR, using intermediate linear representations.
Presented at NATO Advanced Study Institute on the Dynamics of Speech Production and Perception, Il Ciocco, Italy.

Abstract:

A theoretical and experimental analysis of a simple multi-level segmental HMM is presented in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate (articulatory) layer, where speech dynamics are modeled using linear trajectories. Three formant-based parameterizations and measured articulatory positions are considered as intermediate representations, from the TIMIT and MOCHA corpora respectively. The articulatory-to-acoustic mapping was performed by between 1 and 49 linear transformations. Results of phone-classification experiments demonstrate that, by appropriate choice of intermediate parameterization and mappings, it is possible to achieve close to optimal performance.


More: Please contact Philip Jackson if you would like further information.
[ poster - PPT: raw (466 kB) ]
 


Jackson, P.J.B. (2002). Mama and papa: the ancestors of modern-day speech science.
In Proceedings of the International Conference and Commemoration of the Bicentenary of the Death of Erasmus Darwin, ICCBDED, p. 14, Lichfield, UK.

Abstract:

Erasmus Darwin's writings on the subject of human speech included discussion of the alphabet as an unsatisfactory phonetic representation of the spoken word, of mechanisms of speech production and, indeed, of a mechanical speaking machine [1,2]. His studies of the acoustic properties of speech were limited, as it was not until many generations later that the physical behaviour of sound waves began to be understood in any detail [3]. Nevertheless, his analysis of sounds on the basis of their manner of production and place of articulation was highly insightful, and is comparable to the classification scheme laid down by the International Phonetic Association. Furthermore, the wooden and leather device he had built was capable of pronouncing the vowel /a/ and labial consonants which, in English, are /p/, /b/ and /m/. These could be combined to create some simple utterances, as in my title. This paper will examine many of the technical aspects of Darwin's investigations into the nature of speech, and relate them to the findings of contemporary research in the field. In particular, it will review the application of articulatory information in approaches to speech synthesis, and show how magnetic resonance images, together with a model of the vocal-tract acoustics, can be used for such purposes. Where appropriate, demonstrations will be given, to illustrate the different aspects of the technology, and connexions will be made between those aspects that Darwin brought to light and what speech science knows of them now.

References:

  1. Darwin, Erasmus (1803), "The Temple of Nature", J. Johnson, London, Add. Note XV:107-120.
  2. King-Hele, Desmond (1981), "The Letters of Erasmus Darwin", Cambridge University Press, Cambridge, UK.
  3. Lord Rayleigh (1877), "The Theory of Sound", 2nd edition, Dover, New York.

Session: Erasmus Darwin and technology
 


More: Please contact Philip Jackson if you would like further information.
[ abstract - DOC: raw (21 kB)
| presentation - PPT: raw (1.1 MB) ]
 


Jackson, P.J.B. (2001). Acoustic cues of voiced and voiceless plosives for determining place of articulation.
In Proceedings of Workshop on Consistent and Reliable Acoustic Cues for sound analysis, CRAC 2001, pp. 19-22, Aalborg, Denmark.

Abstract:

Speech signals from stop consonants with trailing vowels were analysed for cues consistent with their place of articulation. They were decomposed into periodic and aperiodic components by the pitch-scaled harmonic filter to improve the quality of the formant tracks, to which exponential trajectories were fitted to get robust formant loci at voice onset. Ensemble-average power spectra of the bursts exhibited dependence on place (and on vowel context for velar consonants), but not on voicing. By extrapolating the trajectories back to the release time, formant estimates were compared with spectral peaks, and connexions were made between these disparate acoustic cues.

Keywords: acoustic cues, plosive, stop consonants.
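A sketch of the trajectory-fitting step (the synthetic track and parameter values are illustrative): fit an exponential transition to the formant track after voice onset, then extrapolate it back to the release time to estimate the locus.

    import numpy as np
    from scipy.optimize import curve_fit

    def expo_traj(t, f_locus, f_target, tau):
        # Exponential transition from the locus towards the vowel target.
        return f_target + (f_locus - f_target) * np.exp(-t / tau)

    # Synthetic formant track: t = 0 at voice onset (s), values in Hz.
    t = np.linspace(0.0, 0.05, 26)
    track = expo_traj(t, 1800.0, 1200.0, 0.015) \
            + np.random.default_rng(3).normal(0.0, 20.0, t.size)

    popt, _ = curve_fit(expo_traj, t, track, p0=(track[0], track[-1], 0.02))
    f_at_release = expo_traj(-0.010, *popt)  # extrapolate to release time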
 


More: Please contact Philip Jackson if you would like further information.
[ paper - PDF: raw (320 kB) - PS: raw (560 kB) ]
 


Jackson, P.J.B. and Shadle, C.H. (2001). Uses of the pitch-scaled harmonic filter in speech processing.
In Proceedings of the Institute of Acoustics, Workshop on Innovation in Speech Processing 2001, Vol. 23 (3), pp. 309-321, Stratford-upon-Avon, UK.

Abstract:

The pitch-scaled harmonic filter (PSHF) is a technique for decomposing speech signals into their periodic and aperiodic constituents, during periods of phonation. In this paper, the use of the PSHF for speech analysis and processing tasks is described. The periodic component can be used as an estimate of the part attributable to voicing, and the aperiodic component can act as an estimate of that attributable to turbulence noise, i.e., from fricative, aspiration and plosive sources. Here we present the algorithm for separating the periodic and aperiodic components from the pitch-scaled Fourier transform of a short section of speech, and show how to derive signals suitable for time-series analysis and for spectral analysis. These components can then be processed in a manner appropriate to their source type, for instance, extracting zeros as well as poles from the aperiodic spectral envelope. A summary of tests on synthetic speech-like signals demonstrates the robustness of the PSHF's performance to perturbations from additive noise, jitter and shimmer. Examples are given of speech analysed in various ways: power spectrum, short-time power and short-time harmonics-to-noise ratio, linear prediction and mel-frequency cepstral coefficients. Besides being valuable for speech production and perception studies, the latter two analyses show potential for incorporation into speech coding and speech recognition systems. Further uses of the PSHF are revealing normally-obscured acoustic features, exploring interactions of turbulence-noise sources with voicing, and pre-processing speech to enhance subsequent operations.

Keywords: periodic/aperiodic decomposition, acoustic features.
 


More: Please contact Philip Jackson if you would like further information.
[ presentation - PPT: raw (470 kB)
| paper - PDF: raw (1.3 MB) - PS: raw (4.3 MB), gzip (470 kB) ]
 


Jackson, P.J.B. and Shadle, C.H. (2000). Performance of the pitch-scaled harmonic filter and applications in speech analysis.
In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 3, pp. 1311-1314, Istanbul.

Abstract:

The pitch-scaled harmonic filter (PSHF) is a technique for decomposing speech signals into their voiced and unvoiced constituents. In this paper, we evaluate its ability to reconstruct the time series of the two components accurately using a variety of synthetic, speech-like signals, and discuss its performance. These results determine the degree of confidence that can be expected for real speech signals: typically, 5 dB improvement in the signal-to-noise ratio of the harmonic component and approximately 5 dB more than the initial harmonics-to-noise ratio (HNR) in the anharmonic component. A selection of the analysis opportunities that the decomposition offers is demonstrated on speech recordings, including dynamic HNR estimation and separate linear prediction analyses of the two components. These new capabilities provided by the PSHF can facilitate discovering previously hidden features and investigating interactions of unvoiced sources, such as frication, with voicing.

Session: 3.2 Speech analysis

Keywords: harmonics-to-noise ratio, voiced/unvoiced decomposition, frication, aspiration noise.
 


More: Please contact Philip Jackson if you would like further information.
[ paper - PDF: raw (580 kB), zip (440 kB), gzip (440 kB) - PS: gzip (170 kB) ]
 


Jackson, P.J.B. and Shadle, C.H. (2000). Aero-acoustic modelling of voiced and unvoiced fricatives based on MRI data.
In Proceedings of the 5th Seminar on Speech Production, pp. 185-188, Seeon, Germany.

Abstract:

We would like to develop a more realistic production model of unvoiced speech sounds, namely fricatives, plosives and aspiration noise. All three involve turbulence noise generation, with place-dependent source characteristics that vary with time (rapidly, in plosives). In this study, we aimed to produce, using an aero-acoustic model of the vocal-tract filter and source, voiced as well as unvoiced fricatives that provide a good match to analyses of speech recordings. The vocal-tract transfer function (VTTF) was computed by the vocal-tract acoustics program, VOAC [Davies, McGowan and Shadle. Vocal Fold Physiology: Frontiers in Basic Science, ed. Titze, Singular Pub., CA, 93-142, 1993], using geometrical data, in the form of cross-sectional area and hydraulic radius functions, along the length of the tract. VOAC incorporates the effects of net flow into the transmission of plane waves through a tubular representation of the tract, and relaxes assumptions of rigid walls and isentropic propagation. The geometry functions were derived from multiple-slice, dynamic, magnetic resonance images (MRI) [Mohammad. PhD thesis, Dept. ECS, U. Southampton, UK, 1999; Shadle, Mohammad, Carter, and Jackson. Proc. ICPhS, S.F. CA, 1:623-626, 1999], using a method of converting from the pixel outlines that was improved over earlier efforts on vowels. A coloured noise source signal was combined with the VTTF and radiation characteristic to synthesize the unvoiced fricative [s]. For its voiced counterpart [z], many researchers have noted that the noise source appears to be modulated by voicing. Furthermore, the phase of the modulation has been shown to be perceptually significant. Based on our analysis [Jackson and Shadle. Proc. IEEE-ICASSP, Istanbul, 2000] of recordings by the same subject, the frication source of [z] was varied periodically according to fluctuations in the flow velocity at the constriction exit, and the modulation phase was governed by the convection time for the flow perturbation to travel from the constriction to the obstacle. The synthesized fricatives were compared to the speech recordings in a simple listening test, and comparisons of the predicted and measured time series suggested that the model, which brings together physical, aerodynamic and acoustic information, can replicate characteristics of real speech, such as the modulation in voiced fricatives (please note the change of URL, Nov '02:
http://www.ee.surrey.ac.uk/Personal/P.Jackson/Nephthys/).
 


More: Please contact Philip Jackson if you would like further information.
[ paper - PDF: raw (290 kB), zip (180 kB), gzip (180 kB) - PS: gzip (90 kB) ]
 


Shadle, C.H., Mohammad, M., Carter, J.N. and Jackson, P.J.B. (1999). Dynamic Magnetic Resonance Imaging: new tools for speech research.
In Proceedings of the 14th International Congress of Phonetic Sciences, Vol. 1, pp. 623-626, San Francisco, CA.

Abstract:

A multiplanar Dynamic Magnetic Resonance Imaging (MRI) technique that extends our earlier work on single-plane Dynamic MRI is described. Scanned images acquired while an utterance is repeated are recombined to form pseudo-time-varying images of the vocal tract using a simultaneously recorded audio signal. There is no technical limit on the utterance length or number of slices that can be so imaged, though the number of repetitions required may be limited by the subject's stamina. An example of [pasi] imaged in three sagittal planes is shown; with a Signa GE 0.5T MR scanner, 360 tokens were reconstructed to form a sequence of 39 3-slice 16ms frames. From these, a 3-D volume was generated for each time frame, and tract surfaces outlined manually. Parameters derived from these include: palate-tongue distances for [a,s,i]; estimates of tongue volume and of the area function using only the midsagittal, and then all three slices. These demonstrate the accuracy and usefulness of the technique.
 


More: Please contact Philip Jackson if you would like further information.
 


Jackson, P.J.B. and Shadle, C.H. (1999). Modelling vocal-tract acoustics validated by flow experiments.
Journal of the Acoustical Society of America, Vol. 105 (2, Pt. 2), p. 1161, Berlin, Germany (abstract).

Abstract:

Modelling the acoustic response of the vocal tract is a complex task, both from the point of view of acquiring details of its internal geometry and of accounting for the acoustic-flow interactions. A vocal-tract acoustics program (VOAC) has been developed [P. Davies, R. McGowan & C. Shadle, Vocal Fold Phys., ed. I. Titze, San Diego: Singular Pub., 93-142 (1993)], which uses a more realistic, aeroacoustic model of the vocal tract than classic electrical-analogue representations. It accommodates area and hydraulic radius profiles, smooth and abrupt area changes, incorporating end-corrections, side-branches, and net fluid flows, including turbulence losses incurred through jet formation. Originally, VOAC was tested by comparing vowel formant frequencies (i) uttered by subjects, (ii) predicted using classic electrical analogues, and (iii) predicted by VOAC. In this study, VOAC is further validated by comparing the predicted frequency response functions for a range of flow rates with measurements of the radiated sound from a series of mechanical models of unvoiced fricatives [C. Shadle, PhD thesis, MIT-RLE Tech. Rpt. 506 (1985)]. Results show VOAC is more accurate in predicting the complete spectrum at a range of flow rates. Finally, preliminary work is presented with VOAC used to simulate the sound generated at a sequence of stages during the release of a plosive.
 


More: Please contact Philip Jackson if you would like further information.
 


Jackson, P.J.B. and Shadle, C.H. (1999). Analysis of mixed-source speech sounds: aspiration, voiced fricatives and breathiness.
In Proceedings of the 2nd International Conference on Voice Physiology and Biomechanics, p. 30, Berlin, Germany (abstract).

Abstract:

Our initial goal was to model the source characteristics of aspiration more accurately. The term is used inconsistently in the literature, but there is general agreement that aspiration is produced by turbulence noise generated in the vicinity of the glottis. Thus, in order to model aspiration, we must refine its concept, and in particular define its relation to other kinds of noise produced near the glottis, such as breathiness and hoarseness. For instance, do similar aeroacoustic processes operate transiently during a plosive release and steadily during a breathy vowel? In unvoiced fricatives, localized sources produce well-defined spectral troughs. We have therefore developed a series of analysis methods that generate spectra for transient and voice-and-noise-excited sounds. These methods include pitch-synchronous decomposition into harmonic and anharmonic components (based on a hoarseness metric of Muta et al., 1988), short-time spectra, ensemble averaging, and short-time harmonics-to-noise ratios (Jackson and Shadle, 1998). These have been applied to a corpus of repeated nonsense words consisting of aspirated stops in three vowel contexts and voiced and unvoiced fricatives, spoken in four voice qualities, thus providing multiple examples of mixed-source and transient-source speech sounds. Ensemble-averaged spectra derived throughout a stop release show evidence of a highly-localized noise source becoming more distributed. Variations by place are also apparent, complementing and extending previous work (Stevens and Blumstein, 1978; Stevens, 1993). The coordination of glottal and supraglottal articulation, described and modelled for aspiration by Scully and Mair (1995), is in a sense reversed for voiced fricatives. Use of the decomposition algorithm on voiced fricatives revealed greater complexity than expected: the anharmonic component appears sometimes to be modulated by the harmonic component, sometimes to be independent of it, and tends to change from one case to the other in the course of the fricative. In sum, we have made some progress in describing not only spectral but time-varying properties of an aspiration model, and in so doing, have improved our descriptions of other mixed-source, time-varying speech sounds.
 


More: Please contact Philip Jackson if you would like further information.
 


Jackson, P.J.B. and Shadle, C.H. (1998). Pitch-synchronous decomposition of mixed-source speech signals.
In Proceedings of the International Congress on Acoustics and Meeting of the Acoustical Society of America, Vol. 1, pp. 263-264, Seattle, WA.

Abstract:

As part of a study of turbulence-noise sources in speech production, a method has been developed for decomposing an acoustic signal into harmonic (voiced) and anharmonic (unvoiced) components, based on a hoarseness metric (Muta et al., 1988, J. Acoust. Soc. Am. 84, pp.1292-1301). Their pitch-synchronous harmonic filter (PSHF) has been extended (to EPSHF) to yield time histories of both harmonic and anharmonic components. Our corpus includes many examples of turbulence noise, including aspiration, voiced and unvoiced fricatives, and a variety of voice qualities (e.g. breathy, whispered). The EPSHF algorithm plausibly decomposed breathy vowels, but the harmonic component of voiced fricatives still contained significant noise, similar in shape to (though weaker than) the ensemble-averaged anharmonic spectrum. In general the algorithm performed best on sustained sounds. Tracking errors at rapid transitions, and due to jitter and shimmer, were spuriously attributed to the anharmonic component. However, the extracted anharmonic component clearly exhibited modulation in voiced fricatives. While such modulation has been previously reported (and also in hoarse voice), it was verified by tests on synthetic signals, where constant and modulated noise signals were extracted successfully. The results suggest that the EPSHF will continue to enable exploration of the interaction of phonation and turbulence noise.
 


More: Please contact Philip Jackson if you would like further information.
 


Jackson, P.J.B. and Ross, C.F. (1996). Application of active noise control to corporate aircraft.
In Proceedings of the American Society of Mechanical Engineers, Vol. DE93, pp. 19-25, Atlanta, GA.

Abstract:

Following the successful introduction of Active Noise Control (ANC) systems as standard production fits on commuter aircraft (Saab2000, Saab340B and Dash8Q series 100, 200 & 300), recent efforts have focused on developing low-cost, low-weight systems for smaller corporate aircraft. This paper describes the approach taken by Ultra to the new technical challenges and the resulting improvements to the design methodology. A review of system performance on corporate (King Air & Twin Commander) turboprop aircraft shows repeatable global Tonal Noise Reductions (TNRs) of >8 dBA throughout the whole cabin, achieving reductions >20 dB in some locations at the blade-pass frequency (BPF), and major comfort benefits throughout the flight envelope with a weight penalty of less than 20 kg.


 
More: Please contact Philip Jackson if you would like further information.
[ paper - PDF: raw (420 kB) - DOC: raw (680 kB) ]
 


Book chapter


Jackson, P.J.B. (2005). Mama and papa: the ancestors of modern-day speech science.
Chapter 15 in The Genius of Erasmus Darwin, CUM Smith and RG Arnott (eds.), Aldershot, UK: Ashgate, pp. 217-236, ISBN 0-754-63671-2.

Abstract:

While many talk of the rapid pace of technological advancement in the present age, the lack of progress in the realm of ideas over the past two hundred years is perhaps more remarkable, which is most evident when looking at what had been accomplished so many moons ago, back in the days of the Lunar Society. As an engineering researcher of spoken language systems, my interest in Erasmus Darwin's (ED's) work on speech was first ignited when I moved to Lichfield and, like many others, I was struck by his achievements. ED's writings on the subject of human speech included discussion of the alphabet as an unsatisfactory phonetic representation of the spoken word, of mechanisms of speech production and, indeed, of a mechanical speaking machine (Darwin 1803; King-Hele 1981). His studies of the acoustic properties of speech were limited, as no form of sound reproduction had yet been invented and it was not until many generations later that the physical behaviour of sound waves began to be understood in any detail (Rayleigh 1877). Nevertheless, his analysis of sounds on the basis of their manner of production and place of articulation was highly insightful, and is comparable to the classification scheme laid down by the International Phonetic Association. Furthermore, the wooden and leather device he built was capable of pronouncing the vowel /ɑ/ and the English labial consonants /p/, /b/ and /m/, which could be combined to create some simple utterances, as in my title. It is no surprise, therefore, that Darwin's contemporaries were impressed (and sometimes alarmed!) by his inventions too.

Subject category: Erasmus Darwin and technology.
 


More: Please contact Philip Jackson if you would like further information.
[ chapter - PDF: raw (1.4 MB), gzip (390 kB) ]
 


PhD thesis


Jackson, P.J.B. (2000). Characterisation of plosive, fricative and aspiration components in speech production.
PhD. Thesis, Department of Electronics and Computer Science, University of Southampton, Southampton, UK.

Abstract:

This thesis is a study of the production of human speech sounds by acoustic modelling and signal analysis. It concentrates on sounds that are not produced by voicing (although that may be present), namely plosives, fricatives and aspiration, which all contain noise generated by flow turbulence. It combines the application of advanced speech analysis techniques with acoustic flow-duct modelling of the vocal tract, and draws on dynamic magnetic resonance image (dMRI) data of the pharyngeal and oral cavities, to relate the sounds to physical shapes.

Having superimposed vocal-tract outlines on three sagittal dMRI slices of an adult male subject, a simple description of the vocal tract suitable for acoustic modelling was derived through a sequence of transformations. The vocal-tract acoustics program VOAC, which relaxes many of the assumptions of conventional plane-wave models, incorporates the effects of net flow into a one-dimensional model (viz., flow separation, increase of entropy, and changes to resonances), as well as wall vibration and cylindrical wavefronts. It was used for synthesis by computing transfer functions from sound sources specified within the tract to the far field.

Being generated by a variety of aero-acoustic mechanisms, unvoiced sounds are somewhat varied in nature. Through analysis that was informed by acoustic modelling, resonance and anti-resonance frequencies of ensemble-averaged plosive spectra were examined for the same subject, and their trajectories observed during release. The anti-resonance frequencies were used to compute the place of occlusion.

In vowels and voiced fricatives, voicing obscures the aspiration and frication components. So, a method was devised to separate the voiced and unvoiced parts of a speech signal, the pitch-scaled harmonic filter (PSHF), which was tested extensively on synthetic signals. Based on a harmonic model of voicing, it outputs harmonic and anharmonic signals appropriate for subsequent analysis as time series or as power spectra. By applying the PSHF to sustained voiced fricatives, we found that, not only does voicing modulate the production of frication noise, but that the timing of pulsation cannot be explained by acoustic propagation alone.

In addition to classical investigation of voiceless speech sounds, VOAC and the PSHF demonstrated their practical value in helping further to characterise plosion, frication and aspiration noise. For the future, we discuss developing VOAC within an articulatory synthesiser, investigating the observed flow-acoustic mechanism in a dynamic physical model of voiced frication, and applying the PSHF more widely in the field of speech research.
 


More: Please contact Philip Jackson if you would like further information.
[ table of contents - TXT: raw (15 kB)
| abstract - PDF: raw (32 kB) - PS: gzip (34 kB)
| thesis - PDF: raw (10 MB), zip (5.2 MB), gzip (5.2 MB) - PS: raw (11 MB), gzip (2.2 MB) ]
 



© 2002-6, maintained by Philip Jackson, last updated on 16 Dec 2006.
