Philip Jackson    

Abstracts of my publications

The University of Surrey

 



Academic Journal Papers


Russell, M.J., X. Zheng and Jackson, P.J.B. (2007). Modelling speech signals using formant frequencies as an intermediate representation.
IET Signal Processing, Vol. 1 (1), pp. 43-50.

Abstract:

Multiple-level segmental hidden Markov models (M-SHMMs) in which the relationship between symbolic and acoustic representations of speech is regulated by a formant-based intermediate representation are considered. New TIMIT phone recognition results are presented, confirming that the theoretical upper-bound on performance is achieved provided that either the intermediate representation or the formant-to-acoustic mapping is sufficiently rich. The way in which M-SHMMs exploit formant-based information is also investigated, using singular value decomposition of the formant-to-acoustic mappings and linear discriminant analysis. The analysis shows that if the intermediate layer contains information which is linearly related to the spectral representation, that information is used in preference to explicit formant frequencies, even though the latter are useful for phone discrimination. In summary, although these results confirm the utility of M-SHMMs for automatic speech recognition, they provide empirical evidence of the value of nonlinear formant-to-acoustic mappings.

INSPEC codes: A4370; B6130E; C5260S; A4360; A0210; A0250; B0210; B0240J; C1110; C1140J
 

 


Pincas, J. and Jackson, P.J.B. (2006). Amplitude modulation of turbulence noise by voicing in fricatives.
Journal of the Acoustical Society of America, Vol. 120 (6), pp. 3966-3977.

Abstract:

The two principal sources of sound in speech, voicing and frication, occur simultaneously in voiced fricatives as well as at the vowel-fricative boundary in phonologically voiceless fricatives. Instead of simply overlapping, the two sources interact. This paper is an acoustic study of one such interaction effect: the amplitude modulation of the frication component when voicing is present. Corpora of sustained and fluent-speech English fricatives were recorded and analyzed using a signal-processing technique designed to extract estimates of modulation depth. Results reveal a pattern, consistent across speaking style, speakers and places of articulation, for modulation at f0 to rise at low voicing strengths and subsequently saturate. Voicing strength needed to produce saturation varied 60-66 dB across subjects and experimental conditions. Modulation depths at saturation varied little across speakers but significantly for place of articulation (with [z] showing particularly strong modulation) clustering at approximately 0.4-0.5 (a 40-50% fluctuation above and below unmodulated amplitude); spectral analysis of modulating signals revealed weak but detectable modulation at the second and third harmonics (i.e., 2f0 and 3f0).

PACS numbers: 43.70.Bk, 43.72.Ar
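As a rough illustration of what a modulation depth at f0 means, the sketch below builds Gaussian noise whose amplitude fluctuates by 45% at the pitch period and recovers that depth from the rectified envelope. It is not the signal-processing technique used in the paper: the function name, the rectified-envelope estimator, the 128-sample period and the synthetic test signal are all assumptions made for this example.

```python
import cmath
import math
import random


def modulation_depth(signal, period):
    """Estimate modulation depth at f0 from the rectified envelope.

    Assumes the pitch period (in samples) is known and constant, and that
    the signal spans a whole number of periods.
    """
    n = len(signal)
    envelope = [abs(s) for s in signal]
    mean_level = sum(envelope) / n            # unmodulated (average) level
    omega = 2.0 * math.pi / period            # f0 in radians per sample
    f0_term = sum(e * cmath.exp(-1j * omega * t)
                  for t, e in enumerate(envelope)) / n
    # A cosine of relative depth m contributes m/2 at +f0, hence the factor 2.
    return 2.0 * abs(f0_term) / mean_level


# Synthetic "voiced frication": noise amplitude-modulated at the pitch period.
rng = random.Random(0)
period = 128          # hypothetical: 125 Hz pitch at a 16 kHz sampling rate
true_depth = 0.45     # a 45% fluctuation above and below the mean amplitude
noise = [(1.0 + true_depth * math.cos(2.0 * math.pi * t / period)) * rng.gauss(0.0, 1.0)
         for t in range(64000)]
estimate = modulation_depth(noise, period)
```

With this many samples the estimate should land close to the 0.45 used to generate the signal; shorter signals give noisier estimates.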
 

 


Russell, M.J. and Jackson, P.J.B. (2005). A multiple-level linear/linear segmental HMM with a formant-based intermediate layer.
Computer Speech and Language, Vol. 19 (2), pp. 205-225.

Abstract:

A novel multi-level segmental HMM (MSHMM) is presented in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate `articulatory' representation. Speech dynamics are characterised as linear trajectories in the articulatory space, which are transformed into the acoustic space using an articulatory-to-acoustic mapping. Recognition is then performed. The results of phonetic classification experiments are presented for monophone and triphone MSHMMs using three formant-based `articulatory' parameterisations and sets of between 1 and 49 linear articulatory-to-acoustic mappings. The NIST Matched Pair Sentence Segment (Word Error) test shows that, for a sufficiently rich combination of articulatory parameterisation and mappings, differences between these results and those obtained with an optimal classifier are not statistically significant. It is also shown that, compared with a conventional HMM, superior performance can be achieved using a MSHMM with 25% fewer parameters.
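The linear/linear idea in this abstract can be pictured with a small sketch: a straight-line trajectory in a formant-like space is passed through one linear mapping to give the expected acoustic frames for a segment, and a candidate segment is scored by its squared error against those frames. The trajectory form (midpoint plus slope), the squared-error score and every number below are illustrative assumptions, not the model or parameters from the paper.

```python
def trajectory(midpoint, slope, duration):
    """Linear trajectory: midpoint + slope * (t - centre) for each frame t."""
    centre = (duration - 1) / 2.0
    return [[c + m * (t - centre) for c, m in zip(midpoint, slope)]
            for t in range(duration)]


def to_acoustic(weights, bias, formants):
    """Apply one linear (affine) mapping from the intermediate to the acoustic space."""
    return [sum(w * f for w, f in zip(row, formants)) + b
            for row, b in zip(weights, bias)]


def segment_error(observations, midpoint, slope, weights, bias):
    """Sum of squared differences between observed and predicted acoustic frames."""
    frames = trajectory(midpoint, slope, len(observations))
    return sum((o - p) ** 2
               for obs, f in zip(observations, frames)
               for o, p in zip(obs, to_acoustic(weights, bias, f)))


# Hypothetical values: three formant frequencies (Hz) and a 2-dimensional acoustic vector.
midpoint = [500, 1500, 2500]
slope = [10.0, -20.0, 0.0]                       # change per frame
weights = [[0.001, 0.002, 0.0], [0.0, -0.001, 0.003]]
bias = [0.5, -1.0]

# Frames generated by the model itself are matched exactly.
observations = [to_acoustic(weights, bias, f) for f in trajectory(midpoint, slope, 5)]
matched = segment_error(observations, midpoint, slope, weights, bias)
mismatched = segment_error(observations, [600, 1500, 2500], slope, weights, bias)
```

Here `matched` is zero because the observations come from the same trajectory and mapping, while shifting the first formant midpoint gives a positive `mismatched` error.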

 


Jackson, P.J.B., Lo, B.-H. and Russell, M.J. (2002). Data-driven, non-linear, formant-to-acoustic mapping for ASR.
IEE Electronics Letters, Vol. 38 (13), pp. 667-669.

Abstract:

The underlying dynamics of speech can be captured in an automatic speech recognition system via an articulatory representation, which resides in a domain other than that of the acoustic observations. Thus, given a set of models in this hidden domain, it is essential that a mapping can be obtained to relate the intermediate representation to the acoustic domain. In this paper, two methods for mapping from formants to short-term spectra are compared: multi-layered perceptrons (MLPs) and radial-basis function (RBF) networks. Both are capable of providing non-linear transformations, and were trained using features extracted from the TIMIT database. Various schemes for dividing the frames of speech data according to their phone class were also investigated. Results showed that the RBF networks performed approximately 10% better than the MLPs, in terms of the RMS error, and that a classification based on discrete regions of the articulatory space gave the greatest improvements over a single network.
 

 


Jackson, P.J.B. and Shadle, C.H. (2001). Pitch-scaled estimation of simultaneous voiced and turbulence-noise components in speech.
IEEE Transactions on Speech and Audio Processing, Vol. 9 (7), pp. 713-726.

Abstract:

Almost all speech contains simultaneous contributions from more than one acoustic source within the speaker's vocal tract. In this paper we propose a method - the pitch-scaled harmonic filter (PSHF) - which aims to separate the voiced and turbulence-noise components of the speech signal during phonation, based on a maximum likelihood approach. The PSHF outputs periodic and aperiodic components that are estimates of the respective contributions of the different types of acoustic source. It produces four reconstructed time series signals by decomposing the original speech signal, first, according to amplitude, and then according to power of the Fourier coefficients. Thus, one pair of periodic and aperiodic signals is optimized for subsequent time-series analysis, and another pair for spectral analysis. The performance of the PSHF algorithm was tested on synthetic signals, using three forms of disturbance (jitter, shimmer and additive noise), and the results were used to predict the performance on real speech. Processing recorded speech examples elicited latent features from the signals, demonstrating the PSHF's potential for analysis of mixed-source speech.

EDICS number: 1-ANLS

Keywords: Periodic-aperiodic decomposition, speech modification, speech pre-processing.
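The pitch-scaled idea can be sketched in a few lines: if the analysis frame spans a whole number of pitch periods, the harmonics of f0 fall exactly on every b-th DFT bin (b being the number of periods in the frame), so keeping those bins gives a periodic estimate and the remainder gives an aperiodic one. The sketch below is a simplification under stated assumptions, not the published PSHF: it uses a rectangular frame, a known pitch period and a naive DFT, and it omits the separate amplitude-based and power-based reconstructions described in the abstract. All function names and signal values are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft_real(spectrum):
    """Real part of the inverse DFT."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def harmonic_split(frame, periods):
    """Split a frame spanning `periods` pitch periods into periodic and aperiodic parts.

    Bins at multiples of `periods` (including DC, for simplicity) are treated
    as harmonic; everything else is assigned to the aperiodic estimate.
    """
    spectrum = dft(frame)
    harmonic_bins = [c if k % periods == 0 else 0j for k, c in enumerate(spectrum)]
    periodic = inverse_dft_real(harmonic_bins)
    aperiodic = [x - p for x, p in zip(frame, periodic)]
    return periodic, aperiodic


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# Synthetic mixed-source frame: two harmonics plus additive noise.
rng = random.Random(1)
period, periods = 40, 4                      # 40-sample pitch period, 4 periods per frame
length = period * periods
clean = [math.sin(2 * math.pi * t / period) + 0.5 * math.sin(4 * math.pi * t / period + 0.3)
         for t in range(length)]
noise = [rng.gauss(0.0, 0.1) for _ in range(length)]
frame = [c + w for c, w in zip(clean, noise)]

periodic, aperiodic = harmonic_split(frame, periods)
periodic_error = rms([p - c for p, c in zip(periodic, clean)])
```

Only the part of the noise that happens to fall on the harmonic bins leaks into the periodic estimate, so `periodic_error` stays below the RMS of the added noise.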
 

 


Jackson, P.J.B. and Shadle, C.H. (2000). Frication noise modulated by voicing, as revealed by pitch-scaled decomposition.
Journal of the Acoustical Society of America, Vol. 108 (4), pp. 1421-1434.

Abstract:

A decomposition algorithm that uses a pitch-scaled harmonic filter was evaluated using synthetic signals and applied to mixed-source speech, spoken by three subjects, to separate the voiced and unvoiced parts. Pulsing of the noise component was observed in voiced frication, which was analyzed by complex demodulation of the signal envelope. The timing of the pulsation, represented by the phase of the anharmonic modulation coefficient, showed a step change during a vowel-fricative transition corresponding to the change in location of the sound source within the vocal tract. Analysis of fricatives /β, v, ð, z, ʒ, ɣ, ʕ/ demonstrated a relationship between steady-state phase and place, and f0 glides confirmed that the main cause was a place-dependent delay.

PACS numbers: 43.70.Bk, 43.72.Ar
 

 



© 2002-7, maintained by Philip Jackson, last updated on 24 August 2007.
