Harmonic decomposition applied to automatic speech recognition
The aim of this project is to exploit the pitch-scaled harmonic filter (PSHF) for automatic speech recognition (ASR). The PSHF was originally developed at the University of Southampton as part of the Nephthys project with Dr Christine Shadle, and was developed further in collaboration with the Universitat Politècnica de Catalunya (Barcelona). The current implementation of the PSHF, in C, is now publicly available (see News above).
Here is an example of the decomposition, using one of the Aurora 2.0 files,
produced using our implementation of the PSHF in C,
Here is another example, with accompanying plots of the signals and spectra:
Fig. 1. (a) Waveforms. (b) Narrowband spectrograms. (c) MFCC-based spectrograms.
Here is a noise-corrupted example from the Aurora 2.0 files, showing the performance at +5 dB SNR.
Panels: clean speech, input signal, periodic output, aperiodic output.
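The +5 dB SNR quoted for this example is a ratio of speech power to noise power. As a minimal sketch of that relationship, the following Python scales a noise signal so that the mixture reaches a target SNR. It assumes a simple whole-signal mean-power ratio; the way the Aurora 2.0 files were actually prepared may differ. The function names, the 8 kHz sampling rate and the synthetic signals are illustrative assumptions, not part of the PSHF software.

```python
import math
import random


def mean_power(x):
    """Average power (mean of squared samples) of a sequence."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so that the speech-to-noise power ratio equals
    `target_snr_db`, then add it to `speech`.

    Returns (mixed, scaled_noise). Hypothetical helper for illustration.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # Target noise power is p_speech / 10^(SNR/10); the gain acts on
    # amplitude, hence the square root.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled)]
    return mixed, scaled


def measure_snr_db(signal, noise):
    """SNR in dB from the mean powers of the two components."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    rng = random.Random(0)
    # Stand-ins for real recordings: a 200 Hz tone and white Gaussian noise.
    speech = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(fs)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
    mixed, scaled = mix_at_snr(speech, noise, 5.0)
    print("achieved SNR (dB):", round(measure_snr_db(speech, scaled), 3))
```

At +5 dB the speech power is only about three times the noise power, which gives a sense of how heavily corrupted the input signal in this example is.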
Here is a final example, taken from the thesis in which the PSHF was developed (Jackson 2000):
Fig. 2. Wide-band (upper half, 5 ms) and narrow-band (lower half, 43 ms) spectrograms (Hann window, 4 times zero-padded, fixed grey-scale) of [paza] by PJ, computed (top) from the original signal s(n), (middle) from the periodic estimates of the voiced component v(n), and (bottom) from the aperiodic estimates of the unvoiced component u(n).
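The analysis settings named in the Fig. 2 caption (Hann window, 4 times zero-padding, 5 ms and 43 ms windows) can be illustrated with a short sketch. The Python below is not the code used to produce the figure: the sampling rate and the hop size are not stated in the caption and are assumed here, the function names are hypothetical, and a naive DFT stands in for the FFT that a practical implementation would use.

```python
import cmath
import math


def window_length(duration_ms, fs):
    """Number of samples in a window of `duration_ms` at sampling rate `fs` (Hz)."""
    return max(1, round(duration_ms * fs / 1000.0))


def hann(n):
    """Symmetric Hann window of n samples."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def frame_spectrum_db(frame, pad_factor=4):
    """Magnitude spectrum in dB of one Hann-windowed frame,
    zero-padded to pad_factor times its length."""
    n = len(frame)
    nfft = pad_factor * n
    windowed = [s * w for s, w in zip(frame, hann(n))]
    spectrum = []
    for k in range(nfft // 2 + 1):
        # The padded zeros contribute nothing to the sum, so only the
        # first n terms are evaluated; padding only refines the bin spacing.
        acc = sum(windowed[m] * cmath.exp(-2j * math.pi * k * m / nfft)
                  for m in range(n))
        spectrum.append(20.0 * math.log10(abs(acc) + 1e-12))
    return spectrum


def spectrogram(x, fs, duration_ms, hop_ms=2.0, pad_factor=4):
    """List of per-frame dB spectra. The hop size is an assumed parameter."""
    n = window_length(duration_ms, fs)
    hop = max(1, round(hop_ms * fs / 1000.0))
    return [frame_spectrum_db(x[i:i + n], pad_factor)
            for i in range(0, len(x) - n + 1, hop)]


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    x = [math.sin(2.0 * math.pi * 1000.0 * t / fs) for t in range(400)]
    for label, duration_ms in (("wide-band", 5.0), ("narrow-band", 43.0)):
        frames = spectrogram(x, fs, duration_ms)
        n = window_length(duration_ms, fs)
        print(label, "window:", n, "samples,", len(frames), "frames,",
              "bin spacing:", round(fs / (4 * n), 2), "Hz")
```

The short window gives fine time resolution but smears individual harmonics together, whereas the long window resolves the harmonics at the cost of time resolution. Zero-padding interpolates the spectrum on a finer frequency grid without changing the underlying resolution.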
On Wednesday 5 February 2003, David Moreno's Master's thesis "Harmonic decomposition applied to automatic speech recognition" was awarded Matrícula de Honor (distinction) by the viva committee at the Universitat Politècnica de Catalunya (UPC) in Barcelona. The thesis was supervised by Dr Philip Jackson (CVSSP) and forms a key part of the Columbo project, which investigates novel feature-extraction techniques for speech recognition. The following conference papers have also been published:
PJB Jackson, DM Moreno, MJ Russell, J Hernando (2003). "Covariation and weighting of harmonically decomposed streams for ASR". In Proc. Eurospeech 2003, 2321-2324, Geneva.
DM Moreno, PJB Jackson, J Hernando, MJ Russell (2003). "Improved ASR in noise using harmonic decomposition". In Proc. Int. Cong. of Phon. Sci., ICPhS 2003, 751-754, Barcelona.
DM Moreno, PJB Jackson (2003). "A front end using periodic and aperiodic streams for ASR". In Proc. One-day Meeting for Young Speech Researchers, p. 18 A, London, UK.
Details of other related articles are available in Philip Jackson's personal publication listing.
If you require any further information, please contact me with your comments, questions or requests.
© 2002-10, maintained by Philip Jackson, last updated on 16 Dec 2010.