EEM.ssr Lab 3: HMM decoding and training
Assessment: 
Students should work individually and submit their results in the form of a short technical report, as a PDF file, via SurreyLearn to Dr Jackson before 4pm on Tuesday 13 May 2014
Required software: 
Matlab (v.7) is the recommended application for this laboratory 
Additional software: 
HTK (v.3.4.1) for feature extraction, Excel for spreadsheet calculations 
Aims of the Experiment
To gain experience of HMM decoding and training using features extracted from your recorded speech signals by HTK.
To deepen your understanding of the Viterbi and Expectation-Maximization algorithms through Matlab implementation and experimentation.
Recommended reading
 Young, S.J. et al., The HTK Book (for HTK v3.4), Cambridge University Engineering Department, 2006.
 Rabiner, L.R., A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE, Vol. 77, No. 2, pp. 257-286, Feb. 1989.
1. Preparation
In preparation for the lab, you should:
 Make sure you have the HTK MFCC files that you created in the previous practical.
 Understand how to calculate the probability density for a multivariate Gaussian distribution.
 Familiarise yourself with the Baum-Welch equations for re-estimating HMM parameters.
 Identify any Matlab functions you may need, such as: max, log and diary.
Here are some useful functions, examples and resources that you can find in the lab3 directory:
 MultiGauss.m, LogMultiGauss.m - Matlab functions for computing multivariate Gaussian probabilities and log probabilities
 LogAddProb.m - a Matlab function for computing the sum of two log probabilities, for use in the forward and backward procedures
 assignment_SURNAME.tex, assignment_refs.bib, assignment_SURNAME.pdf - a LaTeX document template that can be used for writing the assignment report
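Before the lab, it may help to check your understanding of these two calculations in code. Below is a minimal illustrative sketch in Python (the lab itself uses Matlab and the supplied MultiGauss.m and LogAddProb.m routines); the function names here are hypothetical, not the supplied ones:

```python
import math

def log_gauss_diag(x, mu, var):
    """Log probability density of x under a multivariate Gaussian with
    mean mu and diagonal covariance whose diagonal entries are var."""
    d = len(x)
    log_det = sum(math.log(v) for v in var)  # log|Sigma| for a diagonal Sigma
    maha = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, var))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + maha)

def log_add(log_a, log_b):
    """log(exp(log_a) + exp(log_b)) computed without underflow,
    as needed when summing probabilities in the forward and
    backward procedures while working in the log domain."""
    if log_a < log_b:
        log_a, log_b = log_b, log_a
    return log_a + math.log1p(math.exp(log_b - log_a))
```

For a quick hand check, a 1-D standard Gaussian evaluated at its mean should give a log density of -0.5 log(2π) ≈ -0.919.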
2. Overview
An overview of the tasks for this assignment is as follows:
 Using the MFCC features previously extracted with HTK from your speech recordings and a prototype HMM from last time, run the Viterbi algorithm to get the best path and its cumulative likelihood.
 Calculate the likelihoods required for Baum-Welch training.
 Perform model re-estimation and re-evaluate the best path.
 Present the results in the form of a brief technical report.
The deliverable to be submitted is a short technical report that should contain the following results:
 Running the Viterbi algorithm:
 the observation likelihoods for each state in your model λ_{three} with your utterance "three"
 the best path X* when using the corresponding model λ_{three}
 its cumulative likelihood Δ* = p(O, X* | λ_{three})
 best path and cumulative likelihood using a different model λ_{oh} on the same observations
 Calculating likelihoods for Baum-Welch training:
 the forward likelihoods α_{t}(i) and observation likelihood p(O | λ_{three})
 the backward likelihoods β_{t}(i)
 the occupation likelihoods γ_{t}(i)
 the transition likelihoods ξ_{t}(i,j)
 Re-estimating the model and evaluating the new best path:
 re-estimated model parameters λ′_{three} = {A′, B′}, where B′ = {b′_{i}} for i=1..5 and b′_{i} = {μ′_{i}, Σ′_{i}}
 best path and cumulative likelihood using the updated model λ′_{three}
 Presenting the results in a technical report:
 Title page - assignment title "HMM decoding and training report", your name, module code (EEEM034), my name (Dr Philip Jackson), date
 Abstract - an executive summary of the contents of the report
 Glossary - an optional list of symbols and their meanings
 Introduction - a brief description of the assignment
 Method - a statement of the equations used to calculate the results, and any important implementation details
 Results - a complete set of the results for each point, typically in tables
 Conclusion - a summary of what you did, a statement of the main findings, and a short discussion of any points of interest
 References - a list of any relevant texts that are cited, in standard IEEE format
 Appendix - you can include any intermediate results or evidence of your working here, along with program listings for any code that you have authored
The deadline for submission of your report as a PDF file via SurreyLearn to Dr Jackson is 4pm on Tuesday 13 May 2014.
Figure 1: HMM state topology for the prototype, including entry and exit null states.
3. Experimental Work
3.1 Running the Viterbi algorithm

Extract the observation sequence O from your utterance "three" by taking the whole of the middle section that corresponds to the spoken word.
Given your initial values of
μ_{i} and Σ_{i} in your initial model for this word λ_{three},
calculate the output probability
b_{i}(o_{t})
on these observations O={o_{t}} at every time t=1..T.
Note that the values will be identical for emitting states 1, 2, 3, 4 and 5, since you have used the same mean and diagonal covariance in each state.

By implementing the Viterbi algorithm in Matlab, find the best path X* for these observations
using the initial parameters λ_{three} = {A, B} where
A = {a_{ij}} for i,j=1..5, and
B = {b_{i}} for i=1..5 and b_{i} = {μ_{i}, Σ_{i}}.
 Report the cumulative likelihood Δ* = p(O, X* | λ_{three}) along that path.
 Repeat the above three steps using the initial model representing the word "oh", and report the best path and cumulative likelihood with that model λ_{oh}.
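As a sanity check for your Matlab implementation, the Viterbi recursion in the log domain can be sketched as follows. This is an illustrative Python version, not the required Matlab deliverable; it assumes the log observation likelihoods log b_i(o_t) have already been computed into a T-by-N array, and that entry, exit and transition probabilities are passed as logs (with log 0 represented by -inf):

```python
NEG_INF = float("-inf")

def viterbi(log_b, log_A, log_pi, log_eta):
    """Best state path X* and its cumulative log likelihood.
    log_b[t][i] : log b_i(o_t) for t=0..T-1, i=0..N-1
    log_A[i][j] : log transition probability a_ij
    log_pi[i]   : log entry probability
    log_eta[i]  : log exit probability."""
    T, N = len(log_b), len(log_pi)
    delta = [[NEG_INF] * N for _ in range(T)]   # cumulative best log likelihoods
    psi = [[0] * N for _ in range(T)]           # backpointers
    for i in range(N):
        delta[0][i] = log_pi[i] + log_b[0][i]
    for t in range(1, T):
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t-1][i] + log_A[i][j])
            delta[t][j] = delta[t-1][best_i] + log_A[best_i][j] + log_b[t][j]
            psi[t][j] = best_i
    # terminate through the exit probabilities
    last = max(range(N), key=lambda i: delta[T-1][i] + log_eta[i])
    log_like = delta[T-1][last] + log_eta[last]
    path = [last]
    for t in range(T - 1, 0, -1):               # trace back the best path
        path.append(psi[t][path[-1]])
    path.reverse()
    return path, log_like
```

Note that when several paths tie (as they will with identical emission probabilities in every state), the path returned depends on how the implementation breaks ties.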
3.2 Calculating likelihoods for Baum-Welch training
 Using your initial parameters λ_{three}, compute the forward likelihoods α_{t}(i) throughout the trellis, i=1..5 and t=1..T, and record the observation likelihood p(O | λ_{three}).
 Initialising β_{T}(i) for i=1..5 from the exit probabilities η_{i}, calculate the backward likelihoods β_{t}(i) throughout the trellis, t=1..T.
 Using your values of α_{t} and β_{t} from the previous steps, compute the occupation likelihoods γ_{t}(i) for each node in the trellis, t=1..T.
 Also using α_{t} and β_{t}, calculate the transition likelihoods ξ_{t}(i,j) for each connection within the trellis between states i and j, t=2..T.
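The four quantities above fit together as in the following sketch (illustrative Python in linear probabilities for readability; for real 39-dimensional features a log-domain version is advisable, and the lab deliverable is Matlab):

```python
def forward_backward(b, A, pi, eta):
    """Forward/backward pass for one observation sequence.
    b[t][i] : b_i(o_t); A[i][j] : a_ij; pi, eta : entry/exit probs.
    Returns alpha, beta, the occupation likelihoods gamma and the
    transition likelihoods xi, plus p(O | lambda)."""
    T, N = len(b), len(pi)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * b[0][i]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][i] * A[i][j] for i in range(N)) * b[t][j]
    p_O = sum(alpha[T-1][i] * eta[i] for i in range(N))   # p(O | lambda)
    for i in range(N):
        beta[T-1][i] = eta[i]                             # initialise from exit probs
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * b[t+1][j] * beta[t+1][j] for j in range(N))
    # occupation likelihoods gamma_t(i) and transition likelihoods xi_t(i,j)
    gamma = [[alpha[t][i] * beta[t][i] / p_O for i in range(N)] for t in range(T)]
    xi = [[[alpha[t-1][i] * A[i][j] * b[t][j] * beta[t][j] / p_O
            for j in range(N)] for i in range(N)] for t in range(1, T)]
    return alpha, beta, gamma, xi, p_O
```

A useful self-check: the observation likelihood computed from the backward pass, the sum over i of π_{i} b_{i}(o_{1}) β_{1}(i), should equal the value p_O obtained from the forward pass, and γ_{t}(i) should sum to one over the states at every t.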
3.3 Re-estimating the model and evaluating the new best path
 Use the values of γ_{t} and ξ_{t} from the previous section to re-estimate the parameters of the model, λ′_{three} = {A′, B′}, according to the Baum-Welch formulae, where B′ = {b′_{i}} and b′_{i} = {μ′_{i}, Σ′_{i}} for i=1..5.
 Finally, recompute the best path X* and cumulative likelihood p(O, X* | λ′_{three}) using the updated model λ′_{three}.
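The Baum-Welch update formulae for a single observation sequence can be sketched as follows (an illustrative Python version under the assumption of diagonal covariances; the exit probability of the final emitting state would also need re-estimating so that its row sums to one, which is omitted here for brevity):

```python
def reestimate(O, gamma, xi):
    """Baum-Welch re-estimation from one observation sequence.
    O[t]        : observation vector o_t (length D)
    gamma[t][i] : occupation likelihoods, t=0..T-1
    xi[k][i][j] : transition likelihoods for the step from time k to k+1."""
    T, N, D = len(O), len(gamma[0]), len(O[0])
    # a'_ij = expected count of i->j transitions / expected occupancy of i
    A_new = [[sum(xi[k][i][j] for k in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    mu_new, var_new = [], []
    for i in range(N):
        occ = sum(gamma[t][i] for t in range(T))
        # occupation-weighted mean and (diagonal) variance per dimension
        mu_i = [sum(gamma[t][i] * O[t][d] for t in range(T)) / occ
                for d in range(D)]
        var_i = [sum(gamma[t][i] * (O[t][d] - mu_i[d]) ** 2 for t in range(T)) / occ
                 for d in range(D)]
        mu_new.append(mu_i)
        var_new.append(var_i)
    return A_new, mu_new, var_new
```

After the update, each state's mean moves towards the observations that the state occupies most strongly, which is why the re-estimated model should assign the utterance a higher likelihood than the initial one.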
3.4 Assignment submission

Collate the results identified as deliverables in section 2 within one file as a short technical report, using tables and appendices as appropriate.

Produce a PDF version of the document, e.g., called lab3_YourName.pdf.

Submit the file as assignment 3 in SurreyLearn.
4. Further work
It is beyond the scope of this assignment to train all the models and to perform recognition tests of every word with every model.
However, having written the programs to do this already, you may be interested to see whether your digit recognizer works.
Good luck!
FAQ. Questions and answers
Here are some questions from students together with my answers that you may find helpful in completing the assignment.
Q.1
I'm not certain how the HMM relates to the MFCCs, and how these relate to the trellis diagrams you used in the lectures.
A.1
The MFCCs represent the observations (as a sampled continuous 39D vector sequence). The HMM is a model that acts as the template. In the trellis diagram the observation sequence goes along the x axis, and the states of the HMM up the y axis.
Q.2
So the MFCCs define the sequence of output emission probabilities for each state, b_{i}(o_{t}), which is a scalar calculated from the 39D multivariate Gaussian equation.
Then for each time step on the x axis, there are 5 corresponding emission probabilities (one for each state). But in the initial model these are all the same. How does that affect the path?
Also, I've used π=[1 0 0 0 0], η=[0 0 0 0 1]^{T} and A = [0.4 0.6 0 0 0; 0 0.4 0.6 0 0; 0 0 0.4 0.6 0 ... etc.].
Were there any constraints on these because they obviously affect the model?
Aren't the observations duplicated for each state?
A.2
Yes, you're right.
The observations can be used with any state.
The only constraints are at the start and the end where the path must pass through the first and the last states respectively. Your implementation of the Viterbi algorithm will determine the actual path that's output as the best one
(because there are paths that have identical probability).
Note that the self-loop and onward transition probabilities are the other way round in the prototype, 0.6 and 0.4 respectively; the exit probability has a value of η=0.4 for the final emitting state.
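Spelled out, the prototype's transition structure with the values stated above (0.6 self-loop, 0.4 onward, 0.4 exit from the final state) can be written as a small check, here sketched in Python:

```python
N = 5                                      # five emitting states
pi = [1.0] + [0.0] * (N - 1)               # the path must enter in state 1
A = [[0.0] * N for _ in range(N)]
for i in range(N):
    A[i][i] = 0.6                          # self-loop probability
    if i + 1 < N:
        A[i][i + 1] = 0.4                  # onward transition probability
eta = [0.0] * (N - 1) + [0.4]              # exit only from the final state
# each row of A plus the corresponding exit probability must sum to one
for i in range(N):
    assert abs(sum(A[i]) + eta[i] - 1.0) < 1e-12
```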
Q.3
With regards to the SSR coursework, when
we're attempting to calculate the output probability of the
observations using MultiGauss.m, I'm a little confused as to what the
vector of observation points is. I have tried using a 39x1 vector, a
39x39 matrix of zeros, ones etc. but I keep getting the error in the
end of:
Matrix is close to singular or badly scaled
and the output probabilities are all zero-valued. Could you perhaps
clarify what this vector is and how large it is expected to be?
A.3
At each time step, the observation vector is a 39x1 feature vector taken from the MFCC file and corresponding to the current time frame.
The mean is also a 39x1 vector and is taken from the prototype model that you produced. The covariance matrix should be a diagonal matrix with the elements of variance from the model along the diagonal and zeros elsewhere. With these parameters, the MultiGauss function ought to give nonzero values most of the time. However, when working with linear probability values, underflow is a problem that occurs frequently (even in Matlab!), so you may want to consider converting your program to the log prob version. Nevertheless, it is useful to have both as it allows you to verify the first few frames of your implementation against hand calculations in both cases.
Q.4
Are the probabilities for the forward and backward procedures over this size trellis still going to come out to be identical?
Mine are different!
A.4
Yes, they should do (to within the precision of the application).
Are you sure you've not missed out a term in the calculation of one of them?