The University of Surrey

EE1.LabB: ME1 Speech Capturing and Editing

[ Section: 1, 2, 3 | People: Dr Every, Dr Zielinski, Dr Jackson ]

Teams: 2-3 members per group
Lab is split into two halves (3.1-3.2 and 3.3-3.4) that can be completed independently
Software: Audacity (v1.2.4 or later) is the recommended application for this laboratory

Aims of the Experiment

To explore the effect of microphone placement on the quality of recorded speech. To evaluate the effects of sample rate and bit resolution on the quality of captured speech, and the effect of adding dither prior to quantization. To familiarise oneself with the audio capturing equipment and editing software.

Required reading

J. Borwick, "Sound Recording Practice", 4th ed., Oxford University Press, Oxford, 1996.

1. Overview

1.1 Acoustical considerations

Although it may appear to be a trivial task, obtaining high quality recordings of speech requires many factors to be considered, such as the acoustical environment, the particular characteristics and placement of the microphone, and the conversion of the analogue signal into a digital format. It is recommended that the reverberation time of a recording studio should not exceed 0.4 seconds. There are several criteria for selecting a microphone type. Directional microphones are often used, since the most sensitive region of the microphone's directivity pattern can be pointed at the desired source, making the microphone less sensitive to room reflections arriving from other angles of incidence. However, it must be borne in mind that directional microphones (pressure gradient microphones in particular) exhibit a proximity effect: their frequency response is generally not flat, and it depends on the distance between the speaker and the microphone. Occasionally, this property is deliberately exploited as a form of spectral modification. For example, placing a microphone close to the speaker boosts the low frequencies in speech and hence makes the sound "warmer".

Figure 1: Directivity patterns of two typical microphone types

1.2 Analogue-to-digital (A/D) conversion

Two main parameters characterise the analogue-to-digital conversion of an audio signal: the sample rate (the number of samples per second, in Hz) and the number of bits used to represent each sample (the bit resolution). Sampling the analogue signal at discrete time intervals is, in principle, a lossless process (i.e., it allows for a perfect representation and reconstruction of the signal), provided that the highest frequency component within the signal is less than the Nyquist frequency (half the sample rate). In contrast to sampling, quantization is a lossy process: once the continuous value of the analogue signal at some instant is quantized as a sequence of bits, some detail is lost. This lost detail is referred to as the quantization error (fig. 2). Its maximum amplitude depends on the bit resolution: the larger the number of bits, the more accurately the signal can be encoded, and hence the smaller the quantization error. The quantization error sounds like white noise superimposed on the signal or, if the ratio of the signal amplitude to the quantizing level (the minimum difference between two quantized values) is small, like an interference that is correlated with the original signal. For the sinusoid in fig. 2, the quantization error is also periodic, i.e. harmonics of the sinusoid are added during the quantization process. To avoid this effect, which is especially annoying for low-level, low-frequency tones, a small amount of random noise, or dither, can be added to the signal prior to quantization. Although this adds a small continuous noise to the signal, perceptually this is outweighed by the randomization of the quantization error.
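The Nyquist condition can be illustrated with a short Python sketch. It is not part of the original lab script: the function names are hypothetical, and the folding rule used to show what happens to a tone above the Nyquist frequency (aliasing) is standard sampling theory that the text above does not spell out.

```python
def nyquist_frequency(sample_rate_hz):
    """Half the sample rate: the limit below which sampling is lossless in principle."""
    return sample_rate_hz / 2


def apparent_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling.

    Tones below the Nyquist frequency are unchanged; tones above it are
    folded back into the range 0..Nyquist (assumed ideal sampling with
    no anti-aliasing filter).
    """
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)


# Illustrative tones sampled at the 8 kHz telephony rate (Nyquist = 4 kHz).
for tone in (3000, 5000, 7000):
    print(tone, "Hz sampled at 8000 Hz appears at",
          apparent_frequency(tone, 8000), "Hz")
```

In this sketch the 3000 Hz tone is preserved, whereas the 5000 Hz and 7000 Hz tones would be indistinguishable from lower-frequency tones after sampling, which is why the signal is band-limited before conversion.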

Figure 2: A/D Conversion at 3-bit resolution
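The quantization and dither described in section 1.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by Audacity or by any particular converter: the signal is assumed to lie in the range -1 to 1, the quantizer is a uniform rounding quantizer, the dither is assumed to have a triangular distribution spanning one quantizing level either side of zero, and all function names are hypothetical.

```python
import math
import random


def quantize(x, bits):
    """Round x (nominally in [-1, 1)) to the nearest of 2**bits uniform levels."""
    step = 2.0 / (2 ** bits)                  # quantizing level
    code = math.floor(x / step + 0.5)         # nearest integer code
    lowest, highest = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    code = max(lowest, min(highest, code))    # clip to the available codes
    return code * step


def triangular_dither(bits, rng):
    """Random value in (-step, step) with a triangular distribution (assumed dither type)."""
    step = 2.0 / (2 ** bits)
    return (rng.random() - rng.random()) * step


def quantize_with_dither(x, bits, rng):
    """Add dither to the sample before quantizing it."""
    return quantize(x + triangular_dither(bits, rng), bits)


if __name__ == "__main__":
    bits = 3                                  # as in fig. 2
    step = 2.0 / (2 ** bits)
    rng = random.Random(0)                    # fixed seed for repeatability
    x = 0.3 * step                            # a value smaller than one quantizing level

    # Without dither the small value is always rounded to zero, so it is lost.
    print("undithered:", quantize(x, bits))

    # With dither each output is noisy, but the average follows the input.
    trials = 20000
    average = sum(quantize_with_dither(x, bits, rng) for _ in range(trials)) / trials
    print("input:", x, "average of dithered outputs:", average)
```

Without dither, the error is a fixed function of the input and is therefore correlated with the signal. With dither, each individual sample is noisier, but the error no longer follows the signal, which corresponds to the trade-off described in the text.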

In order to obtain high quality digitised speech it is advisable to use the highest possible sample rate and bit resolution (standard CD-quality audio is sampled at 44.1 kHz with 16-bit resolution, whereas a high-end analogue-to-digital converter (ADC) may offer a 96 kHz sample rate and 24-bit resolution). However, these are often restricted by the limited bandwidth of a telecommunication channel or by the limited space available on a storage medium. Typically, in digital telephony, speech is first band-limited to between 200 and 3400 Hz and then sampled at 8 kHz, although wideband speech codecs with 16 kHz sample rates exist. For a given application, a trade-off between the overall bit rate (sample rate × bit resolution) and speech quality must be sought.
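As a worked example of the bit rate formula above, the following Python sketch computes the bit rate and the uncompressed storage needed for a few of the settings used in this lab. The function names and the `channels` parameter are assumptions added for illustration; the formula in the text corresponds to a single (mono) channel, and file header overhead is ignored.

```python
def bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Overall bit rate in bits per second: sample rate x bit resolution (x channels)."""
    return sample_rate_hz * bits_per_sample * channels


def storage_bytes(duration_s, sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed size in bytes of the audio data alone (no file header)."""
    return bit_rate(sample_rate_hz, bits_per_sample, channels) * duration_s // 8


# Mono recordings at the sample rates listed in Table 2, all at 16 bits.
for rate in (44100, 32000, 16000, 8000):
    print(rate, "Hz:", bit_rate(rate, 16), "bit/s,",
          storage_bytes(60, rate, 16), "bytes per minute")
```

For instance, one minute of mono speech at 44.1 kHz and 16 bits needs 705600 bit/s, or 5292000 bytes, whereas the same minute at 8 kHz and 16 bits needs 128000 bit/s, or 960000 bytes.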

2. Preparation

3. Experimental Work

NB: Unless directed otherwise, make sure that recordings are saved as mono '.wav' files and that no dither is added! Check the settings in the "Preferences" before starting.

3.1 Choosing the microphone position

Table 1: Effects of the microphone position - Informal listening test report.
| Position of the microphone | Near | Intermediate | Far |
| Distance between the microphone and the speaker | | 50 cm | |
| Clarity of speech (e.g. high, low, medium). This attribute is related to how easy it is to understand. | | | |
| Is speech timbre "coloured" by the room acoustics? (Yes/No) | | | |
| Can you hear a proximity effect? (Yes/No) | | | |
| Are any distortions audible? (e.g. plosive sounds, background/electrical noise). If so, what kind? | | | |
| Which distance out of these three would you recommend? | | | |
| Other comments | |

3.2 Editing

3.3 Effects of the sample rate on audio quality

Table 2: Effects of the sample rate on audio quality - Informal listening test report.
| Attribute | 16 bits, 44.1 kHz (original) | 16 bits, 32 kHz | 16 bits, 16 kHz | 16 bits, 8 kHz |
| Brightness (e.g. very bright, bright, dull, very dull). This is a perceptual term used to describe the ratio of energy in high frequencies to low frequencies | | | | |
| Clarity of speech (e.g. high, low, medium). This attribute is related to how easy it is to understand | | | | |
| Hiss (e.g. imperceptible, perceptible but not annoying, very annoying). Hiss is a degradation similar to the sound "s" or white noise | | | | |
| Overall Sound Quality. Use the following grading scale: 5 - Excellent, 4 - Good, 3 - Fair, 2 - Poor, 1 - Bad | | | | |
| Other comments | |

3.4 Effects of bit resolution and dither on audio quality

Further reading

If you would like to find out more on PCM and dithering, you should have a look at the following academic article:

Lipshitz, S.P., and Vanderkooy, J., (2004). "Pulse code modulation - an overview", Journal of the Audio Engineering Society, 52(3): 200-215.


2006-07, written by Mark Every, Slawek Zielinski and Philip Jackson, who maintains it, last updated on 1 Feb 2007.