Tutorial: Voice Digitization (1)

There are three steps in voice digitization: quantization, sampling and coding.

Quantization: Change from continuous in value to discrete in value

Sampling: Change from continuous in time to discrete in time

Coding: Code the value of each sample into 1s and 0s

We look at the voice digitization process in detail to understand the steps involved and to derive the data rate (the number of bits per second) required to communicate a digitized voice signal. Once the voice (one half of a phone call) is digitized, it can be segmented and carried in IP packets along with everything else in the modern broadband converged IP telecommunications network.

Voice Digitization: Quantization, Sampling and Coding

A continuous signal is one that can take any value within its range: pick any two values, and there is always another value between them. An example is the voltage on the copper wires that represents your voice when you make a phone call. A discrete signal is one that can take only specific values and nothing in between, like the number of people in a room.

Quantization is the process of changing from a signal that is continuous in value to a signal that is discrete in value. This is accomplished by dividing the possible range of values into a number of bins or levels or steps, and assigning a number to each of these levels. Later, when asked what the value of the signal is, we say that the signal is "in level #4" rather than quoting its voltage accurate to some number of decimal places.
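A minimal Python sketch of uniform quantization, assuming a signal range of -1 V to +1 V divided into 8 equal levels numbered from 0. The function name quantize and these figures are illustrative assumptions, not values any telephone system uses.

```python
def quantize(voltage: float, v_min: float, v_max: float, num_levels: int) -> int:
    """Return the number of the level (0 to num_levels - 1) the voltage falls in.

    Assumes equal-width levels spanning v_min to v_max; voltages outside
    that range are clipped to the lowest or highest level.
    """
    step = (v_max - v_min) / num_levels        # width of each level
    level = int((voltage - v_min) // step)     # count whole steps above v_min
    return max(0, min(num_levels - 1, level))  # clip out-of-range voltages


# -1 V to +1 V in 8 levels: each level is 0.25 V wide.
print(quantize(0.30, -1.0, 1.0, 8))  # 5, the level covering 0.25 V to 0.50 V
```

Reporting "level 5" instead of "0.30 V" is the kind of answer described above: the level number replaces the exact voltage.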

Another example of quantization is sugar cubes. Instead of putting some fractional value of a bag of sugar in your coffee, your choice is "one lump or two". The sugar has been quantized into uniform lumps.

The quantization levels have to be determined in advance. Many hardware chips implement 16-bit quantization, meaning 65,536 (2 to the power 16) levels. Software then consolidates these into a smaller number of levels during the coding step, described below.
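The arithmetic behind "16-bit quantization, meaning 65,536 levels" is 2 raised to the number of bits. The sketch below also shows one simple way fine levels could be merged into coarser ones, by discarding low-order bits. This uniform merge is an illustrative assumption (levels numbered 0 to 65,535), not the non-uniform (companded) mapping a real codec such as G.711 applies; the function names are hypothetical.

```python
def levels_for_bits(bits: int) -> int:
    """Number of quantization levels an n-bit converter can distinguish."""
    return 2 ** bits


def consolidate(level: int, from_bits: int, to_bits: int) -> int:
    """Merge fine levels into coarser ones by discarding low-order bits.

    Simplified uniform merge for illustration only: every block of
    2 ** (from_bits - to_bits) fine levels becomes one coarse level.
    """
    return level >> (from_bits - to_bits)


print(levels_for_bits(16))        # 65536
print(levels_for_bits(8))         # 256
print(consolidate(40000, 16, 8))  # 156, since 40000 // 256 == 156
```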

The second step is sampling. Sampling is the process of changing a signal that is continuous in time into one that is discrete in time. At regular intervals, we take the value of the signal and record it. The value recorded is the quantization level number.
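A minimal sketch of sampling, assuming a 1 kHz test tone with a 0.9 V peak stands in for voice and the signal is read 8,000 times per second (the rate commonly used for telephone voice). Seven levels over -1 V to +1 V are used only to keep the output short. The helper names quantize, sample and tone are hypothetical; quantize is repeated here so the block runs on its own.

```python
import math


def quantize(voltage: float, v_min: float, v_max: float, num_levels: int) -> int:
    """Level number (0 to num_levels - 1) for a voltage, with equal-width levels."""
    step = (v_max - v_min) / num_levels
    return max(0, min(num_levels - 1, int((voltage - v_min) // step)))


def sample(signal, sample_rate: float, count: int, num_levels: int) -> list[int]:
    """Read the signal at regular intervals and record each level number."""
    interval = 1.0 / sample_rate  # seconds between samples
    return [quantize(signal(n * interval), -1.0, 1.0, num_levels) for n in range(count)]


def tone(t: float) -> float:
    """Stand-in for voice (an assumption): a 1 kHz sine wave with a 0.9 V peak."""
    return 0.9 * math.sin(2 * math.pi * 1000 * t)


print(sample(tone, 8000, 8, 7))  # [3, 5, 6, 5, 3, 1, 0, 1]
```

The continuous tone is reduced to a short list of level numbers, one per sampling instant; between those instants nothing is recorded.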

How often do we need to sample the signal? Harry Nyquist, an engineer at Bell Labs, showed that a signal must be sampled at a rate more than twice its frequency bandwidth for it to be reproduced. This is called the Nyquist Rule.
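A short sketch of the Nyquist Rule, assuming a telephone voice channel of roughly 4 kHz bandwidth. The second part shows why the rule matters: sampled 8,000 times per second, a 5 kHz tone (beyond the limit) produces the same samples as a 3 kHz tone, so the far end could not tell them apart. The function name nyquist_threshold is hypothetical.

```python
import math


def nyquist_threshold(bandwidth_hz: float) -> float:
    """Sampling rate the Nyquist Rule says we must exceed: twice the bandwidth."""
    return 2 * bandwidth_hz


print(nyquist_threshold(4000))  # 8000: sample more often than 8,000 times per second

# Aliasing: a tone above half the sampling rate masquerades as a lower one.
fs = 8000
too_high = [math.cos(2 * math.pi * 5000 * n / fs) for n in range(8)]
lower = [math.cos(2 * math.pi * 3000 * n / fs) for n in range(8)]
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(too_high, lower)))  # True
```

In practice the voice is filtered to below 4 kHz (roughly 300 to 3,400 Hz) before sampling, so 8,000 samples per second stays above twice the highest frequency actually present.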

The third step is coding. The value of the signal taken at each sample (the level number) must be coded into 1s and 0s so that it can be efficiently transmitted or stored in a computer. We are interested in using standard coding methods like G.711 for landlines or the AMR codec used for cellular, so that any device or software app can decode the value at the far end. Skype uses a proprietary coding method, meaning that only the Skype app can be used to decode the values at the far end; WhatsApp, for example, is not compatible.
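A minimal sketch of coding, assuming each level number is written as plain fixed-width binary. Real codecs differ: G.711, for instance, produces 8-bit companded codes rather than plain binary level numbers. The sketch also multiplies out the resulting data rate; with 8,000 samples per second and 8 bits per sample it gives 64,000 bits per second, the rate G.711 runs at. The function names encode and bit_rate are hypothetical.

```python
def encode(level: int, bits: int) -> str:
    """Code a level number as a fixed-width string of 1s and 0s."""
    if not 0 <= level < 2 ** bits:
        raise ValueError(f"level {level} does not fit in {bits} bits")
    return format(level, f"0{bits}b")


def bit_rate(samples_per_second: int, bits_per_sample: int) -> int:
    """Bits per second produced when every sample is coded with the same width."""
    return samples_per_second * bits_per_sample


print(encode(156, 8))     # 10011100
print(bit_rate(8000, 8))  # 64000
```

Fixing the width in advance is what lets the far end split the incoming stream of bits back into one code per sample.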

The objective of this process is to take the codes representing the value of each sample and transmit them to the far end. At the far end, we perform the reverse process and re-create the analog waveform: decode each received code to recover the level number, generate a voltage equal to the center of that level, and change the voltage smoothly from one value to the next as each new code comes down the line.
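A minimal sketch of the far-end steps, assuming 3-bit codes and 8 equal levels over -1 V to +1 V. It produces only the staircase of level-center voltages; the smoothing between values, done in practice by a reconstruction (low-pass) filter, is left out. The function names decode and level_center are hypothetical.

```python
def decode(code: str) -> int:
    """Recover the level number from a received string of 1s and 0s."""
    return int(code, 2)


def level_center(level: int, v_min: float, v_max: float, num_levels: int) -> float:
    """Voltage at the center of a level, with equal-width levels from v_min to v_max."""
    step = (v_max - v_min) / num_levels
    return v_min + (level + 0.5) * step


# Received 3-bit codes, so 8 levels spanning -1 V to +1 V (illustrative assumptions).
received = ["101", "110", "111", "110"]
print([level_center(decode(c), -1.0, 1.0, 8) for c in received])  # [0.375, 0.625, 0.875, 0.625]
```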

The reason for doing this is to move the voice signal from the near end to the far end without adding in any noise, and to be able to move the voice in IP packets interspersed with video, data and Internet traffic.

There is in fact a small amount of noise added up front, as part of the analog-to-digital conversion. This is the quantization error: the difference in value between the center of the level and where the signal actually was.
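A minimal sketch of the quantization error for one sample, assuming 8 equal levels over -1 V to +1 V and taking the error as the level center minus the actual voltage, as described above. The function name quantization_error is hypothetical.

```python
def quantization_error(voltage: float, v_min: float, v_max: float, num_levels: int) -> float:
    """Center of the level the voltage falls in, minus the actual voltage."""
    step = (v_max - v_min) / num_levels
    level = max(0, min(num_levels - 1, int((voltage - v_min) // step)))
    center = v_min + (level + 0.5) * step
    return center - voltage


# 0.30 V in 8 levels over -1 V to +1 V lands in the level centered on 0.375 V.
print(round(quantization_error(0.30, -1.0, 1.0, 8), 3))  # 0.075
```

For any voltage inside the range, the error is at most half a level width (0.125 V in this example).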

How do we make the quantization error smaller on average? Make the levels finer. How many levels does the telephone company use? Enough so that a human can't hear the quantization error noise on the line.
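A sketch that measures the average size of the quantization error for coarser and finer levels, assuming test voltages spread evenly across -1 V to +1 V (real speech is not spread evenly). Each doubling of the number of levels halves the level width, and with it the average error, which comes out at about a quarter of the level width. The function name average_error and the choice of 20,000 test points are assumptions for illustration.

```python
def average_error(num_levels: int, v_min: float = -1.0, v_max: float = 1.0,
                  n_points: int = 20_000) -> float:
    """Mean absolute quantization error over evenly spread test voltages."""
    step = (v_max - v_min) / num_levels
    total = 0.0
    for i in range(n_points):
        v = v_min + (i + 0.5) * (v_max - v_min) / n_points  # evenly spread test voltage
        level = min(num_levels - 1, int((v - v_min) // step))
        center = v_min + (level + 0.5) * step
        total += abs(center - v)
    return total / n_points


# Finer levels, smaller average error.
for levels in (8, 16, 256):
    print(levels, average_error(levels))
```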

NEXT TUTORIAL: Voice Digitization 2