Tutorial: Voice Digitization (2)

There are three steps in voice digitization: quantization, sampling and coding. The previous tutorial, Voice Digitization (1), established the fundamental ideas. In this tutorial, we derive the standard way of digitizing voice, using the G.711 codec to produce a 64 kb/s stream that is carried in DS0 channels or in IP packets.

Here is a summary:

Quantization: 256 levels
Sampling: 8,000 samples/second
Coding: 8 bits/sample
"Pulse Code Modulation" (PCM): 8,000 bytes per second = 64,000 bits/second = 64 kb/s
G.711 codec: 64 kb/s
Carried in channels: DS0 rate
Carried in packets: VoIP

The telephone system quantizes the voice signal to 256 levels. This number is chosen so that the quantization error, which is heard as noise once the signal is reconstructed, is small enough that a person can't hear it on the line. The diagram shows bin numbers 127 and 128 around zero volts.

Voice Digitization: 64 kb/s standard
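The quantization step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the -1 V to +1 V input range, the evenly spaced bins and the function names are made up for the example and are not taken from the tutorial or from the G.711 standard, which spaces its levels non-uniformly. With these assumptions, voltages just below zero land in bin 127 and voltages just above zero land in bin 128, and the reconstruction error never exceeds half a bin.

```python
# Sketch of 256-level quantization. Uniform bins over -1 V..+1 V are an
# assumption made for illustration; the tutorial does not specify them.
LEVELS = 256
V_MIN, V_MAX = -1.0, 1.0
STEP = (V_MAX - V_MIN) / LEVELS   # width of one bin, in volts


def quantize(volts):
    """Return the bin number (0..255) that a voltage falls into."""
    index = int((volts - V_MIN) // STEP)
    return max(0, min(LEVELS - 1, index))   # clamp out-of-range input


def reconstruct(index):
    """Return the voltage at the centre of a bin."""
    return V_MIN + (index + 0.5) * STEP


if __name__ == "__main__":
    for v in (-0.001, 0.001, 0.5):
        b = quantize(v)
        print(v, "-> bin", b, "-> error", abs(reconstruct(b) - v))
```

The difference between the input voltage and the reconstructed voltage is the quantization error discussed above; more levels would make the bins narrower and the error smaller.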

The second step is sampling. Since this is a voiceband signal, the frequency bandwidth is about 3,000 Hz, and so, following Dr. Nyquist's sampling theorem, the sampling rate must be more than twice that: more than 6,000 samples per second. To leave a margin against aliasing errors, the telephone system samples more often: 8,000 samples per second.
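To make the sampling rule concrete, here is a small sketch in Python. The function names are hypothetical, and the aliasing calculation assumes a single pure tone rather than a real voice signal. It shows the minimum rate implied by a 3,000 Hz bandwidth, and what happens to a tone that lies above half the sampling rate: it shows up at a different, lower frequency after sampling.

```python
def nyquist_rate(bandwidth_hz):
    """Sampling must be faster than this rate to capture the bandwidth."""
    return 2 * bandwidth_hz


def alias_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after it has been sampled.

    Tones below half the sampling rate come back unchanged; tones above
    it fold back to a lower frequency, which is the aliasing error.
    """
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)


if __name__ == "__main__":
    print("Minimum rate for 3000 Hz:", nyquist_rate(3000))
    for tone in (1000, 3000, 5000):
        print(tone, "Hz sampled at 8000/s appears as",
              alias_frequency(tone, 8000), "Hz")
```

At 8,000 samples per second, anything up to 4,000 Hz is captured without folding, which comfortably covers the voiceband.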

The third step is coding. The telephone system uses 8 bits to code the value of each sample. Some call this technique of using 8 bits per sample Pulse Code Modulation (PCM), a name that doesn't really mean anything.
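The link between 256 levels and 8 bits per sample can be checked with a short sketch. The helper names are hypothetical, and writing the bin number in plain binary is an assumption for illustration; the actual bit patterns a codec assigns to each level may differ.

```python
LEVELS = 256


def bits_needed(levels):
    """Smallest number of bits that can number `levels` distinct values."""
    return (levels - 1).bit_length()


def code_sample(bin_number, bits=8):
    """Write a bin number as a fixed-width string of bits (plain binary)."""
    if not 0 <= bin_number < 2 ** bits:
        raise ValueError("bin number does not fit in the given number of bits")
    return format(bin_number, "0{}b".format(bits))


if __name__ == "__main__":
    print("Bits needed for", LEVELS, "levels:", bits_needed(LEVELS))
    for b in (0, 127, 128, 255):
        print("bin", b, "->", code_sample(b))
```

Eight bits give exactly 2 to the power 8, or 256, distinct codes, one for each quantization level, so each sample fits in one byte.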

To determine the number of bits per second required, multiply the number of samples per second (8,000) by the number of bits per sample (8) to get 64,000 bits per second, or 64 kb/s for short.
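The arithmetic above, written out as a tiny Python sketch. The constant and function names are chosen for this example only; the numbers come straight from the text.

```python
SAMPLES_PER_SECOND = 8000   # sampling rate from the text
BITS_PER_SAMPLE = 8         # coding from the text


def bit_rate(samples_per_second, bits_per_sample):
    """Bits per second needed to carry one digitized voice signal."""
    return samples_per_second * bits_per_sample


if __name__ == "__main__":
    rate = bit_rate(SAMPLES_PER_SECOND, BITS_PER_SAMPLE)
    print(rate, "b/s =", rate // 1000, "kb/s")
```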

This is standardized as the G.711 coding standard.

When carried in a channel, i.e. a reserved stream of 64 kb/s on a transmission system, this 64 kb/s rate is called a DS0-rate signal (Digital Signal level zero; such channels are called "DS0s" in the business). This is the base rate of most channelized transmission systems. When someone talks about a channel on a digital transmission system, they usually mean a DS0.

Transmission systems deployed up until about the year 2000 were designed to carry digitized voice in channels, and thus move multiple DS0s. Since they are digital systems, they were easily adapted to carry data or video as well as digitized voice.

New installations move digitized voice in IP packets (Voice over IP), interspersed with Internet traffic, commercial data and television, which are also carried in IP packets. However, even though the digitized voice is put in packets, and even though more efficient coding techniques are available, most carriers still use the G.711 codec in new installations to digitize voice to 64 kb/s, and carry it in packets instead of channels.
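As a rough illustration of carrying the same 64 kb/s stream in packets, the sketch below chops one second of digitized voice into fixed-duration chunks. The 20 ms packet duration is an assumption (a commonly used value, not something stated in this tutorial), the function names are hypothetical, and packet headers are ignored: only the voice bytes are counted.

```python
SAMPLE_RATE = 8000   # samples per second, one byte each (from the text)
PACKET_MS = 20       # assumed packet duration; not specified in the tutorial


def bytes_per_packet(packet_ms=PACKET_MS):
    """Number of voice bytes that accumulate during one packet interval."""
    return SAMPLE_RATE * packet_ms // 1000


def packetize(voice_bytes, packet_ms=PACKET_MS):
    """Split a stream of one-byte samples into packet payloads."""
    size = bytes_per_packet(packet_ms)
    return [voice_bytes[i:i + size] for i in range(0, len(voice_bytes), size)]


if __name__ == "__main__":
    one_second = bytes(SAMPLE_RATE)          # 8,000 dummy samples
    payloads = packetize(one_second)
    print(len(payloads), "packets of", len(payloads[0]), "bytes each")
```

Whatever packet duration is chosen, the payloads still add up to 8,000 bytes for every second of voice; only the grouping changes compared with a channel.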

The bottom line: we move a byte (representing the value of the sample) 8,000 times per second from one end to the other, for each voice.