Purpose
Many factors are involved in calculating the
bandwidth required to carry voice through an IP network. This white paper aims to
explain these factors, and to offer a simple means of making such
calculations. It starts with a basic 'rule of thumb', and then
expands this to take specific voice coding algorithms into account.
There are many ways to reduce the bandwidth requirements, and these can
be particularly important in the wide area network. These include
silence suppression, RTP header compression and RTP multiplexing. These methods are not considered in this document.
Header overhead
In our white paper Voice
over IP Protocols for Voice Transmission, we concluded that the
standard method of transporting voice samples through an IP based network
required the addition of three headers, one for each layer. These
headers are IP, UDP and RTP. An IPv4 header is 20 octets; a UDP
header is 8 octets and an RTP header is 12 octets.
The total length of this header information is 40 octets (bytes), or
320 bits, and these headers are sent each time a packet containing voice
samples is transmitted. The additional bandwidth occupied by this
header information is determined by the number of packets which are sent
each second.
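The 40-octet figure can be checked by laying out the fixed part of each header. The following Python sketch is illustrative only. It assumes an IPv4 header without options and an RTP header without a CSRC list or extension, and the format strings and variable names are our own.

```python
import struct

# Fixed header layouts in network byte order ("!" means no padding is added).
# Assumption: IPv4 without options, RTP without CSRC list or header extension.
IPV4_HEADER = "!BBHHHBBH4s4s"  # version/IHL, TOS, total length, id, flags/fragment,
                               # TTL, protocol, checksum, source, destination
UDP_HEADER = "!HHHH"           # source port, destination port, length, checksum
RTP_HEADER = "!BBHII"          # V/P/X/CC, M/PT, sequence number, timestamp, SSRC

header_octets = {
    "IP": struct.calcsize(IPV4_HEADER),
    "UDP": struct.calcsize(UDP_HEADER),
    "RTP": struct.calcsize(RTP_HEADER),
}
total_octets = sum(header_octets.values())
total_bits = total_octets * 8

print(header_octets, total_octets, total_bits)
```

Any IP options or RTP contributing-source entries would increase these sizes, so the totals apply only to the minimal headers assumed here.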
Packet frequency
For the purposes of this document, we define packet
frequency as the number of packets containing voice samples which are sent
per second. The packet frequency is the inverse of the duration in
seconds represented by the voice samples in one packet. For example, if the voice
samples in one packet represent a duration of 50 milliseconds, then 20 of
these packets would be required each second. The packet frequency
would therefore be 20 packets per second.
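The relationship between payload duration and packet frequency can be sketched in a few lines of Python. The function name packet_frequency is our own, chosen for illustration.

```python
def packet_frequency(payload_ms):
    """Packets per second when each packet carries payload_ms of voice."""
    if payload_ms <= 0:
        raise ValueError("payload duration must be positive")
    return 1000 / payload_ms


for duration_ms in (20, 30, 40, 50):
    rate = packet_frequency(duration_ms)
    print(f"{duration_ms}ms payload: {rate:.2f} packets per second")
```

A 50ms payload gives the 20 packets per second of the example above. A 30ms payload does not divide evenly into one second and gives roughly 33.3 packets per second.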
The selection of this payload duration is a compromise between
bandwidth requirements and quality. Smaller payloads demand higher
bandwidth per channel, because the header length remains at forty
octets. However, if payloads are increased, the overall delay of the
system will increase, and the system will be more susceptible to the loss
of individual packets by the network.
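To make this compromise concrete, the sketch below tabulates the header overhead for several payload durations. It is an illustration under stated assumptions: the 8kbps coding rate is an arbitrary example, and the function names are our own.

```python
HEADER_BITS = 320  # IP + UDP + RTP headers, 40 octets per packet


def header_overhead_kbps(payload_ms):
    """Header bandwidth in kbps for a given payload duration."""
    packets_per_second = 1000 / payload_ms
    return HEADER_BITS * packets_per_second / 1000


def header_share(codec_kbps, payload_ms):
    """Fraction of the bits in each packet that are header rather than voice."""
    payload_bits = codec_kbps * payload_ms  # kbps multiplied by ms gives bits
    return HEADER_BITS / (HEADER_BITS + payload_bits)


CODEC_KBPS = 8  # assumed example coding rate

for payload_ms in (10, 20, 30, 40, 50):
    overhead = header_overhead_kbps(payload_ms)
    share = header_share(CODEC_KBPS, payload_ms)
    print(f"{payload_ms}ms payload: {overhead:.2f}kbps of headers, "
          f"{share:.0%} of each packet is header")
```

The payload duration is also a lower bound on the delay added at the sender, since a packet cannot be sent until all of its samples have been collected.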
We know of no recommendations concerning packet duration. In
RFC1889,
the Internet Engineering Task Force include an example where the duration
is 20ms, but they do not suggest this as a recommended value. MICOM
Communications have published a white paper called Voice/Fax
Over IP: Internet, Intranet and Extranet, in which they base
their bandwidth calculations on voice samples of 40ms.
No single payload duration is correct for every network, but for the remainder of
this document, we will assume that voice samples representing 20ms are
sent in each packet.
Simple bandwidth calculation
If one packet carries the voice samples representing 20
milliseconds, then 50 such packets must be transmitted in every
second. Each packet carries an IP/UDP/RTP header overhead of 320
bits. Therefore, in each second, 16,000 header bits are sent.
Therefore, as a general rule of thumb, it can be assumed that header
information will add 16kbps to the bandwidth requirement for voice over
IP. For example, if an 8kbps algorithm such as G.729 is used, the
total bandwidth required to transmit each voice channel would be 24kbps.
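The rule of thumb can be written as a short Python function. This is a minimal sketch. The name rule_of_thumb_kbps is our own, and the 20ms default reflects the assumption made in this document rather than any standard.

```python
HEADER_BITS = 320  # IP + UDP + RTP headers per packet
PAYLOAD_MS = 20    # payload duration assumed in this document


def rule_of_thumb_kbps(codec_kbps, payload_ms=PAYLOAD_MS):
    """Total IP bandwidth per voice channel: coding rate plus header overhead."""
    packets_per_second = 1000 / payload_ms
    header_kbps = HEADER_BITS * packets_per_second / 1000
    return codec_kbps + header_kbps


print(rule_of_thumb_kbps(8))   # an 8kbps algorithm such as G.729
print(rule_of_thumb_kbps(64))  # a 64kbps algorithm such as G.711
```

For the G.729 example this reproduces the 24kbps figure given above.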
Effects of coding algorithms
The designer of any network convergence solution that includes voice
will need to decide upon which coding algorithm to use. CODECs
perform the conversion from an analogue voice waveform to a digital stream
of information. They sample the analogue signal at regular
intervals (125 microseconds is a typical value), and convert the measured
analogue value into a numeric representation (known as quantising).
The resultant output comprises discrete blocks of information sent at
regular intervals.
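As a worked example of how the sampling interval relates to the coding rate, the sketch below takes the 125 microsecond interval mentioned above. The figure of 8 bits per sample is our assumption, corresponding to PCM as used by G.711. Other algorithms encode each sample or block of samples in fewer bits.

```python
SAMPLE_INTERVAL_US = 125  # sampling interval from the text, in microseconds
BITS_PER_SAMPLE = 8       # assumption: 8-bit quantisation, as in G.711 PCM

# Integer arithmetic avoids rounding error when inverting the interval.
samples_per_second = 1_000_000 // SAMPLE_INTERVAL_US
bit_rate_kbps = samples_per_second * BITS_PER_SAMPLE / 1000
samples_per_20ms = 20_000 // SAMPLE_INTERVAL_US

print(samples_per_second, bit_rate_kbps, samples_per_20ms)
```

Under these assumptions a 20ms packet carries 160 samples, which is 160 octets of voice payload.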
The method suggested in the previous section offers a simplistic view
of the bandwidth calculation process. It is valid for most coding
algorithms. However, it assumes that voice samples can be transmitted
within a 20ms datagram. For coding algorithms which use much smaller
sampling periods, multiple samples can be sent within each packet, and the
samples can be buffered for up to 20ms. However, some algorithms do
not produce samples which can be fitted exactly into 20ms datagrams, and
for those algorithms, the 16kbps rule of thumb becomes invalid.
The following table shows the relevant characteristics of the most
common coding algorithms.
| Coding algorithm     | Codec bandwidth | Sample duration | IP bandwidth |
| G.711 (PCM)          | 64kbps          | 0.125ms         | 80kbps       |
| G.723.1 (ACELP)      | 5.6kbps         | 30ms            | 16.27kbps    |
| G.723.1 (ACELP)      | 6.4kbps         | 30ms            | 17.07kbps    |
| G.726 (ADPCM)        | 32kbps          | 0.125ms         | 48kbps       |
| G.728 (LD-CELP)      | 16kbps          | 0.625ms         | 32kbps       |
| G.729(A) (CS-ACELP)  | 8kbps           | 10ms            | 24kbps       |
The algorithms listed which do not fit into the 16kbps rule of
thumb are the two G.723.1 systems (the 5.6kbps and 6.4kbps entries in the table).
As their sample duration is 30ms rather than 20ms, only about 33 packets are sent
each second. This reduces the header overhead to approximately 10.67kbps.
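The IP bandwidth figures in the table can be reproduced with the sketch below. It is illustrative only and rests on our own assumptions: the sender packs as many whole samples as fit into 20ms, and sends a single sample per packet when one sample is longer than 20ms. The function and constant names are hypothetical.

```python
HEADER_BITS = 320          # IP + UDP + RTP headers per packet
TARGET_PACKET_US = 20_000  # 20ms payload assumed in this document

# (name, coding rate in kbps, sample duration in microseconds), as listed in the table
CODECS = [
    ("G.711", 64, 125),
    ("G.723.1", 5.6, 30_000),
    ("G.723.1", 6.4, 30_000),
    ("G.726", 32, 125),
    ("G.728", 16, 625),
    ("G.729(A)", 8, 10_000),
]


def ip_bandwidth_kbps(codec_kbps, sample_us, target_us=TARGET_PACKET_US):
    """Coding rate plus header overhead for the resulting packet duration."""
    # Pack whole samples up to the target; always send at least one per packet.
    samples_per_packet = max(1, target_us // sample_us)
    packet_us = samples_per_packet * sample_us
    packets_per_second = 1_000_000 / packet_us
    header_kbps = HEADER_BITS * packets_per_second / 1000
    return codec_kbps + header_kbps


for name, codec_kbps, sample_us in CODECS:
    total = ip_bandwidth_kbps(codec_kbps, sample_us)
    print(f"{name:9} {codec_kbps:>5}kbps -> {total:.2f}kbps")
```

Every sample duration in the table either divides 20ms exactly or exceeds it. An algorithm with a different sample duration, such as 15ms, would produce packets shorter than 20ms under this packing rule and would carry a correspondingly higher header overhead.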
Detailed consideration of each coding method is beyond the scope of
this document, but it should be understood that the various coding methods
differ in their levels of complexity, delay characteristics and quality.
The CODECs which are expected to become prevalent within the Voice over IP
arena are G.729A and G.723.1.