A migration strategy
Moving straight to a full-blown IP telephony solution essentially
means loading up your existing voice equipment and moving it out
with a forklift, while probably at the same time upgrading your
cabling system to ensure there are enough cat-5 outlets available.
The sudden equipment and cabling obsolescence combined with the
need for retraining of staff quite possibly make this approach a
difficult sell to senior management (particularly since the immediate
benefits of an IP telephone handset over a traditional phone may
not be entirely obvious). There is an alternative approach that
can be taken: IP-enabled PBXs.
Rather than replace the cabling and handsets, just upgrade or replace
the PBX so that it speaks IP telephony to the outside world (and
makes each of the attached phones appear to the outside world like
an IP telephony endpoint). The solution would look like that shown
in the diagram below.

First of all, check with your existing PBX vendor to determine
whether a suitable upgrade is available. Both Lucent and Nortel
will shortly be providing such support for their Definity and Meridian
PBXs, respectively.
The other alternative is simply to replace your existing PBX with
a pure IP PBX or "iPBX." This approach tends to be better
suited to smaller office environments. Some models only support
IP telephony, while others support both IP telephony and direct
PSTN connections (with intelligent routing between the two). Some
models are PC-based (typically running on Windows NT), while others
are stand-alone units. Representative vendors of this class of product
include AltiServ, NetPhone, ShoreTelecom and Vertical Networks.
Encoding schemes
When you speak, you cause air molecules to vibrate; that's
how sound is transported. If you were to plot the displacement of
air molecules versus time you would be drawing the waveform of human
speech, which might look like the graph below.

The function of the microphone in the telephone mouthpiece is to
convert this waveform to an electrical signal. The electrical signal,
if plotted against the same time scale, would look the same. This
electrical signal is an analog signal: at any one time, the
signal may have any value between the top and bottom peak values
(as opposed to a digital signal, which is generated using only two
signal levels representing the binary digits zero and one). In order
to transport this analog voice signal over a digital network it
is necessary to convert the analog signal to a digital data stream
of ones and zeros. The process used to do this is called voice encoding
(and the device or software program used for encoding and decoding
is called a codec).
There are two basic approaches you can take to encoding. The first
is to sample the signal strength itself at a rate higher than the
frequency of the signal, as shown in the following diagram.

Sampling theory tells us that in order to reproduce the original
signal from a digital sample we must sample at a rate at least twice
the maximum frequency represented in the underlying signal.
Since the part of the human voice carried by a telephone circuit
lies in the range 300Hz to a bit under 4,000Hz, we can use a sampling
rate of 8,000 times per second. If for each sample we use 8 bits to
represent the signal strength then we'll need a bandwidth of 8 bits,
8,000 times per second, or 64kbps. Such an approach is called Pulse
Code Modulation (PCM) and is the most widely used method to transport
voice on today's digital public telephone networks. This sampling
approach can be refined by doing some additional processing. Adaptive
Differential Pulse Code Modulation (ADPCM), for example, predicts what
the next value will be based on previous values, then sends only the
difference between the actual and predicted values. Since the difference
is smaller than the signal, fewer bits are required for transport. PCM
and ADPCM are used in ITU standards G.711 and G.726, respectively.
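The arithmetic in this paragraph, and the idea behind differential coding, can be sketched in a few lines of Python. This is a simplified illustration, not how G.711 or G.726 are actually specified: the quantizer here is linear (G.711 uses companding), the predictor simply assumes the next sample equals the previous one (real ADPCM adapts as it goes), and the 400Hz test tone and all function names are assumptions made for the example.

```python
import math

SAMPLE_RATE_HZ = 8000   # samples per second, as in the text
BITS_PER_SAMPLE = 8     # bits used to represent each sample, as in the text


def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bits per second needed to carry one uncompressed PCM voice channel."""
    return sample_rate_hz * bits_per_sample


def quantize(value, bits):
    """Map a value in [-1.0, 1.0] to a signed integer (linear, no companding)."""
    max_level = 2 ** (bits - 1) - 1
    return round(value * max_level)


def differences(samples):
    """Toy predictor: guess that each sample equals the previous one
    and keep only the prediction error."""
    previous = 0
    errors = []
    for sample in samples:
        errors.append(sample - previous)
        previous = sample
    return errors


# 10ms of a 400Hz tone standing in for speech (an assumption for illustration).
tone = [
    quantize(math.sin(2 * math.pi * 400 * n / SAMPLE_RATE_HZ), BITS_PER_SAMPLE)
    for n in range(80)
]
errors = differences(tone)

bit_rate = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
largest_sample = max(abs(s) for s in tone)
largest_error = max(abs(e) for e in errors)

print("PCM bit rate:", bit_rate, "bps")
print("largest sample:", largest_sample, "largest prediction error:", largest_error)
```

Because the prediction errors span a narrower range than the samples themselves, they can be represented in fewer bits, which is the effect the paragraph describes.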
The second approach to encoding is to split the voice signal up
into substantially larger chunks, which represent whole, recognizable
sounds used in human speech. This is the approach used for Code
Excited Linear Prediction (CELP) and its variants; prevalent examples
include MP-MLQ/ACELP (ITU Standard G.723.1) and CS-ACELP (G.729).
So which encoding scheme should you go for? The standard by which
telephone voice quality is measured is so-called "toll quality,"
which in effect means the quality delivered by PCM (G.711), which
is predominantly used in today's phone networks and is what
you are used to when you use the public telephone network in developed
countries. The great thing about PCM is that the algorithm itself
is pretty straightforward, so not too much processing power is required;
that means high performance and relatively inexpensive encoding
equipment. The downside of PCM is that it uses up a whole 64kbps
for each voice circuit: not so bad if you're a carrier
with bandwidth to burn, but less optimal if you're a corporate
customer getting heavily charged for that bandwidth. CELP makes
a big difference in bandwidth requirements because it sends a compact
description of each chunk of speech rather than every sample; this
means that "near toll quality" can be achieved with 8kbps. In order
to do that CELP does a lot more processing than PCM, which originally
meant substantially higher costs and lower throughput. All that said,
8kbps is simply eight times better use of bandwidth than PCM and four
times as good as ADPCM (which realistically needs 32kbps for near toll
quality). For that reason, you'll generally want to look for G.723.1
or G.729 encoding, the two standard CELP implementations.
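To make the comparison concrete, here is a rough sketch that counts how many voice circuits fit on a link at the payload bit rates quoted above. The 512kbps link and the function names are hypothetical, chosen only for illustration, and packet overhead is deliberately left out of this calculation.

```python
# Payload bit rates quoted in the text, in kbps (packet overhead ignored).
CODEC_KBPS = {
    "PCM (G.711)": 64,
    "ADPCM (G.726)": 32,
    "CS-ACELP (G.729)": 8,
}

LINK_KBPS = 512  # hypothetical leased line, chosen only for illustration


def circuits_on_link(link_kbps, codec_kbps):
    """How many simultaneous voice circuits fit, counting codec payload only."""
    return link_kbps // codec_kbps


def gain_over(baseline_kbps, codec_kbps):
    """How many times more circuits a codec carries than a baseline codec."""
    return baseline_kbps / codec_kbps


table = {name: circuits_on_link(LINK_KBPS, rate) for name, rate in CODEC_KBPS.items()}

for name, count in table.items():
    print(f"{name:18s} {count:3d} circuits on a {LINK_KBPS}kbps link")

print("G.729 versus PCM:  ", gain_over(64, 8), "times the circuits")
print("G.729 versus ADPCM:", gain_over(32, 8), "times the circuits")
```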
If you're working on calculating how much bandwidth you'll
need for a particular number of voice circuits you'll also
need to consider packet overhead. Rule of thumb? Triple the bandwidth
requirement. I know that sounds pretty extreme, but here's
the rationale: Voice packets need to be kept relatively small to
minimize delay effects. If you assume the use of a G.729 codec,
two frames of 10ms each (10 bytes apiece) will go into a single packet
with a 20-byte payload. But the PPP/IP/UDP/RTP header is 49 bytes, not
a very efficient arrangement. There are header compression schemes
available (RFC 2508 for IP/UDP/RTP header compression and RFC 2393 for
compression of everything except the IP header), but these are not
widely implemented, and RFC 2508 won't operate across an IPsec VPN.
Unless you're implementing MPLS (discussed later) you're just going to
have to put up with this overhead for now.
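The rule of thumb can be checked with the figures given above: a 20-byte payload, a 49-byte header, and one packet every 20ms. The sketch below is only that arithmetic; the function name is an assumption, and a real link would add further framing and signalling overhead that is not modeled here.

```python
PAYLOAD_BYTES = 20        # two 10ms G.729 frames, from the text
HEADER_BYTES = 49         # PPP/IP/UDP/RTP header size quoted in the text
PACKET_INTERVAL_MS = 20   # one packet per 20ms of speech


def voice_stream_bps(payload_bytes, header_bytes, packet_interval_ms):
    """Bits per second for one direction of one voice circuit."""
    packets_per_second = 1000 / packet_interval_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second


codec_only_bps = voice_stream_bps(PAYLOAD_BYTES, 0, PACKET_INTERVAL_MS)
on_the_wire_bps = voice_stream_bps(PAYLOAD_BYTES, HEADER_BYTES, PACKET_INTERVAL_MS)
overhead_factor = on_the_wire_bps / codec_only_bps

print(f"codec payload only: {codec_only_bps / 1000:.1f} kbps")
print(f"with headers:       {on_the_wire_bps / 1000:.1f} kbps")
print(f"overhead factor:    {overhead_factor:.2f}x")
```

With these numbers the 8kbps codec stream becomes 27.6kbps on the wire, a factor of about 3.45, which is where "triple the bandwidth" comes from.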
Quality of Service
Encoding is not the only driver of voice quality. The human ear
is very sensitive to even minor changes in an audio signal (interestingly,
the eye is much less sensitive to imperfect video). What this means
is that if a signal is to be packetized, the packets must arrive
predictably with minimal delay (a specific Quality of Service or
QOS). These requirements do not apply to data networking, which
is pretty tolerant of variable network performance; even heavily
transactional data applications won't be affected by the odd
500ms delay. Unfortunately, IP was originally designed as a data
networking protocol so until recently IP networks offered little
in the way of built in Quality of Service.
There are several approaches that can be taken to assure Quality
of Service:
Data Link Layer QOS. ATM has well-known, built-in quality-of-service
capabilities and with the adoption last year of 802.1Q VLAN tags,
even Ethernet can provide eight levels of prioritization tagging
for each frame. The problem with data link approaches is that
they only really work if the whole data path is based on the same
data link layer. If your Ethernet LANs are interconnected by Frame
Relay, the 802.1Q tags will do little good since they won't
be passed.
Type of Service (TOS). Part of every IPv4 header is the
TOS byte, which includes three bits allocated to priority (therefore
eight levels, the same as 802.1Q) and another four bits used to
define the type of service. For this to translate into something
resembling a QOS mechanism, two things need to happen. Firstly,
your router network needs to have the functionality to recognize
TOS fields and provide different classes of service based on them
(either automatically or manually, using filters). Secondly, the
TOS bits need to be set, either by the IP end system (e.g.,
our VoIP relay) or by the access router detecting the traffic
type and setting the TOS bits.
Resource Reservation Protocol (RSVP). The way that
RSVP works is that at the start of a session (or voice conversation)
control packets are first sent through the network to reserve
resources for the connection. If appropriate resources are not
available (e.g., there's insufficient bandwidth available),
then the connection is rejected. This approach can provide very
strong QOS assurance, but it does not necessarily scale well.
It is therefore suitable for intra-corporation requirements (like
our toll bypass scenario), but it is strategically unsuitable
for a world in which a large proportion of inter-business phone
calls are placed via IP.
DiffServ. The Differentiated Services working group
has refined the use of the TOS byte so that per-hop behaviors
can be requested by a sender. This approach focuses on providing
classes of service that the network makes available and which
applications can choose to use, contrasted with RSVP in which
the application dictates its requirements. DiffServ is less complex
than RSVP and better suited to meeting long-term, Internet-scale
QOS requirements.
Multi-Protocol Label Switching (MPLS). MPLS is the subject of
another IETF working group, this one looking at how to improve network
layer performance through switching of packet labels. In an MPLS
network, the access router attaches a short (4-byte) label to each
IP datagram, and intermediate nodes forward on that label rather than
on the full IP header. Apart from speeding switching performance,
the label can be used to identify the class of service requirement
at the network ingress point so that intermediate nodes can prioritize
traffic appropriately. MPLS also provides for routes to be chosen
for a particular stream in response to the QOS required for that
stream.
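The TOS and DiffServ items in the list above both come down to how the bits of one header byte are interpreted. The sketch below packs and unpacks that byte both ways. The bit positions follow the traditional IPv4 layout (three precedence bits, then four type-of-service bits, then one unused bit) and the DiffServ layout (a six-bit code point in the top of the byte). The function names are assumptions, and the specific values used, precedence 5 and code point 46 (Expedited Forwarding), are common choices for voice taken from the specifications rather than from this article.

```python
def make_tos_byte(precedence, tos_bits=0):
    """Build a traditional TOS byte: 3 precedence bits, 4 TOS bits, 1 unused bit."""
    if not 0 <= precedence <= 7:
        raise ValueError("precedence must fit in 3 bits (0-7)")
    if not 0 <= tos_bits <= 15:
        raise ValueError("type-of-service field must fit in 4 bits (0-15)")
    return (precedence << 5) | (tos_bits << 1)


def split_tos_byte(value):
    """Return (precedence, tos_bits) from a traditional TOS byte."""
    return (value >> 5) & 0x7, (value >> 1) & 0xF


def dscp_to_tos_byte(dscp):
    """DiffServ reuses the same byte: the code point occupies the top 6 bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits (0-63)")
    return dscp << 2


MINIMIZE_DELAY = 0b1000  # the "low delay" type-of-service setting

voice_tos = make_tos_byte(precedence=5, tos_bits=MINIMIZE_DELAY)
voice_dscp = dscp_to_tos_byte(46)

print(f"traditional TOS byte: 0x{voice_tos:02X}", split_tos_byte(voice_tos))
print(f"DiffServ byte:        0x{voice_dscp:02X}")
print("precedence levels available:", 2 ** 3)
```

Note that both markings for voice carry the same top three bits, so a router that only understands precedence still gives the DiffServ-marked packet priority 5.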
From a design perspective there are two different issues you need
to consider when establishing how to provide the required QOS. First
of all, clearly you'll need to establish the capabilities of
your router infrastructure and what, if any, software upgrades would
be required to support appropriate QOS capabilities. Secondly, you'll
need to make sure that the IP telephony end systems can also interwork
with the router QOS mechanisms. The simplest way for this to work
is for the end systems to indicate the class of service they require
by setting the TOS byte (either using the traditional settings or
the new uses recommended by DiffServ) and having the routers automatically
detect this setting and allocate resources accordingly. Alternatively,
under RSVP, the application will need to be capable of making resource
requests.
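As a rough illustration of an end system setting the TOS byte itself, the sketch below marks a UDP socket using Python's standard socket module. The value 0xB8 (DiffServ code point 46, Expedited Forwarding) is an example choice rather than anything prescribed in this article, the function name is hypothetical, and behavior is platform dependent: some operating systems ignore or restrict this option, and routers along the path are free to ignore or rewrite the marking.

```python
import socket

VOICE_TOS_BYTE = 0xB8  # example marking: DSCP 46 shifted into the TOS byte


def open_marked_udp_socket(tos_byte):
    """Create a UDP socket whose outgoing packets carry the given TOS byte."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte)
    return sock


sock = open_marked_udp_socket(VOICE_TOS_BYTE)
# Read the option back to see what the operating system actually accepted.
applied = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
sock.close()

print(f"requested 0x{VOICE_TOS_BYTE:02X}, operating system reports 0x{applied:02X}")
```

Marking at the end system only helps if the routers are configured to honor it, which is why both halves of the design question above need answering together.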
Philip Carden is a managing analyst with no-8.capital, an e-business
and telecommunications investment firm. He has written numerous
feature articles on Internet and telecommunications subjects and
has contributed to two books on Internet Security. He can be reached
at pcarden@no-8.com.