In this section we assume that calls are
completed with SIP [13]; however, call completion can work
with any appropriate rendezvous protocol. The overview of
the ICE methodology is shown in Figure 3. Making a call
starts by sending a SIP INVITE message with an SDP body
describing the IP address(es) and port(s) on which the
application can receive audio and/or video packets. These
addresses and ports are known as candidates. They are
obtained from the AnyFirewall Engine and inserted into the
SDP of the SIP INVITE message, which is sent to the callee.
Figure 3: SIP call flow using ICE
Specifically, a candidate is an IP address
and port at which one peer can receive data from another
peer.
There are three types of candidates:
Local candidate: a local IP
address and port of the client.
Reflexive or STUN candidate: an
IP address and port on the client's NAT (assuming the client
is behind only a single NAT). It is determined by another
entity and then communicated back to the client.
Relay or TURN candidate: an
address on a relay server that has been allocated for
use by the client.
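The three candidate types above can be modeled as a small data structure. This is a minimal sketch, not the AnyFirewall Engine's API: the names `CandidateType` and `Candidate` are hypothetical, and the addresses are documentation-range examples.

```python
from dataclasses import dataclass
from enum import Enum


class CandidateType(Enum):
    LOCAL = "local"          # address on one of the client's own interfaces
    REFLEXIVE = "reflexive"  # client's address as seen from outside its NAT
    RELAY = "relay"          # address allocated for the client on a relay server


@dataclass(frozen=True)
class Candidate:
    """An IP address and port at which a peer can receive data."""
    ip: str
    port: int
    type: CandidateType


# Example values only; a real client discovers these at run time.
candidates = [
    Candidate("192.168.1.10", 5004, CandidateType.LOCAL),
    Candidate("203.0.113.7", 40112, CandidateType.REFLEXIVE),
    Candidate("198.51.100.20", 49152, CandidateType.RELAY),
]
```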
Traffic can always be sent successfully
using relay candidates, unless a firewall blocks all traffic
towards the client, in which case no legitimate firewall
traversal technique can ever work. The problem with using
relay candidates, however, is that they require server
resources, and relayed traffic introduces additional delay,
loss and jitter in the traffic stream.
We now describe how the ICE methodology
works for SIP-based calls, as related to the steps in Figure
4:
Caller gathers transport candidates
Figure 4a illustrates how the caller determines its
server reflexive and relay candidates for a connection.
The client sends an ALLOCATE request to the relay server,
which instructs the server to allocate an IP/port on the
server (the relay candidate). Upon successfully
allocating the relaying address for the client, the
server returns the caller's IP/port as seen by the
server (the client's server reflexive candidate). The
reflexive candidate contains the public address that the
client is using, which is usually the address of a NAT
that the client is behind.
Figure 4: Gathering of candidates by user agent
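As an illustration of step 1, the sketch below extracts both candidates from an allocation response. It assumes the STUN/TURN wire encoding of RFC 5389 and RFC 5766 (a 20-byte header, the magic cookie, and the XOR-MAPPED-ADDRESS and XOR-RELAYED-ADDRESS attributes), which may differ from the protocol revision used by the engine described here. It handles IPv4 only and performs no authentication or network I/O; `build_response` is a hypothetical helper that fabricates a response for demonstration.

```python
import socket
import struct

MAGIC_COOKIE = 0x2112A442
XOR_RELAYED_ADDRESS = 0x0016  # carries the relay candidate
XOR_MAPPED_ADDRESS = 0x0020   # carries the server reflexive candidate


def encode_xor_address(ip: str, port: int) -> bytes:
    """Encode an IPv4 address/port as an XOR-*-ADDRESS attribute value."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    return struct.pack("!BBHI", 0, 0x01, xport, xaddr)


def decode_xor_address(value: bytes) -> tuple[str, int]:
    """Reverse the XOR obfuscation to recover the address and port."""
    _, family, xport, xaddr = struct.unpack("!BBHI", value[:8])
    if family != 0x01:
        raise ValueError("only IPv4 is handled in this sketch")
    port = xport ^ (MAGIC_COOKIE >> 16)
    ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
    return ip, port


def parse_candidates(message: bytes) -> dict[str, tuple[str, int]]:
    """Walk the attributes after the header and collect both candidates."""
    _msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN message")
    found = {}
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS:
            found["reflexive"] = decode_xor_address(value)
        elif attr_type == XOR_RELAYED_ADDRESS:
            found["relay"] = decode_xor_address(value)
        # Attribute values are padded to a multiple of 4 bytes.
        offset += 4 + attr_len + (-attr_len % 4)
    return found


def build_response(relay, reflexive) -> bytes:
    """Hypothetical helper: fabricate an Allocate success response."""
    attrs = b""
    for attr_type, (ip, port) in ((XOR_RELAYED_ADDRESS, relay),
                                  (XOR_MAPPED_ADDRESS, reflexive)):
        value = encode_xor_address(ip, port)
        attrs += struct.pack("!HH", attr_type, len(value)) + value
    # 0x0103 = Allocate success response; transaction ID zeroed for the demo.
    header = struct.pack("!HHI", 0x0103, len(attrs), MAGIC_COOKIE) + bytes(12)
    return header + attrs


response = build_response(("198.51.100.20", 49152), ("203.0.113.7", 40112))
gathered = parse_candidates(response)
```

Here `gathered["relay"]` is the address allocated on the server and `gathered["reflexive"]` is the client's address as the server saw it.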
Caller sends a SIP INVITE
After gathering the candidates, the caller encodes them
in the call setup message (e.g. a SIP INVITE), as shown
in Figure 4c, and sends the message to the called party.
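To make the encoding step concrete, the following sketch formats candidates as SDP attribute lines. It assumes the `a=candidate` syntax standardized in RFC 5245, which may differ from the ICE draft in use when this text was written. The function names, the codec (`RTP/AVP 0`), and all addresses are illustrative assumptions, and the output is a fragment rather than a complete SDP body.

```python
def candidate_line(foundation, component, priority, ip, port, cand_type, base=None):
    """Format one candidate as an SDP attribute (RFC 5245 syntax assumed)."""
    line = (f"a=candidate:{foundation} {component} UDP {priority} "
            f"{ip} {port} typ {cand_type}")
    if base is not None:
        # Reflexive and relay candidates also carry their related address.
        line += f" raddr {base[0]} rport {base[1]}"
    return line


def sdp_fragment(default, candidate_lines):
    """Build the c=/m= lines for the default candidate plus all candidates."""
    ip, port = default
    lines = [f"c=IN IP4 {ip}", f"m=audio {port} RTP/AVP 0"] + list(candidate_lines)
    return "\r\n".join(lines) + "\r\n"


host = ("192.168.1.10", 5004)
lines = [
    candidate_line("1", 1, 2130706431, *host, "host"),
    candidate_line("2", 1, 1694498815, "203.0.113.7", 40112, "srflx", base=host),
    candidate_line("3", 1, 16777215, "198.51.100.20", 49152, "relay",
                   base=("203.0.113.7", 40112)),
]
# The relay address is used as the default here because it is the candidate
# most likely to work before any connectivity checks have run.
fragment = sdp_fragment(("198.51.100.20", 49152), lines)
```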
Callee gathers transport candidates
Upon receiving the SIP INVITE, the called party also
gathers its candidates in the same manner as the caller
in step 1.
Callee sends SIP 1xx response
In response to the SIP INVITE, the callee sends its ICE
candidates within the SDP of a provisional response,
such as a SIP 183 (Session Progress), to the caller. The
message should be sent reliably (i.e. with
retransmission), and should not be considered
successfully sent until either a 200 OK is received in
acknowledgement, or a connectivity check from the caller
is received by the callee, as is described in the next
section.
Both conduct ICE connectivity checks
Once the callee
has sent its ICE candidates, and once the caller
receives them, they each start the ICE connectivity
checks. At this point, both parties know about their
peer’s potential candidates. Each possible pair of local
and remote candidates is formed, creating a number of
candidate pairs. A connectivity check is done by sending
STUN messages from the local candidate to the remote
candidate of each pair, starting with the highest
priority (i.e. most preferred) candidate pair first.
Both parties exchange STUN messages in this way to
determine the best possible candidate pair that they can
use to communicate. Once a valid (i.e. successful)
message has been sent both ways on a single candidate
pair, the connectivity check can stop and media can be
sent/received using that candidate pair.
Figure 5 shows how the ICE connectivity checks are
carried out between the user agents by sending STUN
messages between the candidate pairs. For simplicity,
the figure shows only a subset of the candidate pairs
that may be checked.
Figure 5: Performing ICE connectivity check by user agents
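The pairing and ordering described in this step can be sketched as follows. The priority formulas and type preferences are those recommended in RFC 5245 and are an assumption about the scheme in use here. The `check` callback is a hypothetical stand-in for the STUN exchange, and the sketch omits pair pruning, pacing, and triggered checks.

```python
from itertools import product

# Type preferences recommended by RFC 5245 (assumed, not taken from this text).
TYPE_PREFERENCE = {"host": 126, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_pref=65535, component=1):
    """Priority of a single candidate; higher means more preferred."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component)


def pair_priority(controlling, controlled):
    """Combine the two candidate priorities into one pair priority."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


def ordered_pairs(local, remote):
    """Form every local/remote pair, highest priority first.

    Candidates are (type, ip, port) tuples; the local side is assumed
    to be the controlling agent.
    """
    pairs = list(product(local, remote))
    pairs.sort(
        key=lambda p: pair_priority(candidate_priority(p[0][0]),
                                    candidate_priority(p[1][0])),
        reverse=True,
    )
    return pairs


def first_working_pair(pairs, check):
    """Run checks in priority order and stop at the first pair that succeeds."""
    for local, remote in pairs:
        if check(local, remote):
            return local, remote
    return None


local = [("host", "192.168.1.10", 5004), ("srflx", "203.0.113.7", 40112),
         ("relay", "198.51.100.20", 49152)]
remote = [("host", "10.0.0.5", 6000), ("srflx", "192.0.2.44", 31000),
          ("relay", "198.51.100.21", 49200)]

# Simulated network: private (host) addresses are unreachable across NATs.
pairs = ordered_pairs(local, remote)
chosen = first_working_pair(pairs, lambda l, r: l[0] != "host" and r[0] != "host")
```

In this simulated run the host-to-host pair is tried first and fails, and the reflexive-to-reflexive pair is selected, so no relay resources are needed.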
Callee sends SIP 180 ringing
A 180 RINGING message is sent to the caller after the
connectivity checks are completed, signaling that the
callee's phone has begun ringing. Ringing the callee's
phone happens now, after signaling is complete, so that
there is no delay in receiving audio once the phone is
picked up.
Callee sends 200 OK to invite
If the user accepts the call, the callee sends a final
response (200 OK) to the caller.
Caller sends SIP re-invite
If the candidates chosen differ from the original
connection candidate put into the media and connection
lines of the SDP of the SIP message (i.e. the candidate
chosen differs from the one thought to have worked),
then a new SIP INVITE message is sent to the callee with
the agreed-upon candidates in the m/c lines of the
enclosed SDP message.
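The decision in this step reduces to comparing the selected candidate with the default one advertised in the original offer. The sketch below is illustrative only: `reinvite_media_lines` is a hypothetical name, and the media type and format string are assumed values.

```python
from typing import Optional


def reinvite_media_lines(default, selected, media="audio",
                         fmt="RTP/AVP 0") -> Optional[list[str]]:
    """Return updated c=/m= lines if a re-invite is needed, else None.

    `default` is the (ip, port) placed in the m/c lines of the original
    offer; `selected` is the (ip, port) chosen by the connectivity checks.
    """
    if selected == default:
        return None  # the original offer already names the chosen candidate
    ip, port = selected
    return [f"c=IN IP4 {ip}", f"m={media} {port} {fmt}"]


default = ("198.51.100.20", 49152)   # relay address offered initially
selected = ("203.0.113.7", 40112)    # reflexive address that passed the checks
updated = reinvite_media_lines(default, selected)
```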
Callee sends 200 OK to re-invite
If a re-invite is sent, then it must be
acknowledged, which is done with a SIP 200 OK message.
Caller sends ACK
A final acknowledgement is sent from the caller to the
callee indicating the completion of the call setup.
Voice/video media transport starts
Now that the call has been established, both the caller
and callee send media to/from their successful candidate
addresses, usually using RTP.