To establish Internet telephone calls, video conferences or any other session set up using SIP, it is essential to locate the SIP servers responsible for the destination address. How to locate SIP servers is specified in RFC 3263, and although this specification is rather straightforward, many SIP stacks in use today do not implement it correctly.
The most common divergence from RFC 3263 seems to be that NAPTR records are not supported, but there are also implementations that do not process multiple SRV records correctly, and some that even drop all but one of the SRV records they find. Even SIP stacks that fully support RFC 3263 may have problems with redundancy, as the timers used for retransmitting requests in case of timeouts are rather large for unreliable transport protocols like UDP. In such cases it could take a long time (up to 32 seconds) before a second transport and/or destination is tried.
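To see where the 32 seconds come from, here is a minimal sketch of the INVITE retransmission schedule over UDP. It assumes the RFC 3261 defaults (T1 of 500 ms, a retransmission timer that doubles after each send, and a transaction timeout of 64*T1). The function name is ours and is not taken from any SIP stack.

```python
T1 = 0.5  # seconds; RFC 3261 default round-trip time estimate


def invite_retransmit_times(t1=T1):
    """Return (send_times, timeout) for an unanswered INVITE over UDP."""
    timeout = 64 * t1      # Timer B: the transaction gives up here
    send_times = [0.0]     # the initial transmission
    interval = t1          # Timer A starts at T1
    t = interval
    while t < timeout:
        send_times.append(t)
        interval *= 2      # Timer A doubles after every retransmission
        t += interval
    return send_times, timeout


times, timeout = invite_retransmit_times()
print(times)    # [0.0, 0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
print(timeout)  # 32.0
```

With the default values, a stack sends the same INVITE seven times to a dead destination before it gives up and moves on to the next one.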
When Kirei was tasked with designing a distributed and geographically redundant SIP service for the Swedish emergency services (112), we started to look at what alternatives might be available to provide redundancy for all those broken SIP stacks. When something is on fire or someone is injured, it is usually not the time and place to start arguing about non-compliant SIP implementations – it’s better to get the work done and connect the call while providing as much redundancy as possible.
What about anycast?
Based on our experience from the world of DNS, we started to look at BGP anycast to provide a redundant SIP service. BGP anycast works by announcing the same network prefix (IP address) at multiple locations and letting the BGP routing system choose the best (i.e., topologically closest) path from the client to the server. As this works very well with DNS, we thought it might work for SIP as well.
One large difference between DNS and SIP, when looking at how the protocols are used, is that the length of sessions is very different. A DNS query/response usually takes less than 50 milliseconds, whereas the average emergency call to the Swedish emergency services is about 1-2 minutes and an extended call can last for more than 30 minutes. Given that BGP changes topology from time to time, and that a change in topology might re-route an already existing call from one SIP server to another, we might end up disconnecting established calls, as an established call cannot be transferred between servers. Something had to be done to resolve this if we wanted to use BGP anycast with SIP.
Even if we cannot use anycast for routing the entire SIP session, we should be able to use it to route the initial signaling. This would not help already established sessions, but it would create better redundancy for session setup from broken SIP stacks. It would also give us faster session setup for well-behaved SIP stacks, since they don’t have to try multiple destinations, just one anycasted destination. A set of unicasted fallback addresses is probably wise to have, but they would only be used in case of failure or problems with the anycasted servers.
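From the client's side, the idea can be sketched as an ordered candidate list: the anycast address first, the unicast fallbacks after it. The addresses below are documentation prefixes, and `probe` is a hypothetical stand-in for whatever the stack does to decide that a destination answered. Neither is taken from a real deployment.

```python
from typing import Callable, Iterable, Optional

# Hypothetical addresses from the documentation ranges.
ANYCAST = "192.0.2.1"
UNICAST_FALLBACKS = ["198.51.100.11", "203.0.113.11"]


def candidates() -> list:
    """Anycast first; unicast addresses only as a fallback."""
    return [ANYCAST, *UNICAST_FALLBACKS]


def first_reachable(destinations: Iterable[str],
                    probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first destination for which probe() succeeds, or None."""
    for dest in destinations:
        if probe(dest):
            return dest
    return None


# Normal case: the anycast address answers, so nothing else is tried.
print(first_reachable(candidates(), lambda d: True))
# Failure case: the anycast address is unreachable, so a fallback is used.
print(first_reachable(candidates(), lambda d: d != ANYCAST))
```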
To use SIP anycast for the initial signaling only, we need to move the session from anycast to unicast as soon as possible. This means that one must configure the destination SIP user agent so that the contact address used in the reply to the initial INVITE is a unicast address. If there are any proxies in the signaling path, one must also make sure that record routing never inserts the anycast address.
In the system we’re currently looking at, we have a session border controller (SBC) acting as a back-to-back user agent (B2BUA) between a SIP user agent on some external network and a media gateway behind the SBC. In this case we need to configure the SBC with dual addresses: one anycast address shared between all SBCs and one unique unicast address per SBC. We also need to configure the SBC so that the contact address returned to the SIP user agent on the external network is always the unicast address, even if that user agent contacted the SBC using its anycast address. Record routing isn’t used in this case, since the SBC looks like a SIP user agent and not a proxy.
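The contact rule can be sketched in a few lines. This is only an illustration of the rule, not the configuration syntax of any particular SBC. The `SbcAddresses` type, the `contact_header` function, the `sos` user part and all addresses are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SbcAddresses:
    anycast: str  # shared between all SBCs
    unicast: str  # unique to this SBC


def contact_header(sbc: SbcAddresses, received_on: str,
                   user: str = "sos", port: int = 5060) -> str:
    """Build the Contact header for the reply to an initial INVITE.

    The unicast address is used no matter which local address the
    request arrived on.
    """
    if received_on not in (sbc.anycast, sbc.unicast):
        raise ValueError(f"{received_on} is not a local address")
    return f"Contact: <sip:{user}@{sbc.unicast}:{port}>"


sbc = SbcAddresses(anycast="192.0.2.1", unicast="198.51.100.11")
print(contact_header(sbc, received_on=sbc.anycast))
print(contact_header(sbc, received_on=sbc.unicast))
```

Both calls print the same header, which is the point: the caller never learns the anycast address as a target for in-dialog requests.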
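A complete session would look something like this: the external user agent sends the initial INVITE to the anycast address, the SBC that receives it answers with its own unicast address as the contact address, and the ACK and the final BYE are then sent to that unicast address.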
This means that any subsequent signaling after session setup will be sent to the unicast address, and the session is therefore independent of changes in BGP topology where the anycast address, from the client's point of view, moves from one SIP server to another.
Time will tell if this will help provide redundancy for less capable and broken SIP stacks, but at this point we believe this is our best option.
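The difference can be illustrated with a toy simulation. It assumes two sites, a "network" that simply routes the anycast address to whichever site is currently closest, and servers that keep only local call state. All class names and addresses are made up for the example. 481 is the SIP status code for "Call/Transaction Does Not Exist".

```python
class SipServer:
    """A toy server that only remembers the calls it set up itself."""

    def __init__(self, unicast):
        self.unicast = unicast
        self.dialogs = set()

    def invite(self, call_id):
        self.dialogs.add(call_id)
        return 200, self.unicast  # status, contact address

    def in_dialog(self, call_id):
        # A server that never saw the INVITE does not know the call.
        return 200 if call_id in self.dialogs else 481


class Network:
    """Routes the anycast address to whichever server is 'closest'."""

    def __init__(self, anycast, servers):
        self.anycast = anycast
        self.by_unicast = {s.unicast: s for s in servers}
        self.closest = servers[0]

    def route(self, address):
        if address == self.anycast:
            return self.closest
        return self.by_unicast[address]


site_a = SipServer("198.51.100.11")
site_b = SipServer("203.0.113.11")
net = Network("192.0.2.1", [site_a, site_b])

# Session setup goes to the anycast address and lands on site A.
status, contact = net.route(net.anycast).invite("call-1")

# A BGP topology change now makes site B the closest one.
net.closest = site_b

bye_via_anycast = net.route(net.anycast).in_dialog("call-1")
bye_via_contact = net.route(contact).in_dialog("call-1")
print(bye_via_anycast, bye_via_contact)  # 481 200
```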