Today’s business lives and depends on the Internet. More and more companies rely on the Internet for voice and video. This is particularly true as we adopt Unified Communications as a Service (UCaaS). The public Internet, though, is a challenging environment to deliver business-quality real-time services. Aside from the general issues of packet loss and unpredictability, the Internet was optimized for short application or data sessions, not the long sessions seen in voice and video conferencing usage.
Public Internet Routing: Bad News for Voice
That the Internet is not optimized for real-time traffic isn’t all that new. Take a look at this whitepaper, originally published in 1999, at the beginning of the VoIP era, which discusses the issues of transmitting quality, real-time voice. It might be old, but the premise and conclusions are very much relevant to today.
When we communicate remotely, we naturally pause to allow the other party to talk. After waiting about 250-300 milliseconds, we start speaking again. If the round-trip latency of the IP packets exceeds the human latency window (speaker mouth to listeners ear and back again), the result is interruptions in the conversation and an awkward experience.
Now, the typical round-trip VoIP flow constitutes a minimum of six packets, 20 msecs each. That leaves just 130-180 msecs of cushion for the entire transmission network. This may sound sufficient, the reality is that, after delays due to distance and network processing, that window is scarcely sufficient for delivering quality voice on a well-run network, let alone one with the variance of the public Internet. For example, in this report, AT&T indicates that the range of latencies between city core nodes in their controlled network vary from 15 to 227 msecs, and that does not include peering typical in most open connections. A peering can double the latency, raising the average US regional latency to over 120 msecs. And those are averages, clearly there are a lot of transmission that will be outside the acceptable window.
The public Internet reorganizes to solve issues without regard for the impact on applications. There is no packet classification or priority (except for certain hosts driving large content delivery). Real-time traffic is interleaved with a range of other traffic, such as web browsing, streaming media, and other applications. There’s no way to identify and prioritize the real-time traffic; it is delayed with all of the other traffic.
If a link between major sites goes down, all flows are routed along alternative paths — traffic from a lightly utilized link may move to one that is heavily utilized, resulting in increased latency (jitter) and packet loss. If a link becomes congested, Internet routers use Random Early Discard (RED) queuing algorithms to reduce the flow rate of TCP sessions, dropping packets as necessary. Dropping packet may not impact web browsing, but it may very much impact a voice session, losing pieces of a conversation.
The Longer the Call, the Greater the Risk
To make matters worse, voice sessions tend to be fairly long, increasing the likelihood users will experience a problem during a call. Upon initiation, IP sessions are established through specific route paths. Even if there are changes in network conditions, the route does not change. While the route is initially selected as an optimized path using BGP or OSPF, over time the loading and traffic patterns may change, significantly degrading communications.
Within short sessions this degradation may not be noticeable. It’s one reason why typical Internet applications, such as when loading a Web page, perform well. The photos and the actual HTML page content are often served from a separate web server using separate sessions of only few milliseconds. As such, there’s little time for the application to be impacted by Internet performance issues.
Contrast that with your weekly conference call. The voice session is maintained for the full duration of the hour-long call. The continual stream of packets (one every 20 msecs for voice) provides extensive opportunity for the Internet to interfere with. And the duration virtually assures that there will be issues over the call. Other long sessions, like video streaming, use buffering to manage the Internet variability and deliver a great experience. However, introducing 5-10 seconds of buffered delay in either direction in a phone call makes the call unusable. It’s not an option.
Traditional Networks No Longer Work for Voice or UCaaS
Unfortunately, price/performance ratio of legacy networks have made it challenging to support the explosion of VoIP and UCaaS traffic. Initially, most VoIP and video deployments used MPLS, leased line or fiber for their site connectivity to ensure delivery of general data, along with real-time voice and video.
However, prices for dedicated TDM trunks (often used for branch data) are increasing by multiples of 2-10x. MPLS remains both expensive and difficult to provision and manage. The ability to get relatively low-cost Internet IP connections is changing the edge access, especially for branch and smaller locations. With 100 Mbps connectivity available for $100 price points from the cable TV companies or DSL carriers (albeit at slightly lower speeds) creates a strong incentive to move to the open Internet. This problem is compounded as more users go mobile and use their real-time applications outside the office. For these mobile workers, using the Internet for their voice and video is no longer an option; it is a requirement of doing business.
UCaaS has further challenged the use of MPLS and other “non-Internet” solutions to deliver quality voice. With UCaaS, the tether point for many real-time media flows is in a cloud data center not on the private network. Legacy network solutions often end up adding too much latency, such when establishing an Internet-based VPN to the corporate headquarters or datacenter and from there to the UCaaS provider or an MPLS path (or even open Internet path) to and from the UCaaS provider. Both add separate trips over the Internet and back to the user location.
The result is becoming obvious as more users and organizations move to using the public Internet for their real-time communications. Over an hour conversation duration, there is abundant opportunity for new outages and events to impact Internet reliability and the quality of the communications. In early 2018, for example, a four-hour outage was seen for many UCaaS providers due to a peering failure between Comcast and some of its peering partners. A four-hour outage without communications can be a major issue, especially at the wrong time.
SD-WAN: A Safer Way to Use the Internet
While this may seem daunting, there is a new hope on the horizon. SD-WAN is offering an alternative to traditional corporate networks. For many organizations, SD-WAN opens the use of low-cost, public Internet connections with the quality users demand. Can SD-WAN be used for quality voice and video over the Internet as well? Possibly. We’ll explore that issue and the key capabilities to optimize your traffic in our upcoming posts.