UCaaS: Why the Internet and Voice Is A Match Made in Hell

Today’s businesses live on and depend on the Internet. More and more companies rely on the Internet for voice and video, particularly as we adopt Unified Communications as a Service (UCaaS). The public Internet, though, is a challenging environment in which to deliver business-quality real-time services. Aside from the general issues of packet loss and unpredictability, the Internet was optimized for short application or data sessions, not the long sessions seen in voice and video conferencing.

Public Internet Routing: Bad News for Voice

That the Internet is not optimized for real-time traffic isn’t new. Take a look at this whitepaper, originally published in 1999 at the beginning of the VoIP era, which discusses the issues of transmitting quality real-time voice. It might be old, but its premise and conclusions remain very relevant today.

When we communicate remotely, we naturally pause to allow the other party to talk. After waiting about 250-300 milliseconds, we start speaking again. If the round-trip latency of the IP packets (speaker’s mouth to listener’s ear and back again) exceeds this human latency window, the result is interruptions in the conversation and an awkward experience.

Now, the typical round-trip VoIP flow constitutes a minimum of six packets of 20 msecs each, or 120 msecs in total. That leaves just 130-180 msecs of cushion for the entire transmission network. This may sound sufficient, but the reality is that, after delays due to distance and network processing, that window is scarcely sufficient for delivering quality voice on a well-run network, let alone one with the variance of the public Internet. For example, in this report, AT&T indicates that latencies between city core nodes in its controlled network range from 15 to 227 msecs, and that does not include the peering typical of most open connections.
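The budget arithmetic above can be written out as a short sketch. The constant and function names are illustrative. The comparison also assumes the quoted AT&T latencies can be set directly against the network budget, which the text does not state (it does not say whether they are one-way or round-trip figures).

```python
# Rough conversational latency budget, using the figures quoted in the text.
HUMAN_WINDOW_MS = (250, 300)   # pause before a speaker starts talking again
PACKETS_PER_ROUND_TRIP = 6     # minimum packets in a round-trip VoIP flow
PACKET_DURATION_MS = 20        # audio carried by each packet


def network_budget_ms(window_ms: int) -> int:
    """Time left for the network after packetization is subtracted."""
    return window_ms - PACKETS_PER_ROUND_TRIP * PACKET_DURATION_MS


def fits_budget(network_latency_ms: float, window_ms: int) -> bool:
    """True if the given network latency fits inside the remaining budget."""
    return network_latency_ms <= network_budget_ms(window_ms)


low, high = (network_budget_ms(w) for w in HUMAN_WINDOW_MS)
print(f"network budget: {low}-{high} msecs")

# Assumption: the AT&T range endpoints are compared directly to the budget.
for latency_ms in (15, 227):
    verdicts = [fits_budget(latency_ms, w) for w in HUMAN_WINDOW_MS]
    print(f"{latency_ms} msecs fits 250/300 msec windows: {verdicts}")
```

Under these assumptions the low end of the quoted range fits comfortably, while the high end exceeds even the more generous 180 msec budget.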
A peering can double the latency, raising the average US regional latency to over 120 msecs. And those are averages; clearly, a lot of transmissions will fall outside the acceptable window.

The public Internet reorganizes itself to solve issues without regard for the impact on applications. There is no packet classification or priority (except for certain hosts driving large content delivery). Real-time traffic is interleaved with a range of other traffic, such as web browsing, streaming media, and other applications. There’s no way to identify and prioritize the real-time traffic; it is delayed along with everything else. If a link between major sites goes down, all flows are routed along alternative paths. Traffic from a lightly utilized link may move to one that is heavily utilized, resulting in higher and more variable latency (jitter) and packet loss. If a link becomes congested, Internet routers use Random Early Discard (RED) queuing algorithms to reduce the flow rate of TCP sessions, dropping packets as necessary. Dropping packets may not noticeably impact web browsing, where TCP retransmits them, but it can very much impact a voice session, losing pieces of a conversation.

The Longer the Call, the Greater the Risk

To make matters worse, voice sessions tend to be fairly long, increasing the likelihood that users will experience a problem during a call. Upon initiation, IP sessions are established through specific route paths. Even if there are changes in network conditions, the route does not change. While the route is initially selected as an optimized path using BGP or OSPF, over time the loading and traffic patterns may change, significantly degrading communications.

Within short sessions this degradation may not be noticeable. It’s one reason why typical Internet applications, such as loading a Web page, perform well. The photos and the actual HTML page content are often served from separate web servers using separate sessions that last only a few milliseconds.
As such, there’s little time for the application to be impacted by Internet performance issues.

Contrast that with your weekly conference call. The voice session is maintained for the full duration of the hour-long call. The continual stream of packets (one every 20 msecs for voice) gives the Internet extensive opportunity to interfere. And the duration virtually assures that there will be issues at some point during the call.

Other long sessions, like video streaming, use buffering to manage Internet variability and deliver a great experience. However, introducing 5-10 seconds of buffered delay in either direction makes a phone call unusable. It’s not an option.

Traditional Networks No Longer Work for Voice or UCaaS

Unfortunately, the price/performance ratio of legacy networks has made it challenging to support the explosion of VoIP and UCaaS traffic. Initially, most VoIP and video deployments used MPLS, leased lines, or fiber for their site connectivity to ensure delivery of general data along with real-time voice and video. However, prices for dedicated TDM trunks (often used for branch data) are increasing by 2-10x. MPLS remains both expensive and difficult to provision and manage.

The ability to get relatively low-cost Internet IP connections is changing edge access, especially for branch and smaller locations. The availability of 100 Mbps connectivity at $100 price points from the cable TV companies, or from DSL carriers (albeit at slightly lower speeds), creates a strong incentive to move to the open Internet. This problem is compounded as more users go mobile and use their real-time applications outside the office. For these mobile workers, using the Internet for their voice and video is no longer an option; it is a requirement of doing business.

UCaaS has further challenged the use of MPLS and other “non-Internet” solutions to deliver quality voice.
With UCaaS, the tether point for many real-time media flows is in a cloud data center, not on the private network. Legacy network solutions often end up adding too much latency, such as when establishing an Internet-based VPN to the corporate headquarters or datacenter and from there to the UCaaS provider, or when taking an MPLS path (or even an open Internet path) to and from the UCaaS provider. Both add separate trips over the Internet and back to the user location.

The result is becoming obvious as more users and organizations move to the public Internet for their real-time communications. Over an hour-long conversation, there is abundant opportunity for new outages and events to impact Internet reliability and the quality of the communications. In early 2018, for example, many UCaaS providers saw a four-hour outage due to a peering failure between Comcast and some of its peering partners. Four hours without communications can be a major issue, especially at the wrong time.

SD-WAN: A Safer Way to Use the Internet

While this may seem daunting, there is new hope on the horizon. SD-WAN offers an alternative to traditional corporate networks. For many organizations, SD-WAN opens up the use of low-cost, public Internet connections with the quality users demand. Can SD-WAN be used for quality voice and video over the Internet as well? Possibly. We’ll explore that issue and the key capabilities to optimize your traffic in our upcoming posts.
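The effect of call duration described in this post can be put in rough numbers. The sketch below counts the voice packets in a session and estimates the chance of at least one network event during it. It assumes events arrive independently at a constant average rate (a Poisson model). Neither that model nor the rate of two events per hour comes from the article; both are hypothetical and chosen only to show the shape of the relationship.

```python
import math

PACKET_INTERVAL_S = 0.020  # one voice packet every 20 msecs, per the text


def packets_in_session(duration_s: float) -> int:
    """Number of voice packets sent in one direction during a session."""
    return round(duration_s / PACKET_INTERVAL_S)


def p_at_least_one_event(duration_s: float, events_per_hour: float) -> float:
    """P(at least one event) under an assumed Poisson arrival model."""
    return 1.0 - math.exp(-events_per_hour * duration_s / 3600.0)


ASSUMED_EVENTS_PER_HOUR = 2.0  # hypothetical rate, for illustration only

for label, duration_s in (("0.5 s web fetch", 0.5), ("1 hour call", 3600.0)):
    print(
        f"{label}: {packets_in_session(duration_s)} packet intervals, "
        f"P(event) = {p_at_least_one_event(duration_s, ASSUMED_EVENTS_PER_HOUR):.4f}"
    )
```

With these assumed inputs, an hour-long call carries 180,000 packets in each direction and is far more likely than not to overlap an event, while a sub-second session almost never does.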

Choosing an SD-WAN Architecture for Real-Time Communications

While there are many considerations when choosing an SD-WAN, real-time traffic presents its own set of challenges. Besides the general sensitivity to loss and latency, the widespread adoption of Unified Communications as a Service (UCaaS) makes well-performing cloud connections as important as (if not more important than) site-to-site connections. Let’s take a look at the considerations you need to think about when selecting an SD-WAN architecture for voice, video, and other forms of real-time traffic.

SD-WAN Architectural Options

To talk about real-time and SD-WAN, you first need to define the differences between SD-WANs. By my count, there are roughly 40 solutions on the market today, but from an architectural viewpoint they can be grouped into three major approaches:

Premises-based solutions, where most of the SD-WAN functionality runs in an edge appliance. These solutions rely on the Internet, MPLS, or some other third-party network for connecting locations.

Cloud-based solutions, where most of the SD-WAN functionality runs in the cloud. These solutions can use MPLS in hybrid configurations, but they also bring a private interconnect: their own network of Points of Presence (POPs) that manages the traffic flow across the middle-mile.

Carrier-managed SD-WAN, where regional carriers or MPLS providers with regional networks integrate third-party premises-based SD-WAN solutions with their traditional offerings, such as MPLS. In this case, MPLS is used for major sites, while the premises-based SD-WAN solution may be used for smaller branches and out-of-coverage locations. The carrier effectively acts as the integrator, pulling the pieces together and then managing them.

Real-Time Considerations for SD-WANs

Now let’s turn our attention to real-time.
There are four major topics to consider when assessing an SD-WAN architecture for real-time traffic:

Media Quality – The quality of the real-time traffic flows/sessions carried by the SD-WAN can be impacted by overall latency, average and maximum jitter, and the percentage of lost packets introduced by the SD-WAN. The way the SD-WAN architecture manages failover from one IP path to another also impacts overall media quality.

Troubleshooting and Tools – Invariably, real-time traffic flows/sessions will have problems. Voice will be garbled; calls will drop; video will be distorted. In the past, the network has been a black box, and there has been relatively little that could be done to understand how the internal operation of the service impacted real-time traffic. With SD-WAN, tools become available that provide deep visibility into network operation, making for smoother troubleshooting. In addition, how quickly the NOC of a cloud-based or carrier-managed service provider resolves problems is critical.

Cloud Access – Unified Communications as a Service (UCaaS) and other real-time cloud services are growing at a rapid rate. The way the SD-WAN manages access to these cloud services is critical.

Path Privacy – Many organizations are concerned about the path taken by their data and real-time traffic. If traffic passes through a country, the laws of that country may allow it to be monitored or captured and used in legal proceedings. Control of these paths is critical for many organizations, especially for real-time traffic.

Media Quality

From a real-time packet handling perspective, all of the SD-WAN architectures have similar capabilities. For real-time traffic, a mechanism to detect packet issues and move to alternative paths is critical. While this is a basic capability of SD-WAN, for real-time media flows in particular, measuring latency, packet loss, and jitter is required to understand real-time issues.
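As a rough illustration of the three measurements named above, the sketch below computes mean transit time, loss fraction, and a running jitter estimate from a list of packet records. The jitter update follows the smoothing rule used for RTP interarrival jitter in RFC 3550. The `Packet` record and function names are hypothetical, sequence-number wraparound is ignored, packets are assumed to be listed in arrival order, and the transit figure assumes sender and receiver clocks are synchronized. This is not a description of how any particular SD-WAN product measures these values.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int         # sender sequence number
    sent_ms: float   # sender timestamp
    recv_ms: float   # receiver arrival time


def interarrival_jitter(packets):
    """Running jitter estimate: J += (|D| - J) / 16, as in RFC 3550."""
    jitter = 0.0
    for prev, cur in zip(packets, packets[1:]):
        # D is the change in transit time between consecutive packets.
        d = (cur.recv_ms - prev.recv_ms) - (cur.sent_ms - prev.sent_ms)
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def loss_fraction(packets):
    """Fraction of the observed sequence range that never arrived."""
    seqs = {p.seq for p in packets}
    expected = max(seqs) - min(seqs) + 1
    return (expected - len(seqs)) / expected


def mean_transit_ms(packets):
    """Average one-way transit time (assumes synchronized clocks)."""
    return sum(p.recv_ms - p.sent_ms for p in packets) / len(packets)


# Hypothetical sample: packet 2 is lost and packet 4 arrives 5 msecs late.
sample = [
    Packet(0, 0, 50),
    Packet(1, 20, 70),
    Packet(3, 60, 110),
    Packet(4, 80, 135),
]
print(mean_transit_ms(sample), loss_fraction(sample), interarrival_jitter(sample))
```

Because jitter is computed from differences in transit time, it does not depend on clock synchronization, whereas the absolute transit figure does.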
If the SD-WAN is going to carry real-time traffic, understanding its real-time analysis capabilities is also essential. The ability to track the key real-time packet parameters, while not an architectural requirement per se, is important to review.

Premises-based solutions are exposed to major issues in the Internet core. As traffic must transit the actual Internet, major congestion, failures, or hacking could impact traffic. Both the cloud-based and carrier-managed solutions generally have their own backbones, avoiding Internet issues in the core. Note that enterprise topologies in which all sites are close together or all on a single access carrier may not have this exposure to the core Internet.

From an architectural perspective, premises-based solutions are best suited for smaller organizations that have geographically close sites and/or use a single access carrier, or for very large organizations that have a global presence and the scale of operations to manage their own network connections.

Cloud-based solutions cover the broadest range of the market. They can be used by virtually any organization for all or a percentage of their traffic. By using a private interconnect to move traffic across the middle-mile, cloud-based SD-WANs eliminate a number of core Internet issues. The benefit is more pronounced for locations in parts of the Internet where core traffic can have more impact, such as Asia.

Another capability in some cloud-based SD-WANs is a mechanism to mitigate packet loss. While general endpoint retransmission is not effective for real-time traffic because of the time constraints, the SD-WAN can do intermittent retransmits or use other packet loss mitigation techniques. This allows the SD-WAN to essentially recover a lost packet within the SD-WAN, and in time. Such packet loss mitigation capabilities should be part of the evaluation as well.
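To make the idea of recovering a lost packet inside the network more concrete, here is a minimal sketch of one simple loss mitigation technique, XOR parity forward error correction. The article does not say which techniques any particular SD-WAN uses, so this is an illustrative assumption rather than a description of a product. The function names are hypothetical, and payloads are assumed to be of equal length.

```python
from typing import Optional


def xor_parity(payloads: list[bytes]) -> bytes:
    """XOR equal-length payloads together into one parity payload."""
    parity = bytearray(len(payloads[0]))
    for payload in payloads:
        for i, byte in enumerate(payload):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(
    received: dict[int, bytes], parity: bytes, group_size: int
) -> Optional[tuple[int, bytes]]:
    """Rebuild the one missing payload of a group, if exactly one is missing."""
    missing = [i for i in range(group_size) if i not in received]
    if len(missing) != 1:
        return None  # nothing to repair, or too many losses for one parity
    # XOR of the parity with every surviving payload yields the lost one.
    return missing[0], xor_parity([parity, *received.values()])


# Hypothetical group of four voice payloads protected by one parity packet.
group = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
parity_packet = xor_parity(group)

arrived = {0: group[0], 1: group[1], 3: group[3]}  # payload 2 was lost
print(recover_missing(arrived, parity_packet, len(group)))
```

The trade-off in this scheme is extra bandwidth (one parity packet per group) in exchange for repairing a single loss per group without waiting for a retransmission.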
The primary benefit of carrier-managed SD-WAN, which integrates the SD-WAN with an access carrier’s network, is the elimination of a separate SD-WAN connection into the MPLS-connected datacenter or location. By using the existing MPLS connections into major sites, real-time traffic can be aggregated from the SD-WAN onto these paths. While this brings no specific benefit from a real-time perspective, it may have advantages if the real-time call processing is located in a datacenter with the MPLS path.

A final consideration for media quality is policy management. Network traffic can be managed based on packet type, which is a normal capability in SD-WAN. However, in the event of network issues or congestion, mechanisms to allocate the reduced available bandwidth for optimal business value are critical. While most SD-WANs have policies to manage this based on traffic types, other policy considerations may be important. One area is identity. If the SD-WAN understands identity, policy can allocate resources to specific individuals during an event to assure they are able to continue their work. Understanding the policy options for mitigating the impact of failures or reduced capacity is another consideration.

Troubleshooting and Tools

One key factor to consider is the availability of tools to manage the SD-WAN and other solution components when an issue emerges. In this respect, the premises-based and cloud-based SD-WANs are very different.

The premises-based solutions augment the existing network and real-time monitoring and troubleshooting tools. The SD-WAN becomes just another component to monitor, though its operation may make that a challenge. Unless the management software specifically understands the SD-WAN system, the SD-WAN will look like a black box and the SD-WAN path analysis will have to be separate from the standard tools.

By contrast, both the cloud-based and carrier-managed SD-WANs will have extensive monitoring and operational controls.
Cloud-based SD-WANs include the monitoring and operation of the components in the cloud, as well as any premises elements. Carrier-managed SD-WAN will allow you to monitor your premises elements and monitor the service, but often any configuration changes require opening trouble tickets with the carrier.

The choice for troubleshooting and tools is clear. If you are part of a large organization that has the means and staff to monitor and manage the SD-WAN yourself, then the premises solution works. For most other organizations, acquiring a managed solution as a cloud SD-WAN is the better option. However, mechanisms to evaluate the cloud delivery SLA may be required. As in the premises case, it may be challenging for specific management tools to isolate and analyze SD-WAN performance for SLA adherence.

Cloud Access

Access to the cloud compute resources (Amazon, Microsoft, Google, IBM, Rackspace, etc.) used for both private and public cloud as well as SaaS services is becoming an important part of any network design. The same is true for SD-WAN. As more organizations move to the cloud for their communications services (RingCentral, 8x8, Microsoft, Cisco, Vonage, etc.), there is a need for quality access to the datacenter locations for these services.

Direct connections into cloud datacenters are not generally possible with a premises-based SD-WAN. Organizations moving to cloud communications should look at either cloud-based SD-WANs, or alternatives that combine cloud access with SD-WAN from a core site to branches. One consideration is the impact on quality for remote users that are routed through a corporate site to the cloud. For organizations looking for Communication as a Service (CaaS), a cloud-based SD-WAN would appear to be the best option.

Path Privacy

Path privacy is a new aspect of IP-based communications. It is an offshoot of cloud data storage.
In data storage, data stored in a specific geographic area is subject to the laws (including subpoena and disclosure laws) of that geography. Similarly, real-time traffic may be captured or recorded in certain geographies. For example, communications can be recorded and subpoenaed in the US. EU privacy regulations conflict with US subpoena law, which is why most cloud providers have storage in the EU. And as a result, Swiss banks avoid certain countries when routing some types of traffic. With recordable VoIP and other media, the same level of control may emerge in the real-time space.

Regional data storage is the precursor to managing paths for privacy. Path privacy is designed to assure that the route a session follows can be defined and controlled so that a specific conversation does not go through a geography that may be hostile.

The cloud-based architecture is generally going to be the best for managing the path of flows. Premises-based solutions do not generally have the tools to relate an IP path to a physical path. Generally, access carriers will have good route/geography data for their territory but often will not have good path management in other areas. The cloud-based SD-WAN players, with a large number of defined POPs, are in the best position to manage and control paths both to and especially between their POPs. For example, a video call from Europe to Mexico might be routed around the US because the set-up precludes the call being monitored by US authorities.

Conclusions

While all three basic SD-WAN architectures can carry voice and video, cloud-based SD-WANs seem to have the best set of characteristics for real-time data. The combination of flow management or media quality, tools, CaaS access, and path management seems suited to the broadest range of customers. The use of private interconnections between POPs is an architectural design that eliminates a level of variability and enhances real-time delivery.
Pricing must be assessed, obviously, but from an architecture standpoint, cloud-based SD-WAN seems best positioned to address the range of challenges faced by anyone looking at UCaaS or other real-time applications.
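The Europe-to-Mexico example from the Path Privacy discussion can be sketched as a simple path-selection rule that combines a geography constraint with the media quality limits discussed earlier. Everything in the block is hypothetical: the `Path` record, the candidate routes, their latency and loss figures, and the default thresholds are invented for illustration and do not describe any vendor’s implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    countries: tuple   # countries the path transits, in order
    latency_ms: float
    loss_pct: float


def select_path(paths, excluded_countries, max_latency_ms=150.0, max_loss_pct=1.0):
    """Lowest-latency path that avoids excluded countries and meets limits."""
    excluded = set(excluded_countries)
    allowed = [
        p
        for p in paths
        if not excluded & set(p.countries)
        and p.latency_ms <= max_latency_ms
        and p.loss_pct <= max_loss_pct
    ]
    # default=None covers the case where no path satisfies the policy.
    return min(allowed, key=lambda p: p.latency_ms, default=None)


# Hypothetical candidate paths for a call from Germany to Mexico.
candidates = [
    Path("via-us", ("DE", "US", "MX"), latency_ms=110.0, loss_pct=0.1),
    Path("atlantic-direct", ("DE", "ES", "MX"), latency_ms=140.0, loss_pct=0.2),
]

print(select_path(candidates, excluded_countries=set()).name)
print(select_path(candidates, excluded_countries={"US"}).name)
```

With no exclusions the faster path through the US is chosen; excluding the US trades some latency for control of the route, which is the trade-off path privacy implies.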