Choosing an SD-WAN Architecture for Real-Time Communications
Choosing an SD-WAN involves many considerations, but real-time traffic presents its own set of challenges. Besides the general sensitivity to loss and latency, the widespread adoption of Unified Communications as a Service (UCaaS) makes well-performing cloud connections as important as site-to-site connections, if not more so. Let’s take a look at the considerations that matter when selecting an SD-WAN architecture for voice, video, and other forms of real-time traffic.
SD-WAN Architectural Options
To talk about real-time and SD-WAN you first need to define the differences between SD-WANs. By my count, there are roughly 40 solutions on the market today, but from an architectural viewpoint, these SD-WAN solutions can be grouped into three major approaches:
- Premises-based solutions where most of the SD-WAN functionality runs in an edge appliance. These solutions rely on the Internet, MPLS, or some other third-party network for connecting locations.
- Cloud-based solutions where most of the SD-WAN functionality runs in the cloud. These solutions can use MPLS in hybrid configurations, but they also bring a private interconnect — their own network of Points of Presence (POPs) that manages the traffic flow across the middle-mile.
- Carrier-managed SD-WAN where regional carriers or MPLS providers with regional networks integrate third-party premises-based SD-WAN solutions with their traditional offerings, such as MPLS. In this case, MPLS is used for major sites, while the premises-based SD-WAN solution may be used for smaller branches and out-of-coverage locations. The carrier effectively acts as the integrator, pulling the pieces together and then managing them.
Real-Time Considerations for SD-WANs
Now let’s turn our attention to real-time. There are four major impact topics to consider when assessing the SD-WAN architecture for real-time traffic:
- Media Quality – The quality of the real-time traffic flows/sessions carried by the SD-WAN can be impacted by overall latency, average and maximum jitter, and the percentage of lost packets introduced by the SD-WAN. The way the SD-WAN architecture manages failover from one IP path to another also impacts overall media quality.
- Troubleshooting and Tools – Invariably, real-time traffic flows/sessions will have problems. Voice will be garbled; calls will drop; video will be distorted. In the past, the network’s been a black box and there’s been relatively little that could be done to understand how the internal operation of the service impacted real-time traffic. With SD-WAN, tools become available that provide deep visibility into the network operation making for smoother troubleshooting. In addition, how quickly the NOC of a cloud-based or carrier-managed service provider resolves problems is critical.
- Cloud Access – Unified Communications as a Service (UCaaS) and other real-time cloud services are growing at a rapid rate. The way the SD-WAN manages access to the cloud services is critical.
- Path Privacy – Many organizations are concerned about the path taken by their data and real-time traffic. If traffic passes through a country, the laws of that country may allow it to be monitored or captured and used in legal proceedings. Control of these paths is critical for many organizations, especially for real-time traffic.
Media Quality
From a real-time packet handling perspective, all of the SD-WAN architectures have similar capabilities. For real-time traffic, a mechanism to detect packet issues and move to alternative paths is critical. While this is a basic capability of SD-WAN, measuring latency, packet loss, and jitter is required to understand real-time issues, especially for real-time media flows. If the SD-WAN is going to carry real-time traffic, understanding its real-time analysis capabilities is also essential. The ability to track the key real-time packet parameters, while not an architectural requirement per se, is important to review.
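To make the three measurements concrete, here is a minimal Python sketch of how loss, delay, and jitter might be computed for a path and then used to choose between paths. It is an illustration only, not how any particular SD-WAN product works. The `Packet` record, the function names, and the default thresholds are all assumptions made for this example, and the one-way delay figure assumes the sender and receiver clocks are synchronized. The jitter calculation follows the running estimate defined for RTP in RFC 3550.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int        # sender sequence number (hypothetical probe or media packet)
    sent_ms: float  # sender timestamp in milliseconds
    recv_ms: float  # receiver arrival time in milliseconds


def path_metrics(packets, expected_count):
    """Return (loss_pct, mean_delay_ms, jitter_ms) for one path.

    Jitter is the RFC 3550 running estimate: J += (|D| - J) / 16,
    where D is the change in transit time between consecutive packets.
    """
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    received = sorted(packets, key=lambda p: p.recv_ms)
    loss_pct = 100.0 * (expected_count - len(received)) / expected_count
    if not received:
        return loss_pct, float("inf"), float("inf")
    transits = [p.recv_ms - p.sent_ms for p in received]
    mean_delay = sum(transits) / len(transits)
    jitter = 0.0
    for prev, cur in zip(transits, transits[1:]):
        jitter += (abs(cur - prev) - jitter) / 16.0
    return loss_pct, mean_delay, jitter


def pick_path(metrics_by_path, max_loss_pct=1.0, max_delay_ms=150.0,
              max_jitter_ms=30.0):
    """Return the healthy path with the lowest delay, or None if none qualify.

    The thresholds are illustrative defaults, not values from any vendor.
    """
    healthy = {
        name: m for name, m in metrics_by_path.items()
        if m[0] <= max_loss_pct and m[1] <= max_delay_ms and m[2] <= max_jitter_ms
    }
    if not healthy:
        return None
    return min(healthy, key=lambda name: healthy[name][1])
```

When evaluating a product, the useful questions are which of these values it measures per path, how often, and what thresholds trigger a move to an alternative path.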
Premises-based solutions are exposed to major issues in the Internet core. Because traffic must transit the actual Internet, major congestion, failures, or hacking could impact traffic. Both the cloud-based and carrier-managed solutions generally have their own backbones, avoiding Internet issues in the core. Note that enterprise topologies in which all sites are geographically close or on a single access carrier may not have this exposure to the core Internet.
From an architectural perspective, premises-based solutions are best suited for smaller organizations whose sites are geographically close and/or use a single access carrier, or for very large organizations that have a global presence and the scale of operations to manage their own network connections.
Cloud-based solutions cover the broadest range of the market. They can be used by virtually any organization for all or a percentage of their traffic. By using a private interconnect to move traffic across the middle-mile, cloud-based SD-WANs eliminate a number of core Internet issues. The benefit can be more pronounced for locations in parts of the Internet where core traffic issues can have more impact, such as Asia.
Another capability in some cloud-based SD-WANs is a mechanism to mitigate packet loss. While general endpoint retransmission is not effective for real-time traffic because of its timing constraints, the SD-WAN can do intermittent retransmits or use other packet loss mitigation techniques. This allows the SD-WAN to essentially recover a lost packet within the SD-WAN and on time. Such packet loss mitigation capabilities should be considered as well.
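As one example of what a packet loss mitigation technique can look like, the Python sketch below uses simple XOR parity, a basic form of forward error correction. This is my illustration of the general idea, not the method used by any specific SD-WAN. The function names are hypothetical, and the sketch assumes all payloads in a group are the same size, as with fixed-size voice frames. For every group of packets the sender adds one parity packet, and the receiving side can rebuild any single lost packet in the group without waiting for a retransmission.

```python
def xor_bytes(a, b):
    """XOR two byte strings, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a.ljust(n, b"\x00")
    b = b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(group):
    """Build one parity payload covering every payload in the group."""
    parity = b""
    for payload in group:
        parity = xor_bytes(parity, payload)
    return parity


def recover(group_with_gap, parity):
    """Rebuild the single missing payload (marked as None) in a group.

    Returns (index, payload). Assumes equal-size payloads; a real scheme
    would also need to protect the payload length.
    """
    missing = [i for i, p in enumerate(group_with_gap) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one lost packet")
    rebuilt = parity
    for payload in group_with_gap:
        if payload is not None:
            rebuilt = xor_bytes(rebuilt, payload)
    return missing[0], rebuilt
```

The trade-off is bandwidth: one parity packet per group of three adds roughly a third more traffic, and two losses in the same group cannot be repaired this way.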
The primary benefit of carrier-managed SD-WAN, which integrates the SD-WAN with an access carrier’s network, is the elimination of a separate SD-WAN connection into the MPLS-connected datacenter or location. By using the existing MPLS connections into major sites, real-time traffic can be aggregated from the SD-WAN onto these paths. While this offers no specific benefit from a real-time perspective, it may have advantages if the real-time call processing is located in a datacenter on the MPLS path.
A final consideration for media quality is policy management. Network traffic can be managed based on packet type, which is a normal capability in SD-WAN. However, in the event of network issues or congestion, mechanisms to allocate the reduced available bandwidth for optimal business value are critical. While most SD-WANs have policies to manage this based on traffic types, other policy considerations may be important. One area is identity. If the SD-WAN understands identity, identity can become a policy input for allocating resources to individuals during an event to assure they are able to continue their work. Understanding the policy options for mitigating the impact of failures or reduced capacity is another consideration.
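The sketch below shows one way an identity-aware policy could divide reduced capacity. It is a simplified assumption on my part rather than a description of any product's policy engine. The `Flow` record, the class names, and the rule that flows from designated critical users are served first are all hypothetical. Within each group, flows are served in strict priority order by traffic class until the bandwidth runs out.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    user: str
    traffic_class: str  # "voice", "video", or "bulk" in this example
    demand_kbps: int


# Lower number is served first (illustrative ordering).
CLASS_PRIORITY = {"voice": 0, "video": 1, "bulk": 2}


def allocate(flows, capacity_kbps, critical_users=frozenset()):
    """Grant bandwidth to flows in policy order until capacity is exhausted.

    Returns a list of (user, traffic_class, granted_kbps) in the order served.
    """
    def rank(flow):
        # Critical users sort first (False < True), then by traffic class.
        return (flow.user not in critical_users,
                CLASS_PRIORITY[flow.traffic_class])

    remaining = capacity_kbps
    grants = []
    for flow in sorted(flows, key=rank):
        granted = min(flow.demand_kbps, remaining)
        grants.append((flow.user, flow.traffic_class, granted))
        remaining -= granted
    return grants
```

Whether a critical user's video should really outrank everyone else's voice is exactly the kind of policy question to raise with a vendor. The point is that the answer depends on whether the SD-WAN can see identity at all.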
Troubleshooting and Tools
One key factor to consider is the availability of tools to manage the SD-WAN and other solution components when an issue emerges. In this way, the premises-based and cloud-based SD-WANs are very different. The premises-based solutions augment the existing network and real-time monitoring and troubleshooting tools. The SD-WAN becomes just another component to monitor, though the operation may make that a challenge. Unless the management software specifically understands the SD-WAN system, the SD-WAN will look like a black box and the SD-WAN path analysis will have to be separate from the standard tools. By contrast, both the cloud-based and carrier-managed SD-WANs will have extensive monitoring and operational controls. Cloud-based SD-WANs include the monitoring and operation of the components in the cloud solutions, as well as any premises elements. Carrier-managed SD-WAN will allow you to monitor your premises elements and monitor the service, but often any configuration changes require opening trouble tickets with the carrier.
The choice for troubleshooting and tools is clear. If you are part of a large organization that has the means and staff to monitor and manage the SD-WAN yourself, then the premises solution works. For most other organizations, the option of acquiring a managed solution as a cloud SD-WAN is the better option. However, mechanisms to evaluate the cloud delivery SLA may be required. As in the premises case, the ability of specific management tools to isolate and analyze SD-WAN performance for SLA adherence may be challenging.
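As a rough idea of what evaluating an SLA involves, the Python sketch below takes a series of measurement intervals and reports the percentage that met every limit. The function name, the sample format, and the limits are assumptions for illustration. An actual SLA will define its own metrics, measurement points, and averaging periods.

```python
def sla_compliance(samples, max_delay_ms, max_loss_pct, max_jitter_ms):
    """Return the percentage of intervals that met all three limits.

    samples: list of (delay_ms, loss_pct, jitter_ms), one per interval.
    """
    if not samples:
        raise ValueError("no measurement intervals to evaluate")
    within = sum(
        1 for delay, loss, jitter in samples
        if delay <= max_delay_ms and loss <= max_loss_pct and jitter <= max_jitter_ms
    )
    return 100.0 * within / len(samples)
```

The hard part in practice is not the arithmetic but getting trustworthy per-interval measurements for the SD-WAN segment alone, which is the tooling gap described above.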
Cloud Access
Access to the cloud compute resources (Amazon, Microsoft, Google, IBM, Rackspace, etc.) used for both private and public cloud as well as SaaS services is becoming an important part of any network design. The same is true for SD-WAN. As more organizations move to the cloud for their communications services (RingCentral, 8×8, Microsoft, Cisco, Vonage, etc.), there is a need for quality access to the datacenter locations for these services.
Direct connections into cloud datacenters are not generally possible with a premises-based SD-WAN. Organizations moving to cloud communications should look at either cloud-based SD-WANs, or alternatives for cloud access and SD-WAN from a core site to branches. One consideration is the impact on the quality for remote users that are routed through a corporate site to the cloud. For organizations looking for Communication as a Service (CaaS), a cloud-based SD-WAN would appear to be the best option.
Path Privacy
Path privacy is a new aspect of IP-based communications. It is an offshoot of cloud data storage. In data storage, data stored in a specific geographic area is subject to the laws (including subpoena and disclosure laws) of that geography. Similarly, real-time traffic may be captured or recorded in certain geographies. For example, communications can be recorded and subpoenaed in the US. EU privacy regulations conflict with US subpoena law, which is why most cloud providers have storage in the EU. As a result, Swiss banks avoid certain countries when routing some types of traffic. With recordable VoIP and other media, the same level of control may emerge in the real-time space. Regional data storage is the precursor to managing paths for privacy. Path privacy is designed to assure that the route a session follows can be defined and controlled so that a specific conversation does not go through a geography that may be hostile.
The cloud-based architecture is generally going to be the best for managing the path of flows. Premises-based solutions do not generally have the tools to relate an IP path to a physical path. Generally, access carriers will have good route/geography data for their territory but often will not have good path management in other areas.
The cloud-based SD-WAN players, with a large number of defined POPs, are in the best position to manage and control paths both to and especially between their POPs. For example, a video call from Europe to Mexico might be routed around the US when the configuration precludes the call being monitored by US authorities.
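The routing decision in that example can be sketched as a shortest-path search over the provider's POPs that refuses to cross any POP in an excluded country. The Python below is a minimal sketch under stated assumptions: the function name and data layout are hypothetical, and it treats the POP-to-POP topology as known, which says nothing about the physical route the underlying circuits take.

```python
import heapq


def private_path(links, pop_country, src, dst, avoid):
    """Lowest-latency route from src to dst that skips POPs in avoided countries.

    links: dict mapping a POP to a list of (neighbor_pop, latency_ms).
    pop_country: dict mapping each POP to a country code.
    Returns (total_latency_ms, [pops...]) or None if no compliant route exists.
    """
    if pop_country[src] in avoid or pop_country[dst] in avoid:
        return None
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        cost, pop = heapq.heappop(heap)
        if pop == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return cost, path[::-1]
        if cost > best.get(pop, float("inf")):
            continue  # stale queue entry
        for nxt, latency in links.get(pop, []):
            if pop_country[nxt] in avoid:
                continue  # policy: never transit this geography
            new_cost = cost + latency
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = pop
                heapq.heappush(heap, (new_cost, nxt))
    return None
```

With made-up POPs in Frankfurt, Madrid, New York, Bogota, and Mexico City, the unconstrained search would likely pick the route through New York, while passing `avoid={"US"}` forces the longer route through Madrid and Bogota. The cost of path privacy shows up as extra latency, which matters for real-time traffic.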
While all three basic SD-WAN architectures can carry voice and video, cloud-based SD-WANs seem to have the best set of characteristics for real-time data. The combination of flow management for media quality, tools, CaaS access, and path management seems suited to the broadest range of customers. The use of private interconnections between POPs is an architectural design that eliminates a level of variability and enhances real-time delivery. Pricing must be assessed, obviously, but from an architecture standpoint, cloud-based SD-WAN seems best positioned to address the range of challenges faced by anyone looking at UCaaS or other real-time applications.