Why Carrier IP Peering is a Major Issue for Real-Time Traffic
The core of the Internet is based on a plethora of peering agreements between the various carriers that transport IP traffic. These peering relationships are complex both financially and operationally. Peering disruptions, as recent events show, can significantly impact UCaaS and other real-time cloud traffic delivery.
Peering Problems Impact Peering Partners
For example, in my No Jitter article about cloud availability issues, I discussed a number of events reported on downdetector.com that give insight into the challenges of operating cloud solutions. One event was a RingCentral outage on April 3. The outage was one of several outages in that timeframe.
When I discussed the issue with Curtis Peterson, RingCentral's SVP of Cloud Operations, he indicated that at least one previous outage, which affected RingCentral users for about six hours on March 15, 2018, was in fact caused by Comcast peering issues with certain other carriers. This significantly impaired the ability to provide services and complete calls for UCaaS users on Comcast. While it was documented on RingCentral, the issue would have impacted any real-time traffic peering to Comcast.
Peering issues can also be specific to a pair of carriers rather than affecting all of the peering paths. For example, suppose a new release of software for routers in Carrier A has an issue with the router software in Carrier B. The result is a significant degradation in capacity and latency across that peering connection. However, connections to Carrier C, which uses the same router vendor as Carrier A, will continue to work. If the issues are intermittent, routing protocols may still put real-time sessions onto the affected path even though it has real-time performance problems.
What SD-WANs Can Do About Peering Problems
The challenge is that there are few ways to either detect or react in real time to an issue like this if the paths are constrained by the carriers and their routing agreements. By deploying an SD-WAN solution, the underlying issues in the path can be identified and analyzed. Connections can be moved to paths that are built on other peering relationships that are not impacted.
A core capability of SD-WAN solutions is the ability to detect issues on the path a flow is taking. Advanced SD-WAN solutions include specific analysis of real-time characteristics like latency and jitter. This enables the identification of paths that include peering points that may be having issues. As virtually all carriers have multiple connections to other carriers, paths impacted by a peering problem can generally be avoided.
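As a rough illustration of the path analysis described above, the following is a minimal sketch of how an edge device might pick a path from measured latency, jitter, and loss. The thresholds, the scoring weights, and the path names are all hypothetical and are not taken from any particular SD-WAN product.

```python
from dataclasses import dataclass


@dataclass
class PathStats:
    name: str
    latency_ms: float  # measured round-trip latency
    jitter_ms: float   # measured variation in delay
    loss_pct: float    # percentage of probe packets lost


# Hypothetical voice thresholds; a real deployment would tune these.
MAX_LATENCY_MS = 150.0
MAX_JITTER_MS = 30.0
MAX_LOSS_PCT = 1.0


def is_healthy(path: PathStats) -> bool:
    """A path is usable for real-time traffic if it is within all thresholds."""
    return (path.latency_ms <= MAX_LATENCY_MS
            and path.jitter_ms <= MAX_JITTER_MS
            and path.loss_pct <= MAX_LOSS_PCT)


def pick_path(paths: list[PathStats]) -> PathStats:
    """Prefer healthy paths; fall back to the least-bad path if none is healthy."""
    candidates = [p for p in paths if is_healthy(p)] or paths
    # Hypothetical score: jitter and loss are weighted more heavily than latency.
    return min(candidates,
               key=lambda p: p.latency_ms + 2 * p.jitter_ms + 10 * p.loss_pct)


paths = [
    PathStats("via_peering_A_B", latency_ms=180, jitter_ms=45, loss_pct=3.0),
    PathStats("via_peering_A_C", latency_ms=60, jitter_ms=5, loss_pct=0.1),
]
best = pick_path(paths)
print(best.name)
```

In this example the path crossing the degraded A-to-B peering point fails every threshold, so the flow would be placed on the path through Carrier C.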
An OTT SD-WAN provider, like Cato Networks, includes a private interconnect as well. The private interconnect routes traffic between the PoPs over a private transport interconnect. This further reduces the impact of peering by generally avoiding any intermediate peering connections. This is especially important if the session path is international or across regional geographies that have different major IP Access Providers. A private interconnect can avoid core internet peering issues as well as congestion.
For organizations and leaders looking to optimize their organization's use of technology and reduce cost, deploying an SD-WAN is an ideal way to provide value. SD-WAN solutions minimize the impact of peering disruptions while addressing other issues as well. Clearly, the range of a provider's peering relationships should be discussed when selecting an IP access provider. With an advanced SD-WAN and an access provider with the right connections, the use of the Internet for real-time traffic is much more reliable.
How Route Diversity in SD-WAN Provides MPLS-Like Determinism Required for Real-Time Traffic
As I discussed in my previous post, real-time traffic has two characteristics that are challenging for the Internet. First, as the packets have a limited time value and cannot be re-transmitted, any significant change in the transport and packet delivery has the potential of being audible (or visible in the case of video) to the participants. And, as most real-time conversations last orders of magnitude longer than most other types of internet interactions, the probability of a network incident impacting the packet transmission is dramatically higher. The result is that real-time traffic needs a deterministic transport with minimal latency.
The challenge is that in the network world getting Service Level Agreement (SLA) determinism generally has a steep price. Whether a dedicated wire or MPLS, the cost of traditional WAN technology increases in direct relationship to determinism. SD-WAN solves this very problem by utilizing route and component diversity made feasible by the improvements in technology and the affordable costs of Internet bandwidth.
The basic concept of SD-WAN is route diversification. The two SD-WAN edge points (the points between the enterprise LAN and the carrier WAN) create multiple route paths between them. For example, in the diagram, each of the red paths represents a different route between the SD-WAN nodes on the left and right. When traffic arrives at the SD-WAN node, the node can decide, based on a factor such as traffic type or current route performance, which route to put the data packets onto. All of this can be controlled by the SD-WAN controller that oversees the operation.
While the diagram shows a simple premises SD-WAN, the addition of Points of Presence (POP) in a core cloud SD-WAN enables management of the paths between the POPs. This can enable enhanced determinism as much of the variation in Internet traffic delivery happens in the core that is bypassed by having a cloud core. We will discuss this specific topic in a future post.
The benefits to real-time traffic are clear. In the traditional network, if the path that is currently being used for the real-time session flow is impacted, whether through failures or peering issues that limit capacity, the user traffic will stay on that route and the quality of the real-time interaction may be compromised. In the SD-WAN, the traffic can be dynamically moved from the impacted route to the best route available at that time. Through this mechanism, an SD-WAN has the potential of using the best possible route at any point in time between two locations on the Internet, all the while using the lower-cost service of the open Internet, assuming there are sufficient paths for route diversity.
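The dynamic move from an impacted route to a better one can be sketched as follows. This is a simplified illustration, not a description of any vendor's algorithm; the class name, the route names, and the 20 ms switching margin are assumptions. The margin keeps a flow from bouncing between routes when their performance is nearly equal.

```python
class FlowSteering:
    """Keeps a flow on its current route until another route is clearly better."""

    def __init__(self, current: str, switch_margin_ms: float = 20.0):
        self.current = current
        # Hypothetical margin: only move when the gain exceeds this many ms.
        self.switch_margin_ms = switch_margin_ms

    def update(self, latencies_ms: dict[str, float]) -> str:
        """Take the latest per-route latency measurements and return the route to use."""
        best = min(latencies_ms, key=latencies_ms.get)
        # A route with no measurement is treated as failed (infinite latency).
        current_latency = latencies_ms.get(self.current, float("inf"))
        if best != self.current and current_latency - latencies_ms[best] >= self.switch_margin_ms:
            self.current = best
        return self.current


steering = FlowSteering(current="isp1")
print(steering.update({"isp1": 40, "isp2": 35}))   # small difference, stays put
print(steering.update({"isp1": 140, "isp2": 35}))  # isp1 degraded, flow moves
print(steering.update({"isp1": 30, "isp2": 35}))   # isp1 recovers slightly, no flap
```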
The result is that SD-WAN changes the determinism and cost model of the modern WAN. Because of route diversity and path management, SD-WAN enables the MPLS-equivalent determinism required by real-time traffic at close to the open Internet cost model. At the core, the concept is simple, but there are many layers of complexity and value that must be considered as part of a well-engineered SD-WAN solution. For example, the routes must be monitored for their current transport characteristics, the traffic type of flows must be determined, the different flows and their relative policies must be included, and more. All of these are critical for VoIP and other real-time traffic.
In considering an SD-WAN solution, there are a number of factors that should be evaluated when optimizing for real-time traffic. Whether the SD-WAN is implemented as a premises or cloud solution is one consideration. Whether backhaul is required and whether Points of Presence are used can also have an impact. How the SD-WAN classifies traffic can likewise have a major impact on real-time determinism. Other considerations, like cloud Software as a Service (SaaS) access and security, are also important. Over the next few months, we will discuss both how to use SD-WAN and some of the key characteristics and capabilities that an SD-WAN solution must have to maximize value to real-time traffic.
UCaaS: Why the Internet and Voice Is A Match Made in Hell
Today’s business lives and depends on the Internet. More and more companies rely on the Internet for voice and video. This is particularly true as we adopt Unified Communications as a Service (UCaaS). The public Internet, though, is a challenging environment to deliver business-quality real-time services. Aside from the general issues of packet loss and unpredictability, the Internet was optimized for short application or data sessions, not the long sessions seen in voice and video conferencing usage.
Public Internet Routing: Bad News for Voice
That the Internet is not optimized for real-time traffic isn’t all that new. Take a look at this whitepaper, originally published in 1999, at the beginning of the VoIP era, which discusses the issues of transmitting quality, real-time voice. It might be old, but the premise and conclusions are very much relevant to today.
When we communicate remotely, we naturally pause to allow the other party to talk. After waiting about 250-300 milliseconds, we start speaking again. If the round-trip latency of the IP packets exceeds this human latency window (speaker's mouth to listener's ear and back again), the result is interruptions in the conversation and an awkward experience.
Now, the typical round-trip VoIP flow constitutes a minimum of six packets, 20 msecs each. That leaves just 130-180 msecs of cushion for the entire transmission network. This may sound sufficient, but the reality is that, after delays due to distance and network processing, that window is scarcely sufficient for delivering quality voice on a well-run network, let alone one with the variance of the public Internet. For example, in this report, AT&T indicates that the range of latencies between city core nodes in their controlled network varies from 15 to 227 msecs, and that does not include the peering typical in most open connections. A peering connection can double the latency, raising the average US regional latency to over 120 msecs. And those are averages; clearly, many transmissions will fall outside the acceptable window.
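The latency budget above is simple arithmetic, and a short sketch makes it explicit. The constant and function names are only illustrative; the figures are the ones quoted in the text.

```python
HUMAN_PAUSE_WINDOW_MS = (250, 300)  # pause before a speaker starts talking again
PACKETS_PER_ROUND_TRIP = 6          # minimum packets in a round-trip VoIP flow
PACKET_DURATION_MS = 20             # audio carried by each packet


def network_budget_ms(window_ms: int) -> int:
    """Time left for the network after packetization is subtracted."""
    return window_ms - PACKETS_PER_ROUND_TRIP * PACKET_DURATION_MS


low, high = (network_budget_ms(w) for w in HUMAN_PAUSE_WINDOW_MS)
print(low, high)  # 130 180

# The worst-case core latency quoted from the AT&T report exceeds even the
# generous end of the budget, before any peering delay is added.
ATT_MAX_CORE_LATENCY_MS = 227
print(ATT_MAX_CORE_LATENCY_MS > high)  # True
```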
The public Internet reorganizes to solve issues without regard for the impact on applications. There is no packet classification or priority (except for certain hosts driving large content delivery). Real-time traffic is interleaved with a range of other traffic, such as web browsing, streaming media, and other applications. There’s no way to identify and prioritize the real-time traffic; it is delayed with all of the other traffic.
If a link between major sites goes down, all flows are routed along alternative paths. Traffic from a lightly utilized link may move to one that is heavily utilized, resulting in increased latency, jitter, and packet loss. If a link becomes congested, Internet routers use Random Early Discard (RED) queuing algorithms to reduce the flow rate of TCP sessions, dropping packets as necessary. Dropping packets may not noticeably impact web browsing, but it can very much impact a voice session, losing pieces of a conversation.
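To show why RED affects voice even before a queue is full, here is a simplified sketch of the classic RED drop-probability curve: no drops below a minimum average queue length, all packets dropped above a maximum, and a linearly rising probability in between. The thresholds and `max_p` value are hypothetical, and real implementations add refinements (such as spacing drops by packet count) that are omitted here.

```python
def red_drop_probability(avg_queue: float, min_th: float, max_th: float,
                         max_p: float = 0.1) -> float:
    """Simplified RED: probability of dropping an arriving packet.

    avg_queue is the router's averaged queue length (in packets).
    """
    if avg_queue < min_th:
        return 0.0   # queue is short: never drop
    if avg_queue >= max_th:
        return 1.0   # queue is too long: drop everything
    # Between the thresholds the drop probability rises linearly up to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)


for avg in (10, 40, 80):
    print(avg, red_drop_probability(avg, min_th=20, max_th=60))
```

The dropped packets are chosen at random from whatever is arriving, so a voice packet is as likely to be discarded as a packet from a TCP session that could simply retransmit it.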
The Longer the Call, the Greater the Risk
To make matters worse, voice sessions tend to be fairly long, increasing the likelihood users will experience a problem during a call. Upon initiation, IP sessions are established through specific route paths. Even if there are changes in network conditions, the route does not change. While the route is initially selected as an optimized path using BGP or OSPF, over time the loading and traffic patterns may change, significantly degrading communications.
Within short sessions this degradation may not be noticeable. It’s one reason why typical Internet applications, such as loading a web page, perform well. The photos and the actual HTML page content are often served from separate web servers using separate sessions of only a few milliseconds. As such, there’s little time for the application to be impacted by Internet performance issues.
Contrast that with your weekly conference call. The voice session is maintained for the full duration of the hour-long call. The continual stream of packets (one every 20 msecs for voice) provides extensive opportunity for the Internet to interfere. And the duration virtually assures that there will be issues at some point during the call. Other long sessions, like video streaming, use buffering to manage the Internet variability and deliver a great experience. However, introducing 5-10 seconds of buffered delay in either direction in a phone call makes the call unusable. It’s not an option.
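The exposure of a long call can be put into numbers with a small sketch. The packet count follows directly from the 20 msec interval above. The per-minute incident probability is purely hypothetical and assumes each minute is independent; it is there only to show how quickly risk accumulates with duration.

```python
def packets_in_call(duration_s: int, packet_interval_ms: int = 20) -> int:
    """Voice packets sent in one direction over the length of a call."""
    return duration_s * 1000 // packet_interval_ms


def chance_of_incident(minutes: int, p_per_minute: float) -> float:
    """Probability of at least one network incident during the call,
    assuming each minute independently has probability p_per_minute."""
    return 1 - (1 - p_per_minute) ** minutes


print(packets_in_call(3600))  # 180000 packets for an hour-long call

HYPOTHETICAL_P_PER_MINUTE = 0.01  # assumed value, not a measurement
for minutes in (1, 10, 60):
    print(minutes, round(chance_of_incident(minutes, HYPOTHETICAL_P_PER_MINUTE), 3))
```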
Traditional Networks No Longer Work for Voice or UCaaS
Unfortunately, the price/performance ratio of legacy networks has made it challenging to support the explosion of VoIP and UCaaS traffic. Initially, most VoIP and video deployments used MPLS, leased line or fiber for their site connectivity to ensure delivery of general data, along with real-time voice and video.
However, prices for dedicated TDM trunks (often used for branch data) are increasing by multiples of 2-10x. MPLS remains both expensive and difficult to provision and manage. The ability to get relatively low-cost Internet IP connections is changing the edge access, especially for branch and smaller locations. The availability of 100 Mbps connectivity at $100 price points from the cable TV companies or DSL carriers (albeit at slightly lower speeds) creates a strong incentive to move to the open Internet. This shift is compounded as more users go mobile and use their real-time applications outside the office. For these mobile workers, using the Internet for their voice and video is no longer an option; it is a requirement of doing business.
UCaaS has further challenged the use of MPLS and other “non-Internet” solutions to deliver quality voice. With UCaaS, the tether point for many real-time media flows is in a cloud data center, not on the private network. Legacy network solutions often end up adding too much latency, such as when establishing an Internet-based VPN to the corporate headquarters or datacenter and from there to the UCaaS provider, or an MPLS path (or even open Internet path) to and from the UCaaS provider. Both add separate trips over the Internet and back to the user location.
The result is becoming obvious as more users and organizations move to using the public Internet for their real-time communications. Over an hour conversation duration, there is abundant opportunity for new outages and events to impact Internet reliability and the quality of the communications. In early 2018, for example, a four-hour outage was seen for many UCaaS providers due to a peering failure between Comcast and some of its peering partners. A four-hour outage without communications can be a major issue, especially at the wrong time.
SD-WAN: A Safer Way to Use the Internet
While this may seem daunting, there is a new hope on the horizon. SD-WAN is offering an alternative to traditional corporate networks. For many organizations, SD-WAN opens the use of low-cost, public Internet connections with the quality users demand. Can SD-WAN be used for quality voice and video over the Internet as well? Possibly. We’ll explore that issue and the key capabilities to optimize your traffic in our upcoming posts.
Choosing an SD-WAN Architecture for Real-Time Communications
While there are many considerations when choosing an SD-WAN, real-time traffic presents its own set of challenges. Besides the general sensitivity to loss and latency, the widespread adoption of Unified Communications as a Service (UCaaS) makes well-performing cloud connections as important as (if not more important than) site-to-site connections. Let’s take a look at the considerations you need to think about when selecting an SD-WAN architecture for voice, video, and other forms of real-time traffic.
SD-WAN Architectural Options
To talk about real-time and SD-WAN you first need to define the differences between SD-WANs. By my count, there are roughly 40 solutions on the market today, but from an architectural viewpoint, these SD-WAN solutions can be grouped into three major approaches:
Premises-based solutions where most of the SD-WAN functionality runs in an edge appliance. These solutions rely on the Internet, MPLS, or some other third-party network for connecting locations.
Cloud-based solutions where most of the SD-WAN functionality runs in the cloud. These solutions can use MPLS in hybrid configurations, but they also bring a private interconnect — their own network of Points of Presence (POPs) that manages the traffic flow across the middle-mile.
Carrier-managed SD-WAN where regional carriers or MPLS providers with regional networks integrate third-party premises-based SD-WAN solutions with their traditional offerings, such as MPLS. In this case, MPLS is used for major sites, while the premises-based SD-WAN solution may be used for smaller branches and out-of-coverage locations. The carrier effectively acts as the integrator, pulling the pieces together, and then managing them.
Real-Time Considerations for SD-WANs
Now let’s turn our attention to real-time. There are four major impact topics to consider when assessing the SD-WAN architecture for real-time traffic:
Media Quality – The quality of the real-time traffic flows/sessions carried by the SD-WAN can be impacted by overall latency, average and maximum jitter, and percentage of lost packets introduced by the SD-WAN. The way the SD-WAN architecture manages failover from one IP path to another also impacts overall media quality.
Troubleshooting and Tools – Invariably, real-time traffic flows/sessions will have problems. Voice will be garbled; calls will drop; video will be distorted. In the past, the network’s been a black box and there’s been relatively little that could be done to understand how the internal operation of the service impacted real-time traffic. With SD-WAN, tools become available that provide deep visibility into the network operation making for smoother troubleshooting. In addition, how quickly the NOC of a cloud-based or carrier-managed service provider resolves problems is critical.
Cloud Access – Unified Communications as a Service (UCaaS) and other real-time cloud services are growing at a rapid rate. The way the SD-WAN manages access to the cloud services is critical.
Path Privacy – Many organizations are concerned about the path taken by their data and real-time traffic. If traffic passes through a country, the laws of that country may allow it to be monitored or captured and used in legal proceedings. Control of these paths is critical for many organizations, especially for real-time traffic.
Media Quality
From a real-time packet handling perspective, all of the SD-WAN architectures have similar capabilities. For real-time traffic, a mechanism to detect packet issues and move to alternative paths is critical. While this is a basic capability of SD-WAN, measuring latency, packet loss, and jitter is required to understand real-time issues, especially for real-time media flows. If the SD-WAN is going to carry real-time traffic, understanding its real-time analysis capabilities is also essential. The ability to track the key real-time packet parameters, while not an architectural requirement per se, is important to review.
Premises-based solutions are exposed to major issues in the Internet core. As traffic must transit the actual Internet, major congestion, failures, or hacking could impact traffic. Both the cloud-based and carrier-managed solutions generally have their own backbones, avoiding Internet issues in the core. Note that enterprise topologies that have all sites close together or all on a single access carrier may not have this exposure to the core Internet.
From an architectural perspective, premises-based solutions are best suited to two groups: smaller organizations that have geographically close sites and/or use a single access carrier, and very large organizations that have a global presence and the scale of operations to manage their own network connections.
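As an illustration of what tracking these parameters involves, the sketch below computes jitter using the smoothed interarrival estimator defined for RTP in RFC 3550, along with a simple loss percentage. Timestamps are in milliseconds here for readability (RTP itself uses timestamp units), and the function names are only illustrative.

```python
def interarrival_jitter(send_ms: list[float], recv_ms: list[float]) -> float:
    """Running jitter estimate in the style of RFC 3550.

    For each consecutive packet pair, D is the change in transit time;
    the estimate moves 1/16 of the way toward |D| on every packet.
    """
    jitter = 0.0
    for i in range(1, len(send_ms)):
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def loss_pct(expected: int, received: int) -> float:
    """Percentage of expected packets that never arrived."""
    return 100.0 * (expected - received) / expected if expected else 0.0


send = [0, 20, 40]        # packets sent every 20 ms
steady = [50, 70, 90]     # constant 50 ms transit: no jitter
delayed = [50, 70, 98]    # third packet arrives 8 ms late

print(interarrival_jitter(send, steady))   # 0.0
print(interarrival_jitter(send, delayed))  # 0.5
print(loss_pct(expected=100, received=98))  # 2.0
```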
Cloud-based solutions cover the broadest range of the market. They can be used by virtually any organization for all or a percentage of their traffic. By using a private interconnect to move traffic across the middle-mile, cloud-based SD-WANs eliminate a number of core Internet issues. This benefit can be even greater for locations in parts of the Internet where core traffic can have more impact, such as Asia.
Another capability in some cloud-based SD-WANs is the ability to mitigate packet loss. While general endpoint retransmission is not effective due to the time domain of real-time traffic, the SD-WAN can do intermittent retransmits or use other packet loss mitigation techniques. This allows the SD-WAN to essentially recover a lost packet within the SD-WAN, in time for it to still be useful. Capabilities like these should be considered as well.
The primary benefit of carrier-managed SD-WAN, which integrates the SD-WAN with an access carrier’s network, is the elimination of a separate SD-WAN connection into the MPLS-connected datacenter or location. By using the existing MPLS connections into major sites, real-time traffic can be aggregated from the SD-WAN onto these paths. While there is no specific benefit from a real-time perspective to this, it may have advantages if the real-time call processing is located in a datacenter with the MPLS path.
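The text does not say which loss mitigation techniques any particular SD-WAN uses. As one simple illustration of recovering a lost packet without waiting for the sender, the sketch below uses XOR parity, a basic form of forward error correction: one extra parity packet per group allows any single lost packet in that group to be rebuilt. It assumes equal-length packets, and the function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: list[bytes]) -> bytes:
    """Parity packet sent alongside a group of equal-length packets."""
    return reduce(xor_bytes, packets)


def recover(received: list, parity: bytes) -> list[bytes]:
    """Rebuild a group in which exactly one packet (marked None) was lost."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one lost packet")
    present = [p for p in received if p is not None]
    # XOR of the parity with every surviving packet yields the lost one.
    rebuilt = reduce(xor_bytes, present, parity)
    restored = list(received)
    restored[missing[0]] = rebuilt
    return restored


group = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(group)
print(recover([b"AAAA", None, b"CCCC"], parity))
```

The trade-off is extra bandwidth for the parity packets in exchange for avoiding a retransmission round trip that real-time traffic cannot afford.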
A final consideration for media quality is policy management. Network traffic can be managed based on packet type, which is a normal capability in SD-WAN. However, in the event of network issues or congestion, mechanisms to allocate the reduced available bandwidth for optimal business value are critical. While most SD-WANs have policies to manage this based on traffic types, other policy considerations may be important. One area is identity. If the SD-WAN understands identity, that can become a policy to allocate resources to individuals during an event to assure they are able to continue their work. Understanding the policy options for mitigating the impact of failures or reduced capacity is another consideration.
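A policy for allocating reduced bandwidth can be sketched as strict priority allocation: the most important traffic is served first and lower priorities receive whatever is left. The traffic classes, priorities, and bandwidth figures below are hypothetical; the same structure could key on user identity instead of traffic type.

```python
def allocate(available_kbps: int, demands: list[tuple[str, int, int]]) -> dict[str, int]:
    """Grant bandwidth in priority order.

    demands holds (name, priority, requested_kbps); a lower priority
    number means more important traffic.
    """
    grants = {}
    remaining = available_kbps
    for name, _priority, requested in sorted(demands, key=lambda d: d[1]):
        grant = min(requested, remaining)
        grants[name] = grant
        remaining -= grant
    return grants


# A link degraded to 1000 kbps with 1700 kbps of demand.
demands = [("web", 3, 800), ("voice", 1, 300), ("video", 2, 600)]
print(allocate(1000, demands))
```

Here voice and video receive their full request and web browsing absorbs the shortfall.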
Troubleshooting and Tools
One key factor to consider is the availability of tools to manage the SD-WAN and other solution components when an issue emerges. In this way, the premises-based and cloud-based SD-WANs are very different. The premises-based solutions augment the existing network and real-time monitoring and troubleshooting tools. The SD-WAN becomes just another component to monitor, though the operation may make that a challenge. Unless the management software specifically understands the SD-WAN system, the SD-WAN will look like a black box and the SD-WAN path analysis will have to be separate from the standard tools. By contrast, both the cloud-based and carrier-managed SD-WANs will have extensive monitoring and operational controls. Cloud-based SD-WANs include the monitoring and operation of the components in the cloud solutions, as well as any premises elements. Carrier-managed SD-WAN will allow you to monitor your premises elements and monitor the service, but often any configuration changes require opening trouble tickets with the carrier.
The choice for troubleshooting and tools is clear. If you are part of a large organization that has the means and staff to monitor and manage the SD-WAN yourself, then the premises solution works. For most other organizations, the option of acquiring a managed solution as a cloud SD-WAN is the better option. However, mechanisms to evaluate the cloud delivery SLA may be required. As in the premises case, the ability of specific management tools to isolate and analyze SD-WAN performance for SLA adherence may be challenging.
Cloud Access
Access to the cloud compute resources (Amazon, Microsoft, Google, IBM, Rackspace, etc.) used for both private and public cloud as well as SaaS services is becoming an important part of any network design. The same is true for SD-WAN. As more organizations move to the cloud for their communications services (RingCentral, 8x8, Microsoft, Cisco, Vonage, etc.), there is a need for quality access to the datacenter locations for these services.
Direct connections into cloud datacenters are not generally possible with a premises-based SD-WAN. Organizations moving to cloud communications should look at either cloud-based SD-WANs, or alternatives for cloud access and SD-WAN from a core site to branches. One consideration is the impact on the quality for remote users that are routed through a corporate site to the cloud. For organizations looking for Communication as a Service (CaaS), a cloud-based SD-WAN would appear to be the best option.
Path Privacy
Path privacy is a new aspect of IP-based communications. It is an offshoot of cloud data storage. In data storage, data stored in a specific geographic area is subject to the laws (including subpoena and disclosure laws) of that geography. Similarly, real-time traffic may be captured or recorded in certain geographies. For example, communications can be recorded and subpoenaed in the US. EU privacy regulations conflict with US subpoena law, which is why most cloud providers have storage in the EU. And as a result, Swiss banks avoid certain countries when routing some types of traffic. With recordable VoIP and other media, the same level of control may emerge in the real-time space. Regional data storage is the precursor to managing paths for privacy. Path privacy is designed to assure that the route a session follows can be defined and controlled so that a specific conversation does not go through a geography that may be hostile.
The cloud-based architecture is generally going to be the best for managing the path of flows. Premises-based solutions do not generally have the tools to relate an IP path to a physical path. Generally, access carriers will have good route/geography data for their territory but often will not have good path management in other areas.
The cloud-based SD-WAN players, with a large number of defined POPs, are in the best position to manage and control paths both to and especially between their POPs. In this case, a video call from Europe to Mexico, for example, might be routed around the US if the setup precludes the call being monitored by US authorities.
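Path control of this kind can be pictured as a filter over candidate POP-to-POP routes. The sketch below assumes each route is known as a list of region codes for the POPs it traverses; the region codes and routes are made up for illustration.

```python
def allowed_paths(paths: list[list[str]], excluded_regions: set[str]) -> list[list[str]]:
    """Keep only the paths that do not traverse any excluded region."""
    excluded = set(excluded_regions)
    return [path for path in paths if excluded.isdisjoint(path)]


# Hypothetical routes for a Europe-to-Mexico call, listed as POP regions.
candidate_paths = [
    ["EU", "US", "MX"],
    ["EU", "BR", "MX"],
]
print(allowed_paths(candidate_paths, excluded_regions={"US"}))
```

The performance-based path selection would then run only over the routes that survive this filter.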
While all three basic SD-WAN architectures can carry voice and video, cloud-based SD-WANs seem to have the best set of characteristics for real-time data. The combination of flow management for media quality, tools, CaaS access, and path management seems suited to the broadest range of customers. The use of private interconnections between POPs is an architectural design that eliminates a level of variability and enhances real-time delivery. Pricing must be assessed, obviously, but from an architecture standpoint, cloud-based SD-WAN seems best positioned to address the range of challenges faced by anyone looking at UCaaS or other real-time applications.