The Business Impact of WAN Transformation with SD-WAN
SD-WAN is one of the hottest technologies in the networking space. Being “hot”, though, doesn't mean that SD-WAN has a solid business case to support it. How can IT executives justify the investment in this technology? In short, SD-WAN promises to have a positive business impact in the following areas:
Improve network capacity, availability, and agility to maximize end user productivity
Optimize global connectivity for fixed and mobile users to support global growth
Enable strategic IT initiatives such as cloud infrastructure and applications migration
Improves WAN capacity, availability, and agility
The network is the foundation of the business. Historically, MPLS was the default WAN choice to maximize uptime and ensure predictable network behavior. But MPLS is expensive and subject to capacity and availability constraints.
SD-WAN enables locations to use multiple WAN transports concurrently including MPLS, cable, xDSL, or LTE, and dynamically route traffic based on transport quality and application needs.
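As a rough illustration of routing on transport quality and application needs, the sketch below picks a link per application class. The thresholds, link names, and metric values are hypothetical and not taken from any particular SD-WAN product.

```python
from dataclasses import dataclass


@dataclass
class Transport:
    name: str
    latency_ms: float
    loss_pct: float
    up: bool = True


# Hypothetical per-application limits: (max latency in ms, max loss in %).
APP_NEEDS = {
    "voice": (150.0, 1.0),
    "bulk": (1000.0, 5.0),
}


def pick_transport(app, transports):
    """Pick the healthiest transport that meets the application's limits."""
    max_latency, max_loss = APP_NEEDS[app]
    usable = [t for t in transports if t.up]
    if not usable:
        return None
    ok = [t for t in usable
          if t.latency_ms <= max_latency and t.loss_pct <= max_loss]
    # If nothing meets the limits, fall back to the least bad live link.
    candidates = ok or usable
    return min(candidates, key=lambda t: (t.loss_pct, t.latency_ms))


# Illustrative measurements only.
links = [
    Transport("mpls", latency_ms=40, loss_pct=0.0),
    Transport("cable", latency_ms=25, loss_pct=2.0),
    Transport("lte", latency_ms=80, loss_pct=0.5),
]
```

With these sample values, voice traffic goes to the MPLS link, and moves to LTE if MPLS is marked down, because the cable link's loss exceeds the voice limit.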
SD-WAN enables the business to boost overall capacity by aggregating all transports and reduce cost by utilizing affordable Internet services. Agility is also improved, because IT can deploy new sites quickly with available transports and not wait for the lengthy rollout of premium services, such as MPLS.
SD-WAN maximizes end user productivity by boosting the WAN’s capacity and resiliency. SD-WAN also supports quick alignment of the enterprise network with emerging business needs such as onboarding of new locations and users.
Optimizes global connectivity for fixed and mobile users
Organizations often use global carrier MPLS to give their remote locations a consistent network experience. The only other option available to lower costs is to use the inconsistent and unpredictable public Internet. Mobile users can’t leverage MPLS-connected locations on the road, and have to accept the service levels provided by the Internet.
As discussed before, SD-WAN enables businesses to use inexpensive, last-mile Internet connections within the WAN. For regional businesses, and especially in the developed world, the Internet is pretty reliable over short distances, but using the public Internet can be a challenge in a global context. IT organizations must use predictable, global connectivity to ensure consistent service levels.
In a classic hybrid WAN setup, MPLS provides this consistency while the Internet adds capacity at lower cost. To reduce costs even further, affordable MPLS alternatives, such as SLA-backed cloud networks, can ultimately replace MPLS services. Mobile users remain an afterthought, even for SD-WAN, and can’t benefit from either legacy MPLS or SD-WAN appliances. Yet, these users have the same needs for optimal global access. Only a subset of SD-WAN solutions can extend their fabrics to mobile users globally.
Global connectivity requires a consistent and predictable transport. To reduce or eliminate the cost of MPLS in a global context, SD-WAN solutions must incorporate an affordable MPLS alternative that ideally can extend to branch locations and mobile users.
Enables strategic IT initiatives
Many enterprises are migrating, or considering the migration of, all or parts of their applications to cloud datacenters, such as Amazon AWS and Microsoft Azure. This change, alongside the use of public cloud applications, such as Office 365, makes legacy network designs obsolete.
Instead of focusing network planning on branch-to-datacenter routes over dedicated MPLS connections, network architects must address the growing share of traffic going to the cloud. Wasteful backhauling, also known as the Trombone Effect, saturates MPLS links and adds latency because traffic goes to the datacenter only to be securely sent on to the Internet. Sending Internet traffic directly from the branch makes more sense.
Direct Internet access in the branch, using SD-WAN, enables Internet- and cloud-bound traffic to exit the branch directly without backhauling. There is a cost to this optimization, as security now has to be applied at the branch. Simple firewalls incorporated into SD-WAN appliances have limited inspection and threat protection capabilities, and a full-blown security stack in every branch creates appliance sprawl and increases complexity.
Firewall as a Service (FWaaS) is an emerging technology that enables IT to secure Internet traffic at the branch without deploying physical appliances alongside SD-WAN appliances.
Security is one consideration. Optimizing cloud access from the branch is another. Even the branch offices of regional companies often need to access distant cloud resources. MPLS was designed for connectivity between branches and physical datacenters, not between branches and the cloud. Alternative approaches, such as cloud networks, can optimally support cloud traffic by extending the network fabric to both customer locations and cloud destinations, and by using private SLA-backed backbones to optimize performance.
SD-WAN can support strategic cloud migration initiatives by securing and optimizing traffic between business locations, mobile users, and cloud resources. Appropriate SD-WAN architectures, built for secure and optimized cloud connectivity, should be evaluated.
SD-WAN is a strategic WAN transformation initiative. Better network availability, capacity, and agility; high-performance global connectivity; and secure, optimized cloud integration are all major business impact drivers. Addressing them holistically will ensure a high return on investment in the SD-WAN solution.
Gur Shatz is co-founder and CTO of Cato Networks. Prior to Cato Networks, he was the co-founder and CEO of Incapsula Inc., a cloud-based web applications security and acceleration company. Before Incapsula, Gur was director of product development, VP of engineering and products at Imperva, a web application security and data security company. Gur holds a BSc in computer science from Tel Aviv College.
This is Why the Internet is Broken: a Technical Perspective
Anyone with hands-on experience setting up long-haul VPNs over the Internet knows it’s not a pleasant exercise. Even factoring out the complexity of appliances and the need to work with old relics like IPsec, managing latency, packet loss, and high availability remain huge problems. Service providers also know this, and make billions on MPLS.
The bad news is that it is not getting any better. It doesn’t matter that available capacity has increased dramatically. The problem is in the way providers are interconnected and with how global routes are mismanaged. It lies at the core of how the Internet was built, its protocols, and how service providers implemented their routing layer. The same architecture that allowed the Internet to cost-effectively scale to billions of devices also set its limits.
Addressing these challenges requires a deep restructuring of the Internet’s fabric and core routing, and that restructuring should form the foundation for possible solutions. No shiny new router is going to magically solve it all.
IP Routing’s Historical Baggage: Simplistic Data Plane
Whether the traffic is voice, video, HTTP, or email, the Internet is made of IP packets. If they are lost along the way, it is the responsibility of higher-level protocols such as TCP to recover them. Packets hop from router to router, only aware of their next hop and their ultimate destination.
Routers are the ones making the decisions about the packets, according to their routing tables. When a router receives a packet, it performs a calculation according to its routing table - identifying the best next hop to send the packet to.
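The lookup described above can be sketched in a few lines. IP routers generally pick the most specific matching entry (longest-prefix match), a rule the text does not name. The table contents and next-hop addresses below are made-up documentation values, and a real router does this in specialized hardware rather than with a linear scan.

```python
import ipaddress

# Hypothetical routing table: destination prefix -> next-hop address.
ROUTES = {
    "0.0.0.0/0": "192.0.2.1",    # default route
    "10.0.0.0/8": "192.0.2.2",
    "10.1.0.0/16": "192.0.2.3",
}


def next_hop(dst, routes=ROUTES):
    """Return the next hop of the most specific prefix containing dst."""
    addr = ipaddress.ip_address(dst)
    best = None  # (network, next hop) of the longest match seen so far
    for prefix, hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None
```

Note what the function never consults: how long packets take to reach the chosen next hop, or whether they arrive at all.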
From the early days of the Internet, routers were shaped by technical constraints. There was a shortage of processing power available to move packets along their path, or data plane. Access speeds and available memory were limited, so routers had to rely on custom hardware that performed minimal processing per packet and had no state management. Communicating with this restricted data plane was simple and infrequent.
Routing decisions were moved out to a separate process, the control plane, which worked out the next router on the way to each destination and pushed those decisions back into the data plane.
This separation of control and data planes allowed architects to build massively scalable routers, handling millions of packets per second. However, even as processing power increased on the data plane, it wasn't really used. The control plane makes all the decisions, the data plane executes the routing table, and apart from routing table updates, they hardly communicate.
A modern router does not have any idea how long it actually took a packet to reach its next hop, or whether it reached it at all. The router doesn’t know if it’s congested. And to the extent it does have information to share, it will not be communicated back to the control plane, where routing decisions are actually made.
BGP - The Routing Decisions Protocol
BGP is the routing protocol that glues the Internet together. In very simple terms, its task is to communicate the knowledge of where an IP address (or a whole IP subnet) originates. BGP involves routers connecting with their peers, and exchanging information about which IP subnets they originate, and also “gossip” about IP subnets they learned about from other peers. As these rumors propagate between the peers and across the globe, they are appended with the accumulated rumor path from the originator (this is called the AS-Path). As more routers are added to the path, the “distance” grows.
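A toy model of how the AS-Path accumulates as an announcement is passed along: each network that re-advertises the route puts its own AS number at the front. The prefix and AS numbers are reserved documentation values, and the function names are illustrative only. Real BGP sessions carry many more attributes.

```python
def advertise(route, via_asn):
    """Return the route as AS `via_asn` would pass it on to its peers."""
    prefix, as_path = route
    return (prefix, [via_asn] + as_path)


def accept(route, my_asn):
    """BGP loop prevention: ignore routes that already contain our own AS."""
    _, as_path = route
    return my_asn not in as_path


origin = ("203.0.113.0/24", [64500])   # announced by the originating AS
hop1 = advertise(origin, 64501)        # re-advertised by a first peer
hop2 = advertise(hop1, 64502)          # and again by a second one
```

After two hops the path reads `[64502, 64501, 64500]`: the originator sits at the end, and the path length is the "distance" BGP compares.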
Here is an example of what a router knows about a specific subnet, using Hurricane Electric’s excellent looking glass service. It learned about this subnet from multiple peers, and selected the shortest AS-Path. This subnet originates from autonomous system 13150, the rumor having reached the router across system 5580. Now the router can update its routing table accordingly.
If we want to see how traffic destined for this IP range is actually routed, we can use traceroute. Note that in this case, there was a correlation between the AS-Path and the path the actual packets traveled.
BGP is a very elegant protocol, and we can see why it was able to scale with the Internet: it requires very little coordination across network elements. Assuming the routers running the protocol are the ones that are actually routing traffic, it has built-in resiliency. When a router fails, so do the routes it propagated, and other routers are selected.
BGP has a straightforward way of assessing distance: it uses the AS-Path, so if it got the route first-hand it is assumed to be closest. Rumored routes are considered further away as the hearsay “distance” increases. The general assumption is that the router that reported the closest rumor is also the best choice to send packets to. BGP doesn’t know if a specific path has 0% or 20% packet loss. Also, using the AS-Path to select the lowest-latency path is pretty limited: it’s like calculating the shortest path between two points on the map by counting traffic lights, instead of miles, along the way.
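To make the traffic-light analogy concrete, here is a minimal sketch of selection by AS-Path length alone. The loss and latency figures are invented, and they are attached to the routes only to show what the selection ignores. BGP itself carries no such fields.

```python
# Two hypothetical routes to the same prefix, learned from peers A and B.
routes = [
    {"peer": "A", "as_path": [64501, 64500],
     "loss_pct": 20.0, "latency_ms": 300},
    {"peer": "B", "as_path": [64502, 64503, 64500],
     "loss_pct": 0.0, "latency_ms": 60},
]


def bgp_style_choice(routes):
    """Pick the route with the fewest AS hops, looking at nothing else."""
    return min(routes, key=lambda r: len(r["as_path"]))


chosen = bgp_style_choice(routes)
```

The route via peer A wins because its path is one AS shorter, even though in this made-up case it is both slower and lossier.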
Seen from Hurricane Electric (HE), a tier-1 service provider, in Singapore, the route to an IP address in China looks straightforward: it has a path length of 1.
But if we trace the path the packets actually take from Singapore to China, the story is really different: packets seem to make a “connection” in Los Angeles.
This packet traveled to the West coast of the U.S. to get from Singapore to China simply because HE peers with China Telecom in Los Angeles. Every packet from anywhere within the HE autonomous system will go through Los Angeles to reach China Telecom.
BGP Abused: BGP Meets the Commercial Internet
To work around BGP’s algorithms, the protocol itself extends to include a host of manual controls to allow manipulation of the “next best hop” decisions. Controls such as weight, local preference (prioritizing routes from specific peers), communities (allow peers to add custom attributes, which may then affect the decisions of other peers along the path), and AS path prepending (manipulates the propagated AS path) allow network engineers to tweak and improve problematic routes and to alleviate congestion issues.
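A simplified sketch of how these knobs override the shortest AS-Path. It models only three steps of the decision process (higher weight, then higher local preference, then shorter AS-Path), and real implementations apply further tie-breakers. The route names and values are hypothetical, and weight is a router-local setting that not every vendor offers.

```python
from dataclasses import dataclass


@dataclass
class Route:
    peer: str
    as_path: list
    local_pref: int = 100   # a commonly used default value
    weight: int = 0


def best_route(routes):
    """Prefer higher weight, then higher local_pref, then shorter AS-Path."""
    return min(routes,
               key=lambda r: (-r.weight, -r.local_pref, len(r.as_path)))


def prepend(route, asn, times):
    """Make a route look longer by repeating an AS number at the front."""
    return Route(route.peer, [asn] * times + route.as_path,
                 route.local_pref, route.weight)


transit = Route("transit", [64501, 64500])
customer = Route("customer", [64502, 64503, 64500], local_pref=200)
```

Here `best_route([transit, customer])` returns the customer route despite its longer path, because local preference is compared before path length. Prepending works the other way around: it lengthens the path so that other networks comparing on AS-Path length look elsewhere.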
The relationship between BGP peers on the Internet is a reflection of commercial contracts of ISPs. Customers pay for Internet traffic. Smaller service providers pay larger providers, and most pay tier-1 providers. Any non-commercial relationship has to be mutually beneficial, or very limited.
BGP gives service providers the tools to implement these financial agreements:
Service providers usually prefer routing traffic for “paying” connections.
Service providers want to quickly get rid of “unpaid” packets, rather than carrying them across their backbone (so called “hot potato” routing).
Sometimes, service providers will carry the packets over long distances just to get the most financially beneficial path.
All this comes at the expense of best path selection.
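The “hot potato” behavior above can be sketched as a choice between exit points. The exit names and distances below are invented for illustration: each exit is described by how far a packet travels on the provider’s own backbone and how far it then travels on the neighboring network.

```python
# Hypothetical exits toward a neighboring network:
# (name, km carried on our backbone, km carried on the neighbor's network)
exits = [
    ("nearby_exit", 50, 9000),
    ("far_exit", 4000, 500),
]


def hot_potato(exits):
    """Hand the packet off as early as possible: minimize our own distance."""
    return min(exits, key=lambda e: e[1])


def shortest_total(exits):
    """What the end user would want: minimize the end-to-end distance."""
    return min(exits, key=lambda e: e[1] + e[2])
```

With these numbers, hot potato routing picks the nearby exit (9,050 km end to end), while the shortest overall path uses the far exit (4,500 km).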
The MPLS Racket
To address these problems, service providers came up with an alternative offering: private networks, built on their own backbones, using MPLS as the routing protocol.
MPLS is in many ways the opposite of BGP. Instead of an open architecture, MPLS uses policy based, end-to-end routing. A packet's path through the network is predetermined, which makes it suitable only for private networks. This is why MPLS is sold by a single provider, even if the provider patched together multiple networks behind the scenes to reach customer premises.
MPLS is a control plane protocol. It has many of the same limitations as BGP: routing is decided by policy, not real traffic conditions, such as latency or packet loss. Providers are careful about bandwidth management to maintain their SLAs.
The combination of single-vendor lock-in and the need for planning and overprovisioning to maintain SLAs makes these private networks a premium, expensive product. As the rest of the Internet, with its open architecture, has become increasingly competitive and cost-efficient, MPLS faces pressure. As a backbone implementation, it is not likely to ever become affordable.
A Way Forward
The Internet just works. Not flawlessly, not optimally, but packets generally reach their destination. The basic structure of the Internet has not changed much over the past few decades, and has proven itself probably beyond the wildest expectations of its designers.
However, it has key limitations:
The data plane is clueless. Routers, which form the data plane, are built for traffic load, and are therefore stateless, and have no notion of individual packet or traffic flows.
Control plane intelligence is limited. Because the control plane and the data plane are not communicating, the routing decisions are not aware of packet loss, latency, congestion, or actual best routes.
Shortest path selection is abused: Service providers’ commercial relationships often work against the end user interest in best path selection.
The limited exchange between the control and data planes has been taken to the extreme in OpenFlow and Software-defined Networking (SDN): the separation of the control plane and data plane into two different machines. This might be a good solution for cutting costs in the data center, but to improve global routing, it makes more sense to substantially increase information sharing between the control plane and the data plane.
To solve the limitations of the Internet, it’s time to converge the data and control planes so that they work closely together, are both aware of actual traffic metrics, and dynamically select the best path.
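One way to picture this convergence, as a sketch under stated assumptions rather than a description of any existing system: the data plane reports per-packet observations, and path selection uses a smoothed view of them. The class name, smoothing factor, and loss penalty are all hypothetical.

```python
class PathStats:
    """Smoothed latency and loss for one path, fed by data-plane samples."""

    def __init__(self, name, alpha=0.3):
        self.name = name
        self.alpha = alpha        # weight given to the newest sample
        self.latency_ms = None    # unknown until the first delivered packet
        self.loss = 0.0           # smoothed loss ratio between 0 and 1

    def record(self, latency_ms, lost):
        """Fold one observation into the moving averages."""
        sample_loss = 1.0 if lost else 0.0
        self.loss += self.alpha * (sample_loss - self.loss)
        if not lost:
            if self.latency_ms is None:
                self.latency_ms = latency_ms
            else:
                self.latency_ms += self.alpha * (latency_ms - self.latency_ms)

    def score(self, loss_penalty_ms=1000.0):
        """Lower is better; loss is converted into a latency penalty."""
        if self.latency_ms is None:
            return float("inf")
        return self.latency_ms + self.loss * loss_penalty_ms


def best_path(paths):
    """Choose the path with the best current measurements."""
    return min(paths, key=lambda p: p.score())
```

A path that starts dropping packets sees its score climb within a few samples, so traffic shifts away from it without waiting for a routing policy change.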
This article was first published on Tech Zone 360