Building a Resilient City: How Cato Rolls Out PoP Changes Safely
Imagine a new city that promises cheap housing and ultra-modern infrastructure. People move in, only to discover that the roads are constantly jammed, power cuts happen every evening, water pressure drops without warning, and there are no cameras or sensors to detect where things are breaking. There is no central control room to test changes safely before the next “improvement” hits the streets. It does not matter how attractive the city looked on paper. With unreliable infrastructure and no easy way to maintain it, people will not stay.
Your global SASE deployment is that city. Users, sites and applications depend on it every day, and they feel every “pothole” instantly.
In our earlier blog, “Gradual by Design: What the Cloudflare Outage Reveals About Robust SASE Architecture and Operations,” we explored how a single change in a global platform can ripple into a broad incident. In this blog, we go one level deeper and focus on operations. We look behind the scenes at how Cato rolls out changes to its PoPs: gradual, phased deployments instead of one-step upgrades across the entire cloud, pre-deployment staging, post-deployment verification, and deep monitoring that ties every phase to clear signals. The goal is simple: continuous innovation with built-in resiliency and controlled risk, so your traffic does not become the experiment.
Why rollout strategy matters
Recent high-profile outages at large cloud and edge providers have shown how dangerous a single, global change can be. A configuration update or new software build that goes everywhere at once can turn into a global incident in minutes, long before operators have time to react. In our analysis of the Cloudflare outage, we showed how a single configuration refresh, propagated globally and quickly, can undermine even a mature network when there are insufficient guardrails around rollout and monitoring.
This is exactly why Cato SASE uses gradual, phased rollouts with robust monitoring and verification at each step. Instead of pushing a new version to all PoPs in one shot, we introduce changes in phases and waves:
- Each phase is a mix of PoPs that includes different regions, for example, APAC, EMEA, Americas, Israel, and Japan.
- The rollout is spread across multiple weeks, not hours. The early phases are intentionally small and include all the main flavors of our services and infrastructure, serving as a safety net to help detect specific issues that might not be apparent in testing.
- Within a phase, we can split further into sub-phases to avoid restarting too many PoPs at the same time.
We do not rebuild the whole “city” at once. We modernize district by district, with clear criteria to move forward or pause.
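To make the phase structure concrete, here is a minimal sketch in Python of how such a rollout plan could be modeled. The class names, PoP names, and structure are hypothetical illustrations, not Cato's internal tooling.

```python
# Illustrative only: a simplified model of phases and sub-phases in a PoP
# rollout. Names and structure are assumptions for the example.
from dataclasses import dataclass, field

@dataclass
class SubPhase:
    name: str
    pops: list[str]        # PoPs restarted together in one maintenance window

@dataclass
class Phase:
    name: str
    regions: list[str]     # a deliberate mix, e.g. APAC, EMEA, Americas
    sub_phases: list[SubPhase] = field(default_factory=list)

# An early phase is intentionally small but still spans several regions and
# service flavors, so region- or flavor-specific issues surface before the
# wide waves that follow over the next weeks.
phase_1 = Phase(
    name="phase-1",
    regions=["APAC", "EMEA", "Americas"],
    sub_phases=[
        SubPhase("wave-a", pops=["pop-tokyo-2", "pop-frankfurt-1"]),
        SubPhase("wave-b", pops=["pop-ashburn-3"]),
    ],
)
```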
Governance before deployment
Change governance is the first layer of resiliency. New PoP software versions are produced regularly, but there is a clear separation between “a build exists” and “it can carry live traffic.” Every new branch or patch goes through:
- Version selection that includes both manual testing and automated validation before it is approved for production.
- A formal handoff to network operations.
- Internal approval in our change management systems before it can reach production PoPs.
- A real-time permission grant for the actual execution.
When a patch is needed to address an issue seen in the field, we treat it with the same discipline. We may choose to redeploy early phases, or to move forward only in later phases, depending on risk and urgency. In all cases, there is an explicit approval step before a new revision is allowed into the rollout plan.
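As a rough illustration of how such gates can be enforced before execution, consider the following sketch. The gate names and function are assumptions that mirror the steps above; they are not Cato's actual change management system.

```python
# Illustrative only: a hypothetical pre-flight check that refuses to start a
# rollout unless every governance gate has been recorded for the version.
REQUIRED_GATES = (
    "version_validated",       # manual testing + automated validation passed
    "handoff_to_netops",       # formal handoff to network operations
    "change_record_approved",  # approval in the change management system
    "execution_permitted",     # real-time permission for the actual execution
)

def may_enter_rollout(version: str, gate_status: dict[str, bool]) -> bool:
    """Return True only if every governance gate is satisfied for this version."""
    missing = [gate for gate in REQUIRED_GATES if not gate_status.get(gate)]
    if missing:
        print(f"Blocking {version}: missing gates {missing}")
        return False
    return True

# A patch built to fix a field issue goes through the same gates.
may_enter_rollout("pop-sw-42.1-patch3", {"version_validated": True})
```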
For customers, the important part is this: only vetted, approved versions are allowed to serve your traffic.
Staging before activation
Once a version is approved, it still does not go directly into a restart. We separate the deployment into two main steps.
- Staging (warm-up): The new software is delivered to the target PoPs over management channels ahead of time.
- No user impact.
- No restarts.
- Can be done outside maintenance windows.
By the time we reach the deployment window, PoPs already have the new bits locally. This shortens the actual maintenance activity and reduces risk.
- Deployment in maintenance windows: The actual rollout for a phase is executed during planned maintenance windows. This step restarts the relevant PoP services and re-establishes tunnels.
- Can run per phase or per sub-phase, to control how many PoPs update at once.
- Includes a controlled failover so that the new version is exercised as the active one and the previous version remains as the backup.
This separation between “delivery” and “activation” is a key resiliency pattern. It gives us much more control over when users feel a change, and how many PoPs are affected at any given moment. In Figure 1, we show an example of a high-level view of a deployment phase that lists all PoPs in that phase, each colored by the software revision they are currently running. Cato’s Network Operators can instantly see whether all PoPs reached the target version, whether any PoP is still on an older revision, and whether any PoP failed to report back after the deployment.
Figure 1. Example of phased rollout verification view
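To illustrate the separation between delivery and activation, here is a minimal sketch. The function names, the management-channel transport, and the window handling are assumptions for the example, not Cato's deployment system.

```python
# Illustrative only: "delivery" stages the new build with no restart, while
# "activation" restarts services for one sub-phase inside its maintenance window.
from datetime import datetime, timezone

def stage_version(pops: list[str], version: str) -> None:
    for pop in pops:
        # Push the package ahead of time: no user impact, no restart, and it
        # can run outside maintenance windows.
        print(f"[stage] delivering {version} to {pop} over the management channel")

def activate_in_window(pops: list[str], version: str,
                       window_start: datetime, window_end: datetime) -> None:
    now = datetime.now(timezone.utc)
    if not (window_start <= now <= window_end):
        raise RuntimeError("Refusing to activate outside the maintenance window")
    for pop in pops:
        # Controlled failover: the new version becomes active while the
        # previous version remains in place as the backup.
        print(f"[activate] restarting services on {pop}, failing over to {version}")
```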
Treating each phase as a small experiment
Every phase is treated as a small, controlled experiment with a clear before and after.
Before deployment for a phase or sub-phase, we:
- Capture the current number of PoPs in the phase.
- Check existing service statuses and alerts for those PoPs.
- Identify any PoPs that are already in maintenance for other changes, and decide whether to skip them temporarily.
This snapshot becomes our reference point.
After deployment, we verify that:
- All targeted PoPs are running the desired software revision.
- The number of PoPs in the phase has not decreased, which would indicate a PoP that failed to come back correctly.
- No new service issues appeared that were not present in the pre-deployment snapshot.
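A minimal sketch of this before/after comparison could look like the following. The snapshot fields and helper function are simplified assumptions that mirror the checks above.

```python
# Illustrative only: comparing a pre-deployment snapshot with a post-deployment
# one to decide whether a phase looks healthy.
from dataclasses import dataclass

@dataclass
class PhaseSnapshot:
    pop_count: int              # PoPs reporting in the phase
    revisions: dict[str, str]   # PoP name -> running software revision
    open_alerts: set[str]       # alert identifiers active on those PoPs

def verify_phase(before: PhaseSnapshot, after: PhaseSnapshot,
                 target_revision: str) -> list[str]:
    """Return a list of findings; an empty list means the phase looks healthy."""
    findings = []
    if after.pop_count < before.pop_count:
        findings.append("a PoP failed to come back correctly after the restart")
    stale = [pop for pop, rev in after.revisions.items() if rev != target_revision]
    if stale:
        findings.append(f"PoPs still on an older revision: {stale}")
    new_alerts = after.open_alerts - before.open_alerts
    if new_alerts:
        findings.append(f"service issues not present before deployment: {new_alerts}")
    return findings
```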
To support this, we rely on both dashboards and targeted alerting.
Figure 2 shows our PoP health and post-deployment monitoring dashboard: a snapshot highlighting key health indicators such as uptime, tunnel stability, latency, and service statuses. This view helps Cato’s NOC validate the health of a phase after deployment and quickly identify any PoP that behaves differently from the rest of its group.
Figure 2. PoP health and monitoring dashboard
Humans can miss things, especially when changes are frequent. To reduce that risk, dedicated monitoring specifically examines the time window immediately following a deployment. If a new service status or anomaly appears on a PoP shortly after a phased update, an alert is triggered and tied to that deployment context. This makes it easier to answer the critical question: “Did the new code introduce this, or was it already there?” If there is any doubt, the next phases can be paused while the team investigates.
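As a simplified illustration of tying an anomaly to its deployment context, consider the sketch below. The two-hour observation window and the data shapes are assumptions, not Cato's actual alerting configuration.

```python
# Illustrative only: classify an anomaly by whether it appeared in the window
# immediately following a phased update on that PoP.
from datetime import datetime, timedelta, timezone

POST_DEPLOY_WINDOW = timedelta(hours=2)   # hypothetical observation window

def classify_anomaly(pop: str, anomaly_time: datetime,
                     last_deploy: dict[str, datetime]) -> str:
    deployed_at = last_deploy.get(pop)
    if deployed_at and deployed_at <= anomaly_time <= deployed_at + POST_DEPLOY_WINDOW:
        # The alert carries the deployment context, so the NOC can ask:
        # did the new code introduce this, or was it already there?
        return "post-deployment anomaly: pause later phases and investigate"
    return "not tied to the recent rollout: handle as a regular service alert"

now = datetime.now(timezone.utc)
print(classify_anomaly("pop-frankfurt-1", now,
                       {"pop-frankfurt-1": now - timedelta(minutes=30)}))
```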
Built-in resiliency tools
Gradual rollout and monitoring are only part of the story. Resiliency also depends on what you can do when something does go wrong.
Here are some of the mechanisms we use behind the scenes:
- Standby upgrade for fault scenarios: In some cases, we can pre-position an alternative, validated revision on PoPs without activating it. If a PoP encounters a predicted fault condition or a soft lockup issue, it can automatically switch to that standby revision. If no such issue occurs before the next regular deployment, the standby package is simply cleared. This provides another tool to protect stability in dynamic situations.
- Quick mitigation and rollback options: If a single PoP behaves oddly compared to others in the same location, operators can perform a focused service restart on that PoP to restore normal behavior while root cause is investigated. For more severe issues or if a new revision causes repeated failures, we can roll back a PoP to a previously known good snapshot that includes both software and critical configuration. Hard fault issues that cause availability problems on a machine are also monitored and automatically handled, with escalation to engineering when necessary.
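The escalation logic behind these options can be pictured roughly as follows. This is only a sketch of the decision ladder described above; in practice operators and automated fault handling drive these choices, and the function here is a hypothetical simplification.

```python
# Illustrative only: a simplified decision ladder for containing a misbehaving PoP.
def choose_mitigation(single_pop_outlier: bool, repeated_failures: bool,
                      hard_fault: bool) -> str:
    if hard_fault:
        # Availability-impacting machine faults are handled automatically,
        # with escalation to engineering when necessary.
        return "automatic fault handling, escalate to engineering"
    if repeated_failures:
        # Roll back to a previously known-good snapshot that includes both
        # software and critical configuration.
        return "roll back the PoP to a known-good snapshot"
    if single_pop_outlier:
        # A focused service restart restores normal behavior while the root
        # cause is investigated.
        return "targeted service restart on the affected PoP"
    return "continue monitoring"
```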
The goal is not to pretend that software will never have faults. The goal is to have clear, proven ways to contain issues, recover quickly, and keep the rest of the network safe.
Aligned with security and compliance frameworks
Although this blog focuses on resiliency and customer experience, the same practices are also expected by modern security and compliance frameworks for operators of critical infrastructure.
- PCI DSS: The way we approve versions before production, use controlled maintenance windows, and validate behavior after each phase is consistent with PCI DSS expectations around documented change control, testing changes before they go live, and continuous monitoring of in-scope systems. In our dedicated blog about achieving PCI DSS v4.0.1 certification, we describe the control framework behind Cato SASE in detail. The gradual rollout and monitoring practices in this post are the operational side of that story. They are part of how we turn written requirements about change management and monitoring into day-to-day behavior in our PoPs.
- SOC 2 (Security and Availability): SOC 2 emphasizes formal change management, separation of duties, and ongoing monitoring to protect service availability. Our staging, phased rollout, and post-deployment monitoring reflect exactly this type of operational discipline behind a cloud service.
- ISO/IEC 27001: ISO 27001 requires that changes to information systems are managed in a controlled way and that systems are monitored for anomalies. The multi-step process described here, from approvals and phased rollout to verification and rollback, is how Cato implements these principles in a global SASE cloud.
- NIST style best practices: NIST guidance around secure operations and configuration management stresses minimizing blast radius, maintaining rollback options, and observing systems closely after changes. Our gradual, monitored, and reversible deployment strategy is designed with that mindset.
For customers, this means that the operational practices behind Cato SASE are not only about comfort and uptime. They follow patterns that auditors and security teams already recognize as good practice.
What this means for you
Putting it all together, here is what you gain from this approach:
- Reduced blast radius: No single change is introduced everywhere at once. Phases and sub-phases limit the impact of any unexpected issue.
- Predictable change windows: Deployments occur during planned maintenance windows, with staging completed ahead of time to minimize the duration of those windows.
- Deep operational visibility: Dashboards and focused alerts make it clear what happened before, during and after each deployment phase.
- Fast, controlled recovery options: Patch deployments, standby upgrades, service restarts and rollbacks provide multiple ways to restore stability if needed.
- Continuous innovation without constant risk: Cato can keep improving the platform while keeping your SASE environment stable, observable and aligned with modern regulatory expectations.
In other words, the “city” behind your SASE platform is not just growing fast. It is planned, instrumented, and monitored so that when we build a new road, your traffic keeps moving.
