Vadim Freger

A Brief History of Graduality

19/07/2024

4m read

In the early hours of July 19th, 2024, CrowdStrike endpoints on Windows machines worldwide received a faulty content update, causing what is shaping up to be one of the largest global IT outages to date. Reports poured in from all over the world of Windows workstations and servers stuck in a boot loop with a BSOD, impacting airlines, airports, banks, hospitals and other critical infrastructure such as emergency services call centers, and the list goes on.

Many details, including a detailed RCA from CrowdStrike, will surely follow and shed more light on why an update was pushed to the entire install base and how it passed testing. Until then, nothing but our best wishes for our colleagues at CrowdStrike managing this incident.

Nonetheless, this is a good opportunity to discuss and highlight Cato’s Gradual Deployment Model, which is at the very core of how we manage our cloud service and the managed endpoints using the Cato Client.

Graduality, and more graduality

At Cato there isn’t a single stricter guideline throughout the entire Engineering and Operations organization than graduality. It is without a doubt the most closely followed guideline, whether in coding practices, performing production changes or publishing new software updates.

In simple terms, nothing is EVER executed on everything all at once. That ‘everything’ can be servers in our Cato SASE Cloud service (e.g. cloud PoPs, backend management services, Kubernetes clusters, etc.), managed Socket devices or Cato Clients running on the endpoints of our customers.
Over the years we’ve developed multiple dedicated infrastructures and feature suites serving this methodology, including deployment automation with real-time failure checks between deployment phases, and features giving admins full control over how they manage updates of Cato Sockets and Cato Clients inside their organization. Graduality allows them to do so at a pace that is acceptable and meets the parameters each IT organization sets for itself, providing the necessary time between every phase and update group so that if something goes wrong there is time to discover it and reduce the impact radius.

Cato Client Gradual Rollout - Client Upgrade Policy

For comparison, we will highlight the way Cato manages updates to its Cato Client, which, similarly to the CrowdStrike agent, is installed on all workstations of the organization.

When a new client version is approved for release, following extensive automation and regression testing, it goes into a release pipeline that is managed from start to finish. New client versions are distributed gradually between groups of customers and are never made available to all the groups at once.

It is worth mentioning that Cato employs “dogfooding”: the very first clients to be upgraded are all the Cato Clients managed by Cato’s own IT department, using the same tools and methods as our customers, as a final gate of quality control.

At the scope of a specific customer for which an update has been made available, the IT administrator can control how the client is published to the users within the organization using the Client Upgrade Policy. The Client Upgrade Policy is a native graduality mechanism that the admin uses to control the pace of Client upgrades, with the granularity to control different rollouts based on the endpoint platform.
Initially, a “Pilot Group” of users receives the update. These are typically IT members and other early adopters who can identify and report any issues first.

After the Pilot Group, the client update continues to roll out gradually to the rest of the install base, with the administrator able to track the progress in the CMA and pause the update at any moment if required.

Figure 1 – Client Rollout screen showing multiple client versions and their rollout status

Summary

This recent global outage highlights the critical need for robust deployment practices. At Cato Networks, our [quite overzealous] commitment to gradual deployment models ensures that any changes or updates to our cloud services and endpoints are meticulously controlled and monitored. By deploying updates in phases and giving IT teams the tools and fine-tuned control of Client updates, we minimize the risk of widespread disruptions and provide ample time to detect and address issues early. This approach not only enhances the reliability of our services but also gives our customers confidence in the stability of their IT operations.
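The phased approach described in this post (internal dogfooding first, then a pilot group, then everyone else, with a health check between phases) can be illustrated with a minimal Python sketch. It is not Cato's actual pipeline: the `Phase` type, the `deploy` and `is_healthy` callbacks and the `max_failure_ratio` parameter are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Phase:
    """One rollout group, e.g. internal IT, a pilot group, or the rest."""
    name: str
    targets: list[str]


def gradual_rollout(
    phases: Iterable[Phase],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_ratio: float = 0.0,
) -> list[str]:
    """Deploy phase by phase and halt before the next phase on failures.

    Returns the names of the phases that completed within the allowed
    failure ratio. Both callbacks are placeholders for real deployment
    and monitoring hooks.
    """
    completed = []
    for phase in phases:
        for target in phase.targets:
            deploy(target)
        failures = [t for t in phase.targets if not is_healthy(t)]
        if phase.targets and len(failures) / len(phase.targets) > max_failure_ratio:
            # Stop here so later phases never receive the faulty update.
            break
        completed.append(phase.name)
    return completed
```

With three phases where one pilot endpoint reports unhealthy, the rollout stops after the pilot group and the final, largest group is never touched, which is the "impact radius" reduction the post describes.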

CVE-2024-6387 OpenSSH RCE vulnerability (“regreSSHion”) – Cato Networks impact and analysis

04/07/2024

4m read

Vadim Freger

Dolev Moshe Attiya

TL;DR – Multiple versions of OpenSSH are vulnerable to remote code execution. There is no working public PoC, and researchers have only been able to exploit the vulnerability under unique lab conditions.

- Cato’s cloud infrastructure is NOT impacted.
- Cato Sockets use one of the vulnerable OpenSSH versions. Patches containing an upgrade to the latest OpenSSH version are in the testing phase and will be released to the field for all supported Socket platforms (physical & virtual) for the following Socket versions:
  - Version 19 – last stable
  - Version 20 – latest
- Cato Sockets by default do NOT have a publicly exposed SSH interface. It is always recommended to keep the Cato Socket LAN interface exposed only internally and to use comprehensive network access controls to manage SSH access.

Vulnerability overview

Researchers from Qualys published their findings on July 1st and, as with all pet CVEs making big news in the industry, deemed it worthy of a name: “regreSSHion”, because it was caused by a change to previously fixed OpenSSH code that reintroduced an old bug as a regression.

OpenSSH is one of the most widely used suites of tools on Unix-based systems, used all over the world for securing communications to servers over the internet, secure file transfers and more. It is considered one of the more secure applications in the Unix world. To quote the researchers from Qualys, “this vulnerability is one slip-up in an otherwise near-flawless implementation”, and CVEs such as this one are very rare indeed.

Impacted OpenSSH versions are:
- OpenSSH versions earlier than 4.4p1
- OpenSSH versions from 8.5p1 through 9.7p1

* Versions from 4.4p1 up to, but not including, 8.5p1 are not vulnerable due to a previously applied patch for a different vulnerability (CVE-2006-5051).
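As a rough illustration of the version ranges listed above, the following Python sketch classifies an OpenSSH version string. It is a simplification under stated assumptions: only the major.minor number is compared and the portable suffix (`p1`, `p2`) is ignored, and the function names are made up for this example. Distributions frequently backport fixes without changing the version string, so a match here means "check the vendor advisory", not "confirmed vulnerable".

```python
import re


def parse_openssh_version(text: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as 'OpenSSH_8.9p1 Ubuntu'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def in_affected_range(text: str) -> bool:
    """True if the version falls in a range listed as impacted.

    Impacted: earlier than 4.4, or from 8.5 through 9.7 (inclusive).
    Backported distribution patches are NOT taken into account.
    """
    version = parse_openssh_version(text)
    return version < (4, 4) or (8, 5) <= version <= (9, 7)
```

For example, `in_affected_range("OpenSSH_8.9p1")` is true, while `in_affected_range("OpenSSH_7.4p1")` and `in_affected_range("OpenSSH_9.8p1")` are false.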
In the research published by Qualys, the attack was demonstrated only under lab conditions and only against a 32-bit system, where it takes 6 to 8 hours on average to succeed. An attack against 64-bit systems was not demonstrated and would likely take several orders of magnitude longer.

Analysis of the vulnerability

The vulnerability was introduced to newer OpenSSH versions in October 2020 and is a regression of CVE-2006-5051. That issue was originally fixed in version 4.4p1; a later code change inadvertently brought it back (hence the regression) and made versions starting from 8.5p1 vulnerable.

The exploit leverages a race condition in the signal handler of sshd, the server component of OpenSSH. If the client fails to complete the authentication process within LoginGraceTime (which by default is 120s or 600s depending on the version in use), a SIGALRM signal is raised. Its signal handler runs asynchronously and calls functions that are not async-signal-safe while running with root privileges, which the researchers were able to exploit to run arbitrary code and gain root shell access.

The researchers used a uniquely crafted lab environment to prove the RCE, working to circumvent multiple protections that all modern operating systems employ against access to running memory, e.g. ASLR. In the lab, using a 32-bit server and a low-latency network connection, it took an average of 6 to 8 hours and approximately 10,000 connection attempts to obtain a root shell.

On top of the very long time to exploit, the massive number of connections needed is likely to be flagged by network monitoring systems and is an easy vector to identify and block. For the time being, the attack is extremely complicated to perform in real-world conditions, and mitigations such as using fail2ban and limiting public access to OpenSSH (which is ALWAYS recommended) make it nearly impossible to exploit.
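To illustrate why thousands of connection attempts are easy to spot, here is a minimal sliding-window counter in Python, in the spirit of what tools like fail2ban do. It is only a sketch: the class name, the window length and the threshold are arbitrary assumptions for this example, and real tools work from log files and apply firewall rules rather than returning a boolean.

```python
from collections import defaultdict, deque


class AttemptMonitor:
    """Flags a source that opens too many connections within a time window."""

    def __init__(self, window_seconds: float, threshold: int) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events: dict[str, deque] = defaultdict(deque)

    def record(self, source_ip: str, timestamp: float) -> bool:
        """Record one unauthenticated connection attempt.

        Returns True when the source has exceeded the threshold within
        the window ending at `timestamp`.
        """
        events = self._events[source_ip]
        events.append(timestamp)
        # Drop attempts that have fallen out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.threshold
```

An attacker needing roughly 10,000 attempts over several hours would trip even a generous threshold (say, a few dozen failed attempts per ten minutes) almost immediately.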
Public exploitation & prevention

No indications of exploitation attempts targeting Cato customers were found. While PoC code claiming to exploit the vulnerability has surfaced, Cato’s security research team has determined, including through internal tests on Cato Sockets, that it is not in fact a viable exploit and would not result in an RCE. However, it does lay a good foundation for exploiting this vulnerability, and we expect more attempts to be released soon.

Cato’s security research team continues to analyze this threat to determine the possible exploitation avenues and how existing prevention policies address them, and to introduce new logic that addresses the issue specifically.

Summary

A remote code execution vulnerability was discovered in multiple versions of OpenSSH. There is no working public PoC available, and exploitation in real-world conditions ranges from impractical to nearly impossible. Nonetheless, given the high profile of the CVE and the quickly evolving landscape, it is important that all systems are patched in case an exploit PoC surfaces in the future. Just as important are strict network access controls limiting public SSH access, including of course for Cato Sockets, which should never be internet facing on the management side.

Cato Networks’ Impact and analysis of CVE-2024-3661 – “TunnelVision” VPN vulnerability

23/05/2024

6m read

Vadim Freger

Matan Mittelman

On May 6th, 2024, researchers from the Leviathan Security Group published an article detailing a technique to bypass most VPN applications, tracked as CVE-2024-3661 with a High CVSS score of 7.6. The researchers labeled this technique “decloaking”: while the VPN tunnel remains connected, it allows attackers to trick many VPN clients into sending traffic via a side channel and not through the encrypted tunnel. Traffic flows through the side channel unencapsulated and can be snooped on by an attacker.

The attack requires introducing a rogue DHCP server to the local network. This is not easy on well-maintained networks that use trust zones and DHCP snooping features to prevent this attack vector. Notably, the threat of an adjacent attacker on the local network is not limited to DHCP alone; untrusted networks may pose various other threats, such as ARP poisoning, LLMNR poisoning, and so on.

In the case presented in the article, the malicious DHCP server poisons the routing table of its neighbor on the local network. Because DHCP communication is broadcast in nature and UDP based, i.e. little source verification is performed, fabricating responses is easy and can be done in various ways. Specifically, the response sent by the malicious server to a lease request on the local network uses option 121 [RFC 3442], which allows the DHCP server to push classless static routes into the neighboring client’s routing table.

Cato Client impact and recommended actions

The affected operating systems are:
- Windows
- Linux
- macOS
- iOS

* Android is unaffected by the technique since it does not implement support for DHCP Option 121 at all.

For recommendations for Windows Client users, see below. We are additionally working on updates for the other affected operating systems, and updates will be issued as they become available.
Cato customers using the Windows Client may use a registry key to enable the “Delete Static Routes” feature on the Client, effectively configuring the Client to delete all static routes that are not managed by Cato upon connecting. The configuration takes effect the next time the Client connects to the Cato cloud. If Always-On is enabled, users may need to bypass Always-On. For more information on how to bypass Always-On, see here. Also, if there are legitimate reasons for static routes to be present, this configuration may conflict with those routes, and this should be taken into account.

Registry key details:
- Location: Computer\HKEY_LOCAL_MACHINE\SOFTWARE\CatoNetworksVPN
- Name: DeleteStaticRoutes
- Value: 1 (type: DWORD)

The one-liner below can also be used on Windows, or distributed using known methods such as MDM tools or GPO policies:

reg add "HKEY_LOCAL_MACHINE\SOFTWARE\CatoNetworksVPN" /v DeleteStaticRoutes /t REG_DWORD /d 1 /f

To improve security in managed networks or in scenarios involving public or otherwise untrusted networks, these additional recommendations may be used to mitigate the vulnerability:
- Mitigating DHCP attacks on local networks: Admins can enable configurations on network switches such as DHCP Snooping to protect the network from the introduction of a rogue DHCP server.
- Use Cellular Hotspots: Using a cellular network instead of public Wi-Fi mitigates the risk, as the network is controlled by the mobile device.
- Disable Option 121: Disable it on endpoints where possible, keeping in mind that this may disrupt some network connectivity.

Cato Networks is not aware of any malicious exploitation of its ZTNA using this technique.

Details of the attack

When a VPN client operates, it begins by creating an encrypted version of the original packet received from its virtual network interface. This encrypted packet is then encapsulated within the VPN protocol layer, allowing secure communication with the VPN server.
Upon establishing a connection with the VPN server, the VPN client modifies the host's network settings to route all traffic through this secure tunnel.

The Role of DHCP in Network Configuration

DHCP (Dynamic Host Configuration Protocol) plays a critical role in network management by automatically assigning IP addresses and configuring network settings for devices on a network, ensuring seamless connectivity and efficient use of IP address space.

One of the advanced features of DHCP is Option 121, introduced in RFC 3442. Option 121, also known as the “Classless Static Route Option”, allows network administrators to define classless static routes for clients, specifying both the destination subnet and the gateway address for each route. This enables traffic to be directed precisely to specific subnets. For example, administrators can use Option 121 to route traffic for a particular subnet through a different gateway than the default, such as a designated security appliance or monitoring system.

Methods of exploitation

The prerequisite is for an attacker to have their own malicious DHCP server in the network and for targeted users to treat it as the legitimate DHCP server. There are several methods by which an attacker on the same network as the targeted user can position themselves as the DHCP server:
- DHCP Starvation Attack: By performing a DHCP starvation attack against the legitimate DHCP server, the attacker can exhaust its available IP addresses and have the rogue DHCP server respond to new clients instead.
- Race Condition Exploitation: The rogue DHCP server can race to respond to DHCPDISCOVER broadcasts, taking advantage of the common client behavior of accepting the first lease offer received.
- ARP Spoofing: The attacker can use ARP spoofing to intercept traffic between the legitimate DHCP server and clients, then wait for clients to renew their leases, redirecting them to the rogue DHCP server.

Attack Execution

Once a malicious DHCP server is deployed on the same network as the targeted VPN user, it is configured to advertise itself as the default gateway. When traffic reaches this gateway, traffic forwarding rules are applied to relay it to the legitimate gateway, allowing traffic to be monitored and inspected while traversing the malicious server, effectively performing an Adversary-in-the-Middle (AitM) attack.

Utilizing DHCP Option 121

A crucial part of the attack involves leveraging DHCP option 121 to inject custom routes into the VPN user’s routing table. Arbitrary routes can be set, including multiple routes if needed. Because the pushed routes are more specific than the default /0 CIDR range used by most VPNs, they take priority over the routes for the VPN’s virtual interface. For instance, by setting two /1 routes, the attacker can override the 0.0.0.0/0 all-traffic rule set by most VPNs.

Injecting these routes causes network traffic to be directed through the same interface as the rogue DHCP server, bypassing the VPN’s virtual interface. As a result, the traffic routed this way is not encrypted by the VPN and is instead transmitted via the network interface interacting with the DHCP server.

Summary

The "decloaking" technique highlights a vulnerability in VPN applications, allowing attackers to reroute traffic outside the encrypted tunnel. By exploiting DHCP, and specifically Option 121, attackers can manipulate routing tables and compromise network security. The attack is not trivial to carry out, especially on well-maintained networks, and does not directly compromise the user. Rather, it puts the attacker in a position to snoop on traffic, which in most scenarios is already encrypted (e.g. with HTTPS/TLS) before it enters the VPN. This discovery underscores the importance of securing DHCP configurations and being vigilant on public networks.
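The route-priority effect described under “Utilizing DHCP Option 121” comes down to longest-prefix matching, which can be shown with Python's standard `ipaddress` module. This is a simplified model: real operating systems also weigh route metrics and interface state, and the interface names `vpn0` and `eth0` and the `select_route` helper are placeholders chosen for this example.

```python
import ipaddress


def select_route(destination: str, routes: list[tuple[str, str]]) -> tuple[str, str]:
    """Pick the (prefix, interface) route with the longest matching prefix.

    `routes` is a list of (CIDR prefix, interface name) pairs. Metrics
    and other tie-breakers used by real routing tables are ignored.
    """
    dest = ipaddress.ip_address(destination)
    matching = [
        (ipaddress.ip_network(prefix), iface)
        for prefix, iface in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matching:
        raise LookupError(f"no route to {destination}")
    network, iface = max(matching, key=lambda route: route[0].prefixlen)
    return str(network), iface


# Routing table as set up by a typical VPN client: everything via the tunnel.
vpn_only = [("0.0.0.0/0", "vpn0")]

# The same table after a rogue DHCP server pushes two /1 routes via Option 121.
poisoned = vpn_only + [("0.0.0.0/1", "eth0"), ("128.0.0.0/1", "eth0")]
```

With `vpn_only`, every destination resolves to `vpn0`. With `poisoned`, both halves of the IPv4 address space match a /1 route, which is more specific than /0, so traffic leaves through `eth0` even though the VPN's default route is still present and the tunnel still appears connected.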

When Patch Tuesday becomes Patch Monday – Friday

01/03/2024

4m read

Vadim Freger

If you’re an administrator running Ivanti VPN (Connect Secure and Policy Secure) appliances in your network, then the past two months have likely made you wish you weren't.

In a relatively short timeframe, bad news kept piling up for Ivanti Connect Secure VPN customers. It started on Jan. 10th, 2024, when Ivanti disclosed critical and high severity vulnerabilities, CVE-2024-21887 and CVE-2023-46805 respectively, impacting all supported versions of the product. Chaining these vulnerabilities, a command injection weakness and an authentication bypass, could result in remote code execution on the appliance without any authentication. This enables complete device takeover and opens the door for attackers to move laterally within the network.

This was followed three weeks later, on Jan. 31st, 2024, by two more high severity vulnerabilities, CVE-2024-21888 and CVE-2024-21893, prompting CISA to supersede its previous directive to patch the two initial CVEs by ordering all U.S. Federal agencies to disconnect all Ivanti appliances from the network “as soon as possible” and no later than 11:59 PM on February 2nd.

As patches were gradually made available by Ivanti, the recommendation by CISA and Ivanti themselves has been not only to patch impacted appliances but to first factory reset them and then apply the patches, to prevent attackers from maintaining upgrade persistence. It goes without saying that the downtime and the amount of work required from security teams to maintain the business’ remote access are, putting it mildly, substantial.
In today’s “work from anywhere” market, businesses cannot afford downtime of this magnitude: the loss of employee productivity that occurs when remote access is down has a direct impact on the bottom line.

Security teams and CISOs running Ivanti and similar on-prem VPN solutions need to accept that this security architecture is fast becoming, if not already, obsolete and should remain a thing of the past. Migrating to a modern ZTNA deployment, more-than-preferably as part of a single vendor SASE solution, has countless benefits. Not only does it immensely increase security within the network, stopping lateral movement and limiting the “blast radius” of an attack, but it also alleviates the burden of patching, monitoring and maintaining the bottomless pit of geographically distributed physical appliances from multiple vendors.

Cato Networks SASE Threat Research Report H2/2022: https://www.catonetworks.com/resources/cato-networks-sase-threat-research-report/

Details of the vulnerabilities

- CVE-2023-46805: Authentication Bypass (CVSS 8.2). Found in the web component of Ivanti Connect Secure and Ivanti Policy Secure (versions 9.x and 22.x). Allows remote attackers to access restricted resources by bypassing control checks.
- CVE-2024-21887: Command Injection (CVSS 9.1). Identified in the web components of Ivanti Connect Secure and Ivanti Policy Secure (versions 9.x and 22.x). Enables authenticated administrators to execute arbitrary commands via specially crafted requests.
- CVE-2024-21888: Privilege Escalation (CVSS 8.8). Discovered in the web component of Ivanti Connect Secure (9.x, 22.x) and Ivanti Policy Secure (9.x, 22.x). Permits users to elevate privileges to those of an administrator.
- CVE-2024-21893: Server-Side Request Forgery (SSRF) (CVSS 8.2). Present in the SAML component of Ivanti Connect Secure (9.x, 22.x), Ivanti Policy Secure (9.x, 22.x), and Ivanti Neurons for ZTA. Allows attackers to access restricted resources without authentication.
- CVE-2024-22024: XML External Entity (XXE) Vulnerability (CVSS 8.3). Detected in the SAML component of Ivanti Connect Secure (9.x, 22.x), Ivanti Policy Secure (9.x, 22.x), and ZTA gateways. Permits unauthorized access to specific restricted resources.

Specifically, by chaining CVE-2023-46805, CVE-2024-21887 & CVE-2024-21893, attackers can bypass authentication and obtain root privileges, allowing for full control of the system. The first two CVEs were observed being chained together in attacks going back to December 2023, i.e. well before the publication of the vulnerabilities.

With estimates of internet connected Ivanti VPN gateways ranging from ~20,000 (Shadowserver) all the way to ~30,000 (Shodan), and with public PoCs widely available, it is imperative that anyone running unpatched versions applies the patches and follows Ivanti’s best practices to make sure the system is not compromised.

Conclusion

In times when security & IT teams are under more pressure than ever to make sure business and customer data are protected, with CISOs possibly even facing personal liability for data breaches, it has become imperative to implement comprehensive security solutions and to stop duct-taping various security solutions and appliances together in the network.

Moving to a fully cloud delivered single vendor SASE solution provides the full suite of modern security any organization needs, such as ZTNA, SWG, CASB, DLP, and much more, and on top of that greatly reduces the maintenance required when using multiple products and appliances. Quite simply, it eliminates the need to chase CVEs, apply patches in endless loops and deal with staff burnout.
The networking and security infrastructure is consumed like any other cloud delivered service, allowing security teams to focus on what’s important.

Demystifying GenAI security, and how Cato helps you secure your organization’s access to ChatGPT

29/02/2024

6m read

Vadim Freger

Avishay Zawoznik

Over the past year, countless articles, predictions, prophecies and premonitions have been written about the risks of AI, with GenAI (Generative AI) and ChatGPT at the center. They range from its ethics to far reaching societal and workforce implications (“No Mom, The Terminator isn’t becoming a reality... for now”).

Cato security research and engineering were so fascinated by the prognostications and worries that we decided to examine the risks to business posed by ChatGPT. What we found can be summarized in several key conclusions:
- There is presently more scaremongering than actual risk to organizations using ChatGPT and the like.
- The benefits to productivity far outweigh the risks.
- Organizations should nonetheless deploy security controls to keep their sensitive and proprietary information from being used in tools such as ChatGPT, since the threat landscape can shift rapidly.

Concerns explored

A good deal of said scaremongering is around the privacy aspect of ChatGPT and the underlying GenAI technology. The concern is what exactly happens to the data being shared in ChatGPT: how it is used (or not used) to train the model in the background, how it is stored (if it is stored), and so on.

The issue is the risk of data breaches and leaks of a company’s intellectual property when users interact with ChatGPT. Some typical scenarios:
- Employees using ChatGPT – A user uploads proprietary or sensitive information to ChatGPT, such as a software engineer uploading a block of code to have it reviewed by the AI. Could this code later be leaked through replies (inadvertently or maliciously) in other accounts if the model uses that data to further train itself? Spoiler: Unlikely, and no actual demonstration of systematic exploitation has been published.
- Data breaches of the service itself – What exposure does an organization using ChatGPT have if OpenAI is breached, or if user data is exposed through bugs in ChatGPT? Could sensitive information leak this way? Spoiler: Possibly. At least one public incident was reported by OpenAI in which some users saw chat titles of other users in their account due to a bug in OpenAI’s infrastructure.
- Proprietary GenAI implementations – AI already has its own dedicated MITRE framework of attacks, ATLAS, with techniques ranging from input manipulation to data exfiltration, data poisoning, inference attacks and so on. Could an organization's sensitive data be stolen through these methods? Spoiler: Yes. Methods range from harmless, to theoretical, all the way to practical, as showcased in a recent Cato Research post on the subject. In any case, securing proprietary implementations of GenAI is outside the scope of this article.

There’s always a risk in everything we do. Going onto the internet carries a risk too, but that doesn’t stop billions of users from doing it every day. One just needs to take the appropriate precautions. The same is true of ChatGPT. While some scenarios are more likely than others, looking at the problem from a practical point of view allows one to implement straightforward security controls for peace of mind.

Everything You Wanted To Know About AI Security But Were Afraid To Ask (webinar): https://catonetworks.easywebinar.live/registration-everything-you-wanted-to-know-about-ai-security

GenAI security controls

In a modern SASE architecture, which includes CASB & DLP as part of the platform, these use-cases are easily addressable.
Cato’s platform, being exactly that, offers a layered approach to securing usage of ChatGPT and similar applications inside the organization:
- Control which applications are allowed, and which users/groups are allowed to use those applications
- Control what text/data is allowed to be sent
- Enforce application-specific options, e.g. opting out of data retention, tenant control, etc.

The initial step is defining which AI applications are allowed and which user groups are allowed to use them. This can be done by combining the “Generative AI Tools” application category with the specific tools to allow, e.g. blocking all GenAI tools and only allowing "OpenAI".

A cornerstone of an advanced DLP solution is its ability to reliably classify data, and the legacy approaches of exact data matches, static rules and regular expressions are now all but obsolete when used on their own. For example, blocking a credit card number is simple using a regular expression, but in real-life scenarios involving financial documents there are many other means by which sensitive information can leak. It would be nearly pointless to try to keep up with changing data and fine-tuning policies without a more advanced solution that just works.

Luckily, that is exactly where Cato’s ML (Machine Learning) Data Classifiers come in. This is the latest addition to Cato’s already expansive array of AI/ML capabilities integrated into the platform over the years. Our in-house LLM (Large Language Model), trained on millions of documents and data types, can natively identify documents in real time, serving as the perfect tool for such policies.

Let’s look at the scenario of blocking specific text input to ChatGPT, for example uploading confidential or sensitive data through the prompt.
Say an employee from the legal department is drafting an NDA (non-disclosure agreement) and, before finalizing it, gives it to ChatGPT to review and suggest improvements, or even just to check the grammar. This could obviously be a violation of the company’s privacy policies, especially if the document contains PII.

Figure 1 - Example rule to block upload of Legal documents, using ML Classifiers

We can go deeper

To further demonstrate the power and flexibility of a comprehensive CASB solution, let us examine an additional aspect of ChatGPT’s privacy controls. There is an option in the settings to disable “Chat history & training”, essentially letting users decide that they do not want their data to be used for training the model and retained on OpenAI’s servers. This important privacy control is disabled by default. That is, by default all chats ARE saved by OpenAI and users are opted in, something an organization should avoid in any work-related activity with ChatGPT.

Figure 2 - ChatGPT's data control configuration

A good way to strike a balance between giving users the flexibility to use ChatGPT and applying stricter controls is to only allow chats in ChatGPT that have chat history disabled. Cato’s granular CASB ChatGPT application allows for this by distinguishing in real time whether a user is opted in to chat history and blocking the connection before data is sent.

Figure 3 – Example rule for “training opt-out” enforcement

Lastly, as an alternative (or complementary) approach to the above, it is possible to configure Tenant Control for ChatGPT access, i.e. to enforce which accounts are allowed when accessing the application. In a possible scenario, an organization has corporate accounts in ChatGPT with default security and data control policies enforced for all employees, and would like to make sure employees do not access ChatGPT with their personal accounts on the free tier.

Figure 4 - Example rule for tenant control

To learn more about Cato’s CASB and DLP visit:
https://www.catonetworks.com/platform/cloud-access-security-broker-casb/
https://www.catonetworks.com/platform/data-loss-prevention-dlp/
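As a footnote to the point made earlier about legacy DLP rules: the “simple regular expression” approach to credit card numbers can be sketched in a few lines of Python. This is a generic illustration, not Cato's implementation; the pattern, the function names and the Luhn checksum step (a common way to reduce false positives) are assumptions made for this example.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return candidate card numbers in `text` that pass the Luhn check."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The rule catches a well-formed test number such as `4111 1111 1111 1111`, but returns nothing for a sentence like "Q3 revenue forecast: confidential", which is the gap the post attributes to static rules and the reason it argues for ML-based classification.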

Cato XDR Storyteller – Integrating Generative AI with XDR to Explain Complex Security Incidents

08/02/2024

7m read

Vadim Freger

Daniel Pienica

Iddo Gal

Generative AI (à la OpenAI’s GPT and the like) is a powerful tool for summarizing information and transforming text and code, all while using its highly specialized ability to “speak” in natural human language. While working with GPT APIs on several engineering projects, an interesting idea came up in brainstorming: how well would it work when asked to describe information provided as raw JSON in natural language? The data in question were stories from our XDR engine, which provide a full timeline of security incidents along with all the observed information tied to the incident, such as traffic flows, events, source/target addresses and more. When fed into the GPT model, even very early results (i.e. before prompt engineering) were promising, and we saw very high potential for a method that summarizes entire security incidents in natural language, giving SOC teams that use our XDR platform a useful tool for investigating incidents.

Thus the “XDR Story Summary” project, aka “XDR Storyteller”, came into being, integrating GenAI directly into the XDR detection & response platform in the Cato Management Application (CMA). The summaries are presented in natural language and provide a concise presentation of all the different data points and the full timeline of an incident.
Figure 1 - Story Summary in action in Cato Management Application (CMA)

These are just two examples of the many different scenarios we POCed prior to starting development:

Example use-case #1 – deeper insight into the details of an incident. GPT was able to add details to the AI summary that were not easily understood from the UI of the story, since the story is composed of multiple events. GPT could infer from a Suspicious Activity Monitoring (SAM) event that, in addition to trying to download a malicious script, the user attempted to disable the McAfee and Defender services running on the endpoint. The GPT representation is built from reading the raw JSON of an XDR story, and while it is entirely textual, in contrast to the visual UI representation, it is able to combine data from multiple contexts into a single summary, giving insight into aspects that can be complex to grasp from the UI alone.

Figure 2 - Example of a summary of a raw JSON, from the OpenAI Playground

Example use-case #2 – using supporting playbooks to add remediation recommendations on top of the summary. By giving GPT an additional source of data via a playbook used by our Support teams, it was able not only to summarize a network event but also to provide concise, Cato-specific recommended actions to resolve or investigate the incident.

Figure 3 - Example of providing GPT with additional sources of data, from the OpenAI Playground

Picking a GenAI model

There are multiple aspects to consider when integrating a 3rd-party AI service (or any service handling your data, for that matter). Some are engineering oriented, such as how to get the best results from the input, and others are legal aspects pertaining to the handling of our data and our customers’ data.
Before defining the challenges of working with a GenAI model, you actually need to pick the tool you’ll be integrating. While GPT-4 (OpenAI) might seem like the go-to choice due to its popularity and impressive feature set, it is far from being the only option; other examples include PaLM (Google), LLaMA (Meta), Claude-2 (Anthropic) and multiple others. We opted for a proof-of-concept (POC) comparing OpenAI’s GPT with Amazon’s Bedrock, which is more of an AI platform that lets you decide which model (Foundation Model, FM) to use from a list of several supported FMs.

Without going too much into the details of the POC in this specific post, we’ll jump to the result, which is that we ended up integrating our solution with GPT. Both solutions showed good results, and going the Amazon Bedrock route had an inherent advantage in the legal and privacy aspects of moving customer data outside, for two reasons: (1) Amazon is an existing sub-processor, since we widely use AWS across our platform; (2) it is possible to link your own VPC to Bedrock, avoiding moving traffic across the internet. Even so, due to other engineering considerations we opted for GPT, solving the privacy hurdle in another way, which we’ll go into below.

Another worthy mention: a positive effect of running the POC is that it allowed us to build a model-agnostic design, leaving the option to add additional AI sources in the future for reliability and better redundancy.
Challenges and solutions

Let’s look at the challenges and solutions when building the “Storyteller” feature:

Prompt engineering & context – for any task given to an AI it is important to frame it correctly and give the AI context for an optimal result. For example, asking ChatGPT “Explain thermonuclear energy” and “Explain thermonuclear energy for a physics PhD” will yield very different results, and the same applies to cybersecurity. Since the desired output is aimed at security and operations personnel, we should give the AI the right context, e.g. “You are an MDR analyst, provide a comprehensive summary where the recipient is the customer”. For better context, other than the source JSON to analyze, we add source material that GPT should use for the reply. In this case to better understand

Figure 4 - Example of prompt engineering research from the OpenAI Playground

Additional prompt statements can help control the output formatting and verbosity. A known trait of GenAI models is that they like to babble and can return excessively long replies, often with repetitive information. But since they are obedient (for now…) we can shape the replies by adding instructions such as “avoid repeating information” or “interpret the information, do not just describe it” to the prompts. Other prompt engineering statements can control the formatting of the reply itself, so self-explanatory instructions like “do not use lists”, “round numbers if they are too long” or “use ISO-8601 date format” can help shape the end result.

Data privacy – a critical aspect when sending customer data that also contains PII to a 3rd party; said data is of course also governed by the rigid compliance certifications Cato complies with, such as SOC2, GDPR, etc. As mentioned above, in certain circumstances, such as when using AWS, this can be solved by keeping everything in your own VPC, but when using OpenAI’s API a different approach was necessary.
It’s worth noting that OpenAI’s Enterprise tier does guarantee that your prompts and data are NOT used for training their model, and other privacy-related controls such as data retention are available as well, but nonetheless we wanted to address this on our side and not send Personally Identifiable Information (PII) at all. The solution was to tokenize any fields that contain PII before sending them. PII in this context is anything revealing of the user or their specific activity, e.g. source IP, domains, URLs, geolocation, etc. In testing we’ve seen that not sending this data has no detrimental effect on the quality of the summary, so before compiling the raw output to send for summarization we preprocess the data. Based on a predetermined list of fields which can or cannot be sent as-is, we sanitize the raw data and keep a mapping of all obfuscated values. Once the response arrives, we replace the obfuscated values with the original sensitive fields for a complete and readable summary, without any sensitive customer data ever leaving our own cloud.

Figure 5 - High level flow of PII obfuscation

Rate limiting – like most cloud APIs, OpenAI applies various rate limits on requests to protect its own infrastructure from over-utilization. OpenAI specifically does this by assigning users a tier-based limit calculated from their overall usage. This is an excellent practice overall, and when designing a system that consumes such an API, certain aspects need to be taken into consideration:
Code should be optimized (shouldn’t it always? 😉) so as not to “expend” the limited resources: the number of requests per minute/day or request tokens.
Measuring the rate and remaining tokens; with OpenAI this can be done by reading specific HTTP response headers (e.g., “x-ratelimit-remaining-tokens”) that report the remaining limits.
Error handling in case a limit is reached, using backoff algorithms or simply retrying the request after a short period of time.

Part of something bigger

Much like the entire field of AI itself, the shaping and application of which we are now living through, the various applications in cybersecurity are still being researched and expanded on, and at Cato Networks we continue to invest heavily in AI & ML based technologies across our entire SASE platform. This includes, but is not limited to, the integration of many Machine Learning models into our cloud for inline and out-of-band protection and detection (we’ll cover this in upcoming blog posts), and of course features like the XDR Storyteller detailed in this post, which harnesses GenAI for a simplified and more thorough analysis of security incidents.
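To make the PII obfuscation flow from the “Data privacy” challenge above more concrete, here is a minimal Python sketch of the idea: replace sensitive values with placeholder tokens before sending, keep the mapping locally, and restore the values in the returned summary. It is an illustration under stated assumptions, not the production implementation; the field names in `SENSITIVE_FIELDS`, the token format, the sample story and the simulated model reply are all hypothetical.

```python
# Hypothetical list of sensitive field names; the real predetermined list is not given in the post.
SENSITIVE_FIELDS = {"source_ip", "domain", "url", "geolocation"}


def tokenize_story(story, sensitive_fields=SENSITIVE_FIELDS):
    """Return (sanitized_copy, mapping); sensitive string values become placeholder tokens."""
    mapping = {}   # token -> original value (never leaves our side)
    reverse = {}   # original value -> token, so repeated values share one token

    def walk(node, key=None):
        if isinstance(node, dict):
            return {k: walk(v, k) for k, v in node.items()}
        if isinstance(node, list):
            # List items inherit the key of the field that holds the list.
            return [walk(item, key) for item in node]
        if key in sensitive_fields and isinstance(node, str):
            if node not in reverse:
                token = f"<{key.upper()}_{len(mapping) + 1}>"
                reverse[node] = token
                mapping[token] = node
            return reverse[node]
        return node

    return walk(story), mapping


def restore_summary(summary, mapping):
    """Put the original values back into the text returned by the model."""
    for token, original in mapping.items():
        summary = summary.replace(token, original)
    return summary


# Hypothetical story fragment (not a real XDR story schema).
story = {
    "source_ip": "10.0.0.5",
    "events": [
        {"domain": "example.com", "action": "block"},
        {"domain": "example.com", "action": "alert"},
    ],
}
sanitized, mapping = tokenize_story(story)

# Stand-in for the model's reply, which would only ever see the tokens.
reply = f"Host {sanitized['source_ip']} contacted {sanitized['events'][0]['domain']} twice."
print(restore_summary(reply, mapping))  # Host 10.0.0.5 contacted example.com twice.
```

Reusing the same token for a repeated value keeps the relationships between events intact for the model, which is one plausible reason the summary quality does not suffer from the obfuscation.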

Cato XDR Story Similarity – A Data Driven Incident Comparison and Severity Prediction Model

06/02/2024

8m read

Vadim Freger

Daniel Pienica

At Cato our number one goal has always been to simplify networking and security. We even wrote it on a cake once, so it must be true:

Figure 1 - A birthday cake

Applying this principle to our XDR offering, we aimed at reducing the complexity of analyzing security and network incidents, using a data-driven approach based on the vast amounts of data we see across our global network and collect into our data lake. On top of that, we wanted to provide a prediction of the threat type and the predicted verdict, i.e. whether it is benign or suspicious.

Upon analyzing XDR stories (a summary of the events that comprise a network or security incident), many similarities can be observed both inside the network of a given customer and, even more so, between different customers’ networks. In other words, a good deal of the network and security incidents that occur in one network have a good chance of recurring in another. This is akin to the MITRE ATT&CK Framework, which aims to group and inventory attack techniques, demonstrating that there is always similarity of one sort or another between attacks. For example, a phishing campaign targeted at a specific industry, e.g. the banking sector, will likely repeat itself in multiple customer accounts from that same industry.

In essence this allows crowdsourcing of sorts, where all customers can benefit from the sum of our network and data. An important note is that we will never share data of one customer with another, in keeping with our very strict privacy measures and data governance, but by comparing attacks and story verdicts across accounts we can still provide accurate predictions without sharing any data.
The conclusion is that by learning from the past we can predict the future. Using a combination of statistical algorithms, we can determine with high probability whether a new story is related to a previously seen story, and the likelihood of it being the same story with the same verdict. This in turn cuts down the time needed to analyze the incident, freeing up the security team’s time to work on resolving it.

Figure 2 - An XDR story with similarities

The similarity metric – Jaccard Similarity Coefficient

To identify whether incidents share a similarity we look at the targets, i.e. the destination domains/IPs involved in the incident. After going over all our data and grouping the targets into clusters, we need to measure the strength of the relation between the clusters. To measure that we use the Jaccard index (also known as the Jaccard similarity coefficient). The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

Taking a more graphic example, given two sets of domains (i.e. targets), we can calculate the following by looking at Figure 3 below.

Figure 3

The size of the intersection between sets A & B is 1 (google.com), and the size of the union is 5 (all distinct domains across both sets). The Jaccard similarity between the sets is therefore 1/5 = 0.2. In other words, if A & B are security incidents that involved these target domains, they have a similarity of 20%, which is a weak indicator, and hence neither should be used to predict the other.

The verification model - Louvain Algorithm

Modularity is a measure used in community detection algorithms to assess the quality of a partition of a network into communities. It quantifies how well the nodes in a community are connected compared to how we would expect them to be connected in a random network.
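Returning briefly to the Jaccard coefficient from the previous section, the calculation can be sketched in a few lines of Python. This is a minimal illustration, not the production code: only `google.com` is named in the text, so the other four domains below are hypothetical placeholders chosen to reproduce an intersection of 1 and a union of 5.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Targets of two incidents; every domain except google.com is a made-up placeholder.
incident_a = {"google.com", "placeholder-a1.test", "placeholder-a2.test"}
incident_b = {"google.com", "placeholder-b1.test", "placeholder-b2.test"}

similarity = jaccard(incident_a, incident_b)
print(similarity)  # 0.2
```

Because the inputs are plain sets of targets, the same function applies unchanged whether the targets are domains or IP addresses.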
Using the Louvain algorithm, we detected communities of cyber incidents by considering common targets and using Jaccard similarity as the metric connecting incidents. Modularity ranges from -1 to 1, where a value close to 1 indicates a strong community structure within the network. Therefore, the modularity score we achieved (reported further below) provides sufficient evidence that our approach of utilizing common targets is effective in identifying communities of related cyber incidents.

To understand how modularity is calculated, let's consider a simplified example. Suppose we have a network of 10 cyber incidents, and our algorithm identifies two communities. Each community consists of the following incidents:
Community 1: Incidents {A, B, C, D}
Community 2: Incidents {E, F, G, H, I, J}

The total number of edges connecting the incidents within each community can be calculated as follows:
Community 1: 6 edges (A-B, A-C, A-D, B-C, B-D, C-D)
Community 2: 15 edges (E-F, E-G, E-H, E-I, E-J, F-G, F-H, F-I, F-J, G-H, G-I, G-J, H-I, H-J, I-J)

Additionally, we can calculate the total number of edges in the entire network:
Total edges: 21 (6 within Community 1 + 15 within Community 2)

Now, let's calculate the expected number of edges in a random network with the same node degrees. The node degrees in our network are as follows:
Community 1: 3 (A, B, C, and D each have a degree of 3)
Community 2: 5 (E, F, G, H, I, and J each have a degree of 5)

To calculate the expected number of edges, we can use the following formula:
Expected edges between two nodes (i, j) = (degree of node i * degree of node j) / (2 * total edges)

For example, the expected number of edges between nodes A and B would be: (3 * 3) / (2 * 21) = 0.214

By calculating the expected number of edges for all pairs of nodes, we can obtain the expected number of edges within each community and in the entire network.
Finally, we can use these values to calculate the modularity using the formula:
Modularity = (actual number of edges - expected number of edges) / total edges

The Louvain algorithm works iteratively to maximize the modularity score. It starts by assigning each node to its own community and then iteratively moves nodes between communities to increase the modularity value. The algorithm continues this process until no further improvement in modularity can be achieved.

As a practical example, Figure 4 below shows a customer’s cyber incidents graph, rendered with Gephi (an open-source graph visualization application). The nodes are the cyber incidents, and the edges are weighted using the Jaccard similarity metric. We can see a clear division into clusters of interconnected incidents, showing that using Jaccard similarity on common targets gives good results. The colors of the clusters are based on the cyber incident type, and we can see that our approach is confirmed by having cyber incidents of multiple types clustered together. The big cluster in the center is composed of three very similar cyber incident types. This customer’s incidents achieved a modularity score of 0.75.

Figure 4 – Modularity verification visualization using Gephi

In summary, the modularity value obtained after applying the Louvain algorithm over the entire dataset of customers and incidents is about 0.71, which is considered high. This indicates that our approach of using common targets and Jaccard similarity is effective in detecting communities of cyber incidents in the network, and it served as validation of the design.

Architecting to run at scale

The above was a very simplified example of how to measure similarity.
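As a sanity check of the simplified modularity example (two fully connected communities of 4 and 6 incidents, 21 edges in total), the sketch below computes the score in plain Python. It assumes the standard Newman modularity definition, which is the quantity the Louvain method optimizes, applied to an unweighted graph; the post itself only gives a simplified formula, so treat the exact formula and the variable names here as assumptions.

```python
from itertools import combinations

# The two communities from the worked example.
communities = {
    1: ["A", "B", "C", "D"],
    2: ["E", "F", "G", "H", "I", "J"],
}

# Every pair inside a community is connected; there are no edges between communities.
edges = [pair for members in communities.values() for pair in combinations(members, 2)]

# Node degrees derived from the edge list.
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1


def modularity(communities, edges, degree):
    """Newman modularity: sum over communities of (internal edges / m) - (degree sum / 2m)^2."""
    m = len(edges)
    node_to_comm = {n: c for c, members in communities.items() for n in members}
    q = 0.0
    for c, members in communities.items():
        internal = sum(1 for u, v in edges if node_to_comm[u] == c and node_to_comm[v] == c)
        degree_sum = sum(degree[n] for n in members)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


q = modularity(communities, edges, degree)
print(len(edges), degree["A"], degree["E"])  # 21 3 5
print(round(q, 3))  # 0.408
```

Under this definition even a perfectly separated toy graph scores only about 0.41, because with just two communities the expected-edges term is large; real incident graphs with many clusters can score considerably higher.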
Running this at scale over our entire data lake presented a scaling challenge that we opted to solve using a serverless architecture, based on AWS Lambda, that can scale on demand. Lambda is an event-driven serverless platform allowing you to run code/specific functions on demand and to scale automatically using an API Gateway service in front of your Lambdas. In the figure below we can see the distribution of Lambda invocations over a given week and the number of parallel executions, demonstrating the flexibility and scaling that the architecture allows for.

Figure 5 - AWS Lambda execution metrics

The Cato XDR Service runs on top of data from our data lake once a day, creating all the XDR stories. Part of every story creation is also determining the similarity score, which is achieved by invoking the Lambda function. Oftentimes Lambdas are ready-to-use functions that contain the code inside the Lambda itself; in our case, to fit our development and deployment models, we chose to use Lambda’s ability to run Docker images through ECR (Elastic Container Registry). The similarity model is coded in Python, which runs inside the Docker image that Lambda executes on every invocation. The backend of the Lambda is a DocumentDB cluster, a NoSQL database offered by AWS which is also MongoDB compatible and performs very well for querying large datasets. In the DB we store the last 6 months of story similarity data, and every invocation of the Lambda uses this data to determine similarity by applying the Jaccard index, returning a dataset with the results back to the XDR service.
Figure 6 - High level diagram of similarity calculation with Lambda

An additional standalone phase of this workflow is keeping the DocDB database up to date with data of stories and targets, to keep the similarity calculation relevant and accurate. The update phase runs daily and is orchestrated using Apache Airflow, an open-source workflow management platform which is well suited for this and is used for many of our data engineering workflows as well. Airflow triggers a different Lambda instance, technically running the same Docker image as before but invoking a different function to update the database.

Figure 7 - DocDB update workflow

Ultimate impact and what's next

We’ve reviewed how, by leveraging a data-driven approach, we were able to address the complexity of analyzing security and network incidents by linking them to already identified threats and predicting their verdict. Overall, in our analysis we saw that a little over 30% of incidents have a similar incident linked to them. This is a very strong and indicative result, ultimately meaning we can help reduce the time it takes to investigate a third of the incidents across a network. As IT & Security teams continue to struggle with staff shortages while keeping up with the constant flow of cybersecurity incidents, capabilities such as this go a long way to reduce the workload and fatigue, allowing teams to focus on what’s important.

Using effective and easy-to-implement algorithms coupled with a highly scalable serverless infrastructure based on AWS Lambda, we were able to achieve a powerful solution that can meet the requirement of processing massive amounts of data.

Future enhancements being researched involve comparing entire XDR stories to provide an even stronger prediction model, for example by identifying similarity between incidents through different vectors even if they do not share the same targets. Stay tuned.

Atlassian Confluence Server and Data Center Remote Code Execution (CVE-2023-22527) – Cato’s Analysis and Mitigation

25/01/2024

5m read

Atlassian recently disclosed a new critical vulnerability in its Confluence Server and Data Center product line, the CVE has a CVSS score of 10, and... Read ›

Vadim Freger

Ronen Jaffa

Machine Learning in Action – An In-Depth Look at Identifying Operating Systems Through a TCP/IP Based Model

15/01/2024

5m read

Asaf Fried

Vadim Freger

In the previous post, we discussed how passive OS identification can be done based on different network protocols. We also used the OSI model to categorize the different indicators and prioritize them based on reliability and granularity. In this post, we will focus on the network and transport layers and introduce a machine learning OS identification model based on TCP/IP header values.

So, what are machine learning (ML) algorithms and how can they replace traditional network and security analysis paradigms? If you aren’t familiar yet, ML is a field devoted to performing certain tasks by learning from data samples. The process of learning is done by a suitable algorithm for the given task and is called the “training” phase, which results in a fitted model. The resulting model can then be used for inference on new and unseen data.

ML models have been used in the security and network industry for over two decades. Their main contribution to network and security analysis is that they make decisions based on data, as opposed to domain expertise (i.e., they are data-driven). At Cato we use ML models extensively across our service, and in this post specifically we’ll delve into the details of how we enhanced our OS identification engine using a TCP/IP based model.

For OS identification, a network analyst might create a passive network signature for detecting a Windows OS based on their knowledge of the characteristics of the Windows TCP/IP stack implementation. In this case, they will also need to be familiar with other OS implementations to avoid false positives. However, with ML, an accurate network signature can be produced by the algorithm after training on several labeled network flows from different devices and OSs. The differences between the two approaches are illustrated in Figure 1.
Figure 1: A traditional paradigm for writing identification rules vs. a machine learning approach.

In the following sections, we will demonstrate how an ML model that generates OS identification rules can be created using a Decision Tree. A decision tree is a good choice for our task for a couple of reasons. Firstly, it is suitable for multiclass classification problems, such as OS identification, where a flow can be produced by various OS types (Windows, Linux, iOS, Android, and more). But perhaps even more importantly, after being trained, the resulting model can be easily converted to a set of decision rules, with the following form:

if condition 1 and condition 2 … and condition n then label

This means that your model can be deployed in environments with minimal dependencies and strict performance limits, which are common requirements for network appliances such as packet filtering firewalls and deep packet inspection (DPI) intrusion prevention systems (IPS).

How do decision trees work for classification tasks?

In this section we will use the following example dataset to explain the theory behind decision trees. The dataset represents the task of classifying OSs based on TCP/IP features. It contains 8 samples in total, captured from 3 different OSs: Linux, Mac, and Windows. From each capture, 3 elements were extracted: IP initial time-to-live (ITTL), TCP maximum segment size (MSS), and TCP window size.

Figure 2: The training dataset with 8 samples, 3 features, and 3 classes.

Decision trees, as their name implies, use a tree-based structure to perform classification. Each node in the root and internal tree levels represents a condition used to split the data samples and move them down the tree. The nodes at the bottom level, also called leaves, represent the classification type. This way, the data samples are classified by traversing the tree paths until they reach a leaf node.
In Figure 3, we can observe a decision tree created from our dataset. The first level of the tree splits our data samples based on the “IP ITTL” feature. Samples with a value higher than 96 are classified as a Windows OS, while the rest traverse down the tree to the second-level decision split.

Figure 3: A simple decision tree for classifying an OS.

So, how did we create this tree from our data? Well, this is the process of learning that was mentioned earlier. Several variations exist for training a decision tree; in our example, we will apply the well-known Classification and Regression Tree (CART) algorithm.

The process of building the tree is done from top to bottom, starting from the root node. In each step, a split criterion is selected with the feature and threshold that provide the best “split quality” for the data in the current node. In general, split criteria that divide the data into groups with a more homogeneous class representation (i.e., higher purity) are considered to have a better split quality. The CART algorithm measures the split quality using a metric called Gini Impurity. Formally, the metric is defined as:

$$G = 1 - \sum_{i=1}^{C} p_i^2$$

where 𝐶 denotes the number of classes in the data (in our case, 3), and 𝑝ᵢ denotes the probability of class 𝑖, given the data in the current node. The metric is bounded between 0 and 1 and represents the degree of node impurity. The quality of the split criterion is then defined by the weighted sum of the Gini Impurity values of the nodes below. Finally, the split criterion that gives the lowest weighted sum of the Gini Impurities for the bottom nodes is selected.

In Figure 4, we can see an example of selecting the first split criterion of the tree. The root node of the tree, containing all data samples, has the Gini Impurity value displayed in its node in the figure. Then, given the split criterion of “IP ITTL <= 96”, the data is split into two nodes.
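Before walking through the numbers, the Gini Impurity metric and the weighted split score defined above can be sketched in Python. This is a minimal illustration rather than the CART implementation itself. The dataset table from Figure 2 is not reproduced in the text, so the class counts below (3 Linux, 2 Mac, 3 Windows) are hypothetical, chosen only to match the stated totals of 8 samples and 3 classes, and the weighting by each node's share of samples is the usual CART convention.

```python
from collections import Counter


def gini(labels):
    """Gini Impurity of a node: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_score(left, right):
    """Weighted sum of the children's Gini Impurity, weighted by their share of samples."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical class counts for the root node (8 samples, 3 classes).
root = ["Linux"] * 3 + ["Mac"] * 2 + ["Windows"] * 3

# Split "IP ITTL <= 96": Windows samples go right, everything else goes left.
left = [label for label in root if label != "Windows"]
right = [label for label in root if label == "Windows"]

print(round(gini(root), 5))                 # 0.65625
print(round(gini(left), 2), gini(right))    # 0.48 0.0
print(round(split_score(left, right), 2))   # 0.3
```

A pure node (a single class) has an impurity of 0, which is why a split that isolates one class completely tends to score well.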
The node that satisfies the condition (left side) and the node that doesn’t each have their own Gini Impurity value, and the weighted sum of the two gives the overall score for this split; these values are displayed in the tree nodes of Figure 4. This value is the minimal Gini Impurity of all the candidates and is therefore selected for the first split of the tree. For numeric features, the CART algorithm selects the candidates as all the midpoints between sequential values from different classes, when sorted by value. For example, when looking at the sorted “IP ITTL” feature in the dataset, the split criterion is the midpoint between IP ITTL = 64, which belongs to a Mac sample, and IP ITTL = 128, which belongs to a Windows sample. For the second split, the best split quality is given by the “TCP MSS” feature, from the midpoint between TCP MSS = 1386, which belongs to a Mac sample, and TCP MSS = 1460, which belongs to a Linux sample.

Figure 4: Building a tree from the data – level 1 and level 2. The tree nodes display: 1. Split criterion, 2. Gini Impurity value, 3. Number of data samples from each class.

In our example, we fully grow our tree until all the leaves have a homogeneous class representation, i.e., each leaf has data samples from a single class only. In practice, when fitting a decision tree to data, a stopping criterion is selected to make sure the model doesn’t overfit the data. These criteria include maximum tree height, minimum data samples for a node to be considered a leaf, maximum number of leaves, and more. In case the stopping criterion is reached and the leaf doesn’t have a homogeneous class representation, the majority class can be used for classification.

From tree to decision rules

The process of converting a tree to rules is straightforward.
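A minimal Python sketch of such a conversion follows, using the small tree from Figure 3 as reconstructed from the text. The nested-dictionary representation, the feature names `ip_ittl` and `tcp_mss`, and the helper names are illustrative assumptions; the MSS threshold of 1423 is simply the midpoint of 1386 and 1460 mentioned above.

```python
# Tree reconstructed from the text; "left" is the branch where the condition holds.
tree = {
    "feature": "ip_ittl", "threshold": 96,
    "left": {
        "feature": "tcp_mss", "threshold": 1423,  # midpoint of 1386 (Mac) and 1460 (Linux)
        "left": {"label": "Mac"},
        "right": {"label": "Linux"},
    },
    "right": {"label": "Windows"},
}


def extract_rules(node, conditions=()):
    """Walk every root-to-leaf path and return (conditions, label) pairs."""
    if "label" in node:
        return [(list(conditions), node["label"])]
    feature, threshold = node["feature"], node["threshold"]
    return (
        extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
        + extract_rules(node["right"], conditions + (f"{feature} > {threshold}",))
    )


def count_nodes(node):
    """Total number of nodes (internal nodes plus leaves)."""
    if "label" in node:
        return 1
    return 1 + count_nodes(node["left"]) + count_nodes(node["right"])


rules = extract_rules(tree)
for conditions, label in rules:
    print("if " + " and ".join(conditions) + " then " + label)
# if ip_ittl <= 96 and tcp_mss <= 1423 then Mac
# if ip_ittl <= 96 and tcp_mss > 1423 then Linux
# if ip_ittl > 96 then Windows
```

The output is plain condition strings, so the rules can be carried over to an environment that has no ML libraries at all.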
Each route in the tree from the root to a leaf node is a decision rule composed of a conjunction of statements, i.e., if a new data sample satisfies all of the statements in the path, it is classified with the corresponding label.

Based on the full binary tree theorem, for a full binary tree with 𝑛 nodes, the number of extracted decision rules is (𝑛+1)/2. In Figure 5, we can see how the trained decision tree with 5 nodes is converted to 3 rules.

Figure 5: Converting the tree to a set of decision rules.

Cato’s OS detection model

Cato’s OS detection engine, running in real time on our cloud, is enhanced by rules generated by a decision tree ML model, based on the concepts we described in this post. In practice, to gain a robust and accurate model, we trained it on over 10k unique labeled TCP SYN packets from various types of devices. Once the initial model is trained, it also becomes straightforward to re-train it on samples from new operating systems or when an existing networking implementation changes.

We also added additional network features and extended our target classes to include embedded and mobile OSs such as iOS and Android. This resulted in a much more complex tree that generated 125 different OS detection rules. A rule set of this size would simply not have been feasible to achieve using a manual work process. This greatly emphasizes the strength of the ML approach, both in the large scope of rules we were able to generate and in the engineering time saved.

Figure 6: Cato’s OS detection tree model. 15 levels, 249 nodes, and 125 OS detection rules.

Having a data-driven OS detection engine enables us to keep up with the continuously evolving landscape of network-connected enterprise devices, including IoT and BYOD (bring your own device).
This capability is leveraged across many of our security and networking capabilities, such as identifying and analyzing security incidents using OS information, enforcing OS-based connection policies, and improving visibility into the array of devices visible in the network.

An example of the latter can be seen in Figure 7, the Device Inventory view, a new feature giving administrators a full view of all network-connected devices, from workstations to printers and smartwatches, with the ability to filter and search through the entire inventory. Devices can be aggregated by different categories, such as the OS shown below, or by device type, manufacturer, etc.

Figure 7: Device Inventory, filtering by OS using data classified using our models

However, when inspecting device traffic, there is other significant information besides the OS that we can extract using data-driven methods. When enforcing security policies, it is also critical to learn the device hardware, model, installed applications, and running services. But we'll leave that for another post.

Wrapping up

In this post, we’ve discussed how to generate OS identification rules using a data-driven ML approach. We’ve also introduced the Decision Tree algorithm, whose rules can be deployed in environments with minimal dependencies and strict performance limits, which are common requirements for network appliances. Combined with the manual fingerprinting we’ve seen in the previous post, this series provides an overview of the current best practices for OS identification based on network protocols.

Apache Struts 2 Remote Code Execution (CVE-2023-50164) – Cato’s Analysis and Mitigation

17/12/2023

3m read


Dolev Moshe Attiya

Vadim Freger


By Vadim Freger, Dolev Moshe Attiya

On December 7th, 2023, the Apache Struts project disclosed a critical vulnerability (CVSS score 9.8) in its Struts 2 open-source web framework. The vulnerability resides in the flawed file upload logic and allows attackers to manipulate upload parameters, resulting in arbitrary file upload and code execution under certain conditions. There is no known workaround, and the only solution is to upgrade to the latest versions. The affected versions are:

Struts 2.0.0 - Struts 2.3.37 (EOL)
Struts 2.5.0 - Struts 2.5.32
Struts 6.0.0 - Struts 6.3.0

The Struts framework, an open-source Java EE web application development framework, is somewhat infamous for its history of critical vulnerabilities. Those include, but are not limited to, CVE-2017-5638, the vector of the very public Equifax data breach in 2017, in which an unpatched Struts 2 server made possible the theft of 145 million consumer records.

At the time of disclosure, there were no known exploitation attempts, but several days later, on December 12th, a Proof-of-Concept (POC) was made publicly available. Immediately, we saw increased scanning and exploitation activity across Cato’s global network. Within one day, Cato had protected against the attack.

[boxlink link="https://www.catonetworks.com/rapid-cve-mitigation/"] Rapid CVE Mitigation by Cato Security Research [/boxlink]

Details of the vulnerability

The vulnerability is made possible by combining two flaws in Struts 2, allowing attackers to manipulate file upload parameters to upload and then execute a file.

The first flaw involves simulating the file upload, where directory traversal becomes possible along with a malicious file. This file upload request generates a temporary file corresponding to a parameter in the request. Under regular circumstances, the temporary file should be deleted after the request ends, but in this case it is not deleted, enabling attackers to upload their file to the host.

The second flaw is the case-sensitive nature of HTTP parameters. Sending a capitalized parameter and later using a lowercase parameter with the same name in a request makes it possible to modify a field without undergoing the usual checks and validations. This creates an ideal scenario for employing directory traversal to manipulate the upload path, potentially directing the malicious file to an execution folder. From there, an attacker can execute the malicious file, for instance a web shell, to gain access to the server.

Cato’s analysis and response to the CVE

From our data and analysis at Cato’s Research Labs, we have seen multiple exploitation attempts of the CVE across Cato customer networks immediately following the POC availability. The attempts observed range from naive scanning to real exploitation attempts looking for vulnerable targets.

Cato deployed IPS signatures to block any attempts to exploit the RCE within just 24 hours of the POC publication, protecting all Cato-connected edges (sites, remote users, and cloud resources) worldwide from December 13th, 2023. Nonetheless, Cato recommends upgrading all vulnerable webservers to the latest versions released by the project maintainers.
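As a rough aid for the upgrade recommendation, the affected ranges listed in this post can be turned into a small check. This is only a sketch in Python: the function names are hypothetical, the ranges are copied from the post as-is, and the official Apache Struts advisory remains the authoritative source for which builds are affected or fixed.

```python
# Affected ranges exactly as listed in the post (inclusive bounds).
AFFECTED_RANGES = [
    ((2, 0, 0), (2, 3, 37)),  # EOL branch
    ((2, 5, 0), (2, 5, 32)),
    ((6, 0, 0), (6, 3, 0)),
]


def parse_version(text):
    """Turn a dotted version string such as '2.5.32' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_listed_as_affected(version):
    """Return True if the version falls inside one of the listed ranges."""
    parsed = parse_version(version)
    return any(low <= parsed <= high for low, high in AFFECTED_RANGES)


for candidate in ("2.3.37", "2.5.32", "2.5.33", "6.1.2"):
    print(candidate, is_listed_as_affected(candidate))
```

Tuple comparison keeps the check short, but it assumes purely numeric version components; anything with suffixes would need a proper version parser.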

Cato Application Catalog – How we supercharged application categorization with AI/ML

23/11/2023

6m read


Tomer Doitshman

Vadim Freger

Gil Haham


New applications emerge at an almost impossible to keep-up-with pace, creating a constant challenge and blind spot for IT and security teams in the form of Shadow IT. Organizations must keep up by using tools that are automatically updated with the latest developments and changes in the applications landscape to maintain proper security.

An integral part of any SASE product is its ability to accurately categorize and map user traffic to the actual application being used. To manage sanctioned/unsanctioned applications, apply security policies across the network based on the application or category of applications, and especially for granular application controls using CASB, a comprehensive application catalog must be maintained.

At Cato, keeping up required building a process that is both highly automated and, just as importantly, data-driven, so that we focus on the applications most in use by our customers and are able to separate the wheat from the chaff. In this post we’ll detail how we supercharged our application catalog updates from a labor-intensive manual process to an AI/ML-based process that is fully automated in the form of a data-driven pipeline, growing our rate of adding new applications by an order of magnitude, from tens of applications to hundreds added every week.

What IS an application in the catalog?

Every application in our Application Catalog has several characteristics:

General – what the company does, employees, where it’s headquartered, etc.
Compliance – certifications the application holds and complies with.
Security – features supported by the application, such as whether it supports TLS, Two-Factor authentication, SSO, etc.
Risk score – a critical field calculated by our algorithms based on multiple heuristics (detailed later in this post) to allow IT managers and CISOs to focus on actual possible threats to their network.

Down to business, how it actually gets done

We refer to the process of adding an application as “signing” it, that is, starting from the automated processes up to human analysts going over the list of apps to be released in the weekly release cycle and giving it a final human verification (side note: this is also presently a bottleneck in the process, as we want the highest control and quality when publishing new content to our production environment, though we are working on ways to improve this part of the process as well).

As mentioned, the first order of business is picking the applications that we want to add, and for that we use our massive data lake, in which we collect all the metadata from all traffic that flows through our network. We identify candidates by looking at the most used domains (FQDNs) in our entire network, repeating across multiple customer accounts, which are yet to be signed and are not in our catalog.

[boxlink link="https://catonetworks.easywebinar.live/registration-everything-you-wanted-to-know-about-ai-security"] Everything You Wanted To Know About AI Security But Were Afraid To Ask | Watch the Webinar [/boxlink]

The automation is done end-to-end using “Shinnok”, our in-house tool developed and maintained by our Security Research team. Taking the narrowed-down list of unsigned apps, Shinnok begins compiling the 4 fields (description, compliance, security & risk score) for every app.

Description – This is the most straightforward part, and is based on info taken via API from Crunchbase.
Compliance – Using a combination of online lookups and additional heuristics for every compliance certification we target, we compile the list of certifications supported by the app. For example, by using Google’s query API for a given application + “SOC2”, and then further filtering the results for false positives from unreliable sources, we can identify support for the SOC2 compliance.
Security – Similar to compliance, with the addition of using our data lake to identify certain security features being used by the app that we observe over the network.
Risk Score – Being the most important field, we take a combination of multiple data points to calculate the risk score:

Popularity: This is based on multiple data points, including real-time traffic data from our network to measure occurrences of the application across our own network, correlated with additional online sources. Typically, an app that is more popular and more well-known poses a lower risk than a new obscure application.
CVE analysis: We collect and aggregate all known CVEs of the application. The more high-severity CVEs an application has, the more openings it offers attackers and the higher the risk to the organization.
Sentiment score: We collect news, mentions and any articles relating to the company/application, and build a dataset with all mentions of the application. We then pass this dataset through our advanced AI deep learning model, which outputs for every mention whether it is positive or negative, generating a final sentiment score that is added as a data point for the overall algorithm.

Distilling all the different data points using our algorithms, we can calculate the final Risk Score of an app.

WIIFM?

The main advantage of this approach to application categorization is that it is PROACTIVE, meaning network administrators using Cato receive the latest updates for all the latest applications automatically. Based on the data we collect, we estimate that 80% - 90% of all HTTP traffic in our network is covered by a known application categorization. Admins can be much more effective with their time by looking at data that is already summarized, giving them the top risks in their organization that require attention.

Use case example #1 – Threads by Meta

To demonstrate the proactive approach, we can take a look at a recent use case: the very public and explosive launch of the Threads platform by Meta, which, regardless of its present success, was recorded as the largest product launch in history, overtaking ChatGPT with over 100M user registrations in 5 days. In the diagram below we can see this from the perspective of our own network, checking all the boxes for a new application that qualifies to be added to our app catalog, from the numbers of unique connections and users to the total number of different customer accounts that were using Threads.

Thanks to the automated process, Threads was automatically included in the upcoming batch of applications to sign. Two weeks after its release it was already part of the Cato App Catalog, without end users needing to perform any actions on their part.

Use case example #2 – Coverage by geographical region

As part of an analysis done by our Security Research team, we identified a considerable gap in our application coverage for the Japanese market, and this coincided with feedback received from the Japan sales teams on lacking coverage. Using the same automated process, this time limiting the data lake input to Shinnok to traffic from Japanese users only, we began a focused project of augmenting the application catalog with applications specific to the Japanese market, and were able to add more than 600 new applications over a period of 4 months. Following this, we’ve measured a very substantial increase in coverage, going from under 50% to over 90% of all inspected HTTP traffic to Japanese destinations.

To summarize

We’ve reviewed how, by leveraging our huge network and data lake, we were able to build a highly automated process, using real-time online data sources coupled with AI/ML models, to categorize applications with very little human work involved. The main benefit is of course that Cato customers do not need to worry about keeping up to date on the latest applications that their users are using; instead, they know they will receive the updates automatically based on the top trends and usage on the internet.
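To make the Risk Score description earlier in this post more concrete, here is a minimal Python sketch of how the three data points (popularity, CVE analysis, sentiment) could be blended. The post does not disclose Cato's actual algorithm, so everything here is an assumption for illustration: the function names, the 0 to 10 scale, the weights, the cap of 10 high-severity CVEs, and the use of a simple positive-mention fraction in place of the deep learning model's output.

```python
def sentiment_score(mention_labels):
    """Fraction of mentions labeled 'positive' (0.5 when there are no mentions).

    The labels stand in for the per-mention output of a sentiment model.
    """
    if not mention_labels:
        return 0.5
    positives = sum(1 for label in mention_labels if label == "positive")
    return positives / len(mention_labels)


def risk_score(popularity, high_severity_cves, mention_labels,
               weights=(0.4, 0.4, 0.2)):
    """Blend three data points into a hypothetical 0-10 risk score.

    popularity         -- normalized to [0, 1]; more popular means lower risk
    high_severity_cves -- count of known high-severity CVEs (capped at 10)
    mention_labels     -- list of 'positive' / 'negative' strings
    weights            -- illustrative weights, not taken from the post
    """
    w_pop, w_cve, w_sent = weights
    popularity_risk = 1.0 - popularity
    cve_risk = min(high_severity_cves, 10) / 10
    sentiment_risk = 1.0 - sentiment_score(mention_labels)
    blended = w_pop * popularity_risk + w_cve * cve_risk + w_sent * sentiment_risk
    return round(10 * blended, 1)


well_known = risk_score(0.9, 0, ["positive", "positive", "negative"])
obscure = risk_score(0.1, 6, ["negative", "negative"])
print(well_known, obscure)
```

A weighted sum is only one possible way to distill the data points; the point of the sketch is that a popular app with few CVEs and positive coverage ends up with a lower score than an obscure one with several high-severity CVEs.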

Cisco IOS XE Privilege Escalation (CVE-2023-20198) – Cato’s analysis and mitigation

02/11/2023

4m read


Vadim Freger

Shirley Baumgarten


By Vadim Freger, Dolev Moshe Attiya, Shirley Baumgarten

All secured webservers are alike; each vulnerable webserver running on a network appliance is vulnerable in its own way.

On October 16th 2023, Cisco published a security advisory detailing an actively exploited vulnerability (CVE-2023-20198) in its IOS XE operating system with a CVSS score of 10, allowing for unauthenticated privilege escalation and subsequent full administrative access (level 15 in Cisco terminology) to the vulnerable device. After gaining access, which in itself is already enough to do damage and allows full device control, an attacker can use an additional vulnerability (CVE-2023-20273) to elevate further to the “root” user and install a malicious implant on the disk of the device.

When the initial announcement was published, Cisco had no patched software update to provide, and the suggested mitigations were to disable HTTP/S access to the IOS XE Web UI and/or limit access to it to trusted sources using ACLs. Approximately a week later, patches were published and the advisory was updated. The zero-day vulnerability was being exploited before the advisory was published, and many current estimates and scanning analyses put the number of implanted devices in the tens of thousands.

[boxlink link="https://www.catonetworks.com/rapid-cve-mitigation/"] Rapid CVE Mitigation by Cato Security Research [/boxlink]

Details of the vulnerability

The authentication bypass is done on the webui_wsma_http or webui_wsma_https endpoints in the IOS XE webserver (which is running OpenResty, an Nginx variant that adds Lua scripting support). By using double-encoding (a simple yet clearly effective evasion technique) in the URL of the POST request, the attacker bypasses checks performed by the webserver and the request is passed to the backend. The request body contains an XML payload which the backend executes arbitrarily, since it is considered to have passed validation and to come from the frontend.

In the request example below (credit: @SI_FalconTeam) we can see that the POST request, along with the XML payload, is sent to /%2577ebui_wsma_http, where %25 is the encoded character “%”, followed by 77; combined they form “%77”, which is the encoded character “w”.

Cisco has also provided a command to check for the presence of an implant on the device, by running: curl -k -X POST "https[:]//DEVICEIP/webui/logoutconfirm.html?logon_hash=1", replacing DEVICEIP and checking the response. If a hexadecimal string is returned, an implant is present.

Cato’s analysis and response to the CVE

From our data and analysis at Cato’s Research Labs, we have seen multiple exploitation attempts of the CVE, along with an even more interesting case of Cisco’s own SIRT (Security Incident Response Team) scanning devices to detect whether they are vulnerable, quite likely in order to proactively contact customers running vulnerable systems. One example is scanning activity from 144.254.12[.]175, an IP that is part of a /16 range registered to Cisco.

Cato deployed IPS signatures to block any attempts to exploit the vulnerable endpoint, protecting all Cato-connected sites worldwide from November 1st 2023. Cato also recommends never exposing critical networking infrastructure to the internet. In instances when this is a necessity, HTTP access must be disabled and proper access controls using ACLs must be implemented to limit the source IPs able to access the devices.

Networking devices are often not thought of as webservers, and due to this do not always receive the same forms of protection, e.g., a WAF. However, their Web UIs are clearly a powerful administrative interface, and we see time and time again how they are exploited. Networking devices like Cisco’s are typically administered almost entirely using the CLI, with the Web UI receiving less attention, underscoring a dichotomy between the importance of the device in the network and how rudimentary a webserver it may be running.

https://www.youtube.com/watch?v=6caLf-1KGFw&list=PLff-wxM3jL7twyfaaYB7jxy6WqDB_17V4
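The double-encoding evasion described in this post can be reproduced with Python's standard `urllib.parse.unquote`. The sketch below only demonstrates the decoding layers of the path quoted in the post; the helper names `decode_layers` and `is_multiply_encoded` are hypothetical, and this is not how the IOS XE webserver or Cato's IPS signatures are implemented.

```python
from urllib.parse import unquote


def decode_layers(path, max_rounds=5):
    """Percent-decode repeatedly and return every intermediate form.

    Decoding stops when a round no longer changes the string or when
    max_rounds is reached.
    """
    layers = [path]
    for _ in range(max_rounds):
        decoded = unquote(layers[-1])
        if decoded == layers[-1]:
            break
        layers.append(decoded)
    return layers


def is_multiply_encoded(path):
    """True when more than one decoding round changes the path."""
    return len(decode_layers(path)) > 2


layers = decode_layers("/%2577ebui_wsma_http")
for depth, layer in enumerate(layers):
    print(depth, layer)
```

A check that decodes only once sees `/%77ebui_wsma_http` and does not match the protected endpoint name, while a component that decodes again ends up at `/webui_wsma_http`.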

Cato’s Analysis and Protection for cURL SOCKS5 Heap Buffer Overflow (CVE-2023-38545)

12/10/2023

4m read


Vadim Freger


TL;DR This vulnerability appears to be less severe than initially anticipated. Cato customers and infrastructure are secure.

Last week, the original author and long-time lead developer of cURL, Daniel Stenberg, published a “teaser” for a HIGH severity vulnerability in the ubiquitous libcurl development library and the curl command-line utility. A week of anticipation, multiple heinous crimes against humanity and a declaration of war later, the vulnerability was disclosed publicly.

The initial announcement caused what in hindsight can be categorized as somewhat undue panic in the security and sysadmin worlds. But given how widespread the usage of libcurl and curl is around the world (at Cato we use it widely as well, more on that below), and to quote from the libcurl website, “We estimate that every internet connected human on the globe uses (lib)curl, knowingly or not, every day”, the initial concern was more than understandable.

The libcurl library and the curl utility are used for interacting with URLs and for various multiprotocol file transfers, and they are bundled into all the major Linux/UNIX distributions. Likely for that reason, the project maintainers opted to keep the vulnerability disclosure private and shared very few details to deter attackers, only letting the OS distribution maintainers know in advance so that patched versions could be made ready in the respective package management systems for the time of disclosure.

[boxlink link="https://www.catonetworks.com/rapid-cve-mitigation/"] Rapid CVE Mitigation by Cato Security Research [/boxlink]

The vulnerability in detail

The code containing the buffer overflow vulnerability is part of curl’s support for the SOCKS5 proxy protocol. SOCKS5 is a simple and well-known (while not very widely used nowadays) protocol for setting up an organizational proxy or, quite often, for anonymizing traffic, as it is used in the Tor network.

The vulnerability is in libcurl hostname resolution, which is either delegated to the target proxy server or done by libcurl itself. If a hostname larger than 255 bytes is given, libcurl switches to local resolution and passes only the resolved address. Due to the bug, and in a slow enough handshake (“slow enough” being typical server latency according to the post), the buffer overflow can be triggered, with the entire “too-long-hostname” being copied to the buffer instead of the resolved result.

There are multiple conditions that need to be met for the vulnerability to be exploited, specifically:

The application does not set “CURLOPT_BUFFERSIZE” or sets it below 65541. It is important to note that the curl utility itself sets it to 100kB and so is not vulnerable unless this is changed specifically in the command line.
CURLOPT_PROXYTYPE is set to type CURLPROXY_SOCKS5_HOSTNAME.
CURLOPT_PROXY or CURLOPT_PRE_PROXY is set to use the scheme socks5h://

A possible way to exploit the buffer overflow would likely require the attacker to control a webserver which is contacted by the libcurl client over SOCKS5, and to make it return a crafted redirect (HTTP 30x response) containing a Location header with a long enough hostname to trigger the buffer overflow.

Cato’s usage of (lib)curl

At Cato we of course utilize both libcurl and curl itself for multiple purposes:

curl and libcurl-based applications are used extensively in our global infrastructure in scripts and in-house applications.
Cato’s SDP Client also includes libcurl and uses it for multiple functions.

We do not use SOCKS5, and Cato’s code and infrastructure are not vulnerable to any form of this CVE.

Cato’s analysis and response to the CVE

Based on the CVE details and the public POC shared along with the disclosure, Cato’s Research Labs researchers believe that the chances of this being exploited successfully are medium to low. Nevertheless, we have of course added IPS signatures for this CVE, providing Cato-connected sites worldwide with peace and quiet through virtual patching and blocking exploit attempts, with a detect-to-protect time of 1 day and 3 hours for all users and sites connected to Cato worldwide, and Opt-In Protection already available after 14 hours.

Cato’s recommendation is, as always, to patch impacted servers and applications; the affected versions are libcurl 7.69.0 up to and including 8.3.0. In addition, it is possible to mitigate by identifying usage of the parameters, as already stated, that can lead to the vulnerability being triggered: CURLOPT_PROXYTYPE, CURLOPT_PROXY, CURLOPT_PRE_PROXY.

For more insights on CVE-2023-38545 specifically and many other interesting and nerdy cybersecurity stories, listen (and subscribe!) to Cato’s podcast, The Ring of Defense: A CyberSecurity Podcast (also available in audio form).
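The exploitation conditions and affected version range listed in this post can be combined into a quick triage check. The Python sketch below is only an illustration under stated assumptions: the function names and the example proxy URL are hypothetical, the proxy condition is simplified to the `socks5h://` scheme (an application may also select the same behavior through CURLOPT_PROXYTYPE), and an unset buffer size is treated as meeting the condition, as the post describes.

```python
AFFECTED_MIN = (7, 69, 0)  # first affected libcurl version per the post
AFFECTED_MAX = (8, 3, 0)   # last affected libcurl version per the post (inclusive)
BUFFER_SIZE_LIMIT = 65541  # buffer sizes below this meet the condition


def parse_version(text):
    """Turn a dotted version string such as '8.3.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_affected_range(version):
    """True when the libcurl version is within the range given in the post."""
    return AFFECTED_MIN <= parse_version(version) <= AFFECTED_MAX


def matches_listed_conditions(version, proxy_url, buffer_size=None):
    """Check a configuration against the conditions listed in the post.

    buffer_size=None models an application that never sets CURLOPT_BUFFERSIZE.
    """
    uses_socks5h = proxy_url is not None and proxy_url.lower().startswith("socks5h://")
    small_buffer = buffer_size is None or buffer_size < BUFFER_SIZE_LIMIT
    return in_affected_range(version) and uses_socks5h and small_buffer


print(matches_listed_conditions("8.3.0", "socks5h://proxy.example:1080"))
print(matches_listed_conditions("8.3.0", "socks5h://proxy.example:1080", 102400))
```

A match only means the configuration deserves a closer look and a patch; it does not prove exploitability, which also depends on the handshake timing and hostname length described in the post.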

Cato Protects Against Atlassian Confluence Server Exploits (CVE-2023-22515)

06/10/2023

3m read


Vadim Freger


A new critical vulnerability has been disclosed by Atlassian in a security advisory published on October 4th 2023 in its on-premise Confluence Data Center and Server product. It is a privilege escalation vulnerability through which attackers may exploit a vulnerable endpoint in internet-facing Confluence instances to create unauthorized Confluence administrator accounts and gain access to the Confluence instance.

At the time of writing, a CVSS score had not been assigned to the vulnerability, but it can be expected to be very high (9 – 10) because it is remotely exploitable and allows full access to the server once exploited.

[boxlink link="https://www.catonetworks.com/rapid-cve-mitigation/"] Rapid CVE Mitigation by Cato Security Research [/boxlink]

Cato’s Response

There are no publicly known proofs-of-concept (POC) of the exploit available, but Atlassian has confirmed that it was made aware of the exploit by a “handful of customers where external attackers may have exploited a previously unknown vulnerability”, so it can be assumed with high certainty that it is already being exploited.

Cato’s Research Labs identified possible exploitation attempts of the vulnerable endpoint (“/setup/”) in some of our customers immediately after the security advisory was released, which were successfully blocked without any user intervention needed. The attempts were blocked by our IPS signatures aimed at identifying and blocking URL scanners, even before a signature specific to this CVE was available.

The speed with which the very little information available from the advisory was integrated into online scanners gives a strong indication of how much of a high-value target Confluence servers are, and is concerning given the large number of publicly facing Confluence servers that exist.

Following the disclosure, Cato deployed signatures blocking any attempts to interact with the vulnerable “/setup/” endpoint, with a detect-to-protect time of 1 day and 23 hours for all users and sites connected to Cato worldwide, and Opt-In Protection already available in under 24 hours.

Furthermore, Cato’s recommendation is to restrict access to Confluence servers’ administration endpoints to authorized IPs only, preferably from within the network, and when that is not possible, to make them accessible only from hosts protected by Cato, whether behind a Cato Socket or remote users running the Cato Client.

Cato’s Research Labs continues to monitor the CVE for additional information, and we will update our signatures as more information becomes available or a POC is made public and exposes additional information. Follow our CVE Mitigation page and Release Notes for future information.
