Cato XDR Storyteller – Integrating Generative AI with XDR to Explain Complex Security Incidents

Generative AI (à la OpenAI’s GPT and its peers) is a powerful tool for summarizing information and transforming text and code, all while using its highly specialized ability to “speak” in natural human language. While working with GPT APIs on several engineering projects, an interesting idea came up in a brainstorming session: how well would it work when asked to describe information provided as raw JSON in natural language?

The data in question were stories from our XDR engine, which provide a full timeline of a security incident along with all the observed information that ties to it, such as traffic flows, events, source/target addresses and more. When fed into the GPT model, even very early results (i.e. before any prompt engineering) were promising, and we saw high potential in a method that summarizes entire security incidents into natural language, giving SOC teams that use our XDR platform a useful tool for incident investigation.

Thus, the “XDR Story Summary” project, aka “XDR Storyteller”, came into being, integrating GenAI directly into the XDR detection & response platform in the Cato Management Application (CMA). The summaries are presented in natural language and provide a concise view of all the different data points and the full timeline of an incident.
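As a sketch of the core idea (the function name and prompt wording here are illustrative, not Cato's production code), handing a story's raw JSON to a chat model boils down to building a request like this:

```python
import json

def build_summary_messages(story: dict) -> list[dict]:
    """Build a chat-completion payload asking GPT to narrate an XDR story.

    The system prompt wording is an illustrative assumption.
    """
    system = (
        "You are an MDR analyst. Provide a comprehensive but concise summary "
        "of the following XDR story for a SOC audience. Avoid repeating "
        "information; interpret the data, do not just describe it."
    )
    return [
        {"role": "system", "content": system},
        # The raw story JSON is passed as-is; the model does the narration.
        {"role": "user", "content": json.dumps(story)},
    ]

# The messages would then be sent via the OpenAI client, e.g.:
# client.chat.completions.create(model="gpt-4", messages=build_summary_messages(story))
```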
Figure 1 - Story Summary in action in Cato Management Application (CMA)

These are just two examples of the many different scenarios we POCed prior to starting development:

Example use-case #1 – deeper insight into the details of an incident. GPT was able to add details to the AI summary that were not easily understood from the UI of the story, since the story is comprised of multiple events. From a Suspicious Activity Monitoring (SAM) event, GPT could infer that in addition to trying to download a malicious script, the user attempted to disable the McAfee and Defender services running on the endpoint. The GPT representation is built from reading the raw JSON of an XDR story; while it is entirely textual, in contrast to the visual UI representation, it can combine data from multiple contexts into a single summary, giving insight into aspects that can be hard to grasp from the UI alone.

Figure 2 - Example of a summary of a raw JSON, from the OpenAI Playground

Example use-case #2 – using supporting playbooks to add remediation recommendations on top of the summary. By giving GPT an additional source of data, a playbook used by our Support teams, it was able to not only summarize a network event but also provide concise, Cato-specific recommended actions for resolving and investigating the incident.

Figure 3 - Example of providing GPT with additional sources of data, from the OpenAI Playground

Picking a GenAI model

There are multiple aspects to consider when integrating a 3rd-party AI service (or any service handling your data, for that matter). Some are engineering oriented, such as how to get the best results from the input; others are legal aspects pertaining to the handling of our and our customers’ data.
Before tackling the challenges of working with a GenAI model, you first need to pick the tool you’ll be integrating. While GPT-4 (OpenAI) might seem like the go-to choice due to its popularity and impressive feature set, it is far from the only option; examples include PaLM (Google), LLaMA (Meta), Claude 2 (Anthropic) and multiple others. We opted for a proof-of-concept (POC) between OpenAI’s GPT and Amazon Bedrock, the latter being more of an AI platform that lets you choose which model (Foundation Model - FM) to use from a list of several supported FMs.

Without going too deep into the details of the POC in this post, we’ll jump to the result: we ended up integrating our solution with GPT. Both solutions showed good results, and the Amazon Bedrock route had an inherent advantage in the legal and privacy aspects of moving customer data outside, because:

Amazon is an existing sub-processor, since we use AWS widely across our platform.

It is possible to link your own VPC to Bedrock, avoiding moving traffic across the internet.

Even so, due to other engineering considerations we opted for GPT, solving the privacy hurdle in another way, which we’ll go into below. Another worthy mention: a positive side effect of running the POC is that it led us to a model-agnostic design, leaving the option to add additional AI sources in the future for reliability and redundancy purposes.

Challenges and solutions

Let’s look at the challenges and solutions encountered when building the “Storyteller” feature:

Prompt engineering & context – for any task given to an AI, it is important to frame it correctly and give the AI context for an optimal result. For example, asking ChatGPT “Explain thermonuclear energy” and “Explain thermonuclear energy for a physics PhD” will yield very different results, and the same applies to cybersecurity.
Since the desired output is aimed at security and operations personnel, we should give the AI the right context, e.g. “You are an MDR analyst, provide a comprehensive summary where the recipient is the customer”. For better context, in addition to the source JSON to analyze, we add source material that GPT should use for the reply.

Figure 4 - Example of prompt engineering research from the OpenAI Playground

Additional prompt statements can help control the output formatting and verbosity. A known trait of GenAI models is that they do like to babble and can return excessively long replies, often with repetitive information. But since they are obedient (for now…) we can shape the replies by adding instructions such as “avoid repeating information” or “interpret the information, do not just describe it” to the prompts. Other prompt engineering statements can control the formatting of the reply itself, so self-explanatory instructions like “do not use lists”, “round numbers if they are too long” or “use ISO-8601 date format” can help shape the end result.

Data privacy – a critical aspect when working with a 3rd party to which customer data containing PII is sent; said data is of course also governed by the strict compliance certifications Cato adheres to, such as SOC 2, GDPR, etc. As mentioned above, in certain circumstances, such as when using AWS, this can be solved by keeping everything in your own VPC, but when using OpenAI’s API a different approach was necessary. It’s worth noting that on OpenAI’s Enterprise tier they do guarantee that your prompts and data are NOT used for training their model, and other privacy-related controls such as data retention are available as well. Nonetheless, we wanted to address this on our side and not send Personally Identifiable Information (PII) at all. The solution was to tokenize any fields that contain PII before sending them.
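A minimal sketch of that tokenization round-trip (the field list and token format are illustrative assumptions, not Cato's actual schema):

```python
# Fields considered PII in this sketch; the real list is predetermined per schema.
PII_FIELDS = {"src_ip", "domain", "url", "geolocation"}

def obfuscate(story: dict) -> tuple[dict, dict]:
    """Replace PII field values with opaque tokens before sending to the LLM.

    Returns the sanitized story and a token->original mapping used later
    to restore values in the returned summary.
    """
    mapping, sanitized = {}, {}
    for key, value in story.items():
        if key in PII_FIELDS:
            token = f"TOKEN_{len(mapping)}"
            mapping[token] = value
            sanitized[key] = token
        else:
            sanitized[key] = value
    return sanitized, mapping

def restore(summary: str, mapping: dict) -> str:
    """Swap tokens in the model's reply back to the original values."""
    for token, original in mapping.items():
        summary = summary.replace(token, original)
    return summary
```

The mapping never leaves our cloud; only the tokenized story is sent out, and the readable summary is reassembled locally.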
PII in this context is anything revealing of the user or their specific activity, e.g. source IP, domains, URLs, geolocation, etc. In testing we’ve seen that withholding this data has no detrimental effect on the quality of the summary. Essentially, before compiling the raw output to send for summarization, we preprocess the data: based on a predetermined list of which fields can and cannot be sent as-is, we sanitize the raw data while keeping a mapping of all obfuscated values. Once we get the response, we replace the obfuscated values with the original sensitive fields for a complete and readable summary, without any sensitive customer data ever leaving our own cloud.

Figure 5 - High level flow of PII obfuscation

Rate limiting – like most cloud APIs, OpenAI applies various rate limits on requests to protect its infrastructure from over-utilization. OpenAI specifically does this by assigning users a tier-based limit calculated from their overall usage. This is an excellent practice overall, and when designing a system that consumes such an API, certain aspects need to be taken into consideration:

Code should be optimized (shouldn’t it always? 😉) so as not to “expend” the limited resources – number of requests per minute/day or request tokens.

Measure the rate and remaining tokens; with OpenAI this can be done by inspecting specific HTTP response headers (e.g., “x-ratelimit-remaining-tokens”) showing the remaining limits.

Handle errors when a limit is reached, using backoff algorithms or simply retrying the request after a short period of time.

Part of something bigger

Much like the entire field of AI itself, the shaping and application of which we are now living through, the various applications of AI in cybersecurity are still being researched and expanded, and at Cato Networks we continue to invest heavily in AI & ML based technologies across our entire SASE platform. This includes, but is not limited to, the integration of many machine learning models into our cloud for inline and out-of-band protection and detection (we’ll cover this in upcoming blog posts), and of course features like the XDR Storyteller detailed in this post, which harnesses GenAI for a simplified and more thorough analysis of security incidents.
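As a footnote to the rate-limiting discussion above, the retry-with-backoff handling can be sketched as a generic wrapper (this is an illustrative pattern, not Cato's actual implementation):

```python
import random
import time

def call_with_backoff(request_fn, max_retries: int = 5):
    """Retry a rate-limited API call with exponential backoff and jitter.

    request_fn is any callable performing the API request; it should raise
    an exception (e.g. on an HTTP 429 response) when a rate limit is hit.
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Sleep 2^attempt seconds plus jitter before retrying.
            time.sleep(2 ** attempt + random.random())
```

In practice the wrapper would also read the remaining-quota response headers and raise only on rate-limit errors, but the shape of the loop is the same.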

Cato XDR Story Similarity – A Data Driven Incident Comparison and Severity Prediction Model

At Cato, our number one goal has always been to simplify networking and security. We even wrote it on a cake once, so it must be true:

Figure 1 - A birthday cake

Applying this principle to our XDR offering, we aimed at reducing the complexity of analyzing security and network incidents, using a data-driven approach based on the vast amounts of data we see across our global network and collect into our data lake. On top of that, we wanted to provide a prediction of the threat type and the predicted verdict, i.e. whether the incident is benign or suspicious.

Upon analyzing XDR stories – summaries of the events that comprise a network or security incident – many similarities can be observed both inside the network of a given customer, and even more so between different customers’ networks. Meaning, a good deal of the network and security incidents that occur in one network have a good chance of recurring in another. This is akin to the MITRE ATT&CK framework, which groups and inventories attack techniques, demonstrating that there is always similarity of one sort or another between attacks. For example, a phishing campaign targeted at a specific industry, e.g. the banking sector, will likely repeat itself in multiple customer accounts from that same industry. In essence this allows a crowdsourcing of sorts, where all customers benefit from the sum of our network and data. An important note: we never share the data of one customer with another, upholding our very strict privacy measures and data governance, but by comparing attacks and story verdicts across accounts we can still provide accurate predictions without sharing any data.
The conclusion is that by learning from the past we can predict the future. Using a combination of statistical algorithms, we can determine with high probability whether a new story is related to a previously seen story, and the likelihood of it being the same story with the same verdict, in turn cutting down the time needed to analyze the incident and freeing up the security team’s time to work on resolving it.

Figure 2 - An XDR story with similarities

The similarity metric – Jaccard Similarity Coefficient

To identify whether incidents share a similarity we look at the targets, i.e. the destination domains/IPs involved in the incident. Going over all our data and grouping the targets into clusters, we then need to measure the strength of the relation between the clusters. To measure that we use the Jaccard index (also known as the Jaccard similarity coefficient). The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:

J(A, B) = |A ∩ B| / |A ∪ B|

Taking a more graphic example, given two sets of domains (i.e. targets), we can calculate the following by looking at Figure 3 below.

Figure 3

The size of the intersection between sets A & B is 1 (one shared domain), and the size of the union is 5 (all domains summed). The Jaccard similarity between the sets is therefore 1/5 = 0.2. In other words, if A & B are security incidents that involved these target domains, they have a similarity of 20%, which is a weak indicator, and hence one should not be used to predict the other.

The verification model – Louvain Algorithm

Modularity is a measure used in community detection algorithms to assess the quality of a partition of a network into communities. It quantifies how well the nodes in a community are connected compared to how we would expect them to be connected in a random network.
Using the Louvain algorithm, we detect communities of cyber incidents by considering common targets and using Jaccard similarity as the distance metric between incidents. Modularity ranges from -1 to 1, where a value close to 1 indicates a strong community structure within the network; as we’ll see below, the modularity score we achieved provides good evidence that utilizing common targets is effective in identifying communities of related cyber incidents.

To understand how modularity is calculated, let’s consider a simplified example. Suppose we have a network of 10 cyber incidents, and our algorithm identifies two communities, consisting of the following incidents:

Community 1: Incidents {A, B, C, D}
Community 2: Incidents {E, F, G, H, I, J}

The total number of edges connecting the incidents within each community can be calculated as follows:

Community 1: 6 edges (A-B, A-C, A-D, B-C, B-D, C-D)
Community 2: 15 edges (E-F, E-G, E-H, E-I, E-J, F-G, F-H, F-I, F-J, G-H, G-I, G-J, H-I, H-J, I-J)

Additionally, we can calculate the total number of edges in the entire network:

Total edges: 21 (6 within Community 1 + 15 within Community 2)

Now, let’s calculate the expected number of edges in a random network with the same node degrees. The node degrees in our network are as follows:

Community 1: A, B, C and D each have a degree of 3
Community 2: E, F, G, H, I and J each have a degree of 5

To calculate the expected number of edges, we can use the following formula:

Expected edges between two nodes (i, j) = (degree of node i * degree of node j) / (2 * total edges)

For example, the expected number of edges between nodes A and B would be: (3 * 3) / (2 * 21) ≈ 0.214

By calculating the expected number of edges for all pairs of nodes, we can obtain the expected number of edges within each community and in the entire network.
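The expected-edge arithmetic above is easy to check in a couple of lines (a quick sanity check, not production code):

```python
def expected_edges(deg_i: int, deg_j: int, total_edges: int) -> float:
    """Expected number of edges between nodes i and j in a random network
    with the same degree sequence (the null model used by modularity)."""
    return (deg_i * deg_j) / (2 * total_edges)

# 21 edges total: 6 in Community 1 (degrees of 3) + 15 in Community 2 (degrees of 5).
m = 21
```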
Finally, we can use these values to calculate the modularity using the (simplified) formula:

Modularity = (actual number of edges - expected number of edges) / total edges

The Louvain algorithm works iteratively to maximize the modularity score. It starts by assigning each node to its own community and then iteratively moves nodes between communities to increase the modularity value. The algorithm continues this process until no further improvement in modularity can be achieved.

For a practical example, in Figure 4 below, using Gephi (an open-source graph visualization application), we have a graph of one customer’s cyber incidents. The nodes are the cyber incidents, and the edges are weighted using the Jaccard similarity metric. We can see a clear division into clusters of interconnected incidents, showing that using Jaccard similarity on common targets yields strong results. The clusters are colored by cyber incident type, and our approach is confirmed by the way related incident types cluster together: the big cluster in the center is composed of three very similar cyber incident types. The incidents in this example achieved a modularity score of 0.75.

Figure 4 – Modularity verification visualization using Gephi

In summary, the modularity value obtained after applying the Louvain algorithm over the entire dataset of customers and incidents is about 0.71, which is considered high. This indicates that our approach of using common targets and Jaccard similarity as the distance metric is effective in detecting communities of cyber incidents in the network, and it served as validation of the design.

Architecting to run at scale

The above was a very simplified example of how to measure similarity.
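To make the toy example concrete, here is a small, self-contained sketch that computes the modularity of the partition above using the standard Newman formulation (the Louvain algorithm then searches for the partition that maximizes this score; in practice an off-the-shelf implementation, such as the one in networkx, would be used rather than hand-rolled code):

```python
from itertools import combinations

def modularity(edges: list, communities: list) -> float:
    """Newman modularity Q = sum over communities of [L_c/m - (d_c/2m)^2],
    where L_c is the number of intra-community edges, d_c the total degree
    of the community's nodes, and m the total number of edges."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for c in communities:
        l_c = sum(1 for u, v in edges if u in c and v in c)
        d_c = sum(degree[n] for n in c)
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q

# The toy graph from the text: two fully connected communities, 21 edges total.
c1, c2 = set("ABCD"), set("EFGHIJ")
edges = list(combinations(sorted(c1), 2)) + list(combinations(sorted(c2), 2))
```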
Running this at scale over our entire data lake presented a scaling challenge that we opted to solve using a serverless architecture based on AWS Lambda that can scale on demand. Lambda is an event-driven serverless platform that lets you run code or specific functions on demand and scale automatically, with an API Gateway service in front of your Lambdas. In the figure below we can see the distribution of Lambda invocations over a given week, and the number of parallel executions, demonstrating the flexibility and scaling that the architecture allows for.

Figure 5 - AWS Lambda execution metrics

The Cato XDR service runs on top of data from our data lake once a day, creating all the XDR stories. Part of every story’s creation is determining its similarity score, achieved by invoking the Lambda function. Often, Lambdas are ready-to-use functions containing the code inside the Lambda itself; in our case, to fit our development and deployment models, we chose to use Lambda’s ability to run Docker images through ECR (Elastic Container Registry). The similarity model is coded in Python, which runs inside the Docker image, executed by Lambda every time it runs. The backend of the Lambda is a DocumentDB cluster, a MongoDB-compatible NoSQL database offered by AWS that performs very well for querying large datasets. In the DB we store the last 6 months of story similarity data, and every invocation of the Lambda uses this data to determine similarity by applying the Jaccard index, returning a dataset with the results back to the XDR service.
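As an illustration of the flow (the handler and field names are hypothetical, and the real service queries the DocumentDB cluster rather than taking candidates from the event), a Lambda entry point for the similarity check might look like:

```python
import json

def handler(event, context):
    """Receive an XDR story's targets, compare them against candidate
    stories using the Jaccard index, and return the best match."""
    story_targets = set(event["targets"])
    # In production the candidates come from the DocumentDB cluster holding
    # the last 6 months of story/target data; here they ride along in the
    # event to keep the sketch self-contained.
    best_id, best_score = None, 0.0
    for candidate in event.get("candidates", []):
        targets = set(candidate["targets"])
        union = story_targets | targets
        # Jaccard index on the target sets.
        score = len(story_targets & targets) / len(union) if union else 0.0
        if score > best_score:
            best_id, best_score = candidate["story_id"], score
    return {"statusCode": 200,
            "body": json.dumps({"match": best_id, "score": best_score})}
```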
Figure 6 - High level diagram of similarity calculation with Lambda

An additional standalone phase of this workflow keeps the DocumentDB database up to date with story and target data, so the similarity calculation stays relevant and accurate. The update phase runs daily, orchestrated using Apache Airflow, an open-source workflow management platform that is well suited for this and that we use for many of our data engineering workflows as well. Airflow triggers a different Lambda instance, technically running the same Docker image as before but invoking a different function to update the database.

Figure 7 - DocDB update workflow

Ultimate impact and what’s next

We’ve reviewed how, by leveraging a data-driven approach, we were able to address the complexity of analyzing security and network incidents by linking them to already identified threats and predicting their verdict. Overall, in our analysis we saw that a little over 30% of incidents have a similar incident linked to them. This is a very strong result, ultimately meaning we can help reduce the time it takes to investigate a third of the incidents across a network. As IT & security teams continue to struggle with staff shortages while keeping up with the constant flow of cybersecurity incidents, capabilities such as this go a long way toward reducing workload and fatigue, allowing teams to focus on what’s important. Using effective, easy-to-implement algorithms coupled with a highly scalable serverless infrastructure on AWS Lambda, we achieved a powerful solution that can meet the requirement of processing massive amounts of data.

Future enhancements being researched involve comparing entire XDR stories to provide an even stronger prediction model, for example by identifying similarity between incidents through different vectors even if they do not share the same targets. Stay tuned.

Enhancing Security and Asset Management with AI/ML in Cato Networks’ SASE Product

We just introduced what we believe is a unique application of real-time, deep learning (DL) algorithms to network prevention. The announcement is hardly our first foray into artificial intelligence (AI) and machine learning (ML). The technologies have long played a pivotal role in augmenting Cato’s SASE security and networking capabilities, enabling advanced threat prevention and efficient asset management. Let’s take a closer look.

What is Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL)?

Before diving into the details of Cato’s approach to AI, ML, and DL, let’s provide some context around the technologies. AI is the overarching concept of creating machines that can perform tasks typically requiring human intelligence, such as learning, reasoning, problem-solving, understanding natural language, and perception. One example of AI applications is in healthcare, where AI-powered systems can assist doctors in diagnosing diseases or recommending personalized treatment plans.

ML is a subset of AI that focuses on developing algorithms that learn from and make predictions based on data. These algorithms identify patterns and relationships within datasets, allowing a system to make data-driven decisions without explicit programming. An example of an ML application is in finance, where algorithms are used for credit scoring, fraud detection, and algorithmic trading to optimize investment strategy and risk management.

Deep Learning (DL) is a subset of ML, employing artificial neural networks to process data and mimic the human brain’s decision-making capabilities. These networks consist of multiple interconnected layers capable of extracting higher-level features and patterns from vast amounts of data.
A popular use of DL is seen in self-driving vehicles, where complex image recognition algorithms allow the vehicle to detect and respond appropriately to traffic signs, pedestrians, and other obstacles to ensure safe driving.

Overcoming Challenges in Implementing AI/ML for Real-time Network Security Monitoring

Implementing DL and ML for Cato customers presents several challenges. Cato handles and monitors terabytes of customer network traffic daily; processing that much data requires a tremendous amount of compute capacity. Falsely flagging network activity as an attack could materially impact our customers’ operations, so our algorithms must be incredibly accurate. Additionally, we can’t interfere with our users’ experience, leaving just milliseconds to perform real-time inference.

Cato tackles these challenges by running our DL and ML algorithms on Cato’s cloud infrastructure. Running in the cloud enables us to use the cloud’s ubiquitous compute and storage capacity. In addition, we’ve taken advantage of cloud infrastructure advancements such as AWS SageMaker, a cloud-based platform that provides a comprehensive set of tools and services for building, training, and deploying machine learning models at scale. Finally, Cato’s data lake provides a rich dataset, converging networking metadata with security information, to better train our algorithms.

With these technologies, we have successfully deployed and optimized our ML algorithms, minimizing the risk of falsely flagging network activity while ensuring real-time inference. The Cato algorithms monitor network traffic in real time while maintaining low false positive rates and high detection rates.
How Cato Uses Deep Learning to Enhance Threat Detection and Prevention

Using DL techniques, Cato harnesses the power of artificial intelligence to amplify the effectiveness of threat detection and prevention, fortifying network security and safeguarding users against diverse and evolving cyber risks. DL is used in many different ways in the Cato SASE Cloud.

For example, we use DL for DNS protection by integrating deep learning models within Cato IPS to detect Command and Control (C2) communication originating from Domain Generation Algorithm (DGA) domains, the essence of our launch today, and DNS tunneling. By running these models inline on enormous amounts of network traffic, Cato Networks can effectively identify and mitigate threats associated with malicious communication channels, preventing unauthorized access and data breaches in real time, within milliseconds.

We stop phishing attempts through text and image analysis by detecting flows to known brands with low reputations and to newly registered websites associated with phishing attempts. By training models on vast datasets of brand information and visual content, Cato Networks can swiftly identify potential phishing sites, protecting users from falling victim to fraudulent schemes that exploit their trust in reputable brands.

We also prioritize incidents with machine learning. Cato identifies attack patterns using aggregations over customer network activity and the classical Random Forest ML algorithm, enabling security analysts to focus on high-priority incidents based on the model score. The prioritization model considers client group characteristics, time-related metrics, MITRE ATT&CK framework flags, server IP geolocation, and network features.
By evaluating these varied factors, the model boosts incident response efficiency, streamlines the process, and helps ensure the security and resilience of clients’ networks against emerging threats.

Finally, we leverage ML and clustering for enhanced threat prediction. Cato harnesses the power of collective intelligence to predict the risk and threat type of new incidents. We employ advanced ML techniques, such as clustering and Naive Bayes-like algorithms, on previously handled security incidents. This data-driven approach, using forensics-based distance metrics between events, enables us to identify similarities among incidents; we can then match new incidents with similar networking attributes to predict risk and threat type accurately.

How Cato Uses AI and ML in Asset Visibility and Risk Assessment

In addition to using ML for threat detection and prevention, we also tap AI and ML to identify and assess the risk of assets connecting to Cato. Understanding the operating system and device type is critical to that risk assessment, as it allows organizations to gain insight into the asset landscape and enforce tailored security policies based on each asset’s unique characteristics and vulnerabilities.

Cato assesses the risk of a device by inspecting traffic coming from client device applications and software. This approach works on all devices connected to the network; by contrast, relying on client-side applications is only effective for known, supported devices. By leveraging powerful AI/ML algorithms, Cato continuously monitors device behavior and identifies potential vulnerabilities associated with outdated software versions and risky applications.

For OS type detection, Cato’s AI/ML capabilities accurately identify the operating system type of agentless devices connected to the network.
This information provides valuable insights into the security posture of individual devices and enables organizations to enforce appropriate security policies tailored to different operating systems, strengthening overall network security.

Cato Will Continue to Expand its ML/AI Usage

Cato will continue looking at ways of tapping ML and AI to simplify security and improve its effectiveness. Keep an eye on this blog as we publish new findings.