We just introduced what we believe is a unique application of real-time, deep learning (DL) algorithms to network prevention. The announcement is hardly our foray...
Enhancing Security and Asset Management with AI/ML in Cato Networks’ SASE Product We just introduced what we believe is a unique application of real-time, deep learning (DL) algorithms to network prevention. The announcement is hardly our foray into artificial intelligence (AI) and machine learning (ML). The technologies have long played a pivotal role in augmenting Cato's SASE security and networking capabilities, enabling advanced threat prevention and efficient asset management. Let's take a closer look.
What is Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL)?
Before diving into the details of Cato's approach to AI, ML, and DL, let's provide some context around the technologies. AI is the overarching concept of creating machines that can perform tasks typically requiring human intelligence, such as learning, reasoning, problem-solving, understanding natural language, and perception. One example of AI applications is in healthcare, where AI-powered systems can assist doctors in diagnosing diseases or recommending personalized treatment plans.
ML is a subset of AI that focuses on developing algorithms to learn from and make predictions based on data. These algorithms identify patterns and relationships within datasets, allowing a system to make data-driven decisions without explicit programming. An example of an ML application is in finance, where algorithms are used for credit scoring, fraud detection, and algorithmic trading to optimize investment strategy and risk management.
Deep Learning (DL) is a subset of ML, employing artificial neural networks to process data and mimic the human brain's decision-making capabilities. These networks consist of multiple interconnected layers capable of extracting higher-level features and patterns from vast amounts of data. A popular use of DL is seen in self-driving vehicles, where complex image recognition algorithms allow the vehicle to detect and respond appropriately to traffic signs, pedestrians, and other obstacles to ensure safe driving.
Overcoming Challenges in Implementing AI/ML for Real-time Network Security Monitoring
Implementing DL and ML for Cato customers presents several challenges. Cato handles and monitors terabytes of customer network traffic daily. Processing that much data requires a tremendous amount of compute capacity. Falsely flagging network activity as an attack could materially impact our customer's operations so our algorithms must be incredibly accurate. Additionally, we can't interfere with our user's experience, leaving just milliseconds to perform real-time inference.
Cato tackles these challenges by running our DL and ML algorithms on Cato's cloud infrastructure. Being able to run in the cloud enables us to use the cloud's ubiquitous compute and storage capacity. In addition, we've taken advantage of cloud infrastructure advancements, such as AWS SageMaker. SageMaker is a cloud-based platform that provides a comprehensive set of tools and services for building, training, and deploying machine learning models at scale. Finally, Cato's data lake provides a rich data set, converging networking metadata with security information, to better train our algorithms.
With these technologies, we have successfully deployed and optimized our ML algorithms, meticulously reducing the risks associated with false flagging network activity and ensuring real-time inference. The Cato algorithms monitor network traffic in real-time while maintaining low false positive rates and high detection rates.
How Cato Uses Deep Learning to Enhance Threat Detection and Prevention
Using DL techniques, Cato harnesses the power of artificial intelligence to amplify the effectiveness of threat detection and prevention, thereby fortifying network security and safeguarding users against diverse and evolving cyber risks. DL is used in many different ways in Cato SASE Cloud.
For example, we use DL for DNS protection by integrating deep learning models within Cato IPS to detect Command and Control (C2) communication originating from Domain Generation Algorithm (DGA) domains, the essence of our launch today, and DNS tunneling. By running these models inline on enormous amounts of network traffic, Cato Networks can effectively identify and mitigate threats associated with malicious communication channels, preventing real-time unauthorized access and data breaches in milliseconds.
[boxlink link="https://www.catonetworks.com/resources/eliminate-threat-intelligence-false-positives-with-sase/"] Eliminate Threat Intelligence False Positives with SASE | Download the eBook [/boxlink]
We stop phishing attempts through text and image analysis by detecting flows to known brands with low reputations and newly registered websites associated with phishing attempts. By training models on vast datasets of brand information and visual content, Cato Networks can swiftly identify potential phishing sites, protecting users from falling victim to fraudulent schemes that exploit their trust in reputable brands.
We also prioritize incidents for enhanced security with machine learning. Cato identifies attack patterns using aggregations on customer network activity and the classical ML Random Forest algorithm, enabling security analysts to focus on high-priority incidents based on the model score.
The prioritization model considers client group characteristics, time-related metrics, MITRE ATT&CK framework flags, server IP geolocation, and network features. By evaluating these varied factors, the model boosts incident response efficiency, streamlining the process, and ensures clients' networks' security and resilience against emerging threats.
Finally, we leverage ML and clustering for enhanced threat prediction. Cato harnesses the power of collective intelligence to predict the risk and type of threat of new incidents. We employ advanced ML techniques, such as clustering and Naive Bayes-like algorithms, on previously handled security incidents. This data-driven approach using forensics-based distance metrics between events enables us to identify similarities among incidents. We can then identify new incidents with similar networking attributes to predict risk and threat accurately.
How Cato Uses AI and ML in Asset Visibility and Risk Assessment
In addition to using ML for threat detection and prevention, we also tap AI and ML for identifying and assessing the risk of assets connecting to Cato. Understanding the operating system and device types is critical to that risk assessment, as it allows organizations to gain insights into the asset landscape and enforce tailored security policies based on each asset's unique characteristics and vulnerabilities.
Cato assesses the risk of a device by inspecting traffic coming from client device applications and software. This approach operates on all devices connected to the network. By contrast, relying on client-side applications is only effective for known supported devices. By leveraging powerful AI/ML algorithms, Cato continuously monitors device behavior and identifies potential vulnerabilities associated with outdated software versions and risky applications.
For OS Type Detection, Cato's AI/ML capabilities accurately identify the operating system type of agentless devices connected to the network. This information provides valuable insights into the security posture of individual devices and enables organizations to enforce appropriate security policies tailored to different operating systems, strengthening overall network security.
Cato Will Continue to Expand its ML/AI Usage
Cato will continue looking at ways of tapping ML and AI to simplify security and improve its effectiveness. Keep an eye on this blog as we publish new findings.
Successfully Identifying operating systems in organizations has become a crucial part of network security and asset management products. With this information, IT and security departments...
IoT has an identity problem. Here’s how to solve it Successfully Identifying operating systems in organizations has become a crucial part of network security and asset management products. With this information, IT and security departments can gain greater visibility and control over their network. When a software agent is installed on a host, this task becomes trivial. However, several OS types, mainly for embedded and IoT devices, are unmanaged or aren’t suitable to run an agent.
Fortunately, identification can also be done with a much more passive method, that doesn’t require installation of software on endpoint devices and works for most OS types. This method, called passive OS fingerprinting, involves matching uniquely identifying patterns in the network traffic a host produces, and classifying it accordingly. In most cases, these patterns are evaluated on a single network packet, rather than a sequence of flows between a client host and a server.
There exist several protocols, from different network layers that can be used for OS fingerprinting. In this post we will cover those that are most commonly used today. Figure 1 displays these protocols, based on the Open Systems Interconnection (OSI) model. As a rule of thumb, protocols at the lower levels of the OSI stack provide better reliability with lower granularity compared to those on the upper levels of stack, and vice versa.
Figure 1: Different network protocols for OS identification based on the OSI model
Starting from the bottom of the stack, at the data link layer, exists the medium access control (MAC) protocol. Over this protocol, a unique physical identifier, called the MAC address, is allocated to the network interface card (NIC) of each network device. The address, which is hardcoded into the device at manufacturing, is composed of 12 hexadecimal digits, which are commonly represented as 6 pairs divided by hyphens. From these hexadecimal digits, the leftmost six represent the manufacturer's unique identifier, while the rightmost six represent the serial number of the NIC. In the context of OS identification, using the manufacturer's unique identifier, we can infer the type of device running in the network, and in some cases, even the OS.
In Figure 2, we see a packet capture from a MacBook laptop communicating over Ethernet. The 6 leftmost digits of the source MAC address are 88:66:5a, and affiliated with “Apple, Inc.” manufacturer.
Figure 2: an “Apple, Inc.” MAC address in the data link layer of a packet capture
Moving up the stack, at the network and transport layers, is a much more granular source of information, the TCP/IP stack. Fingerprinting based on TCP/IP information stems from the fact that the TCP and IP protocols have certain parameters, from the header segment of the packet, that are left up for implementation, and most OSes select unique values for these parameters.
[boxlink link="https://www.catonetworks.com/resources/cato-networks-sase-threat-research-report/"] Cato Networks SASE Threat Research Report H2/2022 | Download the Report [/boxlink]
Some of the most commonly used today for identification are initial time to live (TTL), Windows Size, “Don't Fragment” flag, and TCP options (values and order). In Figure 3 and Figure 4, we can see a packet capture of a MacBook laptop initiating a TCP connection to a remote server. For each outgoing packet, The IP header includes a combination of flags and an initial TTL value that is common for MacOS hosts, as well as the first “SYN” packet of the TCP handshake with the Windows Size value and the set of TCP options. The combination of these values is sufficient to identify this host as a MacOS.
Figure 3: Different header values from the IP protocol in the network layer
Figure 4: Different header values from the TCP protocol in the transport layer
At the upper most level of the stack, in the application layer, several different protocols can be used for identification. While providing a high level of granularity, that often indicates not only the OS type but also the exact version or distribution, some of the indicators in these protocols are open to user configuration, and therefore, provide lower reliability.
Perhaps the most common protocol in the application level used for OS identification is HTTP. Applications communicating over the web often add a User-Agent field in the HTTP headers, which allows network peers to identify the application, OS, and underlying device of the client.
In Figure 5, we can see a packet capture of an HTTP connection from a browser. After the TCP handshake, the first HTTP request to the server contains a User-Agent field which identifies the client as a Firefox browser, running on a Windows 7 OS.
Figure 5: Detecting a Windows 7 OS from the User-Agent field in the HTTP headers
However, the User-Agent field, is not the only OS indicator that can be found over the HTTP protocol. While being completely nontransparent to the end-user, most OSes have a unique implementation of connectivity tests that automatically run when the host connects to a public network. A good example for this scenario is Microsoft’s Network Connectivity Status Indicator (NCSI). The NCSI is an internet connection awareness protocol used in Microsoft's Windows OSes. It is composed of a sequence of specifically crafted DNS and HTTP requests and responses that help indicate if the host is located behind a captive portal or a proxy server. In Figure 6, we can see a packet capture of a Windows host performing a connectivity test based on the NCSI protocol. After a TCP handshake is conducted, an HTTP GET request is sent to http://www.msftncsi.com/ncsi.txt.
Figure 6: Windows host running a connectivity test based on the NCSI protocol
The last protocol we will cover in the application layer, is DHCP. The DHCP protocol, used for IP assignment over the network. Overall, this process is composed of 4 steps: Discovery, Offer, Request and Acknowledge (DORA). In these exchanges, several granular OS indicators are provided in the DHCP options of the message. In Figure 7, we can see a packet capture of a client host (192.168.1.111) that is broadcasting DHCP messages over the LAN and receiving replies from a DHCP server (192.168.1.1). The first DHCP Inform message, contains the DHCP option number 60 (vendor class identifier) with the value of “MSFT 5.0”, associated with a Microsoft Windows client. In addition, the DHCP option number 55 (parameter request list) contains a sequence of values that is common for Windows OSes. Combined with the order of the DHCP options themselves, these indicators are sufficient to identify this host as a Windows OS.
Figure 7: Using DHCP options for OS identification
In this post, we’ve introduced the task of OS identification from network traffic and covered some of the most commonly used protocols these days. While some protocols provide better accuracy than others, there is no 'silver bullet' for this task, and we’ve seen the tradeoff between granularity and reliability with the different options. Rather than fingerprinting based on a single protocol, you might consider a multi-protocol approach. For example, an HTTP User-Agent combined with lower-level TCP options fingerprint.
Identification of OS-level client types over IP networks has become crucial for network security vendors. With this information, security administrators can gain greater visibility into...
Inside Cato: How a Data Driven Approach Improves Client Identification in Enterprise Networks Identification of OS-level client types over IP networks has become crucial for network security vendors. With this information, security administrators can gain greater visibility into their networks, differentiate between legitimate human activity and suspicious bot activity, and identify potentially unwanted software. The process of identifying clients by their network traces is, however, very challenging. Most of the common methods being applied today require a great deal of manual work, advanced domain expertise, and are prone to misclassification. Using a data-driven approach based on machine learning, Cato recently developed a new technology that addresses these problems, enabling accurate and scalable identification of network clients.
Going “old school” with manual fingerprinting
One of the most common methods to passively identify network clients, without requiring access to either endpoint, is fingerprinting. Imagine you are a police investigator arriving at a crime scene for forensics. Lucky for you, the suspect left an intact fingerprint. Since he is a well-known criminal with previous offenses, his fingerprints are already in the police database, and you can use the one you found to trace back to him. Like humans, network clients also leave unique traces that can be used to identify them. In your network, combinations of attributes such as HTTP headers order, user-agents, and TLS cipher suites, are unique to certain clients.
[boxlink link="https://catonetworks.easywebinar.live/registration-ransomware-chokepoints"] Ransomware Chokepoints: Disrupt the Attack | Watch Webinar [/boxlink]
In recent years, fingerprints relying solely on unsecured network traffic attributes (e.g., the HTTP user-agent value) have become obsolete, since they are easy to spoof, and are not applicable when using secured connections. TLS fingerprints, that rely on attributes from the Client Hello message of the TLS handshake, on the other hand, do not suffer from these drawbacks and are slowly gaining larger adaptation from security vendors.
Below is an example of a TLS fingerprint that identifies a OneDrive client.
Caption: TLS header fingerprint of a One Drive client (source)
However, manually mapping network clients to unique identifiers is not a simple task; It requires in-depth domain knowledge and expertise. Without them, the method is prone to misclassifications. In a shared community effort to address this issue, some open-source projects (e.g. sslhaf and JA3) were created, but they provide low coverage and are not updated frequently.
An even greater issue with manual fingerprinting is scalability. Accurately classifying client types requires manually analyzing traffic captures, a labor-intensive process that does not scale for enterprise networks.
Taking such an approach at Cato wasn’t feasible. Each day the Cato SASE Cloud must inspect millions of unique TLS handshake values. The large number of values is due not only to the number of network clients connected to Cato SASE Cloud but also to the number of different versioning and updates that alter the TLS behavior of each client. Clearly, we needed a better solution.
Clustering – An automated and robust approach
With great amounts of data, comes great amounts of capabilities. With the use of machine learning clustering algorithms, we’ve managed to reduce millions of unique TLS handshake values to a subset of a few hundred clusters, representing unique network clients that share similar values. After creating the clusters, a single fingerprint can be generated for each one, using the longest common substring (LCS) from the Client Hello message of all the samples in the cluster. Finding the common denominator of several samples makes the approach more robust to small variations in the message values.
Caption: Similar values from the TLS Client Hello message are clustered together and the LCS is used to generate a fingerprint. Each colored cluster represents a different client.
The next step of the process is to identify and label the client of the fingerprint. To do so, we search for the fingerprint in our data lake, containing terabytes of daily network flows generated from different clients, and look for common attributes such as domains or applications contacted, and HTTP user-agent (visible from TLS traffic interception and inspection).
For example, in the plot above, a group of TLS network flows, containing similar Client Hello messages, were clustered together by the algorithm (see the 3-point “Java/1.8.0_211” cluster colored in light blue). The resulting TLS fingerprint matched a group of inspected TLS flows in our data lake, with visible HTTP headers; all of which had a common user-agent that belongs to the Java Standard Library; a library used to perform HTTP requests.
Accurately identifying client types has become crucial for network security vendors. With this new data-driven approach, we’ve managed to develop a fully automated and continuous pipeline that generates TLS fingerprints. The new method scales to large enterprise networks and is more robust to variation in the client's network activity.
A technique long used for profiting from the brand strength of popular domain names is finding increased use in phishing attacks. Cybersquatting (also called domain...
Cato Networks Adds Protection from the Perils of Cybersquatting A technique long used for profiting from the brand strength of popular domain names is finding increased use in phishing attacks. Cybersquatting (also called domain squatting) is the use of a domain name with the intent to profit from the goodwill of a trademark belonging to someone else. Increasingly, attackers are tapping cybersquatting to harvest user credentials.
Last month, one such campaign targeted 1,000 users at a high-profile communications company with an email containing a supposed secure file link from an email security vendor. Once clicked, the link led to a spoofed Proofpoint page with login links for different email providers.
So prevalent are these threats that Cato Networks has added cybersquatting protection to our service. Over the past month, we’ve detected 5,000 unique squatted domains for more than 50 well-known trademarks. These domains follow certain patterns. By understanding these patterns, you’ll be more likely to protect your organization from this new threat.
[boxlink link="https://www.catonetworks.com/resources/ransomware-is-on-the-rise-catos-security-as-a-service-can-help?utm_source=blog&utm_medium=top_cta&utm_campaign=ransomware_ebook"] Ransomware is on the Rise | Download eBook [/boxlink]
Types of Cybersquatting
There are several techniques for creating domains that may trick unsuspecting users. Here are four of the most common:
Typosquatting creates domain names that incorporate typical typos users input when attempting to access a legitimate site. A perfect example is catonetwrks.com, which leaves out the “o” in networks. The user mistypes Cato’s Web site and ends up interacting with another site used to spread misinformation, redirect the user, or download malware to the user’s system.
Combosquatting creates a domain that combines the legitimate domain with additional words or letters. For example, cato-networks.com adds a hyphen to Cato’s URL catonetworks.com. Combosquatting is often used for links in phishing emails.
Here are two examples of counterfeit websites that use combosquatting to prompt the user to submit sensitive information. The domain names, amazon-verifications[.]com and amazonverification[.]tk, make the user think they are interacting with a legitimate website owned by Amazon.
[caption id="attachment_20714" align="alignnone" width="860"] Figure 1- Examples of combosquatting[/caption]
Levelsquatting inserts the target domain into the subdomain of the cybersquatting URL. This attack usually targets mobile device users who may only see part of the URL displayed in the small mobile-device browser window. A perfect example of levelsquatting would be login.catonetworks.com.fake.com. The user may only see the prefix of login.catonetworks.com on his Apple or Android screen and thinks it’s a legitimate Cato Networks login site.
Homographsquatting uses various character combinations that resemble the target domain visually. One example is catonet0rks.com, which uses a zero digit that looks like the letter “o” or ccitonetworks.com, where the combination of “c” and “i” after the initial “c” looks to users like the letter “a.”
Homographsquatting can also use Punycode to include non-ASCII characters in international domain names (IDN). An example would be cаtonetworks.com (xn--ctonetworks-yij.com in Punycode). In this case the “a” is a non-ASCII character from the Cyrillic alphabet.
Here is a non-malicious example of a Facebook homograph domain (xn--facebok-y0a[.]com in Punycode) offered for sale. The squatted domain is used for the owner’s personal profit.
[caption id="attachment_20716" align="alignnone" width="1600"] Figure 2- Example of homographsquatting targeting Facebook users[/caption]
And here is another use of homographsquatting, this time going after Microsoft users. The domain name - nnicrosoft[.]online – uses double “n”s to look like the “m” in “microsoft.”
[caption id="attachment_20718" align="alignnone" width="1600"] Figure 3- Example of homographsquatting targeting Microsoft users[/caption]
How to Detect Cybersquatting
To detect cybersquatted domains, Cato Networks uses a method called Damerau-Levenshtein distance. This approach counts the minimum number of operations (insertions, deletions, substitution, or transposition of two adjacent characters) needed to change one word into the other.
For example, netflex.com has an edit distance of 1 from the legitimate site, netflix.com via the substitution the “i” character with “e”.
[caption id="attachment_20722" align="alignnone" width="861"] Figure 4 – Substitution of the “i” character with an “e” in the netflix domain.[/caption]
Cato Networks configures the edit distance used to classify squatted domains dynamically for each squatted trademark, taking into consideration the length and word similarity. Think of the words that can be generated with a 2 edit distance from the name Instagram or DHL, for example.
We also look at who registered the domain. You might be surprised to learn that many domains of trademarks with common typos are registered by the trademark owner to redirect the user to the correct site. Detecting a domain registered by anyone other than the trademark owner arouses suspicion.
Checking the domain age and registrar also turns up clues. Newly registered domains and domains from low-reputation registrars are more likely to be associated with unwanted and malicious activity than others.
Separating Squatted from Non-squatted Domains
In October 2021 alone, Cato Networks used these methods to detect more than 5,000 unique squatted domains for more than 50 well-known trademarks. The graphic below shows that fewer than 20% were owned by the legitimate trademark owner.
[caption id="attachment_20724" align="alignnone" width="1200"] Figure 5 – Distribution of ownership in the detected domains.[/caption]
Additionally, Cato’s data shows that legitimate companies tend to register domains that include their trademark with combinations of other characters and typical typos. Domains that are not registered by trademark owners tend to have a higher percentage of trademarks in the subdomain level, i.e. levelsquatting.
[caption id="attachment_20726" align="alignnone" width="1200"] Figure 6 - Distribution of squatting techniques in domains not registered by trademark owners.[/caption]
Finally, this graphic of Cato Networks data shows that many of the squatted domains target search engines, social media, Office suites and e-commerce websites.
[caption id="attachment_20728" align="alignnone" width="1200"] Figure 7- Top targeted trademarks.[/caption]
Don’t Wait to Identify Cybersquatting
There is no doubt that cybersquatting can be used in a variety of ways to target unsuspecting users and companies for a data breach. Organizations need to educate themselves on the perils of cybersquatting and incorporate tools and techniques for detecting phishing and other attacks that use this method for nefarious purposes. The good news is that Cato customers can now take advantage of Cato’s cybersquatting detection to protect their users and precious data.