9m read

What Is Prompt Injection?

What’s inside?

Cato Networks named a Leader in the 2024 Gartner® Magic Quadrant™ for Single-Vendor SASE

Get the report

Prompt injection attacks use malicious, crafted queries to trick an LLM into taking some undesirable action. For instance, an attacker might convince the GenAI system to ignore guardrails or corporate policies and generate unapproved types of content.

With widespread integration of GenAI and AI agents into corporate workflows, prompt injection attacks pose a significant risk to service reliability and security. Attackers can trick agents into taking actions that lead to data breaches, malware infections, or other security incidents.

Key Highlights

  • Prompt injection manipulates model instructions to trigger unsafe outputs, data exposure, or unintended actions.
  • Indirect prompt injection can occur through untrusted content such as web pages, documents, emails, tickets, or chat logs.
  • Risk increases when models can use tools, access data sources, or take actions.
  • Mitigations are layered. Input and output controls help, but “perfect prevention” is not a realistic assumption.
  • The goal is to reduce the blast radius by using least privilege, strong data boundaries, and verification steps for sensitive actions.

How Does Prompt Injection Work in Practice?

LLMs commonly have instructions and guardrails in place to control how they operate. LLM creators build in certain safeguards, and enterprises add their own instructions as they configure tools to perform various tasks within their environments.

Prompt injection attacks use carefully crafted inputs designed to evade these guardrails and do something that benefits the attacker, such as stealing sensitive data, producing policy-violating outputs, or triggering actions in connected tools. These inputs could be provided directly to the tool or embedded in other content that it consumes, such as RAG-retrieved documents or webpages that an agent visits.

Direct Prompt Injection

Direct prompt injection involves the attacker interacting directly with the agent. For instance, an attacker might be entering prompts into an LLM chatbot. These malicious prompts can follow various patterns, such as:

  • Ignore previous instructions
  • Coercive roleplaying
  • Developer mode
  • Policy override language

The goal of these instructions is to get the attacker’s prompt to be prioritized over the LLM guardrails, either by exploiting a loophole or being perceived as more important by the LLM. If successful, the LLM may generate disallowed content, leak hidden instructions, provide unsafe recommendations, or misroute actions to the wrong tool or user.

Indirect Prompt Injection

Indirect prompt injection attacks insert malicious instructions into the content that a model consumes rather than direct instructions. These could include:

  • Webpages
  • Documents
  • Emails
  • IT tickets
  • Chat transcripts
  • Knowledge bases

These threats are more significant if an LLM accesses external content, either via a RAG or by browsing the web for potential answers, since the retrieved data is treated as context by the LLM. Malicious instructions can literally be hidden in content, such as using white text to make instructions invisible to human readers or the use of benign-looking text that a model interprets as instructions.

The Main Types of Prompt Injection Attacks

Prompt injection attacks can be performed to achieve various goals. Two of the main types of attacks attempt to access sensitive information about the system prompt or to manipulate an AI’s tool usage to collect sensitive information or perform harmful actions.

Prompt Leakage and System Prompt Extraction

Some attacks are designed to access the system prompt, which can include sensitive data, such as hidden system instructions, policies, or secrets. This information can be useful to an attacker since it allows them to craft follow-on attacks that evade these defenses.

The potential for prompt leakage means that secrets should never be embedded in prompts. Security guardrails like “no reveal” instructions may be circumvented or defeated by an attacker.

Tool and Agent Hijacking

AI tools may be able to call functions or APIs, browse the internet, or trigger workflows. A carefully crafted prompt may allow an attacker to define which tools are selected, the parameters sent to them, and the sequence in which actions are called.

This poses a risk that an attacker may be able to access sensitive data when making calls to ticketing systems, CRM, document stores, code repos, or cloud consoles. To manage this risk, organizations should implement least privilege access controls and human-in-the-loop approval gates for high-impact actions, especially for AI agents that operate without human oversight.

Where Does Prompt Injection Show Up in Enterprise Environments?

Prompt injection attacks can occur anywhere that an organization is using GenAI. Common examples include:

  • Public chat tools
  • Embedded assistants
  • Customer support bots
  • Internal knowledge assistants
  • Developer copilots
  • Autonomous agents

Even if users don’t interact directly with these tools, prompt injection is still a risk. Tools can access web content, attachments, pasted logs, third-party SaaS comments, and other untrusted inputs that could include malicious instructions.

Chatbots and Customer-facing Assistants

Chatbots and customer-facing agents accept free-form text input from unknown users. Successful prompt injection attacks could involve manipulating outputs to mislead customers and potential data exposure if the bot has access to the CRM or order systems.

To manage these risks, organizations should implement strict data scopes, redact sensitive data when possible, and specify templated responses for sensitive requests. Additionally, these tools should be monitored for signs of potential abuse, including request rates, repetition, and payload similarity.

RAG and “Ask Your Knowledge Base” Assistants

RAG and “ask your knowledge base” assistants have the ability to access documents and other content that might be elevated above corporate policy as part of the LLM’s context. As a result, malicious content embedded in webpages and other resources can allow an attacker to evade guardrails.

Organizations can mitigate these risks by implementing content labeling, retrieval allowlists, and prompt compartmentalization. Additionally, least-privilege access controls limit the data that these tools can access, reducing the potential impacts of an attack.

Impacts of Prompt Injection on Enterprise Security

Prompt injection attacks can influence AI-powered tools into doing the attacker’s bidding. This can introduce significant threats to confidentiality, integrity, and availability due to the potential for data breaches, manipulated outputs, and corrupted workflows.

A successful prompt injection attack can be difficult to investigate due to the use of legitimate tools and limited visibility into these tools. This poses a significant risk to regulatory compliance as well if organizations lack the data to prove what happened and the scope of the incident.

Common Business Impacts

Prompt injection attacks can have a variety of different impacts on the enterprise, including:

  • Leakage of customer data
  • Exposure of intellectual property (IP)
  • Service downtime
  • Incorrect responses to customer queries
  • Decision-making based on inaccurate data
  • Reputational and financial damage
  • Regulatory penalties and legal exposure

How Can Prompt Injection Be Detected?

Detecting prompt injection requires analyzing LLM inputs, outputs, and behavior to identify anomalies or violations of corporate policy. To do so, organizations should log prompts, responses, and tool calls for monitoring. Analysis should look for anomalies, policy violations, and repeated payload patterns.

Indicators in Prompts, Context, and Tool Calls

Prompts, context, and tool calls can include indicators of prompt injection attacks. Things to look for include:

  • Suspicious instruction patterns
  • Coercive language
  • Unusual frequency of tool calls
  • Unusual tool call destinations
  • Oversized data returned from tools
  • Retrieved text attempting to act as policy

Practical Mitigations for Prompt Injection

Prompt injection attacks require defense in depth since attackers can craft malicious prompts to evade simple text searches and pattern matching. A layered mitigation model should include:

  • Secure application design
  • Model interaction controls
  • Data governance
  • Operational testing

The goal of these mitigations is to reduce the impact and frequency of attacks. The nature of LLMs means that it is impossible to completely guarantee that attacks will never succeed.

Design Patterns That Reduce Blast Radius

Prompt injection mitigation actions should focus on managing the potential blast radius of a successful injection. Best practices include:

  • Least privilege access controls for tools, connectors, and data sources:This includes implementing separate permissions for read and write access.
  • Human-in-the-loop gates for sensitive actions: For instance, human approval should be required for payments, account changes, privileged admin tasks, and other high-risk activities.
  • Strong boundaries for secrets: Secrets should never be stored in prompts, using scoped tokens or vault patterns instead.
  • Segmentation for retrieval sources: Ideally, LLMs should only be able to access trusted corpora containing curated and labeled data.
  • Deterministic post-processing for risky outputs: LLM outputs in high-risk workflows or with access to sensitive data should be post-processed by non-AI software to ensure that policies are enforced and sensitive data is redacted.

Guardrails at Runtime

In addition to limiting the blast radius of attacks, organizations can also implement guardrails that manage risk at runtime. Best practices include:

  • Request filtering and normalization: All LLM responses should be analyzed to filter out potentially sensitive or malicious content and to ensure adherence to corporate policies.
  • Data loss controls: Use DLP or an SWG to prevent sensitive data from leaving through prompts and responses when possible.
  • Content risk controls: LLMs with web access should be restricted in the websites and content types that they can access.
  • Access governance for AI apps and assistants: AI apps and assistants should be limited with least privilege access controls that restrict the data and tools that they can use.
  • Logging and alerting: Logging and alerting should be built into AI-powered tools to ensure proper visibility and audit trails.

How Is Prompt Injection Tested and Validated?

Prompt injection vulnerabilities should be tested regularly since small changes to prompts, tools, models, and data sources can reintroduce risk. An AI red team uses adversarial prompts to evaluate model and app behaviors in a structured and repeatable way. Some of the key things to test for include:

  • Data leakage
  • Policy bypass
  • Tool misuse
  • Retrieval manipulation
  • Harmful output

Testing should be performed on a regular basis, especially after deploying new agents, tools, or connectors. The success of a test program can be evaluated based on success rate, severity, detection coverage, time to detect, and time to contain.

Building a Repeatable Prompt Injection Test Suite

A prompt injection test suite can ensure that an organization is properly addressing top prompt injection risks to its systems. Best practices include:

  • Maintain a library of adversarial prompts mapped to risk categories.
  • Include indirect injection fixtures (documents, web pages, tickets) that are safe but realistic.
  • Test per workflow and per permission set. The same model can be safe in one context and unsafe in another.
  • Automate regression testing when prompts, retrieval sources, or tool schemas change.
  • Document expected behavior and escalation paths when a test fails.

Prompt Injection Risk Reduction and Operational Guardrails

Prompt injection is a growing risk to corporate cybersecurity as companies increasingly rely on AI tools that may be tricked into performing malicious actions. Prompt injection security includes constraining inputs, context, actions, and outputs to reduce the likelihood of an attack and its potential impacts on the business. Organizations should:

  • Restrict AI tools’ access to data and tools.
  • Monitor inputs and outputs for anomalies and policy violations.
  • Use deterministic sanitization and templating for high-risk responses.
  • Regularly test tools with adversarial prompts.


Eliminating the risk of prompt injection is impossible. However, organizations can reduce the likelihood and blast radius of a successful attack.

FAQ about Prompt Injection

Is prompt injection the same as jailbreaking?

Jailbreaking is a particular type of prompt injection attack designed to trick a model into generating prohibited and harmful output. Prompt injection attacks also include the potential that a model will take unapproved actions and interact with third-party tools.

Can prompt injection be fully prevented?

No, prompt injection can’t be fully prevented due to the difficulty of identifying all malicious inputs and the fact that LLMs aren’t deterministic, predictable systems. However, layered security controls can reduce the risk and the blast radius of a successful attack.

Why is indirect prompt injection considered more dangerous?

Indirect prompt injection involves an attacker injecting instructions into content consumed by an LLM, such as webpages accessed while gathering context from the internet. This is more dangerous because an attacker might be able to influence an agent even if they can’t directly access its user interface.

What should be logged to investigate prompt injection attempts?

Logs should include events, tool calls, and policy violations. Logging raw prompts can be dangerous due to the potential that they contain sensitive data, and all logs should have this information redacted.

What is the first control to implement for an AI assistant?

Least privilege access is the most important control to implement for an AI assistant. Constraining its access to data and other tools limits the potential damage that can be done by a successful attack.

Cato Networks named a Leader in the 2024 Gartner® Magic Quadrant™ for Single-Vendor SASE

Get the report