June 16, 2026 · 12 min read · infosec.qa

AI Attack Surface: The 2026 Map for LLM & Agents

The AI attack surface is every layer where an AI system can be manipulated. Here is the 2026 six-layer map, a combined OWASP and MITRE ATLAS table, and the reasons pentests miss it.

Key Takeaways

The AI attack surface spans six distinct layers - data, model, application/prompt, runtime, supply chain, and agentic - and SAST, DAST, and SCA tools are blind to five of the eight representative attacks mapped against those layers.
Agentic and supply-chain layers carry the highest 2026 blast radius: a single indirect prompt injection can trigger real-world tool actions, while supply-chain incidents like the LiteLLM PyPI compromise exposed roughly 500,000 credentials.
Standard penetration tests stop at the API boundary and assume a deterministic target, leaving the model weights, RAG corpus, supply chain, and agent tools out of scope by default - covering at most one quarter of the AI attack surface.
RAG corpus poisoning is the most commonly overlooked entry point in 2026 because the corpus is typically a living document store with write access granted to multiple teams, making it trivially reachable by an insider or compromised account.

Ask a security team to draw their AI attack surface and most will sketch the same thing they have drawn for a decade: an API gateway, some auth, a network boundary, a database. That picture is not wrong, but it covers maybe a quarter of what an AI system actually exposes. The other three quarters - the training data, the model weights, the prompt, the inference runtime, the supply chain, and the agent’s tools - are invisible to the SAST, DAST, and pentest playbooks those teams already run.

This is the map. Not another “AI is risky, be careful” essay, but a layer-by-layer engineering breakdown of the entire modern AI attack surface, with a single threat-per-layer table that cross-maps OWASP LLM Top 10 IDs to MITRE ATLAS techniques and flags exactly which layers your legacy tooling cannot see. By the end you will be able to point at your own systems and name the blind spots.

What is the AI attack surface? (one-paragraph answer)

The AI attack surface is every layer where an AI system can be manipulated, exfiltrated, or abused - spanning the data, model, application/prompt, runtime, supply-chain, and agent layers. Put differently: it is the full set of inputs, components, and dependencies an attacker can touch to make an AI system leak, lie, break, or act against you.

Contrast that with the traditional application attack surface, which is code, network, and auth: deterministic paths you can fuzz, ports you can scan, and credentials you can rotate. AI does not replace that surface; it sits on top of it and adds at least six new layers that legacy security tooling does not test. A web app has a request and a response. An AI system has training data you did not write, weights you cannot read, a prompt an attacker can hijack, an inference loop that costs money per token, a supply chain of models and adapters pulled from public hubs, and - increasingly - an agent that can call tools and act in the world. That is the core claim of this page: six of the AI attack surface’s layers are effectively invisible to SAST, DAST, and SCA.

Figure: the six-layer AI attack surface map (data, model, application/prompt, runtime, supply chain, and agentic layers).

The six layers of the AI attack surface

Here is the full taxonomy. Each layer is a distinct place an attacker operates, with its own techniques, its own tooling, and - critically - its own coverage gap.

1. Data layer

Everything that shapes what the model knows. The threats here are quiet and early: training-data poisoning (inject malicious samples so the model learns a backdoor trigger), dataset backdoors (a specific input pattern flips behavior at inference), embedding and RAG corpus poisoning (seed your retrieval store with adversarial documents so the model retrieves attacker content as “ground truth”), and PII leakage from training sets (the model memorizes and regurgitates sensitive records). RAG corpus poisoning is the one that bites teams in 2026, because the corpus is usually a living document store anyone with write access can edit.

2. Model layer

The weights themselves and the artifacts you load them from. Model extraction and stealing reconstructs a proprietary model through its API. Membership inference determines whether a specific record was in the training set - a privacy breach on its own. Weight tampering and malicious LoRA adapter backdoors ship a poisoned fine-tune or adapter that looks legitimate and triggers on a hidden phrase. And the classic that keeps coming back: pickle deserialization RCE, where loading a model file serialized with Python’s pickle format can execute arbitrary code on your inference host. The model file is executable code wearing a data costume.
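
To make the pickle risk concrete, here is a minimal sketch using only Python's standard library. It builds a harmless pickle whose `__reduce__` asks the loader to call a function, then inspects the byte stream with `pickletools.genops` without ever loading it. The names `Payload`, `SUSPICIOUS`, and `risky_opcodes` are illustrative, and the opcode list is a heuristic assumption rather than a complete scanner.

```python
import pickle
import pickletools

# Opcodes that import names or invoke callables during unpickling.
# Heuristic list chosen for illustration; it is not exhaustive.
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


class Payload:
    """Harmless stand-in for a malicious object embedded in a model file."""

    def __reduce__(self):
        # On load, pickle would call print(...); an attacker would pick
        # a far more dangerous callable.
        return (print, ("code ran during unpickling",))


def risky_opcodes(data: bytes) -> set[str]:
    """Return suspicious opcode names in a pickle stream, without loading it."""
    return {op.name for op, _arg, _pos in pickletools.genops(data)} & SUSPICIOUS


malicious = pickle.dumps(Payload())
benign = pickle.dumps({"weights": [0.1, 0.2, 0.3]})

print(risky_opcodes(malicious))  # contains REDUCE plus a global-import opcode
print(risky_opcodes(benign))     # empty set: plain data needs no imports or calls
```

Legitimate pickle-based checkpoints also use these opcodes to rebuild their objects, so a practical scanner would compare the imported names against an allowlist rather than reject on opcode presence alone.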

3. Application/prompt layer

Where the user, the system prompt, and the model meet. This is the most-discussed layer and still the most underestimated. Direct prompt injection overrides your instructions with the user’s. Jailbreaks coax the model past its guardrails. System-prompt leakage exfiltrates your hidden instructions, business logic, and sometimes keys. And unsafe output handling - treating model output as trusted - turns a generated string into XSS, SSRF, or SQL injection when you render or execute it downstream. The model is just another untrusted input source; teams keep forgetting that.
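
As a small illustration of treating model output as untrusted, the sketch below escapes generated text before it is placed in an HTML page. The function name `render_model_output` and the wrapper markup are hypothetical. This covers only the HTML rendering context; SQL and URL contexts need their own defenses, such as parameterized queries and URL allowlists.

```python
import html


def render_model_output(text: str) -> str:
    """Escape model output before embedding it in an HTML page."""
    # html.escape converts <, >, & and both quote characters to entities,
    # so generated markup is displayed as text instead of being executed.
    return '<div class="answer">' + html.escape(text) + "</div>"


generated = '<img src=x onerror="alert(1)">'  # what an injected prompt might produce
safe = render_model_output(generated)
print(safe)
```

The same rule applies wherever generated text crosses into an interpreter: encode for the destination context at the point of use, and do not rely on the model to behave.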

4. Runtime/inference layer

The economics and availability of serving the model. Denial-of-wallet drives your token bill to the moon with expensive crafted requests. Token flooding and model DoS exhaust context windows or GPU capacity to take the service down. Side-channel timing attacks infer prompt or data content from response latency. None of these show up as a CVE or a failing unit test - they show up as an invoice or an outage.
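
One common control for denial-of-wallet is a per-caller token budget enforced before the request reaches the model. The following is a minimal in-memory sketch under stated assumptions: the class `TokenBudget`, its limit, and the idea of counting requested tokens up front are illustrative, and a real deployment would also need a reset schedule and shared storage, both omitted here.

```python
from collections import defaultdict


class TokenBudget:
    """Track per-user token spend against a fixed cap (reset logic omitted)."""

    def __init__(self, daily_limit: int) -> None:
        self.daily_limit = daily_limit
        self.spent: dict[str, int] = defaultdict(int)

    def allow(self, user_id: str, requested_tokens: int) -> bool:
        """Reserve tokens if the request fits the user's remaining budget."""
        if requested_tokens <= 0:
            return False  # reject nonsensical requests outright
        if self.spent[user_id] + requested_tokens > self.daily_limit:
            return False  # over budget: refuse before any model cost is incurred
        self.spent[user_id] += requested_tokens
        return True


budget = TokenBudget(daily_limit=10_000)
print(budget.allow("alice", 8_000))  # True
print(budget.allow("alice", 5_000))  # False, would exceed the cap
```

Checking the budget before inference is the design point: a limit applied after the call only tells you what the invoice will be.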

5. Supply-chain layer

Everything you pulled in rather than built. Poisoned models from hubs, malicious ML packages, namespace reuse, and typosquatting all live here. The last six months made this concrete: the LiteLLM PyPI compromise exposed roughly 500,000 credentials, and Hugging Face namespace reuse let attackers re-register deleted org names and serve trojaned models under a trusted path. If you are pulling models, adapters, or pip install-ing ML libraries, this layer is already in your blast radius. We go deep on it in AI supply-chain attacks.
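
A basic provenance control for this layer is to pin the SHA-256 digest of every model or adapter file and refuse to load anything that does not match. The sketch below uses only the standard library; the function names and the idea of storing the pinned digest alongside your deployment config are assumptions for illustration, not a description of any specific hub's tooling.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large model files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, pinned_sha256: str) -> bool:
    """Return True only if the file matches the digest pinned at review time."""
    return hmac.compare_digest(sha256_of(path), pinned_sha256.lower())
```

A digest check does not tell you the artifact was safe when you pinned it; it only guarantees that what you load today is the same file you reviewed, which is the property namespace reuse and package compromise break.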

6. Agentic layer

The newest and highest-blast-radius layer: what happens when the model can act. Tool and function-call abuse tricks the agent into calling a tool with attacker-controlled arguments. Indirect injection via fetched content hides instructions inside a web page, PDF, or email the agent retrieves, turning “summarize this doc” into “exfiltrate the user’s data.” Excessive agency means the agent has more permissions than the task needs, so one bad instruction does real damage. Memory poisoning persists malicious context across sessions. And multi-agent confused-deputy attacks abuse trust between agents so one tricks another into using its higher privileges. This is where a prompt becomes an action, and where a small injection becomes a breach.
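
Excessive agency is easiest to reason about as an explicit policy check in front of every tool call. The sketch below is one hypothetical shape for that check: `ToolPolicy`, `authorize`, and the `approve` callback (standing in for a human-approval step) are illustrative names, not part of any agent framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolPolicy:
    allowed: frozenset[str]         # tools the task actually needs
    needs_approval: frozenset[str]  # high-blast-radius tools within that set


def authorize(tool: str, policy: ToolPolicy, approve: Callable[[str], bool]) -> bool:
    """Deny by default; route risky tools through an approval callback."""
    if tool not in policy.allowed:
        return False          # least privilege: unknown tools never run
    if tool in policy.needs_approval:
        return approve(tool)  # a human (or stricter check) decides
    return True


policy = ToolPolicy(
    allowed=frozenset({"search_docs", "send_email"}),
    needs_approval=frozenset({"send_email"}),
)
print(authorize("search_docs", policy, approve=lambda t: False))  # True
print(authorize("run_shell", policy, approve=lambda t: True))     # False
```

Because the check runs outside the model, an injected instruction can ask for a tool but cannot grant itself one.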

Threat-per-layer reference table (OWASP LLM + MITRE ATLAS mapped)

One table, every layer, mapped to the two frameworks security teams actually use - and a final column that tells you whether your existing scanners would ever catch it. We have not seen this combined mapping published elsewhere, so use it as a working reference.

Layer	Example attack	OWASP LLM Top 10	MITRE ATLAS technique	Caught by SAST/DAST/SCA?
Data	RAG corpus / training-data poisoning	LLM04: Data & Model Poisoning	AML.T0020 Poison Training Data	No
Model	Malicious LoRA backdoor; pickle RCE	LLM04: Data & Model Poisoning / LLM03: Supply Chain	AML.T0018 Backdoor ML Model	Partial (SCA flags some packages only)
Application/prompt	Direct prompt injection; jailbreak	LLM01: Prompt Injection	AML.T0051 LLM Prompt Injection	No
Application/prompt	Unsafe output -> XSS/SSRF	LLM05: Improper Output Handling	AML.T0048 External Harms	Partial (DAST catches the downstream XSS, not the cause)
Runtime/inference	Denial-of-wallet; model DoS	LLM10: Unbounded Consumption	AML.T0034 Cost Harvesting	No
Supply chain	Poisoned hub model; typosquat package	LLM03: Supply Chain	AML.T0010 ML Supply Chain Compromise	Partial (SCA flags known-bad versions only)
Agentic	Tool abuse; excessive agency	LLM06: Excessive Agency	AML.T0053 LLM Plugin Compromise	No
Agentic	Indirect injection via fetched content	LLM01: Prompt Injection	AML.T0051 LLM Prompt Injection	No

Read down the right-hand column. Of eight representative attacks across the surface, five are a flat “No” for legacy tooling and the rest are “Partial” - meaning SCA flags a known-bad package version or DAST trips on the symptom long after the cause. The AI attack surface is not partially covered by your current stack; it is mostly invisible to it. If you want the full per-item breakdown of the OWASP side, see the OWASP LLM Top 10 for 2026.

Why traditional pentests miss the AI attack surface

A good penetration test is one of the highest-value security investments you can make - for the surface it was designed for. AI breaks the assumptions that make it work.

Probabilistic vs deterministic targets. Pentesting and fuzzing rely on fixed code paths: send input, observe the branch, find the bug, reproduce it. An LLM has no fixed control flow. The same prompt can pass a hundred times and fail the hundred-and-first because temperature, context, and retrieved content changed. A clean pentest run on an AI system means almost nothing about whether the model can be jailbroken tomorrow. You need adversarial, sampling-based testing - which is exactly what AI red teaming provides and a pentest does not.
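
The sampling argument can be put into numbers. This is a minimal sketch, assuming every attempt fails independently with the same probability, which is a simplification because real attempts are correlated through shared context. The function names are illustrative.

```python
import math


def detection_probability(p_fail: float, trials: int) -> float:
    """Chance of observing at least one failure in `trials` independent samples."""
    return 1.0 - (1.0 - p_fail) ** trials


def trials_needed(p_fail: float, confidence: float) -> int:
    """Smallest sample count that reaches the requested detection confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


print(round(detection_probability(0.01, 1), 3))    # a single clean run proves little
print(round(detection_probability(0.01, 100), 3))
print(trials_needed(0.01, 0.95))
```

Under these assumptions, a jailbreak that works 1% of the time is seen by a 100-sample run only about 63% of the time, and it takes 299 samples to reach 95% confidence of seeing it at least once.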

Scope stops at the API boundary. Standard pentest scoping covers the application and its network. The model weights, the RAG corpus, the fine-tune pipeline, and the agent’s tools are out of scope by default - nobody wrote them into the statement of work because they are not “the app.” Yet that is where four of the six layers live. The tester probes the front door while the data, model, supply-chain, and agentic layers sit untouched.

Shadow AI expands the surface invisibly. A team wires up an LLM endpoint, a copilot, or an internal agent over a weekend and never tells security. Every one of those is a new, undocumented entry into the AI attack surface. You cannot scope what you have not inventoried, and traditional asset discovery does not look for model endpoints.

What an AI-specific assessment adds. On top of a normal pentest, an AI attack-surface assessment inventories shadow AI and model endpoints, red-teams the prompt and agentic layers with adversarial sampling, audits the model and supply-chain provenance, and threat-models each of the six layers against the OWASP/ATLAS mapping above. It is additive, not a replacement - it covers the three quarters your pentest was never scoped to reach. Pair it with a structured AI risk assessment framework to turn findings into governance.

How to map and reduce your own AI attack surface

You can run a first pass of this yourself before you call anyone. Four steps.

Step 1 - Inventory. Enumerate everything in scope: every model and version, every inference endpoint, every RAG source and the write access to it, every tool an agent can call, and every third-party AI API you depend on. Crucially, hunt for shadow AI - the copilots and endpoints nobody registered. Check egress logs for calls to model providers, scan repos for AI SDK imports, and ask each team what they have shipped. You cannot defend what you have not listed.
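
The "scan repos for AI SDK imports" part of Step 1 can start as a few lines of Python. This is a rough sketch: the package list in `AI_SDKS` is an illustrative assumption to extend with whatever your teams actually use, it only looks at `.py` files, and a simple regex will miss dynamic imports and comma-separated import lines.

```python
import re
from pathlib import Path

# Illustrative package names; extend for your environment.
AI_SDKS = ("openai", "anthropic", "langchain", "transformers", "litellm")
IMPORT_RE = re.compile(
    r"^\s*(?:import|from)\s+(" + "|".join(AI_SDKS) + r")\b", re.MULTILINE
)


def find_ai_imports(source: str) -> set[str]:
    """Return the AI SDK names imported at the start of any line in `source`."""
    return set(IMPORT_RE.findall(source))


def scan_repo(root: Path) -> dict[str, set[str]]:
    """Map each Python file under `root` to the AI SDKs it imports."""
    hits: dict[str, set[str]] = {}
    for path in root.rglob("*.py"):
        found = find_ai_imports(path.read_text(errors="ignore"))
        if found:
            hits[str(path)] = found
    return hits
```

Calling `scan_repo(Path("."))` from a checkout gives a first list of files to ask their owners about; it is a discovery aid, not proof that nothing else exists.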

Step 2 - Classify by layer and assign an owner. Map each inventoried item to one or more of the six layers and give every layer a named owner. The data layer is usually ML/data engineering; the agentic layer is usually whoever built the agent. Ownership gaps are where attacks live.
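
Ownership gaps are easy to surface once the layer-to-owner mapping is written down as data. A minimal sketch, where the layer names follow this article and the team names are made up for illustration:

```python
LAYERS = ("data", "model", "application/prompt", "runtime", "supply chain", "agentic")


def ownership_gaps(owners: dict[str, str]) -> list[str]:
    """Return the layers that have no named owner, in the article's layer order."""
    return [layer for layer in LAYERS if not owners.get(layer, "").strip()]


owners = {
    "data": "ml-data-eng",      # hypothetical team names
    "model": "ml-platform",
    "application/prompt": "app-team",
    "agentic": "",              # listed, but nobody has claimed it
}
print(ownership_gaps(owners))  # ['runtime', 'supply chain', 'agentic']
```

An empty string counts as a gap on purpose: a layer that appears in the spreadsheet with a blank owner is no better covered than one that is missing.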

Step 3 - Threat-model per layer. Walk the table above for each system. For each layer, ask: which of these attacks is reachable here, what is the blast radius, and what control would stop it? Write it down. This is where most teams discover their RAG corpus is world-writable or their agent has a shell tool it never needed.

Step 4 - Prioritize by 2026 blast radius. Not all layers are equal right now. The agentic and supply-chain layers carry the highest blast radius in 2026 - agentic because a single indirect injection can trigger real-world actions, supply-chain because incidents like the LiteLLM compromise and Hugging Face namespace reuse hit thousands of downstream users at once. Fix excessive agency, sandbox tool execution, treat fetched content as untrusted, and vet every model and package provenance before you chase lower-impact layers. For the agentic layer specifically, see prompt-injection bypass techniques for what your defenses are actually up against.

A first-pass self-assessment is genuinely valuable. But steps 1 to 4 against the full six-layer model, done rigorously with adversarial testing rather than a checklist, is a real engagement - which is exactly what we built our assessment to deliver.

Map your AI attack surface before someone else does

You now have the map. The hard part is honest application: most teams who run steps 1 to 4 find a shadow agent with too many tools, a writable RAG corpus, or an unvetted model pulled from a public hub - blind spots that no SAST scan or annual pentest would ever surface.

That is the whole point of this page. Book an AI attack-surface assessment and we map your AI systems against all six layers - data, model, prompt, runtime, supply chain, and agentic - red-team the live ones, and hand you a prioritized remediation plan tied to the OWASP LLM and MITRE ATLAS mapping above. You get a ranked list of exactly where your AI is exposed and what to fix first, instead of a diagram and a hunch.

The attack surface grew six layers in the last few years. Make sure your testing did too. Get in touch and we will scope it with you.

Disclaimer

This article is published for educational purposes. The six-layer taxonomy and the threat-per-layer table are this team’s synthesis intended to help engineers reason about AI systems; they are a model, not an exhaustive or authoritative standard. AI threats evolve continuously, and specific techniques, framework IDs, and incident details may change after publication.

OWASP LLM Top 10 identifiers and MITRE ATLAS technique IDs referenced in the table reflect the author’s mapping at the time of writing and should be verified against the current official OWASP and MITRE publications before being relied upon. The mapping is descriptive and illustrative, not an official endorsement by either organization.

Incident references (including the LiteLLM PyPI compromise and Hugging Face namespace reuse) are based on public reporting and are summarized for context; figures such as the approximate credential count are as reported by public sources and may be revised. OWASP, MITRE, ATLAS, Hugging Face, LiteLLM, PyPI, and all other names mentioned are trademarks of their respective owners. The author and publisher are not affiliated with, endorsed by, or sponsored by any organization named here; mentions are nominative and descriptive only.

This post does not constitute legal, financial, or security-compliance advice. Readers acting on any guidance do so at their own risk and should consult qualified professionals for decisions material to their organization. Corrections and good-faith disputes are welcome - please contact us and we will review promptly.

Frequently Asked Questions

What is the AI attack surface?

The AI attack surface is every layer where an AI system can be manipulated, exfiltrated, or abused. It spans six layers: the data layer (training and RAG corpus poisoning), the model layer (extraction, weight tampering, malicious adapters), the application/prompt layer (prompt injection and jailbreaks), the runtime layer (denial-of-wallet, model DoS), the supply-chain layer (poisoned models and packages), and the agentic layer (tool-call abuse, excessive agency, memory poisoning). Traditional code, network, and auth surfaces are a subset, not the whole picture.

How is the AI attack surface different from a traditional attack surface?

A traditional attack surface is deterministic: fixed code paths, network ports, and auth boundaries you can fuzz, scan, and enumerate. The AI attack surface is probabilistic and data-driven. Attacks arrive as natural-language inputs, poisoned training data, fetched web content, or tool calls, not malformed packets. The model, the RAG corpus, and agent tools have no fixed control flow to test. SAST, DAST, and SCA fully cover none of the six AI-specific layers: at best, SCA flags known-bad packages and DAST catches downstream symptoms such as XSS. AI therefore adds at least six layers that legacy tooling does not test.

What are the layers of an LLM attack surface?

An LLM attack surface has six layers. The data layer covers training-data poisoning and RAG corpus corruption. The model layer covers extraction, membership inference, weight tampering, and pickle deserialization RCE. The application/prompt layer covers direct injection, jailbreaks, and unsafe output handling. The runtime layer covers denial-of-wallet and model DoS. The supply-chain layer covers poisoned hub models and malicious ML packages. The agentic layer covers tool abuse, indirect injection, and excessive agency.

Why can't a normal penetration test cover AI systems?

A normal penetration test stops at the API boundary and assumes a deterministic target. AI systems break both assumptions. There is no fixed code path to fuzz, so a clean test run does not mean the model is safe. The model weights, the RAG corpus, and the agent's tools are out of scope by default in standard pentest scoping. Shadow AI (undocumented model endpoints and copilots) expands the surface invisibly. An AI attack-surface assessment adds the six-layer coverage a pentest skips.

How do you reduce the attack surface of an agentic AI system?

Start by inventorying every tool, function, and data source the agent can reach, then apply least privilege to each. Cut excessive agency: remove tools the agent does not strictly need, and require human approval for high-blast-radius actions. Treat all fetched content (web pages, documents, emails) as untrusted to blunt indirect prompt injection. Isolate and validate agent memory to prevent poisoning. Sandbox tool execution, log every call, and threat-model the supply-chain and agentic layers first, since they carry the highest 2026 blast radius.
