'Agentjacking' via Sentry MCP: The Hidden Cost of Trusting MCP Servers
June 21, 2026 · 9 min read
When the Agent Reads an Attacker's Instructions
Security researchers disclosed a prompt-injection attack — nicknamed "agentjacking" — that turns a coding agent's own tools against it. The vector: a crafted error event sent to a project's Sentry DSN. When a developer later asks their agent to investigate the error through the Sentry MCP server, the agent reads the attacker's text as instructions rather than data, and executes them with the developer's local permissions.
The reported blast radius is large: roughly 2,388 organizations had exposed DSNs, and 100+ companies reportedly ran the proof-of-concept. Most alarmingly, agents executed the injected instructions even when explicitly told to ignore untrusted content — because the malicious text arrived through a trusted tool, wearing the costume of legitimate data.
Why This Is an MCP-Wide Problem, Not a Sentry Bug
The Model Context Protocol (MCP) lets coding agents pull in external context — error trackers, databases, ticketing systems, documentation. That's exactly what makes agents useful. But it also means every MCP server is a channel through which external, attacker-influenceable content can reach the model. Sentry is just the example that got demonstrated; the pattern generalizes to any MCP server that surfaces data an outsider can write to.
The core problem is that LLMs don't have a hard boundary between "instructions" and "data." Anything in the context window can steer behavior. An MCP server that returns user-submitted bug reports, public issue comments, or third-party log entries is feeding the model content an adversary may have authored. The agent's local execution permissions then turn a text-injection into a code-execution.
The Real Cost: Auditing the Toolchain
The expensive part of agentjacking isn't a single incident — it's the ongoing work it forces onto every team running coding agents with MCP. Three costs stand out.
Inventory and review. Someone has to enumerate every MCP server the team's agents connect to, classify which ones surface externally-influenceable data, and decide which are safe to keep. This is recurring security-engineering time, not a one-time fix.
Sandboxing and least privilege. The mitigation is to stop running agents with broad local permissions. Containerizing agent execution, scoping filesystem and network access, and gating destructive actions behind human approval all cost engineering time and add friction to the workflow that made agents fast in the first place.
Secret hygiene. The exposed DSNs are a reminder that credentials embedded in code, configs, and error payloads are an attack surface. Rotating and properly scoping these — and keeping them out of anything an agent can read and act on — is unglamorous, continuous work.
A Practical Defense Posture
Treat all MCP-sourced content as untrusted by default. Run agents in sandboxed environments with the narrowest permissions that still let them work. Require human confirmation for actions with real blast radius — shell commands, file deletion, network calls, credential access. And prefer MCP servers that clearly delineate or label externally-submitted data so it's at least possible to treat it differently.
None of this is free. Sandboxing adds latency and setup; human-in-the-loop approval slows the agent down; auditing consumes security hours. But these are the true costs of running autonomous agents safely, and they belong in any honest budget for AI-assisted development — alongside the token spend everyone already tracks.
The convenience of plugging an agent into your whole toolchain is real, and so is the attack surface it creates. Factor the security overhead into your cost of AI coding from the start. To keep the token side of that budget visible while you harden the rest, our cost calculator estimates per-task model spend across providers.
Frequently Asked Questions
What is 'agentjacking'?
It's a prompt-injection attack where an attacker plants malicious text in data a coding agent later reads through a tool — in the disclosed case, a crafted error event sent to a Sentry DSN. When the agent investigates via the Sentry MCP server, it executes the injected text as instructions with the developer's local permissions, turning text injection into code execution.
Is this a Sentry-specific vulnerability?
No. Sentry was the demonstrated example, but the pattern applies to any MCP server that surfaces externally-influenceable content — bug reports, public comments, third-party logs. LLMs don't hard-separate instructions from data, so any tool feeding attacker-authored text into the context window is a potential vector.
Why did agents execute the attack even when told to ignore untrusted content?
Because the malicious text arrived through a trusted tool, formatted as legitimate data. The model has no reliable boundary between instructions and data in its context window, so an explicit 'ignore untrusted input' instruction doesn't stop content that looks like normal tool output from steering behavior.
What does defending against this actually cost?
It's mostly ongoing engineering time: inventorying and reviewing every connected MCP server, sandboxing agent execution with least-privilege permissions, gating destructive actions behind human approval, and rotating exposed secrets. These add latency and friction but are the real cost of running autonomous agents safely and belong in any AI-coding budget.
Want to calculate exact costs for your project?
Related Articles
Ecosystem Cost in AI Coding Tools: Extensions, Skills, MCP Servers, and Hidden Maintenance
AI coding tools are no longer just models. Extensions, skills, MCP servers, prompt libraries, and team-specific automation create an ecosystem maintenance cost. Learn how to budget for it.
MCP Servers and Enterprise AI Coding: The True Cost of Private Network Integration
OpenAI now supports enterprise MCP servers behind private networks. We break down the real total cost of setting up, running, and maintaining private MCP infrastructure for AI coding workflows.
AI Model Context Protocol (MCP): Hidden Token Costs of Tool Calls
MCP enables AI coding agents to call external tools, but each tool adds thousands of tokens to every request. We quantify the overhead and show how to minimize hidden costs from tool descriptions, function formatting, and response parsing.