
AI Agent Security: A Practical Guide for Developers
AI coding agents can execute commands, read secrets, and modify your entire codebase. This guide covers real attacks, documented vulnerabilities, and practical defenses.
AI coding agents have file system access, shell execution, and network capabilities on your machine. Claude Code runs commands. Cursor edits files across your project. Devin writes, tests, and deploys code. When something goes wrong, the consequences are not hypothetical. They are real data leaks, real credential exposure, and real supply chain compromises.
This guide covers documented attacks, the threat model that matters for developers, and the concrete steps you should take today.
What Makes AI Agents Different
A chatbot receives text and returns text. An AI agent receives text, makes decisions, and takes actions. That distinction changes everything about the security model.
A chatbot that hallucinates gives you a wrong answer. An agent that hallucinates runs the wrong command. A chatbot that gets prompt-injected says something unexpected. An agent that gets prompt-injected can exfiltrate your credentials, modify your code, or install a malicious dependency.
The shift from "text in, text out" to "text in, actions out" means every input the agent processes is a potential vector for unintended behavior. And the inputs are everywhere: files in your repo, API responses, MCP tool outputs, even git commit messages.
Real Attacks on AI Agents
These are not theoretical risks. Every incident below is documented.
Prompt Injection in Production
Bing/Sydney (February 2023): Users manipulated Bing Chat into revealing its system prompt and behaving against its guidelines. This was one of the first large-scale public demonstrations that prompt injection works in production systems, not just research papers.
ChatGPT memory exploit (2024): Security researcher Johann Rehberger demonstrated persistent prompt injection through ChatGPT's memory feature. A malicious document could inject instructions that persisted across future conversations, meaning a single poisoned file could compromise every subsequent session.
GPT Store data leaks (2024): Multiple custom GPTs in OpenAI's GPT Store were found to leak their system prompts and uploaded knowledge files through prompt injection. Developers who built GPTs with proprietary data had that data extracted by users with simple prompt tricks.
Vulnerabilities in Coding Agents
GitHub Copilot RCE (CVE-2025-53773): A vulnerability where a malicious repository could craft files that, when processed by Copilot, executed arbitrary code on the developer's machine. Opening the wrong repo was enough.
Cursor IDE vulnerabilities (CVE-2025-54135, CVE-2025-55284): Malicious project files could trigger unintended agent actions in Cursor. The attack vector was the project itself, meaning cloning a repository could compromise your environment.
These CVEs demonstrate that the code your agent reads is an attack surface.
MCP Supply Chain Attacks
Invariant Labs documented several attack patterns against the Model Context Protocol:
Tool poisoning: MCP servers can embed hidden instructions in tool descriptions. These instructions are invisible to the user in the approval UI but are processed by the AI agent. A tool described as "search files" could contain hidden text telling the agent to first read and exfiltrate your .env file.
Cross-server rug-pull attacks: An MCP server can change its tool descriptions after you have approved it. The agent re-reads tool descriptions on each invocation. A server that was safe when you installed it can become malicious in a later update, and you will not be prompted to re-approve it.
Postmark MCP breach: The official Postmark MCP server contained hidden instructions in its tool descriptions that could exfiltrate email content processed through the server.
GitHub MCP injection: Researchers demonstrated that malicious content embedded in GitHub issues and pull requests could inject instructions when processed through the GitHub MCP server, turning a routine code review into an attack vector.
24,008 secrets found in public .mcp.json files: A scan of public GitHub repositories found over 24,000 API keys hardcoded directly in MCP configuration files. These files were committed to version control with real credentials.
The Five Risks
1. Secret Exposure
AI agents read files to understand your project. That includes .env files, configuration files, and anything containing API keys or credentials. Once a secret enters the conversation context, it is part of the session history. Depending on the provider, it may be logged, stored, or used for training.
The 24,000+ secrets found in public .mcp.json files show this is not a niche problem. Developers routinely put credentials where agents can read them and tools can commit them.
Real example: An agent reads your .env to understand your database configuration. That file contains your Stripe live key, your database password, and your JWT secret. All of those values are now in the conversation context and potentially in provider logs.
Mitigations:
- Use a secrets manager that injects values at runtime without exposing them to the agent. See managing secrets with Claude Code for the full setup.
- Block access to secret files using
permissions.denyin Claude Code settings (more on this below). - Rotate any credential that has appeared in an agent conversation.
- Run pre-commit hooks like gitleaks to catch secrets before they reach version control.
2. Prompt Injection
If your agent processes any external input, it can be prompt-injected. Files in a cloned repo, API responses, web pages, MCP tool outputs, even data from a database query. The ChatGPT memory exploit proved that a single poisoned document can compromise not just one session but every future session.
For coding agents specifically, the attack surface is the codebase itself. A malicious contributor can embed instructions in code comments, markdown files, or commit messages that the agent will process as part of its context.
Mitigations:
- Treat every external input as untrusted, the same way you treat user input in a web application.
- Review repositories before letting an agent process them, especially from unknown contributors.
- Keep destructive operations in
permissions.askmode so you approve each one. - Be skeptical of agent suggestions that seem unrelated to your request, like installing an unexpected package or modifying an unrelated file.
3. Excessive Permissions
Most developers give their agent unrestricted access to the project directory. That means read, write, and delete on every file. If the agent gets prompt-injected or simply makes a mistake, the blast radius is everything.
The OWASP Top 10 for AI Agents (2025) lists "Excessive Agency" as a primary risk category. The principle of least privilege applies to AI agents just as it applies to human users and service accounts.
Mitigations:
- Use Claude Code's permission system to deny access to sensitive files and directories.
- Keep
permissions.askenabled for destructive operations (file writes, command execution). - In CI/CD, use read-only access where possible.
- Separate development and production credentials so an agent working on code never has access to production secrets.
4. Supply Chain via MCP
Every MCP server you connect is a new dependency in your security perimeter. Unlike npm packages that you can audit and pin to versions, MCP servers can change their behavior dynamically through tool description updates.
The rug-pull attack pattern is particularly dangerous: you audit and approve a server, and it changes its instructions later. The agent follows the new instructions without asking for re-approval.
Mitigations:
- Only install MCP servers from trusted, well-known sources.
- Read the source code of any MCP server before adding it to your configuration.
- Review what tools a server exposes and what permissions each tool requests.
- Remove MCP servers you no longer actively use.
- Never hardcode API keys in
.mcp.jsonfiles. Use environment variables or a secrets manager. - Monitor Invariant Labs and similar researchers for newly disclosed MCP vulnerabilities.
5. Data Leakage Through Context
Everything the agent reads becomes part of the conversation. Customer data, internal documentation, proprietary algorithms, credentials. Depending on the provider's data policy, this content might be stored, logged, or used for model improvement.
The OWASP LLM Top 10 (2025) lists "Sensitive Information Disclosure" (LLM02) as the second highest risk. For coding agents with file system access, this risk is amplified because the agent can proactively read files you did not explicitly share.
Mitigations:
- Be deliberate about what directories and files the agent can access.
- Use
permissions.denyto block access to directories containing customer data, credentials, or regulated information. - For HIPAA, GDPR, or SOC 2 regulated data, verify your AI provider's data processing agreement before the agent accesses that content.
- Prefer self-hosted or zero-retention AI services for sensitive projects.
Claude Code's Built-in Security
Claude Code ships with a permission system and execution sandbox. Understanding how they work is essential for securing your workflow.
Permission Modes
Claude Code uses three permission levels, evaluated in priority order:
| Mode | Behavior | Use case |
|---|---|---|
deny | Blocks the tool entirely. The agent cannot use it. | Protecting sensitive files, blocking dangerous commands |
ask | Requires user approval each time. This is the default for most operations. | Normal development workflow |
allow | Auto-approves without prompting. | Trusted, low-risk operations you use frequently |
Priority is deny > ask > allow. If a tool matches a deny rule, it is blocked regardless of other rules.
Configuration
Permissions are configured in .claude/settings.json at the project level or in your user-level settings:
{
"permissions": {
"deny": [
"Read(.env)",
"Read(.env.*)",
"Read(**/credentials*)",
"Read(**/*.pem)",
"Bash(rm -rf *)"
],
"allow": [
"Read(src/**)",
"Read(docs/**)"
]
}
}
Important: .claudeignore is not an official Claude Code mechanism. Use permissions.deny in .claude/settings.json to control file access.
Sandbox
Claude Code runs with a sandbox layer:
- Linux: Uses bubblewrap for process isolation
- macOS: Uses the seatbelt sandbox framework
The sandbox restricts what the agent process can access at the OS level, providing defense in depth beyond the permission system.
Setting Up a Secure Workflow
A practical checklist for developers working with AI coding agents:
File access controls
- Configure
permissions.denyto block.env,.env.*,*.pem, and credential files - Block access to directories containing customer data or regulated information
- Deny access to production configuration files
Secret management
- Never paste secrets directly into agent conversations
- Use a vault that injects secrets at runtime without exposing values to the agent
- Rotate any credential that has appeared in a conversation context
- Store MCP server tokens in environment variables, never in
.mcp.json
Permission boundaries
- Keep
permissions.askenabled for file writes and command execution - Use
permissions.denyfor known-dangerous operations - Use separate API keys for development and production environments
- In CI/CD pipelines, grant read-only access where possible
MCP server hygiene
- Audit MCP server source code before installation
- Remove MCP servers you are not actively using
- Monitor for tool description changes in servers you depend on
- Never commit
.mcp.jsonfiles containing real credentials
Code review
- Review AI-generated commits before pushing to shared branches
- Use pre-commit hooks (gitleaks, detect-secrets) to catch leaked credentials
- Be skeptical of agent changes to files unrelated to your request
- Check dependency additions for suspicious or unknown packages
The Trust Boundary
The most important concept in AI agent security is the trust boundary. Your AI agent sits inside your development environment but processes external inputs and connects to external services. It has more system access than a junior developer but zero judgment about what is sensitive.
Draw a clear line:
- Inside the boundary: Files the agent should read, commands it should run, tools it should use.
- Outside the boundary: Credentials, customer data, production systems, untrusted repositories.
Most security incidents with AI agents happen not because of sophisticated zero-day exploits, but because the boundary was never defined. The agent had access to everything, processed an input it should not have trusted, and acted on it.
Define the boundary explicitly with permissions.deny. Review it when your project changes. Treat it as infrastructure, not an afterthought.
Further Reading
- OWASP Top 10 for LLM Applications (2025) for the full threat taxonomy
- Invariant Labs MCP Security Research for documented MCP attack patterns
- Why .env files are dangerous with AI agents dives into the secret exposure problem
- The complete guide to LLM security covers the broader threat landscape
- Managing secrets with Claude Code for the practical vault setup
- Try SecureCode free. Zero-knowledge secrets for AI coding agents.