How to fix: prompt injection / agent following malicious instructions
Cause
Content the agent ingests (a web page, a document, a ticket) contains instructions that hijack its behavior.
The fix
- 1Treat anything the agent reads from outside as untrusted, adversarial input.
- 2Constrain tool permissions to the minimum the task needs — no standing access to destructive actions.
- 3Route internal system access through audited MCP servers with scoped permissions and logging.
- 4Add human-in-the-loop confirmation for high-impact or irreversible actions.
- 5Monitor and log tool calls so injection attempts are visible after the fact.
Prevent it
Design agents with least-privilege tools, audited access, and human gates on risky actions so injection can’t cause real damage.
Frequently asked questions
What causes “prompt injection / agent following malicious instructions”?
Content the agent ingests (a web page, a document, a ticket) contains instructions that hijack its behavior.
How do I prevent “prompt injection / agent following malicious instructions” from recurring?
Design agents with least-privilege tools, audited access, and human gates on risky actions so injection can’t cause real damage.