How to fix: prompt injection / agent following malicious instructions

Cause

Content the agent ingests (a web page, a document, a ticket) contains instructions that hijack its behavior.

The fix

1Treat anything the agent reads from outside as untrusted, adversarial input.
2Constrain tool permissions to the minimum the task needs — no standing access to destructive actions.
3Route internal system access through audited MCP servers with scoped permissions and logging.
4Add human-in-the-loop confirmation for high-impact or irreversible actions.
5Monitor and log tool calls so injection attempts are visible after the fact.

Prevent it

Design agents with least-privilege tools, audited access, and human gates on risky actions so injection can’t cause real damage.

What causes “prompt injection / agent following malicious instructions”?

Content the agent ingests (a web page, a document, a ticket) contains instructions that hijack its behavior.

How do I prevent “prompt injection / agent following malicious instructions” from recurring?

Design agents with least-privilege tools, audited access, and human gates on risky actions so injection can’t cause real damage.