Agent Month

How to fix: 429 rate limit error

Cause

You exceeded the requests-per-minute (RPM) or tokens-per-minute (TPM) quota for your account tier.

The fix

  1. Read the response: a `429` typically includes a `retry-after` header telling you how many seconds to wait.
  2. Implement exponential backoff with jitter. Most official SDKs retry 429s automatically (default ~2 retries); raise the retry limit if needed.
  3. Reduce token pressure: trim prompts, cache stable prefixes, and route low-stakes calls to a cheaper/separate model with its own quota.
  4. Batch non-urgent work through the provider’s batch API, which has separate, higher limits.
  5. If you’re consistently capped, request a tier/quota increase from the provider.
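
Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not any SDK's actual retry logic. `RateLimitError`, `retry_delay`, and `call_with_backoff` are hypothetical names, and the sketch assumes your HTTP client exposes response headers as a dict with lower-case keys. It honors `retry-after` when the value is a number of seconds and otherwise falls back to "full jitter" exponential backoff.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error carrying the headers of a 429 response."""

    def __init__(self, headers=None):
        super().__init__("429 rate limit")
        self.headers = headers or {}


def retry_delay(attempt, headers, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        try:
            # Prefer the server's own hint when it is a plain number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # e.g. an HTTP-date value; fall back to backoff below
    # Full jitter: a random delay in [0, min(cap, base * 2**attempt)).
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(send, max_retries=5, sleep=time.sleep):
    """Call `send()`, retrying on RateLimitError up to `max_retries` times."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            sleep(retry_delay(attempt, err.headers))
```

Wrap the provider call in a function that raises `RateLimitError` on a 429 and pass it as `send`. The jitter matters when many workers hit the limit at once: without it they all retry at the same instant and trip the limit again.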

Prevent it

Add a gateway that meters and queues requests, with backoff and per-route rate budgets, so spikes degrade gracefully instead of failing.
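
One common way to meter requests per route is a token bucket. The sketch below is an in-process illustration under stated assumptions, not a production gateway. The route names and per-minute budgets in `BUDGETS` are made up, `TokenBucket` and `admit` are hypothetical names, and a real deployment would usually share this state across processes.

```python
import time


class TokenBucket:
    """Allows `rate` units per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self, cost=1.0):
        """Take `cost` tokens and return 0.0, or return the seconds to wait."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return 0.0
        return (cost - self.tokens) / self.rate


# Hypothetical per-route budgets, in requests per minute.
BUDGETS = {"chat": 60, "summaries": 20}

# Burst capacity is about ten seconds' worth of budget (an arbitrary choice).
buckets = {
    route: TokenBucket(rate=rpm / 60, capacity=max(1.0, rpm / 6))
    for route, rpm in BUDGETS.items()
}


def admit(route, sleep=time.sleep):
    """Block until the route's bucket has room, then return."""
    bucket = buckets[route]
    while (wait := bucket.try_acquire()) > 0:
        sleep(wait)
```

Calling `admit("chat")` before each upstream request makes callers wait in line during a spike instead of receiving a 429. Passing the request's estimated token count as `cost` to a second bucket would meter TPM the same way.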
