
How to fix: request timeout on large or streaming responses

Cause

A non-streaming request with a high `max_tokens` can take longer to generate than the SDK or HTTP timeout allows, so the client gives up before the full response returns.

The fix

  1. Switch to streaming for any request that may produce long output (e.g. `max_tokens` above ~16K).
  2. Use the SDK’s stream helper and collect the final message rather than awaiting one big response.
  3. Confirm your client timeout units (some SDKs use milliseconds, others seconds) and raise the timeout if needed.
  4. For very long agentic turns, stream progress to the user so the connection stays active and the UX reflects work in progress.
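
The streaming steps above can be sketched without tying the example to a particular SDK. This is a minimal illustration, not any library's actual API: `collect_stream`, `fake_stream`, and the `on_progress` callback are hypothetical names, and `fake_stream` stands in for whatever iterator of text chunks your SDK's stream helper exposes.

```python
from typing import Callable, Iterable, Iterator, Optional


def collect_stream(
    chunks: Iterable[str],
    on_progress: Optional[Callable[[int], None]] = None,
) -> str:
    """Consume text chunks as they arrive and return the full message.

    on_progress (hypothetical hook) receives the running character count,
    so a UI can show that work is still in progress.
    """
    parts = []
    received = 0
    for chunk in chunks:
        parts.append(chunk)
        received += len(chunk)
        if on_progress is not None:
            on_progress(received)
    return "".join(parts)


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for an SDK stream of text deltas."""
    yield from ["Streaming ", "keeps the ", "connection active."]


progress = []  # running character counts reported while streaming
final_message = collect_stream(fake_stream(), progress.append)
print(final_message)
```

In a real client you would pass the SDK's stream of text chunks in place of `fake_stream()` and route `on_progress` to your UI or logs. Because data arrives incrementally, no single long wait for the complete response has to fit inside the request timeout.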

Prevent it

Default to streaming for anything with large output, and design timeouts and progress UX around minutes-long turns on frontier models.
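
One way to encode this policy is a pair of small helpers, sketched below under stated assumptions. The 16,000-token threshold mirrors the ~16K guideline in the fix; `should_stream`, `timeout_for_sdk`, and the `unit` argument are hypothetical names for illustration, so check your SDK's documentation for the unit it actually expects.

```python
# Assumed threshold, taken from the ~16K guideline; tune it for your workload.
STREAMING_THRESHOLD_TOKENS = 16_000


def should_stream(max_tokens: int, threshold: int = STREAMING_THRESHOLD_TOKENS) -> bool:
    """Return True when a request is large enough that streaming is the safer default."""
    return max_tokens > threshold


def timeout_for_sdk(timeout_seconds: float, unit: str) -> float:
    """Convert a timeout expressed in seconds into the unit a client expects.

    unit is "seconds" or "milliseconds" (hypothetical labels for this sketch).
    """
    if unit == "seconds":
        return timeout_seconds
    if unit == "milliseconds":
        return timeout_seconds * 1000
    raise ValueError(f"unknown timeout unit: {unit!r}")


# Example: budget ten minutes for a long turn, for a client that takes milliseconds.
ten_minutes = 10 * 60
print(should_stream(32_000), timeout_for_sdk(ten_minutes, "milliseconds"))
```

Keeping the unit conversion in one place makes it harder to pass a seconds value to a client that expects milliseconds, which would silently shrink a ten-minute budget to well under a second.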
