Agent Month

Context window

A context window is the maximum amount of text (measured in tokens) a model can consider in a single request, including both input and output.

The context window bounds how much the model can “see” at once: the system prompt, conversation history, retrieved documents, and the response all draw from the same token budget. Anything that would exceed it has to be cut back by truncation, summarization, or compaction.
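A minimal sketch of that budget accounting, assuming token counts are already known for each message (a real system would get them from the model's own tokenizer). It reserves room for the response first, then truncates by dropping the oldest non-system messages until the prompt fits. The `Message` type and the `fit_to_window` helper are hypothetical names chosen for illustration, not any library's API.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str    # "system", "user", or "assistant"
    text: str
    tokens: int  # assumed precomputed with the model's tokenizer


def fit_to_window(messages, context_window, max_output_tokens):
    """Hypothetical helper: drop the oldest non-system messages until the
    prompt plus the reserved output budget fits in the context window."""
    # Output shares the window with input, so reserve it up front.
    input_budget = context_window - max_output_tokens
    if input_budget <= 0:
        raise ValueError("output reservation leaves no room for input")

    system = [m for m in messages if m.role == "system"]
    history = [m for m in messages if m.role != "system"]

    def total(msgs):
        return sum(m.tokens for m in msgs)

    # Truncate from the front: the oldest messages are discarded first.
    while history and total(system) + total(history) > input_budget:
        history.pop(0)

    if total(system) + total(history) > input_budget:
        raise ValueError("system prompt alone exceeds the input budget")
    return system + history


msgs = [
    Message("system", "You are a helpful assistant.", 50),
    Message("user", "first question", 400),
    Message("assistant", "first answer", 600),
    Message("user", "latest question", 300),
]
# 1350 input tokens against a 2000 - 700 = 1300 token input budget.
kept = fit_to_window(msgs, context_window=2000, max_output_tokens=700)
print([m.role for m in kept])
```

Here only the oldest user message is dropped. Dropping single messages can leave an assistant reply without its question; a production version might drop whole user/assistant turns, or summarize them instead of discarding them.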

Larger context windows (now up to ~1M tokens on frontier models) make it possible to reason over whole codebases and long documents, but more context means more billed tokens per request and can dilute the model’s focus, so using more of the window is not always better.
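The cost side of that trade-off is simple arithmetic. The sketch below assumes hypothetical per-million-token prices (not any provider's actual rates), no prompt caching, and a stateless API where the full history is resent on every turn. Under those assumptions a request's cost scales linearly with its token count, and a conversation's cumulative input grows roughly quadratically with its number of turns.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one request, given prices per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def conversation_input_tokens(tokens_per_turn, turns):
    """Total input tokens billed when every request resends the whole history.

    Assumes each turn adds a fixed tokens_per_turn to the history, so turn k
    sends k * tokens_per_turn tokens. The sum equals
    tokens_per_turn * turns * (turns + 1) / 2, i.e. quadratic in turns.
    """
    return sum(k * tokens_per_turn for k in range(1, turns + 1))


# Hypothetical prices in dollars per million tokens.
IN_PRICE, OUT_PRICE = 3.0, 15.0

small = request_cost(10_000, 1_000, IN_PRICE, OUT_PRICE)     # modest prompt
large = request_cost(1_000_000, 1_000, IN_PRICE, OUT_PRICE)  # nearly full 1M window
print(f"small request: ${small:.3f}, large request: ${large:.3f}")
print(conversation_input_tokens(1_000, 10), "input tokens over 10 turns")
```

Prompt caching, where the provider supports it, changes this picture by making a repeated prefix cheaper to resend, which is one reason caching matters for long contexts.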

Managing the context window well — via retrieval, compaction, and caching — is central to building cost-effective, reliable LLM systems.
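As one illustration of retrieval-based context management, the sketch below scores candidate chunks against a query and greedily packs the most relevant ones into a fixed token budget, so only a small slice of a large corpus enters the window. The word-overlap score and the word-count token estimate are deliberately naive stand-ins (assumptions for illustration); real retrievers often use embedding similarity or BM25, and real budgets should be measured with the model's tokenizer. `pack_context` is a hypothetical name.

```python
import re


def tokenize(text):
    # Naive stand-in for a real tokenizer: lowercase word tokens.
    return re.findall(r"\w+", text.lower())


def overlap_score(query, chunk):
    # Naive relevance: fraction of distinct query words that appear in the chunk.
    q = set(tokenize(query))
    if not q:
        return 0.0
    return len(q & set(tokenize(chunk))) / len(q)


def pack_context(query, chunks, token_budget):
    """Greedily add the highest-scoring chunks that still fit in token_budget.

    Token cost is approximated by word count (an assumption)."""
    scored = [(overlap_score(query, c), c) for c in chunks]
    scored.sort(key=lambda sc: sc[0], reverse=True)  # most relevant first

    selected, used = [], 0
    for score, chunk in scored:
        if score == 0:
            break  # sorted descending, so every remaining chunk scores 0
        cost = len(tokenize(chunk))
        if used + cost <= token_budget:  # skip chunks that would overflow
            selected.append(chunk)
            used += cost
    return selected, used


chunks = [
    "The context window limits how many tokens a model can read.",
    "Bananas are rich in potassium.",
    "Prompt caching reuses a processed prefix across requests to cut cost.",
]
selected, used = pack_context("How many tokens fit in the context window?", chunks, 15)
print(selected, used)
```

Greedy packing is a simple choice; it can leave budget unused when a large high-scoring chunk blocks several smaller relevant ones, which is one reason systems often split documents into similarly sized chunks before retrieval.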