Agent Month

LLM cost optimization

LLM cost optimization is the practice of reducing what production AI features cost — through routing, caching, batching, and right-sizing models — without losing quality.

LLM cost optimization starts with observability: instrumenting every call so you can see cost, latency, and token usage by route and model. From there, the main levers are right-sizing the model per task, caching repeated prompts, batching non-urgent work, and defining cheap, reliable fallback chains.
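As a rough illustration of two of these levers, the sketch below combines a prompt cache with a fallback chain. Everything in it is hypothetical: the class name `CachedFallbackClient`, the model names, and the stand-in model functions are placeholders, not any vendor's API, and trying the cheapest model first is an assumed ordering rather than a rule.

```python
import hashlib


class CachedFallbackClient:
    """Hypothetical sketch: exact-match prompt cache in front of a fallback chain."""

    def __init__(self, chain):
        # chain: list of (model_name, callable), in the order they should be tried.
        self.chain = chain
        self.cache = {}   # prompt hash -> cached response
        self.calls = []   # model names actually invoked, for inspection

    def complete(self, prompt):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self.cache:
            # Repeated prompt: no model call, so no additional spend.
            return self.cache[key]
        last_error = None
        for name, fn in self.chain:
            self.calls.append(name)
            try:
                result = fn(prompt)
            except Exception as exc:
                # This model failed; remember why and try the next one.
                last_error = exc
                continue
            self.cache[key] = result
            return result
        raise RuntimeError("all models in the chain failed") from last_error


# Stand-in models (assumptions for the demo, not real endpoints).
def flaky_small(prompt):
    raise TimeoutError("simulated outage")


def steady_large(prompt):
    return f"answer to: {prompt}"


client = CachedFallbackClient(
    [("small-model", flaky_small), ("large-model", steady_large)]
)
first = client.complete("What is 2+2?")
second = client.complete("What is 2+2?")  # served from the cache
```

An exact-match cache only helps when prompts repeat verbatim. Whether that holds for a given route is something the per-route instrumentation should show before the cache is relied on.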

The discipline lies in making every one of these changes behind evals, so that each is a measured trade rather than a quality gamble. Done well, cost reductions of 30–60% are common with no measurable quality loss.
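One way to make "a measured trade" concrete is a small gate that compares a candidate configuration against the current baseline on the same eval set. This is a minimal sketch under stated assumptions: the function name `eval_gate`, the per-example scores in the range 0 to 1, the 0.01 tolerance, and the numbers in the example are all illustrative, not measured results.

```python
def eval_gate(baseline, candidate, max_quality_drop=0.01):
    """Accept a cost change only if quality holds within a tolerance.

    Each argument is a dict with:
      'scores'   - list of per-example eval scores in [0, 1]
      'cost_usd' - total cost of running the eval set with that configuration
    The 0.01 default tolerance is an assumed value, not a recommendation.
    """
    base_q = sum(baseline["scores"]) / len(baseline["scores"])
    cand_q = sum(candidate["scores"]) / len(candidate["scores"])
    savings = 1 - candidate["cost_usd"] / baseline["cost_usd"]
    return {
        "quality_delta": cand_q - base_q,
        "savings": savings,
        "accept": (base_q - cand_q) <= max_quality_drop and savings > 0,
    }


# Illustrative numbers only.
baseline = {"scores": [1, 1, 0, 1], "cost_usd": 10.0}
cheaper_same_quality = {"scores": [1, 1, 0, 1], "cost_usd": 6.0}
cheaper_worse_quality = {"scores": [1, 0, 0, 1], "cost_usd": 4.0}

good = eval_gate(baseline, cheaper_same_quality)
bad = eval_gate(baseline, cheaper_worse_quality)
```

Here the first candidate would be accepted (same mean score, lower cost) and the second rejected (mean score falls from 0.75 to 0.5). A real gate would also need an eval set large enough that small score differences are not noise.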

Because spend usually hides as a single line item, the highest-leverage first step is almost always making it visible by route.
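A minimal sketch of what "visible by route" could look like: record each call with its route, model, and token counts, then total the cost per route and model. The `CallRecord` fields, the model names, and the prices in `PRICES` are hypothetical placeholders, not real vendor pricing.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens as (input, output); substitute real ones.
PRICES = {
    "small-model": (0.50, 1.50),
    "large-model": (5.00, 15.00),
}


@dataclass
class CallRecord:
    route: str           # product feature or endpoint that made the call
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float


def call_cost(rec):
    """Cost of one call in USD under the assumed price table."""
    in_price, out_price = PRICES[rec.model]
    return (rec.input_tokens * in_price + rec.output_tokens * out_price) / 1_000_000


def cost_by_route(records):
    """Total cost keyed by (route, model)."""
    totals = defaultdict(float)
    for rec in records:
        totals[(rec.route, rec.model)] += call_cost(rec)
    return dict(totals)


records = [
    CallRecord("summarize", "large-model", 2000, 400, 900.0),
    CallRecord("summarize", "large-model", 1000, 200, 700.0),
    CallRecord("classify", "small-model", 4000, 0, 120.0),
]
report = cost_by_route(records)
```

With these made-up records, the single line item splits into a `summarize` total of about $0.024 and a `classify` total of about $0.002, which is the kind of breakdown that shows where right-sizing or caching is worth trying first.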