Agent Month

Model routing

Model routing sends each request to the most appropriate model — by cost, quality, or latency — instead of using one model for everything.

A router classifies each request and dispatches it to a model that fits: a cheap, fast model for easy tasks such as classification, and a frontier model for work where output quality matters most.
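As a minimal sketch of the classify-and-dispatch step, the router below picks a tier from a task label and the prompt length. The model names, the `EASY_TASKS` set, the `Request` type, and the 2,000-character cutoff are all hypothetical placeholders chosen for illustration; a production router would more likely use a trained classifier or rules tuned against evals.

```python
from dataclasses import dataclass

# Hypothetical tier names; these are placeholders, not real model IDs.
CHEAP_MODEL = "small-fast-model"
FRONTIER_MODEL = "frontier-model"

# Assumed set of task labels considered easy enough for the cheap tier.
EASY_TASKS = {"classification", "extraction", "tagging"}


@dataclass
class Request:
    task: str    # caller-supplied task label
    prompt: str  # text sent to the model


def choose_model(req: Request, max_cheap_chars: int = 2000) -> str:
    """Return the model tier for a request.

    Easy, short requests go to the cheap tier; everything else goes to
    the frontier tier. The length cutoff is an arbitrary illustration.
    """
    if req.task in EASY_TASKS and len(req.prompt) <= max_cheap_chars:
        return CHEAP_MODEL
    return FRONTIER_MODEL
```

Defaulting to the frontier tier when a request is unrecognized keeps quality safe at the expense of cost, which is usually the less risky failure mode.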

Done with quality guards (and validated by evals), routing is the single biggest lever in most cost-optimization efforts, because a large share of call volume is low-stakes and can move to a cheaper tier with no measurable quality loss.
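One way a quality guard might look in practice is a cheap-first call with escalation: the cheap model answers, a guard checks the draft, and the request falls through to the frontier model only when the guard rejects it. This is a sketch under assumptions: `answer_with_guard`, the model callables, and `passes_guard` are hypothetical names, and the guard itself (a schema check, a confidence threshold, a grader) is left to the caller.

```python
from typing import Callable, Tuple

ModelFn = Callable[[str], str]


def answer_with_guard(
    prompt: str,
    cheap: ModelFn,
    frontier: ModelFn,
    passes_guard: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try the cheap model first and escalate if the guard rejects its draft.

    Returns the answer and the name of the tier that produced it, so the
    escalation rate can be logged and compared against eval results.
    """
    draft = cheap(prompt)
    if passes_guard(draft):
        return draft, "cheap"
    return frontier(prompt), "frontier"
```

Tracking how often requests escalate shows whether the cheap tier is actually absorbing the low-stakes volume or just adding a wasted call before the frontier model.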

Routing is usually implemented in a gateway or router layer that also handles caching, batching, and fallback chains.
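The gateway role can be sketched as a small class that checks a cache, then walks a fallback chain until one model succeeds. The `Gateway` class and its exact-match in-memory cache are illustrative assumptions, not any particular product's API; batching is omitted to keep the example short.

```python
from typing import Callable, Dict, Optional, Sequence

ModelFn = Callable[[str], str]


class Gateway:
    """Hypothetical gateway: exact-match response cache plus a fallback chain."""

    def __init__(self, chain: Sequence[ModelFn]) -> None:
        self.chain = list(chain)          # models to try, in priority order
        self.cache: Dict[str, str] = {}   # prompt -> cached response

    def complete(self, prompt: str) -> str:
        # Serve repeated prompts from the cache without calling any model.
        if prompt in self.cache:
            return self.cache[prompt]

        last_error: Optional[Exception] = None
        for call_model in self.chain:
            try:
                result = call_model(prompt)
            except Exception as exc:
                # Remember the failure and fall through to the next model.
                last_error = exc
                continue
            self.cache[prompt] = result
            return result

        raise RuntimeError("all models in the fallback chain failed") from last_error
```

For example, `Gateway([primary, backup]).complete("hi")` would call `backup` only if `primary` raises, and a second identical call would be answered from the cache.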