Prompt caching
Prompt caching reuses the model’s processing of a repeated prompt prefix, cutting cost and latency on requests that share a large, stable preamble.
Prompt caching lets a provider store the processed form of a stable prompt prefix (a large system prompt, tool definitions, or shared context) so that repeated requests are billed at a steep discount on those tokens instead of full price.
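To make the discount concrete, here is a rough cost sketch. Every number and name in it is an assumption for illustration: the 10% cached-token rate, the token counts, and the price unit are hypothetical, and real providers may also charge extra for cache writes or expire entries after a short time, which this sketch ignores.

```python
def uncached_cost(prefix_tokens: int, suffix_tokens: int, requests: int,
                  price_per_token: float) -> float:
    """Every request pays full price for the whole prompt."""
    return requests * (prefix_tokens + suffix_tokens) * price_per_token


def cached_cost(prefix_tokens: int, suffix_tokens: int, requests: int,
                price_per_token: float, cached_rate: float = 0.1) -> float:
    """First request pays full price; later ones pay a reduced rate on the prefix.

    cached_rate is a hypothetical fraction of the normal price charged for
    cached prefix tokens. Assumes the cache never expires between requests.
    """
    first = (prefix_tokens + suffix_tokens) * price_per_token
    later = (prefix_tokens * cached_rate + suffix_tokens) * price_per_token
    return first + (requests - 1) * later


# Hypothetical workload: 10,000-token stable prefix, 200-token per-request suffix.
full = uncached_cost(10_000, 200, requests=100, price_per_token=1.0)
discounted = cached_cost(10_000, 200, requests=100, price_per_token=1.0)
print(f"uncached: {full:.0f} units, cached: {discounted:.0f} units")
```

Under these assumed figures the prefix dominates the bill, so the larger and more stable the preamble, the more of the total cost the cache removes.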
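Caching works by prefix match: a request reuses the cache only up to the point where it first differs from the stored prefix, so any change inside the cached region invalidates everything after it. The practical rule is to keep stable content at the front and put volatile content (timestamps, per-request data) at the end.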
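The ordering rule can be illustrated with a toy model. This is a simplified sketch, not any provider's actual mechanism: prompts are modeled as lists of word-level tokens, and the helper names and sample content are hypothetical. It counts how many leading tokens two consecutive requests share, which stands in for how much of the prefix could be reused.

```python
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count the leading tokens that are identical in both prompts."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_prompt(stable: list[str], volatile: list[str], volatile_first: bool) -> list[str]:
    """Assemble a prompt with the volatile part either before or after the stable part."""
    return volatile + stable if volatile_first else stable + volatile


# Hypothetical stable preamble and two per-request payloads.
SYSTEM = ["You", "are", "a", "support", "agent", "for", "Acme", "."]
REQ_1 = ["time=09:00", "user=alice"]
REQ_2 = ["time=09:01", "user=bob"]

# Volatile content first: the prompts differ at token 0, so nothing is reusable.
reuse_bad = shared_prefix_len(
    build_prompt(SYSTEM, REQ_1, volatile_first=True),
    build_prompt(SYSTEM, REQ_2, volatile_first=True),
)

# Stable content first: the whole preamble is a shared prefix.
reuse_good = shared_prefix_len(
    build_prompt(SYSTEM, REQ_1, volatile_first=False),
    build_prompt(SYSTEM, REQ_2, volatile_first=False),
)

print(reuse_bad, reuse_good)  # prints "0 8"
```

The same eight stable tokens appear in both layouts; only their position decides whether they count as a reusable prefix.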
For system-prompt-heavy or multi-turn workloads, correct caching is one of the easiest large cost reductions available — often a major part of a cost-optimization engagement.