Agent Month


AI engineering glossary

Plain-English definitions of the terms behind production AI — no fluff, written for engineers and the leaders who fund them.

Agent observability
Agent observability is the tooling that makes an agent’s behavior in production visible — tracing tool calls, prompts, costs, and failures.
Agentic development
Agentic development is the practice of delegating software tasks to AI agents that plan, edit, and verify code across a codebase, with human oversight.
AI agent
An AI agent is a system where a language model decides and takes actions through tools in a loop to accomplish a goal, rather than producing a single response.
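A minimal sketch of that loop, assuming a stand-in `fake_model` function in place of a real language model and a single hypothetical tool, `add_one`. Neither name comes from any library; they only illustrate the decide, act, observe cycle.

```python
from typing import Callable

# Hypothetical tools the agent may call; real agents would wrap APIs, shells, etc.
TOOLS: dict[str, Callable[[str], str]] = {
    "add_one": lambda x: str(int(x) + 1),
}

def fake_model(goal: int, history: list[str]) -> dict:
    """Stand-in for a language model: picks the next action from the history."""
    current = int(history[-1]) if history else 0
    if current >= goal:
        return {"action": "finish", "answer": str(current)}
    return {"action": "add_one", "input": str(current)}

def run_agent(goal: int, max_steps: int = 10) -> str:
    """Loop: ask the model for an action, run the tool, feed the result back."""
    history: list[str] = []
    for _ in range(max_steps):
        decision = fake_model(goal, history)
        if decision["action"] == "finish":
            return decision["answer"]
        tool = TOOLS[decision["action"]]
        history.append(tool(decision["input"]))
    raise RuntimeError("step budget exhausted before the goal was reached")
```

The `max_steps` cap is a design choice worth keeping in any real agent: without it, a model that never decides to finish would loop indefinitely.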
AI code security
AI code security is the practice of controlling the risks of AI-generated code and prompts — vulnerabilities, license issues, data leakage, and prompt injection.
Codebase readiness
Codebase readiness is how well a codebase supports AI/agentic development — measured by module boundaries, tests, types, docs, and context files.
Context window
A context window is the maximum amount of text (measured in tokens) a model can consider in a single request, including both input and output.
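As a rough illustration of budgeting against that limit, the sketch below drops the oldest messages until the input plus the reserved output fits. The function name and the token counts are hypothetical, and real providers may account for limits differently.

```python
def trim_history(message_tokens: list[int], max_output_tokens: int,
                 context_window: int) -> list[int]:
    """Drop oldest messages until remaining input plus reserved output fits."""
    kept = list(message_tokens)
    while kept and sum(kept) + max_output_tokens > context_window:
        kept.pop(0)  # discard the oldest message first
    return kept
```

For example, `trim_history([500, 300, 200], 100, 700)` keeps only the two most recent messages.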
Embeddings
Embeddings are numeric vector representations of text (or other data) that place similar meanings close together, enabling semantic search and RAG.
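To make "close together" concrete, here is a small sketch using cosine similarity on hand-written three-dimensional vectors. The vectors are invented for illustration; real embeddings come from a model and typically have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, made up for this example (not produced by any model).
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.1, 0.2, 0.95],
}
```

With these toy values, "cat" scores higher against "kitten" than against "car", which is the property semantic search relies on.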
Fine-tuning
Fine-tuning further trains a base model on your data to adapt its behavior, format, or style for a specific task.
Function calling (tool use)
Function calling is a model capability that lets the model request a structured tool call; your code executes the call and returns the result to the model.
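The application side of that exchange might look like the sketch below. The JSON shape (`name` plus `arguments`) and the `get_weather` tool are assumptions for illustration; each provider defines its own format for tool calls.

```python
import json

def get_weather(city: str) -> dict:
    """Hypothetical tool; a real one would call a weather API."""
    return {"city": city, "forecast": "sunny"}

# Only tools listed here can be invoked, whatever the model asks for.
TOOL_REGISTRY = {"get_weather": get_weather}

def execute_tool_call(tool_call_json: str) -> str:
    """Run the tool the model requested and return a JSON result for the model."""
    call = json.loads(tool_call_json)
    func = TOOL_REGISTRY.get(call["name"])
    if func is None:
        return json.dumps({"error": f"unknown tool: {call['name']}"})
    result = func(**call["arguments"])
    return json.dumps(result)
```

The model never runs code itself here: it only emits the request, and the registry acts as an allowlist for what your code is willing to execute.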
Hallucination
A hallucination is when a model produces confident, plausible-sounding output that is factually wrong or unsupported.
LLM cost optimization
LLM cost optimization is the practice of reducing what production AI features cost — through routing, caching, batching, and right-sizing models — without losing quality.
LLM evals
LLM evals are systematic tests that measure the quality of a model’s outputs against defined criteria, so changes can be validated instead of guessed.
LLM gateway
An LLM gateway is a layer that sits between your application and model providers to handle routing, caching, fallbacks, observability, and cost control.
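One of those responsibilities, fallbacks, can be sketched in a few lines. The two provider functions below are hypothetical stand-ins, not real SDK calls; a production gateway would add retries, timeouts, and logging.

```python
from typing import Callable

def flaky_provider(prompt: str) -> str:
    """Hypothetical primary provider that always times out."""
    raise TimeoutError("primary timed out")

def backup_provider(prompt: str) -> str:
    """Hypothetical secondary provider that simply echoes the prompt."""
    return f"echo: {prompt}"

def call_with_fallback(
    prompt: str, providers: list[tuple[str, Callable[[str], str]]]
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # in practice, catch provider-specific errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```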
LLM-as-judge
LLM-as-judge is an evaluation technique where a language model scores another model’s output against criteria you define.
Model Context Protocol (MCP)
MCP is an open protocol that standardizes how applications expose tools, data, and prompts to AI models and agents.
Model routing
Model routing sends each request to the most appropriate model — by cost, quality, or latency — instead of using one model for everything.
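A deliberately simple routing rule might look like this. The model names, the length threshold, and the `needs_reasoning` flag are all assumptions for illustration; real routers often use classifiers or eval data rather than a fixed cutoff.

```python
def route(prompt: str, needs_reasoning: bool = False) -> str:
    """Pick a model tier: escalate for long or reasoning-heavy requests."""
    if needs_reasoning or len(prompt) > 2000:
        return "large-model"  # hypothetical name: higher quality, higher cost
    return "small-model"      # hypothetical name: cheaper and faster
```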
Prompt caching
Prompt caching reuses the model’s processing of a repeated prompt prefix, cutting cost and latency on requests that share a large, stable preamble.
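The saving can be estimated with simple arithmetic. The sketch below assumes cached prefix tokens are billed at a fraction of the normal input price; the default ratio of 0.1 is a made-up placeholder, and actual pricing (including any charge for writing to the cache) varies by provider.

```python
def request_input_cost(prefix_tokens: int, suffix_tokens: int,
                       price_per_token: float, cache_hit: bool,
                       cached_price_ratio: float = 0.1) -> float:
    """Input cost of one request; the shared prefix is discounted on a cache hit."""
    prefix_price = price_per_token * (cached_price_ratio if cache_hit else 1.0)
    return prefix_tokens * prefix_price + suffix_tokens * price_per_token
```

Under these assumptions, the larger the stable prefix relative to the per-request suffix, the closer the total input cost gets to the discounted rate.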
Prompt engineering
Prompt engineering is the practice of designing the instructions and context given to a language model to get reliable, high-quality outputs.
Regression testing (for LLMs)
LLM regression testing re-runs a suite of evals on every change so a prompt or model update can’t silently degrade quality.
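In code, the idea reduces to comparing a candidate's eval score with a baseline and failing the change when it drops. The sketch below uses a naive substring check as the eval criterion, which is an assumption for brevity; the model is any callable from prompt to text.

```python
from typing import Callable

EvalCase = tuple[str, str]  # (prompt, substring the answer must contain)

def pass_rate(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases whose output contains the expected substring."""
    passed = sum(1 for prompt, expected in cases if expected in model(prompt))
    return passed / len(cases)

def is_regression(baseline: float, candidate: float, tolerance: float = 0.0) -> bool:
    """True when the candidate scores below the baseline by more than the tolerance."""
    return candidate < baseline - tolerance
```

Wired into CI, `is_regression` returning `True` would block the prompt or model change from shipping.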
Retrieval-augmented generation (RAG)
RAG is a technique that retrieves relevant documents at query time and adds them to the prompt so the model answers from up-to-date, specific knowledge.
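The retrieve-then-prompt flow can be sketched as follows. To stay self-contained, this example ranks documents by keyword overlap rather than by embeddings, which is a simplification; the prompt wording is likewise only an assumption.

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many query words they share; return the top k."""
    query_words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Insert the retrieved documents into the prompt ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

The string returned by `build_prompt` is what would be sent to the model, so the answer is grounded in whatever the retrieval step surfaced.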
Self-hosted LLM
A self-hosted LLM is an open-weight model you run on your own infrastructure, so data never leaves your environment.
Semantic search
Semantic search retrieves results by meaning rather than exact keywords, using embeddings to match intent.
Structured outputs
Structured outputs constrain a model’s response to a defined schema (such as a JSON Schema), so the output parses and matches the expected shape; the values inside can still be wrong.
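Even with schema-constrained generation, many applications check the response before using it. The sketch below validates a JSON string against a small hypothetical schema (`name` and `priority` are invented fields). It shows after-the-fact validation, not the constrained decoding a provider performs.

```python
import json

# Hypothetical schema: field name -> required Python type.
SCHEMA = {"name": str, "priority": int}

def parse_structured(raw: str) -> dict:
    """Parse model output and check required fields and types."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in SCHEMA.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"wrong type for field: {key}")
    return data
```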
Tokens
Tokens are the chunks of text a language model reads and writes; pricing and context limits are measured in them, not words or characters.
Vector database
A vector database stores embeddings and finds the nearest vectors to a query efficiently, powering semantic search and RAG.
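Conceptually, the core operation is nearest-neighbor search. The toy class below does it by brute force over an in-memory list; the class name is hypothetical, and real vector databases use approximate indexes to avoid scanning every vector.

```python
import math

class TinyVectorStore:
    """In-memory store that finds nearest vectors by exhaustive comparison."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, item_id: str, vector: list[float]) -> None:
        self._items.append((item_id, vector))

    def search(self, query: list[float], k: int = 1) -> list[str]:
        """Return the ids of the k vectors closest to the query (Euclidean distance)."""
        ranked = sorted(self._items, key=lambda item: math.dist(item[1], query))
        return [item_id for item_id, _ in ranked[:k]]
```

Brute force is fine for a few thousand vectors; the reason dedicated databases exist is that this scan becomes too slow at larger scale.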