The shippable agent-month: a working definition
Last verified: June 2026· content
Two phrases sound the same and mean opposite things. They are both about the agent-month — the unit of productivity for an AI coding agent. One is a critique. The other is a measure. The first is a warning. The second is what you should be paying for.
This is the working definition of both, and the gap between them.
What Wes McKinney meant by “the mythical agent-month”
In early 2025, Wes McKinney published “The Mythical Agent-Month” on the O’Reilly Radar. The argument is the same one Fred Brooks made in 1975 in The Mythical Man-Month: adding bodies to a late project makes it later, not sooner, because the cost of communication grows faster than the cost of work.
McKinney’s claim is that AI agents have the same trap. Add ten agents to the codebase and you do not get ten times the throughput. You get ten times the merge conflicts, ten times the prompt-injection surfaces, ten times the cloud bill, and ten times the silent quality regressions nobody catches because there are no evals. Naive “AI 10×” claims collapse the moment an agent is asked to do real work against a real codebase.
The argument is correct, and the right thing for a buyer to do with it is to be skeptical of any “agent-month” claim that does not come with the infrastructure work that makes the claim defensible.
That infrastructure work is what the rest of this post is about.
What we mean by “the shippable agent-month”
The shippable agent-month is the actual deliverable rate of one well-instrumented agent working against a codebase that is ready for it, measured in production, not estimated.
The word “shippable” is doing real work. It is the difference between:
- An agent that can answer a question about the codebase.
- An agent that can make a change to the codebase.
- An agent that can ship a change to the codebase, behind the same review hooks, evals, and observability a human would face.
The first is a demo. The second is a tool. The third is a unit of productivity you can put on a board update.
The shippable rate of one agent on a codebase that is not ready for it (zero tests, no MCP, no routing, no evals) is approximately zero shippable agent-months, no matter how capable the model. The same agent on a codebase that is ready for it — the same way a developer is faster in a clean monorepo with fast tests and a deploy pipeline — ships materially more than zero.
The infrastructure work that turns the one into the other is the shippable part.
What “ready for it” means
A codebase is agent-ready when the model can do the four things that make a developer fast:
- See the right context. A curated
CLAUDE.md/AGENTS.mdin the repo. Real tests that run in <60s. Module boundaries that are small enough to reason about. A type system that catches interface drift at compile time. - Reach the systems it needs. MCP servers for Postgres, the issue tracker, the deploy pipeline, the observability stack — every read-only, least-privileged, audit-logged. The agent can answer its own questions instead of asking the on-call.
- Know when it is wrong. An eval suite in CI that runs on every prompt, model, and retrieval change. Online sampling against production traffic. LLM-as-judge calibrated against human labels.
- Cost the right amount. Per-call observability on tokens, latency, cost, and business KPI. Cost-aware routing that keeps quality on the prompts that matter and uses a smaller model where it doesn’t.
A team without 1–4 is paying for the mythical agent-month. A team with 1–4 is paying for the shippable one. Most teams are somewhere in between, and the gap is the work.
The four infrastructure engagements that close the gap
A focused, 4–6 week engagement on each of the four gets a team from “we bought Copilot” to “we have an AI platform”. The engagements are sequential; each pays for the next.
- LLM cost & performance optimization — instrument, route, cache, observe. 30–60% bill reduction in 4–6 weeks. The win funds the platform.
- Production AI eval infrastructure — eval harness + regression suite + online quality monitoring. Wired into CI/CD. The first time you ship a prompt change, the eval catches the regression before the customer does.
- Internal AI coding workflow — slash commands, rules files, agent definitions, MCP for the top 3 internal systems, review hooks. The “free-for-all” stops; the team’s collective learning accumulates in the repo.
- Agentic codebase readiness audit — a scored report and a remediation roadmap. The “is your codebase ready” question gets answered with a number, not a vibe.
For regulated workloads, add a parallel track:
- Self-hosted LLM infrastructure — vLLM / TensorRT-LLM / SGLang on the team’s own cloud. Private inference, RAG, and fine-tuning for the workloads that cannot leave the building.
We do all five as fixed-scope or outcome-priced engagements, and the infrastructure we deploy is the same open-source code we ship under https://github.com/neul-labs">github.com/neul-labs.
The unit of value when you buy AI engineering capacity
If you are a CTO, VP Engineering, or Head of Platform comparing engagement models, the question is: what is the shippable agent-month worth?
A useful frame:
- A senior engineer on a clean codebase with fast tests and a deploy pipeline ships about one PR per day of real work. The shippable-agent-month equivalent is the same engineer’s daily PR rate plus the agent’s daily PR rate, after the infrastructure work is in place.
- A 30-engineer team without the infrastructure work ships approximately zero shippable agent-months. The same team with the four engagements above in place ships 5–15 per day, depending on workload.
- A “transformation” engagement without the infrastructure work ships the deck, not the work. The shippable rate stays at zero until 1–4 are done.
The shippable agent-month is the unit of value the infrastructure work unlocks. The mythical one is the unit of value the deck is selling.
What this site is
We are the people who ship 1–5. The pages on this site — the MCP integrations, the fixes, the glossary, the playbooks, the case studies — are the same work we deploy for clients, written down so a senior engineer can run them.
If you are buying AI engineering capacity, the question is not “is this the mythical agent-month?” It is “is this the shippable one, and what work makes it shippable?” The rest of the site answers the second question.
If you are running the work yourself, the playbooks tell you how.
Either way, the right next move is the one that turns mythical productivity into shipped work. That is what the shippable agent-month is for.