Agent Month

The AI coding golden path: a 6–10 week rollout

Last verified: June 2026 · playbook

Weeks 1–2: baseline — measure, pick, name

The first two weeks are diagnostic, not build. You cannot standardize a workflow you don't understand.

The diagnostic:

  1. Measure current usage. Survey the team, then check the actual seat-level telemetry from Claude Code, Cursor, and Copilot. What people actually do, not what they say they do, is the workflow you'd be replacing.
  2. Pick the top 3 internal systems the team most wants agents to reach. For most teams this is the database (Postgres, Snowflake), the issue tracker (Linear, Jira), and the observability stack (Datadog, Sentry). If you have a fourth, you don't have a fourth.
  3. Name the owner per system. A senior engineer with on-call access who can answer "should the agent be able to call this tool?" and who can debug the MCP server when it breaks. No owner, no rollout.
  4. Pick the metrics. Adoption (active users / week), quality (PR rejection rate for agent-opened PRs), and security (audit-log signals). Decide what success looks like before week 4.
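If the seat-level telemetry can be exported as (user, tool, day) usage events, the adoption metric becomes a count of distinct users per ISO week and tool. The sketch below assumes that event shape, which is hypothetical. The real Claude Code, Cursor, and Copilot exports each use their own schema and need mapping into it first.

```python
from collections import defaultdict
from datetime import date

# Assumed (hypothetical) event shape: (user_id, tool_name, day_of_use).
UsageEvent = tuple[str, str, date]


def weekly_active_users(events: list[UsageEvent]) -> dict[tuple[str, str], int]:
    """Count distinct users per (ISO week, tool); repeat use within a week counts once."""
    users: dict[tuple[str, str], set[str]] = defaultdict(set)
    for user, tool, day in events:
        year, week, _ = day.isocalendar()
        users[(f"{year}-W{week:02d}", tool)].add(user)
    return {key: len(names) for key, names in users.items()}
```

Run it once on the baseline period and again at week 10 to get a like-for-like before/after comparison.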

End of week 2: one named internal owner per system, one shared repo for the rules + commands + MCP config, and one set of metrics.

Weeks 3–4: rules + commands — the team priors

The first thing the team touches is the priors. These are the rules, commands, and templates every engineer starts with in every repo.

Minimum viable priors:

  • One .claude/commands/ directory in every repo, in version control. The team's slash commands: /add-tests, /review-pr, /explain-module, /write-spec. One per workflow every engineer runs more than once a week.
  • A .cursor/rules/ directory in every repo, capturing the same priors for Cursor. (The two configs live in different files, but the content is shared.)
  • A CLAUDE.md and an AGENTS.md at the repo root, with: the architecture, the test command, the deploy command, the conventions, and the "do not touch this file" list. These are the files an agent reads first: Claude Code reads CLAUDE.md, and many other tools read AGENTS.md.
  • A shared prompt library in the repo, so every team can reuse the prompts the senior engineers have already debugged.

End of week 4: every repo has the priors, the team has the commands they actually use, and a senior engineer has reviewed every command for the "this is the team standard, not someone's personal preference" check.
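The "every repo has the priors" check is easy to automate. The sketch below reports which of the priors listed above are missing from a single repo checkout. It checks presence only, not content, so the senior-engineer review still applies. The paths come from the list above. Looping over every repo, for example as a CI step, is left to your own tooling.

```python
from pathlib import Path

# Priors every repo should carry (paths taken from the list above).
REQUIRED_FILES = ("CLAUDE.md", "AGENTS.md")
REQUIRED_DIRS = (".claude/commands", ".cursor/rules")


def missing_priors(repo: Path) -> list[str]:
    """Return the required prior files/directories absent from a repo checkout.

    A directory counts as missing if it does not exist or is empty.
    """
    missing = [name for name in REQUIRED_FILES if not (repo / name).is_file()]
    for name in REQUIRED_DIRS:
        path = repo / name
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing
```

An empty .claude/commands/ directory counts as missing on purpose. The point is that the team's commands exist, not that the folder does.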

Weeks 5–6: MCP access to the top 3 internal systems

This is the highest-leverage part. Once the agent can reach the top 3 systems on its own, the workflow changes: the agent stops being a chat box and starts being a colleague with access to the same systems the team uses.

One MCP server per system, in priority order:

  1. Postgres (or Snowflake / BigQuery). Read-only DB role against a replica. Timeouts, statement limits, audit logging. This is the single most common MCP server — and the one with the highest security stakes.
  2. Issue tracker (Linear, Jira). Read by default, write for ticket creation, never auto-transition. The agent can answer "what's the status of this ticket?" and create new ones; it cannot close a ticket.
  3. Observability (Datadog, Sentry). Read-only scoped credentials. The agent can pull the last 30 minutes of latency, look up an error, and grep a log.
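For the Postgres server, most of the read-only guarantee should come from the database role itself, not from the MCP server. The sketch below generates the provisioning statements. The role, database, and schema names are placeholders. `default_transaction_read_only` is only a session default that a session can override, so it is the SELECT-only grants that actually prevent writes. Run the statements on the primary. On a streaming replica, roles and role settings arrive through replication, and the MCP server then connects to the replica.

```python
def readonly_role_sql(role: str, database: str, schema: str = "public",
                      statement_timeout: str = "5s") -> list[str]:
    """PostgreSQL statements for a read-only login role used by an agent's MCP server.

    Names are interpolated as-is: pass only trusted, pre-validated identifiers.
    Password or certificate auth is configured separately.
    """
    return [
        f"CREATE ROLE {role} LOGIN",
        f"GRANT CONNECT ON DATABASE {database} TO {role}",
        f"GRANT USAGE ON SCHEMA {schema} TO {role}",
        # SELECT only: no INSERT/UPDATE/DELETE or DDL privileges are granted.
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role}",
        # Also cover tables created later by the role that runs these statements.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} GRANT SELECT ON TABLES TO {role}",
        # Belt and braces: sessions start read-only (a session could undo this).
        f"ALTER ROLE {role} SET default_transaction_read_only = on",
        # Server-side cap on runaway queries, whatever the agent asks for.
        f"ALTER ROLE {role} SET statement_timeout = '{statement_timeout}'",
    ]
```

For example, `print(";\n".join(readonly_role_sql("agent_ro", "app")) + ";")` emits a script you can review before running it.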

For each: read-only by default, scoped credentials, audit log every call, MCP server version pinned in the config. The 3-week engagement we ship per server is in /playbook/mcp-server-build.
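Version pinning is easy to check mechanically. The sketch below assumes a JSON config with an `mcpServers` map, the shape used by Claude Code's project-level `.mcp.json`, and servers launched with `npx`. The package names in the example are hypothetical. Servers started any other way, such as Docker or a local binary, are skipped rather than checked.

```python
import json
import re

# An npm spec pinned to an exact x.y.z version, e.g. "pkg@1.2.3" or "@scope/pkg@1.2.3".
# Bare names, tags ("@latest") and ranges ("@^1.2.0") do not match.
PINNED_NPM_SPEC = re.compile(r"^(@[\w.-]+/)?[\w.-]+@\d+\.\d+\.\d+$")


def unpinned_servers(config_text: str) -> list[str]:
    """Names of npx-launched MCP servers whose package spec is not pinned."""
    config = json.loads(config_text)
    problems = []
    for name, server in config.get("mcpServers", {}).items():
        if server.get("command") != "npx":
            continue  # only npx-launched servers are checked in this sketch
        # Skip flags such as "-y"; the first remaining arg is the package spec.
        specs = [arg for arg in server.get("args", []) if not arg.startswith("-")]
        if not specs or not PINNED_NPM_SPEC.match(specs[0]):
            problems.append(name)
    return problems
```

Failing CI when this list is non-empty keeps an unreviewed server upgrade from reaching every engineer at once.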

End of week 6: the agent can read the database, the issue tracker, and the observability stack — and every call is logged.
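The policy above can be enforced in one place if the MCP servers route tool calls through a shared gate: read-only by default, issue creation allowed, no transitions, every call logged. The tool names, the `call_tool` helper, and the log format below are all hypothetical. Real MCP servers expose their own tool names, and those are what you would list here. Denied calls are logged too, because an agent repeatedly trying a forbidden tool is itself an audit signal.

```python
import json
import logging
import time
from typing import Any, Callable

audit_log = logging.getLogger("mcp.audit")

# Hypothetical tool names; deny by default, so anything unlisted is rejected.
ALLOWED_TOOLS: dict[str, frozenset[str]] = {
    "postgres": frozenset({"query"}),
    "issue_tracker": frozenset({"get_issue", "search_issues", "create_issue"}),
    "observability": frozenset({"query_metrics", "get_error", "search_logs"}),
}


class ToolDenied(PermissionError):
    """Raised when a tool is not on the allowlist for its system."""


def call_tool(system: str, tool: str, handler: Callable[..., Any], **kwargs: Any) -> Any:
    """Check the allowlist, run the handler, and emit one JSON audit record per call."""
    record: dict[str, Any] = {"ts": time.time(), "system": system, "tool": tool, "args": kwargs}
    try:
        if tool not in ALLOWED_TOOLS.get(system, frozenset()):
            raise ToolDenied(f"{system}.{tool} is not on the allowlist")
        result = handler(**kwargs)
        record["outcome"] = "ok"
        return result
    except Exception as exc:
        record["outcome"] = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        # Runs for allowed, failed, and denied calls alike.
        audit_log.info(json.dumps(record, default=str))
```

Attach a handler that ships `mcp.audit` records to your log pipeline and set the logger to INFO. With no configuration, Python's default WARNING threshold drops these records.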

Weeks 7–10: review hooks + adoption measurement

Without review hooks, the quality story falls apart. The team needs to know which PRs were opened by an agent, and those PRs need a stricter review than the human baseline.

The hooks:

  • AI-aware lint that catches hallucinated package names, missing test files, and suspicious imports. Most teams wire this into the existing lint step (ESLint, golangci-lint, ruff) — it's a custom rule, not a new tool.
  • Secret scan + license check in CI, scoped to PRs opened by an agent. Snyk or Semgrep with the AI-aware rule pack works for most stacks.
  • A "this PR was opened by an agent" label, applied automatically. The reviewer knows to look more carefully. The metric you track: PR rejection rate for agent-opened PRs vs. human-opened PRs. They should converge by month 3.
  • Format and import-order checks for the things the agent occasionally gets wrong: a small set of rules that block the merge until fixed.

Adoption measurement is the other half. The minimum:

  • Active users per week, per tool. Are the licenses actually being used?
  • Slash-command invocations per week, per command. Which commands are real, which are dead?
  • PRs opened by an agent, accepted vs. rejected. The quality metric.
  • Time from issue open to PR open, agent vs. human. The speed metric.
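Given a list of PRs already labeled agent or human, the quality and speed metrics take a few lines each. The `PullRequest` record below is a hypothetical shape you would populate from your Git host's API. "Rejected" is taken to mean closed without merging. Open PRs are excluded from the rejection rate because they have no outcome yet.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class PullRequest:
    opened_by_agent: bool
    state: str  # "merged", "closed" (closed without merge = rejected), or "open"
    hours_issue_to_pr: float  # time from issue opened to PR opened


def rejection_rate(prs: list[PullRequest], by_agent: bool) -> Optional[float]:
    """Share of decided PRs (merged or closed) that were closed without merging."""
    decided = [p for p in prs if p.opened_by_agent == by_agent and p.state != "open"]
    if not decided:
        return None
    return sum(p.state == "closed" for p in decided) / len(decided)


def median_hours_to_pr(prs: list[PullRequest], by_agent: bool) -> Optional[float]:
    """Median time from issue open to PR open for one group of PRs."""
    hours = [p.hours_issue_to_pr for p in prs if p.opened_by_agent == by_agent]
    return median(hours) if hours else None
```

The sketch uses the median rather than the mean so that a few long-stale issues do not dominate the speed metric. Plotting the two rejection rates week over week shows whether they are converging.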

End of week 10: review hooks in CI, adoption dashboard live, and a clear yes/no on whether the rollout is delivering.

How to drive adoption (without mandating a tool)

Mandate kills adoption. The standard "everyone must use X" announcement produces a 2-week compliance bump followed by a 6-month decline as people route around the mandated tool.

What works:

  • Make the right way the easy way. The shared .claude/commands/ library is faster to reach for than writing a prompt from scratch. The MCP server is faster than asking the on-call.
  • Pair the rollout with a senior engineer who already lives in the workflow. The senior engineer's adoption pulls the rest of the team. Choose the 2–3 most-respected senior engineers and onboard them first.
  • Show the numbers publicly. A weekly "adoption update" in your team channel: not a slide, just a four-line message, a short Loom, or a screenshot of the dashboard. The team can see what's working.
  • Reward the right behavior. A review that catches a hallucinated package before merge is a small win, but a public call-out ("Sarah's review caught a hallucinated API this week") trains the team to do more of it.

What doesn't work:

  • Mandating a single tool. The strongest 2026 patterns use Claude Code for autonomous / scripting work, Cursor for inline editing, and Copilot for the long tail. Mandating any one of them leaves 30–50% of the productivity on the floor.
  • "AI training" sessions. The team learns by doing, not by sitting through a 60-minute deck. The shared priors + a 30-minute walkthrough beats a half-day workshop every time.
  • Mandating coverage metrics. "100% of engineers must use X every day" is the metric that gets gamed. Use active users, slash-command invocations, and PR quality as the proxies instead.

What kills adoption

The four things that kill adoption, in order:

  1. A bot that opens bad PRs. If the first 3 PRs an agent opens need a senior engineer to fix, the team learns to distrust the agent forever. A senior engineer must review the first 3 agent PRs before the team is asked to trust the agent at all.
  2. A review hook that flags every PR. If the review step is noisy, the team bypasses it. Wire the hooks so they catch real issues, not style nits.
  3. An MCP server that flakes. If the agent can't reliably reach the database, the team routes around it. The MCP server is the most critical part of the stack — test it more than anything else.
  4. A shared CLAUDE.md that goes stale. A CLAUDE.md that says "this service uses Postgres 14" when it's actually Postgres 16 is a quiet productivity tax. The file needs an owner, like every other artifact in the repo.

What the team ends up with

By the end of week 10, the team has:

  • One .claude/commands/ + one .cursor/rules/ baseline, in version control, in every repo
  • 3 MCP servers (Postgres, issue tracker, observability), read-only, scoped, audit-logged, version-pinned
  • One shared prompt library
  • Review hooks in CI: AI-aware lint, secret scan, format, label
  • One weekly adoption dashboard
  • One named internal owner per MCP server
  • One runbook for "the agent opened a bad PR — what now?"

The work compounds: the senior engineers' prompts become the shared library, the shared library becomes the team standard, the team standard becomes the onboarding material for new hires.

Do this yourself vs hire us

When to do this yourself, when to hire:

Do this yourself if…

  • You have a senior engineer with MCP + AI-tooling experience in-house
  • Your team is under 30 engineers — the rollout is short enough to manage internally
  • You already have a clear "this is the workflow" picture in mind
  • You have executive patience to wait 6+ weeks for the adoption metrics to move

Hire us if…

  • Your team is 30–500 engineers — the rollout needs a senior engineer full-time for 6–10 weeks
  • You want the rules, commands, and MCP servers shipped in version control, not described in a doc
  • You don't yet have a senior engineer with MCP + auth + audit production experience
  • You want adoption measured from week 1, not retrofitted after
  • You want this delivered while your team keeps shipping, not on top of it

Frequently asked questions

Do I need to pick one tool (Claude Code, Cursor, or Copilot)?

No. The strongest 2026 patterns use Claude Code for autonomous / scripting work, Cursor for inline editing, and Copilot for the long tail. Standardize the workflow, not the tool.

How long does an AI coding rollout take?

A focused 6–10 week engagement covers: baseline measurement, rules + commands, MCP for 3 systems, review hooks, and adoption measurement. Larger orgs take longer.

What is the difference between an AI coding rollout and a Copilot / Cursor license deployment?

License deployment gives every engineer a tool. A rollout adds: rules files, slash commands, MCP integrations to internal tools, review hooks, and a measurement layer. License deployment without a rollout is the "free-for-all" that the dev community is complaining about.

What are the right adoption metrics?

Active users per week per tool, slash-command invocations per week, PR acceptance rate for agent-opened PRs vs. human-opened PRs, and time from issue open to PR open. The first two are inputs (is the workflow being used?); the last two are outcomes (quality and speed).

How is an AI coding rollout priced?

Most engagements are fixed-scope ($40–120k) for a 6–10 week build, with optional monthly tuning ($5–10k/mo). Outcome-priced retainers are also possible.