How to build a production MCP server

Last verified: June 2026· playbook

Week 1: scope + auth + the read-only smoke test

The first week is about scope, not code. Before you write a line, you need:

One named internal owner. An engineer who can answer "should the agent be able to call this tool?" and who has on-call access to the system. If you don't have a named owner, the server ships into a void.
A specific use case. "Expose Datadog to Claude Code" is too broad. "Let the agent pull the traces for the last 30 minutes of latency for a service, scoped to on-call" is specific. The narrower the use case, the faster the server ships.
Read-only credentials. For the first build, the server should be read-only. Every tool returns data, nothing mutates. If the use case requires write, add one write tool at a time, deliberately, with a human approval gate.
The auth story. Who issues the credential? Where does it live? How is it rotated? If the answer is "I'll figure that out later", the server is not production.

End of week 1: a one-page spec covering the above, plus a read-only credential, plus a "this is the smoke test we'll run before the server is allowed to take a write request" check.

Week 2: tools + transport + the production concern

Now write the server. The work is mostly typing and packaging.

Tool surface:

Start with 3–5 tools. list_X, get_X, search_X. Discoverable, typed, named for the action an agent would actually take.
Each tool has a clear description. This is what the agent reads to decide which tool to call. Vague descriptions = wrong tool calls. Spend a paragraph on each one.
Return structured responses. JSON in, JSON out. The agent reasons better on structured data than on prose.

Transport:

stdio for local dev (Claude Code / Cursor running the server as a subprocess)
HTTP + SSE or streamable HTTP for shared deployments (a single server instance the team connects to)
Pick one and stay with it. Multi-transport is a maintenance tax.

The "production concern" — the one thing the system would let the agent do that you don't want — is what differentiates a demo from a production server. For Datadog, it's read-only scoping so an agent can't mutate monitors. For Postgres, it's a read-only DB role against a replica. For Stripe, it's a restricted key with no refund authority. Document it in the spec; design around it.

Week 3: rollout, audit, and the gate

Week 3 is rollout, not new code.

Ship the server behind a feature flag. Roll it out to one engineer, then a small team, then the whole org. Watch the audit log.
Audit log every tool call. Who, what, when, with the tool input + the system response. Ship the log to the same observability stack as the rest of your production. If you can't query it like a normal log, the log is not there.
Gate high-impact tools behind human approval. The MCP spec supports a permissions model. Use it. A "delete this Postgres table" tool should require an explicit human "yes" before it runs.
Document and hand off. One runbook. The internal owner can debug, version, and roll the server forward without you.

End of week 3: one named owner, one credential, one audit log, one runbook, one team using it daily.

Patterns that scale to a team

Version pinning. Servers change. Pin the version (or commit hash) in the MCP config; a breaking change in a dep doesn't quietly rewrite your team's tools.
One credential per agent, not per user. Easier to rotate, easier to audit, easier to revoke. Tie the credential to the agent, not the engineer running it.
Tool naming = action. search_postgres not postgres_search. get_datadog_monitor not datadog_monitor_info. The agent reads the tool name first; verb-first wins.
Long responses → structured summaries. If a tool returns 5,000 tokens of log lines, the agent burns context. Summarize on the server side; return what an agent would actually need to act.
Cross-team server sharing. One MCP server, many agents (Claude Code, Cursor, Cline, your own internal one). The standardization is the value.

Anti-patterns that bite

Hardcoded credentials in the config. The single most common mistake. Use env vars, 1Password CLI, Doppler, or Vault.
Full-access tokens. If a token can delete a production table, an agent prompt injection will eventually try.
No audit log. The audit log is your incident response. Without it, you cannot tell which tool call caused the bad outcome.
Skipping the read-only smoke test. The cheapest way to validate scope is to do a clearly read-only task first, then graduate to writes.
Over-broad tool surface. 30 tools that overlap = the agent picks wrong. 5 sharp tools > 30 vague ones.

Do this yourself vs hire us

When to do this yourself, when to hire:

Do this yourself if…

You have a senior engineer who has shipped an OAuth + read-only database role before
You have one or two specific use cases, not 10+
Your named internal owner has 3+ weeks of dedicated time
The credentials are scoped and rotatable already (most aren't, by the way)

Hire us if…

You have 5+ internal systems to expose and want them shipped in 6–8 weeks, not 6+ months
The credentials aren't ready (no read-only role, no service account, no OAuth app)
You don't have a senior engineer who has shipped MCP + auth + audit in production
You need the audit log wired into your observability stack from day one
You want a defined scope, a fixed price, and a senior engineer shipping in your repo

Frequently asked questions

How long does it take to build a production MCP server?

A focused 3-week engagement per server, assuming a clear use case, a named internal owner, and a read-only credential. The first server takes longer (the patterns are set); subsequent servers reuse them.

What is the difference between an MCP server and a tool / function call?

A tool / function call is provider-specific (OpenAI tools ≠ Claude tool use ≠ Gemini function calling). MCP is a portable, typed, discoverable protocol — one server works across Claude Code, Cursor, Cline, and any future MCP-capable client.

Do I need to host the MCP server myself?

For internal systems (Postgres, deploy, internal APIs) yes — the server should run in your infra, with your auth, with your audit log. For SaaS systems, vendors like Composio or Anthropic ship pre-built servers you can use as-is.

What credentials does the MCP server use?

Best practice: a service-account credential scoped to the system (read-only by default), stored in a secret store, injected at the boundary. Never a personal user token; never hardcoded in the config.

Can the agent call multiple MCP servers at once?

Yes. In Claude Code and Cursor, you register as many servers as you want; they're merged into one tool list. If two servers expose a tool with the same name, the most recently registered wins (or you can disambiguate in settings).