
The model layer is solved. The orchestration layer is catching up. The context layer — how agents discover, access, and act on external tools and data — is where the work is now. And it turns out that work is not engineering. It is product design.
The stack is almost complete
For years, building an AI product meant solving three problems at once: which model to use, how to orchestrate it, and how to connect it to real data. Each layer had its own bottleneck.
The model layer matured first. GPT-4, Claude, Gemini — frontier models now ship with reasoning, tool use, and instruction-following that would have seemed unrealistic two years ago. The orchestration layer followed. LangChain, CrewAI, AutoGen, and a dozen other frameworks turned multi-step agent workflows from research projects into something a product team could ship in a sprint.
The context layer was the last open problem. Before MCP, every model-to-tool connection was a custom integration. Build once, for one model, maintain forever. Anthropic launched MCP in November 2024 to standardise that connection. By March 2026, it had 97 million monthly SDK downloads, 10,000+ public servers, and native adoption from OpenAI, Google, Microsoft, and AWS. The Linux Foundation took governance in December 2025. 78% of enterprise AI teams now report at least one MCP-backed agent in production.
The protocol works. The stack is nearly complete. And yet 88% of AI agent pilots still never reach production.
That gap is not a model problem. It is not an orchestration problem. And it is not, strictly speaking, a protocol problem. It is a product design problem — hiding inside infrastructure decisions that nobody on the product team made deliberately.
By “product design,” we mean a specific set of decisions: what capabilities to expose to the agent and when, how to structure tool discovery so the agent can navigate without drowning in context, how to make permissions visible so users understand what the agent does on their behalf, and who in the organisation owns the agent’s behaviour. These are not UX decisions in the traditional sense — they don’t involve screens or layouts. But they shape whether the product works for real people.
The 88% figure comes from Forrester and Anaconda’s 2026 surveys. The 12% that succeed share four attributes: pre-deployment infrastructure investment, governance documentation before deployment, baseline metrics before pilots, and dedicated business ownership. Three of those four are product design decisions.
MCP in numbers:
- 97M monthly SDK downloads (March 2026)
- 10,000+ public MCP servers
- 78% enterprise AI teams — at least one MCP agent in production
- Adopted by OpenAI, Google, Microsoft, AWS
- Linux Foundation governance — December 2025
What MCP actually asks of product teams
MCP defines three primitives: tools (actions an agent can take), resources (context data it can read), and prompts (reusable templates). The protocol handles the plumbing. What it does not handle is the set of decisions that determine whether any of this works for a human being.
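To make the three primitives concrete, here is a minimal sketch of them as plain data structures. It does not use any MCP SDK; the class names (`Tool`, `Resource`, `Prompt`, `Surface`) and the example entries are assumptions made for illustration, not the protocol's wire format.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tool:
    """An action the agent can take."""
    name: str
    description: str
    handler: Callable[[dict], str]


@dataclass
class Resource:
    """Context data the agent can read."""
    uri: str
    description: str
    read: Callable[[], str]


@dataclass
class Prompt:
    """A reusable template with named placeholders."""
    name: str
    template: str

    def render(self, **values: str) -> str:
        return self.template.format(**values)


@dataclass
class Surface:
    """Everything one hypothetical server exposes to an agent."""
    tools: dict[str, Tool] = field(default_factory=dict)
    resources: dict[str, Resource] = field(default_factory=dict)
    prompts: dict[str, Prompt] = field(default_factory=dict)


# Hypothetical entries, for illustration only.
surface = Surface()
surface.tools["refund_order"] = Tool(
    "refund_order", "Refund a paid order",
    lambda args: f"refunded {args['order_id']}",
)
surface.resources["orders://recent"] = Resource(
    "orders://recent", "Orders from the last 24 hours",
    lambda: "order 1001, order 1002",
)
surface.prompts["summarise"] = Prompt(
    "summarise", "Summarise {topic} for {audience}.",
)
```

Nothing in this structure says which of these entries a given user should see, or when. That is the part the protocol leaves to the product team.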
Three design problems surface the moment a team moves from “we have an MCP server” to “our users trust this agent.”
Tool discovery: what does the agent see, and when?
The naive approach is to expose every tool at once. An e-commerce platform with 300+ API operations loads over 36,000 tokens in tool definitions before the agent processes a single user request — roughly a fifth of Claude’s context window, consumed before any work begins.
Amazon Prime Video hit this wall in production. Their engineering team found that giving agents comprehensive tool access through centralised MCP servers actually degraded performance. More tools meant more confusion, more hallucination, worse outcomes. Their solution was progressive tool discovery: exposing a single “find tools” capability at initialisation, then loading only three or four context-appropriate tools per task. The Speakeasy team measured the difference: from 405,000 tokens for 400 tools down to 1,600–2,500 tokens. That is a reduction of at least 160×, and closer to 250× at the low end of the range.
Progressive discovery is, at its origin, an engineering optimisation. Teams adopt it to reduce cost and fight hallucination. But once the engineering pattern exists, a layer of product decisions sits on top. Which tools are in the default set? How does the agent communicate to the user that its current capabilities are limited? What happens when a request falls outside the loaded tool set — does the agent fail silently, escalate, or explain? These are not engineering questions. They are the same questions a product designer asks when structuring a navigation system — except the user is an AI agent, and the navigation happens in a context window instead of on a screen.
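A rough sketch of the pattern, assuming a toy catalog and a naive keyword match, might look like the following. The catalog entries, the `find_tools` and `respond` functions, and the fallback wording are all hypothetical; a production system would more likely rank tools with embeddings or a search index. The point is the shape: one entry point, a small loaded set, and an explicit answer when nothing fits.

```python
from dataclasses import dataclass

# Words ignored when matching a request against tool descriptions.
STOPWORDS = {"a", "an", "the", "by", "on", "to", "up"}


@dataclass(frozen=True)
class ToolDef:
    name: str
    description: str


# Hypothetical catalog; a real one might hold hundreds of entries.
CATALOG = [
    ToolDef("search_products", "search the product catalog by keyword"),
    ToolDef("get_order", "look up an order by id"),
    ToolDef("refund_order", "refund a paid order"),
    ToolDef("update_address", "change the shipping address on an order"),
    ToolDef("create_coupon", "create a discount coupon"),
]


def keywords(text: str) -> set[str]:
    return {w for w in text.lower().split() if w not in STOPWORDS}


def find_tools(query: str, catalog=CATALOG, limit: int = 3) -> list[ToolDef]:
    """The single entry point: return at most `limit` tools relevant to the query."""
    wanted = keywords(query)
    scored = []
    for tool in catalog:
        overlap = len(wanted & keywords(tool.description))
        if overlap:
            scored.append((overlap, tool))
    # Stable sort: ties keep catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]


def respond(query: str) -> str:
    """Product decision: explain the gap instead of failing silently."""
    tools = find_tools(query)
    if not tools:
        return "I don't have a tool for that. I can pass this to a person, or you can rephrase."
    return "Loaded tools: " + ", ".join(t.name for t in tools)


print(respond("refund order 1234"))
print(respond("book a flight to Paris"))
```

The engineering choice is the ranking function. The product choices are the `limit`, the contents of the catalog, and the sentence returned when the list comes back empty.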
Speakeasy: the token math
- 400 tools = 405,000 tokens
- Progressive loading = 1,600–2,500 tokens
- Reduction: at least 160×
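The figures quoted in this section can be checked with a few lines of arithmetic. The token counts come from the text; the 200,000-token context window is an assumption used here to test the "roughly a fifth" claim, and the helper names are illustrative.

```python
def share_of_window(tokens: int, window: int) -> float:
    """Fraction of a context window consumed by tool definitions."""
    return tokens / window


def reduction_factor(before: int, after: int) -> float:
    """How many times smaller the loaded tool context becomes."""
    return before / after


ASSUMED_WINDOW = 200_000  # assumption: a 200k-token context window

# E-commerce example: 300+ operations, 36,000 tokens of definitions.
ecommerce_share = share_of_window(36_000, ASSUMED_WINDOW)  # 0.18, roughly a fifth
ecommerce_per_tool = 36_000 / 300                          # 120 tokens per tool

# Speakeasy figures: 400 tools, 405,000 tokens, versus 1,600-2,500 tokens.
speakeasy_per_tool = 405_000 / 400                         # 1012.5 tokens per tool
reduction_low = reduction_factor(405_000, 2_500)           # 162.0
reduction_high = reduction_factor(405_000, 1_600)          # 253.125

print(f"{ecommerce_share:.0%} of the window")
print(f"{reduction_low:.0f}x to {reduction_high:.0f}x smaller")
```

The quoted 160× is the conservative end of the range: it corresponds to the 2,500-token case.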
Permissions: who controls what the agent can do?
Traditional access control was built for humans who log in, do work, and log out. Agents don’t follow that pattern. They run continuously, spin up dynamically, and chain actions across multiple systems in a single workflow. A user asks an agent to “pull last quarter’s sales data and draft a board update.” That single request might touch a data warehouse, a CRM, and a document editor — each with its own credential requirements, each with its own scope of what’s allowed.
The security implications are well documented. Some 88% of organisations reported confirmed or suspected security incidents related to AI agents in 2026. When ten agents share one credential set, there is no way to link a specific tool call to a specific user, which is a direct compliance failure under HIPAA, SOC 2, and GDPR.
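The attribution problem can be sketched in a few lines. The `AuditLog` class and its field names below are hypothetical, not part of MCP or any compliance framework; the sketch only shows the minimum requirement that every tool call carries the identity of the user it was made for, rather than a shared agent credential.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ToolCallRecord:
    user_id: str   # the person the agent acted for
    agent_id: str  # which agent instance made the call
    tool: str
    at: str        # UTC timestamp, ISO 8601


class AuditLog:
    """Hypothetical in-memory log; a real one would be durable and append-only."""

    def __init__(self) -> None:
        self._records: list[ToolCallRecord] = []

    def record(self, user_id: str, agent_id: str, tool: str) -> ToolCallRecord:
        # Refuse calls that cannot be tied to a user.
        if not user_id:
            raise ValueError("tool calls must be attributed to a user")
        entry = ToolCallRecord(
            user_id, agent_id, tool, datetime.now(timezone.utc).isoformat()
        )
        self._records.append(entry)
        return entry

    def calls_by_user(self, user_id: str) -> list[ToolCallRecord]:
        return [r for r in self._records if r.user_id == user_id]


log = AuditLog()
log.record("alice", "agent-1", "warehouse.query")
log.record("bob", "agent-1", "crm.read")
log.record("alice", "agent-2", "docs.write")
print([r.tool for r in log.calls_by_user("alice")])
```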
But the product design implication is less discussed and equally critical: how does the user understand what their agent can and cannot do? In a traditional application, permissions are invisible until you hit a wall. For an agent that acts autonomously, invisible permissions create invisible risks. Designing permission transparency for agent-mediated work is an open UX problem that most teams haven’t started solving.
MCP adopted OAuth 2.1 as its authorisation standard. The protocol answers “can this agent do this thing?” It does not answer “does the user understand what this agent is doing on their behalf?” — which is the question that builds or breaks trust.
Context structure: what does the agent know about the user’s world?
An agent that connects to six tools has access to six different data contexts — each with its own schema, its own freshness, its own level of reliability. The agent needs to decide which context matters for a given request, how to reconcile conflicting data across sources, and how to communicate the limits of what it knows.
Block, the parent company of Square and Cash App, runs MCP across 12,000 employees with default servers connecting to Snowflake, GitHub, Jira, Slack, Google Drive, and internal APIs. Employees report 50–75% time savings on common tasks. But that result didn’t come from plugging in servers. It came from structuring which contexts the agent loads by default, what it has to discover on demand, and how it communicates the boundaries of its knowledge to the user.
This is not a data engineering problem. It is a product design problem: what does the agent’s working memory look like, and who decides what goes in it?
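One way to make that decision explicit is to write it down as configuration. The sketch below is illustrative only: the source names, load modes, and freshness limits are invented, and they do not describe Block's setup or any real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextSource:
    name: str
    load: str             # "default" or "on_demand"
    max_age_minutes: int  # how stale the data may be before the agent must say so


# Hypothetical sources and settings, for illustration only.
SOURCES = [
    ContextSource("tickets", "default", 15),
    ContextSource("chat", "default", 5),
    ContextSource("warehouse", "on_demand", 1440),
    ContextSource("documents", "on_demand", 60),
]


def working_context(requested: set[str], sources=SOURCES) -> list[str]:
    """Default sources always load; on-demand ones only when a task asks for them."""
    return [s.name for s in sources if s.load == "default" or s.name in requested]


def is_stale(source: ContextSource, age_minutes: int) -> bool:
    """True when the data is older than the limit set for this source."""
    return age_minutes > source.max_age_minutes


def knowledge_boundary(loaded: list[str], sources=SOURCES) -> str:
    """A sentence the agent can show the user about what it has not looked at."""
    missing = [s.name for s in sources if s.name not in loaded]
    if not missing:
        return "I checked every connected source."
    return "I have not looked at: " + ", ".join(missing) + "."


loaded = working_context({"warehouse"})
print(loaded)
print(knowledge_boundary(loaded))
```

Each field in `SOURCES` is a product decision someone has to own: what loads by default, what waits for a request, and how old is too old.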
Where design is absent, agents fail
The five root causes cited most frequently in agent scaling failures are integration complexity, inconsistent output quality at volume, absence of monitoring tooling, unclear organisational ownership, and insufficient domain training data. Three of those five — integration complexity, monitoring absence, and unclear ownership — are design problems wearing engineering clothes.
Integration complexity is not about whether the API works. It is about whether the tool surface is structured so the agent can navigate it without drowning in context. Monitoring absence is not about whether logs exist. It is about whether someone designed what gets surfaced to the user, how the system explains what it did, and what the user can do when something goes wrong. Unclear ownership means nobody decided who is responsible for the agent’s behaviour — a product accountability decision, not an org-chart problem.
This pattern connects to what we explored in “Where AI Agents Fail: Designing for Control” and “From UX to AX: Why Interaction Design Breaks When Systems Start Acting.” The failure modes described there (agents that automate too aggressively, systems that expose too much process complexity, trust breakdowns when users can’t trace what happened) map directly onto what MCP adoption is revealing at enterprise scale.
Gartner predicts that over 40% of agentic AI projects will be cancelled by 2027 due to escalating costs, unclear business value, or inadequate risk controls. “Inadequate risk controls” is a governance design problem. “Unclear business value” is a product scoping problem. The technology works. The products around it don’t.
The five failure modes
- Integration complexity
- Inconsistent output quality at volume
- Absence of monitoring tooling
- Unclear organisational ownership
- Insufficient domain training data
What good looks like
The teams that ship agents to production share a set of design decisions that have nothing to do with which model they chose or which framework they used.
They design tool surfaces, not just tool integrations. Amazon Prime Video’s progressive discovery approach — starting with a single “find tools” entry point and loading three to four tools per task — is an information architecture pattern, not a performance hack. It treats the agent’s context window as a design surface with the same constraints as a mobile screen: limited real estate, high cost of distraction, zero tolerance for irrelevance.
They make permissions visible, not just enforced. The FactSet approach to enterprise MCP governance uses scope-based permissions that adapt to each interaction. More importantly, they surface what the agent can and cannot do as part of the interaction, not as a wall the user hits after the fact. The agent tells the user what it’s about to access before it accesses it. When it hits a permission boundary, it doesn’t fail silently — it tells the user what happened and what their options are.
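The behaviour described here, announcing access before it happens and explaining a boundary when one is hit, can be sketched as a single function. This is not FactSet's implementation; the `Action` type, the scope strings, and the message wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    tool: str
    scope: str    # hypothetical scope string, e.g. "crm:read"
    summary: str  # plain-language description shown to the user


def preview(action: Action, granted_scopes: set[str]) -> str:
    """Return the message the user sees before the agent acts."""
    if action.scope in granted_scopes:
        return f"I'm about to {action.summary} (uses {action.scope})."
    # Permission boundary: say what happened and what the options are.
    return (
        f"I can't {action.summary}: this needs {action.scope}, "
        "which hasn't been granted. You can grant access or ask someone who has it."
    )


granted = {"crm:read", "docs:write"}
read_crm = Action("crm.read", "crm:read", "read last quarter's deals from the CRM")
query_dw = Action("warehouse.query", "warehouse:read", "query the sales warehouse")

print(preview(read_crm, granted))
print(preview(query_dw, granted))
```

The enforcement check is the `if`. Everything else in the function is the product layer: the wording, the timing, and the options offered to the user.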
They design the feedback loop, not just the output. 96% of organisations have adopted agents, but only 12% have a centralised platform to manage them. The teams that close the operational gap build observability into the user experience, not as a backend dashboard that only engineers see.
At The Gradient, we’ve been building with MCP in our own content operations — connecting AI agents to Sanity CMS for editorial workflows that span drafting, metadata, and publishing. The pattern we’ve seen firsthand mirrors what the enterprise data shows: the technical integration is the easy part. The hard part is designing what the agent should do with the access it has, how it communicates what it did, and what the user’s recourse is when the output isn’t right.
The context layer is the next frontier
For the past decade, the interface meant screens. Layouts, components, navigation patterns, responsive breakpoints. Product designers spent their time deciding where buttons go, how flows connect, and what the user sees at each step.
MCP is one protocol within a broader context layer that also includes retrieval systems, memory architectures, prompt assembly, and intent routing. But MCP’s rapid adoption is making visible a set of problems that exist across that entire layer. When agents mediate the interaction between users and systems, the product decisions shift — from layouts and flows to behaviour, trust, and accountability.
This is the argument we’ve been building across this series since we first made the case that UI in the classical sense is losing its central role. MCP makes that shift tangible. It gives teams the plumbing to connect agents to real systems. It does not give them a framework for deciding how those connections should behave from the user’s perspective.
The product decisions — what to expose, how to structure discovery, how to make permissions legible, how to build trust through transparency — are where the work actually happens. Product teams that treat MCP as backend infrastructure will build agents that work in demos and stall in production. The ones that treat it as a design surface will build the products that earn the 171% ROI the data promises.
The models are ready. The orchestration is ready. The protocols are ready. What’s still missing is the product thinking that turns working infrastructure into products people trust.

A Product Strategist with over 13 years of experience in marketing, product strategy, and branding. His love for analytics, funnels, and a structured approach ensures that the digital products we craft aren't just functional—they impress.

