⚠️ Disclaimer: This post reflects personal experiences and opinions. It is not affiliated with, sponsored by, or endorsed by my employer. The technical analysis and architecture patterns described here are shared for educational purposes. Always evaluate any architecture against your own requirements before adopting it.

With Agent Skills, Do We Still Need MCP?

📅 2026-03-20 · 📖 ~12 min read · AI Agents · MCP · Agent Skills · Architecture
MCP and Agent Skills solve different problems. MCP standardizes tool access (the hands) but eagerly consumes context and lacks orchestration logic. Skills encode domain expertise (the brain) but struggle with environment dependencies and portability. The winning architecture uses both: skills declare what capabilities they need, MCP servers provide those capabilities with full environment isolation, and agent runtimes bind them together. Neither replaces the other — they’re different layers of the same stack.

You just connected 15 MCP servers to your AI agent. GitHub, Slack, databases, file systems, APIs — the works. Context window: 40% consumed before a single user message. The agent sees 120 tool definitions and picks the wrong one half the time. Sound familiar?

Meanwhile, a colleague installs an Agent Skill. One line loads into memory. The agent handles the same task flawlessly — but the skill’s Python script crashes on their machine because pandas isn’t installed.

Two extensibility patterns. Two very real failure modes. And a growing question in the AI agent community: now that we have Agent Skills, do we still need MCP? Or does one make the other redundant?

The answer is more nuanced than a simple yes or no. This post examines both patterns, their architectural trade-offs, and proposes a hybrid architecture that plays to both strengths.

What MCP Actually Does

Model Context Protocol is an open standard — originally introduced by Anthropic — that standardizes how AI agents discover and invoke external tools. Think of it as the USB-C of the agent world: one standard interface that replaces a mess of proprietary integrations.

Before MCP, if you had N agents and M tool providers, you needed N×M custom integrations. MCP reduces this to N+M: each tool provider implements one MCP server, each agent implements one MCP client, and they just work together. This is the exact same value proposition that LSP (Language Server Protocol) brought to code editors — and it worked spectacularly there.
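The integration arithmetic is worth making concrete. A two-line sketch, with illustrative counts:

```python
# Integration count with and without a shared protocol (numbers are illustrative).
agents, providers = 5, 20

custom_integrations = agents * providers  # every agent wired to every provider
mcp_implementations = agents + providers  # one MCP client each, one MCP server each

print(custom_integrations, mcp_implementations)  # 100 vs. 25
```

At 5 agents and 20 providers that is already a 4x reduction, and the gap widens as either side of the ecosystem grows.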

An MCP server exposes tools (functions the agent can call), resources (data the agent can read), and prompts (reusable templates). The protocol handles discovery, invocation, and response formatting through a standardized JSON-RPC interface.

Example: A weather-data MCP server might expose tools like get_forecast(location), get_historical(location, date_range), and get_alerts(region). Any MCP-compatible agent can discover and call these tools without knowing anything about the implementation — whether it uses Open-Meteo, WeatherAPI, or a proprietary data feed.
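Under the hood, each tool is advertised as a name, a description, and a JSON Schema for its parameters. A sketch of what the weather server's `tools/list` response might contain (tool names follow the example above; the payload shape follows the MCP specification, but this exact server is hypothetical):

```python
# Roughly what a "tools/list" response from the weather MCP server contains:
# each tool carries a name, a description, and an "inputSchema" (JSON Schema).
tools_list_response = {
    "tools": [
        {
            "name": "get_forecast",
            "description": "Weather forecast for a location.",
            "inputSchema": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
        {
            "name": "get_alerts",
            "description": "Active weather alerts for a region.",
            "inputSchema": {
                "type": "object",
                "properties": {"region": {"type": "string"}},
                "required": ["region"],
            },
        },
    ]
}

print([t["name"] for t in tools_list_response["tools"]])
```

The agent only ever sees this schema layer, which is exactly why the backing data source can be swapped without the agent noticing.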

What MCP does well

  • Interoperability: Write a tool once, use it from Claude Code, Cursor, OpenClaw, or any MCP-compatible agent
  • Encapsulation: The server manages its own dependencies, authentication, and runtime environment
  • Discovery: Agents can dynamically discover available tools and their schemas
  • Ecosystem growth: The standard enables a marketplace of tools — anyone can publish an MCP server

Where MCP struggles

Context window consumption. This is MCP’s most practical limitation. When an agent connects to MCP servers, it loads all available tool schemas into its system prompt — tool names, descriptions, parameter definitions, everything. Connect 10 MCP servers, each exposing 5–10 tools, and you’ve consumed several thousand tokens before the conversation even starts. Every tool call’s request and response then accumulate in the context window on top of that.
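A back-of-envelope calculation makes the cost visible. All numbers here are illustrative, not measured:

```python
# Rough cost of eager tool loading (all numbers are illustrative).
servers = 10
tools_per_server = 8
tokens_per_tool_schema = 120  # name + description + parameter schema

startup_tokens = servers * tools_per_server * tokens_per_tool_schema
print(startup_tokens)  # 9600 tokens consumed before the first user message
```

On a model with a modest context budget, that is a meaningful slice of the window gone before the agent has done any work.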

No orchestration logic. MCP tools are independent and stateless. A GitHub MCP server doesn’t know a Slack MCP server exists. If you want a workflow like “fetch PR details from GitHub, run analysis, post results to Slack,” the orchestration has to come from somewhere else — either the agent’s own reasoning or a higher-level abstraction.

Eager loading. MCP’s design is inherently eager: connect a server, load all its tools. There’s no standardized mechanism for lazy loading or dynamic tool registration based on the current task. The community is exploring solutions (tool routing layers, dynamic registration/deregistration), but nothing is standardized yet.

Tool proliferation. As the ecosystem grows, agents face a paradox of choice. With 200 available tools, the agent has to read through all their descriptions to decide which one to use — burning context and increasing the chance of picking the wrong tool.

What Agent Skills Actually Do

Agent Skills (also written AgentSkills; an open standard introduced by Anthropic) are a fundamentally different kind of extension. While MCP extends what an agent can do, skills extend what an agent knows how to do.

A skill is essentially a domain expert’s decision tree, encoded in a format an agent can internalize. It consists of:

  • SKILL.md — The core knowledge file: when to trigger, what steps to follow, what quality standards to apply, what pitfalls to avoid
  • scripts/ — Lightweight executable scripts (bash, Python) the skill can invoke
  • references/ — Deep reference material loaded on demand
  • assets/ — Supporting files (templates, configs, sample data)

The critical mechanism is the description field: a one-line summary that tells the agent when this skill is relevant. The agent framework matches incoming tasks against skill descriptions and loads the full SKILL.md only when there’s a match.

Example: An aws-blog-review skill doesn’t just “check articles” — it contains the complete AWS editorial standards: prohibited language lists, structural requirements, SEO rules, compliance checklists. This domain knowledge can’t be expressed as a single MCP tool call.

What skills do well

  • Lazy loading: Only the one-line description lives in memory. The full SKILL.md loads on demand, and unloads after use. This is fundamentally more context-efficient than MCP’s eager approach.
  • Domain knowledge encoding: Skills capture how to think about a problem, not just how to execute a function. They encode expert judgment, quality standards, edge cases, and decision logic.
  • Multi-tool orchestration: A single skill can coordinate multiple tools in a coherent workflow, with conditional logic, error handling, and fallback strategies.
  • Progressive disclosure: Skills can structure knowledge in layers — load the overview first, then drill into references only when needed, minimizing context consumption.

Where skills struggle

Environment dependencies. This is the Achilles’ heel. A skill’s scripts/ directory might contain a Python script that requires boto3, pandas, and matplotlib. But the host machine might not have these installed — or might have the wrong version. Current solutions are ad hoc:

  • uv run --with boto3,pandas python3 scripts/analysis.py — creates a temporary virtual environment (fast, clean, but only works if uv is installed)
  • A setup script in the skill — fragile and not standardized
  • Assuming the environment is pre-configured — works on your machine, breaks on everyone else’s

There is no standardized dependency declaration or runtime resolution mechanism for skills today. This severely limits portability: a skill that works perfectly on one machine may fail silently on another.
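One partial mitigation already exists on the Python side: inline script metadata (PEP 723), which `uv run` understands. A script declares its own dependencies in a comment header, and uv builds a throwaway environment before executing it. The sketch below shows the header format, plus a deliberately naive parser to illustrate what a runtime would read out of it (the parser is for illustration only; uv implements the full spec):

```python
import re

# A skill script carrying a PEP 723 inline-metadata header. Running it via
# `uv run scripts/analysis.py` would give it boto3 and pandas automatically.
SCRIPT = """\
# /// script
# dependencies = ["boto3", "pandas"]
# ///
import boto3, pandas
"""

def inline_deps(source: str) -> list[str]:
    """Extract declared dependencies (naive parse, for illustration only)."""
    m = re.search(r'dependencies\s*=\s*\[([^\]]*)\]', source)
    return re.findall(r'"([^"]+)"', m.group(1)) if m else []

print(inline_deps(SCRIPT))  # ['boto3', 'pandas']
```

This still assumes uv is present on the host, so it narrows the portability gap rather than closing it.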

No standard runtime contract. Unlike MCP servers (which are self-contained processes with their own environments), skills run in the agent’s environment. They inherit whatever Python version, system packages, and permissions the host has. This makes them lighter but more fragile.

The Real Difference: Who Owns Control?

Beyond the technical details, there’s a philosophical difference that matters for architecture decisions.

MCP is tool-centric. The tool provider defines what you can do, what parameters to pass, and what format comes back. The agent is the caller, but the rules are set by the tool. It’s like ordering from a menu — the kitchen decides what’s available.

Skills are agent-centric. The agent decides when to trigger, what workflow to follow, and what quality standards to apply. Tools are just execution means. It’s like cooking at home — the recipe is yours, and you can swap utensils freely.

This control difference determines composability. Two MCP servers are unaware of each other. But a skill can orchestrate multiple tools in a single workflow: “get data from source A, cross-reference with source B, apply business rules, format output, deliver to destination C.” This cross-tool orchestration capability exists only at the skill layer or in the agent’s own reasoning.

The Hybrid Architecture: Skills as the Brain, MCP as the Hands

So, can we combine them? Not only can we — we probably should. The two patterns solve complementary problems, and their weaknesses are each other’s strengths.

The core idea: MCP encapsulates environment dependencies. Skills encapsulate domain logic.

Consider a concrete example: multi-cloud infrastructure monitoring. The current approach crams everything into a single Python script inside a skill — data collection (boto3, azure-sdk), metric calculation (pandas/numpy), visualization (matplotlib), and alerting logic (when to page on-call, what thresholds matter). Environment dependencies, data access, and domain knowledge are all tangled together.

With a hybrid architecture:

MCP layer (solves the environment problem):

  • A cloud-metrics MCP server encapsulates the cloud SDKs, exposing get_cpu_utilization, get_cost_breakdown, and get_alert_history
  • It manages its own Python environment, dependency versions, and credential management
  • It can run in a container, on a remote server, or as a Lambda function — the skill never needs to know

Skill layer (solves the cognition problem):

  • SKILL.md defines: when to trigger monitoring, the escalation criteria for different severity levels, how to correlate cross-service anomalies
  • It calls MCP tools to get data, then applies domain knowledge to make judgments
  • It doesn’t need to know how boto3 is installed or what Python version is running

What this solves

1. Portability. Skills are no longer tied to a specific environment. The same infra-monitor skill works on a Mac calling a local MCP server or on an EC2 instance calling a remote one — identical logic, different execution backends.
2. Context window efficiency. Here’s the key insight: not all MCP tools need to be loaded at all times. When a skill is triggered, it tells the agent which MCP tools are relevant for this specific task. The skill acts as a tool router for MCP — tools load on demand, guided by the skill’s knowledge of what’s needed. Without the skill, the agent has to load everything and figure it out itself.
3. Reuse. A single cloud-metrics MCP server can be used by multiple skills: a cost optimization skill, an incident response skill, and a capacity planning skill. Each skill brings different domain logic, but the underlying data source is shared.
4. Testability. MCP servers can be tested independently (deterministic input/output). Skill logic can be reviewed independently (are the decision criteria correct?). When they’re tangled in one script, changing data access logic can accidentally break business logic.
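The tool-routing idea in point 2 can be sketched as a simple filter over the MCP tool catalog. Every name below is hypothetical, and no current framework standardizes this behavior:

```python
# Sketch: a triggered skill narrows the MCP tool catalog before the system
# prompt is built. All tool and server names here are hypothetical.
ALL_MCP_TOOLS = {
    "get_cpu_utilization": "cloud-metrics-mcp",
    "get_cost_breakdown": "cloud-metrics-mcp",
    "generate_chart": "matplotlib-mcp",
    "send_message": "slack-mcp",
    "create_issue": "github-mcp",  # ...plus dozens more in a real setup
}

# The skill declares which tools matter for this specific task.
SKILL_RELEVANT_TOOLS = ["get_cpu_utilization", "generate_chart", "send_message"]

def tools_for_prompt(relevant: list[str]) -> dict[str, str]:
    """Load only the tool schemas the active skill says it needs."""
    return {name: ALL_MCP_TOOLS[name] for name in relevant if name in ALL_MCP_TOOLS}

active = tools_for_prompt(SKILL_RELEVANT_TOOLS)
print(sorted(active))  # 3 of 5 tools loaded; the rest stay out of context
```

The skill's domain knowledge, not the agent's guesswork, decides what occupies the context window.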

The architecture in layers

🚀 Hybrid Architecture Stack
┌─────────────────────────────────────────────┐
│                  Agent Layer                 │
│    Matches tasks to skills, manages context  │
├─────────────────────────────────────────────┤
│                  Skill Layer                 │
│  Domain knowledge, orchestration, decisions  │
│  (SKILL.md + lightweight scripts)            │
├─────────────────────────────────────────────┤
│                  MCP Layer                   │
│  Tool execution, environment isolation,      │
│  standardized interfaces                     │
└─────────────────────────────────────────────┘
  • Skill layer: Pure text knowledge + lightweight scripts (bash/curl-level), zero external dependencies
  • MCP layer: Encapsulated capabilities with full runtime environments, standardized interfaces
  • Agent layer: Uses skill guidance to invoke MCP tools, manages the conversation

Making It Work: Capability Declaration

For this hybrid to work at scale, we need a mechanism for skills to declare what capabilities they need without binding to specific MCP servers. Think of it like Kubernetes resource requests: declare what you need, let the runtime figure out who provides it.

A skill could declare:

```yaml
requires_capabilities:
  - cloud_metrics
  - chart_generation
  - notification_delivery
```

The agent runtime resolves these against available MCP servers:

  • cloud_metrics → aws-cloudwatch-mcp (local) or datadog-mcp (remote)
  • chart_generation → matplotlib-mcp (container) or plotly-mcp (local)
  • notification_delivery → slack-mcp or pagerduty-mcp

If an MCP server isn’t available, the runtime falls back to the skill’s built-in scripts (if any). Same behavior, different execution path. This gives skills maximum portability: they work in MCP-rich environments with full isolation, and degrade gracefully in simpler setups.

Fallback in practice

Here’s what the resolution flow looks like concretely. Say an infra-monitor skill declares requires_capabilities: [chart_generation] and the agent triggers it:

1. Runtime checks MCP registry: Is there an MCP server that provides chart_generation? → Yes, matplotlib-mcp is running locally. Use it. The skill calls generate_chart(service, period, metrics) through the MCP interface. The server handles matplotlib, numpy, font rendering — all inside its own environment.
2. MCP server unavailable (not installed, crashed, remote endpoint down): Runtime checks if the skill has a built-in fallback. The skill’s scripts/ contains generate_chart.py with a # requires: matplotlib, pandas header. Runtime uses uv run --with matplotlib,pandas python3 scripts/generate_chart.py to execute it in a temporary virtual environment. Same output, slightly slower, no pre-configured server needed.
3. No MCP, no script, no uv: Runtime reports to the agent: “capability chart_generation unavailable.” The skill’s SKILL.md has a fallback instruction: “If chart generation is unavailable, output the data as a formatted table instead.” The agent degrades gracefully — the user still gets the analysis, just without the visual.

This three-tier fallback (MCP → skill script → graceful degradation) means a skill author can target the best-case environment without breaking in the worst case. The skill’s logic is the same at every tier; only the execution path changes.
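The three tiers above reduce to a short resolution function. Everything here is a sketch: the registry, the server names, and the script convention are hypothetical, and a real runtime would actually invoke the MCP server or shell out to uv at each tier:

```python
# Sketch of the three-tier resolution flow (MCP -> skill script -> degrade).
# Registry contents, server names, and the scripts/ convention are hypothetical.
import shutil
from pathlib import Path

MCP_REGISTRY = {"chart_generation": "matplotlib-mcp"}  # capability -> server

def resolve(capability: str, skill_dir: Path) -> str:
    # Tier 1: a running MCP server provides the capability.
    if capability in MCP_REGISTRY:
        return f"mcp:{MCP_REGISTRY[capability]}"
    # Tier 2: the skill bundles a fallback script and uv is available to run it.
    script = skill_dir / "scripts" / f"{capability}.py"
    if script.exists() and shutil.which("uv"):
        return f"script:{script}"
    # Tier 3: graceful degradation; SKILL.md's fallback instruction takes over.
    return "degrade"

print(resolve("chart_generation", Path("skills/infra-monitor")))  # mcp:matplotlib-mcp
```

Note that the skill never changes across tiers; only the string this resolver returns, and therefore the execution path, does.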

This is fundamentally different from hardcoding requires_mcp: ["cloudwatch-server"] in a skill, which binds to a specific implementation. Capability-based declaration is provider-agnostic.

Where Not to Split: The Monolith Has Its Place

Not every script should be decomposed into MCP tools. If a Python script is a tightly-coupled pipeline — “fetch data → calculate → output conclusions” — with intermediate steps that depend on each other, splitting it into multiple MCP tool calls introduces unnecessary overhead:

  • Latency: Each call round-trips back to the agent for the next decision
  • Token cost: Each intermediate result consumes context window
  • Complexity: The agent has to reason about a multi-step data pipeline that the script handles in milliseconds

The right heuristic: extract reusable data capabilities into MCP; keep one-shot orchestration pipelines in skill scripts. It’s the microservices vs. monolith decision — more granular isn’t always better. Cut at the right boundaries.

| Scenario | MCP? | Skill Script? | Why |
| --- | --- | --- | --- |
| Cloud metrics API used by 3+ skills | ✅ | | Reusable, environment-heavy, stable interface |
| 50-line fetch → calculate → format report | | ✅ | Tightly coupled, single-purpose, splitting adds latency |
| Chart generation with matplotlib/plotly | ✅ | | Heavy dependencies, reusable across skills |
| Conditional workflow: “if error rate > 5% AND latency spike, escalate” | | ✅ | Domain logic, not a reusable capability |
| Database connector with auth management | ✅ | | Credentials + connection pooling belong in an isolated process |
| One-off data transformation for specific output | | ✅ | Too specific to be worth the MCP overhead |

A real-world example of when not to split: consider an infrastructure health check script that queries 15 services, calculates availability percentages, cross-references against SLA thresholds, correlates with recent deployment events, and outputs a single go/no-go deployment readiness verdict. The entire pipeline takes 3 seconds as a monolithic script. Splitting it into MCP calls would mean 15+ round-trips back to the agent, each burning context tokens for intermediate results the user never sees. Keep it as a skill script. Extract the cloud metrics API as MCP for reuse elsewhere, but let the analysis pipeline stay monolithic.

The Current Gap: What’s Missing

For this hybrid architecture to mature, several pieces are still missing from the ecosystem:

1. Standardized skill dependency declaration. No current skill standard includes a formal mechanism to declare required capabilities or MCP servers. Skills either bundle their dependencies (fragile) or assume they exist (brittle).
2. Dynamic MCP tool loading. The MCP protocol doesn’t yet have a standard way for agents to dynamically load and unload tool sets based on the current task. This would allow skills to control which MCP tools are active, solving the context bloat problem.
3. Capability registry. There’s no standard way to map abstract capabilities (“I need cloud metrics”) to concrete MCP implementations (“use this server at this endpoint”). Each agent framework does this differently, if at all.
4. Fallback negotiation. When a required MCP server is unavailable, there’s no standard mechanism for the runtime to negotiate fallbacks — use a different MCP server, fall back to a skill script, or gracefully degrade.

Conclusion: Not Either/Or — It’s a Stack

So, do we still need MCP now that we have Agent Skills?

Yes — but for different reasons than you might think.

MCP isn’t just about connecting to tools. It’s about environment isolation and standardized execution. Skills aren’t just about knowing things. They’re about encoding expert judgment and orchestrating complex workflows.

The real question isn’t “MCP or skills?” — it’s “where do you draw the line between them?” Our answer:

  • MCP owns the execution boundary: environment management, dependency isolation, standardized tool interfaces, cross-agent interoperability
  • Skills own the knowledge boundary: when to act, what steps to follow, how to evaluate quality, how to orchestrate multiple tools into coherent workflows
  • The agent owns the decision boundary: matching tasks to skills, resolving capabilities to MCP servers, managing context

Neither replaces the other. MCP without skills is a toolbox without a craftsman’s knowledge — powerful tools sitting idle because no one knows when to use them. Skills without MCP are expertise without reliable hands — knowing exactly what to do but struggling with environment dependencies and tool fragmentation.

The future of agent extensibility isn’t choosing between MCP and skills. It’s building the stack that lets them work together — where skills declare what capabilities they need, MCP servers provide those capabilities with full environment isolation, and agent runtimes handle the binding between them.

We’re not there yet. But the direction is clear.


About the author: Melanie Li is an AWS Solutions Architect specializing in machine learning. She writes about AI agents, cloud architecture, and the tools that shape how we build intelligent systems. Find her on LinkedIn and at melanieli.com.au.

