Skills Over MCP Working Group - June 16th 2026 Meeting Notes #2941

olaservo · 2026-06-18T12:27:33Z

olaservo
Jun 18, 2026
Maintainer

Repo: modelcontextprotocol/experimental-ext-skills

Attendees

Name	Organization
Ola Hungerford	Nordstrom / MCP Maintainer
Peter Alexander	Anthropic / Core Maintainer
Sambhav Kothari	Bloomberg / MCP Maintainer
Jonathan Hefner	MCP Maintainer / Agent Skills Maintainer
Aditya Kumar	Glean
Sriram Panyam	GM
Sam Kantor	Slack
Ian	Slack

1. SDK strategy: where experimental helpers live, and getting skill support shipped

The Go SDK currently has no easy extension mechanism for incorporating the new pieces the SEP introduces, so the fallback is forking. The most practical answer will most likely be to coordinate and release once the SEP is merged.

On where experimental helper code should live, Sam shared the pattern used for other experimental extensions such as the Interceptors WG repo: a polyglot repository with an implementation per language under the experimental namespace, plus client implementations for clients that support extensions. Adding new methods / extension points is more straightforward for TypeScript, Python, and C#; Go is the hard case.

The near-term plan: ideally these land in the official SDKs once stable. The group has signoff to do a point release of the v1 TypeScript SDK with skill support, to get it out ahead of the next spec release (with the understanding it'll need redoing for the v2 SDK).

How an SDK exposes dynamic pluggability for extensions is unspecified from MCP's side (mirroring how OAuth-style extensions just live in the SDKs). The group also discussed a follow-up SEP giving tier-one SDKs normative extensibility requirements (e.g. _meta must be extensible; servers must be able to register new RPC methods). The group agreed on starting with non-normative guidance now that there are a few extensions to generalize from.

2. Reserving the `io.modelcontextprotocol` namespace in skill frontmatter `metadata`

One concrete ask for the spec is to explicitly reserve the io.modelcontextprotocol reverse-domain namespace inside the skill frontmatter metadata block (the extension point owned by the Agent Skills spec), so that nobody uses io.modelcontextprotocol/* keys ad hoc before they go through the SEP. Other reverse domains remain fair game; the point is to keep a clean migration path once any of these keys are formalized.

The motivating case is the primitive-grouping / tools experiment: letting a skill declare the tools that are relevant to it so they can be loaded on demand. It's working in early experiments, and Sam Morrow (GitHub) has a separate implementation using an io.mcp/tools-style key for the same idea. That work is adjacent exploration, not part of SEP-2640, which is exactly why the group wants it parked under a reserved-but-unformalized namespace rather than colliding with eventual standard keys.

A clarifying point landed during discussion: skills are surfaced as a capability, but individual frontmatter metadata keys don't bubble up to MCP capabilities, so capabilities can't carry this signal — it has to live in the metadata. Peter confirmed the Agent Skills metadata field is explicitly for client-defined additional properties, so this fits. Sam offered to open a small wording suggestion.
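As a rough illustration of what the reservation would mean for tooling, the sketch below flags frontmatter metadata keys that sit under the reserved prefix but have not been formalized. The key `io.modelcontextprotocol/tools`, the `com.example/owner` key, and the function names are hypothetical and chosen only for illustration; nothing here is defined by the SEP or the Agent Skills spec.

```python
# Prefix taken from the notes; everything else in this sketch is an assumption.
RESERVED_PREFIX = "io.modelcontextprotocol/"


def reserved_keys(metadata: dict) -> list[str]:
    """Return the metadata keys that fall under the reserved namespace, sorted."""
    return sorted(k for k in metadata if k.startswith(RESERVED_PREFIX))


def check_metadata(metadata: dict, formalized: frozenset = frozenset()) -> list[str]:
    """Return reserved-namespace keys that are not in the formalized set."""
    return [k for k in reserved_keys(metadata) if k not in formalized]


example = {
    "io.modelcontextprotocol/tools": "search,fetch",  # hypothetical key
    "com.example/owner": "platform-team",             # other reverse domains are fine
}
print(check_metadata(example))  # ['io.modelcontextprotocol/tools']
```

A linter built this way would only warn: keys under other reverse domains pass untouched, and a key stops being flagged once it is added to the formalized set.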

3. Observability and skill-usage reporting

Aditya raised observability as something not strictly blocking but core to how skills get used in practice: can MCP hosts report back to the skill provider whether a skill is actually being used, and surface drift — e.g. a skill that silently stopped triggering overnight, the kind of thing teams already catch with tools. The hard constraint everyone returned to is that there's no way to enforce reporting at the harness level: a server or SDK can emit metrics, but no harness is obligated to, and "what counts as a read" is genuinely ambiguous once harnesses grep across many installed skills. Sriram and Sam both leaned toward treating this as a client-side observability concern (emit appropriate metrics when the harness walks skills) rather than something the skills extension itself mandates — it's the same shape as any cacheable resource read.

Ola noted a related thread she's been prototyping: using interceptors for skill attribution — a standardized passive hook between the skill and its invocation (at the client or gateway layer) that inspects which primitives pass through and what can be parsed around them, rather than just counting invocations. Sriram pushed on the semantics: skill execution is closer to the agent inlining a macro than a discrete invocation, so an interceptor can't reliably tell that the body was actually run; Ola framed her approach as deliberately more passive than that. Sam drew the parallel to citation enforcement, where a system instruction asks the chain to carry an attribution token and the call is blocked if it's missing. The model can still emit a malformed one, and you can go as deep as content-matching depending on how much you care.

Aditya grounded the use case: Glean holds a company's skills and knows its users, so when an admin pipes those skills into another system (e.g. Claude), confirming the skills are being used is both a governance and an observability need. It also matters whether usage is implicit or user-invoked (mostly implicit, as skill selection becomes automatic over time). Glean currently captures this via a plugin with hooks, which feels like a roundabout path to something that should work out of the box. Consensus: this is harness-dependent and broader than skills, so Aditya will move the thread to the observability group; the skills side can still make sure it exposes the right primitives (MCP resource URI, skill URI) so any skill-respecting harness has something to attribute against. Not a blocker for landing SEP-2640.

4. Non-sandbox agents and filtering out code-execution skills

Sam Kantor and Ian joined from Slack, where they've been building Slackbot's AI agent. Their first question: Slackbot is not a code-execution agent — no sandbox — and they wanted to know whether other agents share that constraint and how the SEP accommodates it. They can unzip an archive into something and simulate execution, but it's rough; earlier skill approaches that leaned on tool-calling were simpler for non-coding agents.

Sam Kothari shared experience with implementing the SEP for a traditional, non-sandbox agent: you technically only need a resource-read capability, which can be modeled as an internal harness tool — not every server has to expose a read tool. The harness consumes the skill index on startup and applies whatever selection strategy it likes (LLM- or keyword-based), then loads the skill. The key for non-coding agents is filtering to skills that don't require code execution. Today that's an internal convention at Bloomberg — exactly the kind of thing the reserved namespace (§2) or eventually the Agent Skills spec could potentially standardize. Concretely, Bloomberg keys off the Agent Skills directory model: if a skill ships a scripts/ folder (where the spec says scripts live), they treat it as requiring execution and filter it out; they also restrict served file types to an allowlist (plain text and similar pass, others are blocked).
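The convention described above can be sketched in a few lines. This is not Bloomberg's implementation; the allowlist contents, the function names, and the catalog shape (skill name mapped to file paths relative to the skill root) are assumptions made for the example.

```python
from pathlib import PurePosixPath

# Assumed allowlist; the notes only say that plain text and similar types pass.
ALLOWED_SUFFIXES = {".md", ".txt", ".json", ".yaml", ".yml"}


def requires_execution(skill_files: list[str]) -> bool:
    """True if any path (relative to the skill root) sits under scripts/."""
    return any(PurePosixPath(p).parts[:1] == ("scripts",) for p in skill_files)


def servable_files(skill_files: list[str]) -> list[str]:
    """Keep only the files whose suffix is on the allowlist."""
    return [p for p in skill_files
            if PurePosixPath(p).suffix.lower() in ALLOWED_SUFFIXES]


def filter_skills(skills: dict[str, list[str]]) -> dict[str, list[str]]:
    """Drop skills that ship scripts/, then restrict the rest to allowed files."""
    return {name: servable_files(files)
            for name, files in skills.items()
            if not requires_execution(files)}


catalog = {
    "release-notes": ["SKILL.md", "references/style.md", "assets/logo.png"],
    "pdf-extract": ["SKILL.md", "scripts/extract.py"],
}
print(filter_skills(catalog))
# {'release-notes': ['SKILL.md', 'references/style.md']}
```

The check is purely structural, which is what makes it deterministic: it never inspects the free-text compatibility field.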

On whether this belongs in the SEP: Peter's view is that compatibility/execution signaling should be solved in the Agent Skills spec itself, since it applies equally to skills downloaded from GitHub or anywhere else — solving it MCP-only would just force it to be solved again elsewhere, differently. Jonathan clarified the current state: the frontmatter compatibility field exists but is free text, meant to be evaluated by an LLM against its environment rather than checked deterministically. This is a deliberate "AI-maximalist" choice, because enumerating every possible qualification isn't tractable.

5. Discovery for very large or unenumerable catalogs

Slack's second question was about scale: how should a client discover skills over MCP when a server has a very large, generated catalog and doesn't want to serve a full index (a partner expects tens of thousands of skills)? Sam outlined the two index-side options already available: run the catalog through an LLM once against compatibility criteria (slow first load, fast and context-cheap after) or filter the index programmatically on metadata keys / keywords. Slack's scenario is the case where there's effectively no enumerable index to filter.
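For the second, programmatic option, a minimal sketch might look like the following. The index entry shape (`name`, `description`, `metadata`) and the `com.example/needs-exec` key are assumptions for illustration, not a shape defined by the SEP.

```python
def matches(entry: dict, keywords: set, required_metadata: dict) -> bool:
    """True if the entry mentions any keyword and carries every required metadata value."""
    text = f"{entry.get('name', '')} {entry.get('description', '')}".lower()
    meta = entry.get("metadata", {})
    has_keyword = not keywords or any(k.lower() in text for k in keywords)
    has_meta = all(meta.get(k) == v for k, v in required_metadata.items())
    return has_keyword and has_meta


def filter_index(index: list, keywords=frozenset(), required_metadata=None) -> list:
    """Return the index entries that pass both the keyword and metadata checks."""
    required_metadata = required_metadata or {}
    return [e for e in index if matches(e, set(keywords), required_metadata)]


index = [
    {"name": "incident-review", "description": "Write a postmortem for an outage",
     "metadata": {"com.example/needs-exec": "false"}},   # hypothetical key
    {"name": "pdf-extract", "description": "Extract tables from PDF files",
     "metadata": {"com.example/needs-exec": "true"}},
    {"name": "release-notes", "description": "Draft release notes from merged PRs",
     "metadata": {}},
]

hits = filter_index(index, keywords={"pdf", "postmortem"},
                    required_metadata={"com.example/needs-exec": "false"})
print([e["name"] for e in hits])  # ['incident-review']
```

This only works when the client can obtain the index in the first place, which is exactly the assumption that breaks in the unenumerable case.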

The discussion converged on treating this as the general problem of navigating a large file tree, not a skills-specific one — which is why the extension recently added directory-style reads. Several threads:

Sriram suggested hierarchical / paginated indexes: a top-level index that yields more as the client drills in, with out-of-hierarchy references returning invalid params. Resource listing already paginates, so skill listing reasonably could too.
Peter floated a single index entry pointing at a directory with a "go explore" instruction, leaning on the new resource directory-read verb so the model can traverse — while noting the hard part is giving the model a reason to explore one giant undifferentiated tree.
Sam (Slack) described offloading selection from the LLM into something closer to a recommender — give it context, get back matching skills — without the SEP necessarily owning that, but with enough flexibility that it's possible. Sriram cautioned that pushing a filter parameter onto the server risks requiring the server to run its own LLM, and that arbitrary server-side filtering semantics won't be standardized across implementations.
Ola noted that another entry point could be a discovery tool that tells the model what's available to progressively disclose (like a routing table or manifest), with skills declaring their own file dependencies following existing conventions. Servers that know they're sitting on a flat, non-navigable blob could also choose to ship their own helper, as she did with resource templates and completions in an experimental version of the GitHub MCP server, and can prune or scope skills dynamically (e.g. exclude skills irrelevant to a known user).
Jonathan offered a sizing data point against premature panic: at ~500-char descriptions, short names, and ~50 bytes overhead, 20,000 skills is roughly an 11 MB index — not small, but tractable. The sharper problem is selection across multiple servers each vendoring large catalogs, which Aditya and Jonathan agreed is a harness-level problem (Glean solves it at the harness layer, aggregated across systems); pagination and mandated server-side search aren't obviously the right tools.
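Jonathan's sizing estimate can be reproduced with simple arithmetic. The sketch assumes one byte per description character and, to match the quoted figure, counts the short names as negligible; the 20-byte name in the second call is an arbitrary assumption to show the sensitivity.

```python
def index_size_bytes(skills: int, description: int, overhead: int, name: int = 0) -> int:
    """Estimated index size: per-entry bytes multiplied by the number of skills."""
    return skills * (description + overhead + name)


# Figures from the discussion: 20,000 skills, ~500-char descriptions, ~50 bytes overhead.
size = index_size_bytes(skills=20_000, description=500, overhead=50)
print(size / 1_000_000)  # 11.0 (MB)

# Assuming 20-byte names as well moves the estimate only slightly.
print(index_size_bytes(20_000, 500, 50, name=20) / 1_000_000)  # 11.4 (MB)
```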

Sense of the room: don't block on this. Add non-normative recommendations to the SEP (e.g. "a server with a large catalog SHOULD offer a skill-search resource/tool endpoint"; arbitrary semantics, but a known shape harnesses can rely on) rather than mandating structure. Slack confirmed the SEP fits their use case in some form, which was the bar they came for.

The group also discussed spelling out how clients without a filesystem should implement skills, for example by pointing to a standard load-resource tool (even abstractly) or to a worked example. Ola noted that detailed implementation guides and examples should most likely live outside the SEP so the normative requirements don't get buried.

6. Go SDK reference implementation demo (Sriram)

Sriram walked through an early, experimental Go SDK serving three example skills. The server runs as a normal MCP server and works across legacy, stateful, and stateless transports. Discovery flow: list returns the skills; skill://index.json is autogenerated by walking the skill tree (a custom index can be supplied instead), and a digest is generated alongside it to detect changes. Reading the index and reading individual skills is straightforward, and relative paths resolve against the skill folder. Archives aren't in the demo but are wired the same way — a zip is sent out like any other resource. He raised a design question that Slack's earlier point sharpened: if a server offers the same skill as a normal directory, a zip, and a tarball under three prefixes, is that one index or several? — a case where hierarchical indexes could help. Observability hooks exist but his Grafana stack isn't wired up yet; that's slated for a future demo.
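The discovery flow from the demo (walk the skill tree, generate an index, generate a digest alongside it) can be approximated as below. This is a Python sketch of the idea, not the Go SDK's code: the index entry fields, the per-skill URI layout, and the choice of SHA-256 over a canonical JSON encoding are all assumptions, and the per-skill digest here covers only `SKILL.md` as a simplification.

```python
import hashlib
import json
from pathlib import Path


def build_index(root: Path) -> dict:
    """Walk root and add one entry per directory that contains a SKILL.md."""
    skills = []
    for skill_md in sorted(root.rglob("SKILL.md")):
        rel_dir = skill_md.parent.relative_to(root).as_posix()
        skills.append({
            "name": skill_md.parent.name,
            "uri": f"skill://{rel_dir}/SKILL.md",  # hypothetical URI layout
            "digest": "sha256:" + hashlib.sha256(skill_md.read_bytes()).hexdigest(),
        })
    return {"skills": skills}


def index_digest(index: dict) -> str:
    """Hash a canonical JSON encoding so equal indexes give equal digests."""
    canonical = json.dumps(index, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(canonical).hexdigest()
```

Because each entry embeds a content digest, any change to a skill's `SKILL.md` changes the index digest too, which is the property a client needs to detect that the index moved.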

7. Index digests, caching, and the coming changes

The group raised a caching question that dovetailed with the digest work: there's a useful per-skill digest for avoiding rehydration, so should the whole index have an equivalent that lets a server tell a client "the index hasn't changed" (ETag-style) and lets the client skip refetching? The group worked through where index freshness would even live: there's no resources/metadata call yet, so today you'd read last-modified off resources/list (works, but inelegant), or pass it in _meta on the response (non-standard). Peter's framing: the real goal is to avoid downloading the index at all when it's unchanged. A server-sent change notification could help and may be worth a notification type in the SEP, though it overlaps with the separate MCP events work. Slack agreed it's somewhat out of scope and driven mainly by their very-large-index case.
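The ETag-style exchange under discussion can be modeled in memory as follows. This is a toy sketch, not an MCP API: the `IndexServer` and `IndexClient` classes, the `if_none_match` parameter, and the reply fields are hypothetical, and the notes leave open where such a digest would actually travel on the wire.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexServer:
    index: dict
    digest: str
    full_reads: int = 0  # counts how often the full index was sent

    def read_index(self, if_none_match: Optional[str] = None) -> dict:
        """Send the index unless the caller already holds the current digest."""
        if if_none_match == self.digest:
            return {"not_modified": True, "digest": self.digest}
        self.full_reads += 1
        return {"not_modified": False, "digest": self.digest, "index": self.index}


@dataclass
class IndexClient:
    cached_index: Optional[dict] = None
    cached_digest: Optional[str] = None

    def refresh(self, server: IndexServer) -> Optional[dict]:
        """Revalidate with the cached digest and update the cache only on change."""
        reply = server.read_index(if_none_match=self.cached_digest)
        if not reply["not_modified"]:
            self.cached_index = reply["index"]
            self.cached_digest = reply["digest"]
        return self.cached_index


server = IndexServer(index={"skills": ["a"]}, digest="v1")
client = IndexClient()
client.refresh(server)
client.refresh(server)
print(server.full_reads)  # 1
```

Note that this still costs a round trip per check, which is why a server-sent change notification came up as a complement.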

Ola's closing point: a lot is about to shift with the next spec (TTLs / caching semantics on resources, the directory-listing work this WG has been prototyping, and changes to protocol behavior generally), so once the index shape is finalized the group should revisit how those changes flow through. Aditya noted the same invalidation question applies to tools (how does a client know a tool was updated), suggesting a more central mechanism. Ola pointed to the primitive-grouping group as the place where related caching/discovery conversations (including the cost implications of invalidating context) are happening, for anyone who wants to join.

Next steps

Land SEP-2640. No blockers surfaced.
Namespace reservation. Reserve io.modelcontextprotocol within the skill frontmatter metadata block (Sam to open a small wording suggestion); keep the primitive-grouping / tools experiment parked there until formalized. (Edit: Sam added comment here)
TypeScript SDK. Ship skill support in a v1 point release; plan to redo for the v2 SDK. Ola to close the WG-repo PR in favor of the official integration.
Discovery at scale. Add non-normative recommendations to the SEP for large/unenumerable catalogs (e.g. an optional skill-search endpoint) rather than mandating structure; keep modeling discovery on directory/filesystem semantics.
Observability. Aditya to carry skill-usage reporting to the observability group; skills side to ensure the right attribution primitives are exposed. Sriram to wire up observability in a future Go SDK demo.
SDK extensibility guidance. Consider a follow-up (non-normative to start) on what tier-one SDKs should provide for extensions.

Discord channel link for follow-ups and discussion: #skills-over-mcp-wg

Meeting notes prepared by Claude from an auto-generated transcript by Gemini and manually edited.
