Skills Over MCP Working Group - June 16th 2026 Meeting Notes #2941
olaservo
started this conversation in
Meeting Notes - Skills Over MCP WG
Repo: modelcontextprotocol/experimental-ext-skills
Discord: #skills-over-mcp-wg
SEP: SEP-2640 (Skills Extension)
Attendees
1. SDK strategy: where experimental helpers live, and getting skill support shipped
The Go SDK currently has no easy extension mechanism for incorporating the new pieces that the SEP introduces, so the fallback is forking. The most practical answer will most likely be to release and coordinate once the SEP is merged.
On where experimental helper code should live, Sam shared the pattern used for other experimental extensions such as the Interceptors WG repo: a polyglot repository with an implementation per language under the experimental namespace, plus client implementations for clients that support extensions. Adding new methods and extension points is more straightforward for TypeScript, Python, and C#; Go is the hard case.
The near-term plan: ideally these land in the official SDKs once stable. The group has signoff to do a point release of the v1 TypeScript SDK with skill support, to get it out ahead of the next spec release (with the understanding it'll need redoing for the v2 SDK).
How an SDK exposes dynamic pluggability for extensions is unspecified from MCP's side (mirroring how OAuth-style extensions just live in the SDKs). The group also discussed a follow-up SEP giving tier-one SDKs normative extensibility requirements (e.g. _meta must be extensible; servers must be able to register new RPC methods). The group agreed on starting with non-normative guidance now that there are a few extensions to generalize from.
2. Reserving the io.modelcontextprotocol namespace in skill frontmatter metadata
One concrete ask for the spec is to explicitly reserve the io.modelcontextprotocol reverse-domain namespace inside the skill frontmatter metadata block (the extension point owned by the Agent Skills spec), so that nobody uses io.modelcontextprotocol/* keys ad hoc before they go through the SEP. Other reverse domains remain fair game; the point is to keep a clean migration path once any of these keys are formalized.
The motivating case is the primitive-grouping / tools experiment: letting a skill declare the tools that are relevant to it so they can be loaded on demand. It's working in early experiments, and Sam Morrow (GitHub) has a separate implementation using an io.mcp/tools-style key for the same idea. That work is adjacent exploration, not part of SEP-2640, which is exactly why the group wants it parked under a reserved-but-unformalized namespace rather than colliding with eventual standard keys.
A clarifying point landed during discussion: skills are surfaced as a capability, but individual frontmatter metadata keys don't bubble up to MCP capabilities, so capabilities can't carry this signal; it has to live in the metadata. Peter confirmed the Agent Skills metadata field is explicitly for client-defined additional properties, so this fits. Sam offered to open a small wording suggestion.
3. Observability and skill-usage reporting
Aditya raised observability as something not strictly blocking but core to how skills get used in practice: can MCP hosts report back to the skill provider whether a skill is actually being used, and surface drift — e.g. a skill that silently stopped triggering overnight, the kind of thing teams already catch with tools. The hard constraint everyone returned to is that there's no way to enforce reporting at the harness level: a server or SDK can emit metrics, but no harness is obligated to, and "what counts as a read" is genuinely ambiguous once harnesses grep across many installed skills. Sriram and Sam both leaned toward treating this as a client-side observability concern (emit appropriate metrics when the harness walks skills) rather than something the skills extension itself mandates — it's the same shape as any cacheable resource read.
Ola noted a related thread she's been prototyping: using interceptors for skill attribution — a standardized passive hook between the skill and its invocation (at the client or gateway layer) that inspects which primitives pass through and what can be parsed around them, rather than just counting invocations. Sriram pushed on the semantics: skill execution is closer to the agent inlining a macro than a discrete invocation, so an interceptor can't reliably tell that the body was actually run; Ola framed her approach as deliberately more passive than that. Sam drew the parallel to citation enforcement, where a system instruction asks the chain to carry an attribution token and the call is blocked if it's missing. The model can still emit a malformed one, and you can go as deep as content-matching depending on how much you care.
Aditya grounded the use case: Glean holds a company's skills and knows its users, so when an admin pipes those skills into another system (e.g. Claude), confirming the skills are being used is both a governance and an observability need. It also matters whether usage is implicit or user-invoked (mostly implicit, as skill selection becomes automatic over time). Glean currently captures this via a plugin with hooks, which feels like a roundabout path to something that should work out of the box. Consensus: this is harness-dependent and broader than skills, so Aditya will move the thread to the observability group; the skills side can still make sure it exposes the right primitives (MCP resource URI, skill URI) so any skill-respecting harness has something to attribute against. Not a blocker for landing SEP-2640.
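As a rough illustration of the client-side approach Sriram and Sam leaned toward, a harness could wrap its own resource reader and count reads per skill URI. This is only a sketch: SkillReadCounter, fake_reader, and the example URIs are hypothetical names, not part of SEP-2640 or any MCP SDK, and what counts as a "read" remains the harness's decision.

```python
from collections import Counter
from typing import Callable


class SkillReadCounter:
    """Wraps a harness's resource reader and counts reads per skill URI.

    Hypothetical helper for illustration; not an SEP-2640 or SDK API.
    """

    def __init__(self, read_resource: Callable[[str], str]) -> None:
        self._read_resource = read_resource
        self.reads = Counter()

    def read(self, uri: str) -> str:
        # Only skill:// URIs are attributed; other resources pass through.
        if uri.startswith("skill://"):
            self.reads[uri] += 1
        return self._read_resource(uri)


def fake_reader(uri: str) -> str:
    # Stand-in for the harness's real MCP resource read.
    return f"contents of {uri}"


counter = SkillReadCounter(fake_reader)
counter.read("skill://pdf-report/SKILL.md")
counter.read("skill://pdf-report/SKILL.md")
counter.read("file:///tmp/notes.txt")
```

A counter like this only shows that the skill body was fetched, not that the agent acted on it, which is the limitation Sriram raised.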
4. Non-sandbox agents and filtering out code-execution skills
Sam Kantor and Ian joined from Slack, where they've been building Slackbot's AI agent. Their first question: Slackbot is not a code-execution agent — no sandbox — and they wanted to know whether other agents share that constraint and how the SEP accommodates it. They can unzip an archive into something and simulate execution, but it's rough; earlier skill approaches that leaned on tool-calling were simpler for non-coding agents.
Sam Kothari shared experience with implementing the SEP for a traditional, non-sandbox agent: you technically only need a resource-read capability, which can be modeled as an internal harness tool; not every server has to expose a read tool. The harness consumes the skill index on startup and applies whatever selection strategy it likes (LLM- or keyword-based), then loads the skill. The key for non-coding agents is filtering to skills that don't require code execution. Today that's an internal convention at Bloomberg, exactly the kind of thing the reserved namespace (§2) or eventually the Agent Skills spec could standardize. Concretely, Bloomberg keys off the Agent Skills directory model: if a skill ships a scripts/ folder (where the spec says scripts live), they treat it as requiring execution and filter it out; they also restrict served file types to an allowlist (plain text and similar pass, others are blocked).
On whether this belongs in the SEP: Peter's view is that compatibility/execution signaling should be solved in the Agent Skills spec itself, since it applies equally to skills downloaded from GitHub or anywhere else; solving it MCP-only would just force it to be solved again elsewhere, differently. Jonathan clarified the current state: the frontmatter compatibility field exists but is free text, meant to be evaluated by an LLM against its environment rather than checked deterministically. This is a deliberate "AI-maximalist" choice, because enumerating every possible qualification isn't tractable.
5. Discovery for very large or unenumerable catalogs
Slack's second question was about scale: how should a client discover skills over MCP when a server has a very large, generated catalog and doesn't want to serve a full index (a partner expects tens of thousands of skills)? Sam outlined the two index-side options already available: run the catalog through an LLM once against compatibility criteria (slow first load, fast and context-cheap after) or filter the index programmatically on metadata keys / keywords. Slack's scenario is the case where there's effectively no enumerable index to filter.
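The programmatic option could look roughly like the sketch below, which filters index entries by keyword and also drops any skill that ships files under scripts/ (the convention described in §4). The IndexEntry shape (name, description, files) is an assumption made for illustration; the actual index schema is defined by the SEP and is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    # Hypothetical shape; the real index schema is defined by SEP-2640.
    name: str
    description: str
    files: list[str] = field(default_factory=list)


def requires_execution(entry: IndexEntry) -> bool:
    # Convention from the discussion: a scripts/ folder implies code execution.
    return any(path.startswith("scripts/") for path in entry.files)


def filter_index(entries: list[IndexEntry], keywords: list[str]) -> list[IndexEntry]:
    """Keep non-executing skills whose name or description mentions a keyword."""
    wanted = [k.lower() for k in keywords]
    kept = []
    for entry in entries:
        if requires_execution(entry):
            continue
        text = f"{entry.name} {entry.description}".lower()
        if any(k in text for k in wanted):
            kept.append(entry)
    return kept


catalog = [
    IndexEntry("release-notes", "Draft release notes from merged PRs", ["SKILL.md"]),
    IndexEntry("pdf-extract", "Extract tables from PDF release bundles",
               ["SKILL.md", "scripts/extract.py"]),
    IndexEntry("oncall-handoff", "Summarize incidents for handoff", ["SKILL.md"]),
]
selected = filter_index(catalog, ["release"])
```

This only works when the server can enumerate its catalog, so it does not address the unenumerable case Slack described.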
The discussion converged on treating this as the general problem of navigating a large file tree, not a skills-specific one — which is why the extension recently added directory-style reads. Several threads:
Sense of the room: don't block on this. Add non-normative recommendations to the SEP (e.g. "a server with a large catalog SHOULD offer a skill-search resource/tool endpoint"; arbitrary semantics, but a known shape harnesses can rely on) rather than mandating structure. Slack confirmed the SEP fits their use case in some form, which was the bar they came for.
The group also discussed spelling out how clients without a filesystem should implement skills — pointing at a standard load-resource tool, even abstractly, or to a worked example. Ola noted that detailed implementation guides and examples should most likely live outside the SEP so the normative requirements don't get too buried.
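For a client without a filesystem, the load-resource idea might reduce to one internal function that resolves a path relative to the skill's URI and hands the result to the harness's resource reader. The sketch below is one possible shape, not SEP text: resolve_skill_path and load_skill_resource are hypothetical names, the skill://<folder>/... URI layout is assumed, and rejecting paths that leave the skill folder is an added safety assumption.

```python
import posixpath
from typing import Callable


def resolve_skill_path(skill_uri: str, relative_path: str) -> str:
    """Resolve relative_path against the skill folder named by skill_uri.

    Assumes URIs of the form skill://<folder>/... and rejects paths that
    would escape the skill folder.
    """
    scheme, sep, rest = skill_uri.partition("://")
    if not sep:
        raise ValueError(f"not a URI: {skill_uri}")
    base = rest.rstrip("/")
    joined = posixpath.normpath(posixpath.join(base, relative_path))
    if joined != base and not joined.startswith(base + "/"):
        raise ValueError(f"path escapes skill folder: {relative_path}")
    return f"{scheme}://{joined}"


def load_skill_resource(
    read_resource: Callable[[str], str], skill_uri: str, relative_path: str
) -> str:
    # read_resource stands in for the harness's MCP resource read; the
    # harness can expose this function to the model as an internal tool.
    return read_resource(resolve_skill_path(skill_uri, relative_path))
```

The resolution is done by hand with posixpath because urllib.parse.urljoin does not resolve relative references for schemes it does not recognize, such as skill://.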
6. Go SDK reference implementation demo (Sriram)
Sriram walked through an early, experimental Go SDK serving three example skills. The server runs as a normal MCP server and works across legacy, stateful, and stateless transports. Discovery flow: list returns the skills; skill://index.json is autogenerated by walking the skill tree (a custom index can be supplied instead), and a digest is generated alongside it to detect changes. Reading the index and reading individual skills is straightforward, and relative paths resolve against the skill folder. Archives aren't in the demo but are wired the same way: a zip is sent out like any other resource. He raised a design question that Slack's earlier point sharpened: if a server offers the same skill as a normal directory, a zip, and a tarball under three prefixes, is that one index or several? This is a case where hierarchical indexes could help. Observability hooks exist but his Grafana stack isn't wired up yet; that's slated for a future demo.
7. Index digests, caching, and the coming changes
The group raised a caching question that dovetailed with the digest work: there's a useful per-skill digest for avoiding rehydration, so should the whole index have an equivalent, letting a server tell a client "the index hasn't changed" (ETag-style) so the client can skip refetching? The group worked through where index freshness would even live: there's no resources/metadata call yet, so today you'd read last-modified off resources/list (works, but inelegant), or pass it in _meta on the response (non-standard). Peter's framing: the real goal is to avoid downloading the index at all when it's unchanged. A server-sent change notification could help and may be worth a notification type in the SEP, though it overlaps with the separate MCP events work, and Slack agreed it's somewhat out of scope and driven by their very-large-index case.
Ola's closing point: a lot of things are about to shift with the next spec, including TTLs / caching semantics on resources, the directory-listing work this WG has been prototyping, and changes to protocol behavior generally, so once the index shape is finalized the group should revisit how those changes flow through. Aditya noted the same invalidation question applies to tools (how does a client know a tool was updated), suggesting a more central mechanism, and Ola pointed to the primitive-grouping group as the place where related caching/discovery conversations (including the cost implications of invalidating context) are happening, for anyone who wants to join.
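The ETag-style idea for the whole index could be sketched as follows: the server hashes a canonical serialization of the index, and the client refetches only when the advertised digest differs from the one it cached. How the digest would reach the client was left open in the discussion, so the sketch simply passes it as an argument. index_digest and IndexCache are hypothetical names, and SHA-256 over sorted-key JSON is an illustrative choice rather than anything the group decided.

```python
import hashlib
import json
from typing import Callable, Optional


def index_digest(index: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal indexes hash equally.
    canonical = json.dumps(index, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IndexCache:
    """Client-side cache that refetches the index only when the digest changes."""

    def __init__(self, fetch_index: Callable[[], dict]) -> None:
        self._fetch_index = fetch_index
        self._digest: Optional[str] = None
        self._index: Optional[dict] = None
        self.fetches = 0  # number of full index downloads, for illustration

    def get(self, advertised_digest: str) -> dict:
        if self._index is None or advertised_digest != self._digest:
            self._index = self._fetch_index()
            self._digest = advertised_digest
            self.fetches += 1
        return self._index


server_index = {"skills": [{"name": "release-notes"}]}
cache = IndexCache(lambda: server_index)
cache.get(index_digest(server_index))
cache.get(index_digest(server_index))  # unchanged digest: no refetch
server_index = {"skills": [{"name": "release-notes"}, {"name": "oncall-handoff"}]}
cache.get(index_digest(server_index))  # digest changed: refetch
```

The client still needs a cheap way to learn the current digest, which is the open question about resources/list, _meta, or a change notification.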
Next steps
Reserve io.modelcontextprotocol within the skill frontmatter metadata block (Sam to open a small wording suggestion); keep the primitive-grouping / tools experiment parked there until formalized. (Edit: Sam added comment here)
Discord channel link for follow-ups and discussion: #skills-over-mcp-wg
Meeting notes prepared by Claude from an auto-generated transcript by Gemini and manually edited.