Runtime tool drift detection — gap beyond admission-time security? #2826

MaazAhmed47 · 2026-05-31T08:36:02Z

Working on runtime MCP security and wanted to surface a gap for discussion.

Admission-time checks (trust roots, signed assertions, server attestation) verify a server when you connect. But they don't catch tools that change behavior AFTER admission: a read-only tool that later adds export effects or PII data classes, or escalates its externality from internal to external. The server identity is unchanged, so the admission check still passes.

Is post-admission tool drift considered in-scope for the spec / Security IG, or is it expected to live in a runtime monitoring layer outside the protocol?

I've been building an open-source implementation focused on this (continuous baseline + drift detection with severity-based quarantine) and would be happy to share findings or contribute if it's relevant to the group's direction.

MaazAhmed47 · 2026-05-31T10:27:18Z

This is the sharpest framing of the problem I've seen, thank you. The split you're describing maps cleanly:

- Admission anchors a capability manifest (identity + declared scope at connection time).
- Each call carries a receipt that binds the executed tool to that admitted manifest.
- The drift detector lives between those two points: implementation-specific, without blocking interop.

I agree post-admission drift belongs in the runtime monitoring layer, not the spec core, but your point about the spec defining the hook is the key insight. Without a standardized capability manifest at admission and a verification step at call time, every runtime monitor reinvents the binding and they don't interoperate.

To your open question ("what does the receipt carry that lets a runtime monitor verify this call matched the admitted capability?"): in my implementation the baseline captures the full declared schema (params, types, effects, data classes, externality) at registration. On each call I diff the live tool definition against that baseline and score severity by kind of change: effect escalation (read_only → mutating), new sensitive data classes, externality crossing internal → external, and new required params. The gap I keep hitting is exactly what you name: there's no spec-level anchor, so the "admitted manifest" is something each implementation has to reconstruct rather than read from a standard field.
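Below is a minimal Python sketch of the baseline-and-diff step described above. It assumes a flattened tool surface with `effects`, `data_classes`, `externality` and `required` fields. The field names, the sensitive-class set, the severity levels and the quarantine mapping are hypothetical illustrations. They are not the author's implementation.

```python
from dataclasses import dataclass

# Hypothetical values; a real detector would make these policy-driven.
SENSITIVE_CLASSES = {"pii", "phi", "credentials"}
RANK = {"none": 0, "low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class Finding:
    kind: str       # what changed, e.g. "effect_escalation"
    field: str      # which part of the declared surface changed
    before: object
    after: object
    severity: str   # one of RANK's keys


def diff_surface(baseline: dict, live: dict) -> list[Finding]:
    """Compare a live tool definition against the baseline captured at registration."""
    findings: list[Finding] = []

    # Effect escalation: a tool admitted as read-only now declares side effects.
    if baseline["effects"] == "read_only" and live["effects"] != "read_only":
        findings.append(Finding("effect_escalation", "effects",
                                baseline["effects"], live["effects"], "high"))

    # New data classes: sensitive ones score higher than ordinary ones.
    added = set(live["data_classes"]) - set(baseline["data_classes"])
    if added:
        sev = "high" if added & SENSITIVE_CLASSES else "medium"
        findings.append(Finding("new_data_class", "data_classes",
                                sorted(baseline["data_classes"]),
                                sorted(live["data_classes"]), sev))

    # Externality crossing the internal -> external boundary.
    if baseline["externality"] == "internal" and live["externality"] == "external":
        findings.append(Finding("externality_crossing", "externality",
                                "internal", "external", "high"))

    # New required params change what the tool demands from callers.
    if set(live["required"]) - set(baseline["required"]):
        findings.append(Finding("new_required_param", "required",
                                sorted(baseline["required"]),
                                sorted(live["required"]), "medium"))
    return findings


def overall_severity(findings: list[Finding]) -> str:
    # The worst single finding decides; no findings means "none".
    return max((f.severity for f in findings), key=RANK.__getitem__, default="none")


def action_for(severity: str) -> str:
    # Hypothetical severity-based quarantine policy.
    return {"high": "quarantine", "medium": "warn"}.get(severity, "allow")
```

Taking the maximum severity means a single high-severity change is enough to quarantine the tool. In this sketch that is any effect escalation or external crossing. This follows the severity-based quarantine idea from the opening post.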

A minimal spec hook, a signed capability manifest at admission that the runtime can re-verify per call, would let drift detectors, receipt verifiers (like your ATB substrate), and policy engines all bind to the same source of truth. That feels like the interop-preserving primitive this thread is circling.
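Here is a sketch that makes the proposed hook concrete: anchor a capability manifest at admission, then re-check it on every call. It rests on several assumptions:

- Manifests are JSON.
- A sorted-keys `json.dumps` stands in for real JCS.
- HMAC-SHA256 is used only because it is in the standard library. An interoperable anchor would need an asymmetric signature that a third party can check.

None of these names come from the spec.

```python
import hashlib
import hmac
import json


def canonical_bytes(obj) -> bytes:
    # Stand-in for RFC 8785 (JCS): sorted keys, no whitespace, UTF-8.
    # Agrees with JCS only for simple documents (ASCII keys, no floats).
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False).encode("utf-8")


def admit(manifest: dict, key: bytes) -> dict:
    """Anchor the capability manifest seen at admission.

    HMAC is a placeholder for a real signature scheme.
    """
    body = canonical_bytes(manifest)
    return {
        "manifest": manifest,
        "digest": "sha256:" + hashlib.sha256(body).hexdigest(),  # informational
        "sig": hmac.new(key, body, hashlib.sha256).hexdigest(),
    }


def verify_call(anchor: dict, live_manifest: dict, key: bytes) -> bool:
    """Per-call check: is the anchor intact, and does the live surface still match it?"""
    body = canonical_bytes(anchor["manifest"])
    expected_sig = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected_sig, anchor["sig"]):
        return False  # the admitted manifest itself was altered
    # Derive the admitted digest from the verified bytes rather than trusting
    # the stored "digest" field, which the signature does not cover.
    admitted = hashlib.sha256(body).hexdigest()
    live = hashlib.sha256(canonical_bytes(live_manifest)).hexdigest()
    return hmac.compare_digest(admitted, live)
```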

Would be curious whether the Security IG sees appetite for standardizing that manifest+anchor, even if drift logic itself stays implementation-specific.

vaaraio · 2026-06-14T15:55:49Z

This is the right gap to name. Admission-time checks answer "is this the server I trusted"; they cannot answer "did this tool do what it claimed on this call." That second question is a runtime one, and it does not have to be left to opaque monitoring.

The approach I have been working on (open SEP-2828) is a server-side signed record emitted per tool call: the inputs, the outcome, and the effects, hash-chained and bound back to the decision that authorized the call. Drift stops being "hope a monitor flagged it" and becomes evidence: the record for call N shows an externality or data class that was not in the approved shape, and any third party can verify that offline without trusting the server.
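The hash-chaining idea can be illustrated with a toy per-call log. This is not the SEP-2828 record format. The field names (`seq`, `prev`, `decision`, `inputs`, `outcome`, `effects`) are hypothetical, and signatures are left out, so this sketch would not notice a tampered final record.

```python
import hashlib
import json

GENESIS = "sha256:" + "0" * 64  # hypothetical "no previous record" marker


def canonical_bytes(obj) -> bytes:
    # Stand-in for RFC 8785 (JCS); agrees with it only for simple documents.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False).encode("utf-8")


def address(obj) -> str:
    return "sha256:" + hashlib.sha256(canonical_bytes(obj)).hexdigest()


def append_record(chain: list, decision_ref: str, inputs: dict,
                  outcome: str, effects: list) -> dict:
    """Emit one per-call record, linked to the previous one and to the authorizing decision."""
    record = {
        "seq": len(chain),
        "prev": address(chain[-1]) if chain else GENESIS,
        "decision": decision_ref,
        "inputs": inputs,
        "outcome": outcome,
        "effects": sorted(effects),
    }
    chain.append(record)
    return record


def verify_chain(chain: list) -> bool:
    """Offline check that every record points at the hash of the one before it.

    Without signatures, edits to the last record are not detectable this way.
    """
    prev = GENESIS
    for i, record in enumerate(chain):
        if record["seq"] != i or record["prev"] != prev:
            return False
        prev = address(record)
    return True


def unapproved_effects(record: dict, approved: set) -> list:
    """Effects in this call's record that were not in the approved shape."""
    return sorted(set(record["effects"]) - approved)
```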

So my read on your question: the spec can define the admission-time trust root, and a thin, verifiable per-call record is the runtime half that catches drift after admission. The record format is public if it is useful here.

MaazAhmed47 · 2026-06-14T20:23:08Z

This is the right framing. Admission answers "is this the server I trusted," and your per-call decision/outcome split answers "what did the governing system decide, and what actually happened." Those are genuinely different surfaces, and the bind-back to the request attestation is the part that makes it verifiable rather than hopeful.

Where I've been working is the detection half that sits just upstream of your decision record. My focus is post-approval capability-surface drift: a tool admitted with one declared shape (schema, annotations, declared effects, data access, external reach) that changes after approval. I baseline the approved surface, diff the current surface on each call, classify the change, and emit a recomputable record (approved-surface hash, current-surface hash, classified delta) that a third party can re-derive without trusting me.
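Here is a sketch of the recomputable drift record described above. The camelCase field names and the placeholder schema string are borrowed from later in this thread. Everything else, including the stand-in canonicalization and the example policy ID, is an assumption.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    # Stand-in for RFC 8785 (JCS); agrees with it only for simple documents.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False).encode("utf-8")


def surface_hash(surface: dict) -> str:
    return "sha256:" + hashlib.sha256(canonical_bytes(surface)).hexdigest()


def make_drift_record(approved: dict, current: dict,
                      classified_delta, policy_id: str) -> dict:
    """Bundle the two surface hashes and the classified change into one record."""
    return {
        "schema": "interlock.drift-record/v0",  # placeholder name used later in the thread
        "approvedSurfaceHash": surface_hash(approved),
        "currentSurfaceHash": surface_hash(current),
        "classifiedDelta": classified_delta,
        "policyId": policy_id,
    }


def rederive(record: dict, approved: dict, current: dict) -> bool:
    """Third-party check: recompute both surface hashes from the declared shapes."""
    return (record["approvedSurfaceHash"] == surface_hash(approved)
            and record["currentSurfaceHash"] == surface_hash(current))
```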

The reason your SEP caught my attention is that your decision record carries the decision and a risk basis (reason, riskScore, policyId) but stays open about what produces that basis. Drift detection is one thing that produces it: when a tool's surface drifts post-approval, that's a concrete, verifiable reason to escalate or block, with evidence behind it rather than an opaque score. So my drift evidence looks like the kind of thing that could populate a decision record's basis, rather than a parallel record.

The thing I'd want to avoid is fragmentation. A few runtime-evidence efforts are converging on the same shape now: yours, and the audit-record SEP I contributed the runtime-security profile to. If drift-detection evidence had a defined place in the decision-record basis instead of each detector inventing its own, that's a stronger story for everyone verifying these offline. Yes to the format being public; I'd find it useful to map how a drift-triggered decision is expressed in your decision-record shape.

vaaraio · 2026-06-15T19:07:27Z

This is the shape I'd want too: drift evidence as the basis behind a decision, not a second parallel record competing to be the source of truth.

In the decision record the basis is deliberately left open. It carries the decision and the risk basis (reason, riskScore, policyId) but does not fix what produced that basis. What it does not yet have is a defined place to point at external evidence, and that is the slot worth defining for your work: a content-addressed reference, so the decision record names your drift record by hash rather than re-describing it. You emit your record (approved-surface hash, current-surface hash, classified delta), it gets a content address, and the decision's reason cites that address under a policyId. I'd propose adding that reference to the basis rather than leaving each detector to invent its own envelope.
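The evidenceRef idea can be sketched as follows. The decision basis carries only a content address for the drift record, plus the canonicalization and schema names, and never a copy of the record itself. The `evidence_ref` and `decide` functions and their field layout are hypothetical, and riskScore is omitted for brevity.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    # Stand-in for RFC 8785 (JCS); agrees with it only for simple documents.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False).encode("utf-8")


def evidence_ref(record: dict) -> dict:
    """Content-address a detector's record without copying its contents."""
    return {
        "address": "sha256:" + hashlib.sha256(canonical_bytes(record)).hexdigest(),
        "canonicalization": "jcs-rfc8785",  # label the thread converges on later
        "schema": record.get("schema"),
    }


def decide(drift_record: dict, decision: str, policy_id: str,
           request_attestation: str) -> dict:
    """Hypothetical decision record whose basis cites the drift record by address only."""
    return {
        "decision": decision,                       # e.g. "escalate" or "block"
        "requestAttestation": request_attestation,  # what the decision binds back to
        "basis": {
            "reason": "capability surface drifted after approval",
            "policyId": policy_id,
            "evidenceRef": evidence_ref(drift_record),
        },
    }
```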

A third party with no trust in either of us can then re-derive the approved-surface and current-surface hashes from the declared shapes, confirm your delta classification, confirm the decision record's basis reference resolves to your drift record, and confirm the decision (escalate or block) binds back to the same request attestation. Two independent records, one bind, both recomputable offline.
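The third-party checks listed above can be sketched as independent recomputations. The sketch assumes a drift record with approvedSurfaceHash and currentSurfaceHash fields, and a decision whose basis holds an evidenceRef address. Both layouts are hypothetical. Re-checking the delta classification and verifying signatures are left out.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    # Stand-in for RFC 8785 (JCS); agrees with it only for simple documents.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False).encode("utf-8")


def sha256_address(obj) -> str:
    return "sha256:" + hashlib.sha256(canonical_bytes(obj)).hexdigest()


def third_party_check(approved: dict, current: dict, drift_record: dict,
                      decision: dict, expected_attestation: str) -> dict:
    """Recompute each claim from published bytes; trust neither emitter."""
    ref = decision["basis"]["evidenceRef"]
    return {
        # 1. Both surface hashes re-derive from the declared shapes.
        "surfaces_rederive": (
            drift_record["approvedSurfaceHash"] == sha256_address(approved)
            and drift_record["currentSurfaceHash"] == sha256_address(current)),
        # 2. The decision's evidence reference resolves to this exact drift record.
        "evidence_ref_resolves": ref["address"] == sha256_address(drift_record),
        # 3. The decision binds back to the request attestation being checked.
        "binds_to_request": decision["requestAttestation"] == expected_attestation,
    }
```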

On fragmentation, I agree that is the risk worth designing against. The win is not one detector winning; it is a single defined evidence-reference slot in the decision basis, so a drift detector, a policy engine, and whatever comes next all populate the same place instead of each shipping its own envelope. If it is useful, I will write up the mapping from your drift evidence to that evidence reference against the public vectors, and your detector plus the Vaara verifier becomes a two-implementation check that the reference resolves by recomputation on both sides.

MaazAhmed47 · 2026-06-15T19:29:46Z

This is exactly the shape: drift evidence as the cited basis behind a decision, content-addressed, not a competing source of truth.

The evidenceRef slot in the decision basis is the right seam: my detector emits the record (approved-surface hash, current-surface hash, classified delta, under a policyId), it gets a content address, and your decision record's reason cites that address. Two records, one bind, both recomputable offline by someone who trusts neither of us.

Yes to the mapping write-up against the public vectors; you're better placed to define how it slots into the decision basis, and I'll align my record's canonicalization so it resolves cleanly.

And yes to the two-implementation check: my detector emitting, the Vaara verifier recomputing, the reference resolving on both sides. That's the part I care most about: a recompute that holds across two independent implementations is the only thing that proves the reference is a property of the rule, not of either codebase.

On fragmentation, agreed, and you said it better than I did: a defined evidence-reference slot that a drift detector, a policy engine, and whatever's next all populate, instead of each shipping its own envelope. That's the outcome worth building toward.

vaaraio · 2026-06-15T22:39:06Z

That locks it. evidenceRef is the slot: content-addressed, your drift record stays its own source of truth, the decision basis cites it by address.

I wrote the mapping up against the public vectors; it is in the open PR at vaaraio/vaara#252, with a worked example that takes a drift record (approved-surface hash, current-surface hash, classified delta, policyId) to an evidenceRef whose content address reproduces from the document itself. Two things in it are the contract both sides hold:

- Canonicalization: RFC 8785 (JCS) over the record bytes, then sha256, written as sha256: followed by the hex digest. That is the same canonicalization the SEP-2828 records already sign over, so a verifier that can check a decision signature already has the bytes rule. evidenceRef carries canonicalization as an explicit field, so the rule is named rather than assumed.
- Schema: your record declares its own schema and version (placeholder interlock.drift-record/v0 in the example). The decision cites that schema so a verifier knows how to read the bytes once they resolve.
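Here is a sketch of the canonicalize-then-hash rule. It shows that the address depends on the JSON value, not on key order or whitespace in the emitted bytes. The `json.dumps` call is only an approximation of RFC 8785. It matches JCS for ASCII keys with string, boolean, null and small-integer values, but not for floating-point numbers, which JCS serializes using ECMAScript rules. The checker's rfc8785 dependency is the real implementation.

```python
import hashlib
import json


def canonical_bytes(value) -> bytes:
    # Approximation of RFC 8785 (JCS): sorted keys, no whitespace, UTF-8 output.
    # Differs from JCS for floats and for some non-ASCII key orderings.
    return json.dumps(value, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False).encode("utf-8")


def content_address(emitted: bytes) -> str:
    """Parse emitted JSON bytes, canonicalize the value, then hash.

    Result format: "sha256:" followed by 64 lowercase hex digits.
    """
    return "sha256:" + hashlib.sha256(canonical_bytes(json.loads(emitted))).hexdigest()
```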

The only thing your side has to align is that the drift record is a JCS-canonicalizable JSON object, so the cited address recomputes from your bytes alone. Field names, delta classification, policyId all stay yours.

The check passes when you emit a record and compute its address, I cite it in a signed decision, and a third party canonicalizes your bytes under JCS to the same address (citation resolves) and verifies my signature (the citation is the one signed). Send me a real drift record from the detector, or its exact field shape, and I will swap it into the worked example for the placeholder and we run the recompute on both sides against that.

Release waits on that check; the field exists to prove two implementations agree on the binding, so it ships as "two implementations recompute the same address," not a field on its own.

MaazAhmed47 · 2026-06-16T20:39:14Z

Dug into aligning my emitter to interlock.drift-record/v0 and hit two things worth flagging before we run the recompute.

First: the worked example's digest doesn't reproduce as published. The surface hashes in it are elided ("sha256:aaaa...", "sha256:bbbb..."), not the real 64-hex-digit values the d303af92 address was computed over. The JCS+sha256 recipe applies cleanly to a v0-shaped record on my side, but I can't validate the byte-clean recompute until I have the un-elided vector. Could you publish the real record with full hashes?

Second, classifiedDelta: my detector computes the structured pieces (changed field, from/to, kind) transiently but collapses them to a classification string plus prose before persisting, so producing a truly structured classifiedDelta is an engine change on my side, not just a record-format rename. Before I do that: does the binding/recompute actually need classifiedDelta as structured data, or does the evidenceRef binding hold on the record envelope (surface hashes, schema, policyId) with the delta carried as my existing classification?

Once I have the real vector and know whether structured classifiedDelta is in scope, I'll align and we run the recompute.

vaaraio · 2026-06-17T04:50:33Z

Both points are fair, and the second is the more useful answer.

On the vector: you were right, the worked example shipped placeholder surface hashes, so it could not be reproduced past the drift record. That is fixed. The vectors now compute approvedSurfaceHash and currentSurfaceHash from two real tool surfaces that ship alongside them (approved_surface.json, current_surface.json), so the whole chain recomputes from published bytes: surface bytes to surface hash to drift record to the evidenceRef address. The address moved to sha256:8e22e733c3526ca8e7987ab2355f18e66752f29ac629dbd41c9b80650822a56b. tests/vectors/evidence_ref_v0/ carries the records and a standalone checker (stdlib + cryptography + rfc8785, no Vaara import) that recomputes it from a clean checkout.

On classifiedDelta: the binding does not need it structured. The evidenceRef digest is sha256 over the JCS-canonical bytes of the whole drift record, so it covers whatever you put in that field, including a single classification string. Vaara never parses the field. It only recomputes the address over the bytes you emitted. So your existing classification string is fine. The recompute holds on the record envelope (surface hashes, schema, policyId, and whatever delta representation you chose), and each side computes the address over its own emitted bytes. The only things both implementations must agree on are the canonicalization (JCS) and that the address matches.

One consequence worth naming: if you run the checker against your own detector's record, the address will not match mine, and that is correct. The contract is per-record. Your decision cites the address of your record, not of my example.
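The per-record contract fits in a few lines. The verifier parses the emitted bytes, re-canonicalizes them, hashes the result and compares it against the cited address. It never interprets classifiedDelta. The records in the example are hypothetical, and `json.dumps` again stands in for JCS.

```python
import hashlib
import json


def canonical_bytes(value) -> bytes:
    # Stand-in for RFC 8785 (JCS); agrees with it only for simple documents.
    return json.dumps(value, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False).encode("utf-8")


def address_of(record: dict) -> str:
    """Emitter side: the address covers the whole record, whatever the delta looks like."""
    return "sha256:" + hashlib.sha256(canonical_bytes(record)).hexdigest()


def resolves(cited_address: str, emitted_bytes: bytes) -> bool:
    """Verifier side: recompute over the emitted bytes; classifiedDelta is never inspected."""
    return address_of(json.loads(emitted_bytes)) == cited_address
```

A string delta and a structured delta are equally valid here. They simply produce different addresses, and each decision cites the address of the record it actually received.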

MaazAhmed47 · 2026-06-17T04:55:47Z

This resolves both. Thanks for shipping the real surface bytes; it's good to have the full chain recompute from published bytes now. The classifiedDelta clarification is exactly what I was hoping for: the digest covering the whole record means my existing classification string works as-is, no engine change needed. The per-record contract makes sense: each side cites its own record's address under the same canonicalization. I'll run your checker in evidence_ref_v0 against a real record from my detector and confirm the recompute holds on my side; then we'll have the two-implementation check. Will follow up with the result.

MaazAhmed47 · 2026-06-18T15:51:08Z

Ran your checker against a real record from my emitter: the full production path, genuine classifier output (severity high, action deny, external-reach finding), not a hand-authored record. It holds:

- sha256(JCS(record)) recomputes byte-identical to my emitted digest (sha256:b11219346e…f1567)
- evidence_ref_resolves passes
- full main() over a case carrying my record: 6/6, exit 0

So the per-record contract works across both implementations: my address, recomputed by your unmodified checker, with no shared code. G2's done from my side.

One thing worth aligning before you ship, and it's not a recompute failure: my envelope stamps canonicalization: "json/jcs-rfc8785", but your resolver keys on the literal "JCS". The bytes and hash are identical (it's the same RFC 8785 JCS), but the label strings differ, so a Vaara decision citing my envelope's label verbatim would fail closed on the unrecognized name. Worth a one-line agreement on the registry string: do we standardize on "JCS", or register both as aliases? Happy to change my stamp to match whatever you land on; it's a label, not a behavior change.

Either way the cryptographic recompute is sound and consistent. Let me know on the label and I'll align my side.

vaaraio · 2026-06-18T17:37:42Z

This is the result that matters: a real production record from your own emitter, not a hand-authored fixture, recomputing byte-identical at 6/6, exit 0. Two independent emitters now agree on the per-record contract through nothing but published bytes. Thanks for running it against the genuine classifier path; that is the property I was after, and it held.

MaazAhmed47 · 2026-06-18T19:34:43Z

Glad it held, and agreed, the genuine-classifier-path record recomputing byte-identical is the property that matters. Two independent emitters agreeing through nothing but published bytes is exactly the interop proof. Good to have G2 closed.

One small thing to fold in, since it's live on the Assay side too: there are now three canonicalization-label spellings across the evidenceRef work (mine, json/jcs-rfc8785; the "JCS" string on your side; and jcs-json-v1 on Roel's), all RFC 8785. I checked my side: the label is sibling metadata, outside the digest, so I can align to whatever canonical string we pick without touching any digests. It might be worth the substrate converging on one canonical label with the others as recognized aliases, rather than each pair maintaining its own set. Happy to align my emitter's stamp to whatever you and the SEP land on.
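Because the label sits beside the digest rather than inside it, relabeling an envelope cannot change the address. Here is a sketch with a hypothetical envelope layout (canonicalization, address, record), again using `json.dumps` as a stand-in for JCS:

```python
import hashlib
import json


def canonical_bytes(value) -> bytes:
    # Stand-in for RFC 8785 (JCS); agrees with it only for simple documents.
    return json.dumps(value, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False).encode("utf-8")


def make_envelope(record: dict, label: str) -> dict:
    # The label is sibling metadata: only the record's bytes are hashed.
    return {
        "canonicalization": label,
        "address": "sha256:" + hashlib.sha256(canonical_bytes(record)).hexdigest(),
        "record": record,
    }


def relabel(envelope: dict, new_label: str) -> dict:
    """Change only the label; the address is carried over untouched."""
    return dict(envelope, canonicalization=new_label)
```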

Either way, G2's done, and thanks for holding the shape steady while I ran it.

vaaraio · 2026-06-19T02:41:05Z

Agreed, worth converging. Since the label sits outside the digest on all three sides, aligning costs nothing and spares every new implementer from guessing which spelling to match.

My vote for the canonical value is jcs-rfc8785: it names the actual normative spec, RFC 8785, rather than the generic "JCS", which is ambiguous across readings. Keep JCS and jcs-json-v1 as recognized aliases so existing records still verify.

The Vaara emitter will carry jcs-rfc8785 as its canonical stamp and accept the aliases on the verify side, so whatever the extension lands on there is already a reference implementation that reads all three. Your side and Roel's converging on the same set closes it.
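Here is a sketch of the proposed label handling: one canonical value, the two existing spellings accepted as aliases, and anything else rejected so that the verifier fails closed. The function name and the choice of exception are assumptions.

```python
CANONICAL_LABEL = "jcs-rfc8785"

# Aliases proposed in the thread; records stamped with these still verify.
ALIASES = {"JCS": CANONICAL_LABEL, "jcs-json-v1": CANONICAL_LABEL}


def normalize_label(label: str) -> str:
    """Map a canonicalization label to the canonical value, or fail closed."""
    if label == CANONICAL_LABEL:
        return label
    try:
        return ALIASES[label]
    except KeyError:
        # Unknown name: the verifier cannot know the bytes rule, so it refuses.
        raise ValueError(f"unrecognized canonicalization label: {label!r}") from None
```

Under this table, the json/jcs-rfc8785 stamp mentioned earlier would still be rejected. That is why the emitter's stamp needs to be aligned, or the stamp registered as an alias.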

MaazAhmed47 · 2026-06-19T02:58:04Z

Agreed, and jcs-rfc8785 is the stronger choice on the merits: naming RFC 8785 directly beats the generic "JCS" or a version suffix.

One thing worth knowing before it sets: there's another independent implementer in this space (observed-effect evidence, separate axis from yours but riding the same envelope and canonicalization) who's already pinned a v0 with jcs-json-v1 as their canonical. Same convergence instinct, different string. So there are genuinely two "canonicals" forming on the same substrate right now.

I don't have a hard stake in which wins; it's outside the digest on my side, so I'll stamp whatever the canonical lands on and keep the rest as aliases. But since the whole point was one label so new implementers don't guess, it'd be worth the canonical being agreed across implementations, not just within each extension. Happy to connect you with the other implementer if useful, or you can all converge on the thread; either way, I'll align to the result.
