Runtime tool drift detection — gap beyond admission-time security? #2826
Replies: 14 comments 1 reply
This is the right gap to name. Admission-time checks answer "is this the server I trusted"; they cannot answer "did this tool do what it claimed on this call." That second question is a runtime one, and it does not have to be left to opaque monitoring. The approach I have been working on (open SEP-2828) is a server-side signed record emitted per tool call: the inputs, the outcome, and the effects, hash-chained and bound back to the decision that authorized the call. Drift stops being "hope a monitor flagged it" and becomes evidence: the record for call N shows an externality or data class that was not in the approved shape, and any third party can verify that offline without trusting the server. So my read on your question: the spec can define the admission-time trust root, and a thin, verifiable per-call record is the runtime half that catches drift after admission. The record format is public if it is useful here.
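For readers following along, here is a minimal sketch of what a hash-chained per-call record could look like. It is not the SEP-2828 format: the field names (`decisionId`, `prevHash`, `recordHash`), the `GENESIS` marker, and the helper functions are all hypothetical. The signature step is left out entirely, and `canonical_bytes` is a simple stand-in for a real canonicalizer.

```python
import hashlib
import json

GENESIS = "sha256:" + "0" * 64  # hypothetical marker for the start of a chain


def canonical_bytes(obj) -> bytes:
    # Stand-in canonicalizer: sorted keys, no whitespace, UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def digest(obj) -> str:
    return "sha256:" + hashlib.sha256(canonical_bytes(obj)).hexdigest()


def append_record(chain, decision_id, inputs, outcome, effects):
    """Append one per-call record, linked to the previous record's hash."""
    body = {
        "decisionId": decision_id,  # binds the call to its authorizing decision
        "inputs": inputs,
        "outcome": outcome,
        "effects": effects,
        "prevHash": chain[-1]["recordHash"] if chain else GENESIS,
    }
    record = {**body, "recordHash": digest(body)}
    chain.append(record)
    return record


def verify_chain(chain) -> bool:
    """Recompute every hash and link offline; False on any mismatch."""
    prev = GENESIS
    for record in chain:
        body = {k: v for k, v in record.items() if k != "recordHash"}
        if body.get("prevHash") != prev or digest(body) != record.get("recordHash"):
            return False
        prev = record["recordHash"]
    return True


chain = []
append_record(chain, "dec-1", {"path": "/reports"}, "ok", ["read"])
append_record(chain, "dec-2", {"path": "/reports"}, "ok", ["read", "export"])
print(verify_chain(chain))            # chain exactly as emitted
chain[0]["effects"].append("export")  # tamper with an earlier record
print(verify_chain(chain))            # the recomputed hash no longer matches
```

The point of the chain is that a verifier needs only the records themselves: editing any earlier record changes its recomputed hash, so the stored `recordHash` stops matching and the check fails.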
This is the right framing. Admission answers "is this the server I trusted," and your per-call decision/outcome split answers "what did the governing system decide, and what actually happened." Those are genuinely different surfaces, and the bind-back to the request attestation is the part that makes it verifiable rather than hopeful. Where I've been working is the detection half that sits just upstream of your decision record. My focus is post-approval capability-surface drift: a tool admitted with one declared shape (schema, annotations, declared effects, data-access, external-reach) that changes after approval. I baseline the approved surface, diff the current surface on each call, classify the change, and emit a recomputable record (approved-surface hash, current-surface hash, the classified delta) that a third party can re-derive without trusting me. The reason your SEP caught my attention is that your decision record carries the decision and a risk basis (reason, riskScore, policyId) but stays open about what produces that basis. Drift detection is one thing that produces it: when a tool's surface drifts post-approval, that's a concrete, verifiable reason to decide escalate or block, with evidence behind it rather than an opaque score. So my drift-evidence looks like it could be the kind of thing that populates a decision record's basis, rather than a parallel record. The thing I'd want to avoid is fragmentation. There are a few runtime-evidence efforts converging on the same shape now: yours, and the audit-record SEP I contributed the runtime-security profile to. If drift-detection evidence had a defined place in the decision-record basis instead of each detector inventing its own, that's a stronger story for everyone verifying these offline. Yes to the format being public; I'd find it useful to map how a drift-triggered decision expresses in your decision-record shape.
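The baseline, diff, classify, emit loop described above can be sketched roughly as follows. This is an illustration only, not the detector's actual logic: the surface fields (`effects`, `dataClasses`, `externality`), the `changed:` classification string, and the policy id are made up, and the canonicalizer is a simplified stand-in.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    # Stand-in canonicalizer: sorted keys, no whitespace, UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def surface_hash(surface: dict) -> str:
    return "sha256:" + hashlib.sha256(canonical_bytes(surface)).hexdigest()


def changed_fields(approved: dict, current: dict) -> list:
    """Top-level surface fields whose value differs from the baseline."""
    keys = sorted(set(approved) | set(current))
    return [k for k in keys if approved.get(k) != current.get(k)]


def drift_record(approved: dict, current: dict, policy_id: str) -> dict:
    """Emit a record a third party can re-derive from the two surfaces."""
    changed = changed_fields(approved, current)
    return {
        "approvedSurfaceHash": surface_hash(approved),
        "currentSurfaceHash": surface_hash(current),
        # Hypothetical classification scheme: just names the changed fields.
        "classifiedDelta": "changed:" + ",".join(changed) if changed else "no-drift",
        "policyId": policy_id,
    }


# Hypothetical surfaces: a read-only tool that later gains export and PII reach.
approved = {"name": "report_reader", "effects": ["read"],
            "dataClasses": ["internal"], "externality": "internal"}
current = {"name": "report_reader", "effects": ["read", "export"],
           "dataClasses": ["internal", "pii"], "externality": "external"}

record = drift_record(approved, current, "policy-example-1")
print(record["classifiedDelta"])
```

Because both hashes are derived from the declared surfaces alone, anyone holding the two surfaces can rerun `drift_record` and compare the result field by field.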
This is the shape I'd want too: drift evidence as the basis behind a decision, not a second parallel record competing to be the source of truth. In the decision record the basis is deliberately left open. It carries the decision and the risk basis (reason, riskScore, policyId) but does not fix what produced that basis. What it does not yet have is a defined place to point at external evidence, and that is the slot worth defining for your work: a content-addressed reference, so the decision record names your drift record by hash rather than re-describing it. You emit your record (approved-surface hash, current-surface hash, classified delta), it gets a content address, and the decision's reason cites that address under a policyId. I'd propose adding that reference to the basis rather than leaving each detector to invent its own envelope. A third party with no trust in either of us can then re-derive the approved-surface and current-surface hashes from the declared shapes, confirm your delta classification, confirm the decision record's basis reference resolves to your drift record, and confirm the decision (escalate or block) binds back to the same request attestation. Two independent records, one bind, both recomputable offline. On fragmentation, I agree that is the risk worth designing against. The win is not one detector winning; it is a single defined evidence-reference slot in the decision basis, so a drift detector, a policy engine, and whatever comes next all populate the same place instead of each shipping its own envelope. If it is useful I will write up the mapping from your drift evidence to that evidence reference against the public vectors, and your detector plus the Vaara verifier becomes a two-implementation check that the reference resolves by recomputation on both sides.
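To make the proposed slot concrete, here is a rough sketch of a decision record whose basis cites a drift record by content address. The record layout is an assumption for illustration: `reason`, `riskScore`, and `policyId` come from the thread, while the `evidenceRef` key, the `requestAttestation` key, and every value shown are hypothetical. Signing is omitted, and the canonicalizer is a simplified stand-in.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    # Stand-in canonicalizer: sorted keys, no whitespace, UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def content_address(obj) -> str:
    return "sha256:" + hashlib.sha256(canonical_bytes(obj)).hexdigest()


def make_decision(decision, reason, risk_score, policy_id,
                  request_attestation, evidence):
    """Build a decision whose basis names the evidence by hash only."""
    return {
        "decision": decision,
        "requestAttestation": request_attestation,
        "basis": {
            "reason": reason,
            "riskScore": risk_score,
            "policyId": policy_id,
            "evidenceRef": content_address(evidence),  # cite, do not re-describe
        },
    }


def evidence_resolves(decision_record, evidence) -> bool:
    """Third-party check: does the cited address recompute from the evidence?"""
    return decision_record["basis"]["evidenceRef"] == content_address(evidence)


# Made-up drift record; the hashes are placeholders, not real surface hashes.
drift = {
    "approvedSurfaceHash": "sha256:" + "1" * 64,
    "currentSurfaceHash": "sha256:" + "2" * 64,
    "classifiedDelta": "external-reach-added",
    "policyId": "policy-example-1",
}

decision = make_decision("block", "post-approval surface drift", 90,
                         "policy-example-1", "attestation-example", drift)
print(evidence_resolves(decision, drift))
```

The drift record stays with its emitter; the decision carries only the address, so changing a single field of the drift record makes the citation stop resolving.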
This is exactly the shape: drift evidence as the cited basis behind a decision, content-addressed, not a competing source of truth. Yes to the mapping write-up against the public vectors; you're better placed to define how it slots into the decision basis, and I'll align my record's canonicalization so it resolves cleanly. On fragmentation, agreed, and you said it better than I did: a defined evidence-reference slot that a drift detector, a policy engine, and whatever's next all populate, instead of each shipping its own envelope. That's the outcome worth building toward.
That locks it. evidenceRef is the slot: content-addressed, your drift record stays its own source of truth, the decision basis cites it by address. I wrote the mapping up against the public vectors; it is in the open PR at vaaraio/vaara#252, with a worked example that takes a drift record (approved-surface hash, current-surface hash, classified delta, policyId) to an evidenceRef whose content address reproduces from the document itself. Two things in it are the contract both sides hold:
The only thing your side has to align is that the drift record is a JCS-canonicalizable JSON object, so the cited address recomputes from your bytes alone. Field names, delta classification, policyId all stay yours. The check passes when you emit a record and compute its address, I cite it in a signed decision, and a third party canonicalizes your bytes under JCS to the same address (citation resolves) and verifies my signature (the citation is the one signed). Send me a real drift record from the detector, or its exact field shape, and I will swap it into the worked example in place of the placeholder, and we run the recompute on both sides against that. Release waits on that check; the field exists to prove two implementations agree on the binding, so it ships as "two implementations recompute the same address," not a field on its own.
Dug into aligning my emitter to interlock.drift-record/v0 and hit two things worth flagging before we run the recompute. First, the worked example's digest doesn't reproduce as published: the surface hashes in it are elided ("sha256:aaaa...", "sha256:bbbb..."), not the real 64-hex-character digests the d303af92 address was computed over. The JCS+sha256 recipe applies cleanly to a v0-shaped record on my side, but I can't validate the byte-clean recompute until I have the un-elided vector. Could you publish the real record with full hashes? Second, classifiedDelta: my detector computes the structured pieces (changed field, from/to, kind) transiently but collapses them to a classification string plus prose before persisting, so producing a true structured classifiedDelta is an engine change on my side, not just a record-format rename. Before I do that: does the binding/recompute actually need classifiedDelta as structured data, or does the evidenceRef binding hold on the record envelope (surface hashes, schema, policyId) with the delta carried as my existing classification? Once I have the real vector and know whether structured classifiedDelta is in scope, I'll align and we run the recompute.
Both points are fair, and the second is the more useful answer. On the vector: you were right, the worked example shipped placeholder surface hashes, so it could not be reproduced past the drift record. That is fixed. The vectors now compute approvedSurfaceHash and currentSurfaceHash from two real tool surfaces that ship alongside them (approved_surface.json, current_surface.json), so the whole chain recomputes from published bytes: surface bytes to surface hash to drift record to the evidenceRef address. The address moved to sha256:8e22e733c3526ca8e7987ab2355f18e66752f29ac629dbd41c9b80650822a56b. tests/vectors/evidence_ref_v0/ carries the records and a standalone checker (stdlib + cryptography + rfc8785, no Vaara import) that recomputes it from a clean checkout. On classifiedDelta: the binding does not need it structured. The evidenceRef digest is sha256 over the JCS-canonical bytes of the whole drift record, so it covers whatever you put in that field, including a single classification string. Vaara never parses the field. It only recomputes the address over the bytes you emitted. So your existing classification string is fine. The recompute holds on the record envelope (surface hashes, schema, policyId, and whatever delta representation you chose), and each side computes the address over its own emitted bytes. The only things both implementations must agree on are the canonicalization (JCS) and that the address matches. One consequence worth naming: if you run the checker against your own detector's record, the address will not match mine, and that is correct. The contract is per-record. Your decision cites the address of your record, not of my example.
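As a rough illustration of the per-record contract described above, the sketch below recomputes the chain from surface bytes to an evidence address and verifies a citation without ever parsing `classifiedDelta`. It is not the published checker and does not use the published vectors: the surfaces are made up, the surface hash is assumed to be taken over the canonical form of the parsed surface, and the stdlib `json.dumps` call only approximates RFC 8785 (it matches for simple records with ASCII keys and no floating-point numbers, which is all that is used here).

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    # Approximation of JCS for simple records; a real checker would use an
    # RFC 8785 implementation instead.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def address(obj) -> str:
    return "sha256:" + hashlib.sha256(canonical_bytes(obj)).hexdigest()


def build_drift_record(approved_json: str, current_json: str,
                       classified_delta, policy_id: str) -> dict:
    """Surface bytes -> surface hashes -> drift record."""
    return {
        "schema": "interlock.drift-record/v0",
        "approvedSurfaceHash": address(json.loads(approved_json)),
        "currentSurfaceHash": address(json.loads(current_json)),
        "classifiedDelta": classified_delta,  # opaque to the verifier
        "policyId": policy_id,
    }


def verify_citation(record: dict, cited_address: str) -> bool:
    """Recompute the address over the whole record; no field is interpreted."""
    return address(record) == cited_address


# Made-up surfaces, standing in for approved_surface.json / current_surface.json.
approved_json = '{"name": "report_reader", "effects": ["read"]}'
current_json = '{"name": "report_reader", "effects": ["read", "export"]}'

as_string = build_drift_record(approved_json, current_json,
                               "external-reach-added", "policy-example-1")
as_struct = build_drift_record(approved_json, current_json,
                               {"field": "effects", "kind": "added"},
                               "policy-example-1")

# Each emitter cites the address of its own record.
print(verify_citation(as_string, address(as_string)))
print(verify_citation(as_struct, address(as_struct)))
print(address(as_string) == address(as_struct))  # different records, different addresses
```

Both representations of the delta verify, because the digest covers whatever bytes were emitted. The two addresses differ from each other, which is the expected outcome under a per-record contract.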
This resolves both. Thanks for shipping the real surface bytes; it's good to have the full chain recompute from published bytes now. The classifiedDelta clarification is exactly what I was hoping for: the digest covering the whole record means my existing classification string works as-is, no engine change needed. The per-record contract makes sense: each side cites its own record's address, same canonicalization. I'll run your checker in evidence_ref_v0 against a real record from my detector and confirm the recompute holds on my side; then we've got the two-implementation check. Will follow up with the result.
Ran your checker against a real record from my emitter - the full production path, genuine classifier output (severity high, action deny, external-reach finding), not a hand-authored record.
So the per-record contract works across both implementations: my address, recomputed by your unmodified checker, no shared code. G2's done from my side. One thing worth aligning before you ship, not a recompute failure: my envelope stamps canonicalization: "json/jcs-rfc8785", but your resolver keys on the literal "JCS". The bytes and hash are identical (it's the same RFC 8785 JCS), but the label strings differ, so a [...] Either way, the cryptographic recompute is sound and consistent. Let me know on the label and I'll align my side.
This is the result that matters: a real production record from your own emitter, not a hand-authored fixture, recomputing byte-identical at 6/6, exit 0. Two independent emitters now agree on the per-record contract through nothing but published bytes. Thanks for running it against the genuine classifier path; that is the property I was after, and it held.
Glad it held, and agreed: the genuine-classifier-path record recomputing byte-identical is the property that matters. Two independent emitters agreeing through nothing but published bytes is exactly the interop proof. Good to have G2 closed. One small thing to fold in since it's live on the Assay side too: there are now three canonicalization-label spellings across the evidenceRef work, namely mine (json/jcs-rfc8785), the "JCS" string on your side, and jcs-json-v1 on Roel's. All are RFC 8785. I checked my side: the label is sibling metadata, outside the digest, so I can align to whatever canonical string we pick without touching any digests. Might be worth the substrate converging on one canonical label with the others as recognized aliases, rather than each pair maintaining its own set. Happy to align my emitter's stamp to whatever you and the SEP land on. Either way, G2's done, and thanks for holding the shape steady while I ran it.
Agreed, worth converging. Since the label sits outside the digest on all three sides, aligning costs nothing and spares every new implementer from guessing which spelling to match. My vote for the canonical value is jcs-rfc8785: it names the actual normative spec, RFC 8785, rather than the generic "JCS", which is ambiguous across readings. Keep JCS and jcs-json-v1 as recognized aliases so existing records still verify. The Vaara emitter will carry jcs-rfc8785 as its canonical stamp and accept the aliases on the verify side, so whatever the extension lands on there is already a reference implementation that reads all three. Your side and Roel's converging on the same set closes it.
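A verify-side alias table along these lines might look like the sketch below. The canonical value and the `JCS` and `jcs-json-v1` aliases are the ones proposed above; treating `json/jcs-rfc8785` as a further alias is an assumption, as are the envelope layout (`canonicalization` beside `record`) and the function names. The canonicalizer is a simplified stand-in for RFC 8785.

```python
import hashlib
import json

CANONICAL_LABEL = "jcs-rfc8785"
# Assumption: every spelling seen in the thread is accepted on verify.
ALIASES = {"JCS", "jcs-json-v1", "json/jcs-rfc8785"}


def normalize_label(label: str) -> str:
    """Map any recognized spelling to the canonical label; reject the rest."""
    if label == CANONICAL_LABEL or label in ALIASES:
        return CANONICAL_LABEL
    raise ValueError(f"unknown canonicalization label: {label!r}")


def record_digest(envelope: dict) -> str:
    """Digest covers the record only; the label is sibling metadata."""
    normalize_label(envelope["canonicalization"])  # fail early on unknown labels
    data = json.dumps(envelope["record"], sort_keys=True,
                      separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    return "sha256:" + hashlib.sha256(data).hexdigest()


record = {"classifiedDelta": "external-reach-added", "policyId": "policy-example-1"}
old = {"canonicalization": "json/jcs-rfc8785", "record": record}
new = {"canonicalization": "jcs-rfc8785", "record": record}

print(normalize_label(old["canonicalization"]))
print(record_digest(old) == record_digest(new))  # restamping leaves the digest alone
```

Keeping the label outside the hashed bytes is what makes the convergence cheap: an emitter can restamp to the canonical spelling and every previously cited address still recomputes.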
Agreed, and jcs-rfc8785 is the stronger choice on the merits: naming RFC 8785 directly beats the generic "JCS" or a version suffix. One thing worth knowing before it sets: there's another independent implementer in this space (observed-effect evidence, separate axis from yours but riding the same envelope and canonicalization) who's already pinned a v0 with jcs-json-v1 as their canonical. Same convergence instinct, different string. So there are genuinely two "canonicals" forming on the same substrate right now. I don't have a hard stake in which wins; it's outside the digest on my side, so I'll stamp whatever the canonical lands on and keep the rest as aliases. But since the whole point was one label so new implementers don't guess, it'd be worth the canonical being agreed across implementations, not just within each extension. Happy to connect you with the other implementer if useful, or you can all converge on the thread; either way I'll align to the result.
Working on runtime MCP security and wanted to surface a gap for discussion.
Admission-time checks (trust roots, signed assertions, server attestation) verify a server when you connect. But they don't catch tools that change behavior AFTER admission — a read-only tool that later adds export effects, PII data classes, or escalates externality from internal to external. The server identity is unchanged, so the admission check still passes.
Is post-admission tool drift considered in-scope for the spec / Security IG, or is it expected to live in a runtime monitoring layer outside the protocol?
I've been building an open-source implementation focused on this (continuous baseline + drift detection with severity-based quarantine) and would be happy to share findings or contribute if it's relevant to the group's direction.