title: Safety
description: MCP risk-model and sandbox safety signals - L0–L5 risk distribution, tool-poisoning hits, content-hash tamper rejects, HITL approval rate, and sandbox guard blocks, all from lifetime counters.

Purpose - the security view across the MCP risk model and the JS tool sandbox. It rolls up every safety signal the runtime emits - per-call risk levels, tool-description poisoning scans, content-hash tamper detection, human-in-the-loop approvals, and sandbox guard rejections - into one dashboard, so an operator can answer “is the agent being fed anything dangerous, and did the guards catch it?” at a glance.
Two streams, both lifetime (not windowed):
- SystemMetricsSnapshot - saip.risk.signal (grouped by signal type), saip.tool.risk (grouped by composed level L0–L5), mcp.hitl.decision (grouped by outcome), and sandbox.guard.blocked (grouped by reason). The saip.* counters are emitted by McpRiskSignalLogger (the risk-signal sink) and McpToolObservationFilter.
- McpRiskEventRingBuffer - the most recent risk events (server/tool risk computed, floor override, hash mismatch, composition lifecycle, poisoning hit) with their type and summary.

See MCP Server Safety for the risk model these signals come from.
Shares the Observability global settings, but the KPI cards and bar charts are lifetime counters - the time-window preset does not scope them. Only the Recent risk events timeline reflects recency (most recent 50).
| Card | Shows | Source |
|---|---|---|
| Risk signals | Σ all saip.risk.signal events (lifetime) | saip.risk.signal counter |
| Tamper rejects | hash-ledger-mismatch - a default/exposed tool’s content hash changed since first seen (TOFU) | saip.risk.signal{type=hash-ledger-mismatch} |
| Poisoning hits | poisoning-hit - a tool description/schema matched a prompt-injection pattern | saip.risk.signal{type=poisoning-hit} |
| Floor overrides | floor-override-triggered - a risk floor rule forced a higher level | saip.risk.signal{type=floor-override-triggered} |
| HITL approval rate | % approved of all human-in-the-loop decisions | mcp.hitl.decision counter |
| Sandbox guard blocks | Σ sandbox.guard.blocked (SSRF + filesystem policy rejections) | sandbox.guard.blocked counter |

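The card values above are simple roll-ups of the grouped lifetime counters. A minimal sketch of that arithmetic follows, assuming each counter is available as a plain mapping from group key to count. The function name `kpi_cards`, the dictionary shapes, and the sample numbers are hypothetical and only illustrate the calculation; they are not the runtime's actual API.

```python
from typing import Mapping, Optional


def kpi_cards(
    risk_signals: Mapping[str, int],    # saip.risk.signal grouped by type
    hitl_decisions: Mapping[str, int],  # mcp.hitl.decision grouped by outcome
    guard_blocks: Mapping[str, int],    # sandbox.guard.blocked grouped by reason
) -> dict:
    """Derive the KPI card values from grouped lifetime counters."""
    approved = hitl_decisions.get("approved", 0)
    total_decisions = sum(hitl_decisions.values())

    # Assumption: every outcome (approved, declined, denied, elicit-failed)
    # counts toward the denominator. With no decisions yet there is no rate.
    approval_rate: Optional[float] = (
        100.0 * approved / total_decisions if total_decisions else None
    )

    return {
        "risk_signals": sum(risk_signals.values()),
        "tamper_rejects": risk_signals.get("hash-ledger-mismatch", 0),
        "poisoning_hits": risk_signals.get("poisoning-hit", 0),
        "floor_overrides": risk_signals.get("floor-override-triggered", 0),
        "hitl_approval_rate": approval_rate,
        "sandbox_guard_blocks": sum(guard_blocks.values()),
    }


# Illustrative (made-up) counter values.
cards = kpi_cards(
    risk_signals={"server-risk-computed": 4, "poisoning-hit": 1, "hash-ledger-mismatch": 2},
    hitl_decisions={"approved": 3, "declined": 1},
    guard_blocks={"private-ip": 2, "host-not-in-allowlist": 5},
)
print(cards)
```

Returning `None` rather than 0 when no decisions have been recorded keeps "no data yet" distinguishable from "everything was rejected".
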
| Chart | Type | Reading |
|---|---|---|
| Risk signals by type | Horizontal bar | saip.risk.signal grouped by type - server-risk-computed, tool-publish-risk-computed, floor-override-triggered, hash-ledger-mismatch, composition-lifecycle, poisoning-hit |
| Risk level distribution | Horizontal bar, L0→L5 in order | Final composed risk level of each executed MCP tool call (saip.tool.risk). L0 verified · L1 safe · L2 low · L3 moderate · L4 high · L5 critical |
| HITL decisions | Horizontal bar | mcp.hitl.decision outcomes (chat-side + MCP-server-side): approved / declined / denied / elicit-failed |
| Sandbox guard blocks | Horizontal bar | sandbox.guard.blocked by reason: host-not-in-allowlist, private-ip, too-many-redirects, body-too-large, … |
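The risk level chart lists L0 through L5 in a fixed order, whether or not every level has been observed. A minimal sketch of that ordering follows, assuming the saip.tool.risk counter is exposed as a mapping keyed by the strings "L0" to "L5". That key format and the `risk_distribution` helper are hypothetical; the labels are the ones listed in the table above.

```python
from typing import List, Mapping, Tuple

# Level labels as listed in the chart description, in display order.
RISK_LEVELS = {
    "L0": "verified",
    "L1": "safe",
    "L2": "low",
    "L3": "moderate",
    "L4": "high",
    "L5": "critical",
}


def risk_distribution(counts: Mapping[str, int]) -> List[Tuple[str, str, int]]:
    """Return one (level, label, count) row per level, always L0 -> L5.

    Levels with no recorded calls are kept with a count of 0 so the bars
    stay in a stable position.
    """
    return [(level, label, counts.get(level, 0)) for level, label in RISK_LEVELS.items()]


# Illustrative (made-up) lifetime counts.
rows = risk_distribution({"L3": 2, "L1": 10, "L5": 1})
for level, label, count in rows:
    print(f"{level} {label:<9} {count}")
```
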
Recent risk events - a scrollable timeline of the latest events from McpRiskEventRingBuffer, each row showing time, a type badge (warn-tinted for failures), and a one-line summary. Populated as MCP servers and tools are registered, exposed, composed, or fail an integrity/poisoning check.
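A bounded "most recent 50" timeline behaves like a ring buffer: once it is full, each new event evicts the oldest one. The sketch below illustrates that behaviour with Python's `collections.deque`. The `RiskEvent` and `RiskEventRingBuffer` classes here are hypothetical stand-ins written for this example, not the actual McpRiskEventRingBuffer implementation.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class RiskEvent:
    time: datetime
    type: str      # e.g. "poisoning-hit", "hash-ledger-mismatch"
    summary: str   # one-line description shown in the timeline row


class RiskEventRingBuffer:
    """Keeps only the newest `capacity` events; older ones are dropped."""

    def __init__(self, capacity: int = 50) -> None:
        # deque(maxlen=...) discards from the opposite end when full.
        self._events = deque(maxlen=capacity)

    def record(self, type: str, summary: str, time: Optional[datetime] = None) -> None:
        self._events.append(RiskEvent(time or datetime.now(timezone.utc), type, summary))

    def recent(self) -> List[RiskEvent]:
        """Return the retained events, newest first."""
        return list(reversed(self._events))


buffer = RiskEventRingBuffer(capacity=50)
for i in range(60):
    buffer.record("server-risk-computed", f"event {i}")

timeline = buffer.recent()
print(len(timeline), timeline[0].summary, timeline[-1].summary)
```

Because the buffer is bounded, the timeline reflects recency only, while the counters in the cards and charts keep accumulating over the process lifetime.
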