feat: safety_check hook + safer_shell builtin by melmennaoui · Pull Request #3273 · docker/docker-agent

melmennaoui · 2026-06-26T14:40:13Z

Summary

New safety_check hook event fires before Decide() so its deny/ask verdicts preempt --yolo and permission allow-rules. allow is advisory; the pipeline still runs the rest of the chain.
New safer_shell builtin classifies shell commands against an embedded destructive/safe taxonomy. Destructive matches force confirmation with a blast_radius classification; ~70 known-safe reads (ls, git status, docker ps, kubectl get, …) flow through silently; everything else asks with blast_radius=unknown. Compound shell (a && b, a; b, a | b) never safe-matches.
safer: true on a shell toolset auto-injects the builtin via ApplyAgentDefaults. The TUI confirmation dialog renders a composed warning block (⚠ Destructive command — high blast radius + matched reason) when blast_radius is present, instead of dumping raw key/value pairs. Hook output uses feat: add key/value metadata to tool-call confirmation events #3256's existing Metadata map — no bespoke types.

Test plan

task lint
go test ./pkg/hooks/... ./pkg/runtime/... ./pkg/tools/... ./pkg/agent/... ./pkg/config/... ./pkg/tui/dialog/...
Manual: docker-agent run examples/shell_safer.yaml with rm -rf /tmp/toto → warning block + forced confirmation; git status → no prompt
CI green

Introduces a pre-Decide() hook event whose verdicts can preempt --yolo and permission allow-rules, plus a builtin that classifies shell commands against an embedded destructive/safe taxonomy. The event piggy-backs on the existing hook Metadata facility (added in #3256) to surface structured context (blast_radius, category, reason) to the tool-call confirmation event — no bespoke types. Event: - hooks.EventSafetyCheck constant. Fires once per tool call BEFORE the deterministic approval pipeline (--yolo, permission patterns, pre_tool_use). Its verdicts (Deny / Ask / Allow / no-opinion) are the only ones that can override auto-approval rules. - HooksConfig.SafetyCheck []HookMatcherConfig (tool-matched, like pre_tool_use); compileEvents + IsEmpty updated. - aggregate() handles EventSafetyCheck: most-restrictive Decision wins (Deny > Ask > Allow), Deny flips Allowed=false to short-circuit the pipeline, Allow is intentionally advisory (the call still flows through Decide() / pre_tool_use), Metadata merged across hooks via last-writer-wins per key. - failClosed(EventSafetyCheck) returns true so a hook crash denies the call rather than silently allowing it through, matching the pre_tool_use posture. Toolset YAML knob: - Toolset.Safer *bool gates the feature per-toolset; validation rejects safer:true on non-shell toolsets. AgentConfig.SaferShellEnabled aggregates across an agent's toolsets — one match anywhere flips the agent-level flag. - agent-schema.json carries both the new safety_check event entry and the safer field on Toolset. Builtin: - pkg/hooks/builtins/safer_shell.go. Priority order: destructive match → Ask + Metadata{blast_radius, category, reason}; safe match → nil (no opinion, falls through to the regular approval pipeline); no match → Ask + blast_radius=unknown. Filters by tool name internally; non-shell calls are no-op so the builtin co-exists cleanly with any other safety_check hook. - Compound shell (a && b, a; b, a | b) never safe-matches — the matcher rejects safe matches on strings containing shell separators. A destructive segment inside a chain still matches (the destructive regex spans the whole command). - safety_patterns.json carries two sections. `destructive` is the taxonomy from the original safer-shell prototype (rm -rf, docker volume rm, mkfs, dd, …) with blast_radius tiers. `safe` is a new ~70-entry allowlist of known read-only command shapes (ls, cat, git status/diff/log/show, docker ps/images/logs/inspect, kubectl get/describe/logs, grep, rg, ps, df, echo, …) so common reads flow through without prompting. - AgentDefaults.SaferShell + ApplyAgentDefaults auto-injects the builtin under safety_check with matcher "*". Mirrors the redact_secrets sugar; idempotent against an explicit YAML entry. Tests: - pkg/hooks/builtins/safer_shell_test.go: destructive taxonomy fixtures, safe-list pass-through across ~50 fixtures, compound-shell fall-through to ask, cmd/command alias key, no-op contracts, unknown-radius fallback, ApplyAgentDefaults sugar. - pkg/hooks/aggregate_test.go: safety_check Deny / Allow / Metadata merge cases. Docs: - docs/configuration/hooks/index.md gains the safety_check event row, the safer_shell builtin row, the per-event input fields table entry, and a "Safety-Check Specific Output" section documenting the Metadata convention. - docs/tools/shell/index.md gains a Safer mode section. No behavior change for agents without safer:true on a shell toolset.

…e through agent loader Wires the safety_check event into the dispatcher's approval pipeline and threads the per-toolset safer:true flag through the agent loader so the safer_shell builtin actually fires when opted in. Dispatcher (pkg/runtime/toolexec/dispatcher.go): - New Stage 0 in approveAndRun: dispatch EventSafetyCheck before Decide(), --yolo, and pre_tool_use. Verdicts: Deny → notifyApproval(Deny, safety_check_hook_deny) + errorResponse with the decision reason; permission_request and the rest of the pipeline are skipped entirely. Ask → askUser(); permission_request is also skipped on this path so a policy-level allow there can't override the security verdict. Allow → advisory; pipeline still runs Decide() / pre_tool_use. no-op → fall through unchanged. - New ApprovalSourceSafetyCheckHookDeny constant for the deny path. - consultSafetyCheck caches the hook result on the call struct so approveAndRun + askUser don't dispatch twice. The cached result is also the source of safety_check Metadata. - confirmationMetadata merge order is now tool static < permission_request < safety_check. safety_check wins on key clashes because it carries a security verdict that a policy-level permission_request hook must not silently overwrite. Agent layer (pkg/agent + pkg/teamloader + pkg/runtime/hooks.go): - agent.WithSaferShell opt + Agent.SaferShell() accessor + private saferShell field, mirroring the WithRedactSecrets shape. - teamloader: agent.WithSaferShell(agentConfig.SaferShellEnabled()) alongside the existing WithRedactSecrets call. - buildHooksExecutors threads Agent.SaferShell() into builtins.AgentDefaults so ApplyAgentDefaults wires up the safety_check entry. Tests (pkg/runtime/toolexec/dispatcher_test.go): - SafetyCheckAskPreemptsYolo: an Ask verdict routes to user confirmation despite ToolsApproved=true. Verifies the hook's Metadata reaches the confirmation event. - SafetyCheckDenyShortCircuits: Deny goes straight to errorResponse without emitting a confirmation event. - SafetyCheckAllowIsAdvisory: Allow lets --yolo auto-run the call.

When the tool-call confirmation event carries the safer_shell builtin's `blast_radius` metadata, the confirmation dialog now composes a user-facing warning block instead of dumping the raw `blast_radius: high` / `category: fs-delete` / `reason: ...` rows. Before: Metadata blast_radius: high (with "high" colored red) category: fs-delete reason: Command matches destructive operation: rm -rf <path> After: ⚠ Destructive command — high blast radius Command matches destructive operation: rm -rf <path> Implementation: - New renderSafetyWarning composes the warning block from the safer_shell convention keys (blast_radius, category, reason) when blast_radius is present. The blast-radius badge keeps the existing color mapping (green/yellow/red/muted for low/medium/high/unknown). - renderMetadata suppresses the convention keys ONLY when blast_radius is also present. `reason` and `category` stay generic key names that a permission_request hook can use for unrelated purposes — without a blast_radius classification they render as plain pairs, so existing permission_request consumers aren't affected. - Non-convention metadata (e.g. team_policy, ticket) keeps rendering as plain pairs below the warning block when both sources contribute. Four new tests pin the cases: * RendersSafetyWarning — warning block fires; convention keys are suppressed from Metadata. * RendersSafetyWarningPlusExtraMetadata — extra permission_request keys still render as plain pairs. * ReasonOutsideSafetyVerdictRendersPlain — `reason` without blast_radius renders as a generic key. * (Existing RendersMetadata test still passes — its `reason` fixture stays generic because blast_radius isn't in play.) Verified manually with `docker-agent run examples/shell_safer.yaml` + a destructive rm -rf against /tmp.

Adds an example agent that wires the safer_shell builtin under the safety_check hook event by hand, demonstrating the manual form of what `safer: true` on a shell toolset auto-injects. Mirrors the redact_secrets / redact_secrets_hooks split: one example for the sugar, one showing the explicit hook entry. Tests/CI: TestLoadExamples covers the new file via the teamloader's example loader.

melmennaoui requested a review from a team as a code owner June 26, 2026 14:40

melmennaoui force-pushed the feat/safety-check-hook branch from 0cd381a to 399d3a6 Compare June 26, 2026 14:44

melmennaoui added 4 commits June 26, 2026 16:48

melmennaoui force-pushed the feat/safety-check-hook branch from 399d3a6 to bff4fa0 Compare June 26, 2026 14:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: safety_check hook + safer_shell builtin#3273

feat: safety_check hook + safer_shell builtin#3273
melmennaoui wants to merge 4 commits into
mainfrom
feat/safety-check-hook

melmennaoui commented Jun 26, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

melmennaoui commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

melmennaoui commented Jun 26, 2026 •

edited

Loading