feat: safety_check hook + safer_shell builtin#3273
Open
melmennaoui wants to merge 4 commits into
Open
Conversation
0cd381a to
399d3a6
Compare
Introduces a pre-Decide() hook event whose verdicts can preempt --yolo and permission allow-rules, plus a builtin that classifies shell commands against an embedded destructive/safe taxonomy. The event piggy-backs on the existing hook Metadata facility (added in #3256) to surface structured context (blast_radius, category, reason) to the tool-call confirmation event — no bespoke types. Event: - hooks.EventSafetyCheck constant. Fires once per tool call BEFORE the deterministic approval pipeline (--yolo, permission patterns, pre_tool_use). Its verdicts (Deny / Ask / Allow / no-opinion) are the only ones that can override auto-approval rules. - HooksConfig.SafetyCheck []HookMatcherConfig (tool-matched, like pre_tool_use); compileEvents + IsEmpty updated. - aggregate() handles EventSafetyCheck: most-restrictive Decision wins (Deny > Ask > Allow), Deny flips Allowed=false to short-circuit the pipeline, Allow is intentionally advisory (the call still flows through Decide() / pre_tool_use), Metadata merged across hooks via last-writer-wins per key. - failClosed(EventSafetyCheck) returns true so a hook crash denies the call rather than silently allowing it through, matching the pre_tool_use posture. Toolset YAML knob: - Toolset.Safer *bool gates the feature per-toolset; validation rejects safer:true on non-shell toolsets. AgentConfig.SaferShellEnabled aggregates across an agent's toolsets — one match anywhere flips the agent-level flag. - agent-schema.json carries both the new safety_check event entry and the safer field on Toolset. Builtin: - pkg/hooks/builtins/safer_shell.go. Priority order: destructive match → Ask + Metadata{blast_radius, category, reason}; safe match → nil (no opinion, falls through to the regular approval pipeline); no match → Ask + blast_radius=unknown. Filters by tool name internally; non-shell calls are no-op so the builtin co-exists cleanly with any other safety_check hook. - Compound shell (a && b, a; b, a | b) never safe-matches — the matcher rejects safe matches on strings containing shell separators. A destructive segment inside a chain still matches (the destructive regex spans the whole command). - safety_patterns.json carries two sections. `destructive` is the taxonomy from the original safer-shell prototype (rm -rf, docker volume rm, mkfs, dd, …) with blast_radius tiers. `safe` is a new ~70-entry allowlist of known read-only command shapes (ls, cat, git status/diff/log/show, docker ps/images/logs/inspect, kubectl get/describe/logs, grep, rg, ps, df, echo, …) so common reads flow through without prompting. - AgentDefaults.SaferShell + ApplyAgentDefaults auto-injects the builtin under safety_check with matcher "*". Mirrors the redact_secrets sugar; idempotent against an explicit YAML entry. Tests: - pkg/hooks/builtins/safer_shell_test.go: destructive taxonomy fixtures, safe-list pass-through across ~50 fixtures, compound-shell fall-through to ask, cmd/command alias key, no-op contracts, unknown-radius fallback, ApplyAgentDefaults sugar. - pkg/hooks/aggregate_test.go: safety_check Deny / Allow / Metadata merge cases. Docs: - docs/configuration/hooks/index.md gains the safety_check event row, the safer_shell builtin row, the per-event input fields table entry, and a "Safety-Check Specific Output" section documenting the Metadata convention. - docs/tools/shell/index.md gains a Safer mode section. No behavior change for agents without safer:true on a shell toolset.
…e through agent loader
Wires the safety_check event into the dispatcher's approval pipeline
and threads the per-toolset safer:true flag through the agent loader
so the safer_shell builtin actually fires when opted in.
Dispatcher (pkg/runtime/toolexec/dispatcher.go):
- New Stage 0 in approveAndRun: dispatch EventSafetyCheck before
Decide(), --yolo, and pre_tool_use. Verdicts:
Deny → notifyApproval(Deny, safety_check_hook_deny) +
errorResponse with the decision reason; permission_request
and the rest of the pipeline are skipped entirely.
Ask → askUser(); permission_request is also skipped on this
path so a policy-level allow there can't override the
security verdict.
Allow → advisory; pipeline still runs Decide() / pre_tool_use.
no-op → fall through unchanged.
- New ApprovalSourceSafetyCheckHookDeny constant for the deny path.
- consultSafetyCheck caches the hook result on the call struct so
approveAndRun + askUser don't dispatch twice. The cached result
is also the source of safety_check Metadata.
- confirmationMetadata merge order is now tool static < permission_request
< safety_check. safety_check wins on key clashes because it carries
a security verdict that a policy-level permission_request hook
must not silently overwrite.
Agent layer (pkg/agent + pkg/teamloader + pkg/runtime/hooks.go):
- agent.WithSaferShell opt + Agent.SaferShell() accessor + private
saferShell field, mirroring the WithRedactSecrets shape.
- teamloader: agent.WithSaferShell(agentConfig.SaferShellEnabled())
alongside the existing WithRedactSecrets call.
- buildHooksExecutors threads Agent.SaferShell() into
builtins.AgentDefaults so ApplyAgentDefaults wires up the
safety_check entry.
Tests (pkg/runtime/toolexec/dispatcher_test.go):
- SafetyCheckAskPreemptsYolo: an Ask verdict routes to user
confirmation despite ToolsApproved=true. Verifies the hook's
Metadata reaches the confirmation event.
- SafetyCheckDenyShortCircuits: Deny goes straight to errorResponse
without emitting a confirmation event.
- SafetyCheckAllowIsAdvisory: Allow lets --yolo auto-run the call.
When the tool-call confirmation event carries the safer_shell
builtin's `blast_radius` metadata, the confirmation dialog now
composes a user-facing warning block instead of dumping the raw
`blast_radius: high` / `category: fs-delete` / `reason: ...` rows.
Before:
Metadata
blast_radius: high (with "high" colored red)
category: fs-delete
reason: Command matches destructive operation: rm -rf <path>
After:
⚠ Destructive command — high blast radius
Command matches destructive operation: rm -rf <path>
Implementation:
- New renderSafetyWarning composes the warning block from the
safer_shell convention keys (blast_radius, category, reason) when
blast_radius is present. The blast-radius badge keeps the existing
color mapping (green/yellow/red/muted for low/medium/high/unknown).
- renderMetadata suppresses the convention keys ONLY when
blast_radius is also present. `reason` and `category` stay
generic key names that a permission_request hook can use for
unrelated purposes — without a blast_radius classification they
render as plain pairs, so existing permission_request consumers
aren't affected.
- Non-convention metadata (e.g. team_policy, ticket) keeps
rendering as plain pairs below the warning block when both
sources contribute.
Four new tests pin the cases:
* RendersSafetyWarning — warning block fires; convention keys
are suppressed from Metadata.
* RendersSafetyWarningPlusExtraMetadata — extra
permission_request keys still render as plain pairs.
* ReasonOutsideSafetyVerdictRendersPlain — `reason` without
blast_radius renders as a generic key.
* (Existing RendersMetadata test still passes — its `reason`
fixture stays generic because blast_radius isn't in play.)
Verified manually with `docker-agent run examples/shell_safer.yaml`
+ a destructive rm -rf against /tmp.
Adds an example agent that wires the safer_shell builtin under the safety_check hook event by hand, demonstrating the manual form of what `safer: true` on a shell toolset auto-injects. Mirrors the redact_secrets / redact_secrets_hooks split: one example for the sugar, one showing the explicit hook entry. Tests/CI: TestLoadExamples covers the new file via the teamloader's example loader.
399d3a6 to
bff4fa0
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
safety_checkhook event fires beforeDecide()so itsdeny/askverdicts preempt--yoloand permission allow-rules.allowis advisory; the pipeline still runs the rest of the chain.safer_shellbuiltin classifies shell commands against an embedded destructive/safe taxonomy. Destructive matches force confirmation with ablast_radiusclassification; ~70 known-safe reads (ls,git status,docker ps,kubectl get, …) flow through silently; everything else asks withblast_radius=unknown. Compound shell (a && b,a; b,a | b) never safe-matches.safer: trueon a shell toolset auto-injects the builtin viaApplyAgentDefaults. The TUI confirmation dialog renders a composed warning block (⚠ Destructive command — high blast radius+ matched reason) whenblast_radiusis present, instead of dumping raw key/value pairs. Hook output uses feat: add key/value metadata to tool-call confirmation events #3256's existingMetadatamap — no bespoke types.Test plan
task lintgo test ./pkg/hooks/... ./pkg/runtime/... ./pkg/tools/... ./pkg/agent/... ./pkg/config/... ./pkg/tui/dialog/...docker-agent run examples/shell_safer.yamlwithrm -rf /tmp/toto→ warning block + forced confirmation;git status→ no prompt