Skip to content

feat: safety_check hook + safer_shell builtin#3273

Open
melmennaoui wants to merge 4 commits into
mainfrom
feat/safety-check-hook
Open

feat: safety_check hook + safer_shell builtin#3273
melmennaoui wants to merge 4 commits into
mainfrom
feat/safety-check-hook

Conversation

@melmennaoui

@melmennaoui melmennaoui commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Summary

  • New safety_check hook event fires before Decide() so its deny/ask verdicts preempt --yolo and permission allow-rules. allow is advisory; the pipeline still runs the rest of the chain.
  • New safer_shell builtin classifies shell commands against an embedded destructive/safe taxonomy. Destructive matches force confirmation with a blast_radius classification; ~70 known-safe reads (ls, git status, docker ps, kubectl get, …) flow through silently; everything else asks with blast_radius=unknown. Compound shell (a && b, a; b, a | b) never safe-matches.
  • safer: true on a shell toolset auto-injects the builtin via ApplyAgentDefaults. The TUI confirmation dialog renders a composed warning block (⚠ Destructive command — high blast radius + matched reason) when blast_radius is present, instead of dumping raw key/value pairs. Hook output uses feat: add key/value metadata to tool-call confirmation events #3256's existing Metadata map — no bespoke types.

Test plan

  • task lint
  • go test ./pkg/hooks/... ./pkg/runtime/... ./pkg/tools/... ./pkg/agent/... ./pkg/config/... ./pkg/tui/dialog/...
  • Manual: docker-agent run examples/shell_safer.yaml with rm -rf /tmp/toto → warning block + forced confirmation; git status → no prompt
  • CI green

@melmennaoui melmennaoui requested a review from a team as a code owner June 26, 2026 14:40
@melmennaoui melmennaoui force-pushed the feat/safety-check-hook branch from 0cd381a to 399d3a6 Compare June 26, 2026 14:44
Introduces a pre-Decide() hook event whose verdicts can preempt --yolo
and permission allow-rules, plus a builtin that classifies shell
commands against an embedded destructive/safe taxonomy. The event
piggy-backs on the existing hook Metadata facility (added in #3256)
to surface structured context (blast_radius, category, reason) to the
tool-call confirmation event — no bespoke types.

Event:

- hooks.EventSafetyCheck constant. Fires once per tool call BEFORE
  the deterministic approval pipeline (--yolo, permission patterns,
  pre_tool_use). Its verdicts (Deny / Ask / Allow / no-opinion) are
  the only ones that can override auto-approval rules.
- HooksConfig.SafetyCheck []HookMatcherConfig (tool-matched, like
  pre_tool_use); compileEvents + IsEmpty updated.
- aggregate() handles EventSafetyCheck: most-restrictive Decision
  wins (Deny > Ask > Allow), Deny flips Allowed=false to short-circuit
  the pipeline, Allow is intentionally advisory (the call still flows
  through Decide() / pre_tool_use), Metadata merged across hooks via
  last-writer-wins per key.
- failClosed(EventSafetyCheck) returns true so a hook crash denies
  the call rather than silently allowing it through, matching the
  pre_tool_use posture.

Toolset YAML knob:

- Toolset.Safer *bool gates the feature per-toolset; validation
  rejects safer:true on non-shell toolsets. AgentConfig.SaferShellEnabled
  aggregates across an agent's toolsets — one match anywhere flips
  the agent-level flag.
- agent-schema.json carries both the new safety_check event entry and
  the safer field on Toolset.

Builtin:

- pkg/hooks/builtins/safer_shell.go. Priority order: destructive
  match → Ask + Metadata{blast_radius, category, reason}; safe match
  → nil (no opinion, falls through to the regular approval pipeline);
  no match → Ask + blast_radius=unknown. Filters by tool name
  internally; non-shell calls are no-op so the builtin co-exists
  cleanly with any other safety_check hook.
- Compound shell (a && b, a; b, a | b) never safe-matches — the
  matcher rejects safe matches on strings containing shell
  separators. A destructive segment inside a chain still matches
  (the destructive regex spans the whole command).
- safety_patterns.json carries two sections. `destructive` is the
  taxonomy from the original safer-shell prototype (rm -rf, docker
  volume rm, mkfs, dd, …) with blast_radius tiers. `safe` is a new
  ~70-entry allowlist of known read-only command shapes (ls, cat,
  git status/diff/log/show, docker ps/images/logs/inspect, kubectl
  get/describe/logs, grep, rg, ps, df, echo, …) so common reads
  flow through without prompting.
- AgentDefaults.SaferShell + ApplyAgentDefaults auto-injects the
  builtin under safety_check with matcher "*". Mirrors the
  redact_secrets sugar; idempotent against an explicit YAML entry.

Tests:

- pkg/hooks/builtins/safer_shell_test.go: destructive taxonomy
  fixtures, safe-list pass-through across ~50 fixtures, compound-shell
  fall-through to ask, cmd/command alias key, no-op contracts,
  unknown-radius fallback, ApplyAgentDefaults sugar.
- pkg/hooks/aggregate_test.go: safety_check Deny / Allow / Metadata
  merge cases.

Docs:

- docs/configuration/hooks/index.md gains the safety_check event
  row, the safer_shell builtin row, the per-event input fields
  table entry, and a "Safety-Check Specific Output" section
  documenting the Metadata convention.
- docs/tools/shell/index.md gains a Safer mode section.

No behavior change for agents without safer:true on a shell toolset.
…e through agent loader

Wires the safety_check event into the dispatcher's approval pipeline
and threads the per-toolset safer:true flag through the agent loader
so the safer_shell builtin actually fires when opted in.

Dispatcher (pkg/runtime/toolexec/dispatcher.go):

- New Stage 0 in approveAndRun: dispatch EventSafetyCheck before
  Decide(), --yolo, and pre_tool_use. Verdicts:
    Deny  → notifyApproval(Deny, safety_check_hook_deny) +
            errorResponse with the decision reason; permission_request
            and the rest of the pipeline are skipped entirely.
    Ask   → askUser(); permission_request is also skipped on this
            path so a policy-level allow there can't override the
            security verdict.
    Allow → advisory; pipeline still runs Decide() / pre_tool_use.
    no-op → fall through unchanged.
- New ApprovalSourceSafetyCheckHookDeny constant for the deny path.
- consultSafetyCheck caches the hook result on the call struct so
  approveAndRun + askUser don't dispatch twice. The cached result
  is also the source of safety_check Metadata.
- confirmationMetadata merge order is now tool static < permission_request
  < safety_check. safety_check wins on key clashes because it carries
  a security verdict that a policy-level permission_request hook
  must not silently overwrite.

Agent layer (pkg/agent + pkg/teamloader + pkg/runtime/hooks.go):

- agent.WithSaferShell opt + Agent.SaferShell() accessor + private
  saferShell field, mirroring the WithRedactSecrets shape.
- teamloader: agent.WithSaferShell(agentConfig.SaferShellEnabled())
  alongside the existing WithRedactSecrets call.
- buildHooksExecutors threads Agent.SaferShell() into
  builtins.AgentDefaults so ApplyAgentDefaults wires up the
  safety_check entry.

Tests (pkg/runtime/toolexec/dispatcher_test.go):

- SafetyCheckAskPreemptsYolo: an Ask verdict routes to user
  confirmation despite ToolsApproved=true. Verifies the hook's
  Metadata reaches the confirmation event.
- SafetyCheckDenyShortCircuits: Deny goes straight to errorResponse
  without emitting a confirmation event.
- SafetyCheckAllowIsAdvisory: Allow lets --yolo auto-run the call.
When the tool-call confirmation event carries the safer_shell
builtin's `blast_radius` metadata, the confirmation dialog now
composes a user-facing warning block instead of dumping the raw
`blast_radius: high` / `category: fs-delete` / `reason: ...` rows.

Before:

  Metadata
    blast_radius: high          (with "high" colored red)
    category: fs-delete
    reason: Command matches destructive operation: rm -rf <path>

After:

  ⚠  Destructive command — high blast radius
    Command matches destructive operation: rm -rf <path>

Implementation:

- New renderSafetyWarning composes the warning block from the
  safer_shell convention keys (blast_radius, category, reason) when
  blast_radius is present. The blast-radius badge keeps the existing
  color mapping (green/yellow/red/muted for low/medium/high/unknown).
- renderMetadata suppresses the convention keys ONLY when
  blast_radius is also present. `reason` and `category` stay
  generic key names that a permission_request hook can use for
  unrelated purposes — without a blast_radius classification they
  render as plain pairs, so existing permission_request consumers
  aren't affected.
- Non-convention metadata (e.g. team_policy, ticket) keeps
  rendering as plain pairs below the warning block when both
  sources contribute.

Four new tests pin the cases:

  * RendersSafetyWarning — warning block fires; convention keys
    are suppressed from Metadata.
  * RendersSafetyWarningPlusExtraMetadata — extra
    permission_request keys still render as plain pairs.
  * ReasonOutsideSafetyVerdictRendersPlain — `reason` without
    blast_radius renders as a generic key.
  * (Existing RendersMetadata test still passes — its `reason`
    fixture stays generic because blast_radius isn't in play.)

Verified manually with `docker-agent run examples/shell_safer.yaml`
+ a destructive rm -rf against /tmp.
Adds an example agent that wires the safer_shell builtin under the
safety_check hook event by hand, demonstrating the manual form of
what `safer: true` on a shell toolset auto-injects. Mirrors the
redact_secrets / redact_secrets_hooks split: one example for the
sugar, one showing the explicit hook entry.

Tests/CI: TestLoadExamples covers the new file via the teamloader's
example loader.
@melmennaoui melmennaoui force-pushed the feat/safety-check-hook branch from 399d3a6 to bff4fa0 Compare June 26, 2026 14:48
@aheritier aheritier added area/agent For work that has to do with the general agent loop/agentic features of the app area/config For configuration parsing, YAML, environment variables area/docs Documentation changes area/runtime Runtime engine, agent loop execution, tool dispatch, loop detection area/security Authentication, authorization, secrets, vulnerabilities area/tui For features/issues/fixes related to the TUI kind/feat PR adds a new feature (maps to feat:). Use on PRs only. labels Jun 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/agent For work that has to do with the general agent loop/agentic features of the app area/config For configuration parsing, YAML, environment variables area/docs Documentation changes area/runtime Runtime engine, agent loop execution, tool dispatch, loop detection area/security Authentication, authorization, secrets, vulnerabilities area/tui For features/issues/fixes related to the TUI kind/feat PR adds a new feature (maps to feat:). Use on PRs only.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants