FEAT text adaptive scenario #1760

Open
hannahwestra25 wants to merge 16 commits into microsoft:main from hannahwestra25:hawestra/text_adaptive_scenario

Conversation

@hannahwestra25
Contributor

@hannahwestra25 hannahwestra25 commented May 19, 2026

Add Adaptive Scenario Framework with TextAdaptive

Summary

Introduces an adaptive scenario framework that picks attack techniques per objective using an epsilon-greedy bandit informed by observed success rates, rather than running every selected technique against every objective. This concentrates spend on techniques that actually work against the target and stops each objective on its first success.

Adds:

  • AdaptiveScenario — modality-agnostic base class
  • TextAdaptive — concrete text-attack subclass
  • TechniqueSelector Protocol + EpsilonGreedyTechniqueSelector — pluggable selector with Laplace-smoothed estimates and pooled cross-context backoff
  • AdaptiveDispatchAttack — per-dataset dispatch strategy with per-call seed-group routing
  • Walkthrough notebook + .py doc
  • Unit tests (64 tests across selector, protocol, dispatcher, and scenario)

Motivation

Static scenarios are O(techniques × objectives): every technique runs against every objective regardless of whether earlier attempts already succeeded or whether the technique is known to be ineffective against the target. For evaluation runs with many techniques and many objectives, this wastes spend on combinations that aren't informative.

Adaptive scenarios reduce this to O(max_attempts × objectives) by:

  • learning from observed outcomes,
  • exploiting techniques that work on the target,
  • still exploring (with probability epsilon) so the table doesn't collapse onto a single technique prematurely,
  • stopping per-objective on first success.
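To make the cost difference concrete, here is a back-of-the-envelope sketch. The technique, objective, and attempt counts are made-up illustration values, not measurements from this PR, and the adaptive figure is an upper bound because early stopping can only reduce it.

```python
def static_attack_count(num_techniques: int, num_objectives: int) -> int:
    """Static scenario: every technique runs against every objective."""
    return num_techniques * num_objectives


def adaptive_attack_bound(max_attempts: int, num_objectives: int) -> int:
    """Adaptive scenario: at most max_attempts per objective (fewer with early stop)."""
    return max_attempts * num_objectives


# Hypothetical sizes chosen only for illustration.
static = static_attack_count(num_techniques=12, num_objectives=200)
adaptive = adaptive_attack_bound(max_attempts=3, num_objectives=200)
print(static, adaptive)  # 2400 600
```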

How it works

For each objective the dispatcher loops up to max_attempts_per_objective times:

  1. Select — with probability epsilon pick a random technique, otherwise pick the one with the highest Laplace-smoothed success estimate (s + 1) / (n + 1). Cells with fewer than pool_threshold local observations fall back to the technique's pooled rate across all contexts (cold-start handling). Each decision derives a per-decision RNG from SHA-256(random_seed|context|decision_key) for resume-safe reproducibility.
  2. Execute — run the chosen technique against the seed group (read from AdaptiveDispatchParams.seed_group), merging the technique's seed_technique if it declares one. Techniques incompatible with the current seed group are filtered per-call.
  3. Record — update the selector's (context, technique) → (successes, attempts) table and stop early on success.
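The three steps above can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: `SketchEpsilonGreedySelector`, `dispatch`, `run_technique`, the explicit `decision_key` argument, and the technique names are all hypothetical. The estimate uses the (s + 1) / (n + 1) formula exactly as stated in the description.

```python
import hashlib
import random


class SketchEpsilonGreedySelector:
    """Illustrative stand-in; not the PR's EpsilonGreedyTechniqueSelector."""

    def __init__(self, *, epsilon=0.2, pool_threshold=3, random_seed=0):
        self._epsilon = epsilon
        self._pool_threshold = pool_threshold
        self._random_seed = random_seed
        self._counts = {}  # (context, technique) -> [successes, attempts]

    def _estimate(self, context, technique):
        s, n = self._counts.get((context, technique), (0, 0))
        if n < self._pool_threshold:
            # Cold start: back off to the technique's pooled counts across all contexts.
            cells = [v for (_, t), v in self._counts.items() if t == technique]
            s, n = sum(c[0] for c in cells), sum(c[1] for c in cells)
        return (s + 1) / (n + 1)

    def _rng_for(self, context, decision_key):
        # Per-decision RNG seeded from SHA-256(random_seed|context|decision_key).
        raw = f"{self._random_seed}|{context}|{decision_key}".encode()
        return random.Random(int.from_bytes(hashlib.sha256(raw).digest()[:8], "big"))

    def select(self, *, context, techniques, decision_key):
        rng = self._rng_for(context, decision_key)
        if rng.random() < self._epsilon:
            return rng.choice(list(techniques))  # explore
        return max(techniques, key=lambda t: self._estimate(context, t))  # exploit

    def record_outcome(self, *, context, technique, success):
        cell = self._counts.setdefault((context, technique), [0, 0])
        cell[0] += int(success)
        cell[1] += 1


def dispatch(objective, *, selector, techniques, run_technique, context="global", max_attempts=3):
    """Select, execute, record; stop on the first success."""
    trail = []
    for attempt in range(max_attempts):
        name = selector.select(
            context=context, techniques=techniques, decision_key=f"{objective}:{attempt}"
        )
        success = run_technique(name, objective)
        selector.record_outcome(context=context, technique=name, success=success)
        trail.append((name, success))
        if success:
            break
    return trail


# Toy executor: only the hypothetical "role_play" technique ever succeeds.
selector = SketchEpsilonGreedySelector(epsilon=0.0, random_seed=42)
trail = dispatch(
    "objective-1",
    selector=selector,
    techniques=["crescendo", "role_play"],
    run_technique=lambda name, objective: name == "role_play",
)
print(trail)  # [('crescendo', False), ('role_play', True)]
```

With `epsilon=0.0` the run is purely greedy: both techniques start with an estimate of 1.0, the first one fails and drops to 0.5, and the second is tried next. Because the RNG is derived from the seed, context, and decision key rather than from shared mutable state, replaying the same decision yields the same draw.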

The selector is shared by reference across all dispatchers in a scenario run, so learning accumulates globally. The per-call context key is derived by a ContextExtractor; global_context (default) shares one table across all objectives, harm_category_context partitions by harm category.
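As a rough illustration of how the context key partitions the table, consider the sketch below. `ToySeedGroup`, the `*_sketch` extractors, and `record` are hypothetical stand-ins; the real ContextExtractor signature and the way multiple harm categories are combined into one key are assumptions here.

```python
from dataclasses import dataclass, field


@dataclass
class ToySeedGroup:
    """Hypothetical stand-in for a seed group; only the fields this sketch needs."""

    objective: str
    harm_categories: list[str] = field(default_factory=list)


def global_context_sketch(seed_group: ToySeedGroup) -> str:
    # One shared table: every objective maps to the same key.
    return "global"


def harm_category_context_sketch(seed_group: ToySeedGroup) -> str:
    # Partition by harm category; sorting keeps the key stable across orderings (assumed).
    return "|".join(sorted(seed_group.harm_categories)) or "uncategorized"


# (context, technique) -> [successes, attempts]; one dict shared by every caller.
counts: dict[tuple[str, str], list[int]] = {}


def record(extractor, seed_group, technique, success):
    cell = counts.setdefault((extractor(seed_group), technique), [0, 0])
    cell[0] += int(success)
    cell[1] += 1


hate = ToySeedGroup("obj-a", ["hate"])
violence = ToySeedGroup("obj-b", ["violence"])
record(harm_category_context_sketch, hate, "role_play", True)
record(harm_category_context_sketch, violence, "role_play", False)
print(counts)  # {('hate', 'role_play'): [1, 1], ('violence', 'role_play'): [0, 1]}
```

With the global extractor both outcomes would land in a single `("global", "role_play")` cell instead, which learns faster but cannot tell categories apart.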

Public API

from pyrit.scenario.scenarios.adaptive import (
    TextAdaptive,
    EpsilonGreedyTechniqueSelector,
    harm_category_context,
)

# Basic — uses default epsilon-greedy selector
scenario = TextAdaptive()
await scenario.initialize_async(objective_target=target)
result = await scenario.run_async()

# Tuned — custom selector + per-category learning
scenario = TextAdaptive(
    selector=EpsilonGreedyTechniqueSelector(epsilon=0.3, random_seed=42),
    context_extractor=harm_category_context,
)
scenario.set_params_from_args(args={"max_attempts_per_objective": 5})
await scenario.initialize_async(objective_target=target)
result = await scenario.run_async()

Adaptive scenarios are also resumable — pass scenario_result_id="..." to the constructor and prior dispatch trails are replayed into the selector before the remaining objectives run.
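The replay step might look roughly like the sketch below. Plain dicts stand in for stored attack results, so the `trail` key and the per-step fields are hypothetical; only the idea of filtering on `attribution_data["parent_collection"]` and replaying through `record_outcome` comes from this description.

```python
from collections import Counter


class CountingSelector:
    """Minimal stand-in exposing only record_outcome."""

    def __init__(self):
        self.successes = Counter()
        self.attempts = Counter()

    def record_outcome(self, *, context, technique, success):
        self.attempts[(context, technique)] += 1
        self.successes[(context, technique)] += int(success)


def rehydrate(selector, stored_results, *, parent_collection):
    """Replay prior dispatch trails that belong to the given collection."""
    replayed = 0
    for result in stored_results:
        if result["attribution_data"].get("parent_collection") != parent_collection:
            continue  # e.g. a baseline row; not part of the adaptive trail
        for step in result["trail"]:
            selector.record_outcome(
                context=step["context"], technique=step["technique"], success=step["success"]
            )
            replayed += 1
    return replayed


# Hypothetical persisted rows from an interrupted run.
stored = [
    {
        "attribution_data": {"parent_collection": "adaptive"},
        "trail": [
            {"context": "global", "technique": "crescendo", "success": False},
            {"context": "global", "technique": "role_play", "success": True},
        ],
    },
    {
        "attribution_data": {"parent_collection": "baseline"},
        "trail": [{"context": "global", "technique": "prompt_sending", "success": False}],
    },
]

resumed = CountingSelector()
replayed = rehydrate(resumed, stored, parent_collection="adaptive")
print(replayed)  # 2
```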

Notes

  • BASELINE_ATTACK_POLICY = Enabled: prompt_sending is excluded from the adaptive technique pool and runs as the baseline comparison instead. This separates "what does the target do unprovoked" (baseline) from "what adversarial moves help" (adaptive techniques).
  • Per-dataset atomic attacks — one AtomicAttack per dataset carrying all seed groups, with per-call seed-group routing via AdaptiveDispatchParams. Per-call compatibility filtering happens inside the dispatcher.
  • Selector as constructor kwarg: the scenario accepts selector: TechniqueSelector | None. When None (default), an EpsilonGreedyTechniqueSelector is created with default settings. Selector-specific params (epsilon, pool_threshold, random_seed) live on the selector, not the scenario. max_attempts_per_objective is a scenario parameter via supported_parameters().
  • Resume rehydration — queries get_attack_results(scenario_result_id=...) and filters by attribution_data["parent_collection"] to replay prior dispatch trails via record_outcome. Already-completed atomics are skipped by the base Scenario resume path.
  • Two-row persistence per success — the inner technique persists its raw AttackResult via its own post-execute hook; the dispatcher returns a replace-based copy with a fresh attack_result_id/timestamp and the adaptive trail stamped onto metadata. Both rows share conversation_id.
  • Thread safety: EpsilonGreedyTechniqueSelector guards its counts table with a threading.Lock so individual select / record_outcome operations are atomic.
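For the thread-safety note, a lock-guarded counts table can be sketched as below. `LockedCounts` is a hypothetical stand-in rather than the selector from this PR; it only shows why the read-modify-write in record_outcome needs the lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedCounts:
    """Counts table whose updates are serialized by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}  # (context, technique) -> (successes, attempts)

    def record_outcome(self, *, context, technique, success):
        # Read-modify-write must happen under the lock or updates can be lost.
        with self._lock:
            s, n = self._counts.get((context, technique), (0, 0))
            self._counts[(context, technique)] = (s + int(success), n + 1)

    def snapshot(self):
        with self._lock:
            return dict(self._counts)


table = LockedCounts()
with ThreadPoolExecutor(max_workers=8) as pool:
    for i in range(1000):
        pool.submit(
            table.record_outcome, context="global", technique="role_play", success=(i % 2 == 0)
        )
print(table.snapshot())  # {('global', 'role_play'): (500, 1000)}
```

Note that the lock makes each call atomic on its own; a select followed by a later record_outcome is still two separate operations, so other threads may update the table in between.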

Files

| Area | File |
| --- | --- |
| Base scenario | pyrit/scenario/scenarios/adaptive/adaptive_scenario.py |
| Text subclass | pyrit/scenario/scenarios/adaptive/text_adaptive.py |
| Selector protocol + context extractors | pyrit/scenario/scenarios/adaptive/selectors/protocol.py |
| Epsilon-greedy selector | pyrit/scenario/scenarios/adaptive/selectors/epsilon_greedy.py |
| Per-dataset dispatcher | pyrit/scenario/scenarios/adaptive/dispatcher.py |
| Package wiring | pyrit/scenario/scenarios/adaptive/__init__.py, pyrit/scenario/scenarios/adaptive/selectors/__init__.py |
| Walkthrough | doc/code/scenarios/3_adaptive_scenarios.py, doc/code/scenarios/3_adaptive_scenarios.ipynb |
| Tests | tests/unit/scenario/scenarios/adaptive/test_epsilon_greedy.py, tests/unit/scenario/scenarios/adaptive/test_protocol.py, tests/unit/scenario/scenarios/adaptive/test_dispatcher.py, tests/unit/scenario/scenarios/adaptive/test_text_adaptive.py |

Testing

pytest tests/unit/scenario/scenarios/adaptive/ — 64 tests pass. Coverage includes:

  • selector exploration / exploitation / cold-start / pooled backoff / concurrent record_outcome
  • protocol conformance, context extractor coverage
  • dispatcher early-stop, max-attempts retry, label propagation, context routing, fresh-result invariant, per-call compatibility filtering
  • scenario per-dataset atomics, shared selector, harm-category partitioning, seed-technique filtering, resume rehydration, baseline-policy enforcement, params via supported_parameters()

Comment on lines +80 to +87
return [
"airt_hate",
"airt_fairness",
"airt_violence",
"airt_sexual",
"airt_harassment",
"airt_misinformation",
"airt_leakage",
Contributor Author

we may want a separate set of datasets

Comment thread pyrit/scenario/scenarios/adaptive/adaptive_scenario.py
Contributor

@rlundeen2 rlundeen2 left a comment

Looking great!

Comment thread pyrit/scenario/scenarios/adaptive/adaptive_scenario.py Outdated
Comment thread pyrit/scenario/scenarios/adaptive/adaptive_scenario.py Outdated
Comment thread pyrit/scenario/scenarios/adaptive/adaptive_scenario.py Outdated
Comment thread pyrit/scenario/scenarios/adaptive/adaptive_scenario.py Outdated

techniques = self._build_techniques_dict(objective_target=self._objective_target)

selector = AdaptiveTechniqueSelector(
Contributor

We may want to name this more specifically, because I could envision different types of selectors also. But maybe this is a future problem. Nit only

Contributor

Agree, and I'd actually push slightly harder than "nit" — this is cheap to fix now and expensive to fix later (the class name is part of the public __init__.py surface).

Concrete suggestion:

  1. Rename the concrete class to EpsilonGreedyTechniqueSelector (or EpsilonGreedySelector). The current name describes what role it plays in the scenario, not what algorithm it implements — and the algorithm is what callers will care about when picking between selectors.

  2. Extract a TechniqueSelector Protocol (or ABC) capturing the surface the dispatcher actually depends on — just select(*, context, techniques) -> str and record_outcome(*, context, technique, success). The dispatcher and AdaptiveScenario type-hint against the Protocol, the concrete class is one implementation.

  3. Plumb selector choice as a constructor arg. AdaptiveScenario.__init__(..., selector: TechniqueSelector | None = None) defaulting to EpsilonGreedyTechniqueSelector(epsilon=..., pool_threshold=..., rng=...). That immediately unlocks future selectors (UCB1, Thompson sampling, contextual bandits, even a plug-in for a tuned policy) without subclassing AdaptiveScenario.

  4. The rehydration hook (_rehydrate_selector_from_memory) needs to become selector-aware, since UCB-style selectors care about timestamps, Thompson sampling needs Beta posterior parameters, etc. For v1 you can document that only epsilon-greedy is rehydratable and others start fresh — but it's worth a TODO so it doesn't surprise the next contributor.

Steps 1–3 are mechanical and worth doing in this PR (keeps the public API stable for v1). Step 4 is genuinely future work.

The epsilon/pool_threshold constructor args on AdaptiveScenario then become awkward — they're epsilon-greedy specific. Either:

  • Drop them from the scenario constructor entirely and require callers wanting non-defaults to pass a constructed selector=..., or
  • Keep them as ergonomic shortcuts that only apply to the default selector (raise if selector= is also passed with these).

I'd go with the first — simpler contract, and once you have selector= as the extension point, the kwargs are redundant sugar.
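A minimal sketch of what steps 2 and 3 could look like, assuming the two-method surface named above. `TechniqueSelectorSketch`, `FirstTechniqueSelector`, and `make_selector` are hypothetical names, the `str` context type is an assumption, and the trivial default stands in for the epsilon-greedy default proposed in step 3.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class TechniqueSelectorSketch(Protocol):
    """Structural type: any object with these two methods conforms."""

    def select(self, *, context: str, techniques: Sequence[str]) -> str: ...

    def record_outcome(self, *, context: str, technique: str, success: bool) -> None: ...


class FirstTechniqueSelector:
    """Trivial selector that always picks the first technique; no inheritance needed."""

    def __init__(self):
        self.history = []

    def select(self, *, context, techniques):
        return techniques[0]

    def record_outcome(self, *, context, technique, success):
        self.history.append((context, technique, success))


def make_selector(selector: "TechniqueSelectorSketch | None" = None) -> TechniqueSelectorSketch:
    # Mirrors the proposed `selector: TechniqueSelector | None = None` constructor default.
    return selector if selector is not None else FirstTechniqueSelector()


chosen = make_selector()
print(isinstance(chosen, TechniqueSelectorSketch))  # True
print(chosen.select(context="global", techniques=["a", "b"]))  # a
```

The `runtime_checkable` isinstance check only verifies that the methods exist, not their signatures, so static type checking remains the main guarantee.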

Contributor Author

I updated this to take the selector as an input and created an epsilon-greedy-specific selector (with a base selector class). I added max_attempts_per_objective to supported_parameters() but removed everything else, so users can now either create a selector or get epsilon-greedy as the default. This does make it a little less flexible, in that you can't customize the epsilon-greedy selector without creating another object and passing it in, but I think it's better than having these as params, and I like supporting different selectors. wdyt?

Comment thread pyrit/scenario/scenarios/adaptive/adaptive_scenario.py Outdated
Comment thread pyrit/scenario/scenarios/adaptive/selectors/epsilon_greedy.py
Comment thread pyrit/scenario/scenarios/adaptive/adaptive_scenario.py
hannahwestra25 and others added 2 commits May 21, 2026 15:37
- Remove prompt_sending from adaptive pool; enable baseline comparison
- Expose max_attempts_per_objective via supported_parameters() (scam.py pattern)
- Rename AdaptiveTechniqueSelector -> EpsilonGreedyTechniqueSelector
- Extract TechniqueSelector Protocol; accept custom selector via kwarg
- Per-decision RNG derivation (SHA-256) for resume reproducibility
- Drop uuid.uuid4() fallback for objective IDs
- Per-dataset atomic attacks (one AtomicAttack per dataset, not per objective)
- AdaptiveDispatchParams with per-call seed_group and compatibility filtering
- Context extraction moved to dispatcher
- Rehydration uses get_attack_results with attribution_data filtering
- Split selector.py into selectors/ folder (protocol.py + epsilon_greedy.py)
- Update notebooks for new API patterns

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -0,0 +1,66 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

Contributor Author

I don't love this file name, so if anyone has suggestions, please comment!

Contributor

selection_protocol.py?

hannahwestra25 and others added 2 commits May 21, 2026 16:59
- SIM108: use ternary for selector assignment
- D101: add docstring to AdaptiveDispatchParams
- DOC201/DOC501: add Returns/Raises sections to docstrings
- TC003: move Sequence import into TYPE_CHECKING block
- Fix trailing newline in epsilon_greedy.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

techniques = self._build_techniques_dict(objective_target=self._objective_target)

selector: TechniqueSelector
Contributor

Nits and nonblocking:

  • Does a selector belong to a scenario or is it meant to be a runtime parameter? It uses the same pattern as the scorer and context extractor but seems more intrinsic to the scenario than the other two
  • What does selector: TechniqueSelector do? Is it just for type checking?
  • Is EpsilonGreedyTechniqueSelector guaranteed to work as a default for all subclasses of AdaptiveScenario?


3 participants