FEAT text adaptive scenario #1760
Open
hannahwestra25 wants to merge 17 commits into microsoft:main from hannahwestra25:hawestra/text_adaptive_scenario
83a778e init commit (hannahwestra25)
d74fe3e Merge branch 'main' of https://github.com/microsoft/PyRIT into hawest… (hannahwestra25)
09e3007 merge (hannahwestra25)
70d14c4 proofread (hannahwestra25)
3df5787 pr review (hannahwestra25)
eb68c1b Merge branch 'main' of https://github.com/microsoft/PyRIT into hawest… (hannahwestra25)
b794db0 generalize and clean up comments & notebooks (hannahwestra25)
2c06a24 pre-commit (hannahwestra25)
11b39a0 integrate attack technique group (hannahwestra25)
61a1b7d clean up and fix docstrings (hannahwestra25)
32d8b5e simplify notebook and pre-commit (hannahwestra25)
1375974 Merge branch 'main' of https://github.com/microsoft/PyRIT into hawest… (hannahwestra25)
4d5c2de Merge remote-tracking branch 'upstream/main' into hannahwestra25/feat… (hannahwestra25)
f86c191 feat: address PR #1760 review feedback (hannahwestra25)
9e38a33 Merge remote-tracking branch 'upstream/main' into hannahwestra25/feat… (hannahwestra25)
26cd65e fix: address pre-commit lint failures (hannahwestra25)
b4db6a6 Redesign TechniqueSelector: stateless, memory-backed, eval-hash keyed (hannahwestra25)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,265 @@ | ||
| { | ||
| "cells": [ | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "94e7f44a", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "# Adaptive Scenarios\n", | ||
| "\n", | ||
| "An **adaptive scenario** doesn't run every attack technique against every objective.\n", | ||
| "Instead, it picks which technique to try next per-objective, learns from what worked,\n", | ||
| "and stops as soon as one technique succeeds. This concentrates spend on techniques\n", | ||
| "that actually work on your target.\n", | ||
| "\n", | ||
| "## How it works (high level)\n", | ||
| "\n", | ||
| "For each objective, the scenario tries up to `max_attempts_per_objective` techniques:\n", | ||
| "\n", | ||
| "- With probability `epsilon`, it **explores** — picks a random technique.\n", | ||
| "- Otherwise it **exploits** — picks the technique with the highest observed success\n", | ||
| " rate so far.\n", | ||
| "- It records the outcome and stops early on success.\n", | ||
| "\n", | ||
| "Unseen techniques are tried first, so the first few objectives effectively round-robin\n", | ||
| "through every technique before the scenario settles on the best performers.\n", | ||
| "\n", | ||
| "## Adaptive vs. static scenarios\n", | ||
| "\n", | ||
| "| Feature | Static scenarios | Adaptive scenarios |\n", | ||
| "|---------------------|-----------------------------------|------------------------------------|\n", | ||
| "| Technique selection | Run every selected technique | Pick per-objective from outcomes |\n", | ||
| "| Early stopping | No | Yes — stops on first success |\n", | ||
| "| Cost | O(techniques × objectives) | O(max_attempts × objectives) |\n", | ||
| "\n", | ||
| "`AdaptiveScenario` is the modality-agnostic base class.\n", | ||
| "[`TextAdaptive`](../../../pyrit/scenario/scenarios/adaptive/text_adaptive.py) is the\n", | ||
| "text subclass used in the examples below." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "cb716650", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Setup" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "4b536900", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "from pathlib import Path\n", | ||
| "\n", | ||
| "from pyrit.registry import TargetRegistry\n", | ||
| "from pyrit.scenario import DatasetConfiguration\n", | ||
| "from pyrit.scenario.printer.console_printer import ConsoleScenarioResultPrinter\n", | ||
| "from pyrit.scenario.scenarios.adaptive import TextAdaptive\n", | ||
| "from pyrit.setup import initialize_from_config_async\n", | ||
| "\n", | ||
| "await initialize_from_config_async(config_path=Path(\"../../scanner/pyrit_conf.yaml\")) # type: ignore\n", | ||
| "\n", | ||
| "objective_target = TargetRegistry.get_registry_singleton().get_instance_by_name(\"openai_chat\")\n", | ||
| "printer = ConsoleScenarioResultPrinter()" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "9f9ff786", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Basic usage\n", | ||
| "\n", | ||
| "Defaults: `max_attempts_per_objective=3`, an epsilon-greedy selector with `epsilon=0.2`,\n", | ||
| "and the subclass's default datasets." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "33aa89d3", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "scenario = TextAdaptive()\n", | ||
| "\n", | ||
| "await scenario.initialize_async( # type: ignore\n", | ||
| " objective_target=objective_target,\n", | ||
| ")\n", | ||
| "result = await scenario.run_async() # type: ignore\n", | ||
| "await printer.write_async(result) # type: ignore" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "5083bbed", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Configuring a run\n", | ||
| "\n", | ||
| "- **`max_attempts_per_objective`** — caps techniques tried per objective. Higher means\n", | ||
| " more chances to succeed and more API calls. Set via `set_params_from_args`.\n", | ||
| "- **`selector`** — a pre-built `TechniqueSelector` instance. Pass an\n", | ||
| " `EpsilonGreedyTechniqueSelector(epsilon=..., random_seed=...)`\n", | ||
| " to tune the selection algorithm. Defaults to an epsilon-greedy selector with\n", | ||
| " `epsilon=0.2`.\n", | ||
| "- **`scenario_strategies`** (on `initialize_async`) — restricts which techniques the\n", | ||
| " selector can pick from. Use `TextAdaptive.get_strategy_class()` to access the enum.\n", | ||
| "\n", | ||
| "The cell below exercises all three at once and also passes a `dataset_config` to limit which datasets are used." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "db966395", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "from pyrit.scenario.scenarios.adaptive import EpsilonGreedyTechniqueSelector\n", | ||
| "\n", | ||
| "strategy_class = TextAdaptive.get_strategy_class()\n", | ||
| "\n", | ||
| "configured_scenario = TextAdaptive(\n", | ||
| " selector=EpsilonGreedyTechniqueSelector(\n", | ||
| " epsilon=0.3,\n", | ||
| " random_seed=42,\n", | ||
| " ),\n", | ||
| ")\n", | ||
| "configured_scenario.set_params_from_args(args={\"max_attempts_per_objective\": 5})\n", | ||
| "\n", | ||
| "await configured_scenario.initialize_async( # type: ignore\n", | ||
| " objective_target=objective_target,\n", | ||
| " scenario_strategies=[strategy_class(\"single_turn\")],\n", | ||
| " dataset_config=DatasetConfiguration(\n", | ||
| " dataset_names=[\"airt_hate\", \"airt_violence\"],\n", | ||
| " max_dataset_size=4,\n", | ||
| " ),\n", | ||
| ")\n", | ||
| "configured_result = await configured_scenario.run_async() # type: ignore\n", | ||
| "await printer.write_async(configured_result) # type: ignore" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "ba7e7126", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Resuming a run\n", | ||
| "\n", | ||
| "Adaptive scenarios are resumable — pass `scenario_result_id=...` to the `TextAdaptive`\n", | ||
| "constructor and the run picks up where it left off. A resumed run must use the same\n", | ||
| "configuration as the original run, as the cell below does." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "4857bace", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "resumed_scenario = TextAdaptive(\n", | ||
| " selector=EpsilonGreedyTechniqueSelector(\n", | ||
| " epsilon=0.3,\n", | ||
| " random_seed=42,\n", | ||
| " ),\n", | ||
| " scenario_result_id=str(configured_result.id),\n", | ||
| ")\n", | ||
| "resumed_scenario.set_params_from_args(args={\"max_attempts_per_objective\": 5})\n", | ||
| "\n", | ||
| "await resumed_scenario.initialize_async( # type: ignore\n", | ||
| " objective_target=objective_target,\n", | ||
| " scenario_strategies=[strategy_class(\"single_turn\")],\n", | ||
| " dataset_config=DatasetConfiguration(\n", | ||
| " dataset_names=[\"airt_hate\", \"airt_violence\"],\n", | ||
| " max_dataset_size=4,\n", | ||
| " ),\n", | ||
| ")\n", | ||
| "resumed_result = await resumed_scenario.run_async() # type: ignore\n", | ||
| "await printer.write_async(resumed_result) # type: ignore" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "e267467c", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Inspecting which techniques were tried\n", | ||
| "\n", | ||
| "The dispatcher stamps every objective's `AttackResult.metadata` with:\n", | ||
| "\n", | ||
| "- `adaptive_attempts` — the ordered list of `{\"technique\", \"outcome\"}` dicts\n", | ||
| " recording exactly which techniques the selector picked and what happened.\n", | ||
| "\n", | ||
| "Walk that metadata to see the per-objective trail and aggregate counts." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "3a95436b", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "from collections import Counter\n", | ||
| "\n", | ||
| "# Per-objective trail\n", | ||
| "for results in resumed_result.attack_results.values():\n", | ||
| " for r in results:\n", | ||
| " attempts = r.metadata.get(\"adaptive_attempts\", [])\n", | ||
| " trail = \" → \".join(f\"{a['technique']}({a['outcome']})\" for a in attempts)\n", | ||
| " print(f\"[{r.outcome.value:7s}] {r.objective!r}: {trail}\")\n", | ||
| "\n", | ||
| "# Aggregate per-technique pick counts and success rate across the run\n", | ||
| "picks: Counter[str] = Counter()\n", | ||
| "wins: Counter[str] = Counter()\n", | ||
| "for results in resumed_result.attack_results.values():\n", | ||
| " for r in results:\n", | ||
| " for step in r.metadata.get(\"adaptive_attempts\", []):\n", | ||
| " picks[step[\"technique\"]] += 1\n", | ||
| " if step[\"outcome\"] == \"success\":\n", | ||
| " wins[step[\"technique\"]] += 1\n", | ||
| "\n", | ||
| "print(\"\\nTechnique wins / picks rate\")\n", | ||
| "for technique, n in picks.most_common():\n", | ||
| " print(f\"{technique:20s} {wins[technique]:>4} / {n:<4} {wins[technique] / n:.0%}\")" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "37cd0756", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Running from the scanner CLI\n", | ||
| "\n", | ||
| "You can run `TextAdaptive` directly from the `pyrit_scan` CLI without writing Python:\n", | ||
| "\n", | ||
| "```bash\n", | ||
| "# Basic run with defaults\n", | ||
| "pyrit_scan --scenario TextAdaptive --target openai_chat\n", | ||
| "\n", | ||
| "# Tune max attempts and restrict strategies\n", | ||
| "pyrit_scan --scenario TextAdaptive --target openai_chat \\\n", | ||
| " --params max_attempts_per_objective=5 \\\n", | ||
| " --strategies single_turn\n", | ||
| "\n", | ||
| "# Use specific datasets and limit size\n", | ||
| "pyrit_scan --scenario TextAdaptive --target openai_chat \\\n", | ||
| " --datasets airt_hate airt_violence \\\n", | ||
| " --max-dataset-size 10\n", | ||
| "```" | ||
| ] | ||
| } | ||
| ], | ||
| "metadata": { | ||
| "jupytext": { | ||
| "main_language": "python" | ||
| } | ||
| }, | ||
| "nbformat": 4, | ||
| "nbformat_minor": 5 | ||
| } | ||
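
The selection loop described in the notebook's "How it works" cell can be sketched in plain Python. This is a minimal illustration under stated assumptions, not PyRIT's `EpsilonGreedyTechniqueSelector`: the class `EpsilonGreedySketch`, the function `run_objective`, the `attempt` callback, and the `"failure"` outcome label are all hypothetical names introduced here. The sketch also assumes a technique is not retried within a single objective and that unseen techniques are taken in list order, neither of which the notebook states.

```python
import random
from collections import defaultdict


class EpsilonGreedySketch:
    """Toy epsilon-greedy picker (hypothetical; not the PyRIT selector)."""

    def __init__(self, techniques, epsilon=0.2, seed=None):
        if not 0.0 <= epsilon <= 1.0:
            raise ValueError("epsilon must be in [0, 1]")
        self.techniques = list(techniques)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.picks = defaultdict(int)  # times each technique was tried
        self.wins = defaultdict(int)   # times each technique succeeded

    def select(self, exclude=()):
        candidates = [t for t in self.techniques if t not in exclude]
        if not candidates:
            raise ValueError("no techniques left to try")
        # Unseen techniques go first (assumption: in list order).
        unseen = [t for t in candidates if self.picks[t] == 0]
        if unseen:
            return unseen[0]
        # Explore with probability epsilon ...
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)
        # ... otherwise exploit the best observed success rate.
        return max(candidates, key=lambda t: self.wins[t] / self.picks[t])

    def record(self, technique, success):
        self.picks[technique] += 1
        if success:
            self.wins[technique] += 1


def run_objective(selector, attempt, max_attempts=3):
    """Try up to max_attempts techniques on one objective; stop on success."""
    trail, tried = [], set()
    for _ in range(min(max_attempts, len(selector.techniques))):
        technique = selector.select(exclude=tried)
        tried.add(technique)
        success = attempt(technique)  # attempt is a caller-supplied callback
        selector.record(technique, success)
        # "failure" is an assumed label; the notebook only shows "success".
        trail.append({"technique": technique,
                      "outcome": "success" if success else "failure"})
        if success:
            break
    return trail
```

Each call to `run_objective` makes at most `max_attempts` calls to `attempt`, which is where the O(max_attempts × objectives) cost in the comparison table comes from. The returned trail mirrors the shape of the `adaptive_attempts` metadata that the notebook inspects.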