REFACTOR: unify error/blocked response scoring across scorers by romanlutz · Pull Request #1770 · microsoft/PyRIT

romanlutz · 2026-05-21T05:01:53Z

Description

Blocked (content-filtered) responses were being handled in three independent places that all converged on 0.0 / False by accident: the TrueFalseScorer no-pieces fallback, the FloatScaleScoreAggregator empty-list fallback, and TAP's error_score_map. Two scorers also had broken edge cases — SelfAskCategoryScorer would send error content to the LLM (likely raising InvalidJsonException or hallucinating a category), and ConversationScorer would crash with ValueError on any blocked turn.

This PR pushes blocked handling into the scorer base classes so the "correct" default behavior is what every scorer gets for free:

TrueFalseScorer defaults to False on blocked input (existing fallback, now load-bearing).
FloatScaleScorer defaults to 0.0 on blocked input (existing aggregator fallback, now load-bearing and documented).
SelfAskCategoryScorer's validator now rejects error pieces so it gets the same False default instead of asking the LLM about garbage.
ConversationScorer no longer rejects error pieces wholesale, so blocked turns mid-conversation are scored normally.
TAP's error_score_map is removed: it was added after the last release, was redundant with the new defaults, and only confused things further.

SelfAskRefusalScorer still inverts True/False semantics on blocked content (refusal detected → True). That is intentional, and it is the only "weird" default that a user must still wrap in TrueFalseInverterScorer when using it as an objective scorer.
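The unified defaults described above can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: `Piece`, `ToyScorer`, `ToyTrueFalseScorer`, and `ToyFloatScaleScorer` are hypothetical stand-ins rather than PyRIT's classes, and the `"blocked"` marker is an assumed way of representing a content-filtered response.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Piece:
    """Hypothetical stand-in for one response piece."""
    text: str
    error: str = "none"  # assumed marker; "blocked" models a content-filtered reply


class ToyScorer:
    """Toy base class: drop error pieces, fall back if nothing is left."""

    fallback: Union[bool, float]

    def score(self, pieces: List[Piece]) -> Union[bool, float]:
        supported = [p for p in pieces if p.error == "none"]
        if not supported:
            # Nothing scorable remains, so the type-level default applies.
            return self.fallback
        return self._score_supported(supported)

    def _score_supported(self, pieces: List[Piece]) -> Union[bool, float]:
        raise NotImplementedError


class ToyTrueFalseScorer(ToyScorer):
    fallback = False

    def _score_supported(self, pieces: List[Piece]) -> bool:
        return any("secret" in p.text for p in pieces)


class ToyFloatScaleScorer(ToyScorer):
    fallback = 0.0

    def _score_supported(self, pieces: List[Piece]) -> float:
        hits = sum("secret" in p.text for p in pieces)
        return hits / len(pieces)


blocked = [Piece(text="", error="blocked")]
print(ToyTrueFalseScorer().score(blocked))   # False
print(ToyFloatScaleScorer().score(blocked))  # 0.0
```

The point of the design is that subclasses only implement scoring for supported pieces; the blocked case is decided once, in the base class.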

Tests and Documentation

New tests for the unified defaults in test_float_scale_threshold_scorer.py, test_scorer.py, test_self_ask_category.py, test_conversation_history_scorer.py, and the existing TAP suite.
TAP notebook (doc/code/executor/attack/tap_attack.{py,ipynb}) re-executed end-to-end against the real Azure OpenAI GPT-4o strict-filter target and OpenAI image target. The new "The request was blocked by the target; returning 0.0." rationale now appears in the tree output, confirming the refactor fires in a live run.
Class docstrings on FloatScaleScorer and TrueFalseScorer updated and converted from RST to MyST markdown so they render correctly in the jupyter-book docs site.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Push consistent blocked / error handling into the scorer base classes so that:

* TrueFalseScorer returns Score(False) when no supported pieces remain
* FloatScaleScorer returns Score(0.0) when no supported pieces remain

Semantic overrides (e.g., SelfAskRefusalScorer returning True on blocked) stay in the subclass and continue to work as before.

Changes:

* pyrit/score/float_scale/float_scale_scorer.py: add _score_async override mirroring TrueFalseScorer's no-pieces fallback. Returns Score(0.0) with a rationale distinguishing blocked / error / filtered cases.
* pyrit/score/true_false/self_ask_category_scorer.py: restrict default validator to text-only so blocked pieces are filtered out (instead of sent to the LLM as garbage).
* pyrit/score/conversation_scorer.py: relax default validator (enforce_all_pieces_valid=False) so a blocked input message no longer raises ValueError before the conversation can even be looked up.
* pyrit/executor/attack/multi_turn/tree_of_attacks.py: hard-remove error_score_map plumbing (added since the last release, so no public callers exist). TAP now relies on the unified scorer defaults to produce 0.0 on blocked responses.
* Tests + docs updated to match.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
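The two validator changes in this commit message (text-only filtering, and relaxing enforce_all_pieces_valid) can be illustrated with a toy validator. This is a sketch, not PyRIT's validator: `ToyValidator`, `Piece`, and the `"text"` / `"error"` labels are assumptions; only the flag name `enforce_all_pieces_valid` is taken from the commit message.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Piece:
    data_type: str  # assumed labels: "text" for normal output, "error" for blocked
    value: str


class ToyValidator:
    """Hypothetical validator that keeps only supported piece types."""

    def __init__(self, supported_data_types: List[str], enforce_all_pieces_valid: bool = False):
        self.supported_data_types = supported_data_types
        self.enforce_all_pieces_valid = enforce_all_pieces_valid

    def filter(self, pieces: List[Piece]) -> List[Piece]:
        valid = [p for p in pieces if p.data_type in self.supported_data_types]
        if self.enforce_all_pieces_valid and len(valid) != len(pieces):
            # Strict mode: a single unsupported piece rejects the whole message.
            raise ValueError("message contains unsupported pieces")
        return valid


def raises_value_error(validator: ToyValidator, pieces: List[Piece]) -> bool:
    try:
        validator.filter(pieces)
    except ValueError:
        return True
    return False


mixed = [Piece("text", "hello"), Piece("error", "content filter triggered")]
relaxed = ToyValidator(["text"], enforce_all_pieces_valid=False)
strict = ToyValidator(["text"], enforce_all_pieces_valid=True)

print([p.value for p in relaxed.filter(mixed)])  # ['hello']
print(raises_value_error(strict, mixed))         # True
```

In relaxed mode a fully blocked message simply filters down to an empty list, which is the situation in which the base-class fallback score applies.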

PyRIT renders docstrings as MyST markdown via jupyter-book. The class docstrings for TrueFalseScorer and FloatScaleScorer used RST-only constructs that wouldn't render correctly:

* Section headers ('Default error / blocked behavior' + '----' underline) → bold heading
* :class:'~pyrit.X.Y.Z' Sphinx cross-references → plain code-span class names

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
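For illustration, the two docstring rewrites named in this commit message can be expressed as regular-expression substitutions. The helpers below are hypothetical and are not part of the PR (nothing in the source says the docstrings were converted by a script); they only show the before and after shapes of the text.

```python
import re

# Matches :class:`~pkg.mod.Name` as well as :class:`pkg.mod.Name`.
_CLASS_ROLE = re.compile(r":class:`~?([\w.]+)`")

# Matches a title line followed by an RST '---' underline.
_UNDERLINED = re.compile(r"^(?P<title>\S.*)\n-{3,}[ \t]*$", re.MULTILINE)


def rst_class_refs_to_code_spans(docstring: str) -> str:
    """Replace Sphinx :class: roles with a code span holding the short class name."""
    def _short_name(match: "re.Match[str]") -> str:
        return f"`{match.group(1).rsplit('.', 1)[-1]}`"
    return _CLASS_ROLE.sub(_short_name, docstring)


def rst_headers_to_bold(docstring: str) -> str:
    """Replace an underlined RST section header with a bold heading."""
    return _UNDERLINED.sub(lambda m: f"**{m.group('title')}**", docstring)


rst = (
    "Default error / blocked behavior\n"
    "--------------------------------\n"
    "See :class:`~pyrit.score.TrueFalseScorer`."
)
print(rst_headers_to_bold(rst_class_refs_to_code_spans(rst)))
```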

Move from-imports out of test bodies to module top, per review feedback. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Refresh tap_attack.ipynb outputs against the unified blocked-scoring path. The new "The request was blocked by the target; returning 0.0." rationale now appears in live TAP runs against the strict-filter target, replacing the old error_score_map message. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The kwarg was introduced after the last release, so no client code can still be passing it. The two guard tests only exercised Python's standard "unknown kwarg → TypeError" behavior; the parameter's absence is already guaranteed by the __init__ signatures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
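The standard behavior this commit message relies on is easy to reproduce: passing a keyword argument that is not in an `__init__` signature raises `TypeError`. The class below is a hypothetical stand-in, not the TAP attack class, and its `objective` parameter is invented for the example.

```python
class ToyAttack:
    """Hypothetical class whose __init__ no longer accepts error_score_map."""

    def __init__(self, *, objective: str) -> None:
        self.objective = objective


def rejects_removed_kwarg() -> bool:
    try:
        # The removed keyword is not in the signature, so Python itself rejects it.
        ToyAttack(objective="demo", error_score_map={"blocked": 0.0})
    except TypeError:
        return True
    return False


print(rejects_removed_kwarg())  # True
```

A test asserting this would only be re-testing the interpreter, which is the argument for deleting the guard tests.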

behnam-o

I like the unification by introducing the idea of fallback score, and handling in the superclass. I just think the fallback score, by definition, should always have a value, and it can't be optional, or None. But I might be missing nuances, so not blocking.
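One way to read this suggestion is as an abstract hook whose return type is a score rather than an optional score, so every concrete scorer must supply a real fallback. The sketch below assumes that shape: the method name `_build_fallback_score` appears in a later commit title in this PR, but its signature here, `ToyScore`, and the class names are assumptions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ToyScore:
    value: Union[bool, float]
    rationale: str


class ToyScorer(ABC):
    @abstractmethod
    def _build_fallback_score(self) -> ToyScore:
        """Return the score used when no supported pieces remain (never None)."""


class ToyFloatScaleScorer(ToyScorer):
    def _build_fallback_score(self) -> ToyScore:
        return ToyScore(0.0, "The request was blocked by the target; returning 0.0.")


class IncompleteScorer(ToyScorer):
    """Forgets to define a fallback, so it cannot be instantiated."""


def can_instantiate(cls: type) -> bool:
    try:
        cls()
    except TypeError:
        return False
    return True


print(can_instantiate(ToyFloatScaleScorer))  # True
print(can_instantiate(IncompleteScorer))     # False
```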

romanlutz · 2026-05-22T12:45:18Z

Self-correction: reverted the TrueFalseInverterScorer change from commit 5d0f5e7. On further review, the original passthrough-and-invert behavior is correct. When the inner scorer's fallback fires False ("attack did not succeed"), the inverter correctly inverts that to True under the inverter's flipped semantic ("attack failed" or "no compliance detected"), which is the right answer for blocked messages. Forcing the inverter's own fallback instead would have flipped this to a misleading False.

The ConversationScorer fix from the same commit is unaffected and still in place.
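The passthrough-and-invert reasoning in this comment can be checked with a toy model. All classes below are hypothetical stand-ins, not PyRIT's scorers, and `None` is an assumed way of modeling a blocked response.

```python
from typing import Optional


class ToyObjectiveScorer:
    """True means the attack succeeded; blocked input falls back to False."""

    def score(self, response: Optional[str]) -> bool:
        if response is None:  # None models a blocked response in this sketch
            return False
        return "sure, here is" in response.lower()


class ToyRefusalScorer:
    """True means a refusal was detected; a blocked response counts as one."""

    def score(self, response: Optional[str]) -> bool:
        if response is None:
            return True
        return "i can't help" in response.lower()


class ToyInverter:
    """Passes the input through to the inner scorer and flips its result."""

    def __init__(self, inner) -> None:
        self._inner = inner

    def score(self, response: Optional[str]) -> bool:
        return not self._inner.score(response)


# Inner fallback False ("attack did not succeed") becomes True ("attack failed").
print(ToyInverter(ToyObjectiveScorer()).score(None))  # True

# A refusal scorer used as an objective scorer: blocked maps to False.
print(ToyInverter(ToyRefusalScorer()).score(None))    # False
```

If the inverter applied its own False fallback instead of inverting the inner result, the first call would print False, which is the misleading outcome the revert avoids.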

Conflict in doc/code/executor/attack/tap_attack.ipynb resolved by taking main's version (incorporates the new pyrit.output printer module). The notebook can be re-executed in a follow-up to refresh cached rationale strings with the new blocked-fallback messages. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

romanlutz and others added 5 commits May 18, 2026 16:50

STYLE: hoist inline imports in score tests

cdc282b

romanlutz changed the title ~~[BREAKING] REFACTOR: unify error/blocked response scoring across scorers~~ REFACTOR: unify error/blocked response scoring across scorers May 21, 2026

hannahwestra25 self-assigned this May 21, 2026

hannahwestra25 reviewed May 21, 2026

Comment thread pyrit/score/float_scale/float_scale_scorer.py Outdated

hannahwestra25 reviewed May 21, 2026

Comment thread pyrit/score/float_scale/float_scale_scorer.py

hannahwestra25 reviewed May 21, 2026

Comment thread pyrit/score/float_scale/float_scale_scorer.py Outdated

hannahwestra25 approved these changes May 21, 2026

rlundeen2 self-assigned this May 21, 2026

rlundeen2 reviewed May 21, 2026

Comment thread pyrit/score/float_scale/float_scale_scorer.py

rlundeen2 approved these changes May 21, 2026

jsong468 reviewed May 21, 2026

Comment thread pyrit/score/float_scale/float_scale_scorer.py Outdated

jsong468 reviewed May 21, 2026

Comment thread pyrit/score/float_scale/float_scale_scorer.py Outdated

jsong468 reviewed May 21, 2026

Comment thread pyrit/score/float_scale/float_scale_scorer.py

jsong468 reviewed May 21, 2026

Comment thread pyrit/score/float_scale/float_scale_scorer.py Outdated

jsong468 approved these changes May 21, 2026

jsong468 self-assigned this May 21, 2026

romanlutz and others added 3 commits May 21, 2026 11:39

DOC: clarify FloatScaleScorer blocked-fallback docstring

ced931a

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

REFACTOR: extract blocked-fallback helper in FloatScaleScorer and TrueFalseScorer

aad26dc

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

REFACTOR: hoist blocked-fallback hook into base Scorer

a79e21b

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

behnam-o approved these changes May 21, 2026

Comment thread pyrit/score/scorer.py Outdated

romanlutz and others added 5 commits May 21, 2026 14:37

DOC: spell out blocked-fallback conditions in Score rationale

40bd2c7

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

REFACTOR: inline Score construction in TrueFalseScorer aggregator return

08e4827

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

REFACTOR: make Scorer._build_fallback_score abstract and document scorer hierarchy

48f2415

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

FIX: preserve full-conversation scoring on blocked turns and fix inverter fallback semantics

5d0f5e7

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

REVERT: restore TrueFalseInverterScorer passthrough-and-invert behavior

ac26486

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

romanlutz and others added 2 commits May 22, 2026 11:23

DOC: re-execute TAP notebook against live targets post-merge

c3e0a9a

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jsong468 reviewed May 22, 2026

Comment thread pyrit/score/conversation_scorer.py

jsong468 reviewed May 22, 2026

Comment thread pyrit/score/float_scale/float_scale_scorer.py Outdated

REFACTOR: emit per-category fallback scores in AzureContentFilterScorer

137ef1a

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

romanlutz added this pull request to the merge queue May 22, 2026

Merged via the queue into microsoft:main with commit 3eb316c May 22, 2026
48 checks passed

romanlutz deleted the romanlutz/unify-error-scoring branch May 22, 2026 22:36

romanlutz merged 16 commits into microsoft:main from romanlutz:romanlutz/unify-error-scoring

5 participants
