REFACTOR: unify error/blocked response scoring across scorers #1770
Push consistent blocked / error handling into the scorer base classes so that:

* TrueFalseScorer returns Score(False) when no supported pieces remain
* FloatScaleScorer returns Score(0.0) when no supported pieces remain

Semantic overrides (e.g., SelfAskRefusalScorer returning True on blocked) stay in the subclass and continue to work as before.

Changes:

* pyrit/score/float_scale/float_scale_scorer.py: add _score_async override mirroring TrueFalseScorer's no-pieces fallback. Returns Score(0.0) with a rationale distinguishing blocked / error / filtered cases.
* pyrit/score/true_false/self_ask_category_scorer.py: restrict default validator to text-only so blocked pieces are filtered out (instead of being sent to the LLM as garbage).
* pyrit/score/conversation_scorer.py: relax default validator (enforce_all_pieces_valid=False) so a blocked input message no longer raises ValueError before the conversation can even be looked up.
* pyrit/executor/attack/multi_turn/tree_of_attacks.py: hard-remove error_score_map plumbing (added since the last release, so no public callers exist). TAP now relies on the unified scorer defaults to produce 0.0 on blocked responses.
* Tests + docs updated to match.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
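The base-class fallback described in this commit can be pictured with a minimal, synchronous sketch. It is not PyRIT's implementation (the real scorers override the async _score_async and use PyRIT's own message and Score types). Piece, Score, ScorerSketch, TrueFalseSketch and FloatScaleSketch are hypothetical stand-ins; blocked responses are assumed to arrive as a piece with data_type "error" and response_error "blocked"; and only the 0.0 rationale wording comes from this PR, while the error and filtered wording is invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Piece:
    # Hypothetical stand-in for a message piece. Blocked responses are assumed
    # to arrive as data_type "error" with response_error "blocked".
    data_type: str
    value: str
    response_error: str = "none"


@dataclass
class Score:
    # Hypothetical stand-in, not PyRIT's Score class.
    value: object
    rationale: str


class ScorerSketch:
    # Value returned when no supported pieces remain; set by each subclass.
    fallback_value: object = None

    def __init__(self, supported_data_types: Sequence[str] = ("text",),
                 enforce_all_pieces_valid: bool = False) -> None:
        self.supported_data_types = tuple(supported_data_types)
        self.enforce_all_pieces_valid = enforce_all_pieces_valid

    def _filter(self, pieces: Sequence[Piece]) -> List[Piece]:
        # A text-only validator drops "error" pieces instead of sending them to an LLM.
        supported = [p for p in pieces if p.data_type in self.supported_data_types]
        if self.enforce_all_pieces_valid and len(supported) != len(pieces):
            # Strict mode: a single blocked piece rejects the whole message.
            raise ValueError("message contains unsupported pieces")
        return supported

    def _fallback_rationale(self, pieces: Sequence[Piece]) -> str:
        # Distinguish blocked / error / filtered cases.
        if any(p.response_error == "blocked" for p in pieces):
            reason = "The request was blocked by the target"
        elif any(p.data_type == "error" for p in pieces):
            reason = "The target returned an error"
        else:
            reason = "No supported pieces remained after filtering"
        return f"{reason}; returning {self.fallback_value}."

    def score(self, pieces: Sequence[Piece]) -> Score:
        supported = self._filter(pieces)
        if not supported:
            # Base-class fallback: every subclass inherits this behavior.
            return Score(self.fallback_value, self._fallback_rationale(pieces))
        return self._score_supported(supported)

    def _score_supported(self, pieces: List[Piece]) -> Score:
        raise NotImplementedError


class TrueFalseSketch(ScorerSketch):
    fallback_value = False

    def _score_supported(self, pieces: List[Piece]) -> Score:
        # Placeholder: a real scorer would judge the text, e.g. with an LLM.
        return Score(True, "placeholder judgement")


class FloatScaleSketch(ScorerSketch):
    fallback_value = 0.0

    def _score_supported(self, pieces: List[Piece]) -> Score:
        # Placeholder: a real scorer would return a value in [0.0, 1.0].
        return Score(1.0, "placeholder judgement")


blocked = [Piece("error", "content filter triggered", response_error="blocked")]
print(FloatScaleSketch().score(blocked).rationale)
# The request was blocked by the target; returning 0.0.
print(TrueFalseSketch().score(blocked).value)
# False
```

The point of the design is that filtering and fallback live in one place: a subclass only chooses its supported data types and whether enforce_all_pieces_valid is strict, and the False / 0.0 default follows automatically.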
PyRIT renders docstrings as MyST markdown via jupyter-book. The class docstrings for TrueFalseScorer and FloatScaleScorer used RST-only constructs that wouldn't render correctly:
* Section headers ('Default error / blocked behavior' + '----' underline) → bold heading
* :class:'~pyrit.X.Y.Z' Sphinx cross-references → plain code-span class names
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
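Both rewrites are mechanical enough to express as a small regex pass. The helper below is hypothetical and not something the PR adds; it assumes cross-references use the :class: role and that section underlines are runs of three or more hyphens.

```python
import re

# Hypothetical helper (not part of PyRIT): rewrites the two RST-only
# constructs named in the commit into MyST-friendly markdown.
_XREF = re.compile(r":class:`~?(?:[\w.]+\.)?(\w+)`")
_HEADER = re.compile(r"^(?P<title>[^\n]+)\n-{3,}[ \t]*$", re.MULTILINE)


def rst_to_myst(doc: str) -> str:
    # ":class:`~pyrit.X.Y.Z`" -> "`Z`": a plain code span holding the class name.
    doc = _XREF.sub(r"`\1`", doc)
    # "Title" underlined with "----" -> "**Title**" followed by a blank line,
    # so the bold heading does not merge into the next paragraph.
    doc = _HEADER.sub(lambda m: f"**{m.group('title').strip()}**\n", doc)
    return doc


rst = (
    "Scores a response.\n\n"
    "Default error / blocked behavior\n"
    "--------------------------------\n"
    "See :class:`~pyrit.X.Y.Z` for details.\n"
)
print(rst_to_myst(rst))
```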
Move from-imports out of test bodies to module top, per review feedback. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Refresh tap_attack.ipynb outputs against the unified blocked-scoring path. The new "The request was blocked by the target; returning 0.0." rationale now appears in live TAP runs against the strict-filter target, replacing the old error_score_map message. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The kwarg was introduced after the last release, so no client code can still be passing it. The two guard tests only exercised Python's standard "unknown kwarg → TypeError" behavior; the parameter's absence is already guaranteed by the __init__ signatures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
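The commit's reasoning, that a guard test for a removed keyword only re-tests the interpreter, fits in a few lines. PlaceholderScorer and the keyword name removed_kwarg are hypothetical; the commit message does not name the kwarg.

```python
class PlaceholderScorer:
    """Hypothetical class whose __init__ no longer declares a removed keyword."""

    def __init__(self, *, threshold: float = 0.5) -> None:
        self.threshold = threshold


def rejects_removed_kwarg() -> bool:
    # This is all a guard test for a removed parameter would verify:
    # Python itself rejects keywords the signature does not declare.
    try:
        PlaceholderScorer(threshold=0.7, removed_kwarg={})  # hypothetical name
    except TypeError as exc:
        print(exc)  # e.g. "...got an unexpected keyword argument 'removed_kwarg'"
        return True
    return False


print(rejects_removed_kwarg())  # True
```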
…eFalseScorer Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
behnam-o left a comment
I like the unification: introducing the idea of a fallback score and handling it in the superclass. I just think the fallback score, by definition, should always have a value; it can't be optional or None. But I might be missing nuances, so not blocking.
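The reviewer's suggestion can be made concrete: if the fallback is a required, non-Optional class attribute, a scorer that forgets it fails when the class is defined rather than when it first scores a blocked response. This is a hypothetical sketch of the suggestion (FallbackScorerBase and fallback_value are invented names), not what the PR merged.

```python
from typing import ClassVar, Union


class FallbackScorerBase:
    # Every concrete scorer must declare the score it returns when no
    # supported pieces remain; the annotation is deliberately not Optional.
    fallback_value: ClassVar[Union[bool, float]]

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # "is None" rather than a truthiness check: False and 0.0 are valid.
        if getattr(cls, "fallback_value", None) is None:
            raise TypeError(f"{cls.__name__} must define a non-None fallback_value")


class TrueFalseBase(FallbackScorerBase):
    fallback_value = False


class FloatScaleBase(FallbackScorerBase):
    fallback_value = 0.0


try:
    class MissingFallback(FallbackScorerBase):
        pass
except TypeError as exc:
    print(exc)  # MissingFallback must define a non-None fallback_value
```

The is None check matters: False and 0.0 are both falsy, so a truthiness test would wrongly reject the two legitimate defaults.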
…rer hierarchy Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rter fallback semantics Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Self-correction: reverted the TrueFalseInverterScorer change from commit 5d0f5e7. On further review, the original passthrough-and-invert behavior is correct. When the inner scorer's fallback fires … The ConversationScorer fix from the same commit is unaffected and still in place.
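What passthrough-and-invert does with blocked input can be traced with plain functions. These are hypothetical stand-ins, not PyRIT classes: they model only the two defaults the PR describes (False from the base TrueFalseScorer fallback, True from SelfAskRefusalScorer's override on blocked content), with placeholder string checks where a real scorer would call an LLM.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical model: each piece is (data_type, value); a blocked response is
# represented as a single ("error", ...) piece.
Pieces = Sequence[Tuple[str, str]]


def generic_true_false(pieces: Pieces) -> bool:
    # Base-class default from this PR: no supported (text) pieces -> False.
    texts = [value for data_type, value in pieces if data_type == "text"]
    if not texts:
        return False
    return any("objective met" in t for t in texts)  # placeholder judgement


def refusal_like(pieces: Pieces) -> bool:
    # Semantic override kept in the subclass: blocked content counts as a refusal.
    texts = [value for data_type, value in pieces if data_type == "text"]
    if not texts:
        return True
    return any("cannot help" in t for t in texts)  # placeholder judgement


def inverted(inner: Callable[[Pieces], bool]) -> Callable[[Pieces], bool]:
    # Passthrough-and-invert: the inner result, fallback included, is negated.
    def score(pieces: Pieces) -> bool:
        return not inner(pieces)
    return score


blocked = [("error", "content filter triggered")]
print(inverted(generic_true_false)(blocked))  # True
print(inverted(refusal_like)(blocked))        # False
```

Inverting the refusal-style scorer maps a blocked response to False, which matches the PR's note that SelfAskRefusalScorer must be wrapped in TrueFalseInverterScorer when used as an objective scorer.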
Conflict in doc/code/executor/attack/tap_attack.ipynb resolved by taking main's version (incorporates the new pyrit.output printer module). The notebook can be re-executed in a follow-up to refresh cached rationale strings with the new blocked-fallback messages. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Description
Blocked (content-filtered) responses were being handled in three independent places that all converged on 0.0/False by accident: the TrueFalseScorer no-pieces fallback, the FloatScaleScoreAggregator empty-list fallback, and TAP's error_score_map. Two scorers also had broken edge cases: SelfAskCategoryScorer would send error content to the LLM (likely raising InvalidJsonException or hallucinating a category), and ConversationScorer would crash with ValueError on any blocked turn.

This PR pushes blocked handling into the scorer base classes so the "correct" default behavior is what every scorer gets for free:
* TrueFalseScorer defaults to False on blocked input (existing fallback, now load-bearing).
* FloatScaleScorer defaults to 0.0 on blocked input (existing aggregator fallback, now load-bearing and documented).
* SelfAskCategoryScorer's validator now rejects error pieces, so it gets the same False default instead of asking the LLM about garbage.
* ConversationScorer no longer rejects error pieces wholesale, so blocked turns mid-conversation are scored normally.
* error_score_map is removed: it was added after the last release, was redundant with the new defaults, and only confused things further.

SelfAskRefusalScorer still inverts True/False semantics on blocked content (refusal detected → True); that's intentional and the only "weird" default a user must still wrap in TrueFalseInverterScorer when using it as an objective scorer.

Tests and Documentation
* test_float_scale_threshold_scorer.py, test_scorer.py, test_self_ask_category.py, test_conversation_history_scorer.py, and the existing TAP suite.
* TAP notebook (doc/code/executor/attack/tap_attack.{py,ipynb}) re-executed end-to-end against the real Azure OpenAI GPT-4o strict-filter target and OpenAI image target. The new "The request was blocked by the target; returning 0.0." rationale now appears in the tree output, confirming the refactor fires in a live run.
* Class docstrings for FloatScaleScorer and TrueFalseScorer updated and converted from RST to MyST markdown so they render correctly in the jupyter-book docs site.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>