FEAT: Round Robin Target #1761
    intersected = _intersect_capabilities([t.capabilities for t in targets])

    super().__init__(
        custom_configuration=TargetConfiguration(capabilities=intersected),
Do we also need to handle normalization for each target? If the targets have different normalization, we may be able to adapt, and then the capability is supported.
I think that, as best we can in the context of round-robining, we should enforce that the targets share the same behavior (e.g., adapt vs. raise for certain capabilities); otherwise it doesn't make much sense to wrap them in this target. That said, I think allowing a user to override this with custom_configuration, as with other targets (and trusting their judgment), makes the most sense. Let me know if that makes sense to you!
Sure, that makes sense, especially for a first pass, and we can reassess later. But we still need to compare the configurations: if one target has a policy of RAISE and the other ADAPT, the capability could still be the same. (So we should also rename the intersection function to intersect_configuration.)
    Returns:
        PromptTarget: The next inner target.
    """
    idx = self._rotation[self._counter % len(self._rotation)]
I think we need some logic for when one of the targets fails.
I think leaving retry logic to the inner targets makes sense and is simplest. (Retries would occur in the inner target's _send_prompt_to_target_async before any response propagates to the round-robin target level.) Any legitimate response errors would be constructed as a Message, saved to memory as with any other target, and used in scoring and in determining whether to continue the attack.
It's more about the case where the target itself fails: if an endpoint goes down or exhausts its rate limit, the inner target will exhaust its retries and raise, but then we keep handing it jobs. Agreed that inner-target retries are the right layer for transient errors. Since the round-robin counter advances before the call, subsequent caller-level retries may land on the same broken target again. We could just have a simple fallback to the next target in the rotation on exception: the inner target tries its best, and if it truly can't recover, the round-robin tries a different endpoint instead of propagating the failure.
Good point. I'm introducing logic in _send_prompt_to_target_async so that normal retries on the inner target still occur, but if those retries are exhausted, the other targets are tried in rotation order rather than the exception being raised immediately.
Note that the failed target can still be tried later in the normal rotation as requests are sent (think of scoring a batch of prompts, for example), and I think that is expected behavior. Figuring out whether an endpoint is actually permanently down (i.e., marking it unhealthy and never trying it again) and removing it from the rotation is, I imagine, beyond the scope of this PR. But let me know if you think otherwise.
Ideally, a user should remove an endpoint if it is permanently down.
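
The fallback behavior discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `FakeTarget`, `RoundRobinSketch`, and `_fallback_order` are hypothetical names, and the sketch assumes an inner target raises only after its own retries are exhausted.

```python
import asyncio


class FakeTarget:
    """Hypothetical stand-in for an inner target; not a PyRIT class."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    async def send(self, prompt: str) -> str:
        # Assumption: by the time this raises, the inner target has
        # already exhausted its own retries.
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        return f"{self.name}: {prompt}"


class RoundRobinSketch:
    """Illustrative only; the PR's actual implementation may differ."""

    def __init__(self, targets: list[FakeTarget], weights: list[int]) -> None:
        self._targets = targets
        # weights=[2, 1] expands to the rotation [0, 0, 1].
        self._rotation = [i for i, w in enumerate(weights) for _ in range(w)]
        self._counter = 0

    def _fallback_order(self) -> list[int]:
        # Advance the counter synchronously, then list each distinct
        # target once, starting from the selected rotation slot.
        start = self._counter % len(self._rotation)
        self._counter += 1
        rotated = self._rotation[start:] + self._rotation[:start]
        return list(dict.fromkeys(rotated))

    async def send(self, prompt: str) -> str:
        last_error = None
        for idx in self._fallback_order():
            try:
                return await self._targets[idx].send(prompt)
            except Exception as exc:  # inner retries already exhausted
                last_error = exc
        # Every target failed, so surface the most recent failure.
        assert last_error is not None
        raise last_error


async def demo() -> list[str]:
    east = FakeTarget("east", healthy=False)
    west = FakeTarget("west")
    rr = RoundRobinSketch([east, west], weights=[2, 1])
    return [await rr.send(f"p{i}") for i in range(3)]


results = asyncio.run(demo())
print(results)
```

In this sketch the broken target stays in the rotation and is retried on later calls, which matches the behavior described above; removing unhealthy endpoints is left to the user.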
Copilot noted that "OpenAI prefix caching can give 50%+ cost reduction on long conversations. Switching targets every turn means every target pays full price for the entire conversation prefix on every turn. For a Crescendo attack with 5+ turns across thousands of objectives, you could be doubling your API cost compared to pinning conversations to targets." So at the very least it's something to document, but we may also want to give users the option to pin a conversation to a given target: with multiple conversations and multiple targets, you either assign each conversation to a target, or you truly round-robin as this PR sets up.
Good point! I can definitely elaborate in the documentation, but I think: 1) this should ultimately be up to the user and what they want to trade off (higher cost vs. hitting rate limits); 2) giving users the option to pin a conversation to a specific target couples conversation state management with the target's prompt-sending functionality (simply receiving a normalized conversation and adding a message), which is something we tried to avoid here by requiring editable history. If a user just wanted to run an elaborate attack against one target (one conversation, one target), they could use that target directly instead. Configuring a different endpoint per attack in something like a scenario seems like something we could do later at the scenario or AttackExecutor level, not at the target level. (It also wouldn't be difficult for a user today to set up a loop that executes attacks while alternating between targets. We show somewhat similar examples in our notebooks for looping through objectives for attacks, and a user could loop through targets on each new iteration as well.)
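
The caller-level alternative mentioned in this reply, alternating targets per attack rather than per call, might look roughly like the sketch below. The endpoint labels, objectives, and `assign_targets` helper are hypothetical placeholders, not PyRIT APIs; in real use each pair would drive one complete multi-turn attack.

```python
from itertools import cycle

# Hypothetical endpoint labels and objectives, for illustration only.
targets = ["east", "west"]
objectives = ["obj-a", "obj-b", "obj-c", "obj-d"]


def assign_targets(objectives: list[str], targets: list[str]) -> dict[str, str]:
    """Pin each objective's conversation to one target, cycling through targets."""
    return dict(zip(objectives, cycle(targets)))


assignment = assign_targets(objectives, targets)
for objective, target in assignment.items():
    # A full multi-turn attack would run here against `target`, so every
    # turn of that conversation reaches the same endpoint.
    print(f"{objective} -> {target}")
```

Because each conversation stays on one endpoint, provider-side prefix caching can still apply, while load is spread across endpoints at the granularity of whole conversations.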
    """
    Validate that all inner targets have the same behavioral parameters.

    Checks the params that affect model output quality (underlying_model_name,
Do certain classes of targets have params that matter more than others? Rather than having one list of behavioral parameters that applies to all targets, what if we had a base set of behavioral params that apply to all targets and then allowed individual targets to add parameters? What do you think?
Maybe this is something we can consider in the future, but the point of this method is solely attack and scoring eval hashing. I think it would be okay to leave the rest up to the user rather than enforcing exactly identical parameters per target, which can be a bit tricky to figure out.
    missing), the fallback key's value from the component's raw params
    is used instead. This keeps fallback logic in the eval layer without
    changing full component hashes. ``None`` means no fallbacks.
    * ``unwrap_child`` — if set, and the child being processed has a
unwrap_child sounds like a boolean to me, and it is kind of functioning like one, but I think we should be more specific that it's a name (maybe wrapper_type_name or something like that). Also, maybe map it to an enum rather than a string.
Agreed! As for the enum suggestion, if we end up with other wrappers to unwrap, we can do that then; I think it might be overdoing it for now.
Round Robin Target
Description and design decisions
- New `RoundRobinTarget` class (`pyrit/prompt_target/round_robin_target.py`): a `PromptTarget` that wraps multiple inner targets and distributes requests across them using weighted round-robin selection. Intended for load-balancing across multiple deployments of the same model (e.g., Azure OpenAI endpoints in different regions).
- Per-call distribution, not per-conversation: requests are distributed on every call to `_send_prompt_to_target_async`, not pinned to a conversation. This is safe because PyRIT's conversation history is managed at the `conversation_id` level in shared memory, not by the target itself. When any inner target handles a request, the base class `_get_normalized_conversation_async` fetches the full conversation from memory by `conversation_id`, appends the current message, and passes the complete history to the inner target. The inner target never needs to "remember" prior turns; it receives them in full every time. This architecture means switching inner targets mid-conversation has no effect on correctness.
- Requires multi-turn + editable history: all inner targets must support `supports_multi_turn` and `supports_editable_history`. This is enforced at construction using the existing `CHAT_TARGET_REQUIREMENTS` validation infrastructure. These capabilities guarantee the target rebuilds its state from the provided conversation rather than relying on server-side state.
- Same concrete class required: all inner targets must be the same Python class (e.g., all `OpenAIChatTarget`). This prevents mixing fundamentally different target types that happen to share the same interface.
- Behavioral parameter consistency: inner targets must have matching `underlying_model_name` (with `model_name` fallback), `temperature`, and `top_p`. This ensures scoring results are comparable across targets. The validation uses the same (newly introduced) constants (`TARGET_BEHAVIORAL_PARAMS`, `TARGET_BEHAVIORAL_PARAM_FALLBACKS`) as the eval hash computation, so they cannot drift.
- Capability intersection: the round-robin's capabilities are the intersection (lower bound) of all inner targets' capabilities. Boolean capability flags are AND-ed; modality frozensets are intersected. If the intersection of input or output modalities is empty, construction fails.
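
The capability intersection rule can be illustrated with a small standalone sketch. The `Capabilities` dataclass below is a hypothetical stand-in with made-up fields (`supports_json_output` and string-valued modalities are assumptions), not PyRIT's actual capability type; it only shows the AND-ing of flags, the set intersection of modalities, and the failure on an empty result.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical capability record; field names are illustrative."""

    supports_multi_turn: bool
    supports_json_output: bool
    input_modalities: frozenset[str]
    output_modalities: frozenset[str]


def intersect(a: Capabilities, b: Capabilities) -> Capabilities:
    # Boolean flags are AND-ed; modality sets are intersected.
    return Capabilities(
        supports_multi_turn=a.supports_multi_turn and b.supports_multi_turn,
        supports_json_output=a.supports_json_output and b.supports_json_output,
        input_modalities=a.input_modalities & b.input_modalities,
        output_modalities=a.output_modalities & b.output_modalities,
    )


def intersect_all(caps: list[Capabilities]) -> Capabilities:
    if not caps:
        raise ValueError("at least one target is required")
    result = reduce(intersect, caps)
    # An empty modality set means no request could be served by every target.
    if not result.input_modalities or not result.output_modalities:
        raise ValueError("inner targets share no common input or output modality")
    return result


multimodal = Capabilities(True, True, frozenset({"text", "image_path"}), frozenset({"text"}))
text_only = Capabilities(True, False, frozenset({"text"}), frozenset({"text"}))
combined = intersect_all([multimodal, text_only])
print(combined)
```

Taking the lower bound means the wrapper never advertises a capability that one of its inner targets lacks, at the cost of hiding capabilities that only some targets have.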
- Optional integer weights: `weights=[2, 1]` expands into a rotation list `[0, 0, 1]` that cycles, sending roughly 2x traffic to the first target. Default is equal weight.
- Memory entries use the round-robin's identifier: the `prompt_target_identifier` on request and response pieces is the `RoundRobinTarget`'s own `ComponentIdentifier`. This keeps memory entries consistent: a single conversation shows one identifier throughout. The hash of the inner target that actually handled each request is recorded in `prompt_metadata["inner_target_identifier"]` for traceability.
- Eval hash unwrap mechanism (`pyrit/identifiers/evaluation_identifier.py`): added `unwrap_child` field to `ChildEvalRule`. When set, the eval hash computation "sees through" wrapper targets by substituting the first inner child before applying param filtering. This ensures `scorer(round_robin([t1_east, t1_west]))` produces the same eval hash as `scorer(t1_east)`, making scoring results comparable regardless of whether a round-robin was used. Applied to `ScorerEvaluationIdentifier` (`prompt_target` child) and `AtomicAttackEvaluationIdentifier` (`objective_target` child).
- Why round-robin identifier on memory entries but unwrap in eval hash: these serve different purposes and operate at different layers. The `prompt_target_identifier` on memory entries answers "what component was responsible for this request?", which is the `RoundRobinTarget`, since that's what the caller passed to the normalizer or scorer. Stamping inner target identifiers would create inconsistency within a single conversation (different turns showing different identifiers) and would require overriding `_get_normalized_conversation_async` to mutate message pieces, adding complexity for no functional gain. The inner target that actually handled each request is still traceable via `prompt_metadata["inner_target_identifier"]`. The eval hash, by contrast, answers a completely different question: "are these two scorer configurations behaviorally equivalent for grouping evaluation results?" For that purpose, what matters isn't the wrapper but rather the underlying model, temperature, and `top_p`. The unwrap mechanism lives entirely in the eval hash computation layer and doesn't touch memory entries, identifiers, or runtime behavior. Keeping these two concerns separate means the memory layer stays simple (no hook overrides, no mutation) while the eval layer correctly groups results regardless of whether a round-robin was used.
- Prompt caching trade-off: switching targets mid-conversation defeats provider-side prompt prefix caching. For multi-turn attacks like Crescendo with 5+ turns across thousands of objectives, this can significantly increase API cost compared to pinning each conversation to a single target. This is a throughput vs. cost trade-off: round-robin avoids per-endpoint rate limits at the expense of caching efficiency. Users who need cache-efficient multi-turn conversations should assign individual targets at the attack or scenario level rather than using round-robin for those workloads. Conversation-to-target pinning was intentionally not added at the target level because it would couple conversation management with pure prompt sending, a responsibility that belongs to a higher level. A user who wants one target per conversation can simply pass the target directly to the attack without a round-robin.
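
The eval hash unwrap idea described in the list above can be sketched with plain dictionaries. Everything here is an assumption for illustration: the dictionary layout, the `behavioral_view` and `eval_hash` helpers, the `"targets"` child name, and the example model name and endpoints are hypothetical and do not reflect the real `ChildEvalRule` or identifier classes.

```python
import hashlib
import json
from typing import Optional

# Assumed to mirror the behavioral params named in the description.
BEHAVIORAL_PARAMS = ("underlying_model_name", "temperature", "top_p")
PARAM_FALLBACKS = {"underlying_model_name": "model_name"}


def behavioral_view(component: dict, unwrap_child: Optional[str] = None) -> dict:
    """Reduce a component to the params that matter for eval grouping."""
    children = component.get("children", {})
    # See through a wrapper by substituting its first inner child.
    # Components without a matching child are left untouched.
    if unwrap_child and children.get(unwrap_child):
        component = children[unwrap_child][0]
    params = component.get("params", {})
    view = {}
    for key in BEHAVIORAL_PARAMS:
        value = params.get(key)
        if value is None and key in PARAM_FALLBACKS:
            value = params.get(PARAM_FALLBACKS[key])
        view[key] = value
    return view


def eval_hash(component: dict, unwrap_child: Optional[str] = None) -> str:
    payload = json.dumps(behavioral_view(component, unwrap_child), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical identifiers: same model settings, different endpoints.
shared = {"model_name": "example-model", "temperature": 0.7, "top_p": 1.0}
east = {"class": "OpenAIChatTarget", "params": {**shared, "endpoint": "https://east.example"}}
west = {"class": "OpenAIChatTarget", "params": {**shared, "endpoint": "https://west.example"}}
round_robin = {"class": "RoundRobinTarget", "params": {}, "children": {"targets": [east, west]}}

wrapped_hash = eval_hash(round_robin, unwrap_child="targets")
direct_hash = eval_hash(east)
print(wrapped_hash == direct_hash)
```

Because the endpoint is not among the hashed params and the wrapper is replaced by its first inner child, the wrapped and direct configurations hash identically in this sketch, which is the grouping property the description aims for.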
- Concurrency safety: the only shared mutable state is `self._counter` (the rotation index), which is only mutated in the synchronous `_next_target()` method. Under Python's asyncio cooperative concurrency model, this is safe: no two coroutines can interleave within a synchronous method. Crucially, because the target is selected synchronously (as a local variable) before the `await` call to `_send_prompt_to_target_async`, even if another coroutine advances `_counter` while the first is waiting on the network call, the already-selected target reference cannot be affected. Not safe for multi-threaded use, consistent with the rest of PyRIT's target classes.
- Minimal override surface: only `_send_prompt_to_target_async` and `_build_identifier` are overridden. No override of `_get_normalized_conversation_async` or `set_system_prompt`; the base class handles both correctly since all memory operations are keyed by `conversation_id` and stamped with `self.get_identifier()`.

Tests and Documentation
- Unit tests (`tests/unit/prompt_target/test_round_robin_target.py`): 24 tests covering:
  - `_send_prompt_to_target_async` delegates to correct inner target, records `inner_target_identifier` in metadata, round-robins across calls
  - `set_system_prompt`: uses round-robin identifier (verified via memory lookup)
  - `send_prompt_async` flow keeps round-robin identifier on entries
  - `underlying_model_name`, rejects mismatched `temperature`, accepts matching params with different endpoints, uses `model_name` fallback
- Eval hash unwrap tests (`tests/unit/identifiers/test_evaluation_identifier.py`): 3 tests added:
  - `test_unwrap_substitutes_first_inner_child`: verifies the unwrap produces the same hash as the direct target
  - `test_unwrap_no_op_when_child_has_no_matching_subchild`: verifies non-wrapper targets are unaffected
  - `test_scorer_eval_hash_matches_with_and_without_round_robin`: end-to-end `ScorerEvaluationIdentifier` equivalence
- Documentation notebook (`doc/code/targets/round_robin_target.ipynb` and `round_robin_target.py`): 5 sections demonstrating:
  - `PromptSendingAttack`

Next Step