Skip to content

FEAT: Round Robin Target#1761

Open
jsong468 wants to merge 4 commits into
microsoft:mainfrom
jsong468:round_robin
Open

FEAT: Round Robin Target#1761
jsong468 wants to merge 4 commits into
microsoft:mainfrom
jsong468:round_robin

Conversation

@jsong468
Copy link
Copy Markdown
Contributor

@jsong468 jsong468 commented May 19, 2026

Round Robin Target

Description and design decisions

  • New RoundRobinTarget class (pyrit/prompt_target/round_robin_target.py): a PromptTarget that wraps multiple inner targets and distributes requests across them using weighted round-robin selection. Intended for load-balancing across multiple deployments of the same model (e.g., Azure OpenAI endpoints in different regions).

  • Per-call distribution, not per-conversation: requests are distributed on every call to _send_prompt_to_target_async, not pinned to a conversation. This is safe because PyRIT's conversation history is managed at the conversation_id level in shared memory — not by the target itself. When any inner target handles a request, the base class _get_normalized_conversation_async fetches the full conversation from memory by conversation_id, appends the current message, and passes the complete history to the inner target. The inner target never needs to "remember" prior turns; it receives them in full every time. This architecture means switching inner targets mid-conversation has no effect on correctness.

  • Requires multi-turn + editable history: all inner targets must support supports_multi_turn and supports_editable_history. This is enforced at construction using the existing CHAT_TARGET_REQUIREMENTS validation infrastructure. These capabilities guarantee the target rebuilds its state from the provided conversation rather than relying on server-side state.

  • Same concrete class required: all inner targets must be the same Python class (e.g., all OpenAIChatTarget). This prevents mixing fundamentally different target types that happen to share the same interface.

  • Behavioral parameter consistency: inner targets must have matching underlying_model_name (with model_name fallback), temperature, and top_p. This ensures scoring results are comparable across targets. The validation uses the same (newly introduced) constants (TARGET_BEHAVIORAL_PARAMS, TARGET_BEHAVIORAL_PARAM_FALLBACKS) as the eval hash computation, so they cannot drift.

  • Capability intersection: the round-robin's capabilities are the intersection (lower bound) of all inner targets' capabilities. Boolean capability flags are AND-ed; modality frozensets are intersected. If the intersection of input or output modalities is empty, construction fails.

  • Optional integer weights: weights=[2, 1] expands into a rotation list [0, 0, 1] that cycles, sending roughly 2x traffic to the first target. Default is equal weight.

  • Memory entries use the round-robin's identifier: the prompt_target_identifier on request and response pieces is the RoundRobinTarget's own ComponentIdentifier. This keeps memory entries consistent — a single conversation shows one identifier throughout. The hash of the inner target that actually handled each request is recorded in prompt_metadata["inner_target_identifier"] for traceability.

  • Eval hash unwrap mechanism (pyrit/identifiers/evaluation_identifier.py): added unwrap_child field to ChildEvalRule. When set, the eval hash computation "sees through" wrapper targets by substituting the first inner child before applying param filtering. This ensures scorer(round_robin([t1_east, t1_west])) produces the same eval hash as scorer(t1_east), making scoring results comparable regardless of whether a round-robin was used. Applied to ScorerEvaluationIdentifier (prompt_target child) and AtomicAttackEvaluationIdentifier (objective_target child).

  • Why round-robin identifier on memory entries but unwrap in eval hash: these serve different purposes and operate at different layers. The prompt_target_identifier on memory entries answers "what component was responsible for this request?" which is the RoundRobinTarget, since that's what the caller passed to the normalizer or scorer. Stamping inner target identifiers would create inconsistency within a single conversation (different turns showing different identifiers) and would require overriding _get_normalized_conversation_async to mutate message pieces, adding complexity for no functional gain. The inner target that actually handled each request is still traceable via prompt_metadata["inner_target_identifier"]. The eval hash, by contrast, answers a completely different question: "are these two scorer configurations behaviorally equivalent for grouping evaluation results?" For that purpose, what matters isn't the wrapper but rather the underlying model, temperature, and top_p. The unwrap mechanism lives entirely in the eval hash computation layer and doesn't touch memory entries, identifiers, or runtime behavior. Keeping these two concerns separate means the memory layer stays simple (no hook overrides, no mutation) while the eval layer correctly groups results regardless of whether a round-robin was used.

  • Prompt caching trade-off: switching targets mid-conversation defeats provider-side prompt prefix caching. For multi-turn attacks like Crescendo with 5+ turns across thousands of objectives, this can significantly increase API cost compared to pinning each conversation to a single target. This is a throughput vs. cost trade-off: round-robin avoids per-endpoint rate limits at the expense of caching efficiency. Users who need cache-efficient multi-turn conversations should assign individual targets at the attack or scenario level rather than using round-robin for those workloads. Conversation-to-target pinning was intentionally not added at the target level because it would couple conversation management with pure prompt sending — a responsibility that belongs to a higher level. A user who wants one target per conversation can simply pass the target directly to the attack without a round-robin.

  • Concurrency safety: the only shared mutable state is self._counter (the rotation index), which is only mutated in the synchronous _next_target() method. Under Python's asyncio cooperative concurrency model, this is safe — no two coroutines can interleave within a synchronous method. Crucially, because the target is selected synchronously (as a local variable) before the await call to _send_prompt_to_target_async, even if another coroutine advances _counter while the first is waiting on the network call, the already-selected target reference cannot be affected. Not safe for multi-threaded use, consistent with the rest of PyRIT's target classes.

  • Minimal override surface: only _send_prompt_to_target_async and _build_identifier are overridden. No override of _get_normalized_conversation_async or set_system_prompt — the base class handles both correctly since all memory operations are keyed by conversation_id and stamped with self.get_identifier().

Tests and Documentation

  • Unit tests (tests/unit/prompt_target/test_round_robin_target.py): 24 tests covering:

    • Construction validation: rejects < 2 targets, mixed classes, mismatched weights, zero/negative weights
    • Capability intersection: boolean AND, modality intersection, empty modality rejection
    • Capability requirements: rejects targets without multi-turn, rejects targets without editable history
    • Round-robin selection: FIFO rotation, weighted rotation
    • Delegation: _send_prompt_to_target_async delegates to correct inner target, records inner_target_identifier in metadata, round-robins across calls
    • set_system_prompt: uses round-robin identifier (verified via memory lookup)
    • Identifier: includes children and weights
    • End-to-end: full send_prompt_async flow keeps round-robin identifier on entries
    • Behavioral validation: rejects mismatched underlying_model_name, rejects mismatched temperature, accepts matching params with different endpoints, uses model_name fallback
  • Eval hash unwrap tests (tests/unit/identifiers/test_evaluation_identifier.py): 3 tests added:

    • test_unwrap_substitutes_first_inner_child: verifies the unwrap produces the same hash as the direct target
    • test_unwrap_no_op_when_child_has_no_matching_subchild: verifies non-wrapper targets are unaffected
    • test_scorer_eval_hash_matches_with_and_without_round_robin: end-to-end ScorerEvaluationIdentifier equivalence
  • Documentation notebook (doc/code/targets/round_robin_target.ipynb and round_robin_target.py): 5 sections demonstrating:

    • Basic usage with alternation printing showing which target handled each request
    • Weighted distribution with count summary
    • Drop-in usage with PromptSendingAttack
    • Multi-turn attack (Crescendo) with round-robin objective target
    • Batch scoring with round-robin scorer target, printing which scorer target scored each prompt

Next Step

  • Enable RoundRobinTarget selection in the GUI frontend

Comment thread doc/code/targets/round_robin_target.py Outdated
Comment thread pyrit/prompt_target/round_robin_target.py Outdated
intersected = _intersect_capabilities([t.capabilities for t in targets])

super().__init__(
custom_configuration=TargetConfiguration(capabilities=intersected),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also need to handle normalization for each target ? if targets have different normalization, we may be able to adapt and then the capability is supported.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as best as we can, in the context of round robinning, we should enforce that the targets share the same behavior (e.g., adapt vs raise for certain capabilities) or else it doesn't make much sense to put wrap them in this target. That said, I think having flexibility and allowing a user to override using custom_configuration like with other targets (and trusting their judgment) makes the most sense. Let me know if that makes sense to you!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure that makes sense esp for a first pass and we can re-assess later but we still need to compare the configurations. If one target has a policy of RAISE and the other ADAPT, the capability could still be the same (so also should rename the intersection function to intersect_configuration)

Returns:
PromptTarget: The next inner target.
"""
idx = self._rotation[self._counter % len(self._rotation)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we need some logic if one of the targets fail

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think leaving retry logic to the inner targets makes sense and is simplest. (Retries would occur at inner target _send_prompt_to_target_async before propagating any response to the round robin target level). And any legitimate response errors would be constructed as a Message and saved properly to memory as with any other target and used in scoring/determining whether to continue the attack.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's more like if the target itself fails — an endpoint goes down or exhausts its rate limit, the inner target will exhaust retries and raise--but then we keep handing it jobs. Agreed that inner target retries are the right layer for transient errors. Since the round-robin counter advances before the call, subsequent caller-level retries may land on the same broken target again. could just have a simple fallback to the next target in the rotation on exception — the inner target tries its best, and if it truly can't recover, the round-robin tries a different endpoint instead of propagating the failure

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'm introducing logic in _send_prompt_to_target_async where normal retries on the inner target still occur, but if retries are exhausted, other targets are tried in the order of the rotation rather than immediately throwing the exception.

Note that the target can still be tried later on in the normal rotation as requests are sent (think scoring batch of prompts for example), and I think that is expected behavior. Figuring out whether an endpoint is actually permanently down (i.e., unhealthy state and never try it again) and removing it from rotation I imagine is beyond the scope of this PR. But let me know if you think otherwise.

Ideally, a user should remove an endpoint if it is permanently down.

Comment thread pyrit/prompt_target/round_robin_target.py
@hannahwestra25
Copy link
Copy Markdown
Contributor

copilot noted that, "OpenAI prefix caching can give 50%+ cost reduction on long conversations. Switching targets every turn means every target pays full price for the entire conversation prefix on every turn. For a Crescendo attack with 5+ turns across thousands of objectives, you could be doubling your API cost compared to pinning conversations to targets." so at the very least its something to document but also potentially want to give users the option to pin a conversation to a given target so if you have multiple conversations and multiple targets you assign the conversation to the target OR you do truly round robin like this PR sets up

@jsong468
Copy link
Copy Markdown
Contributor Author

copilot noted that, "OpenAI prefix caching can give 50%+ cost reduction on long conversations. Switching targets every turn means every target pays full price for the entire conversation prefix on every turn. For a Crescendo attack with 5+ turns across thousands of objectives, you could be doubling your API cost compared to pinning conversations to targets." so at the very least its something to document but also potentially want to give users the option to pin a conversation to a given target so if you have multiple conversations and multiple targets you assign the conversation to the target OR you do truly round robin like this PR sets up

Good point! I can definitely be more elaborate in the documentation, but I think 1) this is something that should ultimately be up to the user and what they want to trade off (higher cost vs. hitting rate limits). 2) Giving users the option to pin a conversation to a specific target couples conversation state management and target prompt sending functionality (simply receiving a normalized conversation and adding a message) and was something we tried to avoid here by requiring editable history requirement.

If a user just wanted to run an elaborate attack against one target (one conversation, one target) they could just use that target directly instead. Configuring a different endpoint per attack in something like a scenario seems like something we could do later on at the scenario or AttackExecutor level, not at the target level. (It also wouldn't be difficult for a user currently to set up a loop that executes attacks alternating between targets. We show a somewhat similar examples in our notebooks for looping through objectives for attacks, and a user could just loop through targets as well on each new iteration.)

Comment thread pyrit/prompt_target/round_robin_target.py Outdated
"""
Validate that all inner targets have the same behavioral parameters.

Checks the params that affect model output quality (underlying_model_name,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do certain classes of targets have params that matter more than others ? I'm thinking rather than having this list of behavioral parameters that extends to all targets, what if we had a base set of behavioral params that apply to all targets and then allow targets to add parameters. wdyt?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is something we can consider in the future, but the point of this method is solely for attack and scoring eval hashing. I think it would be okay to leave the rest up to user rather than enforcing exact identical parameters per target, which can be a bit tricky to figure out.

missing), the fallback key's value from the component's raw params
is used instead. This keeps fallback logic in the eval layer without
changing full component hashes. ``None`` means no fallbacks.
* ``unwrap_child`` — if set, and the child being processed has a
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unwrap_child sounds like a boolean to me and this is kinda functioning like one but I think we should be more specific that it's a name (maybe like wrapper_type_name or something like that). also maybe map to an enum rather than a string.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed! for the enum suggestion, if we have others to unwrap, we can do that then, I think it might be overdoing it for now.

Comment thread pyrit/identifiers/evaluation_identifier.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants