Skip to content

support checkpoint_no_save_replay_buffer#1899

Merged
hhaAndroid merged 5 commits into
InternLM:mainfrom
hhaAndroid:fix_replaybuffer_save
Jun 10, 2026
Merged

support checkpoint_no_save_replay_buffer#1899
hhaAndroid merged 5 commits into
InternLM:mainfrom
hhaAndroid:fix_replaybuffer_save

Conversation

@hhaAndroid

Copy link
Copy Markdown
Collaborator

No description provided.

@hhaAndroid

Copy link
Copy Markdown
Collaborator Author

@claude review

Comment thread xtuner/v1/rl/agent_loop_manager/agent_loop_manager.py
@@ -979,12 +1007,17 @@ async def resume(self, checkpoint_path: Path | str) -> int:
checkpoint_path = Path(checkpoint_path)
for task in self.task_runners:
task.sampler.resume(self._task_checkpoint_path(checkpoint_path, task.task_name))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Backward compatibility — old checkpoints (saved before this PR) won't have the replay_buffer_saved key. The get(..., True) default handles this correctly, so existing checkpoints will always attempt replay buffer restore. Good.

Comment thread tests/rl/fast/pr_fast/test_agent_loop_manager_checkpoint.py Outdated
@claude

claude Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Claude: Code review completed for PR #1899 (support checkpoint_no_save_replay_buffer).

Summary

This PR adds a checkpoint_no_save_replay_buffer option that skips persisting the replay buffer during checkpointing. When enabled, the checkpoint saves a modified progress state that rolls back producer/target tracking to the last consumed position, so on resume the producer starts fresh with an empty buffer. The PR also adds two example RL configs and fixes CI test mocks for _save_eval_trajectories and _release_trace_store.

Issues

Warning

  • agent_loop_manager.py:943-957: The progress state rollback logic in _progress_state_without_replay_buffer sets target_samples = consumed_samples, which means the deficit is 0 at resume. This is likely correct (the trainer's continue_produce() will advance targets), but the resume invariant is non-obvious and worth documenting with a brief comment.

Nit

ProduceBatchResult impact

produced_samples, produced_tokens, raw_rewards_sum, raw_rewards_count are zeroed in the saved state when no_save_replay_buffer=True. On resume, the first training step will re-produce from scratch. No impact on batch status or timing fields beyond what the empty-buffer restart implies.

RoutedExperts impact

Not affected — no changes to routed-experts ownership, object-ref plumbing, or rollout response handling.

Verdict

APPROVE — the core logic is sound, the backward-compat default is correct, and the test covers the main path. The one suggestion is to add a comment explaining the resume invariant.

task_names = list(consumed_samples)
next_consumer_step = int(progress_state["next_consumer_step"])

progress_state["producer_future_step"] = next_consumer_step

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, rollback producer_future_step and related states to next_consumer_step if drop replay_buffer

@hhaAndroid hhaAndroid merged commit 213c353 into InternLM:main Jun 10, 2026
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants