support checkpoint_no_save_replay_buffer by hhaAndroid · Pull Request #1899 · InternLM/xtuner

hhaAndroid · 2026-06-10T04:33:00Z

No description provided.

hhaAndroid · 2026-06-10T04:33:21Z

claude · 2026-06-10T04:36:37Z

@@ -979,12 +1007,17 @@ async def resume(self, checkpoint_path: Path | str) -> int:
        checkpoint_path = Path(checkpoint_path)
        for task in self.task_runners:
            task.sampler.resume(self._task_checkpoint_path(checkpoint_path, task.task_name))


Nit: Backward compatibility — old checkpoints (saved before this PR) won't have the replay_buffer_saved key. The get(..., True) default handles this correctly, so existing checkpoints will always attempt replay buffer restore. Good.
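The default-to-True lookup described in this comment can be illustrated with a minimal sketch. The helper name `should_restore_replay_buffer` and the shape of the metadata dict are assumptions made for illustration. Only the `replay_buffer_saved` key and the `get(..., True)` default come from the review.

```python
from typing import Any, Mapping


def should_restore_replay_buffer(checkpoint_meta: Mapping[str, Any]) -> bool:
    """Hypothetical helper: decide whether resume should load the replay buffer.

    A checkpoint written before the flag existed has no "replay_buffer_saved"
    key, so the default of True keeps the old behavior (always try to restore).
    """
    return bool(checkpoint_meta.get("replay_buffer_saved", True))


# Assumed metadata layouts, for illustration only.
old_meta = {"model_step": 10}                                # pre-PR checkpoint
new_meta = {"model_step": 10, "replay_buffer_saved": False}  # saved with the new option
```

With these inputs the old-style metadata still triggers a restore, while the new-style metadata skips it.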

claude · 2026-06-10T04:37:13Z

Claude: Code review completed for PR #1899 (support checkpoint_no_save_replay_buffer).

Summary

This PR adds a checkpoint_no_save_replay_buffer option that skips persisting the replay buffer during checkpointing. When enabled, the checkpoint saves a modified progress state that rolls back producer/target tracking to the last consumed position, so on resume the producer starts fresh with an empty buffer. The PR also adds two example RL configs and fixes CI test mocks for _save_eval_trajectories and _release_trace_store.

Issues

Warning

agent_loop_manager.py:943-957: The progress state rollback logic in _progress_state_without_replay_buffer sets target_samples = consumed_samples, which means the deficit is 0 at resume. This is likely correct (the trainer's continue_produce() will advance targets), but the resume invariant is non-obvious and worth documenting with a brief comment.

Nit

agent_loop_manager.py:1009: Backward compat with old checkpoints is handled correctly via get("replay_buffer_saved", True).
test_agent_loop_manager_checkpoint.py:96-100: Test verifies buffer emptiness and model_step but could also assert on the restored next_consumer_step and producer_future_step to confirm progress state rollback.
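The extra assertions suggested for the test could be sketched roughly as follows. The function `check_progress_rollback` and the flat dict layout of the restored state are hypothetical. Only the field names `next_consumer_step` and `producer_future_step` are taken from the review.

```python
from typing import Any, List, Mapping


def check_progress_rollback(restored: Mapping[str, Any], expected_step: int) -> List[str]:
    """Hypothetical check: return a list of problems found in a restored progress state."""
    problems: List[str] = []
    consumer_step = int(restored["next_consumer_step"])
    producer_step = int(restored["producer_future_step"])
    # The consumer should resume at the step recorded when the checkpoint was taken.
    if consumer_step != expected_step:
        problems.append(f"next_consumer_step is {consumer_step}, expected {expected_step}")
    # With the buffer dropped, the producer must not be ahead of the consumer.
    if producer_step != consumer_step:
        problems.append(f"producer_future_step is {producer_step}, expected {consumer_step}")
    return problems
```

In a test, asserting that the returned list is empty would confirm the rollback alongside the existing buffer-emptiness and model_step checks.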

ProduceBatchResult impact

produced_samples, produced_tokens, raw_rewards_sum, raw_rewards_count are zeroed in the saved state when no_save_replay_buffer=True. On resume, the first training step will re-produce from scratch. No impact on batch status or timing fields beyond what the empty-buffer restart implies.
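A minimal sketch of the rollback the review describes, assuming the progress state is a plain dict and that the counters are keyed by task name. This is not the PR's actual `_progress_state_without_replay_buffer`. The function below is an illustrative stand-in that reuses the field names mentioned in the review.

```python
import copy
from typing import Any, Dict

# Counters the review says are zeroed; the per-task dict layout is an assumption.
ZEROED_FIELDS = ("produced_samples", "produced_tokens", "raw_rewards_sum", "raw_rewards_count")


def progress_state_without_replay_buffer(progress_state: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical rollback: return a copy of the state as if nothing were buffered."""
    state = copy.deepcopy(progress_state)  # leave the live state untouched
    consumed = state["consumed_samples"]
    # Resume invariant: target == consumed, so the deficit is 0 until the
    # trainer advances the targets again.
    state["target_samples"] = dict(consumed)
    # The producer restarts at the step the consumer will read next.
    state["producer_future_step"] = int(state["next_consumer_step"])
    # Nothing produced survives the checkpoint, so the production counters reset.
    for field in ZEROED_FIELDS:
        state[field] = {task: 0 for task in consumed}
    return state


live_state = {
    "next_consumer_step": 3,
    "producer_future_step": 5,
    "consumed_samples": {"task_a": 8},
    "target_samples": {"task_a": 16},
    "produced_samples": {"task_a": 12},
}
saved_state = progress_state_without_replay_buffer(live_state)
```

Working on a deep copy means the running producer keeps its real progress, and only the checkpointed view is rolled back.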

RoutedExperts impact

Not affected — no changes to routed-experts ownership, object-ref plumbing, or rollout response handling.

Verdict

APPROVE — the core logic is sound, the backward-compat default is correct, and the test covers the main path. The one suggestion is to add a comment explaining the resume invariant.

jayhenry · 2026-06-10T06:53:15Z

+        task_names = list(consumed_samples)
+        next_consumer_step = int(progress_state["next_consumer_step"])
+
+        progress_state["producer_future_step"] = next_consumer_step


OK: roll back producer_future_step and the related state to next_consumer_step when the replay buffer is dropped.

hhaAndroid added 4 commits June 8, 2026 09:08

add reason and agentic demo cfg

3a560c4

fix ci

b62384b

Merge branch 'main' of https://github.com/InternLM/xtuner into upstre…

7df9890

…am/main

add checkpoint_no_save_replay_buffer

f678ac4

claude Bot reviewed Jun 10, 2026

Comment thread xtuner/v1/rl/agent_loop_manager/agent_loop_manager.py

claude Bot reviewed Jun 10, 2026

Comment thread tests/rl/fast/pr_fast/test_agent_loop_manager_checkpoint.py Outdated

jayhenry approved these changes Jun 10, 2026

Delete tests/rl/fast/pr_fast/test_agent_loop_manager_checkpoint.py

12d2f46

hhaAndroid merged commit 213c353 into InternLM:main Jun 10, 2026
5 of 6 checks passed

hhaAndroid merged 5 commits into InternLM:main from hhaAndroid:fix_replaybuffer_save
