
fix(pu): index sampled target policies by slot #488

Open
puyuan1996 wants to merge 1 commit into main from fix/issue-486-sampled-target-policy-index


@puyuan1996

Collaborator

Closes #486

Summary

  • Keep base MuZero target policy assignment indexed by real legal action ids.
  • Override sampled MuZero/EfficientZero target policy assignment to use sampled-action slots instead of sparse environment action ids.
  • Route both non-reanalyzed and sampled reanalyzed target policy construction through the same assignment helper.
  • Add a regression test for sparse legal action ids with SampledMuZeroGameBuffer and SampledEfficientZeroGameBuffer.
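The split between the base and sampled assignment described in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the names `BaseTargetPolicyAssigner`, `SampledTargetPolicyAssigner` and `build_target_policy` are hypothetical. It also assumes that the sampled buffers keep one target-policy entry per sampled-action slot, while the base buffer keeps one entry per environment action.

```python
import numpy as np

# Hypothetical sketch: these class and function names are illustrative,
# not LightZero's actual API.


class BaseTargetPolicyAssigner:
    """MuZero-style assignment: one policy entry per environment action id."""

    def assign(self, target_policy, legal_actions, visit_distribution):
        # Assumes len(target_policy) == action space size, so each real
        # legal action id is a valid index.
        for action_id, prob in zip(legal_actions, visit_distribution):
            target_policy[action_id] = prob
        return target_policy


class SampledTargetPolicyAssigner(BaseTargetPolicyAssigner):
    """Sampled-style assignment: one policy entry per sampled-action slot."""

    def assign(self, target_policy, legal_actions, visit_distribution):
        # Assumes len(target_policy) == number of sampled actions; slot k
        # holds the probability of the k-th sampled action, whatever its
        # sparse environment id is. legal_actions is ignored on purpose.
        for slot, prob in enumerate(visit_distribution):
            target_policy[slot] = prob
        return target_policy


def build_target_policy(assigner, size, legal_actions, visit_distribution):
    # Single entry point that both the reanalyzed and non-reanalyzed
    # paths would call, so the two paths cannot disagree on indexing.
    target_policy = np.zeros(size, dtype=np.float32)
    return assigner.assign(target_policy, legal_actions, visit_distribution)


sparse_ids = [3, 40, 97]   # sparse environment action ids
visits = [0.5, 0.3, 0.2]   # normalized visit distribution over 3 samples
print(build_target_policy(SampledTargetPolicyAssigner(), 3, sparse_ids, visits))
```

With sparse ids such as [3, 40, 97] and three sampled slots, the base assignment writes past the end of a length-3 array and raises IndexError. This is the same kind of out-of-bounds failure the linked issue describes. The slot-based override only ever writes to slots 0, 1 and 2.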

Tests

  • /mnt/shared-storage-user/puyuan/lz/bin/python -m pytest lzero/mcts/tests/test_game_buffer.py -q
  • /mnt/shared-storage-user/puyuan/lz/bin/python -m py_compile lzero/mcts/buffer/game_buffer_muzero.py lzero/mcts/buffer/game_buffer_sampled_muzero.py lzero/mcts/buffer/game_buffer_sampled_efficientzero.py lzero/mcts/tests/test_game_buffer.py
  • /mnt/shared-storage-user/puyuan/lz/bin/python -m pytest lzero/model/tests/test_sampled_efficientzero_model.py -q
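A regression test in the spirit of the one this PR adds can be sketched without the real buffers. This is an assumption-laden stand-in: `assign_by_slot` is a hypothetical helper, not a LightZero function. The test only checks the property the fix relies on, namely that sparse action ids larger than the number of sampled actions never become indices. The PR's actual test, presumably in lzero/mcts/tests/test_game_buffer.py, exercises SampledMuZeroGameBuffer and SampledEfficientZeroGameBuffer directly.

```python
import numpy as np


# Hypothetical stand-in for the sampled buffers' slot-based assignment.
def assign_by_slot(num_sampled_actions, visit_counts):
    policy = np.zeros(num_sampled_actions, dtype=np.float32)
    counts = np.asarray(visit_counts, dtype=np.float32)
    # Normalize visit counts and write them into slots 0..len(counts)-1.
    policy[: len(counts)] = counts / counts.sum()
    return policy


def test_sparse_legal_action_ids_stay_in_bounds():
    num_sampled_actions = 4
    # Sparse environment ids, all at least as large as the policy length.
    sampled_action_ids = [10, 25, 63, 99]
    visit_counts = [5, 3, 2, 0]
    assert min(sampled_action_ids) >= num_sampled_actions

    policy = assign_by_slot(num_sampled_actions, visit_counts)

    assert policy.shape == (num_sampled_actions,)
    assert np.isclose(policy.sum(), 1.0)
    # Slot k carries the visit share of the k-th sampled action.
    assert np.allclose(policy, [0.5, 0.3, 0.2, 0.0])


if __name__ == "__main__":
    test_sparse_legal_action_ids_stay_in_bounds()
    print("ok")
```

Because the function name starts with `test_`, pytest would collect it as-is. The `__main__` guard lets it also run as a plain script.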

puyuan1996 changed the title from "fix(buffer): index sampled target policies by slot" to "fix(pu): index sampled target policies by slot" on Jun 10, 2026


Development

Successfully merging this pull request may close these issues.

Sampled MuZero / EfficientZero index out of bounds error in _compute_target_policy_non_reanalyzed
