feat(validation): log reward subscores in validate_model by go-sakayori · Pull Request #152 · tier4/Diffusion-Planner

go-sakayori · 2026-06-18T06:08:40Z

Log reward subscores in `validate_model`

Surfaces the planner's rule-based reward subscores as additive validation metrics, so best-model selection can see the same physical quantities the training reward is built from. The existing avg_loss_* metrics and best-model selection (still position_lat_loss) are unchanged.

What changed — `diffusion_planner/diffusion_planner/validate_model.py`

_reward_subscores_per_scene helper: compute_subscores_batch is single-scene / N-trajectory (its map and neighbor terms use one scene's tensors), so the helper loops over the B scenes (one ego prediction each) and stacks the continuous subscores. It reuses the same metre-unit, ego-frame tensors that the existing road-border and neighbor metrics already use (prediction[:, 0], the masked neighbors_future, and denorm_inputs[...]).
The subscores are logged as ego_subscore_{safety,ttc,progress,comfort,centerline,red_light,feasibility}, which train.py's mean_ego_loss picks up as valid_loss/ego_subscore_*.
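A minimal sketch of the per-scene loop described above, written in plain Python rather than torch so it runs on its own. The scorer is passed in as a parameter, and its call shape (one scene's data plus a list of N trajectories, returning one value per trajectory for each subscore name) is an assumption rather than the real compute_subscores_batch signature; scenes are modelled as plain dicts.

```python
from typing import Callable, Dict, List, Sequence

# Continuous subscores logged by this PR (from the ego_subscore_{...} list above).
VAL_SUBSCORE_KEYS = ("safety", "ttc", "progress", "comfort", "centerline", "red_light", "feasibility")

# Hypothetical scorer shape: (one scene's data, N trajectories) -> {name: N values}.
SubscoreFn = Callable[[dict, Sequence], Dict[str, Sequence[float]]]


def reward_subscores_per_scene(
    scenes: Sequence[dict],
    ego_predictions: Sequence,
    compute_subscores_batch: SubscoreFn,
) -> Dict[str, List[float]]:
    """Score B scenes with a single-scene / N-trajectory scorer, one call per scene."""
    if len(scenes) != len(ego_predictions):
        raise ValueError("need exactly one ego prediction per scene")
    stacked: Dict[str, List[float]] = {k: [] for k in VAL_SUBSCORE_KEYS}
    for scene, ego_pred in zip(scenes, ego_predictions):
        # N = 1: the scorer's map and neighbor terms only see this scene's data.
        scores = compute_subscores_batch(scene, [ego_pred])
        for k in VAL_SUBSCORE_KEYS:
            # Keep only the allowlisted keys; extra scorer outputs are ignored.
            stacked[k].append(float(scores[k][0]))
    return stacked
```

Calling the scorer once per scene costs B calls per batch, which is the trade-off the caveats below mention.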

Why this is correct

The validation dataset feeds the same npz files the reward consumes (dataset.__getitem__ simply calls np.load on them), so the subscores' tensor and channel conventions match. This is the same reason compute_road_border_penalty can already run in this loop.
The subscores come from exactly the same code the reward uses, which is guarded by the existing golden and parity tests.

Tests

test_validation_subscores (new) pins the per-scene batched helper against a direct single-scene compute_subscores_batch call (slicing correctness + finite values).
Golden + parity + full reward suite green (141 passed).
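The kind of parity check that test_validation_subscores is described as pinning (slicing correctness plus finite values) can be sketched as follows. This is a hypothetical helper, not the PR's actual test: it assumes the batched output is a dict of per-scene lists and that the direct scorer takes one scene plus a one-element trajectory list.

```python
import math
from typing import Callable, List, Mapping, Sequence


def assert_per_scene_parity(
    batched: Mapping[str, List[float]],
    scorer: Callable[[object, list], Mapping[str, Sequence[float]]],
    scenes: Sequence[object],
    ego_predictions: Sequence[object],
    keys: Sequence[str],
    rel_tol: float = 1e-6,
) -> None:
    """Assert that row i of the batched output equals a direct single-scene call on scene i."""
    for key in keys:
        # Slicing check: exactly one value per scene.
        assert len(batched[key]) == len(scenes), f"{key}: wrong number of scenes"
    for i, (scene, pred) in enumerate(zip(scenes, ego_predictions)):
        direct = scorer(scene, [pred])
        for key in keys:
            got, want = batched[key][i], float(direct[key][0])
            assert math.isfinite(got), f"{key}[{i}] is not finite: {got}"
            assert math.isclose(got, want, rel_tol=rel_tol, abs_tol=1e-9), (
                f"{key}[{i}]: batched {got} != direct {want}"
            )
```

Comparing against a direct call on each scene catches off-by-one or transposed slicing, which an aggregate mean over the batch would hide.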

Caveats

The subscore values should be sanity-checked on a real validation run (effective from the next run only — in-flight training is unaffected).
The per-scene loop adds B compute_subscores_batch calls per batch to validation (per-epoch, not per-step); easy to gate behind a config flag if it proves too slow.
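One way the config gate mentioned in the last caveat could look, assuming a hypothetical log_reward_subscores flag (the PR does not add one). Wrapping the per-scene work in a zero-argument closure means none of the B scorer calls run when the flag is off.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ValidationConfig:
    # Hypothetical flag name; the PR only notes that such a gate would be easy to add.
    log_reward_subscores: bool = True


def maybe_reward_subscores(
    cfg: ValidationConfig,
    compute: Callable[[], Dict[str, List[float]]],
) -> Dict[str, List[float]]:
    """Return the per-scene subscores, or an empty dict when the flag is off.

    `compute` is deferred so the scorer calls are skipped entirely when disabled.
    """
    if not cfg.log_reward_subscores:
        return {}
    return compute()
```

Returning an empty dict keeps the caller's merge into the result dict unchanged in both cases.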

Implements the validation-logging part of #130: wire compute_subscores_batch into the base-SFT validation loop so best-model selection can see the same rule-based subscores the training reward is built from. Additive only: avg_loss_* and best-model selection (still position_lat_loss) are unchanged.

- _reward_subscores_per_scene helper: compute_subscores_batch is single-scene / N-trajectory, so it loops over the B scenes (one ego prediction each) and stacks the continuous subscores. Uses the same metre ego-frame tensors the existing rb / neighbor metrics use (prediction[:,0], neighbors_future, denorm_inputs[...]).
- Logged as ego_subscore_{safety,ttc,progress,comfort,centerline,red_light,feasibility} so train.py's mean_ego_loss picks them up as valid_loss/ego_subscore_*.

Tested: test_validation_subscores pins the per-scene batched helper against a direct single-scene compute_subscores_batch call; golden + parity + reward suite green. Subscore values should be sanity-checked on a real validation run.

Copilot

Pull request overview

Adds logging of the rule-based reward’s component subscores during validate_model, so validation metrics expose the same physical quantities used by the training reward (without changing existing avg_loss_* reporting or best-model selection criteria).

Changes:

Added _reward_subscores_per_scene helper in validate_model.py to call planner_metrics.compute_subscores_batch per scene and stack requested subscores.
Logged subscores as additive validation metrics (ego_subscore_{...}) collected via the existing total_result_dict pathway.
Added a unit test that checks the per-scene helper matches direct compute_subscores_batch calls and produces finite outputs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
diffusion_planner/diffusion_planner/validate_model.py	Computes + logs reward subscores per validation scene as additional `ego_subscore_*` metrics.
diffusion_planner/tests/test_validation_subscores.py	New unit tests validating helper slicing/stacking correctness and finite outputs.


Adds planner_metrics.epdms_like.epdms_like_aggregate: a single [0,1] EPDMS-structured proxy score = (product of binary gates NC/DAC/TLC/kinematic) × (weighted average of [0,1] quality terms ttc/progress/comfort/lane). It flips and bounds the raw reward subscores (0 = best, penalties negative) into a score where 1 = perfect.

Wired into validate_model.py as log-only metrics (ego_subscore_epdms_like plus gate_*/q_* components); checkpoint selection stays on L2 (mean_ego_loss). EPDMS keys are only emitted when gt_progress is passed, so the existing subscore-logging test is unaffected.

This is an EPDMS-structured proxy, NOT a faithful NAVSIM EPDMS port (the faithful one lives in OnePlanner, see #142): it scores raw predicted waypoints (no controller rollout), omits DDC, has no false-positive filtering, and uses mean|jerk| for comfort. Thresholds/weights in EPDMSLikeConfig still need calibrating so GT trajectories score ~1.0. Includes unit tests in test_epdms_like_aggregate.py.
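A sketch of the aggregate's structure as described above: multiplicative binary gates times a normalized weighted average of [0,1] quality terms. The penalty-to-[0,1] mapping (a linear ramp clipped at a hypothetical penalty_scale), the equal default weights, and the EPDMSLikeSketchConfig name are all assumptions, since the real EPDMSLikeConfig thresholds are still uncalibrated. The zero-sum weight guard follows the follow-up fix described in this PR.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Tuple

QUALITY_TERMS = ("ttc", "progress", "comfort", "lane")


@dataclass
class EPDMSLikeSketchConfig:
    # Hypothetical equal weights and scale; not the repo's EPDMSLikeConfig values.
    w_ttc: float = 1.0
    w_progress: float = 1.0
    w_comfort: float = 1.0
    w_lane: float = 1.0
    penalty_scale: float = 1.0  # raw penalty magnitude at which a quality term reaches 0


def flip_penalty(raw: float, scale: float) -> float:
    """Map a raw subscore (0 = best, penalties negative) into [0, 1] with 1 = perfect."""
    return min(1.0, max(0.0, 1.0 + raw / scale))


def epdms_like_sketch(
    gates: Mapping[str, bool],
    raw_quality: Mapping[str, float],
    cfg: EPDMSLikeSketchConfig,
) -> Tuple[float, Dict[str, float]]:
    # Multiplicative gates: any failed gate (e.g. NC/DAC/TLC/kinematic) zeroes the score.
    gate = 1.0
    for passed in gates.values():
        gate *= 1.0 if passed else 0.0

    q = {name: flip_penalty(raw_quality[name], cfg.penalty_scale) for name in QUALITY_TERMS}
    w = {"ttc": cfg.w_ttc, "progress": cfg.w_progress, "comfort": cfg.w_comfort, "lane": cfg.w_lane}
    w_sum = sum(w.values())
    if w_sum == 0.0:
        # Raise instead of dividing by zero (with tensors, 0/0 would silently give NaN).
        raise ValueError("quality weights sum to zero")
    quality = sum(w[name] * q[name] for name in QUALITY_TERMS) / w_sum

    # Log-only components alongside the headline score.
    components = {f"gate_{name}": 1.0 if passed else 0.0 for name, passed in gates.items()}
    components.update({f"q_{name}": value for name, value in q.items()})
    return gate * quality, components
```

Gating multiplicatively means a single hard failure dominates regardless of how good the quality terms are, which matches the "product of binary gates" structure in the description.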

Signed-off-by: Go Sakayori <gsakayori@gmail.com>

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

go-sakayori · 2026-06-23T09:49:40Z

+        data_batched = {
+            "ego_shape": inputs["ego_shape"],
+            "neighbor_agents_future": neighbors_future,
+            "neighbor_agents_past": denorm_inputs["neighbor_agents_past"],
+        }


Fixed in 98f73b2

go-sakayori · 2026-06-23T09:49:50Z

+    w_sum = cfg.w_ttc + cfg.w_progress + cfg.w_comfort + cfg.w_lane
+    quality = (
+        cfg.w_ttc * ttc_q
+        + cfg.w_progress * progress_q
+        + cfg.w_comfort * comfort_q
+        + cfg.w_lane * lane_q
+    ) / w_sum
+


Fixed in 98f73b2

- validate_model.py: include ego_agent_future in data_batched so compute_progress_score_batch falls back to the goal-directed GT endpoint instead of the predicted path length when goal_pose is absent or far (>100 m). Keeps the progress / q_progress term honest.
- epdms_like.py: guard against zero-sum quality weights (raise instead of producing NaN), and add the Apache 2.0 / TIER IV license header.
- test_epdms_like_aggregate.py, test_validation_subscores.py: add the license header to match repo convention.
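The progress fallback in the validate_model.py change above, sketched as a pure function that only decides which reference progress is measured against. The function name, the ego-at-origin convention (the tensors are in the ego frame), and representing points as (x, y) tuples in metres are assumptions; the real compute_progress_score_batch logic may differ.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def progress_reference(
    goal_pose: Optional[Point],
    ego_agent_future: Optional[Sequence[Point]],
    max_goal_dist_m: float = 100.0,
) -> Tuple[str, Optional[Point]]:
    """Choose what progress is measured against, in the ego frame (ego at the origin)."""
    # Prefer the goal when it exists and is within range.
    if goal_pose is not None and math.hypot(*goal_pose) <= max_goal_dist_m:
        return "goal", tuple(goal_pose)
    # Otherwise use the final waypoint of the ground-truth ego future, if provided.
    if ego_agent_future:
        return "gt_endpoint", tuple(ego_agent_future[-1])
    # Last resort when no GT is passed in: the predicted path length.
    return "path_length", None
```

Without ego_agent_future in data_batched, the second branch can never fire, which is why adding it changes the progress term.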

…egate

The EPDMS-like component keys were duplicated between this constant and the components dict returned by epdms_like_aggregate, a sync hazard. Accumulate whatever keys the aggregate returns instead, making epdms_like_aggregate the single source of truth. _VAL_SUBSCORE_KEYS stays: it is a genuine allowlist filtering the ~20-key compute_subscores_batch output.
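A sketch of the accumulation pattern this commit describes: an explicit allowlist for the wide compute_subscores_batch output, and pass-through for whatever keys the aggregate returns. The accumulator shape (a defaultdict of per-scene lists) and the ego_subscore_ prefix on the aggregate's gate_*/q_* keys are assumptions for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, List, Mapping

# Allowlist over compute_subscores_batch's larger (~20-key) output.
VAL_SUBSCORE_KEYS = ("safety", "ttc", "progress", "comfort", "centerline", "red_light", "feasibility")


def accumulate_scene_metrics(
    acc: DefaultDict[str, List[float]],
    subscores: Mapping[str, float],
    epdms_components: Mapping[str, float],
) -> None:
    """Append one scene's metrics to the running validation accumulator."""
    # Filtered: only the allowlisted continuous subscores are logged.
    for key in VAL_SUBSCORE_KEYS:
        acc[f"ego_subscore_{key}"].append(float(subscores[key]))
    # Pass-through: no parallel key list, so the aggregate stays the single source of truth.
    for key, value in epdms_components.items():
        acc[f"ego_subscore_{key}"].append(float(value))
```

Adding a new component to the aggregate then requires no change here, while new debug keys from the subscore function stay out of the logs unless they are added to the allowlist.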

go-sakayori changed the title ~~feat(validation): log reward subscores in validate_model (PR3 for #130)~~ feat(validation): log reward subscores in validate_model Jun 18, 2026

go-sakayori force-pushed the feat/validation-subscore-logging branch from 8c29ab8 to 98934df June 18, 2026 11:19

go-sakayori requested a review from Copilot June 19, 2026 12:54

Copilot started reviewing on behalf of go-sakayori June 19, 2026 12:55

Copilot AI reviewed Jun 19, 2026


Comment thread diffusion_planner/tests/test_validation_subscores.py

go-sakayori requested a review from Copilot June 23, 2026 09:32

Copilot started reviewing on behalf of go-sakayori June 23, 2026 09:33

fix pre-commit

716360c

Signed-off-by: Go Sakayori <gsakayori@gmail.com>

Copilot AI reviewed Jun 23, 2026


go-sakayori added 2 commits June 23, 2026 18:48

go-sakayori requested a review from SakodaShintaro June 23, 2026 14:03

go-sakayori wants to merge 5 commits into tier4-main from feat/validation-subscore-logging
