feat(train): add gradient statistics by rej55 · Pull Request #163 · tier4/Diffusion-Planner

rej55 · 2026-06-22T07:05:12Z

Summary

This PR adds per-step gradient statistics to the supervised training loop so that vanishing and exploding gradients can be observed during training. The statistics are computed at every training step, averaged per epoch, and logged to Weights & Biases through the existing logging path with no additional configuration.

What's added

A new helper compute_grad_stats() in utils/train_utils.py computes the following over the concatenation of all parameter gradients (the global gradient vector):

$L_1$ norm (grad/l1_norm)
$L_2$ norm (grad/l2_norm)
$L_\infty$ norm (grad/linf_norm, max absolute value)
mean (grad/mean)
standard deviation (grad/std)
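
The diff of compute_grad_stats() is not reproduced in this description, so the following is only a minimal sketch of what such a helper could look like, assuming PyTorch. The function name compute_grad_stats_sketch, the float return values, the zero fallback for missing gradients, and the choice of population standard deviation are all assumptions, not the merged code.

```python
from typing import Dict, Iterable

import torch
from torch import nn


def compute_grad_stats_sketch(parameters: Iterable[nn.Parameter]) -> Dict[str, float]:
    """Hypothetical stand-in for compute_grad_stats(); not the merged implementation."""
    # Flatten every available gradient and join them into one global gradient vector.
    grads = [p.grad.detach().reshape(-1) for p in parameters if p.grad is not None]
    if not grads:
        # Assumption: report zeros when no parameter has a gradient yet.
        names = ("l1_norm", "l2_norm", "linf_norm", "mean", "std")
        return {f"grad/{name}": 0.0 for name in names}
    flat = torch.cat(grads).float()
    return {
        "grad/l1_norm": flat.abs().sum().item(),
        "grad/l2_norm": flat.norm(2).item(),
        "grad/linf_norm": flat.abs().max().item(),
        "grad/mean": flat.mean().item(),
        # Assumption: population standard deviation; the PR does not say which variant is used.
        "grad/std": flat.std(unbiased=False).item(),
    }


# Tiny usage example: every gradient entry of this model equals 2.
model = nn.Linear(3, 1)
model(torch.ones(2, 3)).sum().backward()
stats = compute_grad_stats_sketch(model.parameters())
```

For this toy model the global gradient vector is [2, 2, 2, 2], so the sketch reports an $L_1$ norm of 8, an $L_2$ norm of 4, an $L_\infty$ norm of 2, a mean of 2, and a standard deviation of 0.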

How it works

In train_epoch.py, the statistics are computed right after loss.backward() and before clip_grad_norm_(), so that exploding gradients are not masked by gradient clipping:

loss["loss"].backward()

# Gradient statistics (computed before clipping so that exploding
# gradients are not masked by clip_grad_norm_).
loss.update(compute_grad_stats(model.parameters()))

nn.utils.clip_grad_norm_(model.parameters(), 5)
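
To illustrate why the ordering matters, here is a small pure-Python sketch of global-norm clipping applied to a toy gradient vector. The helper names l2_norm and clip_by_global_norm are hypothetical, and the rescaling rule is a simplified version of what clip_grad_norm_ does (it omits the small epsilon PyTorch adds to the denominator).

```python
import math


def l2_norm(vec):
    """L2 norm of a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in vec))


def clip_by_global_norm(vec, max_norm):
    """Rescale vec so that its L2 norm does not exceed max_norm (simplified clipping rule)."""
    total = l2_norm(vec)
    scale = min(1.0, max_norm / total) if total > 0 else 1.0
    return [g * scale for g in vec]


exploding = [300.0, -400.0]  # toy "exploding" gradient with L2 norm 500
clipped = clip_by_global_norm(exploding, max_norm=5)

norm_before = l2_norm(exploding)  # what statistics taken before clipping would see
norm_after = l2_norm(clipped)     # what statistics taken after clipping would see
```

Here norm_before is 500 while norm_after is capped at about 5, so statistics taken after clipping could not distinguish this step from any other step whose norm exceeds the threshold. As a side note, clip_grad_norm_ itself returns the total norm measured before clipping, which should agree with grad/l2_norm when both are computed over the same parameters.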

The values are merged into the per-batch loss dict, so they automatically flow through the existing pipeline: get_epoch_mean_loss() averages them per epoch, and they are logged to wandb as train_loss/grad/* via the existing train_loss/{k} logging in train_predictor.py.
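
The aggregation and key naming described above can be sketched as follows. This is an illustration only: epoch_mean_sketch is a hypothetical stand-in for get_epoch_mean_loss(), whose real signature is not shown in this PR, and the sketch assumes every batch dict carries the same keys with scalar values.

```python
from collections import defaultdict


def epoch_mean_sketch(batch_dicts):
    """Hypothetical stand-in for get_epoch_mean_loss(): average each key over all batches."""
    totals = defaultdict(float)
    for batch in batch_dicts:
        for key, value in batch.items():
            totals[key] += float(value)
    n_batches = len(batch_dicts)
    return {key: total / n_batches for key, total in totals.items()}


# Per-batch dicts after loss.update(compute_grad_stats(...)); the values are made up.
batches = [
    {"loss": 0.5, "grad/l2_norm": 2.0},
    {"loss": 0.25, "grad/l2_norm": 4.0},
]
epoch_mean = epoch_mean_sketch(batches)

# Mirror the train_loss/{k} naming; a dict like this is what would be handed to wandb.log().
wandb_payload = {f"train_loss/{k}": v for k, v in epoch_mean.items()}
```

With these two batches the payload contains train_loss/loss = 0.375 and train_loss/grad/l2_norm = 3.0, which shows how the grad/* keys end up under the train_loss/ prefix without any change to the logging code.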

Notes

No new dependencies and no API/config changes; existing wandb dashboards pick up the new metrics automatically.
The GRPO training path (grpo_epoch.py) is intentionally left unchanged because it uses a keyed all-reduce across ranks that requires a consistent metric-key set; it can be addressed separately if needed.

Testing

Verified that train_epoch.py and train_utils.py parse without errors.

Signed-off-by: Fumiya Watanabe <fumiya.watanabe.44@gmail.com>

go-sakayori

Confirmed it works

feat(train): add gradient statistics

7d9707e

Signed-off-by: Fumiya Watanabe <fumiya.watanabe.44@gmail.com>

go-sakayori approved these changes Jun 23, 2026

rej55 merged commit b58c65b into tier4-main Jun 23, 2026
1 check passed

rej55 merged 1 commit into tier4-main from feat/calc_grad_stats
