Skip to content

Perception reproducer: simulated neighbor-history mode + populated neighbor_ids sidecar#164

Merged
danielsanchezaran merged 14 commits into
tier4-mainfrom
feat/reproducer-sim-neighbor-history
Jun 24, 2026
Merged

Perception reproducer: simulated neighbor-history mode + populated neighbor_ids sidecar#164
danielsanchezaran merged 14 commits into
tier4-mainfrom
feat/reproducer-sim-neighbor-history

Conversation

@danielsanchezaran

@danielsanchezaran danielsanchezaran commented Jun 23, 2026

Copy link
Copy Markdown

Summary

Adds a simulated neighbor-history mode to the perception reproducer (closed-loop collision mining) and makes the C++ converter populate neighbor_ids in the per-frame JSON sidecar so the reproducer can key neighbors by track UUID.

Why

The reproducer drives ego closed-loop in a perfect tracker while replaying recorded neighbors via a position-keyed cursor. In the original recorded mode, each cursor frame copies that frame's own 31-step neighbor history verbatim — so a cursor-frozen moving agent still reads its recorded velocity while held in place, producing phantom collisions. Sim mode rebuilds neighbor history from the actually-shown motion.

Changes

scenario_generation/ (reproducer):

  • --neighbor_history_mode {recorded,sim} (recorded = unchanged default).
  • SimNeighborTracker: follows each neighbor by track UUID, advances a continuous recorded-time cursor toward the position-keyed target (one timeline sample per 0.1 s, interpolating between anchors), and rebuilds neighbor_agents_past from the rolling shown motion — velocity is the finite difference of shown positions, so a frozen neighbor reads ~0 and a moving one reads its true speed. Step 0 is seeded from recorded history, then phases out.
  • Saved windows get sim-rebuilt 4-col neighbor_agents_future (UUID-matched, slot-aligned) and closed-loop turn indicators (model-predicted, decoded via the shared simulate.decode_turn_indicator), not recorded GT.
  • Episode-based saving: one window per distinct collision (clear → contact rising edge with a different colliding UUID), with t0-clean and ego-moved gates.
  • Perf: bounded LRU npz cache, single-key neighbor-track loads, and a vectorized per-step neighbor-context build (numerically identical to the prior per-slot loop).

cpp_tools/ (converter sidecar):

  • frame_processor now passes the resolved neighbor_result.neighbor_ids (slot-aligned to neighbor_agents_past) to save_frame_json — previously it fell back to the empty default, so sidecars carried no UUIDs. Default argument removed to fail loud; test updated.

Testing

  • The simulated-neighbor mining path was run at corpus scale (the full collision-mining campaign) on an earlier revision of these changes. The rebase onto tier4-main and every subsequent review fix (round-1/2 + Copilot) were then verified by closed-loop smoke mining runs in both sim and recorded modes plus the unit tests (test_reproducer_sim_neighbor, test_reproducer_collision_scope, ...). A full corpus re-mine was not repeated on the final revision.
  • C++ change built + unit-tested under the pilot-auto.x2 autoware underlay: data_converter and test_frame_writer build clean and the BuildFrameJsonTest gtest (now asserting the populated neighbor_ids) passes. (The package-level colcon build also builds benchmark_tool/inference_tool, which fail against my local underlay due to pre-existing single-step-inference API drift — unrelated to the converter/sidecar path this PR touches.)

Add neighbor_history_mode={recorded,sim} to the perception reproducer.
Recorded mode (default, unchanged) copies each cursor frame's own 31-step
neighbor history verbatim, so a cursor-frozen moving agent still reads its
recorded velocity while held in place -> phantom collisions.

sim mode (SimNeighborTracker) follows each neighbor by track UUID, advances a
continuous recorded-time cursor toward the position-keyed target (one timeline
sample per 0.1s step, interpolating between anchors), and rebuilds
neighbor_agents_past from the rolling shown motion: velocity is the finite
difference of shown positions, so a frozen neighbor reads ~0 and a moving one
its true speed. Step 0 is seeded from the recorded history. Threaded through
run_segment / run_segments_batched (CPU and gpu_transform paths) and exposed as
--neighbor_history_mode on the miner CLI.

Also: cpp frame writer now passes the resolved neighbor_ids at the call site
(was defaulting to empty), with the default removed to fail loud; test updated.
…oop turn signals, episode-based saving

- SimNeighborTracker: project the live ego onto the recorded ego path so rec_t advances
  sub-frame (interpolated smooth motion instead of freeze-then-jump when the live ego is
  slower than the logged ego). build() now returns slot_uuids + per-step shown world poses.
- neighbor_agents_future in saved windows is rebuilt from the simulation's realized shown
  neighbor motion (UUID-matched, slot-aligned) instead of the recorded GT future, so the
  target is consistent with the (sim) past.
- turn_indicators fed back closed-loop: the model's predicted indicator (decoded via the
  shared simulate.decode_turn_indicator, C++ keep-bias) is appended each step; the recorded
  seed phases out within PAST steps. No recorded driver signal leaks into the saved context.
- Save logic is episode-based: one window per DISTINCT collision (different colliding UUID +
  a clear gap between contacts), each in its own per-episode dir; trigger at first contact;
  t0-clean (window starts clear) and ego-moved (ego drove the approach) gates.
- Unstick teleport resets the neighbor tracker, turn_hist, and collision episode state.
- Factor decode_turn_indicator out of _predict_batch in simulate.py (reused by both the
  perfect-tracker sim and the reproducer).
…spam

- RouteTimeline._npz_cache was unbounded — it kept every decompressed frame, so a route
  loaded its WHOLE self into RAM (~27 GB on an 18k-frame route, since the sim track-build
  scans all frames). That capped mining to one route at a time and OOM'd any cross-route
  batch. Make it a bounded LRU (768 frames); the rollout only touches a sliding window and
  the track-build is sequential, so per-route RAM drops to ~1 GB and many routes can batch
  to fill the GPU.
- Episode save: pre-check the clearance array before the (expensive) _dump — only attempt
  when the window's start frame is clear. In a sustained contact the lookback is still in
  contact (t0-clean would drop it), so this skips a per-step _dump (npz-load + OBB) that was
  starving the GPU; a later clean window in the same episode is still caught when the prior
  contact scrolls out of the lookback.
…mpress ~10x)

The sim-mode neighbor track-build scans every frame of every route but only needs
neighbor_agents_past[:, -1]; it was going through RouteTimeline.npz() which decompresses the
full model-input set (lanes/routes/polygons/...). Add RouteTimeline.neighbor_last(idx) that
loads just that one key (np.load is lazy per-key) — timeline_load_npz was 1384s/date (the
single biggest non-GPU cost), dominated by this scan.
… one transform)

Replace the per-slot Python loop + per-slot matmul with a single stacked (M,PAST,3) world->ego
transform. Verified bit-for-bit identical to the per-slot output (max diff 0.0). Cuts a chunk of
input_build (the per-step neighbor-context build).
…rn gate, collider UUID)

- route_timeline: guard the bounded LRU npz cache with a lock. The build threads
  (pool.map(_pre_step)) and the background prefetch threads share one timeline, and
  OrderedDict move_to_end/popitem are not mutually atomic -> a get()+touch could race an
  evicting popitem (KeyError / corrupted link order). Decompress stays outside the lock.
- reproducer: seed turn_hist only in sim mode, so recorded mode keeps the recorded
  turn_indicators unchanged (no closed-loop feedback there). Add _feed_turn_indicator and
  call it from the single-segment paths (run/render/extract) so they evolve the sim turn
  signal like the batched loop instead of holding the seed.
- score_step_batched now also returns the OBB-closest collider slot; episode dedup keys the
  colliding UUID off it rather than the centroid-nearest slot 0 (they can differ for
  long/rotated boxes).
- skip the turn-indicator decode (a GPU->CPU sync) when no active segment carries turn_hist.
- fix a stale comment about the gpu _pre_step payload tuple length.
… cleanup, future fill)

- Recorded mode keeps ONE saved window per segment: with no track UUIDs the per-episode
  distinct-collision dedupe can't apply, so fall back to the pre-sim-mode one-save latch
  (wires the previously-dead `saved_collision` field). Sim mode keeps per-episode saving
  keyed on the colliding UUID.
- Clear this segment's stale `_tc#####` episode dirs at setup, so a re-mine that lands
  collisions at different steps doesn't leave superseded scenes for the extract step to ingest.
- Forward-fill sim neighbor_agents_future across a momentary shown-future gap (perception
  drop): an interior [0,0,0,0] would read as an invalid/padding slot embedded in valid motion.
  Hold the last shown pose across interior gaps, back-fill a leading gap; trailing zeros (gone) stay.
- score_step_batched: move clearance/signed-distance vectors to host ONCE, then reduce per
  segment in numpy (min/argmin/any) instead of a per-segment GPU->CPU sync. Bit-identical.
- Drop the dead _feed_turn_indicator calls in render_segment/extract_collision_scenes (both
  recorded-only -> turn_hist always None -> no-op); keep it in the sim-capable run_segment.
…-proof cleanup)

Use Path.iterdir() + name.startswith(prefix) instead of glob() so a glob metacharacter in
a route key can't mis-match the per-segment _tc##### dir cleanup. (Route keys are bag
timestamp stems today, so this is defensive; verified it removes only the target segment's
stale dirs and leaves other segments'/routes' intact.)

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the closed-loop perception reproducer by adding a simulated neighbor-history mode (to prevent “phantom” collisions caused by replaying recorded neighbor velocities while holding agents fixed), and updates the C++ converter sidecar writing path so per-frame JSON includes slot-aligned neighbor_ids for UUID-based neighbor association across frames.

Changes:

  • Added a shared decode_turn_indicator() helper and used it to decode model turn-indicator logits consistently across simulation/reproducer paths.
  • Implemented neighbor_history_mode={recorded,sim} in the reproducer rollout, including SimNeighborTracker to rebuild neighbor_agents_past (and sim-derived neighbor_agents_future when saving) from shown motion keyed by neighbor UUID.
  • Updated the C++ sidecar writing pipeline to pass through neighbor_result.neighbor_ids and removed default args so missing IDs fail loudly; updated unit test accordingly.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
scenario_generation/simulate.py Adds decode_turn_indicator() and routes turn-indicator decoding through it.
scenario_generation/route_timeline.py Introduces a bounded, lock-guarded LRU NPZ cache and adds neighbor_last() for faster track building.
scenario_generation/reproducer_rollout.py Adds sim neighbor-history mode, sim-derived saving behavior, per-episode collision saving logic, and batched scoring enhancements.
rlvr/autoresearch/tools/mine_collisions_reproducer.py Exposes --neighbor_history_mode flag and forwards it into batched rollout.
cpp_tools/src/autoware_diffusion_planner_tools/test/test_frame_writer.cpp Updates tests for the non-defaulted neighbor_ids argument and verifies JSON contains the field.
cpp_tools/src/autoware_diffusion_planner_tools/src/processing/frame_processor.cpp Passes neighbor_result.neighbor_ids into save_frame_json().
cpp_tools/src/autoware_diffusion_planner_tools/include/io/frame_writer.hpp Removes default parameters for neighbor IDs to ensure call sites must pass the vector explicitly.
Comments suppressed due to low confidence (1)

scenario_generation/route_timeline.py:258

  • prefetch() reads _npz_cache without taking _cache_lock (i not in self._npz_cache). In this PR _npz_cache is an OrderedDict that is mutated concurrently (move_to_end/popitem), and the comment above says every cache access should be guarded. Take the lock around the membership test (best-effort skip) to avoid races and keep the concurrency story consistent.
        n = len(self.npz_paths)
        for i in indices:
            if 0 <= i < n and i not in self._npz_cache:
                self.npz(i)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1345 to +1347
for stale in Path(save_dir).iterdir():
if stale.is_dir() and stale.name.startswith(stale_prefix):
shutil.rmtree(stale, ignore_errors=True)
Comment on lines 579 to +590
@torch.no_grad()
def decode_turn_indicator(ti_logit, keep_bias: float = 0.25):
"""Model ``turn_indicator_logit`` -> class id(s), C++ ``TurnIndicatorManager`` style.

Subtracts ``keep_bias`` (0.25 = the cpp planner's value) from the KEEP (class 4) logit
before argmax, then returns the argmax over the last dim as a numpy array. Shared by
``_predict_batch`` (perfect-tracker sim) and the perception reproducer so both decode the
closed-loop turn signal identically. Classes: 0 NONE / 1 DISABLE / 2 LEFT / 3 RIGHT / 4 KEEP."""
biased = ti_logit.clone()
if keep_bias != 0.0 and biased.shape[-1] > 4:
biased[..., 4] -= keep_bias
return biased.argmax(dim=-1).cpu().numpy()
Comment on lines +950 to +975
class SimNeighborTracker:
"""Build the model's neighbor context from the SIMULATED (shown) neighbor motion.

Recorded mode copies each cursor frame's own 31-step history verbatim, so a
cursor-frozen car still reads its recorded velocity (e.g. a moving car held in
place because the ego crept still shows ~11 m/s while it visibly never moves —
input and replay disagree, producing phantom collisions).

This tracker follows each neighbor by track UUID, advances a continuous
recorded-time cursor ``rec_t`` toward the position-keyed cursor's target frame
(capped at ``max_rec_advance`` array-indices per 0.1 s sim step → interpolates
between recorded anchors), and keeps a rolling per-sim-step world history per
UUID. ``neighbor_agents_past`` is rebuilt from that shown history each step:
velocity is the finite difference of the shown positions, so a frozen neighbor
reads v approx 0 (a static obstacle) and a moving one its true speed. Step 0 is
seeded from the recorded history so the first frame equals the original context.
"""

def __init__(self, tl: RouteTimeline, start: int, max_rec_advance: float = 1.0):
self.tl = tl
self.interp, self.attrs, self.span = _route_nbr_tracks(tl)
if not self.interp:
raise ValueError(
"SimNeighborTracker: no neighbor tracks (sidecar neighbor_ids empty). Reconvert "
"the corpus with populated neighbor_ids, or use neighbor_history_mode=recorded."
)
…or/turn-decode tests)

- Guard the stale-episode-dir cleanup: save_dir may not exist on the first run (window dirs
  are created lazily only after gating), so Path(save_dir).iterdir() would raise
  FileNotFoundError. Skip the scan when the dir is absent.
- Add tests for decode_turn_indicator (KEEP-bias flips the argmax; scalar + batched shapes).
- Add a SimNeighborTracker test: a neighbor held at a constant shown position reads v~0 even
  with a large recorded velocity column (the phantom-collision sim mode fixes), while a
  neighbor whose shown position advances reads a real non-zero velocity; UUID slot association
  verified.
@danielsanchezaran

Copy link
Copy Markdown
Author

Thanks — addressed all three in cb3f24a:

  1. save_dir may not exist on first run (real bug): the stale-episode-dir cleanup now guards with save_root.is_dir() before iterdir(), so a fresh --save_dir no longer raises FileNotFoundError.
  2. decode_turn_indicator tests: added test_decode_turn_indicator_keep_bias (KEEP wins raw argmax but loses after the 0.25 bias) and test_decode_turn_indicator_batched_shape (scalar + batched).
  3. SimNeighborTracker invariants: added test_sim_neighbor_velocity_from_shown_motion — a synthetic 2-track timeline where a neighbor held at a constant shown position reads v≈0 despite a recorded vx of 11 m/s (the phantom-collision case), while a moving neighbor reads a real non-zero velocity; UUID→slot association checked.

New tests in scenario_generation/tests/test_reproducer_sim_neighbor.py (3 pass; 14 existing reproducer tests still green).

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

scenario_generation/route_timeline.py:258

  • prefetch() reads self._npz_cache without holding _cache_lock, even though the cache is now an OrderedDict with documented concurrent mutations (eviction/touch). This undermines the lock’s guarantee and can lead to inconsistent behavior under concurrency. Consider taking the lock for the membership check (or just calling self.npz(i) and letting it re-check under the lock).
        n = len(self.npz_paths)
        for i in indices:
            if 0 <= i < n and i not in self._npz_cache:
                self.npz(i)

Comment on lines +914 to +915
u = ids[slot]
wx = pose[0] + row[0] * c - row[1] * s
…k prefetch cache read)

- _build_nbr_world_tracks / _build_neighbor_interp: skip slots whose sidecar neighbor_id is
  blank, so a non-zero neighbor row with an empty UUID can't collapse multiple agents into one
  bogus "" track and corrupt interpolation.
- RouteTimeline.prefetch: drop the lock-free `i not in self._npz_cache` read and delegate
  straight to npz(), which does the cached-check + insert + eviction under _cache_lock. Removes
  the last unguarded cache access.
@danielsanchezaran

Copy link
Copy Markdown
Author

Addressed the re-review in 3568321, and built/tested the C++ change properly this time:

  • Blank neighbor UUIDs: _build_nbr_world_tracks and _build_neighbor_interp now skip slots whose sidecar neighbor_id is empty, so a non-zero row with a blank UUID can't merge multiple agents into one bogus track.
  • prefetch cache read: removed the lock-free in self._npz_cache check — it now delegates to npz(), which does the cached-check/insert/eviction under _cache_lock. No unguarded cache access remains.

C++ verification (built under the pilot-auto.x2 underlay): data_converter + test_frame_writer build clean; BuildFrameJsonTest (asserting the populated neighbor_ids) passes. The package-level build also compiles benchmark_tool/inference_tool, which fail against my local underlay due to pre-existing single-step-inference API drift — unrelated to this PR's converter/sidecar change.

… without re-writing npz

Runs the IDENTICAL conversion pipeline (sequence build, neighbor association, skip decision)
but writes only the per-frame JSON sidecars (pose + neighbor_ids + is_skipped), skipping the
npz tensor serialization entirely. Lets you (re)generate neighbor_ids sidecars for an existing
npz set produced by this same converter: same code path -> byte-identical slot ordering and skip
decision, so the sidecars align to those npzs by token; and it is faster than a full re-convert
because no npz is written.

Verified on a real bag (151-frame accepted sequence): the 151 per-frame sidecars are byte-
identical to a full convert (neighbor_ids + is_skipped + pose), zero npz written, faster wall
time. Unit test covers the default (off). Off by default; a normal convert already emits sidecars.
print_options now reports 'Sidecar only: {}' alongside 'Write skipped npz' so a --sidecar_only
run confirms the no-npz mode is active (review nit; observability only).

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Comment on lines +927 to +929
for u, lst in raw.items():
if len(lst) < 2:
continue
…view)

_build_nbr_world_tracks dropped tracks seen in only one recorded frame (len(lst) < 2), so a
neighbor present only at/near the segment start was lost from sim mode — contradicting the
"step 0 reproduces the recorded context" intent and differing from _build_neighbor_interp.
Keep 1-sample tracks as a constant (v~0); _interp_pose already clamps a single anchor. Adds a
regression test (single-frame neighbor is retained with span (k,k)).
@danielsanchezaran

Copy link
Copy Markdown
Author

Status on this review round's comments:

  • reproducer_rollout.py single-frame tracks (the new comment): fixed in the latest commit — _build_nbr_world_tracks now keeps 1-sample tracks as a constant (v≈0), matching _build_neighbor_interp and the "step 0 = recorded context" intent. Regression test added.
  • simulate.py:590 decode_turn_indicator tests / reproducer_rollout.py SimNeighborTracker tests: already added in cb3f24a — see scenario_generation/tests/test_reproducer_sim_neighbor.py (test_decode_turn_indicator_keep_bias, test_decode_turn_indicator_batched_shape, test_sim_neighbor_velocity_from_shown_motion).
  • reproducer_rollout.py empty neighbor UUIDs: already fixed in 3568321 (if not u: continue in both _build_nbr_world_tracks and _build_neighbor_interp).

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Comment on lines 334 to +338
def score_step_batched(
neighbors_list: list[np.ndarray],
ego_shapes: list[np.ndarray],
device: str,
) -> list[tuple[float, bool, int]]:
) -> list[tuple[float, bool, int, int]]:
…or coverage canonical

- test_reproducer_collision_scope: assert score_step returns a 3-tuple and score_step_batched a
  4-tuple (collider_slot appended), documenting that the extra return is intentional and that no
  caller breaks (the only positional unpacker — the batched rollout loop — was updated; this test
  compares the first three by index).
- test_reproducer_sim_neighbor: import decode_turn_indicator from its canonical module
  scenario_generation.simulate so the KEEP-bias/shape tests unambiguously cover that implementation.
@danielsanchezaran

Copy link
Copy Markdown
Author

On the latest review:

  • score_step_batched 4-tuple "breaking change": no caller breaks. The only positional unpacker — the batched rollout loop in reproducer_rollout.py — was updated to (cl, col, _M, collider_slot); the cited test_reproducer_collision_scope.py compares the first three by index (a[2]==b[2]), so the appended collider_slot is ignored and the test passes (6/6). I've made this explicit (assert 3-tuple vs 4-tuple) so the intent is unambiguous.
  • decode_turn_indicator / SimNeighborTracker test coverage: these tests already exist in scenario_generation/tests/test_reproducer_sim_neighbor.py (test_decode_turn_indicator_keep_bias, test_decode_turn_indicator_batched_shape, test_sim_neighbor_velocity_from_shown_motion, test_sim_neighbor_keeps_single_frame_track). I've moved the decode_turn_indicator import to its canonical scenario_generation.simulate module so the coverage is unambiguous.

@danielsanchezaran danielsanchezaran merged commit e493ac6 into tier4-main Jun 24, 2026
1 check passed
@danielsanchezaran danielsanchezaran deleted the feat/reproducer-sim-neighbor-history branch June 24, 2026 00:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants