Perception reproducer: simulated neighbor-history mode + populated neighbor_ids sidecar#164
Conversation
Add neighbor_history_mode={recorded,sim} to the perception reproducer.
Recorded mode (default, unchanged) copies each cursor frame's own 31-step
neighbor history verbatim, so a cursor-frozen moving agent still reads its
recorded velocity while held in place -> phantom collisions.
sim mode (SimNeighborTracker) follows each neighbor by track UUID, advances a
continuous recorded-time cursor toward the position-keyed target (one timeline
sample per 0.1s step, interpolating between anchors), and rebuilds
neighbor_agents_past from the rolling shown motion: velocity is the finite
difference of shown positions, so a frozen neighbor reads ~0 and a moving one
its true speed. Step 0 is seeded from the recorded history. Threaded through
run_segment / run_segments_batched (CPU and gpu_transform paths) and exposed as
--neighbor_history_mode on the miner CLI.
Also: cpp frame writer now passes the resolved neighbor_ids at the call site
(was defaulting to empty), with the default removed to fail loud; test updated.
…oop turn signals, episode-based saving - SimNeighborTracker: project the live ego onto the recorded ego path so rec_t advances sub-frame (interpolated smooth motion instead of freeze-then-jump when the live ego is slower than the logged ego). build() now returns slot_uuids + per-step shown world poses. - neighbor_agents_future in saved windows is rebuilt from the simulation's realized shown neighbor motion (UUID-matched, slot-aligned) instead of the recorded GT future, so the target is consistent with the (sim) past. - turn_indicators fed back closed-loop: the model's predicted indicator (decoded via the shared simulate.decode_turn_indicator, C++ keep-bias) is appended each step; the recorded seed phases out within PAST steps. No recorded driver signal leaks into the saved context. - Save logic is episode-based: one window per DISTINCT collision (different colliding UUID + a clear gap between contacts), each in its own per-episode dir; trigger at first contact; t0-clean (window starts clear) and ego-moved (ego drove the approach) gates. - Unstick teleport resets the neighbor tracker, turn_hist, and collision episode state. - Factor decode_turn_indicator out of _predict_batch in simulate.py (reused by both the perfect-tracker sim and the reproducer).
…spam - RouteTimeline._npz_cache was unbounded — it kept every decompressed frame, so a route loaded its WHOLE self into RAM (~27 GB on an 18k-frame route, since the sim track-build scans all frames). That capped mining to one route at a time and OOM'd any cross-route batch. Make it a bounded LRU (768 frames); the rollout only touches a sliding window and the track-build is sequential, so per-route RAM drops to ~1 GB and many routes can batch to fill the GPU. - Episode save: pre-check the clearance array before the (expensive) _dump — only attempt when the window's start frame is clear. In a sustained contact the lookback is still in contact (t0-clean would drop it), so this skips a per-step _dump (npz-load + OBB) that was starving the GPU; a later clean window in the same episode is still caught when the prior contact scrolls out of the lookback.
…mpress ~10x) The sim-mode neighbor track-build scans every frame of every route but only needs neighbor_agents_past[:, -1]; it was going through RouteTimeline.npz() which decompresses the full model-input set (lanes/routes/polygons/...). Add RouteTimeline.neighbor_last(idx) that loads just that one key (np.load is lazy per-key) — timeline_load_npz was 1384s/date (the single biggest non-GPU cost), dominated by this scan.
… one transform) Replace the per-slot Python loop + per-slot matmul with a single stacked (M,PAST,3) world->ego transform. Verified bit-for-bit identical to the per-slot output (max diff 0.0). Cuts a chunk of input_build (the per-step neighbor-context build).
…rn gate, collider UUID) - route_timeline: guard the bounded LRU npz cache with a lock. The build threads (pool.map(_pre_step)) and the background prefetch threads share one timeline, and OrderedDict move_to_end/popitem are not mutually atomic -> a get()+touch could race an evicting popitem (KeyError / corrupted link order). Decompress stays outside the lock. - reproducer: seed turn_hist only in sim mode, so recorded mode keeps the recorded turn_indicators unchanged (no closed-loop feedback there). Add _feed_turn_indicator and call it from the single-segment paths (run/render/extract) so they evolve the sim turn signal like the batched loop instead of holding the seed. - score_step_batched now also returns the OBB-closest collider slot; episode dedup keys the colliding UUID off it rather than the centroid-nearest slot 0 (they can differ for long/rotated boxes). - skip the turn-indicator decode (a GPU->CPU sync) when no active segment carries turn_hist. - fix a stale comment about the gpu _pre_step payload tuple length.
… cleanup, future fill) - Recorded mode keeps ONE saved window per segment: with no track UUIDs the per-episode distinct-collision dedupe can't apply, so fall back to the pre-sim-mode one-save latch (wires the previously-dead `saved_collision` field). Sim mode keeps per-episode saving keyed on the colliding UUID. - Clear this segment's stale `_tc#####` episode dirs at setup, so a re-mine that lands collisions at different steps doesn't leave superseded scenes for the extract step to ingest. - Forward-fill sim neighbor_agents_future across a momentary shown-future gap (perception drop): an interior [0,0,0,0] would read as an invalid/padding slot embedded in valid motion. Hold the last shown pose across interior gaps, back-fill a leading gap; trailing zeros (gone) stay. - score_step_batched: move clearance/signed-distance vectors to host ONCE, then reduce per segment in numpy (min/argmin/any) instead of a per-segment GPU->CPU sync. Bit-identical. - Drop the dead _feed_turn_indicator calls in render_segment/extract_collision_scenes (both recorded-only -> turn_hist always None -> no-op); keep it in the sim-capable run_segment.
…-proof cleanup) Use Path.iterdir() + name.startswith(prefix) instead of glob() so a glob metacharacter in a route key can't mis-match the per-segment _tc##### dir cleanup. (Route keys are bag timestamp stems today, so this is defensive; verified it removes only the target segment's stale dirs and leaves other segments'/routes' intact.)
There was a problem hiding this comment.
Pull request overview
This PR improves the closed-loop perception reproducer by adding a simulated neighbor-history mode (to prevent “phantom” collisions caused by replaying recorded neighbor velocities while holding agents fixed), and updates the C++ converter sidecar writing path so per-frame JSON includes slot-aligned neighbor_ids for UUID-based neighbor association across frames.
Changes:
- Added a shared
decode_turn_indicator()helper and used it to decode model turn-indicator logits consistently across simulation/reproducer paths. - Implemented
neighbor_history_mode={recorded,sim}in the reproducer rollout, includingSimNeighborTrackerto rebuildneighbor_agents_past(and sim-derivedneighbor_agents_futurewhen saving) from shown motion keyed by neighbor UUID. - Updated the C++ sidecar writing pipeline to pass through
neighbor_result.neighbor_idsand removed default args so missing IDs fail loudly; updated unit test accordingly.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| scenario_generation/simulate.py | Adds decode_turn_indicator() and routes turn-indicator decoding through it. |
| scenario_generation/route_timeline.py | Introduces a bounded, lock-guarded LRU NPZ cache and adds neighbor_last() for faster track building. |
| scenario_generation/reproducer_rollout.py | Adds sim neighbor-history mode, sim-derived saving behavior, per-episode collision saving logic, and batched scoring enhancements. |
| rlvr/autoresearch/tools/mine_collisions_reproducer.py | Exposes --neighbor_history_mode flag and forwards it into batched rollout. |
| cpp_tools/src/autoware_diffusion_planner_tools/test/test_frame_writer.cpp | Updates tests for the non-defaulted neighbor_ids argument and verifies JSON contains the field. |
| cpp_tools/src/autoware_diffusion_planner_tools/src/processing/frame_processor.cpp | Passes neighbor_result.neighbor_ids into save_frame_json(). |
| cpp_tools/src/autoware_diffusion_planner_tools/include/io/frame_writer.hpp | Removes default parameters for neighbor IDs to ensure call sites must pass the vector explicitly. |
Comments suppressed due to low confidence (1)
scenario_generation/route_timeline.py:258
prefetch()reads_npz_cachewithout taking_cache_lock(i not in self._npz_cache). In this PR_npz_cacheis anOrderedDictthat is mutated concurrently (move_to_end/popitem), and the comment above says every cache access should be guarded. Take the lock around the membership test (best-effort skip) to avoid races and keep the concurrency story consistent.
n = len(self.npz_paths)
for i in indices:
if 0 <= i < n and i not in self._npz_cache:
self.npz(i)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for stale in Path(save_dir).iterdir(): | ||
| if stale.is_dir() and stale.name.startswith(stale_prefix): | ||
| shutil.rmtree(stale, ignore_errors=True) |
| @torch.no_grad() | ||
| def decode_turn_indicator(ti_logit, keep_bias: float = 0.25): | ||
| """Model ``turn_indicator_logit`` -> class id(s), C++ ``TurnIndicatorManager`` style. | ||
|
|
||
| Subtracts ``keep_bias`` (0.25 = the cpp planner's value) from the KEEP (class 4) logit | ||
| before argmax, then returns the argmax over the last dim as a numpy array. Shared by | ||
| ``_predict_batch`` (perfect-tracker sim) and the perception reproducer so both decode the | ||
| closed-loop turn signal identically. Classes: 0 NONE / 1 DISABLE / 2 LEFT / 3 RIGHT / 4 KEEP.""" | ||
| biased = ti_logit.clone() | ||
| if keep_bias != 0.0 and biased.shape[-1] > 4: | ||
| biased[..., 4] -= keep_bias | ||
| return biased.argmax(dim=-1).cpu().numpy() |
| class SimNeighborTracker: | ||
| """Build the model's neighbor context from the SIMULATED (shown) neighbor motion. | ||
|
|
||
| Recorded mode copies each cursor frame's own 31-step history verbatim, so a | ||
| cursor-frozen car still reads its recorded velocity (e.g. a moving car held in | ||
| place because the ego crept still shows ~11 m/s while it visibly never moves — | ||
| input and replay disagree, producing phantom collisions). | ||
|
|
||
| This tracker follows each neighbor by track UUID, advances a continuous | ||
| recorded-time cursor ``rec_t`` toward the position-keyed cursor's target frame | ||
| (capped at ``max_rec_advance`` array-indices per 0.1 s sim step → interpolates | ||
| between recorded anchors), and keeps a rolling per-sim-step world history per | ||
| UUID. ``neighbor_agents_past`` is rebuilt from that shown history each step: | ||
| velocity is the finite difference of the shown positions, so a frozen neighbor | ||
| reads v approx 0 (a static obstacle) and a moving one its true speed. Step 0 is | ||
| seeded from the recorded history so the first frame equals the original context. | ||
| """ | ||
|
|
||
| def __init__(self, tl: RouteTimeline, start: int, max_rec_advance: float = 1.0): | ||
| self.tl = tl | ||
| self.interp, self.attrs, self.span = _route_nbr_tracks(tl) | ||
| if not self.interp: | ||
| raise ValueError( | ||
| "SimNeighborTracker: no neighbor tracks (sidecar neighbor_ids empty). Reconvert " | ||
| "the corpus with populated neighbor_ids, or use neighbor_history_mode=recorded." | ||
| ) |
…or/turn-decode tests) - Guard the stale-episode-dir cleanup: save_dir may not exist on the first run (window dirs are created lazily only after gating), so Path(save_dir).iterdir() would raise FileNotFoundError. Skip the scan when the dir is absent. - Add tests for decode_turn_indicator (KEEP-bias flips the argmax; scalar + batched shapes). - Add a SimNeighborTracker test: a neighbor held at a constant shown position reads v~0 even with a large recorded velocity column (the phantom-collision sim mode fixes), while a neighbor whose shown position advances reads a real non-zero velocity; UUID slot association verified.
|
Thanks — addressed all three in
New tests in |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
scenario_generation/route_timeline.py:258
prefetch()readsself._npz_cachewithout holding_cache_lock, even though the cache is now anOrderedDictwith documented concurrent mutations (eviction/touch). This undermines the lock’s guarantee and can lead to inconsistent behavior under concurrency. Consider taking the lock for the membership check (or just callingself.npz(i)and letting it re-check under the lock).
n = len(self.npz_paths)
for i in indices:
if 0 <= i < n and i not in self._npz_cache:
self.npz(i)
| u = ids[slot] | ||
| wx = pose[0] + row[0] * c - row[1] * s |
…k prefetch cache read) - _build_nbr_world_tracks / _build_neighbor_interp: skip slots whose sidecar neighbor_id is blank, so a non-zero neighbor row with an empty UUID can't collapse multiple agents into one bogus "" track and corrupt interpolation. - RouteTimeline.prefetch: drop the lock-free `i not in self._npz_cache` read and delegate straight to npz(), which does the cached-check + insert + eviction under _cache_lock. Removes the last unguarded cache access.
|
Addressed the re-review in
C++ verification (built under the pilot-auto.x2 underlay): |
… without re-writing npz Runs the IDENTICAL conversion pipeline (sequence build, neighbor association, skip decision) but writes only the per-frame JSON sidecars (pose + neighbor_ids + is_skipped), skipping the npz tensor serialization entirely. Lets you (re)generate neighbor_ids sidecars for an existing npz set produced by this same converter: same code path -> byte-identical slot ordering and skip decision, so the sidecars align to those npzs by token; and it is faster than a full re-convert because no npz is written. Verified on a real bag (151-frame accepted sequence): the 151 per-frame sidecars are byte- identical to a full convert (neighbor_ids + is_skipped + pose), zero npz written, faster wall time. Unit test covers the default (off). Off by default; a normal convert already emits sidecars.
print_options now reports 'Sidecar only: {}' alongside 'Write skipped npz' so a --sidecar_only
run confirms the no-npz mode is active (review nit; observability only).
| for u, lst in raw.items(): | ||
| if len(lst) < 2: | ||
| continue |
…view) _build_nbr_world_tracks dropped tracks seen in only one recorded frame (len(lst) < 2), so a neighbor present only at/near the segment start was lost from sim mode — contradicting the "step 0 reproduces the recorded context" intent and differing from _build_neighbor_interp. Keep 1-sample tracks as a constant (v~0); _interp_pose already clamps a single anchor. Adds a regression test (single-frame neighbor is retained with span (k,k)).
|
Status on this review round's comments:
|
| def score_step_batched( | ||
| neighbors_list: list[np.ndarray], | ||
| ego_shapes: list[np.ndarray], | ||
| device: str, | ||
| ) -> list[tuple[float, bool, int]]: | ||
| ) -> list[tuple[float, bool, int, int]]: |
…or coverage canonical - test_reproducer_collision_scope: assert score_step returns a 3-tuple and score_step_batched a 4-tuple (collider_slot appended), documenting that the extra return is intentional and that no caller breaks (the only positional unpacker — the batched rollout loop — was updated; this test compares the first three by index). - test_reproducer_sim_neighbor: import decode_turn_indicator from its canonical module scenario_generation.simulate so the KEEP-bias/shape tests unambiguously cover that implementation.
|
On the latest review:
|
Summary
Adds a simulated neighbor-history mode to the perception reproducer (closed-loop collision mining) and makes the C++ converter populate
neighbor_idsin the per-frame JSON sidecar so the reproducer can key neighbors by track UUID.Why
The reproducer drives ego closed-loop in a perfect tracker while replaying recorded neighbors via a position-keyed cursor. In the original recorded mode, each cursor frame copies that frame's own 31-step neighbor history verbatim — so a cursor-frozen moving agent still reads its recorded velocity while held in place, producing phantom collisions. Sim mode rebuilds neighbor history from the actually-shown motion.
Changes
scenario_generation/(reproducer):--neighbor_history_mode {recorded,sim}(recorded = unchanged default).SimNeighborTracker: follows each neighbor by track UUID, advances a continuous recorded-time cursor toward the position-keyed target (one timeline sample per 0.1 s, interpolating between anchors), and rebuildsneighbor_agents_pastfrom the rolling shown motion — velocity is the finite difference of shown positions, so a frozen neighbor reads ~0 and a moving one reads its true speed. Step 0 is seeded from recorded history, then phases out.neighbor_agents_future(UUID-matched, slot-aligned) and closed-loop turn indicators (model-predicted, decoded via the sharedsimulate.decode_turn_indicator), not recorded GT.cpp_tools/(converter sidecar):frame_processornow passes the resolvedneighbor_result.neighbor_ids(slot-aligned toneighbor_agents_past) tosave_frame_json— previously it fell back to the empty default, so sidecars carried no UUIDs. Default argument removed to fail loud; test updated.Testing
tier4-mainand every subsequent review fix (round-1/2 + Copilot) were then verified by closed-loop smoke mining runs in bothsimandrecordedmodes plus the unit tests (test_reproducer_sim_neighbor,test_reproducer_collision_scope, ...). A full corpus re-mine was not repeated on the final revision.data_converterandtest_frame_writerbuild clean and theBuildFrameJsonTestgtest (now asserting the populatedneighbor_ids) passes. (The package-levelcolcon buildalso buildsbenchmark_tool/inference_tool, which fail against my local underlay due to pre-existing single-step-inference API drift — unrelated to the converter/sidecar path this PR touches.)