Pr 44 device map fix #81

Open
ctrl-gaurav wants to merge 4 commits into main from pr-44-device-map-fix

Conversation

@ctrl-gaurav

Owner

Summary

Make the Transformers fallback engine survive broken device_map="auto" sharding. On some multi-GPU systems, device_map="auto" shards a model across GPUs in a way that produces invalid logits, which then triggers a CUDA device-side assert during sampling and poisons the GPU context for the rest of the process. This change detects that situation and automatically reloads the model pinned to GPU 0 (device_map={"": 0}) before sampling can corrupt CUDA, with a fallback retry path if an assert still slips through.
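
The probe-then-pin decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in transformers_engine.py: `logits_look_valid` and `choose_device_map` are hypothetical names, the probe logits are a plain list of floats standing in for the output of a short forward pass, and treating any NaN or infinity as "invalid" is an assumption about what the real check looks for.

```python
import math
from typing import Iterable, Union

DeviceMap = Union[str, dict]


def logits_look_valid(logits: Iterable[float]) -> bool:
    """Hypothetical check: an empty probe or any NaN/inf counts as invalid."""
    values = list(logits)
    return bool(values) and all(math.isfinite(v) for v in values)


def choose_device_map(requested: DeviceMap, probe_logits: Iterable[float]) -> DeviceMap:
    """Pick the device map to use before any sampling happens."""
    if requested == "auto" and not logits_look_valid(probe_logits):
        # Pin the whole model to GPU 0, i.e. device_map={"": 0}.
        return {"": 0}
    # Explicit device maps are left alone; only "auto" is second-guessed.
    return requested
```

The point of deciding before sampling is that a bad probe can be handled with an ordinary reload, whereas an assert raised during sampling may leave the CUDA context unusable.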

Merged from PR #44 (Aafiya-H) into pr-44-device-map-fix with the transformers_engine.py conflict resolved — quiet model loading, debug-level logging, and BPE-safe decoding from main were preserved alongside the new device-map recovery logic.

Changes

  • effgen/models/transformers_engine.py
    • Refactor model loading into helpers: _assemble_model_kwargs, _from_pretrained_with_flash_fallback, _load_model_weights, _drop_model_weights (quiet loading + Flash-Attention fallback preserved).
    • Add device_map="auto" recovery: probe logits via a short forward pass (_probe_auto_device_map_logits) and, if they look invalid, reload pinned to GPU 0 before sampling (_ensure_device_map_viable_before_sampling, _apply_cuda_device_map_pin_fallback, _effective_device_map).
    • Detect CUDA device-side asserts (_is_cuda_device_side_assert) and retry generation once after pinning to GPU 0 (_maybe_retry_after_cuda_assert); covers generate, generate_batch, and generate_stream. Raises a clear "restart the process / set CUDA_VISIBLE_DEVICES=0" error when the context is already poisoned.
    • Reset _pin_device_map_for_cuda / _retrying_after_cuda_assert state on unload(), and guard torch.cuda.empty_cache().
  • examples/basic/calculator_agent.py — tear down the math agent in a finally block (agent.close() before model.unload()) so demo/interactive runs exit cleanly without the "Agent was garbage-collected without calling close()" warning, even when generation fails mid-run.
  • CHANGELOG.md — documented the calculator-agent cleanup fix.
  • test.py: ⚠️ root-level scratch/demo script (loads Qwen2.5-3B and runs one calculation). Recommend removing before merge, since it is not part of the test suite and lives outside tests/.
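
For readers who want the shape of the assert-detection and retry-once path listed above, here is a hedged sketch. The function names are hypothetical and do not mirror the private helpers in this PR; `generate` and `pin_to_gpu0` are placeholder callables; and matching on the phrase "device-side assert" in the exception text is an assumption about how detection works.

```python
from typing import Callable

# Assumption: the CUDA runtime error text contains this phrase.
_ASSERT_MARKER = "device-side assert"


def is_cuda_device_side_assert(exc: BaseException) -> bool:
    """Return True if the exception message looks like a CUDA device-side assert."""
    return _ASSERT_MARKER in str(exc).lower()


def generate_with_one_retry(
    generate: Callable[[], str],
    pin_to_gpu0: Callable[[], None],
) -> str:
    """Run generate(); on a device-side assert, pin to GPU 0 and retry exactly once."""
    try:
        return generate()
    except RuntimeError as exc:
        if not is_cuda_device_side_assert(exc):
            raise  # unrelated failure: do not mask it
        pin_to_gpu0()  # placeholder for a reload with device_map={"": 0}
        try:
            return generate()
        except RuntimeError as retry_exc:
            if is_cuda_device_side_assert(retry_exc):
                # A second assert suggests the CUDA context is already poisoned.
                raise RuntimeError(
                    "CUDA context is poisoned; restart the process "
                    "or set CUDA_VISIBLE_DEVICES=0"
                ) from retry_exc
            raise
```

Limiting the retry to a single attempt keeps the failure mode bounded: if pinning does not help, the caller gets an actionable error instead of a retry loop.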

Testing

  • Unit tests pass: pytest tests/unit/ -v --no-cov
  • Integration tests pass (if applicable)
  • CHANGELOG.md updated
  • Manual: multi-GPU run where device_map="auto" previously asserted now falls back to GPU 0 and generates successfully

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update

Aafiya-H and others added 4 commits May 21, 2026 17:51
…breaks sampling, fix(examples): close math agent before unload

device_map=auto on multi-GPU nodes produced invalid logits and CUDA multinomial
asserts. Probe logits after load and pin to {"": 0} before sampling; retry once on
assert with a restart hint if the context is poisoned. Refactor model reload paths
and harden calculator_agent cleanup in finally.
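
The teardown order mentioned in the commit message (agent.close() before model.unload(), inside a finally block) can be illustrated with stand-in classes. `StubModel`, `StubAgent`, and `run_demo` are hypothetical and exist only to show the ordering; they are not the effgen API.

```python
from typing import List, Optional


class StubModel:
    """Stand-in for a loaded model; records when it is unloaded."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def unload(self) -> None:
        self.log.append("model.unload")


class StubAgent:
    """Stand-in for the math agent; can be told to fail mid-run."""

    def __init__(self, model: StubModel, log: List[str], fail: bool) -> None:
        self.model = model
        self.log = log
        self.fail = fail

    def run(self, prompt: str) -> str:
        if self.fail:
            raise RuntimeError("generation failed mid-run")
        return "4"

    def close(self) -> None:
        self.log.append("agent.close")


def run_demo(log: List[str], fail: bool) -> str:
    model = StubModel(log)
    agent: Optional[StubAgent] = None
    try:
        agent = StubAgent(model, log, fail)
        return agent.run("What is 2 + 2?")
    finally:
        # Close the agent first, then release the model, even on failure.
        if agent is not None:
            agent.close()
        model.unload()
```

Because the cleanup lives in `finally`, it runs on both the success path and the failure path, so the agent is always closed explicitly rather than left to garbage collection.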

2 participants