Pr 44 device map fix #81
Open
ctrl-gaurav wants to merge 4 commits into
…breaks sampling, fix(examples): close math agent before unload
device_map=auto on multi-GPU nodes produced invalid logits and CUDA multinomial
asserts. Probe logits after load and pin to {"": 0} before sampling; retry once on
assert with a restart hint if the context is poisoned. Refactor model reload paths
and harden calculator_agent cleanup in finally.
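
The flow this commit message describes (probe the logits, pin to GPU 0, retry once on assert) can be sketched without a GPU. This is a minimal, framework-free illustration under stated assumptions, not the code in `transformers_engine.py`: the names `logits_look_valid`, `load_with_pin_fallback`, `generate_with_one_retry`, and the injected `load`, `probe`, `generate`, and `repin` callables are all hypothetical. The substring check assumes the error text contains "device-side assert", as in PyTorch's "CUDA error: device-side assert triggered".

```python
import math
from typing import Any, Callable, Dict, Iterable, Tuple, Union

DeviceMap = Union[str, Dict[str, int]]


def logits_look_valid(logits: Iterable[float]) -> bool:
    """Return True if there is at least one logit and none is NaN or infinite."""
    values = list(logits)
    return bool(values) and all(math.isfinite(v) for v in values)


def is_cuda_device_side_assert(exc: BaseException) -> bool:
    """Heuristic (assumption): recognise the assert by its error message."""
    return "device-side assert" in str(exc).lower()


def load_with_pin_fallback(
    load: Callable[[DeviceMap], Any],
    probe: Callable[[Any], Iterable[float]],
    device_map: DeviceMap = "auto",
) -> Tuple[Any, DeviceMap]:
    """Load, probe once, and reload pinned to GPU 0 if the probe looks invalid."""
    model = load(device_map)
    if device_map == "auto" and not logits_look_valid(probe(model)):
        # Hypothetical recovery step: put every module on GPU 0.
        device_map = {"": 0}
        model = load(device_map)
    return model, device_map


def generate_with_one_retry(
    generate: Callable[[], Any],
    repin: Callable[[], None],
) -> Any:
    """Run generate(); on a device-side assert, repin and retry exactly once."""
    try:
        return generate()
    except RuntimeError as exc:
        if not is_cuda_device_side_assert(exc):
            raise  # unrelated failure: do not mask it
    repin()
    try:
        return generate()
    except RuntimeError as exc:
        if is_cuda_device_side_assert(exc):
            raise RuntimeError(
                "CUDA context appears poisoned; restart the process "
                "or set CUDA_VISIBLE_DEVICES=0"
            ) from exc
        raise
```

In a real engine, `probe` would run a short forward pass and return the logits, and `repin` would reload the weights with `device_map={"": 0}`. Probing before sampling matters because a device-side assert cannot be undone inside the same process; the retry is a last resort.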
Summary
Make the Transformers fallback engine survive broken `device_map="auto"` sharding. On some multi-GPU systems, `device_map="auto"` shards a model across GPUs in a way that produces invalid logits, which then triggers a CUDA device-side assert during sampling and poisons the GPU context for the rest of the process. This change detects that situation and automatically reloads the model pinned to GPU 0 (`device_map={"": 0}`) before sampling can corrupt CUDA, with a fallback retry path if an assert still slips through.

Merged from PR #44 (Aafiya-H) into `pr-44-device-map-fix` with the `transformers_engine.py` conflict resolved: quiet model loading, debug-level logging, and BPE-safe decoding from `main` were preserved alongside the new device-map recovery logic.

Changes
- `effgen/models/transformers_engine.py`
  - Refactor model reload paths: `_assemble_model_kwargs`, `_from_pretrained_with_flash_fallback`, `_load_model_weights`, `_drop_model_weights` (quiet loading + Flash-Attention fallback preserved).
  - `device_map="auto"` recovery: probe logits via a short forward pass (`_probe_auto_device_map_logits`) and, if they look invalid, reload pinned to GPU 0 before sampling (`_ensure_device_map_viable_before_sampling`, `_apply_cuda_device_map_pin_fallback`, `_effective_device_map`).
  - Detect CUDA device-side asserts (`_is_cuda_device_side_assert`) and retry generation once after pinning to GPU 0 (`_maybe_retry_after_cuda_assert`); covers `generate`, `generate_batch`, and `generate_stream`. Raises a clear "restart the process / set `CUDA_VISIBLE_DEVICES=0`" error when the context is already poisoned.
  - Reset `_pin_device_map_for_cuda` / `_retrying_after_cuda_assert` state on `unload()`, and guard `torch.cuda.empty_cache()`.
- `examples/basic/calculator_agent.py`: tear down the math agent in a `finally` block (`agent.close()` before `model.unload()`) so demo/interactive runs exit cleanly without the "Agent was garbage-collected without calling close()" warning, even when generation fails mid-run.
- `CHANGELOG.md`: documented the calculator-agent cleanup fix.
- `test.py`: `tests/`.

Testing
- `pytest tests/unit/ -v --no-cov`
- Multi-GPU run where `device_map="auto"` previously asserted now falls back to GPU 0 and generates successfully

Type of Change