fix(gguf): correct mismatched-shape error message in check_quantized_param_shape#13504

Open
Ricardo-M-L wants to merge 1 commit into huggingface:main from Ricardo-M-L:fix/gguf-shape-error-message

Conversation

@Ricardo-M-L
Contributor

What does this PR do?

Fixes the misleading error raised by GGUFQuantizer.check_quantized_param_shape when a loaded GGUF weight doesn't match the model's expected shape.

Before

inferred_shape = _quant_shape_from_byte_shape(loaded_param_shape, type_size, block_size)
if inferred_shape != current_param_shape:
    raise ValueError(
        f"{param_name} has an expected quantized shape of: {inferred_shape}, "
        f"but received shape: {loaded_param_shape}"
    )

The check compares inferred_shape against current_param_shape, but the message reports inferred_shape vs loaded_param_shape. Since inferred_shape is derived from loaded_param_shape, the two values on either side of the reported "mismatch" are effectively the same thing described at different unpacking stages — the shape the model actually expected (current_param_shape) never shows up in the message.
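To see why the two reported values are redundant, here is a minimal sketch of the byte-to-element arithmetic the helper performs (the real _quant_shape_from_byte_shape lives in diffusers; the block parameters below are illustrative):

```python
def quant_shape_from_byte_shape(byte_shape, type_size, block_size):
    # Each block of `block_size` elements is packed into `type_size` bytes,
    # so only the last (byte) dimension needs rescaling back to elements.
    *rest, last = byte_shape
    return (*rest, last // type_size * block_size)

# Illustrative numbers: a (4096, 2304)-byte tensor with 32-element blocks
# stored in 18 bytes each decodes to (4096, 4096) elements.
inferred = quant_shape_from_byte_shape((4096, 2304), type_size=18, block_size=32)
print(inferred)  # (4096, 4096)
```

Both the byte shape and the decoded shape describe the same loaded tensor, so printing one against the other carries no information about current_param_shape.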

Concretely, the 9B Q8 GGUF failure noted in #13001 produced:

ValueError: double_stream_modulation_img.linear.weight has an expected quantized shape of: (24576, 4096), but received shape: torch.Size([24576, 8192])

…even though the model parameter was (36864, 6144), which is the real expected shape and the thing the user needs to see when diagnosing a Klein-vs-Dev/GGUF-variant mix-up.

After

<param_name> has an expected shape of: <current_param_shape>, but the loaded GGUF weight decodes to shape: <inferred_shape>

Now both sides of the comparison are visible, and the "expected" side actually reflects what the model wants.
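In code, the fix amounts to swapping which shapes the message interpolates. A self-contained sketch (the decode helper is inlined for illustration; names mirror the snippet above, but this is not the exact diff):

```python
def check_quantized_param_shape(param_name, current_param_shape, loaded_param_shape,
                                type_size, block_size):
    # Decode the on-disk byte shape back to an element shape (same arithmetic
    # as diffusers' _quant_shape_from_byte_shape, inlined here).
    *rest, last = loaded_param_shape
    inferred_shape = (*rest, last // type_size * block_size)
    if inferred_shape != current_param_shape:
        # Report what the model expects vs. what the GGUF weight decodes to.
        raise ValueError(
            f"{param_name} has an expected shape of: {current_param_shape}, "
            f"but the loaded GGUF weight decodes to shape: {inferred_shape}"
        )
    return True
```

With a message shaped like this, a variant mix-up surfaces the model's actual expected shape directly instead of two views of the loaded tensor.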

Related

Partially addresses the error-message confusion noted by @Vargol in #13001 (comment). This PR only touches the error text — it does not change the detection logic or attempt to resolve the underlying Klein-vs-Dev GGUF shape-inference issue that @DN6 is tracking.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you write any new necessary tests? — N/A; this is a one-line error-message correction with no behavior change.

Who can review?

@DN6 @sayakpaul

check_quantized_param_shape compares inferred_shape against
current_param_shape, but the error message printed inferred_shape
vs loaded_param_shape — and inferred_shape is derived from
loaded_param_shape, so the reported mismatch was effectively
self-referential and gave no signal about the model's expected shape.

Print current_param_shape (what the model expected) vs inferred_shape
(what the quantized weight decodes to) so the two sides of the
comparison are actually visible.

Noted by @Vargol in huggingface#13001.
@github-actions github-actions Bot added quantization size/S PR with diff < 50 LOC labels Apr 19, 2026
@sayakpaul sayakpaul requested a review from DN6 April 21, 2026 14:25
@sayakpaul
Member

@Ricardo-M-L I am seeing that you're opening a lot of PRs in a very short period of time. I politely ask you to reduce that pace a bit.

@Ricardo-M-L
Contributor Author

Friendly ping — this PR has been approved. Is there anything else needed before merging? Happy to make any requested changes.


