Improve accuracy for models using shuffle, unshuffle, cat ops (#19159)
abhinaykukkadapu wants to merge 1 commit into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19159. Note: links to docs will display an error until the docs builds have completed.
❗ 1 Active SEV: if your PR is affected, please view it below.
✅ You can merge normally (2 unrelated failures). As of commit 7fdec04 with merge base ddd8ac6, BROKEN TRUNK: the following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
abhinaykukkadapu force-pushed the branch from 7afba6c to 295756a.
abhinaykukkadapu force-pushed the branch from 295756a to 541a852.
@abhinaykukkadapu has exported this pull request. If you are a Meta employee, you can view the originating Diff in D102626539.
…h#19159) Summary:
Replace the Qualcomm concat observer path with an explicit same-domain-or-requantize model for `aten.cat`. Preserve shared qparams for `pixel_shuffle` and `pixel_unshuffle`, extend `split_with_sizes_copy` coverage, and add regressions for mismatched `cat` branches plus value-preserving ops that must use `SharedQuantizationSpec`.
Differential Revision: D102626539
abhinaykukkadapu force-pushed the branch from 541a852 to c6c5d0b.
Hi @abhinaykukkadapu,
Thanks a lot for this PR and for making all these changes to improve QNN backend accuracy.
I think the shuffle & unshuffle changes LGTM.
However, I have some concerns about the concat operation modification.
I have a PR that reproduces the concerns: #19182. The outputs and graphs are compared in its summary section.
Please have a look.
If the change to concat behavior is required to fix accuracy issues for the model you mentioned in #19159, could we possibly reuse the custom_annotation feature under https://github.com/pytorch/executorch/blob/main/backends/qualcomm/quantizer/custom_annotation.py for that model?
Thanks
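As an aside on why the shuffle/unshuffle part LGTMs: the reason `SharedQuantizationSpec` is safe for value-preserving ops can be sketched in a few lines of plain Python. The `pixel_shuffle_like` helper below is a hypothetical stand-in for the real op, not the QNN implementation; the point is only that a pure rearrangement of elements never changes the value range, so the output can reuse the input's scale and zero point.

```python
# Value-preserving ops such as pixel_shuffle/pixel_unshuffle only permute
# elements; they never produce new values. Hence the min/max (and therefore
# scale/zero_point) of the output equal those of the input.

def pixel_shuffle_like(values, perm):
    # Hypothetical stand-in: any pure rearrangement, no arithmetic on values.
    return [values[i] for i in perm]

vals = [-10.0, 2.0, 3.0, 4.0]
out = pixel_shuffle_like(vals, perm=[2, 0, 3, 1])

print(out)                                              # [3.0, -10.0, 4.0, 2.0]
print(min(out) == min(vals) and max(out) == max(vals))  # True
```

Because the observed range is identical on both sides of the op, sharing qparams introduces no extra quantization error, unlike `cat`, where the output range is the union of several input ranges.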
qscheme=quantization_config.output_activation.qscheme,
quant_max=quantization_config.output_activation.quant_max,
quant_min=quantization_config.output_activation.quant_min,
observer_or_fake_quant_ctr=ConcatObserver.with_args(
I believe this change is reverting what #15162 did.
#15162 was introduced because input[0] could not cover the entire range of values for the concat output, so a lot of output values were clipped.
If you have 2 input tensors like:
sample_input = (torch.tensor([[[[-10.0, 2.0], [3.0, 4.0]]]]), torch.tensor([[[[1.0, 3.0], [8.0, 10.0]]]]))
then after they go through the cat operation, you will get wrong values with this PR:
[tensor([[[[-9.9798, 1.9849], [ 2.9774, 4.0250], [ 1.0476, 3.0325], [ 4.0802, 4.0802]]]])]
I have a demo PR to reproduce this error, please have a look: #19182
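The clipping mechanism described above is easy to see with plain affine-quantization arithmetic. The sketch below is a minimal pure-Python illustration (the `qparams` and `quant_dequant` helpers are hypothetical, not the QNN implementation): qparams derived only from input[0]'s range [-10, 4] cannot represent 10.0 from the second input.

```python
# Minimal affine uint8 quantize/dequantize sketch (hypothetical helpers),
# illustrating why reusing input[0]'s qparams clips the second cat input.

def qparams(lo, hi, qmin=0, qmax=255):
    # Scale and zero point mapping [lo, hi] affinely onto [qmin, qmax].
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quant_dequant(x, scale, zp, qmin=0, qmax=255):
    # Quantize with clamping to [qmin, qmax], then dequantize back to float.
    q = max(qmin, min(qmax, round(x / scale) + zp))
    return (q - zp) * scale

# qparams observed only on input[0], whose values span [-10, 4]:
scale, zp = qparams(-10.0, 4.0)

# A value from input[1] above that range is clamped to qmax and comes back wrong:
print(quant_dequant(10.0, scale, zp))   # ~4.008, clipped instead of 10.0
print(quant_dequant(-10.0, scale, zp))  # ~-9.99, in range, round-trips fine
```

This mirrors the clipped ~4.08 entries in the output tensor above; an observer that sees all concat inputs avoids the problem by covering the union of their ranges.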
Thanks @winskuo-quic for the detailed review. I agree that this might have worked for the model I tested but may not work when the ranges are skewed, as in your example. Let me revert cat to ConcatObserver and test the accuracy.
# node1 -> q_ui8 (n) -> dq_ui8 -> q_int32 -> dq_int32 -> node2 -> ...
# We store {node2: quant_attr in dq_int32} in node1.meta
if n.target in q_ops and n.args[0].target not in dq_ops:
    self._annotate_cat_requant(n)
I think adding a specific op to a general pass would not be the best option. Do you think we could move this logic to htp_rules.py if it is necessary?
Yeah, I was also going back and forth on this, which is why I added a pass instead of shoving it in here.
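For illustration, the condition in the snippet above (find a quantize node whose producer is not a dequantize node, i.e. the entry of a q -> dq -> q -> dq requant chain) can be sketched with stand-in node objects. The `Node` dataclass and the op-name sets below are hypothetical simplifications of `torch.fx` nodes, not the actual pass:

```python
from dataclasses import dataclass, field

# Hypothetical op-name sets standing in for the real q_ops/dq_ops tables.
Q_OPS = {"quantize_per_tensor"}
DQ_OPS = {"dequantize_per_tensor"}

@dataclass
class Node:
    # Simplified stand-in for a torch.fx Node: an op name and its inputs.
    target: str
    args: tuple = ()
    meta: dict = field(default_factory=dict)

def find_requant_entries(nodes):
    # A q node whose producer is NOT a dq node starts a requant chain:
    # node1 -> q_ui8 -> dq_ui8 -> q_int32 -> dq_int32 -> node2
    return [n for n in nodes
            if n.target in Q_OPS and n.args and n.args[0].target not in DQ_OPS]

node1 = Node("aten.cat")
q1 = Node("quantize_per_tensor", (node1,))
dq1 = Node("dequantize_per_tensor", (q1,))
q2 = Node("quantize_per_tensor", (dq1,))  # producer is a dq: not an entry

entries = find_requant_entries([node1, q1, dq1, q2])
print(len(entries), entries[0] is q1)  # 1 True
```

Only the first quantize node matches, since the second one consumes a dequantize output; a dedicated pass can then stash the downstream quant attrs in the producer's meta, as the comment in the diff describes.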
abhinaykukkadapu force-pushed the branch from c6c5d0b to 779146e.
abhinaykukkadapu force-pushed the branch from 779146e to 7fdec04.
@winskuo-quic here is the latest patch; it uses the same ConcatObserver approach as before. Additionally, the pass for requant is introduced after the generic AnnotateQuantAttr pass. Please review when you get a chance. FYI, the SQNR stayed above target (> 30 dB) with this changeset.
Summary:
Replace the Qualcomm concat observer path with an explicit same-domain-or-requantize model for `aten.cat`. Preserve shared qparams for `pixel_shuffle` and `pixel_unshuffle`, extend `split_with_sizes_copy` coverage, and add regressions for mismatched `cat` branches plus value-preserving ops that must use `SharedQuantizationSpec`.
Differential Revision: D102626539