refactor and support for multi algs fusion #1920
## Description
**What this PR does**
Introduces a composable QuantizationPipeline that separates
pre-processing algorithms (e.g. AWQ) from the terminal block-quantizer
(e.g. AutoRound/RTN), and lets users compose them declaratively via
config lists.
**Key changes:**
- QuantizationPipeline (algorithms/quantization/pipeline.py): new
orchestration layer — [preprocessors…] + block_quantizer. Replaces the
implicit algorithm coupling in DataDrivenCompressor.
- BasePipelineMember / BaseWeightTransformer / BaseQuantizer (base.py):
clean class hierarchy with unified lifecycle hooks (prepare_run,
quantize_block, finalize_run).
- AWQConfig + AWQQuantizer refactored as a BaseWeightTransformer — pure
weight-smoothing preprocessor, no quantization loop of its own.
- DiffusionMixin injected dynamically at pipeline construction time
(is_diffusion=True) — no if is_diffusion branches in algorithm code.
- CLI (auto_round/cli/) rewritten to expose --alg_configs for composing
pipelines from the command line.
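To make the structure above concrete, here is a minimal, self-contained sketch of how preprocessors and a terminal block quantizer could share the three lifecycle hooks, and how a mixin could be attached at construction time. It is not the code in pipeline.py or base.py. Only the hook names (prepare_run, quantize_block, finalize_run) and the names BasePipelineMember and DiffusionMixin come from this PR; the class bodies, `ScaleTransformer`, `RoundQuantizer`, `TinyPipeline`, `with_diffusion`, and the use of plain float lists as "blocks" are hypothetical stand-ins.

```python
class BasePipelineMember:
    """Hypothetical base: every pipeline member exposes the same hooks."""

    def prepare_run(self, blocks):
        pass  # one-time setup before any block is processed

    def quantize_block(self, block):
        return block  # default: leave the block unchanged

    def finalize_run(self):
        pass  # one-time cleanup after the last block


class ScaleTransformer(BasePipelineMember):
    """Stand-in for a weight transformer: rescales weights, never rounds."""

    def __init__(self, scale):
        self.scale = scale

    def quantize_block(self, block):
        return [w * self.scale for w in block]


class RoundQuantizer(BasePipelineMember):
    """Stand-in for the terminal block quantizer: rounds each weight."""

    def quantize_block(self, block):
        return [float(round(w)) for w in block]


class DiffusionMixin:
    """Stand-in mixin: records that diffusion-specific setup ran."""

    def prepare_run(self, blocks):
        self.diffusion_ready = True
        super().prepare_run(blocks)


def with_diffusion(cls, is_diffusion):
    # Build the mixed-in class once, at construction time, so the algorithm
    # classes themselves need no `if is_diffusion` branches.
    if not is_diffusion:
        return cls
    return type("Diffusion" + cls.__name__, (DiffusionMixin, cls), {})


class TinyPipeline:
    """[preprocessors...] + block_quantizer, applied block by block."""

    def __init__(self, preprocessors, block_quantizer):
        self.members = list(preprocessors) + [block_quantizer]

    def run(self, blocks):
        for member in self.members:
            member.prepare_run(blocks)
        results = []
        for block in blocks:
            for member in self.members:  # preprocessors first, quantizer last
                block = member.quantize_block(block)
            results.append(block)
        for member in self.members:
            member.finalize_run()
        return results


quantizer_cls = with_diffusion(RoundQuantizer, is_diffusion=True)
pipeline = TinyPipeline([ScaleTransformer(2.0)], quantizer_cls())
quantized = pipeline.run([[0.3, 1.2], [2.0, -0.7]])
```

In this sketch the preprocessor only transforms weights and the last member does the rounding, which mirrors the split between weight transformers and the block quantizer described in the list above.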
**Usage: AWQ + AutoRound fusion**
```python
from auto_round import AutoRound
from auto_round.algorithms.quantization.awq.config import AWQConfig
from auto_round.algorithms.quantization.sign_round.config import SignRoundConfig
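ar = AutoRound(
    model_name,
    scheme="W4A16",
    # Keyword name assumed from the --alg_configs CLI flag; the original
    # snippet passed this list positionally after a keyword argument,
    # which is a syntax error.
    alg_configs=[AWQConfig(), SignRoundConfig(iters=200)],
)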
model, layer_config = ar.quantize()
```
Passing a list of configs activates the pipeline: AWQ smoothing runs
first on each block, then AutoRound's SignSGD optimization runs on the
smoothed weights. Passing a single config (old API) continues to work
unchanged.
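The dispatch described above, where a list enables the pipeline and a single config keeps the old path, could be normalized up front roughly as follows. This is a sketch rather than the PR's code: `split_configs` and the two dataclasses are hypothetical stand-ins for AWQConfig and SignRoundConfig, and the rule that the last list entry is the block quantizer is an assumption drawn from the "[preprocessors…] + block_quantizer" layout.

```python
from dataclasses import dataclass


@dataclass
class FakeAWQConfig:  # hypothetical stand-in for AWQConfig
    name: str = "awq"


@dataclass
class FakeSignRoundConfig:  # hypothetical stand-in for SignRoundConfig
    iters: int = 200
    name: str = "sign_round"


def split_configs(configs):
    """Return (preprocessor_configs, block_quantizer_config).

    Assumption: a bare config is treated as a one-element list, and the
    last entry is always the terminal block quantizer.
    """
    if not isinstance(configs, (list, tuple)):
        configs = [configs]  # old single-config API
    if not configs:
        raise ValueError("at least one algorithm config is required")
    *preprocessors, block_quantizer = configs
    return preprocessors, block_quantizer


single = split_configs(FakeSignRoundConfig(iters=50))
fused = split_configs([FakeAWQConfig(), FakeSignRoundConfig()])
```

With this shape, the single-config call yields an empty preprocessor list, so the old behavior is simply the pipeline with no preprocessing stage.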
**Compatibility**
- Single-config API (AutoRound(model, ...)) is fully backward
compatible.
- All existing CPU tests pass; pre-existing environment failures
(missing auto-round-lib, device fixtures) are unrelated to this PR.
## Type of Change
<!-- Bug fix / New feature / Documentation / Performance / Refactor /
Other: __________ -->
New feature
## Related Issues
<!-- Link to related issues using #issue_number -->
Fixes or relates to #
## Checklist Before Submitting
- [ ] My code has been tested locally.
- [ ] Documentation has been updated as needed.
- [ ] New or updated tests are included where applicable.
- [ ] The CUDA CI has passed. You can trigger it by commenting `/azp run
Unit-Test-CUDA-AutoRound`.
<!-- Optional: Tag reviewers or add extra notes below -->
Signed-off-by: n1ck-guo <heng.guo@intel.com>
/azp run Unit-Test-CUDA-AutoRound
Azure Pipelines successfully started running 1 pipeline(s).
current_output = to_device(current_output, loss_device)
output_q = self._get_current_q_output(block, input_ids, input_others, indices, device, loss_device)
current_output = ctx.reference_batch(indices, device=loss_device)
output_q = ctx.forward_batch(indices, device=device, cache_device=loss_device)
This function is a little confusing; I can't tell from the API which module is being forwarded.
I introduced ctx.io as a unified IO abstraction so block input selection, reference output caching, and mini-batch forward logic all go through the same path. Here it is forwarding the current block (ctx.block) on the sampled batch via the IO layer. I’ll add a short comment to clarify it.
I don't think this function should be part of ctx. It would be clearer to pass it explicitly, e.g.:
forward_batch(module, indices, ctx)
That makes the function's dependencies more obvious. But it's up to you
This interface is designed for unified management of block inputs. With it, the quantizer doesn't need its own logic to manage inputs, devices, types, etc., and can rely on a unified input path instead. The specific forward process can be customized by subclassing the base class and overriding its _resolve_block_forward method. The current design may not be perfect and will be further optimized.
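As a rough illustration of the design discussed in this thread (not the PR's implementation), the sketch below shows a context object that owns the block and its inputs, with the forward call routed through an overridable hook. The method names reference_batch, forward_batch, and _resolve_block_forward come from this conversation; `BlockContext`, `ClampedContext`, `halve`, and the plain-float "tensors" are hypothetical, and device handling (the device, cache_device, and loss_device arguments) is omitted entirely.

```python
class BlockContext:
    """Hypothetical stand-in for ctx: owns the block, inputs, and cached references."""

    def __init__(self, block, inputs, reference_outputs):
        self.block = block
        self.inputs = inputs
        self.reference_outputs = reference_outputs

    def reference_batch(self, indices):
        # Cached outputs of the unquantized block for the sampled rows.
        return [self.reference_outputs[i] for i in indices]

    def _resolve_block_forward(self):
        # Hook that subclasses override to change how the block is called.
        return self.block

    def forward_batch(self, indices):
        # Runs the current block (self.block) on the sampled rows.
        forward = self._resolve_block_forward()
        return [forward(self.inputs[i]) for i in indices]


class ClampedContext(BlockContext):
    """Example override: alter the forward path without touching the quantizer."""

    def _resolve_block_forward(self):
        return lambda x: max(self.block(x), 0.0)


def halve(x):
    return x / 2


inputs = [4.0, -2.0, 6.0]
references = [2.0, -1.0, 3.0]
ctx = BlockContext(halve, inputs, references)
batch = ctx.forward_batch([0, 2])
clamped = ClampedContext(halve, inputs, references).forward_batch([1])
```

The explicit alternative suggested above, forward_batch(module, indices, ctx), would make the forwarded module visible at the call site; the context-owned version trades that visibility for a single place to manage inputs.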
wenhuach21 left a comment
It might be better to ask Xuehao to help trigger the pre-release tests for this PR. If the full test suite is too heavy, running a representative subset should be sufficient.
current_output = self._get_current_output(reference_output, indices)
current_output = to_device(current_output, loss_device)
output_q = self._get_current_q_output(block, input_ids, input_others, indices, device, loss_device)
current_output = ctx.reference_batch(indices, device=loss_device)
I couldn't tell what the function does from its name.
def _immediate_pack_and_save_module(self, module_name):
    from auto_round.compressors.shard_writer import ShardWriter

    shard_writer = ShardWriter.get_shard_writer()
Still coupled; better to decouple them in the future.
/azp run Unit-Test-CUDA-AutoRound
Azure Pipelines successfully started running 1 pipeline(s).