refactor and support for multi algs fusion #1920

Open
n1ck-guo wants to merge 8 commits into main from hengguo/refactor_algs_clean

Conversation

@n1ck-guo

Contributor

## Description
**What this PR does**
Introduces a composable QuantizationPipeline that separates
pre-processing algorithms (e.g. AWQ) from the terminal block-quantizer
(e.g. AutoRound/RTN), and lets users compose them declaratively via
config lists.

**Key changes:**
- `QuantizationPipeline` (`algorithms/quantization/pipeline.py`): new orchestration layer, `[preprocessors…] + block_quantizer`. Replaces the implicit algorithm coupling in `DataDrivenCompressor`.
- `BasePipelineMember` / `BaseWeightTransformer` / `BaseQuantizer` (`base.py`): clean class hierarchy with unified lifecycle hooks (`prepare_run`, `quantize_block`, `finalize_run`).
- `AWQConfig` + `AWQQuantizer` refactored as a `BaseWeightTransformer`: a pure weight-smoothing preprocessor with no quantization loop of its own.
- `DiffusionMixin` injected dynamically at pipeline construction time (`is_diffusion=True`), so algorithm code needs no `if is_diffusion` branches.
- CLI (`auto_round/cli/`) rewritten to expose `--alg_configs` for composing pipelines from the command line.

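
The hierarchy and lifecycle hooks listed above can be pictured with a minimal sketch. Only the class and hook names come from this description; the method signatures, the `run` loop, and the toy `Halve` and `RoundToInt` members are assumptions for illustration, not the code in this PR.

```python
from abc import ABC, abstractmethod


class BasePipelineMember(ABC):
    """Shared lifecycle hooks (names from the PR; signatures are assumed)."""

    def prepare_run(self, model):
        """Called once before any block is processed; no-op by default."""

    @abstractmethod
    def quantize_block(self, block):
        """Process one block and return the result."""

    def finalize_run(self, model):
        """Called once after the last block; no-op by default."""


class BaseWeightTransformer(BasePipelineMember):
    """Preprocessor: rewrites weights but does not quantize them."""


class BaseQuantizer(BasePipelineMember):
    """Terminal member: produces the quantized block."""


class QuantizationPipeline:
    """Runs every preprocessor on a block, then the terminal quantizer."""

    def __init__(self, preprocessors, block_quantizer):
        self.members = [*preprocessors, block_quantizer]

    def run(self, model, blocks):
        for member in self.members:
            member.prepare_run(model)
        results = []
        for block in blocks:
            for member in self.members:
                block = member.quantize_block(block)
            results.append(block)
        for member in self.members:
            member.finalize_run(model)
        return results


class Halve(BaseWeightTransformer):
    """Toy stand-in for a smoothing step (hypothetical)."""

    def quantize_block(self, block):
        return [w / 2 for w in block]


class RoundToInt(BaseQuantizer):
    """Toy stand-in for a terminal quantizer (hypothetical)."""

    def quantize_block(self, block):
        return [round(w) for w in block]


pipeline = QuantizationPipeline([Halve()], RoundToInt())
result = pipeline.run(model=None, blocks=[[2.4, 5.2], [7.8]])
```

In this sketch the preprocessors and the quantizer share one interface, so the pipeline loop never needs to know which kind of member it is calling.
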
**Usage: AWQ + AutoRound fusion**
```python
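from auto_round import AutoRound
from auto_round.algorithms.quantization.awq.config import AWQConfig
from auto_round.algorithms.quantization.sign_round.config import SignRoundConfig

# model_name: the model to quantize, defined elsewhere.
ar = AutoRound(
    model_name,
    # The config list is positional in the original snippet, so it has to come
    # before any keyword argument; its exact parameter position is an assumption.
    [AWQConfig(), SignRoundConfig(iters=200)],
    scheme="W4A16",
)
model, layer_config = ar.quantize()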
```
Passing a list of configs activates the pipeline: AWQ smoothing runs
first on each block, then AutoRound's SignSGD optimization runs on the
smoothed weights. Passing a single config (old API) continues to work
unchanged.
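
One way such a dispatch between the list form and the single-config form could work is sketched below. The `split_configs` helper, the two toy config classes, and the rule that the last entry is the terminal quantizer are all assumptions for illustration; the PR text does not describe how the real entry point does this.

```python
from dataclasses import dataclass


@dataclass
class ToyAWQConfig:
    """Hypothetical stand-in for a preprocessor config."""


@dataclass
class ToySignRoundConfig:
    """Hypothetical stand-in for a terminal quantizer config."""

    iters: int = 200


def split_configs(configs):
    """Return (preprocessor_configs, terminal_config).

    A bare config is treated as a one-element list, which keeps the
    single-config call style working. The last entry is assumed to be
    the terminal quantizer; everything before it is a preprocessor.
    """
    if not isinstance(configs, (list, tuple)):
        configs = [configs]
    if not configs:
        raise ValueError("at least one config is required")
    *preprocessors, terminal = configs
    return preprocessors, terminal


fused = split_configs([ToyAWQConfig(), ToySignRoundConfig(iters=200)])
single = split_configs(ToySignRoundConfig())
```
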

**Compatibility**
- Single-config API (`AutoRound(model, ...)`) is fully backward compatible.
- All existing CPU tests pass; pre-existing environment failures (missing auto-round-lib, device fixtures) are unrelated to this PR.

## Type of Change

New feature

## Related Issues

Fixes or relates to #

## Checklist Before Submitting

- [ ] My code has been tested locally.
- [ ] Documentation has been updated as needed.
- [ ] New or updated tests are included where applicable.
- [ ] The CUDA CI has passed. You can trigger it by commenting `/azp run
Unit-Test-CUDA-AutoRound`.

Signed-off-by: n1ck-guo <heng.guo@intel.com>
@n1ck-guo added the api/new, ready (only add when the PR is ready to merge), and enhancement (New feature or request) labels Jun 13, 2026
n1ck-guo added 2 commits June 13, 2026 09:37
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@chensuyue

Contributor

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: n1ck-guo <heng.guo@intel.com>
@chensuyue added this to the 0.14.0 milestone Jun 15, 2026
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Comment thread auto_round/algorithms/quantization/sign_round/quantizer.py Outdated
Signed-off-by: n1ck-guo <heng.guo@intel.com>
- current_output = to_device(current_output, loss_device)
- output_q = self._get_current_q_output(block, input_ids, input_others, indices, device, loss_device)
+ current_output = ctx.reference_batch(indices, device=loss_device)
+ output_q = ctx.forward_batch(indices, device=device, cache_device=loss_device)

Contributor

This function is a little confusing; I can't tell from the API which module is being forwarded.

Contributor Author

I introduced ctx.io as a unified IO abstraction so block input selection, reference output caching, and mini-batch forward logic all go through the same path. Here it is forwarding the current block (ctx.block) on the sampled batch via the IO layer. I’ll add a short comment to clarify it.

Contributor

I don't think this function should be part of ctx. It would be clearer to pass it explicitly, e.g.:

forward_batch(module, indices, ctx)

That makes the function's dependencies more obvious. But it's up to you.
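
As a rough illustration of this suggestion, an explicit free function might look like the sketch below. The `BatchContext` class and its `inputs` field are hypothetical stand-ins and not the PR's `ctx`; only the `forward_batch(module, indices, ctx)` call shape comes from the comment.

```python
from dataclasses import dataclass


@dataclass
class BatchContext:
    """Hypothetical holder for cached block inputs."""

    inputs: list


def forward_batch(module, indices, ctx):
    """Run `module` on the cached inputs selected by `indices`.

    The module being forwarded is an explicit argument, so a reader can
    see the dependency at the call site instead of looking inside ctx.
    """
    return [module(ctx.inputs[i]) for i in indices]


def double(x):
    """Toy module used only for this example."""
    return 2 * x


ctx = BatchContext(inputs=[1.0, 2.0, 3.0, 4.0])
outputs = forward_batch(double, [0, 2], ctx)
```
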

Contributor Author

This interface is designed for unified management of block inputs. This way, the quantizer doesn't need to design its own logic to manage inputs, devices, types, etc., but can use a unified input. The specific forward process can be modified by inheriting from the base class and overriding its _resolve_block_forward method. The current design may not be perfect and will be further optimized.
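
The override mechanism described here could be pictured roughly as follows. Only the `_resolve_block_forward` and `forward_batch` names come from this thread; placing the hook on a context class, the constructor arguments, and both classes are assumptions, since the comment does not say which base class owns the hook.

```python
class BlockIOContext:
    """Hypothetical context that owns a block and its cached inputs."""

    def __init__(self, block, inputs):
        self.block = block
        self.inputs = inputs

    def _resolve_block_forward(self):
        """Return the callable used to forward the block (default: the block)."""
        return self.block

    def forward_batch(self, indices):
        """Forward the selected cached inputs through the resolved callable."""
        forward = self._resolve_block_forward()
        return [forward(self.inputs[i]) for i in indices]


class OffsetIOContext(BlockIOContext):
    """Subclass that changes the forward step without touching callers."""

    def _resolve_block_forward(self):
        return lambda x: self.block(x) + 1


def double(x):
    """Toy block used only for this example."""
    return 2 * x


base_out = BlockIOContext(double, [1, 2, 3]).forward_batch([0, 2])
custom_out = OffsetIOContext(double, [1, 2, 3]).forward_batch([0, 2])
```

Callers use `forward_batch` in both cases; only the subclass decides how the block is actually run.
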

@wenhuach21 left a comment

Contributor

It might be better to ask Xuehao to help trigger the pre-release tests for this PR. If the full test suite is too heavy, running a representative subset should be sufficient.

- current_output = self._get_current_output(reference_output, indices)
- current_output = to_device(current_output, loss_device)
- output_q = self._get_current_q_output(block, input_ids, input_others, indices, device, loss_device)
+ current_output = ctx.reference_batch(indices, device=loss_device)

Contributor

I couldn't tell what the function does from its name.

Comment thread auto_round/algorithms/quantization/sign_round/quantizer.py Outdated
def _immediate_pack_and_save_module(self, module_name):
    from auto_round.compressors.shard_writer import ShardWriter

    shard_writer = ShardWriter.get_shard_writer()

Contributor

This is still coupled; it would be better to decouple them in the future.

Comment thread auto_round/compressors/base.py
n1ck-guo added 2 commits June 15, 2026 14:07
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@chensuyue

Contributor

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Labels

api/new, enhancement (New feature or request), ready (only add when the PR is ready to merge)

3 participants