refactor and support for multi algs fusion by n1ck-guo · Pull Request #1920 · intel/auto-round

n1ck-guo · 2026-06-13T01:22:08Z

Description

What this PR does
Introduces a composable QuantizationPipeline that separates pre-processing algorithms (e.g. AWQ) from the terminal block-quantizer (e.g. AutoRound/RTN), and lets users compose them declaratively via config lists.

Key changes:

QuantizationPipeline (algorithms/quantization/pipeline.py): new orchestration layer — [preprocessors…] + block_quantizer. Replaces the implicit algorithm coupling in DataDrivenCompressor.
BasePipelineMember / BaseWeightTransformer / BaseQuantizer (base.py): clean class hierarchy with unified lifecycle hooks (prepare_run, quantize_block, finalize_run).
AWQConfig + AWQQuantizer refactored as a BaseWeightTransformer — pure weight-smoothing preprocessor, no quantization loop of its own.
DiffusionMixin injected dynamically at pipeline construction time (is_diffusion=True) — no if is_diffusion branches in algorithm code.
CLI (auto_round/cli/) rewritten to expose --alg_configs for composing pipelines from the command line.
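
To make the structure in the list above concrete, here is a minimal, self-contained sketch of how a preprocessor list plus a terminal quantizer could be orchestrated. It is a toy model, not the PR's implementation: the class names and the `prepare_run` / `quantize_block` / `finalize_run` hooks come from the description, while `transform_block`, `run`, `with_diffusion`, and the two toy members are hypothetical names introduced only for illustration.

```python
from abc import ABC, abstractmethod


class BasePipelineMember(ABC):
    """Lifecycle hooks shared by every pipeline member."""

    def prepare_run(self, model):
        """Called once before any block is processed."""

    def finalize_run(self, model):
        """Called once after the last block is processed."""


class BaseWeightTransformer(BasePipelineMember):
    """Preprocessor: rewrites a block's weights without quantizing them."""

    @abstractmethod
    def transform_block(self, block):  # hypothetical hook name
        ...


class BaseQuantizer(BasePipelineMember):
    """Terminal stage: quantizes a block after all preprocessors ran."""

    @abstractmethod
    def quantize_block(self, block): ...


class DiffusionMixin:
    """Toy stand-in for diffusion-specific behaviour."""

    is_diffusion = True


def with_diffusion(cls, is_diffusion):
    """Build a subclass with the mixin at construction time (hypothetical helper)."""
    if not is_diffusion:
        return cls
    return type(f"Diffusion{cls.__name__}", (DiffusionMixin, cls), {})


class QuantizationPipeline:
    """[preprocessors...] + block_quantizer, applied block by block."""

    def __init__(self, preprocessors, block_quantizer):
        self.preprocessors = list(preprocessors)
        self.block_quantizer = block_quantizer

    def run(self, model, blocks):  # hypothetical entry point
        members = [*self.preprocessors, self.block_quantizer]
        for member in members:
            member.prepare_run(model)
        for block in blocks:
            for pre in self.preprocessors:
                pre.transform_block(block)
            self.block_quantizer.quantize_block(block)
        for member in members:
            member.finalize_run(model)
        return blocks


class HalveWeights(BaseWeightTransformer):
    """Toy preprocessor standing in for a smoothing step."""

    def transform_block(self, block):
        block["weight"] = [w / 2 for w in block["weight"]]


class RoundWeights(BaseQuantizer):
    """Toy quantizer: rounds every weight to the nearest integer."""

    def quantize_block(self, block):
        block["weight"] = [round(w) for w in block["weight"]]


QuantClass = with_diffusion(RoundWeights, is_diffusion=True)
pipeline = QuantizationPipeline([HalveWeights()], QuantClass())
blocks = pipeline.run(model=None, blocks=[{"weight": [2.2, 4.4]}, {"weight": [6.2, -3.8]}])
print(blocks)
```

Because the mixin is attached when the class is built, neither toy member needs an `is_diffusion` branch of its own.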

Usage: AWQ + AutoRound fusion

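from auto_round import AutoRound
from auto_round.algorithms.quantization.awq.config import AWQConfig
from auto_round.algorithms.quantization.sign_round.config import SignRoundConfig

# model_name is a model ID or local path defined elsewhere.
ar = AutoRound(
    model_name,
    # Positional arguments must precede keyword arguments in Python; the
    # keyword name of this parameter is not shown in the PR description.
    [AWQConfig(), SignRoundConfig(iters=200)],
    scheme="W4A16",
)
model, layer_config = ar.quantize()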

Passing a list of configs activates the pipeline: AWQ smoothing runs first on each block, then AutoRound's SignSGD optimization runs on the smoothed weights. Passing a single config (old API) continues to work unchanged.
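
As a rough numeric illustration of the "smooth first, then quantize" order described above, the sketch below scales each input channel of a weight matrix and divides the matching activation by the same factor, which leaves the full-precision output unchanged. Everything here is an assumption made for illustration: the scales are arbitrary (not the result of an AWQ search), and a plain 4-bit round-to-nearest step stands in for the terminal quantizer rather than AutoRound's SignSGD optimization.

```python
def linear(x, w):
    """y_i = sum_j w[i][j] * x[j], with the weight matrix stored as rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def smooth(w, scales):
    """Multiply input channel j of every row by scales[j]."""
    return [[wij * s for wij, s in zip(row, scales)] for row in w]


def rtn_quantize(w, bits=4):
    """Symmetric per-row round-to-nearest, returned in dequantized form."""
    qmax = 2 ** (bits - 1) - 1
    quantized = []
    for row in w:
        # Fall back to a step of 1.0 for an all-zero row.
        step = max(abs(v) for v in row) / qmax or 1.0
        quantized.append([round(v / step) * step for v in row])
    return quantized


x = [8.0, 0.5]
w = [[0.5, 1.0], [-0.25, 0.75]]
scales = [2.0, 1.0]  # arbitrary illustrative scales, one per input channel

# Step 1: smoothing. Weights are scaled up, activations scaled down.
w_smooth = smooth(w, scales)
x_smooth = [xj / s for xj, s in zip(x, scales)]

y_ref = linear(x, w)
y_smooth = linear(x_smooth, w_smooth)
print(y_ref == y_smooth)  # True: smoothing alone does not change the output

# Step 2: the terminal quantizer sees the smoothed weights.
w_q = rtn_quantize(w_smooth)
y_q = linear(x_smooth, w_q)
print(max(abs(a - b) for a, b in zip(y_q, y_ref)))
```

The second print shows the error introduced by quantization alone, which is the quantity a real terminal quantizer would try to minimize.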

Compatibility

Single-config API (AutoRound(model, ...)) is fully backward compatible.
All existing CPU tests pass; pre-existing environment failures (missing auto-round-lib, device fixtures) are unrelated to this PR.

Type of Change

New feature

Related Issues

Fixes or relates to #

Checklist Before Submitting

My code has been tested locally.
Documentation has been updated as needed.
New or updated tests are included where applicable.
The CUDA CI has passed. You can trigger it by commenting /azp run Unit-Test-CUDA-AutoRound.

Signed-off-by: n1ck-guo <heng.guo@intel.com>

chensuyue · 2026-06-13T03:15:10Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-06-13T03:15:19Z

Azure Pipelines successfully started running 1 pipeline(s).

wenhuach21 · 2026-06-15T05:37:28Z

-                current_output = to_device(current_output, loss_device)
-                output_q = self._get_current_q_output(block, input_ids, input_others, indices, device, loss_device)
+                current_output = ctx.reference_batch(indices, device=loss_device)
+                output_q = ctx.forward_batch(indices, device=device, cache_device=loss_device)


This function is a little confusing: I can't tell from the API which module is being forwarded.

I introduced ctx.io as a unified IO abstraction so block input selection, reference output caching, and mini-batch forward logic all go through the same path. Here it is forwarding the current block (ctx.block) on the sampled batch via the IO layer. I’ll add a short comment to clarify it.

I don't think this function should be part of ctx. It would be clearer to pass it explicitly, e.g.:

forward_batch(module, indices, ctx)

That makes the function's dependencies more obvious. But it's up to you

This interface is designed for unified management of block inputs. With it, a quantizer doesn't need its own logic for managing inputs, devices, dtypes, etc.; it can use a single unified input path. The specific forward process can be changed by inheriting from the base class and overriding its _resolve_block_forward method. The current design may not be perfect and will be optimized further.
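
For readers following this thread, the two shapes under discussion can be sketched side by side. This is a toy stand-in, not the PR's class: only the names `reference_batch`, `forward_batch`, `_resolve_block_forward`, and `ctx.block` come from the thread, while `BlockContext`, its constructor, and `OffsetContext` are hypothetical, and the device arguments from the diff are omitted.

```python
class BlockContext:
    """Toy version of the ctx object: owns the block, its inputs, and cached outputs."""

    def __init__(self, block, inputs, reference_outputs):
        self.block = block
        self.inputs = inputs
        self.reference_outputs = reference_outputs

    def reference_batch(self, indices):
        """Return the cached reference outputs for the sampled indices."""
        return [self.reference_outputs[i] for i in indices]

    def _resolve_block_forward(self):
        """Hook that subclasses override to change how the block is called."""
        return self.block

    def forward_batch(self, indices):
        """Run the current block (self.block) on the sampled inputs."""
        forward = self._resolve_block_forward()
        return [forward(self.inputs[i]) for i in indices]


def forward_batch(module, indices, ctx):
    """Explicit variant from the review: the forwarded module is a visible argument."""
    return [module(ctx.inputs[i]) for i in indices]


class OffsetContext(BlockContext):
    """Hypothetical subclass that customizes the forward pass via the hook."""

    def _resolve_block_forward(self):
        return lambda x: self.block(x) + 1


def double(x):
    return 2 * x


inputs = [1, 2, 3, 4]
ctx = BlockContext(double, inputs, [double(x) for x in inputs])

implicit = ctx.forward_batch([0, 2])
explicit = forward_batch(ctx.block, [0, 2], ctx)
custom = OffsetContext(double, inputs, []).forward_batch([0, 2])
print(implicit, explicit, custom)
```

Both call styles produce the same result; the difference is whether the forwarded module is visible at the call site or resolved inside the context.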

wenhuach21

It might be better to ask Xuehao to help trigger the pre-release tests for this PR. If the full test suite is too heavy, running a representative subset should be sufficient.

wenhuach21 · 2026-06-15T05:39:01Z

-                current_output = self._get_current_output(reference_output, indices)
-                current_output = to_device(current_output, loss_device)
-                output_q = self._get_current_q_output(block, input_ids, input_others, indices, device, loss_device)
+                current_output = ctx.reference_batch(indices, device=loss_device)


I couldn't tell what the function does from its name.

wenhuach21 · 2026-06-15T05:45:46Z

+    def _immediate_pack_and_save_module(self, module_name):
+        from auto_round.compressors.shard_writer import ShardWriter
+
+        shard_writer = ShardWriter.get_shard_writer()


This is still coupled; it would be better to decouple them in the future.
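
One possible direction for the decoupling mentioned above is to inject the save step instead of importing the writer inside the method. The sketch below is purely illustrative: `Quantizer`, the `save_module` parameter, and the recording callback are hypothetical, and nothing about `ShardWriter` beyond the `get_shard_writer()` call in the diff is assumed.

```python
from typing import Callable, List, Optional


class Quantizer:
    """Toy quantizer that receives its save step as a dependency."""

    def __init__(self, save_module: Optional[Callable[[str], None]] = None):
        # The caller decides how (or whether) packed modules are persisted.
        self._save_module = save_module

    def _immediate_pack_and_save_module(self, module_name: str) -> None:
        # Packing would happen here; the quantizer knows nothing about
        # how saving is implemented.
        if self._save_module is not None:
            self._save_module(module_name)


# Usage: a list-backed recorder stands in for a real shard writer.
saved: List[str] = []
quantizer = Quantizer(save_module=saved.append)
quantizer._immediate_pack_and_save_module("model.layers.0")
print(saved)
```

With this shape, the algorithm code no longer needs an import from `auto_round.compressors`, and tests can pass in a recorder like the one shown.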

chensuyue · 2026-06-15T06:52:38Z

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines · 2026-06-15T06:52:47Z

Azure Pipelines successfully started running 1 pipeline(s).

n1ck-guo requested review from WeiweiZhang1, lvliang-intel, wenhuach21 and xin3he June 13, 2026 01:22

n1ck-guo added the api/new, ready (only add when the PR is ready to merge), and enhancement (New feature or request) labels Jun 13, 2026

n1ck-guo added 2 commits June 13, 2026 09:37

fix

379b6da

Signed-off-by: n1ck-guo <heng.guo@intel.com>

fix import

d0232da

Signed-off-by: n1ck-guo <heng.guo@intel.com>

fix cuda ut

f409d96

Signed-off-by: n1ck-guo <heng.guo@intel.com>

chensuyue added this to the 0.14.0 milestone Jun 15, 2026

fix ut

0abcc0b

Signed-off-by: n1ck-guo <heng.guo@intel.com>

wenhuach21 reviewed Jun 15, 2026

Comment thread auto_round/algorithms/quantization/sign_round/quantizer.py Outdated

add top import of all alg configs

38482d5

Signed-off-by: n1ck-guo <heng.guo@intel.com>

wenhuach21 reviewed Jun 15, 2026

n1ck-guo added 2 commits June 15, 2026 14:07

chore: add .worktrees/ to .gitignore

c079f9c

update by comments

4093185

Signed-off-by: n1ck-guo <heng.guo@intel.com>

n1ck-guo wants to merge 8 commits into main from hengguo/refactor_algs_clean


3 participants
