feat(compaction): allow a dedicated model for session summary generation (and bound summariser input) #3241

## Summary

Allow session compaction to use a **dedicated (smaller/faster) model** for summary
generation, independent of the agent's primary model — and optionally **bound the input**
fed to the summariser.

## Current behaviour

When a session is compacted (manual `/compact`, the proactive ~90% threshold trigger, or
post-overflow recovery), the summary is produced by running the compaction agent against the
**session agent's own primary model**:

- `doCompact` → `compactor.RunLLM{ RunAgent: r.runCompactionAgent }`
- `runCompactionAgent(ctx, a, sess)` uses `a = resolveSessionAgent(sess)`
- `compactionContextLimit` clones `a.Model(ctx)`

So the summary call is made with whatever model the agent normally uses.

## Problem

The summary call has to ingest the **entire conversation**, so it is the single most
expensive and slowest LLM call in a session — and it only ever runs when the context is
already large. When an agent's primary model is large/slow/expensive (e.g. a top-tier model
with a very large context window), summary generation is correspondingly slow and costly.
Callers that wrap `Summarize`/compaction in a timeout are especially exposed: the one call
most likely to exceed a deadline is the summary over a big context, i.e. exactly when
compaction is needed.

There is currently no configuration knob for this — the only compaction extension points are
the `pre_compact` / `before_compaction` / `after_compaction` hooks. `before_compaction` can
supply a full summary to bypass the LLM, but that is not a practical place to implement real
summarisation.

## Requests

1. **Dedicated compaction model.** Let users configure the model used for summary generation
   independently of the agent's primary model — e.g. a runtime option (`WithCompactionModel`)
   and/or an agent/team config field. A fast, cheap model is usually more than adequate for
   summarisation.
2. **Bounded summariser input (optional).** Allow capping the number of tokens fed to the
   summariser so summary latency and cost are predictable regardless of total conversation
   size.
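The first request could follow the usual Go functional-option pattern. Below is a minimal sketch of that pattern. `Runtime`, `Model`, `Option`, `New` and `summaryModel` are hypothetical stand-ins, not the project's actual types. Only the option name `WithCompactionModel` comes from this issue. The key property is that when no override is configured, the summariser falls back to the agent's primary model, so today's behaviour is unchanged by default.

```go
package main

import "fmt"

// Model is a hypothetical stand-in for the runtime's model handle.
type Model struct {
	ID string
}

// Runtime is a hypothetical, trimmed-down runtime that only holds an
// optional compaction-model override.
type Runtime struct {
	compactionModel *Model
}

// Option is a functional option for the hypothetical Runtime.
type Option func(*Runtime)

// WithCompactionModel sets the model used only for summary generation.
func WithCompactionModel(m Model) Option {
	return func(r *Runtime) { r.compactionModel = &m }
}

// New builds a Runtime and applies the given options in order.
func New(opts ...Option) *Runtime {
	r := &Runtime{}
	for _, opt := range opts {
		opt(r)
	}
	return r
}

// summaryModel returns the configured override if there is one, otherwise
// the agent's primary model (the current behaviour).
func (r *Runtime) summaryModel(agentModel Model) Model {
	if r.compactionModel != nil {
		return *r.compactionModel
	}
	return agentModel
}

func main() {
	primary := Model{ID: "large-primary-model"}

	def := New()
	fmt.Println(def.summaryModel(primary).ID) // large-primary-model

	fast := New(WithCompactionModel(Model{ID: "small-fast-model"}))
	fmt.Println(fast.summaryModel(primary).ID) // small-fast-model
}
```

A per-agent or per-team config field could feed the same override. Which source should take precedence when both are set is a design decision left open here.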

## Pointers

- `pkg/runtime/session_compaction.go` — `doCompact`, `runCompactionAgent`,
  `compactionContextLimit`
- `pkg/runtime/compactor/compactor.go` — keep/summary token budgets
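For the second request, here is a minimal sketch of capping summariser input against a token budget, loosely in the spirit of the existing keep/summary budgets. It does not describe how `compactor.go` actually works. `Message`, `estimateTokens` and `boundSummariserInput` are hypothetical. The token estimate is a crude heuristic of about 4 bytes per token, not a real tokenizer. The sketch keeps the newest messages and drops the oldest first.

```go
package main

import "fmt"

// Message is a hypothetical, simplified conversation message.
type Message struct {
	Role    string
	Content string
}

// estimateTokens is a rough heuristic (about 4 bytes per token, rounded up).
// A real implementation would use the model's own token accounting.
func estimateTokens(m Message) int {
	return (len(m.Content) + 3) / 4
}

// boundSummariserInput returns the longest suffix of msgs whose estimated
// token total does not exceed maxTokens, dropping the oldest messages first.
// maxTokens <= 0 means "no cap".
func boundSummariserInput(msgs []Message, maxTokens int) []Message {
	if maxTokens <= 0 {
		return msgs
	}
	total := 0
	start := len(msgs)
	for i := len(msgs) - 1; i >= 0; i-- {
		t := estimateTokens(msgs[i])
		if total+t > maxTokens {
			break
		}
		total += t
		start = i
	}
	return msgs[start:]
}

func main() {
	msgs := []Message{
		{Role: "user", Content: "aaaaaaaaaaaaaaaa"}, // ~4 tokens
		{Role: "assistant", Content: "bbbbbbbb"},    // ~2 tokens
		{Role: "user", Content: "cccc"},             // ~1 token
	}
	kept := boundSummariserInput(msgs, 3)
	fmt.Println(len(kept), kept[0].Content) // 2 bbbbbbbb
}
```

If the newest message alone exceeds the budget, this sketch returns nothing. A real implementation would probably truncate that message instead, or keep an earlier summary as a prefix, so the summariser always has some input.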

## Benefit

Faster, cheaper, and more reliable compaction, with summary cost decoupled from the agent's
primary model choice.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(compaction): allow a dedicated model for session summary generation (and bound summariser input) #3241

Summary

Current behaviour

Problem

Requests

Pointers

Benefit

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

feat(compaction): allow a dedicated model for session summary generation (and bound summariser input) #3241

Description

Summary

Current behaviour

Problem

Requests

Pointers

Benefit

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions