[argus] patch #39: bound the checkpointer pool (elastic 1..4, idle-shrink)#4

Merged: Nichol4s merged 1 commit into argus from argus-patch-39-pool-bounds on Jul 2, 2026
Nichol4s commented Jul 1, 2026

Why

psycopg_pool's default is a fixed pool: max_size falls back to min_size=4, so every uvicorn worker permanently holds 4 idle Postgres connections. With 2 workers per gateway and one gateway per project stack, the idle floor alone (10 connections per stack, observed exactly) exhausted the shared server's max_connections on 2026-07-01 (FATAL: sorry, too many clients already, hit twice that day). Raising max_connections to 500 bought headroom; this patch bounds the actual source.

What

_build_postgres_pool() (the single pool-construction site, used by both the legacy checkpointer: and unified database: config paths) now passes:

  • min_size=1 (was: implicit 4)
  • max_size=4 (same ceiling as today's default, now explicit)
  • max_idle=300 seconds, so the pool shrinks back to min after idle

Each is overridable per deployment via DEERFLOW_CHECKPOINTER_POOL_MIN / _MAX / _MAX_IDLE. Keepalive kwargs and check_connection wiring are unchanged. Expected effect on argus: per-gateway idle baseline drops from 10 to ~2 while burst capacity stays identical.
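
For readers without the diff open, the bounds resolution could look roughly like the sketch below. It is not the code in `_build_postgres_pool()`. The helper name `resolve_pool_bounds`, the expansion of `_MAX` / `_MAX_IDLE` to full variable names, and the validation step are all assumptions for illustration. The only parts taken from this PR are the three defaults and the `min_size` / `max_size` / `max_idle` keyword names that psycopg_pool accepts.

```python
import os
from typing import Mapping

# Defaults stated in this PR: elastic 1..4, shrink after 300 s idle.
DEFAULT_MIN_SIZE = 1
DEFAULT_MAX_SIZE = 4
DEFAULT_MAX_IDLE = 300.0

# Assumed full names; the PR abbreviates them as "_MIN / _MAX / _MAX_IDLE".
ENV_PREFIX = "DEERFLOW_CHECKPOINTER_POOL_"


def resolve_pool_bounds(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: merge env overrides over the patch defaults."""
    min_size = int(env.get(ENV_PREFIX + "MIN", DEFAULT_MIN_SIZE))
    max_size = int(env.get(ENV_PREFIX + "MAX", DEFAULT_MAX_SIZE))
    max_idle = float(env.get(ENV_PREFIX + "MAX_IDLE", DEFAULT_MAX_IDLE))

    # Assumed guard: fail early on an inverted or negative range instead of
    # waiting for the pool constructor to reject it.
    if min_size < 0 or max_size < min_size:
        raise ValueError(
            f"invalid pool bounds: min_size={min_size}, max_size={max_size}"
        )

    # These keys match psycopg_pool's keyword arguments, so the result can
    # be splatted into the pool constructor alongside the existing kwargs.
    return {"min_size": min_size, "max_size": max_size, "max_idle": max_idle}
```

With no overrides set, `resolve_pool_bounds({})` returns the patch defaults. A deployment that wants the old fixed behaviour could set both `..._MIN` and `..._MAX` to 4.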

Tests

backend/tests/test_checkpointer_pool_bounds.py pins the defaults, the env overrides, and the keepalive/check wiring (psycopg faked via sys.modules, no live Postgres needed). 207 passed / 1 skipped across all checkpointer-touching suites locally; ruff clean on changed files. Note: lint-backend and frontend-unit-tests CI jobs are red on pre-existing debt (same as patches bytedance#37/bytedance#38).
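
The "psycopg faked via sys.modules" approach can be illustrated with a minimal sketch. It is not the contents of the test file. `FakePool`, `build_pool`, and `build_with_fake` are hypothetical names, and the sketch assumes the code under test imports `ConnectionPool` lazily from `psycopg_pool` (the real code may use the async pool instead).

```python
import sys
import types
from unittest import mock


class FakePool:
    """Records constructor arguments instead of opening connections."""

    def __init__(self, conninfo, **kwargs):
        self.conninfo = conninfo
        self.kwargs = kwargs


def build_pool(conninfo):
    # Hypothetical stand-in for the function under test. The import happens
    # at call time, so whatever sits in sys.modules is what gets used.
    from psycopg_pool import ConnectionPool

    return ConnectionPool(conninfo, min_size=1, max_size=4, max_idle=300)


def build_with_fake(conninfo):
    # Install a fake psycopg_pool module for the duration of the call;
    # patch.dict restores sys.modules afterwards.
    fake_module = types.ModuleType("psycopg_pool")
    fake_module.ConnectionPool = FakePool
    with mock.patch.dict(sys.modules, {"psycopg_pool": fake_module}):
        return build_pool(conninfo)
```

A test can then assert on `pool.kwargs` to pin the defaults without a live Postgres server.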

Deploy (post-merge, separate step)

App-code change: pin bump in argus VERSIONS.md + make deerflow-upgrade + restart stacks canary-first. Rollback = revert pin to 2df36c9.

🤖 Generated with Claude Code

…, idle-shrink)

psycopg_pool's default is a fixed pool: max_size falls back to min_size=4,
so every uvicorn worker permanently holds 4 idle Postgres connections.
With 2 workers per gateway and one gateway per project stack, that idle
floor alone exhausted the shared server's max_connections on 2026-07-01
(FATAL: sorry, too many clients already).

Keep the same per-worker ceiling (4) but make the pool elastic: min_size=1,
grow under load, shrink back after max_idle=300s. Deployment overrides via
DEERFLOW_CHECKPOINTER_POOL_MIN / _MAX / _MAX_IDLE. Keepalive kwargs and
check_connection wiring unchanged.

Tests: tests/test_checkpointer_pool_bounds.py pins the defaults, the env
overrides, and the keepalive/check wiring (psycopg faked via sys.modules,
no live Postgres needed). 207 green across the checkpointer-touching
suites; ruff clean on changed files.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@Nichol4s Nichol4s merged commit f37b829 into argus Jul 2, 2026
5 of 7 checks passed