[argus] patch #39: bound the checkpointer pool (elastic 1..4, idle-shrink) #4
Merged
…, idle-shrink)

psycopg_pool's default is a fixed pool: max_size falls back to min_size=4, so every uvicorn worker permanently holds 4 idle Postgres connections. With 2 workers per gateway and one gateway per project stack, that idle floor alone exhausted the shared server's max_connections on 2026-07-01 (FATAL: sorry, too many clients already).

Keep the same per-worker ceiling (4) but make the pool elastic: min_size=1, grow under load, shrink back after max_idle=300s. Deployment overrides via DEERFLOW_CHECKPOINTER_POOL_MIN / _MAX / _MAX_IDLE. Keepalive kwargs and check_connection wiring unchanged.

Tests: tests/test_checkpointer_pool_bounds.py pins the defaults, the env overrides, and the keepalive/check wiring (psycopg faked via sys.modules, no live Postgres needed). 207 green across the checkpointer-touching suites; ruff clean on changed files.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
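A minimal sketch of how the defaults and environment overrides described above could be resolved into pool keyword arguments. The helper names `resolve_pool_bounds` and `_read_number` are hypothetical, the full variable names are an expansion of the `_MAX` / `_MAX_IDLE` shorthand, and the min-versus-max check is an assumption rather than something the patch is known to do. The returned keys correspond to `psycopg_pool`'s `min_size`, `max_size`, and `max_idle` pool parameters.

```python
import os
from typing import Callable, Mapping, Optional, Union

Number = Union[int, float]

# Defaults stated in the patch description: elastic 1..4, shrink after 300 s idle.
DEFAULT_MIN_SIZE = 1
DEFAULT_MAX_SIZE = 4
DEFAULT_MAX_IDLE = 300.0


def _read_number(
    env: Mapping[str, str],
    name: str,
    default: Number,
    cast: Callable[[str], Number],
) -> Number:
    """Return env[name] converted with `cast`, or `default` if unset or blank."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return cast(raw.strip())


def resolve_pool_bounds(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the sizing kwargs for the pool (hypothetical helper)."""
    env = os.environ if env is None else env
    min_size = _read_number(env, "DEERFLOW_CHECKPOINTER_POOL_MIN", DEFAULT_MIN_SIZE, int)
    max_size = _read_number(env, "DEERFLOW_CHECKPOINTER_POOL_MAX", DEFAULT_MAX_SIZE, int)
    max_idle = _read_number(env, "DEERFLOW_CHECKPOINTER_POOL_MAX_IDLE", DEFAULT_MAX_IDLE, float)

    # Assumption: reject inverted bounds early with a clear message.
    if max_size < min_size:
        raise ValueError(f"pool max ({max_size}) is below pool min ({min_size})")

    return {"min_size": min_size, "max_size": max_size, "max_idle": max_idle}
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the helper easy to exercise in tests, for example `resolve_pool_bounds({"DEERFLOW_CHECKPOINTER_POOL_MAX": "8"})`.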
Why

`psycopg_pool`'s default is a fixed pool: `max_size` falls back to `min_size=4`, so every uvicorn worker permanently holds 4 idle Postgres connections. With 2 workers per gateway and one gateway per project stack, the idle floor alone (10 conns per stack, observed exactly) exhausted the shared server's `max_connections` on 2026-07-01 (`FATAL: sorry, too many clients already`, hit twice that day). Raising `max_connections` to 500 bought headroom; this bounds the actual source.

What
`_build_postgres_pool()` (the single pool-construction site, used by both the legacy `checkpointer:` and unified `database:` config paths) now passes:

- `min_size=1` (was: implicit 4)
- `max_size=4` (same ceiling as today's default, now explicit)
- `max_idle=300` seconds, so the pool shrinks back to min after idle

Each is overridable per deployment via `DEERFLOW_CHECKPOINTER_POOL_MIN` / `_MAX` / `_MAX_IDLE`. Keepalive kwargs and `check_connection` wiring are unchanged. Expected effect on argus: per-gateway idle baseline drops from 10 to ~2 while burst capacity stays identical.

Tests
`backend/tests/test_checkpointer_pool_bounds.py` pins the defaults, the env overrides, and the keepalive/check wiring (psycopg faked via `sys.modules`, no live Postgres needed). 207 passed / 1 skipped across all checkpointer-touching suites locally; ruff clean on changed files. Note: `lint-backend` and `frontend-unit-tests` CI jobs are red on pre-existing debt (same as patches bytedance#37/bytedance#38).

Deploy (post-merge, separate step)
App-code change: pin bump in argus VERSIONS.md + `make deerflow-upgrade` + restart stacks canary-first. Rollback = revert pin to `2df36c9`.

🤖 Generated with Claude Code
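For readers unfamiliar with the testing approach mentioned under Tests, here is a rough sketch of faking a driver module through `sys.modules` so the pool arguments can be asserted without a live Postgres. Everything here is illustrative: `build_pool` is a hypothetical stand-in for the code under test, the fake exposes a class named `AsyncConnectionPool` only as an assumption about which pool class is used, and the real test file may be organised quite differently.

```python
import sys
import types
from unittest import mock


def make_fake_psycopg_pool() -> types.ModuleType:
    """Create a stand-in `psycopg_pool` module whose pool class records its arguments."""
    fake = types.ModuleType("psycopg_pool")

    class AsyncConnectionPool:  # assumption: the class name the real code imports
        def __init__(self, conninfo: str = "", **kwargs):
            self.conninfo = conninfo
            self.kwargs = kwargs

    fake.AsyncConnectionPool = AsyncConnectionPool
    return fake


def build_pool(conninfo: str):
    """Hypothetical code under test: imports the driver lazily, then builds the pool."""
    from psycopg_pool import AsyncConnectionPool  # resolved via sys.modules

    return AsyncConnectionPool(conninfo, min_size=1, max_size=4, max_idle=300)


def captured_pool_kwargs() -> dict:
    """Run build_pool against the fake module and return the kwargs it received."""
    fake = make_fake_psycopg_pool()
    # patch.dict restores sys.modules on exit, so a real psycopg_pool (if installed)
    # is untouched outside this block.
    with mock.patch.dict(sys.modules, {"psycopg_pool": fake}):
        pool = build_pool("postgresql://example.invalid/db")
    return pool.kwargs
```

Because the import statement consults `sys.modules` before searching for an installed package, the fake is picked up even on machines where `psycopg_pool` is not installed, which is what removes the need for a database in the test run.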