fix(branches): correct SUM(DISTINCT) cost aggregation bug#191
cc4c9dc to b4f02c2
-- We cannot use SUM(DISTINCT s.estimated_cost_usd) because that deduplicates
-- by value (all $0.00 sessions collapse to one), producing wrong totals.
-- Instead, aggregate unique (branch, session_id) pairs via a subquery.
COALESCE((
Correlated subquery in the SELECT runs per branch and can cause N+1 DB work; pre-aggregate or rewrite as a single join/subquery outside the per-row SELECT to avoid repeated scans.
✨ AI Reasoning
The new SELECT embeds a correlated subquery that computes a SUM by scanning/joining branch_tracking, commits, commit_attributions, and sessions for each result row. This turns what was a single set-based aggregation into repeated work proportional to the number of branches returned, causing database-level N+1 behavior and increased query cost as data grows.
🔧 How do I fix it?
Move constant work outside loops. Use StringBuilder instead of string concatenation in loops. Cache compiled regex patterns. Use hash-based lookups instead of nested loops. Batch database operations instead of N+1 queries.
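To make the pre-aggregation suggestion concrete, here is a minimal sketch using Python's `sqlite3` module. The schema is a hypothetical simplification, not the project's real one: a `branches` table and a `commit_attributions` table that carries the branch name directly, standing in for the `branch_tracking`, `commits`, and `commit_attributions` joins. It compares a per-branch correlated subquery with a single derived table that is aggregated once and then joined. Whether the correlated form is actually slower depends on the database's query planner.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branches (name TEXT PRIMARY KEY);
CREATE TABLE sessions (session_id TEXT PRIMARY KEY, estimated_cost_usd REAL);
CREATE TABLE commit_attributions (branch TEXT, commit_sha TEXT, session_id TEXT);
INSERT INTO branches VALUES ('main'), ('feature'), ('empty');
INSERT INTO sessions VALUES ('s1', 0.50), ('s2', 0.50), ('s3', 0.25);
INSERT INTO commit_attributions VALUES
  ('main', 'c1', 's1'), ('main', 'c2', 's1'), ('main', 'c2', 's2'),
  ('feature', 'c3', 's2'), ('feature', 'c4', 's3');
""")

# Correlated form: the inner SELECT is evaluated once per branch row.
correlated = conn.execute("""
SELECT b.name,
       COALESCE((SELECT SUM(s.estimated_cost_usd)
                 FROM sessions s
                 WHERE s.session_id IN (SELECT ca.session_id
                                        FROM commit_attributions ca
                                        WHERE ca.branch = b.name)), 0.0) AS cost
FROM branches b
ORDER BY b.name
""").fetchall()

# Pre-aggregated form: deduplicate (branch, session_id) pairs and sum per
# branch in one derived table, then LEFT JOIN it so empty branches survive.
pre_aggregated = conn.execute("""
SELECT b.name, COALESCE(t.cost, 0.0) AS cost
FROM branches b
LEFT JOIN (SELECT p.branch, SUM(s.estimated_cost_usd) AS cost
           FROM (SELECT DISTINCT branch, session_id
                 FROM commit_attributions) p
           JOIN sessions s ON s.session_id = p.session_id
           GROUP BY p.branch) t ON t.branch = b.name
ORDER BY b.name
""").fetchall()

print(correlated)
print(pre_aggregated)
```

Both queries count each session once per branch, so they should return the same rows; the second does the deduplication and summation in a single set-based pass.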
SUM(DISTINCT s.estimated_cost_usd) deduplicates by cost value rather than by session identity. When multiple sessions have the same cost (especially $0.00), they collapse to one row and the sum is wrong.
Fix: use a correlated subquery to get the distinct set of session_ids per branch first, then sum their costs, so that each session is counted exactly once regardless of its cost value.
Closes #189
b4f02c2 to e23b1c9
Closes #189
Root cause: SUM(DISTINCT s.estimated_cost_usd) deduplicates by cost value, not by session identity. If multiple sessions have the same cost (especially $0.00 when pricing isn't configured), they collapse to one row and the sum is wrong, showing $0.00 even when there are many sessions.
Fix: Replace with a correlated subquery that first collects the distinct (branch, session_id) pairs, then sums their costs, so each session is counted exactly once.
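The root cause can be reproduced in a few lines. The sketch below uses Python's `sqlite3` module with a hypothetical, simplified schema (a `commit_attributions` table that carries the branch name directly, and made-up costs) rather than the project's real tables. It shows three totals for one branch: a plain `SUM` over the join, which overcounts a session attributed to two commits; `SUM(DISTINCT ...)`, which collapses sessions that happen to share a cost; and a sum over distinct (branch, session_id) pairs.

```python
import sqlite3

# Hypothetical, simplified schema and data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (session_id TEXT PRIMARY KEY, estimated_cost_usd REAL);
CREATE TABLE commit_attributions (branch TEXT, commit_sha TEXT, session_id TEXT);
INSERT INTO sessions VALUES ('s1', 0.50), ('s2', 0.50), ('s3', 0.25);
INSERT INTO commit_attributions VALUES
  ('main', 'c1', 's1'),   -- s1 is attributed to two commits
  ('main', 'c2', 's1'),
  ('main', 'c2', 's2'),   -- s2 has the same cost as s1
  ('main', 'c3', 's3');
""")

JOIN = """
FROM commit_attributions ca
JOIN sessions s ON s.session_id = ca.session_id
WHERE ca.branch = 'main'
"""

# Plain SUM: s1 appears twice in the join, so its cost is counted twice.
naive = conn.execute("SELECT SUM(s.estimated_cost_usd)" + JOIN).fetchone()[0]

# SUM(DISTINCT): deduplicates by value, so s1 and s2 (both 0.50) collapse.
by_value = conn.execute(
    "SELECT SUM(DISTINCT s.estimated_cost_usd)" + JOIN
).fetchone()[0]

# Deduplicate by identity first, then sum: each session counted once.
by_session = conn.execute("""
SELECT COALESCE(SUM(s.estimated_cost_usd), 0.0)
FROM (SELECT DISTINCT branch, session_id
      FROM commit_attributions
      WHERE branch = 'main') p
JOIN sessions s ON s.session_id = p.session_id
""").fetchone()[0]

print(naive, by_value, by_session)  # 1.75 0.75 1.25
```

Only the last query matches the true total of the three sessions (0.50 + 0.50 + 0.25).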