
fix(branches): correct SUM(DISTINCT) cost aggregation bug #191

Open: hashedone wants to merge 1 commit into main from fix/branch-cost



@hashedone (Contributor)

Closes #189

Root cause: SUM(DISTINCT s.estimated_cost_usd) deduplicates by cost value, not by session identity. If multiple sessions have the same cost (especially $0.00 when pricing isn't configured), their costs collapse to a single value before summing, so the total is wrong: the view shows $0.00 even when there are many sessions.
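A minimal sketch of the root cause, using Python's built-in `sqlite3` module. The single-table `sessions` schema and its rows are hypothetical (only the `estimated_cost_usd` column name comes from this PR): three sessions cost $1.50 each and one costs $0.00, so `SUM(DISTINCT ...)` only ever sees the two distinct values.

```python
import sqlite3


def compare_sums():
    """Return (SUM(DISTINCT cost), SUM(cost)) over a toy sessions table.

    Hypothetical fixture: the table layout and rows are assumptions made
    for illustration; only the column name estimated_cost_usd is from the PR.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sessions (id TEXT PRIMARY KEY, estimated_cost_usd REAL)"
        )
        conn.executemany(
            "INSERT INTO sessions (id, estimated_cost_usd) VALUES (?, ?)",
            [("s1", 1.5), ("s2", 1.5), ("s3", 1.5), ("s4", 0.0)],
        )
        # DISTINCT applies to the cost values {1.5, 0.0}, not to session ids.
        (by_value,) = conn.execute(
            "SELECT SUM(DISTINCT estimated_cost_usd) FROM sessions"
        ).fetchone()
        # One row per session here, so a plain SUM counts each session once.
        (per_session,) = conn.execute(
            "SELECT SUM(estimated_cost_usd) FROM sessions"
        ).fetchone()
        return by_value, per_session
    finally:
        conn.close()


print(compare_sums())  # (1.5, 4.5)
```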

Fix: Replace with a correlated subquery that first collects the distinct (branch, session_id) pairs, then sums their costs — each session counted exactly once.
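A sketch of the fixed query shape, again run through SQLite from Python. The schema (`branch_tracking(branch, commit_sha)`, `commit_attributions(commit_sha, session_id)`, `sessions(id, estimated_cost_usd)`) is an assumption loosely based on the table names mentioned in the review below, with the `commits` table left out for brevity. A session attributed to several commits on one branch appears several times in the join, which is presumably why DISTINCT was used in the first place. The PR's actual subquery is truncated here, so this sketch deduplicates session ids with `IN`, which likewise counts each session once per branch.

```python
import sqlite3

# Hypothetical schema and rows: table names echo the review comment,
# but the columns and data are assumptions for illustration only.
SCHEMA = """
CREATE TABLE branch_tracking (branch TEXT, commit_sha TEXT);
CREATE TABLE commit_attributions (commit_sha TEXT, session_id TEXT);
CREATE TABLE sessions (id TEXT PRIMARY KEY, estimated_cost_usd REAL);
INSERT INTO branch_tracking VALUES ('feat', 'c1'), ('feat', 'c2'),
                                   ('main', 'c3'), ('empty', 'c4');
-- s1 is attributed to two commits on 'feat', so a naive join fans out.
INSERT INTO commit_attributions VALUES ('c1', 's1'), ('c2', 's1'),
                                       ('c1', 's2'), ('c2', 's3'),
                                       ('c3', 's4');
INSERT INTO sessions VALUES ('s1', 1.5), ('s2', 1.5), ('s3', 0.0), ('s4', 0.0);
"""

# Correlated subquery: for each branch, sum the cost of every distinct
# session attributed to that branch exactly once; COALESCE turns the
# NULL sum of a branch with no sessions into 0.0.
BRANCH_COST_SQL = """
SELECT b.branch,
       COALESCE((
           SELECT SUM(s.estimated_cost_usd)
           FROM sessions s
           WHERE s.id IN (
               SELECT ca.session_id
               FROM branch_tracking bt
               JOIN commit_attributions ca ON ca.commit_sha = bt.commit_sha
               WHERE bt.branch = b.branch
           )
       ), 0.0) AS total_cost_usd
FROM (SELECT DISTINCT branch FROM branch_tracking) AS b
ORDER BY b.branch
"""


def branch_costs():
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return dict(conn.execute(BRANCH_COST_SQL).fetchall())
    finally:
        conn.close()


print(branch_costs())  # {'empty': 0.0, 'feat': 3.0, 'main': 0.0}
```

On this fixture the fanned-out join for 'feat' yields rows costing 1.5, 1.5, 1.5 and 0.0, so SUM(DISTINCT s.estimated_cost_usd) would report 1.5 and a plain SUM would report 4.5. The per-session form reports 3.0.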

-- We cannot use SUM(DISTINCT s.estimated_cost_usd) because that deduplicates
-- by value (all $0.00 sessions collapse to one), producing wrong totals.
-- Instead, aggregate unique (branch, session_id) pairs via a subquery.
COALESCE((

The correlated subquery in the SELECT list runs once per branch row and can cause N+1-style database work; pre-aggregate the costs, or rewrite them as a single join or derived table outside the per-row SELECT, to avoid repeated scans.
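One way to follow this suggestion, sketched as a self-contained SQLite example with the same kind of hypothetical schema (all columns other than `estimated_cost_usd` are assumptions): deduplicate `(branch, session_id)` pairs once for all branches, aggregate per branch, and LEFT JOIN the totals back so branches without sessions still report 0.0. Whether this is faster in practice depends on the database engine, its planner, and the available indexes, so it is worth comparing query plans before switching.

```python
import sqlite3

# Hypothetical fixture; the layout and rows are illustrative assumptions.
SCHEMA = """
CREATE TABLE branch_tracking (branch TEXT, commit_sha TEXT);
CREATE TABLE commit_attributions (commit_sha TEXT, session_id TEXT);
CREATE TABLE sessions (id TEXT PRIMARY KEY, estimated_cost_usd REAL);
INSERT INTO branch_tracking VALUES ('feat', 'c1'), ('feat', 'c2'),
                                   ('main', 'c3'), ('empty', 'c4');
INSERT INTO commit_attributions VALUES ('c1', 's1'), ('c2', 's1'),
                                       ('c1', 's2'), ('c2', 's3'),
                                       ('c3', 's4');
INSERT INTO sessions VALUES ('s1', 1.5), ('s2', 1.5), ('s3', 0.0), ('s4', 0.0);
"""

# Uncorrelated form: build the distinct (branch, session_id) pairs in one
# pass, sum costs per branch with GROUP BY, then LEFT JOIN onto the branch
# list so branches with no attributed sessions get 0.0 via COALESCE.
PREAGGREGATED_SQL = """
SELECT b.branch,
       COALESCE(c.total_cost_usd, 0.0) AS total_cost_usd
FROM (SELECT DISTINCT branch FROM branch_tracking) AS b
LEFT JOIN (
    SELECT pairs.branch, SUM(s.estimated_cost_usd) AS total_cost_usd
    FROM (
        SELECT DISTINCT bt.branch, ca.session_id
        FROM branch_tracking bt
        JOIN commit_attributions ca ON ca.commit_sha = bt.commit_sha
    ) AS pairs
    JOIN sessions s ON s.id = pairs.session_id
    GROUP BY pairs.branch
) AS c ON c.branch = b.branch
ORDER BY b.branch
"""


def branch_costs_preaggregated():
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return dict(conn.execute(PREAGGREGATED_SQL).fetchall())
    finally:
        conn.close()


print(branch_costs_preaggregated())  # {'empty': 0.0, 'feat': 3.0, 'main': 0.0}
```

On this fixture the result matches a per-branch correlated version; the difference is that the deduplication and summing happen in one grouped pass instead of in a subquery evaluated for each branch row.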


✨ AI Reasoning
The new SELECT embeds a correlated subquery that computes a SUM by scanning and joining branch_tracking, commits, commit_attributions, and sessions for each result row. This turns what was a single set-based aggregation into repeated work proportional to the number of branches returned, causing database-level N+1 behavior and higher query cost as data grows.

🔧 How do I fix it?
Move constant work outside loops. Use StringBuilder instead of string concatenation in loops. Cache compiled regex patterns. Use hash-based lookups instead of nested loops. Batch database operations instead of N+1 queries.


SUM(DISTINCT s.estimated_cost_usd) deduplicates by cost value rather
than by session identity. When multiple sessions have the same cost
(especially $0.00), they collapse to one row and the sum is wrong.

Fix: use a correlated subquery to get the distinct set of session_ids
per branch first, then sum their costs — ensuring each session is
counted exactly once regardless of its cost value.

Closes #189


Development

Successfully merging this pull request may close these issues.

bug(analytics): branches view shows $0 cost — investigate SUM(DISTINCT) on cost

1 participant