[superlog] Log query failures at WARN in executeTimedQuery to prevent false-positive ERROR incidents#497

Open: superlog-app[bot] wants to merge 1 commit into staging from superlog/warn-on-query-fail-in-executeTimedQuery

superlog-app Bot commented Jun 26, 2026

Summary

Insights job-level logs were being emitted at ERROR severity when a single ClickHouse sub-query failed inside the AI agent's multi-query execute_sql tool — even though the job itself completed successfully and generated insights.

The executeTimedQuery utility called logger.error("Query failed", ...) in its catch block, which called requestLogger.error() on the job's evlog wide-event RequestLogger. evlog escalates the final wide-event severity to ERROR whenever .error() is called during the request scope. The execute_sql tool catches per-query errors gracefully (catch (err) { return { error, data: [] }; }), so the agent continues — but by that point the wide event is already marked ERROR, causing the job summary to emit at ERROR despite job_status: "succeeded".
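The escalation described above can be illustrated with a minimal sketch. This is not evlog's implementation: `WideEventLogger`, `WideEvent`, and `runJob` are hypothetical names that only model the behavior the summary describes, namely that the highest severity recorded during a scope becomes the severity of the final event.

```typescript
// Hypothetical model of a wide-event logger. NOT evlog's actual API.
type Level = "info" | "warn" | "error";

const RANK: Record<Level, number> = { info: 0, warn: 1, error: 2 };

interface WideEvent {
  level: Level;
  messages: string[];
  fields: Record<string, unknown>;
}

class WideEventLogger {
  private level: Level = "info";
  private messages: string[] = [];

  // Severity only ever moves up within one request scope.
  private record(level: Level, message: string): void {
    this.messages.push(`${level}: ${message}`);
    if (RANK[level] > RANK[this.level]) this.level = level;
  }

  warn(message: string): void {
    this.record("warn", message);
  }

  error(message: string): void {
    this.record("error", message);
  }

  // One event per scope, emitted at the highest severity seen so far.
  emit(fields: Record<string, unknown>): WideEvent {
    return { level: this.level, messages: [...this.messages], fields };
  }
}

// A job in which one sub-query fails, the caller recovers, and the job succeeds.
function runJob(logFailureAt: "warn" | "error"): WideEvent {
  const logger = new WideEventLogger();
  logger[logFailureAt]("Query failed");
  return logger.emit({ job_status: "succeeded" });
}

console.log(runJob("error").level); // prints "error": the false positive
console.log(runJob("warn").level); // prints "warn": severity matches the outcome
```

In this model the event carries job_status: "succeeded" in both runs; only the level at which the failed sub-query was logged changes the final severity.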

The fix changes logger.error to logger.warn in executeTimedQuery's catch block. Since the error is always re-thrown, the caller decides severity: recoverable callers (the multi-query sqlTool) catch it and keep the job at INFO/WARN; fatal errors propagate to the job-level catch in jobs.ts which correctly calls logger.error(err) and sets job_status: "failed". This matches the pattern in fetch-context.ts where pre-fetch query failures already use requestLogger.warn().
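A rough sketch of the log-at-WARN-and-rethrow pattern follows. The real signatures in query.ts and the sqlTool are not shown in this PR, so the `Logger` interface, the parameter lists, and `runQueryRecoverably` are assumptions made for illustration only.

```typescript
// Hypothetical shapes; the actual code in query.ts may differ.
interface Logger {
  warn(message: string, context?: Record<string, unknown>): void;
  error(message: string, context?: Record<string, unknown>): void;
}

async function executeTimedQuery<T>(
  name: string,
  run: () => Promise<T>,
  logger: Logger
): Promise<T> {
  const startedAt = Date.now();
  try {
    return await run();
  } catch (err) {
    // WARN, not ERROR: this layer cannot know whether the caller will recover.
    logger.warn("Query failed", {
      name,
      durationMs: Date.now() - startedAt,
      error: err instanceof Error ? err.message : String(err),
    });
    throw err; // always rethrown, so the caller decides severity
  }
}

// Recoverable caller, modeled on the per-query catch described for execute_sql.
async function runQueryRecoverably(
  name: string,
  run: () => Promise<unknown[]>,
  logger: Logger
): Promise<{ data: unknown[]; error?: string }> {
  try {
    return { data: await executeTimedQuery(name, run, logger) };
  } catch (err) {
    // The failure is returned as data; nothing is logged at ERROR here.
    return { error: err instanceof Error ? err.message : String(err), data: [] };
  }
}
```

A caller that cannot recover would simply not catch the rethrown error, letting it reach whichever outer handler logs at ERROR.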

Incident on Superlog


Summary by cubic

Log query failures at WARN in executeTimedQuery so recoverable sub-query errors don’t escalate job-wide logs to ERROR. Prevents false-positive ERROR incidents when execute_sql handles a failed sub-query and the job still succeeds.

  • Bug Fixes
    • Changed logger.error to logger.warn in executeTimedQuery. The error is still rethrown, so the caller decides severity.
    • Recoverable paths (multi-query execute_sql) now keep jobs at INFO/WARN; fatal errors still surface as ERROR at the job level.
    • Aligns with fetch-context.ts behavior and avoids false-positive incidents in Superlog.
    • Affected file: packages/ai/src/ai/tools/utils/query.ts.

Written for commit db5aa8f. Summary will update on new commits.

@vercel vercel Bot temporarily deployed to Preview – documentation June 26, 2026 22:54 Inactive
vercel Bot commented Jun 26, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| databuddy-status | Ready | Preview, Comment | Jun 26, 2026 10:55pm |

2 Skipped Deployments

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| dashboard | Skipped | | Jun 26, 2026 10:55pm |
| documentation | Skipped | | Jun 26, 2026 10:55pm |

unkey-deploy Bot commented Jun 26, 2026

The latest updates on your projects. Learn more about Unkey Deploy

| Name | Status | Preview | Inspect | Updated (UTC) |
| --- | --- | --- | --- | --- |
| api (preview) | Ready | Visit Preview | Inspect | Jun 26, 2026 10:55pm |

greptile-apps Bot commented Jun 26, 2026

Greptile Summary

This PR changes a single log call in executeTimedQuery from logger.error to logger.warn to prevent per-query ClickHouse failures from incorrectly escalating the job-level evlog wide-event to ERROR when the insights job itself succeeds. Because the error is always re-thrown, fatal errors still propagate to the job-level catch in jobs.ts, which calls logger.error and emits job_status: "failed" — so observability on true failures is unchanged.

  • Targeted fix for a false-positive alert: The execute_sql tool in insights-agent-tools.ts (line 264) catches per-query errors and returns partial results, keeping the agent alive — but the prior logger.error had already marked the wide-event as ERROR before the catch. Downgrading to WARN at the utility layer breaks that coupling.
  • Aligned with existing pattern: fetch-context.ts already logs pre-fetch query failures at requestLogger.warn() using the same rationale; this PR brings executeTimedQuery into consistency.

Confidence Score: 5/5

Safe to merge — the one changed line downgrades a log call while keeping the error re-throw intact, so all callers behave identically at runtime.

The fix is minimal and correct: the error is always re-thrown after the warn log, so no error suppression occurs. Fatal errors still reach the job-level catch in jobs.ts and emit at ERROR. The change aligns with the established warn-on-recoverable-query-failure pattern already present in fetch-context.ts. There is only one caller of executeTimedQuery (executeAgentSqlForWebsite), and that path is well-understood.
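The claim that real failures still surface at ERROR can be sketched as follows. jobs.ts is not shown in this PR, so `processJobSketch` and `JobEvent` are hypothetical and only mirror the described behavior: recoverable failures raise the event to WARN at most, while an error that reaches the job-level catch marks the event ERROR with job_status: "failed".

```typescript
// Hypothetical job wrapper; the real job-level handler in jobs.ts may differ.
type Level = "info" | "warn" | "error";

interface JobEvent {
  level: Level;
  job_status: "succeeded" | "failed";
}

function processJobSketch(work: (warn: () => void) => void): JobEvent {
  let level: Level = "info";
  // Utility-layer logging can raise the event to WARN but no further.
  const warn = (): void => {
    if (level === "info") level = "warn";
  };
  try {
    work(warn);
    return { level, job_status: "succeeded" };
  } catch {
    // Job-level catch: the only place in this sketch that escalates to ERROR.
    level = "error";
    return { level, job_status: "failed" };
  }
}

// Recoverable path: a sub-query fails, is logged at WARN, and the job continues.
console.log(processJobSketch((warn) => warn()));

// Fatal path: the error propagates to the job-level catch.
console.log(
  processJobSketch(() => {
    throw new Error("ClickHouse unavailable");
  })
);
```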

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/ai/src/ai/tools/utils/query.ts | Single-line change from logger.error to logger.warn in the catch block; error is still re-thrown so all callers retain full error propagation. Change is consistent with the fetch-context.ts pattern and correctly prevents evlog wide-event escalation for recoverable sub-query failures. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant job as jobs.ts (job-level)
    participant gen as generateWebsiteInsights
    participant sql as execute_sql tool (sqlTool)
    participant etq as executeTimedQuery
    participant rl as requestLogger (evlog)

    job->>gen: processGenerateWebsiteJob()
    gen->>sql: "execute({ queries })"
    loop per query
        sql->>etq: executeAgentSqlForWebsite()
        etq->>rl: chQuery() throws
        Note over etq,rl: BEFORE: logger.error() wide-event = ERROR
        Note over etq,rl: AFTER: logger.warn() wide-event stays INFO/WARN
        etq-->>sql: re-throws error
        sql->>sql: "catch(err) returns { error, data: [] }"
    end
    sql-->>gen: partial results (job continues)
    gen-->>job: result (status: succeeded)
    job->>rl: "logger.emit({ job_status: succeeded })"
    Note over job,rl: Wide-event emits at WARN (correct) not ERROR (false-positive)

    alt truly fatal error propagates
        gen-->>job: throws
        job->>rl: logger.error(err) + job_status: failed
        Note over job,rl: ERROR still fires — no regression for real failures
    end

Reviews (1): Last reviewed commit: "[superlog] Log query failures at WARN in..."
