[superlog] Log SQL tool query failures as WARN instead of ERROR #500

Open
superlog-app[bot] wants to merge 1 commit into main from superlog/fix-sql-tool-query-error-level

superlog-app Bot commented Jun 27, 2026


Summary

Insights generation jobs that succeed are being logged at ERROR severity, creating false-positive incidents. Both sampled occurrences show job_status: "succeeded" and run_status: "succeeded" alongside level: "error" — the jobs completed and produced insights.

The insights service wires the AI package's tool logger to the insights job's wide-event logger (RequestLogger). When executeTimedQuery catches a ClickHouse query failure inside the execute_sql AI tool, it calls requestLogger.error() on the job logger. In evlog, this permanently upgrades the wide event's severity to ERROR. Even though the AI agent recovers from the query failure and completes successfully, the final logger.emit({ job_status: "succeeded" }) fires at ERROR level.
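The escalation described above can be illustrated with a minimal sketch. It does not use evlog's actual API: `WideEvent`, its methods, and the level ordering are hypothetical names, chosen only to show how one `error()` call can pin the level of the final emitted event.

```typescript
// Hypothetical wide-event logger. NOT evlog's real API; it only illustrates
// "sticky" severity, where the highest level seen wins at emit time.
type Level = "info" | "warn" | "error";

const RANK: Record<Level, number> = { info: 0, warn: 1, error: 2 };

interface EmittedEvent {
  level: Level;
  fields: Record<string, unknown>;
}

class WideEvent {
  private level: Level = "info";
  private fields: Record<string, unknown> = {};

  // Raise the event's level; it never goes back down.
  private escalate(level: Level): void {
    if (RANK[level] > RANK[this.level]) this.level = level;
  }

  warn(message: string, fields: Record<string, unknown> = {}): void {
    this.escalate("warn");
    Object.assign(this.fields, fields, { last_message: message });
  }

  error(message: string, fields: Record<string, unknown> = {}): void {
    this.escalate("error");
    Object.assign(this.fields, fields, { last_message: message });
  }

  // One event per job, emitted at the highest level recorded so far.
  emit(fields: Record<string, unknown>): EmittedEvent {
    return { level: this.level, fields: { ...this.fields, ...fields } };
  }
}

// Before the fix: a recovered tool failure still pins the event at "error".
const before = new WideEvent();
before.error("Query failed", { sql: "SELECT 1" });
const beforeEvent = before.emit({ job_status: "succeeded" });

// After the fix: the same failure only raises the event to "warn".
const after = new WideEvent();
after.warn("Query failed", { sql: "SELECT 1" });
const afterEvent = after.emit({ job_status: "succeeded" });
```

Under these assumptions, `beforeEvent` carries `level: "error"` next to `job_status: "succeeded"`, which is the combination seen in the sampled occurrences, while `afterEvent` carries `level: "warn"`.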

The fix changes logger.error() to logger.warn() for SQL query failures in executeTimedQuery. Tool-level query failures are recoverable — the AI agent catches the error and can try a different approach or proceed — so WARN is the correct severity. All diagnostic fields (SQL, execution time, error message) remain captured. Jobs that actually fail at the job level (thrown errors in processInsightsJob) continue to emit at ERROR via the explicit logger.error(err) call in the failure path.
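A rough sketch of the changed catch block follows. The real signature of `executeTimedQuery` is not shown in this PR, so the `ToolLogger` and `RunQuery` types, the injected `runQuery` parameter, the field names, and the 200-character SQL truncation are all assumptions made for illustration.

```typescript
// Hypothetical shapes: the real types in packages/ai are not shown in this PR.
interface ToolLogger {
  warn(message: string, fields: Record<string, unknown>): void;
  error(message: string, fields: Record<string, unknown>): void;
}

// Stand-in for the ClickHouse query function (assumed, injected for testing).
type RunQuery<T> = (sql: string) => Promise<T[]>;

async function executeTimedQuery<T>(
  sql: string,
  runQuery: RunQuery<T>,
  logger: ToolLogger,
): Promise<T[]> {
  const startedAt = Date.now();
  try {
    return await runQuery(sql);
  } catch (error) {
    // Recoverable at the tool level, so log at WARN instead of ERROR.
    // The diagnostic fields are the same ones an ERROR log would carry.
    logger.warn("Query failed", {
      sql: sql.slice(0, 200),
      execution_time_ms: Date.now() - startedAt,
      error: error instanceof Error ? error.message : String(error),
    });
    // Re-throw so the calling tool (and the agent) still sees the failure.
    throw error;
  }
}
```

Only the log level differs from the previous behavior in this sketch; the caller still receives the exception and decides whether the failure is fatal.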

Alternative: a per-context severity override could be added to createToolLogger so that the API and insights contexts configure tool-failure severity independently. The one-line change in this PR is the more targeted fix, with the smallest blast radius.
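For comparison, the alternative could look roughly like the sketch below. The actual signature of `createToolLogger` is not shown in this PR, so the `toolFailureSeverity` option, the `toolFailure` method, and the `"error"` default are hypothetical.

```typescript
// Hypothetical sketch: createToolLogger's real signature is not shown in this PR.
type Severity = "warn" | "error";

interface BaseLogger {
  warn(message: string, fields?: Record<string, unknown>): void;
  error(message: string, fields?: Record<string, unknown>): void;
}

interface ToolLoggerOptions {
  // Severity used when a tool call fails; assumed here to default to "error".
  toolFailureSeverity?: Severity;
}

interface ToolLogger {
  toolFailure(message: string, fields?: Record<string, unknown>): void;
}

function createToolLogger(base: BaseLogger, options: ToolLoggerOptions = {}): ToolLogger {
  const severity = options.toolFailureSeverity ?? "error";
  return {
    toolFailure(message, fields) {
      // Dispatch to the configured level on the underlying logger.
      base[severity](message, fields);
    },
  };
}

// Usage: the insights context opts into WARN, the API context keeps the default.
const calls: string[] = [];
const recorder: BaseLogger = {
  warn: (message) => { calls.push(`warn:${message}`); },
  error: (message) => { calls.push(`error:${message}`); },
};

createToolLogger(recorder, { toolFailureSeverity: "warn" }).toolFailure("Query failed");
createToolLogger(recorder).toolFailure("Query failed");
```

This keeps one call site in the tool code and moves the severity decision to whoever constructs the logger, at the cost of a wider change than the one-line fix.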

Incident on Superlog



Summary by cubic

Downgraded SQL tool query failures from ERROR to WARN in executeTimedQuery to prevent false-positive incidents on successful insight jobs. Diagnostics are unchanged; real job failures still log as ERROR.

  • Bug Fixes
    • Switched logger.error("Query failed", ...) to logger.warn(...) in packages/ai/src/ai/tools/utils/query.ts, avoiding job-wide ERROR severity when the agent recovers.

Written for commit 97cc2a6. Summary will update on new commits.


vercel Bot commented Jun 27, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| databuddy-status | Ready | Preview, Comment | Jun 27, 2026 6:08am |

2 Skipped Deployments

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| dashboard | Skipped | | Jun 27, 2026 6:08am |
| documentation | Skipped | | Jun 27, 2026 6:08am |

@vercel vercel Bot temporarily deployed to Preview – dashboard June 27, 2026 06:07 Inactive
@vercel vercel Bot temporarily deployed to Preview – documentation June 27, 2026 06:07 Inactive

unkey-deploy Bot commented Jun 27, 2026


The latest updates on your projects. Learn more about Unkey Deploy

| Name | Status | Preview | Inspect | Updated (UTC) |
| --- | --- | --- | --- | --- |
| api (preview) | Ready | Visit Preview | Inspect | Jun 27, 2026 6:09am |

greptile-apps Bot commented Jun 27, 2026


Greptile Summary

This PR fixes a false-positive incident caused by SQL tool-level query failures permanently upgrading the parent wide-event severity to ERROR in evlog, even when the AI agent recovered and the job ultimately succeeded. The fix is a one-line change in packages/ai/src/ai/tools/utils/query.ts.

  • Changes logger.error() to logger.warn() in the catch block of executeTimedQuery, so recoverable ClickHouse query failures inside the execute_sql AI tool no longer contaminate the enclosing job event's severity level.
  • All diagnostic data (SQL snippet, execution time, error message) and the throw error re-throw remain unchanged, so callers still receive the exception and actual job-level failures continue to emit at ERROR via the separate failure path in processInsightsJob.
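The two failure paths named in these points can be sketched as follows. The function names mirror the PR text, but the bodies, the `JobLogger` class, and the example queries are hypothetical stand-ins, not the project's code.

```typescript
// Hypothetical sketch of the two failure paths; bodies are illustrative only.
type Level = "info" | "warn" | "error";

class JobLogger {
  level: Level = "info";
  lastError: unknown = undefined;

  warn(): void {
    if (this.level === "info") this.level = "warn";
  }

  error(err: unknown): void {
    this.level = "error";
    this.lastError = err;
  }

  emit(fields: { job_status: string }): { level: Level; job_status: string } {
    return { level: this.level, ...fields };
  }
}

// Stand-in for the execute_sql tool: logs at WARN, then re-throws.
function runTool(sql: string, logger: JobLogger): string[] {
  if (sql.includes("bad_column")) {
    logger.warn();
    throw new Error("Unknown column");
  }
  return ["row"];
}

// Stand-in for the agent: recovers from a tool error by trying the next query.
function runAgent(queries: string[], logger: JobLogger): string[] {
  for (const sql of queries) {
    try {
      return runTool(sql, logger);
    } catch {
      // Tool-level failure is recoverable; fall through to the next query.
    }
  }
  throw new Error("All queries failed");
}

function processInsightsJob(queries: string[]): { level: Level; job_status: string } {
  const logger = new JobLogger();
  try {
    runAgent(queries, logger);
    return logger.emit({ job_status: "succeeded" });
  } catch (err) {
    // Job-level failure: the explicit error call keeps this path at ERROR.
    logger.error(err);
    return logger.emit({ job_status: "failed" });
  }
}

const recovered = processInsightsJob(["SELECT bad_column", "SELECT 1"]);
const failed = processInsightsJob(["SELECT bad_column"]);
```

In this sketch a recovered run emits at `warn` with `job_status: "succeeded"`, while a run in which the agent cannot recover emits at `error` with `job_status: "failed"`.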

Confidence Score: 5/5

Safe to merge — the change is a one-line log-level downgrade that fixes a confirmed false-positive alerting issue without altering any error propagation or data path.

The only change is logger.error() → logger.warn() inside the catch block. The exception is still re-thrown, all diagnostic fields are intact, and job-level failures continue to emit at ERROR through the unmodified failure path in the calling code. There are no logic changes, no new error-swallowing, and no risk of data loss or regression.

No files require special attention — the change is confined to a single catch block in packages/ai/src/ai/tools/utils/query.ts.

Important Files Changed

Filename Overview
packages/ai/src/ai/tools/utils/query.ts Single-line change from logger.error() to logger.warn() in the catch block of executeTimedQuery; all diagnostic fields and the re-throw are preserved.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Job as processInsightsJob
    participant Agent as AI Agent
    participant Tool as execute_sql tool
    participant Query as executeTimedQuery
    participant Logger as RequestLogger (wide-event)

    Job->>Agent: run agent
    Agent->>Tool: call execute_sql
    Tool->>Query: executeTimedQuery(sql, ...)
    Query->>Query: chQuery fails
    Note over Query,Logger: BEFORE: logger.error() upgrades wide-event to ERROR
    Note over Query,Logger: AFTER:  logger.warn()  wide-event stays at INFO/WARN
    Query-->>Tool: throw error (re-throw unchanged)
    Tool-->>Agent: error caught, agent recovers
    Agent-->>Job: succeeds
    Job->>Logger: emit job_status succeeded
    Note over Logger: BEFORE: emits at ERROR (false positive)
    Note over Logger: AFTER:  emits at WARN or below (correct)

Reviews (1): Last reviewed commit: "[superlog] Log SQL tool query failures a..."