During a Redis server upgrade the insights worker emitted two worker.error ERROR logs (READONLY Writes are temporarily rejected due to server upgrade and ERR caller gone), creating a false-positive incident. The BullMQ worker is configured with maxRetriesPerRequest: null so ioredis reconnects automatically; all jobs continued to succeed after both errors.
The root cause is that worker.on("error") unconditionally logs every Worker-level connection error at ERROR severity, even for well-known transient Redis signals that require no operator action.
This patch adds a TRANSIENT_REDIS_ERROR_PATTERNS list (READONLY, ERR caller gone, ECONNRESET, Connection is closed, Socket closed unexpectedly) and routes matched errors to WARN level. Unexpected Worker errors continue to be logged at ERROR. An alternative approach would be to suppress transient errors entirely (no log at all), but WARN preserves visibility for debugging without triggering incidents.
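A minimal sketch of what the pattern list and matcher might look like. The PR text names `TRANSIENT_REDIS_ERROR_PATTERNS` and `isTransientRedisError`, but the exact regexes are not shown here, so the patterns below are assumptions built from the five listed signals. The `levelForWorkerError` helper is hypothetical and only illustrates the WARN vs ERROR decision.

```typescript
// Assumed patterns: one unanchored regex per transient signal named in the PR.
// The real regexes in the patch may be stricter (anchored, case-insensitive, etc.).
const TRANSIENT_REDIS_ERROR_PATTERNS: readonly RegExp[] = [
  /READONLY/,
  /ERR caller gone/,
  /ECONNRESET/,
  /Connection is closed/,
  /Socket closed unexpectedly/,
];

// Returns true when the error message matches any known transient signal.
// Regexes have no `g` flag, so `test` is stateless and safe to reuse.
function isTransientRedisError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return TRANSIENT_REDIS_ERROR_PATTERNS.some((pattern) => pattern.test(message));
}

type LogLevel = "warn" | "error";

// Hypothetical helper: transient errors map to WARN, everything else stays at ERROR.
function levelForWorkerError(err: unknown): LogLevel {
  return isTransientRedisError(err) ? "warn" : "error";
}
```

Defaulting to `"error"` for anything unmatched keeps the failure mode conservative: an unknown message can only over-alert, never under-alert.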
Summary by cubic
Downgraded transient Redis connection errors in the insights worker from ERROR to WARN to prevent false-positive incidents during upgrades and failovers. Unexpected errors still log at ERROR; job processing is unchanged.
Bug Fixes
Added detection for transient Redis errors (READONLY, ERR caller gone, ECONNRESET, Connection is closed, Socket closed unexpectedly), which are now logged at WARN.
Updated worker.on("error") to choose WARN vs ERROR in emitInsightsEvent. The bullmq worker uses maxRetriesPerRequest: null, so ioredis auto-reconnects and jobs continue to succeed.
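The handler change described above could be wired roughly as follows. This is a sketch under stated assumptions: `ErrorSource` is a structural stand-in for the BullMQ worker, `Emit` is a guess at the shape of `emitInsightsEvent` (its real signature is not shown in this PR text), and `attachWorkerErrorLogging` is a hypothetical name.

```typescript
type LogLevel = "warn" | "error";

// Stand-in for the worker: anything that lets us subscribe to "error" events.
interface ErrorSource {
  on(event: "error", listener: (err: Error) => void): unknown;
}

// Assumed logger shape; the real emitInsightsEvent may take different arguments.
type Emit = (level: LogLevel, event: string, fields: { message: string }) => void;

// Subscribes once and picks the severity per error:
// transient connection noise goes to WARN, everything else stays at ERROR.
function attachWorkerErrorLogging(
  worker: ErrorSource,
  isTransient: (err: Error) => boolean,
  emit: Emit,
): void {
  worker.on("error", (err) => {
    const level: LogLevel = isTransient(err) ? "warn" : "error";
    emit(level, "worker.error", { message: err.message });
  });
}
```

Passing the predicate in as a parameter keeps the handler independent of the pattern list, so the WARN vs ERROR boundary can be exercised in isolation.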
Written for commit 64906ff. Summary will update on new commits.
This PR addresses a false-positive incident caused by transient Redis errors during a server upgrade being logged at ERROR severity in the insights worker. It introduces a TRANSIENT_REDIS_ERROR_PATTERNS list and an isTransientRedisError helper that routes matched errors to WARN while keeping unrecognized errors at ERROR.
Adds five regex patterns covering known BullMQ/ioredis transient signals (READONLY, ERR caller gone, ECONNRESET, Connection is closed, Socket closed unexpectedly) and branches the worker.on("error") handler on whether the error matches, preserving full observability at WARN level.
The emitInsightsEvent signature already accepts "warn" as a valid LogLevel, so no type changes are needed and the integration is correct.
Confidence Score: 4/5
Safe to merge — the change is narrow, type-correct, and the fallback path (unknown errors stay at ERROR) is sound.
The new isTransientRedisError helper and patterns are unexported with no unit tests, so the exact regex boundary between WARN and ERROR cannot be regression-tested. Everything else — the type compatibility with LogLevel, the correct scoping to the worker.on("error") event (not job failures), and the safe default for non-matching errors — is solid.
apps/insights/src/worker.ts — specifically the untested isTransientRedisError function and pattern list.
Important Files Changed
| Filename | Overview |
| --- | --- |
| apps/insights/src/worker.ts | Adds TRANSIENT_REDIS_ERROR_PATTERNS and isTransientRedisError to selectively downgrade known transient Redis errors from ERROR to WARN in the worker error handler; logic is correct and type-safe, but the helper is unexported with no unit tests. |
isTransientRedisError is unexported and untested
The patterns in TRANSIENT_REDIS_ERROR_PATTERNS are the sole gate between WARN and ERROR alerting. There is no worker.test.ts, and the function is unexported, so there is currently no way to verify that the regexes correctly match the intended Redis messages (e.g. READONLY Writes are temporarily rejected…) and do not accidentally match unrelated errors. If a pattern is wrong or a Redis error message format changes, operators would silently receive WARN for a real incident with no indication anything is wrong. Exporting isTransientRedisError (or the pattern list) and adding a few unit tests would make the boundary explicit and regression-proof.
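One way the suggested tests could look, sketched as a table of cases without assuming any particular test framework. The pattern list is an assumption (the actual regexes are not shown in this PR text), and in the real module `isTransientRedisError` would need to be exported so a test file can import it. The negative cases are illustrative Redis error messages, not messages taken from the incident.

```typescript
// Assumed patterns, mirroring the five signals named in the PR description.
const PATTERNS: readonly RegExp[] = [
  /READONLY/,
  /ERR caller gone/,
  /ECONNRESET/,
  /Connection is closed/,
  /Socket closed unexpectedly/,
];

function isTransientRedisError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return PATTERNS.some((pattern) => pattern.test(message));
}

interface Case {
  message: string;
  transient: boolean; // expected classification
}

const CASES: readonly Case[] = [
  // The two messages from the incident should both be downgraded.
  { message: "READONLY Writes are temporarily rejected due to server upgrade", transient: true },
  { message: "ERR caller gone", transient: true },
  { message: "read ECONNRESET", transient: true },
  { message: "Connection is closed.", transient: true },
  { message: "Socket closed unexpectedly", transient: true },
  // Unrelated errors must stay at ERROR.
  { message: "ERR wrong number of arguments for 'get' command", transient: false },
  { message: "NOAUTH Authentication required.", transient: false },
];

// Returns the messages whose classification differs from the expectation.
function failingCases(): string[] {
  return CASES
    .filter((c) => isTransientRedisError(new Error(c.message)) !== c.transient)
    .map((c) => c.message);
}
```

In a test runner, the single assertion would be that `failingCases()` returns an empty array. Note that the patterns in this sketch are unanchored, so any message containing `READONLY` anywhere would be classified as transient; adding negative cases for such messages is how the boundary the review asks about would be pinned down.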