Skip to content

fix(fetchers): honor Retry-After header on 429/503 responses#1670

Open
Sanjays2402 wants to merge 2 commits intoMemTensor:mainfrom
Sanjays2402:fix/issue-1620
Open

fix(fetchers): honor Retry-After header on 429/503 responses#1670
Sanjays2402 wants to merge 2 commits intoMemTensor:mainfrom
Sanjays2402:fix/issue-1620

Conversation

@Sanjays2402
Copy link
Copy Markdown

Summary

The LLM and embedding fetchers in apps/memos-local-plugin classified HTTP 429 as transient and retried with exponential backoff, but never read the Retry-After header. Under provider rate limiting, retries fired before the upstream-requested cooldown, extending throttling and wasting paid API calls.

This PR parses Retry-After per RFC 7231 §7.1.3 (delta-seconds or HTTP-date), caps the value to a sane maximum, and prefers it over exponential backoff on 429/503. Falls back to the existing backoff when the header is absent or unparseable.

Changes

  • New shared helper apps/memos-local-plugin/core/util/retry-after.ts exporting parseRetryAfterMs(resp, capMs).
  • apps/memos-local-plugin/core/llm/fetcher.ts: parse Retry-After on 429/503 and use it as the sleep duration; small refactor split backoff() into backoffMs() + sleep() so both code paths share one timer.
  • apps/memos-local-plugin/core/embedding/fetcher.ts: same treatment, mirroring the LLM fetcher.
  • Tests added/extended:
    • tests/unit/util/retry-after.test.ts — covers integer seconds, HTTP-date, capping, malformed, missing, past-date, and negative inputs.
    • tests/unit/llm/fetcher.test.ts and tests/unit/embedding/fetcher.test.ts — three new cases each: integer Retry-After is honored, HTTP-date is honored, malformed value falls back to backoff.

Behavior

  • Retry-After: 5 → sleep ~5s instead of next backoff step.
  • Retry-After: Wed, 21 Oct 2025 07:28:00 GMT → sleep ~(date - now), capped.
  • Header absent or malformed → falls back to existing exponential backoff (no behavior change).
  • Capped at 60s (MAX_RETRY_AFTER_MS) to prevent hostile or buggy servers from pinning the client indefinitely.
  • Only 429 and 503 honor Retry-After. Other 5xx still use exponential backoff (the header is rarely set on those, and we want to keep retrying quickly on non-rate-limit failures).

Notes

  • Could not run tsc --noEmit locally (no node_modules in this worktree), so types were verified by inspection. The new helper has explicit return type number | null and matches the import-style (.js extension, Bundler resolution) used elsewhere in the codebase.
  • Tests use the existing real-timer pattern in tests/unit/embedding/fetcher.test.ts; sleep upper bounds are generous (≥900 ms for Retry-After: 1) to avoid flakes on slow CI.

Fixes #1620

LLM and embedding fetchers in memos-local-plugin classified 429 as
transient and retried with exponential backoff, but never inspected the
Retry-After response header. Under provider rate limiting, retries fired
before the upstream-requested cooldown, extending throttling and wasting
paid API calls.

Parse Retry-After per RFC 7231 §7.1.3 (delta-seconds or HTTP-date),
cap to a sane maximum to avoid hostile pinning, and prefer it over
exponential backoff on 429/503. Fall back to the existing backoff when
the header is absent or unparseable.

Fixes MemTensor#1620
Copilot AI review requested due to automatic review settings May 9, 2026 19:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the LLM and embedding HTTP fetchers in apps/memos-local-plugin to honor the upstream Retry-After header on 429/503 responses, preferring the server-provided cooldown over the existing exponential backoff (with a 60s cap), and adds unit coverage for the new behavior.

Changes:

  • Added a shared parseRetryAfterMs(resp, capMs) helper that supports delta-seconds and HTTP-date formats.
  • Updated both LLM and embedding fetchers to use Retry-After (when present/parseable) for 429/503 retry sleeps, otherwise falling back to exponential backoff.
  • Added/extended unit tests to cover parsing behavior and fetcher retry timing behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
apps/memos-local-plugin/core/util/retry-after.ts New helper to parse and cap Retry-After into a millisecond delay.
apps/memos-local-plugin/core/llm/fetcher.ts Uses Retry-After delay for 429/503 retries; refactors backoff into backoffMs() + sleep().
apps/memos-local-plugin/core/embedding/fetcher.ts Mirrors LLM fetcher behavior to honor Retry-After for 429/503 retries.
apps/memos-local-plugin/tests/unit/util/retry-after.test.ts Adds unit tests for delta-seconds, HTTP-date, malformed values, and capping behavior.
apps/memos-local-plugin/tests/unit/llm/fetcher.test.ts Adds timing-based tests asserting Retry-After is honored or falls back to backoff.
apps/memos-local-plugin/tests/unit/embedding/fetcher.test.ts Adds timing-based tests asserting Retry-After is honored or falls back to backoff.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +10 to +11
* `null` when the header is absent, malformed, or non-positive — callers
* should fall back to their existing backoff strategy.
import { parseRetryAfterMs } from "../util/retry-after.js";
import type { LlmProviderLogger, LlmProviderName } from "./types.js";

/** Hard ceiling on any single retry sleep — Retry-After or otherwise. */
Comment on lines +14 to +15
/** Hard ceiling on any single retry sleep — Retry-After or otherwise. */
const MAX_RETRY_AFTER_MS = 60_000;
…0 is valid

Address Copilot review on MemTensor#1670:
- parseRetryAfterMs JSDoc: document that 0 (retry immediately) is a
  valid value; only negative or unparseable values return null.
- LLM and embedding fetchers: cap the final sleep duration with
  MAX_RETRY_AFTER_MS regardless of whether it came from Retry-After
  or exponential backoff. Previously backoff could exceed 60s on
  high attempt counts (config allows up to 10 retries).
@Sanjays2402
Copy link
Copy Markdown
Author

Thanks for the review — addressed in the latest commit:

  • retry-after.ts:11: Updated the JSDoc to clarify that 0 (retry immediately) is a valid value; only negative or unparseable values return null. Tests already covered the 0 case and continue to pass.
  • llm/fetcher.ts:17 / embedding/fetcher.ts:15: Applied Math.min(sleepMs, MAX_RETRY_AFTER_MS) to the final sleep duration in both fetchers, so the 60s cap applies whether the value came from Retry-After or backoffMs(). Previously a maxRetries: 10 config could let exponential backoff exceed 60s.

Diff is small and behavior-only-on-the-margin (only changes anything when attempt >= ~6 in a long retry chain, or when Retry-After: 0 arrives).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

429 retry handling ignores Retry-After header in LLM and embedding fetchers

2 participants