orchestrator: silent sidecar boot failure on first install_embeddings — 60s timeout exceeded by cold-start downloads #1

@evannadeau


Summary

On a fresh install over a residential connection, the first install_embeddings(install) call fails with "Sidecar failed to start. Check the logs for details." despite a healthy host (Python 3.12.3, uv 0.11.7). A second invocation succeeds because the first attempt populated the uv and HuggingFace caches. The "check the logs" guidance is not actionable: sidecar stdout and stderr are both set to "ignore", so no log file is produced.

Likely not reproducible on fast/datacenter connections where the cold-start downloads finish well inside the current 60 s window.

Environment

  • Plugin version: 0.25.2
  • Host: WSL2 Ubuntu 24.04, residential broadband (~100 Mbps)
  • Python 3.12.3, uv 0.11.7

Root cause

mcp/server.ts:191-196 (trySpawn for uvx --with-requirements) sets a 60 s timeout. On first run that path must:

  1. Resolve requirements.txt and download wheels (onnxruntime ~150 MB, tokenizers, numpy, huggingface_hub).
  2. Build a uv-managed venv.
  3. Download the bge-m3 ONNX model (~2 GB) from HuggingFace under unauthenticated rate limits.
  4. Load the ONNX session and bind the HTTP port.

On residential bandwidth the sum exceeded 60 s; trySpawn killed the process before .sidecar-port was written. Back-of-envelope: 2.15 GB (presumably the ~2 GB model plus the ~150 MB onnxruntime wheel) at ~100 Mbps is ~170 s of download alone, well over budget. At gigabit it is ~17 s, comfortably under.
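The back-of-envelope figures can be checked with a few lines of TypeScript. This is a rough sketch only: it assumes decimal units, that the 2.15 GB payload is the model plus the onnxruntime wheel, and it ignores protocol overhead, HuggingFace rate limiting, and the non-download steps. The function name `downloadSeconds` is made up for illustration.

```typescript
// Estimate transfer time for a payload at a given link speed.
// Decimal units: 1 GB = 8000 megabits. Overhead and rate limits are ignored.
function downloadSeconds(payloadGB: number, linkMbps: number): number {
  const megabits = payloadGB * 8 * 1000;
  return megabits / linkMbps;
}

// Assumption: ~2 GB model plus ~150 MB onnxruntime wheel.
const payloadGB = 2.0 + 0.15;
// The current trySpawn timeout described above.
const budgetSeconds = 60;

for (const linkMbps of [100, 1000]) {
  const seconds = downloadSeconds(payloadGB, linkMbps);
  const verdict = seconds > budgetSeconds ? "over" : "under";
  console.log(`${linkMbps} Mbps: ~${Math.round(seconds)} s (${verdict} the ${budgetSeconds} s budget)`);
}
```

At 100 Mbps the download alone is close to three times the budget, so the venv build and ONNX session load do not need to be slow for the spawn to be killed.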

Compounding diagnostics gap

mcp/server.ts:109-110 sets stdout: "ignore", stderr: "ignore" on the spawned sidecar. Boot failures emit zero output, even though system_status advises "check the logs." There is no log.
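One way to close this gap, sketched with Node's `child_process` API, is to hand the child a file descriptor for stderr instead of "ignore". This is an assumption-laden illustration, not the plugin's actual spawn code: `spawnWithLog` is a hypothetical helper, and the real server may use a different spawn API (the `stdout: "ignore"` option shape suggests it might).

```typescript
import { spawn, type ChildProcess } from "node:child_process";
import { closeSync, openSync } from "node:fs";

// Hypothetical helper (not the plugin's trySpawn): spawn a command with
// stderr written to a log file that is truncated on every spawn.
function spawnWithLog(command: string, args: string[], logPath: string): ChildProcess {
  // The "w" flag creates the file, or truncates a log left by a previous spawn.
  const logFd = openSync(logPath, "w");
  try {
    return spawn(command, args, {
      // Order is stdin, stdout, stderr; only stderr is captured here.
      stdio: ["ignore", "ignore", logFd],
    });
  } finally {
    // The child receives its own copy of the descriptor at spawn time,
    // so the parent can release its handle immediately.
    closeSync(logFd);
  }
}
```

With something like this in place, the path passed as `logPath` is what system_status could report when the sidecar is in an error state.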

Manual reproduction with visible stderr

embed_server.py does emit useful INFO logs — they're discarded by the spawn. Direct invocation captures them:

uvx --with-requirements <plugin-root>/sidecar/requirements.txt \
    python <plugin-root>/sidecar/embed_server.py \
    --port 0 --port-file /tmp/diag.port

Output:

Downloading/caching model …
Loading ONNX session …
Model ready (dim=1024) …
Wrote port N to …

Any of these would have made the first failure self-diagnosing.

Why the second call succeeded

The first failed attempt populated ~/.cache/uv/ and ~/.cache/huggingface/hub/models--BAAI--bge-m3 (~2.2 GB). Re-invoking skipped both downloads; sidecar booted in ~10 s and reported dim=1024. The underlying boot path is sound — it just needs more time on first run and visible diagnostics when it doesn't get it.
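The HuggingFace cache location mentioned above can be probed without starting the sidecar. The sketch below assumes the standard hub cache layout (`models--<org>--<name>/snapshots/`) and the `HF_HUB_CACHE` and `HF_HOME` overrides; it does not handle `XDG_CACHE_HOME`. Both function names are hypothetical, and the presence of a snapshots directory is only a heuristic, since it does not prove the download completed.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

type Env = Record<string, string | undefined>;

// Resolve the hub cache directory, honoring HF_HUB_CACHE and HF_HOME
// before falling back to the default under the user's home directory.
function hfHubCacheDir(env: Env = process.env): string {
  if (env.HF_HUB_CACHE) return env.HF_HUB_CACHE;
  if (env.HF_HOME) return join(env.HF_HOME, "hub");
  return join(homedir(), ".cache", "huggingface", "hub");
}

// Heuristic: treat the model as cached when its snapshots directory exists.
// An interrupted download can leave this directory behind, so a true result
// means "probably cached", not "verified complete".
function isModelCached(repoId: string, env: Env = process.env): boolean {
  const folder = `models--${repoId.replace(/\//g, "--")}`;
  return existsSync(join(hfHubCacheDir(env), folder, "snapshots"));
}

console.log(`bge-m3 cached: ${isModelCached("BAAI/bge-m3")}`);
```

A check along these lines is what would let install_embeddings(check) tell a first run apart from a retry.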

Recommendations (priority order)

  1. Surface stderr. Redirect spawned sidecar stderr to <pluginRoot>/.sidecar.log (truncated on each spawn) instead of "ignore". Surface the path in system_status when sidecar status is error. Highest leverage — converts every future variant of this failure into a self-diagnosing one.
  2. Raise the cold-start timeout to ~180 s, or expose it via env var (ORCH_SIDECAR_BOOT_TIMEOUT_MS). Eliminates the failure on slow links without affecting fast-link users.
  3. Cached-model hint in install_embeddings(check). If the model is in the HF cache but the sidecar isn't running, mention "model already cached, expect ~10 s boot." Disambiguates first-run from broken-run for users on a second attempt.
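For recommendation 2, a minimal sketch of reading the timeout from the environment might look like the following. `ORCH_SIDECAR_BOOT_TIMEOUT_MS` is the variable name proposed above, not an existing setting; the function name and the fallback rules (unset, empty, non-numeric, or non-positive values all yield the default) are assumptions made for illustration.

```typescript
// The ~180 s cold-start budget proposed above, in milliseconds.
const DEFAULT_BOOT_TIMEOUT_MS = 180_000;

// Hypothetical helper: read the boot timeout from an environment map,
// falling back to the default for unset, empty, non-numeric, or
// non-positive values.
function bootTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = env.ORCH_SIDECAR_BOOT_TIMEOUT_MS;
  if (raw === undefined || raw.trim() === "") return DEFAULT_BOOT_TIMEOUT_MS;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed) || parsed <= 0) return DEFAULT_BOOT_TIMEOUT_MS;
  return Math.floor(parsed);
}
```

Calling `bootTimeoutMs(process.env)` at the spawn site would keep the default generous for slow links while letting users on constrained or very fast connections tune it either way.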

Happy to follow up with a PR if a maintainer signals interest. Thanks for the plugin — install completed on the second install_embeddings call and adoption is underway.
