/run_sse stream stays open when a ParallelAgent branch has no terminal event #5455

@dlwldnjs1009

Description

With google-adk==1.31.1 + LiteLlm in SSE streaming mode, one leaf of a
ParallelAgent can stop emitting events after tool responses have been
returned, without emitting a final event (partial=false, finishReason=STOP)
and without any exception appearing in the pod stdout or being yielded on the
SSE stream as an error event.

When that happens the parent ParallelAgent never completes, the enclosing
SequentialAgent never advances to the next step, and the /run_sse HTTP
stream remains open — no further data: event, no EOF. The downstream SSE
consumer eventually gives up at its own timeout layer.

We have observed this in two invocations in production traffic today; both
stall in the same leaf agent (sub_workout_recommend_agent) after it has
completed a tool-call round and started streaming its text response. Sibling
branches in the same ParallelAgent complete normally.

Environment

  • google-adk: 1.31.1
  • litellm: 1.83.12
  • Python: 3.13.11
  • Container OS: Debian GNU/Linux 13 (trixie)
  • Host kernel: Linux 6.8.0
  • Package location: /usr/local/lib/python3.13/site-packages
  • Runtime: Kubernetes pod, FastAPI app produced by
    AdkWebServer.get_fast_api_app(), endpoint /run_sse
  • vLLM (server-side): 0.19.1.dev6+g6d4a8e6d2 (git dev build from commit 6d4a8e6d2, not a tagged release)
  • Upstream model served by vLLM: google/gemma-4-31B-it (max_model_len=26000), exposed as alias ait-edge-da
  • LiteLlm model string: hosted_vllm/ait-edge-da
  • LiteLlm additional args:
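    from google.adk.models.lite_llm import LiteLlm

    model = LiteLlm(model="hosted_vllm/ait-edge-da", api_base="<internal-url>")
    # Forwarded in the request body to vLLM; turns off the chat template's thinking mode.
    model._additional_args["extra_body"] = {
        "chat_template_kwargs": {"enable_thinking": False}
    }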
  • Agent structure:
    • SequentialAgent
      • LlmAgent (health status analysis, text only)
      • ParallelAgent
        • LlmAgent sub_workout_recommend_agent (3 tools)
        • LlmAgent sub_sleep_recommend_agent (1 tool)
      • LlmAgent (summary)
  • Frequency: intermittent — 4 occurrences in ~3h20m of production traffic today.

Observed behavior — two stalled invocations

Redacted full event dumps are attached. Values (text chunks, state-delta
fields, tool args, tool response content) are replaced with
[REDACTED_TEXT len=N] / [REDACTED]; structural fields (partial,
finishReason, author, timestamp, id, invocationId, functionCall.name,
functionResponse.name) are kept verbatim.
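Because the structural fields are kept, the dumps can be checked mechanically for branches that never reach a final event. The following is a minimal sketch, not part of the attached tooling. It assumes one JSON event per line with top-level `author` and `partial` keys, and that a missing `partial` means a final event. `SAMPLE_LINES`, `summarize_finals`, and `open_authors` are hypothetical names, and the sample events are synthetic, not taken from the traces.

```python
import json
from collections import defaultdict

# Synthetic events for illustration only (not copied from the attached dumps).
SAMPLE_LINES = [
    '{"author": "sub_sleep_recommend_agent", "partial": true}',
    '{"author": "sub_workout_recommend_agent", "partial": false, "finishReason": "STOP"}',
    '{"author": "sub_workout_recommend_agent", "partial": true}',
    '{"author": "sub_sleep_recommend_agent", "partial": false, "finishReason": "STOP"}',
    '{"author": "sub_workout_recommend_agent", "partial": true}',
]


def summarize_finals(lines):
    """Count events and final (non-partial) events per author.

    Assumption: each non-empty line is one JSON object with `author` and
    `partial` at the top level; a missing `partial` is treated as final.
    """
    per_author = defaultdict(
        lambda: {"events": 0, "finals": 0, "last_partial": None}
    )
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        stats = per_author[event.get("author", "<unknown>")]
        is_partial = bool(event.get("partial", False))
        stats["events"] += 1
        if not is_partial:
            stats["finals"] += 1
        stats["last_partial"] = is_partial
    return dict(per_author)


def open_authors(summary):
    """Return authors whose most recent event was still partial=true."""
    return sorted(name for name, s in summary.items() if s["last_partial"])
```

To run it on a real dump, pass an open file handle (for example the handle for redacted_e-d9e02105.jsonl) to `summarize_finals`. The check only flags authors whose last event is partial=true, which is the pattern described for invocation B. A branch that stops without any trailing partial event would need a separate check on the `finals` count.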

Invocation A (e-beb6c3e3) — attached: redacted_e-beb6c3e3.jsonl (765 events)

All four partial=false events:

time (KST)   | author                      | role
16:07:34     | sub_health_analysis_agent   | text final STOP
16:07:35     | sub_sleep_recommend_agent   | tool-call round STOP
16:07:35     | sub_workout_recommend_agent | tool-call round STOP
16:07:56.119 | sub_sleep_recommend_agent   | sleep text final STOP (last event of invocation)

  • sub_workout_recommend_agent has no partial=false for its text response.
  • sub_summary_agent is never invoked (0 events).
  • No events are generated after 16:07:56.119. No EOF on the SSE stream.
  • Downstream SSE consumer gives up at 16:12:56
    (= 16:07:56 + 300 s idle timeout).

Invocation B (e-d9e02105) — attached: redacted_e-d9e02105.jsonl (1359 events)

time (KST)   | author                      | role
16:12:55     | sub_health_analysis_agent   | text final STOP
16:12:55     | sub_workout_recommend_agent | tool-call round STOP (3 calls)
16:12:55     | sub_sleep_recommend_agent   | tool-call round STOP (1 call)
16:13:10     | sub_sleep_recommend_agent   | sleep text final STOP
16:13:20.318 | sub_workout_recommend_agent | last event: partial=true

  • sub_workout_recommend_agent has no partial=false for its text response.
  • sub_summary_agent is never invoked (0 events).
  • No events after 16:13:20.318. Downstream consumer gives up at 16:18:20.

Common pattern

In both invocations:

  1. All four tool responses return successfully.
  2. sub_sleep_recommend_agent completes its text response normally.
  3. sub_workout_recommend_agent does not emit a partial=false for its
    text response. Its last event is partial=true (B), or it never produces a
    terminating text event (A).
  4. Because one branch of recommend_parallel_agent never completes, the
    parent SequentialAgent cannot advance to sub_summary_agent.
  5. /run_sse stays open without further events and without EOF. Our gateway
    eventually emits its own application-level event after a 300 s idle:
    {"chunkCount":N,"error":"upstream-timeout"}
    
    This payload is from our gateway, not from ADK.

No exception was logged in pod stdout and no error event was yielded on the
SSE stream during the idle window.
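The dependency described above can be modelled without ADK. This is a minimal asyncio sketch, not ADK's actual implementation: `branch`, `merge`, and `run_demo` are hypothetical stand-ins, and the merge simply finishes only after every branch has finished. It shows that one branch that neither ends nor raises keeps the merged stream open even though all of its events were already delivered.

```python
import asyncio


async def branch(name, chunks, finishes=True):
    """Stand-in for one leaf agent: yields chunks, then ends or stalls."""
    for chunk in chunks:
        yield (name, chunk)
    if not finishes:
        # Models an upstream stream that neither ends nor raises.
        await asyncio.Event().wait()


async def merge(branches):
    """Yield events from all branches; finish only when every branch has."""
    queue = asyncio.Queue()
    done = object()  # sentinel marking the end of one branch

    async def pump(gen):
        async for item in gen:
            await queue.put(item)
        await queue.put(done)

    tasks = [asyncio.create_task(pump(b)) for b in branches]
    remaining = len(tasks)
    try:
        while remaining:
            item = await queue.get()
            if item is done:
                remaining -= 1
            else:
                yield item
    finally:
        for task in tasks:
            task.cancel()


async def run_demo(stall, deadline=0.2):
    """Consume the merged stream and report whether it ended by the deadline."""
    seen = []

    async def consume():
        branches = [
            branch("sleep", ["a", "b"]),
            branch("workout", ["x"], finishes=not stall),
        ]
        async for event in merge(branches):
            seen.append(event)

    try:
        await asyncio.wait_for(consume(), deadline)
        return seen, "completed"
    except asyncio.TimeoutError:
        return seen, "still open"
```

In this model the consumer receives the same three events in both cases. The only difference is whether the stream ever ends, which matches the observation that the stall is visible only as silence.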

Expected behavior

ADK should not leave a stalled branch indistinguishable from an active one indefinitely.
When the upstream model stream ends, errors, or stalls beyond a configured
timeout, the branch should:

  • emit a final event (partial=false with finishReason) and let the parent
    agent complete, or
  • propagate an exception / error / cancellation event so the parent agent and
    the enclosing workflow can observe and react.

In the observations above there is no final event, no error, and no
cancellation. The branch remains open without an observable terminal signal,
which is what makes the stall undetectable to the parent workflow.
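One way to get such a terminal signal is an idle watchdog around an event stream. This is a minimal sketch under stated assumptions, not an existing ADK feature: `BranchIdleTimeout`, `with_idle_timeout`, `silent_after`, and `demo` are hypothetical names, and where such a wrapper would sit inside ADK (around the model stream or around a whole branch) is left open.

```python
import asyncio


class BranchIdleTimeout(Exception):
    """Hypothetical error: a branch produced no event within the idle window."""


async def with_idle_timeout(events, idle_seconds):
    """Re-yield events; raise BranchIdleTimeout if the stream goes silent.

    The timer restarts after every event, so a long but active stream is
    not interrupted. On timeout the pending read is cancelled.
    """
    iterator = events.__aiter__()
    while True:
        try:
            event = await asyncio.wait_for(iterator.__anext__(), idle_seconds)
        except StopAsyncIteration:
            return  # normal end of stream
        except asyncio.TimeoutError:
            raise BranchIdleTimeout(
                f"no event for {idle_seconds}s"
            ) from None
        yield event


async def silent_after(chunks):
    """Test stream: yields chunks, then stays open without further events."""
    for chunk in chunks:
        yield chunk
    await asyncio.Event().wait()


async def demo(idle_seconds=0.05):
    """Consume a stalling stream and report how it terminated."""
    got = []
    try:
        async for event in with_idle_timeout(silent_after(["x", "y"]), idle_seconds):
            got.append(event)
    except BranchIdleTimeout:
        return got, "timed out"
    return got, "finished"
```

With this shape the parent sees an exception instead of silence and can decide whether to fail the invocation or emit an error event. Choosing the idle window is a trade-off: it has to be longer than the slowest legitimate gap between events, such as a long tool call.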

Related

  • #5342 — LiteLlm streaming bypass for function-call argument delta
    buffering. Different symptom; does not cover stopped text streaming after
    tool responses.
  • #3665 — LiteLlm streaming finish_reason missing (closed). Different
    symptom; the sibling branch in our observation terminates normally with
    finishReason=STOP.

Additional context

  • Locally (different versions — litellm 1.83.7, Python 3.14.4) we have
    not reproduced this. The issue may be version- or environment-sensitive,
    but we have not verified which component is the cause.
  • Attached redacted traces:
    • redacted_e-beb6c3e3.jsonl — stall A
    • redacted_e-d9e02105.jsonl — stall B
    • redacted_e-7b6bbdbb.jsonl — a successful invocation for comparison
    • was_upstream_timeouts.log — downstream gateway timeouts
    • summary_agent_check.txt — sub_summary_agent event counts per invocation
  • LiteLLM DEBUG traces for the same window can be provided on request;
    they contain token-level deltas that likely need additional redaction
    before public posting.

adk-stall-dumps.zip
