Description
With google-adk==1.31.1 + LiteLlm in SSE streaming mode, one leaf of a
ParallelAgent can stop emitting events after tool responses have been
returned, without emitting a final event (partial=false, finishReason=STOP)
and without any exception appearing in the pod stdout or being yielded on the
SSE stream as an error event.
When that happens the parent ParallelAgent never completes, the enclosing
SequentialAgent never advances to the next step, and the /run_sse HTTP
stream remains open — no further data: event, no EOF. The downstream SSE
consumer eventually gives up at its own timeout layer.
We have observed this in two invocations in production traffic today; both
stall in the same leaf agent (sub_workout_recommend_agent) after it has
completed a tool-call round and started streaming its text response. Sibling
branches in the same ParallelAgent complete normally.
Environment
- google-adk: 1.31.1
- litellm: 1.83.12
- Python: 3.13.11
- Container OS: Debian GNU/Linux 13 (trixie)
- Host kernel: Linux 6.8.0
- Package location: /usr/local/lib/python3.13/site-packages
- Runtime: Kubernetes pod, FastAPI app produced by
  AdkWebServer.get_fast_api_app(), endpoint /run_sse
- vLLM (server-side): 0.19.1.dev6+g6d4a8e6d2 (git dev build from commit
  6d4a8e6d2, not a tagged release)
- Upstream model served by vLLM: google/gemma-4-31B-it
  (max_model_len=26000), exposed as alias ait-edge-da
- LiteLlm model string: hosted_vllm/ait-edge-da
- LiteLlm additional args:

      from google.adk.models.lite_llm import LiteLlm

      model = LiteLlm(model="hosted_vllm/ait-edge-da", api_base="<internal-url>")
      model._additional_args["extra_body"] = {
          "chat_template_kwargs": {"enable_thinking": False}
      }

- Agent structure:

      SequentialAgent
        LlmAgent (health status analysis, text only)
        ParallelAgent
          LlmAgent sub_workout_recommend_agent (3 tools)
          LlmAgent sub_sleep_recommend_agent (1 tool)
        LlmAgent (summary)

- Frequency: intermittent — 4 occurrences in ~3h20m of production traffic today.
Observed behavior — two stalled invocations
Redacted full event dumps are attached. Values (text chunks, state-delta
fields, tool args, tool response content) are replaced with
[REDACTED_TEXT len=N] / [REDACTED]; structural fields (partial,
finishReason, author, timestamp, id, invocationId, functionCall.name,
functionResponse.name) are kept verbatim.
Invocation A (e-beb6c3e3) — attached: redacted_e-beb6c3e3.jsonl (765 events)
All four partial=false events:
| time (KST) | author | role |
|---|---|---|
| 16:07:34 | sub_health_analysis_agent | text final STOP |
| 16:07:35 | sub_sleep_recommend_agent | tool-call round STOP |
| 16:07:35 | sub_workout_recommend_agent | tool-call round STOP |
| 16:07:56.119 | sub_sleep_recommend_agent | sleep text final STOP (last event of invocation) |

- sub_workout_recommend_agent has no partial=false for its text response.
- sub_summary_agent is never invoked (0 events).
- No events are generated after 16:07:56.119. No EOF on the SSE stream.
- Downstream SSE consumer gives up at 16:12:56
  (= 16:07:56 + 300 s idle timeout).
Invocation B (e-d9e02105) — attached: redacted_e-d9e02105.jsonl (1359 events)
| time (KST) | author | role |
|---|---|---|
| 16:12:55 | sub_health_analysis_agent | text final STOP |
| 16:12:55 | sub_workout_recommend_agent | tool-call round STOP (3 calls) |
| 16:12:55 | sub_sleep_recommend_agent | tool-call round STOP (1 call) |
| 16:13:10 | sub_sleep_recommend_agent | sleep text final STOP |
| 16:13:20.318 | sub_workout_recommend_agent | last event: partial=true |

- sub_workout_recommend_agent has no partial=false for its text response.
- sub_summary_agent is never invoked (0 events).
- No events after 16:13:20.318. Downstream consumer gives up at 16:18:20.
Common pattern
In both invocations:
- All four tool responses return successfully.
- sub_sleep_recommend_agent completes its text response normally.
- sub_workout_recommend_agent does not emit a partial=false for its
  text response. It either stops mid-chunk (B) or never emits a terminating
  text event (A).
- Because one branch of recommend_parallel_agent never completes, the
  parent SequentialAgent cannot advance to sub_summary_agent.
- /run_sse stays open without further events and without EOF. Our gateway
  eventually emits its own application-level event after a 300 s idle:

      {"chunkCount":N,"error":"upstream-timeout"}

  This payload is from our gateway, not from ADK.

No exception was logged in pod stdout and no error event was yielded on the
SSE stream during the idle window.
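
One way to spot the stalled branch in the attached dumps is to check which author's most recent event is still partial. The sketch below is a minimal illustration, assuming each JSONL line is a single event object with top-level author and partial keys (the redaction note says these fields are kept verbatim, but the exact nesting is an assumption here). The helper names load_events and unfinished_authors are hypothetical and not part of ADK, and the sample events are synthetic.

```python
import json


def load_events(path):
    """Read one JSON event per non-empty line of a JSONL dump."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def unfinished_authors(events):
    """Return authors whose most recent event still has partial=true.

    Assumes `author` and `partial` are top-level keys; a missing or null
    `partial` is treated as a non-partial (final) event.
    """
    last_partial = {}
    for event in events:
        author = event.get("author")
        if author is not None:
            # Later events overwrite earlier ones, so only the last one counts.
            last_partial[author] = bool(event.get("partial"))
    return sorted(name for name, partial in last_partial.items() if partial)


# Synthetic events shaped like the tail of invocation B (not real data).
sample = [
    {"author": "sub_sleep_recommend_agent", "partial": True},
    {"author": "sub_sleep_recommend_agent", "partial": False, "finishReason": "STOP"},
    {"author": "sub_workout_recommend_agent", "partial": True},
]
print(unfinished_authors(sample))
```

To apply it to a real dump, call unfinished_authors(load_events(path)) on one of the attached files. If the field-layout assumption holds, the stalled branch should be the only author reported.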
Expected behavior
ADK should not leave a branch indistinguishable from an active run indefinitely.
When the upstream model stream ends, errors, or stalls beyond a configured
timeout, the branch should:
- emit a final event (partial=false with finishReason) and let the parent
  agent complete, or
- propagate an exception / error / cancellation event so the parent agent and
  the enclosing workflow can observe and react.
In the observations above there is no final event, no error, and no
cancellation — the branch remains open without an observable terminal signal, which is what makes the stall
undetectable to the parent workflow.
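
The "stalls beyond a configured timeout" case can be illustrated with a generic idle-timeout wrapper around an async stream. This is a minimal sketch using only asyncio, not ADK's or LiteLLM's actual implementation. The names with_idle_timeout, StreamIdleTimeout, and demo_stalling_stream are hypothetical and introduced only for illustration, and where such a guard would belong inside ADK is left open.

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


class StreamIdleTimeout(Exception):
    """Raised when the wrapped stream yields nothing within the idle limit."""


async def with_idle_timeout(stream: AsyncIterator[T], idle_seconds: float) -> AsyncIterator[T]:
    """Re-yield items from `stream`, failing if the gap between items is too long."""
    iterator = stream.__aiter__()
    while True:
        try:
            # The timeout applies per item, so a long but active stream is fine.
            item = await asyncio.wait_for(iterator.__anext__(), timeout=idle_seconds)
        except StopAsyncIteration:
            return  # upstream ended cleanly
        except asyncio.TimeoutError:
            raise StreamIdleTimeout(f"no item for {idle_seconds} s") from None
        yield item


async def demo_stalling_stream():
    """Hypothetical upstream that emits one chunk and then goes silent."""
    yield "chunk-1"
    await asyncio.sleep(3600)
    yield "never reached"


async def demo():
    seen = []
    try:
        async for chunk in with_idle_timeout(demo_stalling_stream(), idle_seconds=0.05):
            seen.append(chunk)
    except StreamIdleTimeout:
        # A caller could turn this into a terminal or error event here.
        seen.append("<idle-timeout>")
    return seen


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a guard of this kind, a silent upstream surfaces as an exception the caller can convert into a terminal or error event, rather than leaving the branch open.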
Related
- #5342: LiteLlm streaming bypass for function-call argument delta
  buffering. Different symptom; does not cover stopped text streaming after
  tool responses.
- #3665: LiteLlm streaming finish_reason missing (closed). Different
  symptom; the sibling branch in our observation terminates normally with
  finishReason=STOP.
Additional context
- Locally (different versions: litellm 1.83.7, Python 3.14.4) we have
  not reproduced this. The issue may be version- or environment-sensitive,
  but we have not verified which component is the cause.
- Attached redacted traces:
  - redacted_e-beb6c3e3.jsonl: stall A
  - redacted_e-d9e02105.jsonl: stall B
  - redacted_e-7b6bbdbb.jsonl: a successful invocation for comparison
  - was_upstream_timeouts.log: downstream gateway timeouts
  - summary_agent_check.txt: sub_summary_agent event counts per invocation
- LiteLLM DEBUG traces for the same window can be provided on request;
they contain token-level deltas that likely need additional redaction
before public posting.
adk-stall-dumps.zip