/run_sse stream stays open when a ParallelAgent branch has no terminal event #5455

## Description

With `google-adk==1.31.1` + `LiteLlm` in SSE streaming mode, one leaf of a
`ParallelAgent` can stop emitting events after its tool responses have been
returned. It emits no final event (`partial=false`, `finishReason=STOP`), no
exception appears in the pod stdout, and no error event is yielded on the SSE
stream.

When that happens the parent `ParallelAgent` never completes, the enclosing
`SequentialAgent` never advances to the next step, and the `/run_sse` HTTP
stream remains open — no further `data:` event, no EOF. The downstream SSE
consumer eventually gives up at its own timeout layer.
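
To illustrate why a single silent branch is enough to hold the whole stream open, here is a minimal `asyncio` sketch. It is a simplified model written for this report, not ADK's actual implementation: `branch`, `merge`, and `demo` are hypothetical names, and the event dicts only mimic the `author` / `partial` fields.

```python
import asyncio
from typing import AsyncIterator


async def branch(name: str, n_events: int, stall: bool) -> AsyncIterator[dict]:
    """Hypothetical branch: emits partial events, then finishes or goes silent."""
    for _ in range(n_events):
        yield {"author": name, "partial": True}
        await asyncio.sleep(0)
    if stall:
        # Never set: models an upstream stream that stops without closing.
        await asyncio.Event().wait()
    yield {"author": name, "partial": False}


async def merge(*branches: AsyncIterator[dict]) -> AsyncIterator[dict]:
    """Yield events from all branches; end only when every branch has ended."""
    queue: asyncio.Queue = asyncio.Queue()
    done = object()  # sentinel: one per finished branch

    async def pump(source: AsyncIterator[dict]) -> None:
        try:
            async for event in source:
                await queue.put(event)
        finally:
            queue.put_nowait(done)

    tasks = [asyncio.create_task(pump(b)) for b in branches]
    remaining = len(tasks)
    try:
        while remaining:
            item = await queue.get()
            if item is done:
                remaining -= 1
            else:
                yield item
    finally:
        for task in tasks:
            task.cancel()


async def demo(stall: bool, timeout_s: float = 0.2) -> tuple[bool, list[dict]]:
    """Consume the merged stream; report whether it ended before the timeout."""
    events: list[dict] = []

    async def consume() -> None:
        async for event in merge(
            branch("sleep", 2, stall=False),
            branch("workout", 2, stall=stall),
        ):
            events.append(event)

    try:
        await asyncio.wait_for(consume(), timeout_s)
        return True, events
    except asyncio.TimeoutError:
        return False, events
```

With `stall=True` the merged stream never ends on its own; only the consumer's timeout terminates it, which mirrors what our downstream consumer sees.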

Two stalled invocations from today's production traffic are documented below;
both stall in the same leaf agent (`sub_workout_recommend_agent`) after it has
completed a tool-call round and started streaming its text response. Sibling
branches in the same `ParallelAgent` complete normally.

## Environment

- google-adk: 1.31.1
- litellm: 1.83.12
- Python: 3.13.11
- Container OS: Debian GNU/Linux 13 (trixie)
- Host kernel: Linux 6.8.0
- Package location: /usr/local/lib/python3.13/site-packages
- Runtime: Kubernetes pod, FastAPI app produced by
  `AdkWebServer.get_fast_api_app()`, endpoint `/run_sse`
- vLLM (server-side): `0.19.1.dev6+g6d4a8e6d2` (git dev build from commit `6d4a8e6d2`, not a tagged release)
- Upstream model served by vLLM: `google/gemma-4-31B-it` (`max_model_len=26000`), exposed as alias `ait-edge-da`
- LiteLlm model string: `hosted_vllm/ait-edge-da`
- LiteLlm additional args:
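  ```python
  from google.adk.models.lite_llm import LiteLlm

  model = LiteLlm(model="hosted_vllm/ait-edge-da", api_base="<internal-url>")
  # Disable thinking via the chat template kwargs sent with each request.
  model._additional_args["extra_body"] = {
      "chat_template_kwargs": {"enable_thinking": False}
  }
  ```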
- Agent structure:
  - `SequentialAgent`
    - `LlmAgent` `sub_health_analysis_agent`  (health status analysis, text only)
    - `ParallelAgent` `recommend_parallel_agent`
      - `LlmAgent` `sub_workout_recommend_agent`  (3 tools)
      - `LlmAgent` `sub_sleep_recommend_agent`   (1 tool)
    - `LlmAgent` `sub_summary_agent`  (summary)
- Frequency: intermittent — 4 occurrences in ~3h20m of production traffic today.

## Observed behavior — two stalled invocations

> Redacted full event dumps are attached. Values (text chunks, state-delta
> fields, tool args, tool response content) are replaced with
> `[REDACTED_TEXT len=N]` / `[REDACTED]`; structural fields (`partial`,
> `finishReason`, `author`, `timestamp`, `id`, `invocationId`, `functionCall.name`,
> `functionResponse.name`) are kept verbatim.

### Invocation A (`e-beb6c3e3`) — attached: `redacted_e-beb6c3e3.jsonl` (765 events)

All four `partial=false` events:

| time (KST) | author | event |
|---|---|---|
| 16:07:34 | `sub_health_analysis_agent` | text final STOP |
| 16:07:35 | `sub_sleep_recommend_agent` | tool-call round STOP |
| 16:07:35 | `sub_workout_recommend_agent` | tool-call round STOP |
| **16:07:56.119** | **`sub_sleep_recommend_agent`** | **sleep text final STOP (last event of invocation)** |

- `sub_workout_recommend_agent` has **no `partial=false` for its text response**.
- `sub_summary_agent` is never invoked (0 events).
- No events are generated after 16:07:56.119. No EOF on the SSE stream.
- Downstream SSE consumer gives up at 16:12:56
  (= 16:07:56 + 300 s idle timeout).

### Invocation B (`e-d9e02105`) — attached: `redacted_e-d9e02105.jsonl` (1359 events)

| time (KST) | author | event |
|---|---|---|
| 16:12:55 | `sub_health_analysis_agent` | text final STOP |
| 16:12:55 | `sub_workout_recommend_agent` | tool-call round STOP (3 calls) |
| 16:12:55 | `sub_sleep_recommend_agent` | tool-call round STOP (1 call) |
| 16:13:10 | `sub_sleep_recommend_agent` | sleep text final STOP |
| **16:13:20.318** | **`sub_workout_recommend_agent`** | **last event: `partial=true`** |

- `sub_workout_recommend_agent` has **no `partial=false` for its text response**.
- `sub_summary_agent` is never invoked (0 events).
- No events after 16:13:20.318. Downstream consumer gives up at 16:18:20.

### Common pattern

In both invocations:

1. All four tool responses return successfully.
2. `sub_sleep_recommend_agent` completes its text response normally.
3. `sub_workout_recommend_agent` does **not** emit a `partial=false` for its
   text response. In B its last event is a `partial=true` text chunk; in A it
   never produces a terminating text event.
4. Because one branch of `recommend_parallel_agent` never completes, the
   parent `SequentialAgent` cannot advance to `sub_summary_agent`.
5. `/run_sse` stays open without further events and without EOF. Our gateway
   eventually emits its own application-level event after 300 s of idle time:
   ```
   {"chunkCount":N,"error":"upstream-timeout"}
   ```
   This payload is **from our gateway, not from ADK**.

No exception was logged in pod stdout and no error event was yielded on the
SSE stream during the idle window.
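
For anyone triaging the attached dumps, the following is a minimal sketch of how the stalled branch can be located. It assumes each JSONL record carries `author` and `partial` as top-level keys, which may not match the real layout of the dumps; `authors_without_terminal_event` is a hypothetical helper name, not part of ADK.

```python
import json
from typing import Iterable


def authors_without_terminal_event(lines: Iterable[str]) -> list[str]:
    """Return authors whose last recorded event is still `partial=true`."""
    last_partial: dict[str, bool] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the dump
        event = json.loads(line)
        # Assumption: `author` and `partial` are top-level keys of each record.
        author = event.get("author", "<unknown>")
        last_partial[author] = bool(event.get("partial", False))
    return sorted(a for a, partial in last_partial.items() if partial)


# Example usage against an attached dump:
#     with open("redacted_e-d9e02105.jsonl", encoding="utf-8") as f:
#         print(authors_without_terminal_event(f))
```

This check targets the pattern seen in invocation B, where the branch's last event is a `partial=true` chunk. Whether it also flags invocation A depends on what that branch's last recorded event is.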

## Expected behavior

ADK should not leave a stalled branch indistinguishable from an active run indefinitely.
When the upstream model stream ends, errors, or stalls beyond a configured
timeout, the branch should:

- emit a final event (`partial=false` with `finishReason`) and let the parent
  agent complete, or
- propagate an exception / error / cancellation event so the parent agent and
  the enclosing workflow can observe and react.

In the observations above there is no final event, no error, and no
cancellation. The branch remains open without an observable terminal signal,
which is what makes the stall undetectable to the parent workflow.

## Related

- `#5342` — LiteLlm streaming bypass for function-call argument delta
  buffering. Different symptom; does not cover stopped text streaming after
  tool responses.
- `#3665` — LiteLlm streaming `finish_reason` missing (closed). Different
  symptom; the sibling branch in our observation terminates normally with
  `finishReason=STOP`.

## Additional context

- Locally (different versions — `litellm 1.83.7`, `Python 3.14.4`) we have
  not reproduced this. The issue may be version- or environment-sensitive,
  but we have not verified which component is the cause.
- Attached redacted traces:
  - `redacted_e-beb6c3e3.jsonl` — stall A
  - `redacted_e-d9e02105.jsonl` — stall B
  - `redacted_e-7b6bbdbb.jsonl` — a successful invocation for comparison
  - `was_upstream_timeouts.log` — downstream gateway timeouts
  - `summary_agent_check.txt` — `sub_summary_agent` event counts per invocation
- LiteLLM DEBUG traces for the same window can be provided on request;
  they contain token-level deltas that likely need additional redaction
  before public posting.

[adk-stall-dumps.zip](https://github.com/user-attachments/files/27003338/adk-stall-dumps.zip)
