feat(tts): add gradium tts support #2193
TTS guarder + standalone test records (commit 931cc24)
Addressed both review comments on
TTS guarder —
| Case | Result |
|---|---|
| test_append_input | ✅ passed |
| test_append_input_stress | ✅ passed |
| test_append_input_without_text_input_end | ✅ passed |
| test_append_interrupt | ✅ passed |
| test_basic_audio_setting | ✅ passed |
| test_corner_input | ✅ passed |
| test_dump | ✅ passed |
| test_dump_each_request_id | ✅ passed |
| test_empty_text_request | ✅ passed |
| test_flush | ✅ passed |
| test_interleaved_requests | ✅ passed |
| test_invalid_required_params | ✅ passed |
| test_invalid_text_handling | ✅ passed |
| test_metrics | ✅ passed |
| test_miss_required_params | ✅ passed |
| test_subtitle_alignment | ⏭️ skipped (disabled by default; optional — only for providers that emit word/segment timing, which Gradium does not) |
Standalone suite — task test-extension EXTENSION=...gradium_tts_python
28 passed (includes a new test_output_format_derived_from_sample_rate_only that locks in: a user-supplied output_format is ignored, sample_rate drives the derived pcm_<rate>, and it is not leaked as a vendor passthrough param).
Both runs were executed on the branch code; black format-check is clean.
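The rule that test_output_format_derived_from_sample_rate_only locks in can be sketched roughly as below. This is only an illustration, not the extension's config.py: the function name normalize_params, the flat dict layout and the default rate are hypothetical, and treating sample_rate itself as consumed (rather than passed through) is an assumption.

```python
from typing import Any

DEFAULT_SAMPLE_RATE = 24000  # assumed default for the sketch, not taken from the PR


def normalize_params(params: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (output_format, passthrough) for a hypothetical params dict."""
    passthrough = dict(params)  # copy so the caller's dict is not mutated
    # A user-supplied output_format is dropped rather than honoured,
    # so it can never leak to the vendor as a passthrough param.
    passthrough.pop("output_format", None)
    # Assumption: sample_rate is consumed here and only drives the derived format.
    sample_rate = int(passthrough.pop("sample_rate", DEFAULT_SAMPLE_RATE))
    output_format = f"pcm_{sample_rate}"
    return output_format, passthrough


fmt, rest = normalize_params(
    {"sample_rate": 16000, "output_format": "wav", "speed": 1.1}
)
```

With that input the derived format follows sample_rate, and the conflicting output_format does not survive into the passthrough dict.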
Code Review: Gradium TTS extension
Thanks for the thorough work here: the per-request socket model, the single finalize path, and the test suite mirroring the sibling extensions are all solid. A few things worth a look before merge.
Potential bugs
1. For a strictly FIFO base class these line up, but the design is fragile:
What's the queue actually buying you over just using the
2.
3. Clean-close detection relies on
4.
Code quality
Performance
Security
Tests
Good coverage of batching, interleaving, metrics-once, and the state machine. Gaps:
Overall this is close. The
Review: feat(tts): add gradium tts support
Solid, well-structured extension that closely mirrors the sibling TTS extensions. The per-request socket model, request-scoped
Potential bugs
Code quality
Security
Tests
Good coverage across basic audio, flush, state machine, params, metrics, and error handling, plus the gradium-specific interleaving and
The follow-ups you listed (punctuation/list robustness regressions, explicit 48 kHz coverage) are reasonable to defer. Nice work overall.
Update — commits
I don't think Gradium should batch all text segments with the same A few reasons:
Review: feat(tts): add gradium tts support
Solid extension. It follows the sibling-TTS pattern, the per-request streaming session model is clean, and test coverage is good. A few issues worth addressing before merge.
Bugs / correctness
Robustness
Style / minor
Test coverage
Coverage is strong: basic audio, dump, flush, segmented sessions, metrics (incl. once-per-request TTFB), params passthrough, sample-rate derivation, robustness (empty/whitespace/punctuation/long/special), and auth-error classification. Two gaps:
Overall this is close. Issues #1 and #2 are the ones I'd want resolved before merge since they affect real graph configs, not just internals.
Update — stream segments immediately (commit
Gradium previously accumulated same-request_id segments into a local buffer and only flushed on a sentence/size threshold or text_input_end. Combined with the base class re-queuing interleaved messages, this could drop sections of TTS audio and required a non-standard on_data override.
Forward each segment to the vendor as it arrives over one persistent per-request websocket session (start_session/send_text/end_input plus a concurrent audio reader), matching Gradium's LLM-to-TTS streaming guidance and the other websocket TTS extensions. The base class now owns queuing/ordering, so the on_data override and ingress_messages are removed.
Also in this change:
- parse json_config (a JSON string per the manifest schema) into an object before sending it on the wire, so manifest, config and README agree.
- cancel the background reader task on a send/setup failure so it cannot outlive the request.
- read the websockets-14 close code from exc.rcvd/exc.sent (it is no longer a top-level exc.code) so a clean 1000 close is treated as end-of-stream.
Tests rewritten to the streaming API, with added coverage for immediate forwarding, single-session-per-request, json_config parsing, reader-task cancellation, and punctuation-only input.
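For readers following the close-code change, here is a rough sketch of reading the code from exc.rcvd/exc.sent. It is duck-typed on purpose so it runs without the websockets package installed; the helper names close_code and is_clean_close are hypothetical and not taken from the extension, and preferring the received frame over the sent one is an assumption.

```python
from types import SimpleNamespace
from typing import Optional

NORMAL_CLOSURE = 1000  # RFC 6455 "normal closure" status code


def close_code(exc: object) -> Optional[int]:
    """Best-effort close code from a ConnectionClosed-like exception."""
    # Assumption: prefer the close frame received from the peer,
    # then fall back to the one we sent ourselves.
    for attr in ("rcvd", "sent"):
        frame = getattr(exc, attr, None)
        code = getattr(frame, "code", None)
        if code is not None:
            return code
    return None  # connection dropped without any close frame


def is_clean_close(exc: object) -> bool:
    """True when the connection ended with a normal 1000 close."""
    return close_code(exc) == NORMAL_CLOSURE


# Stand-ins for exceptions; real ones would come from the websocket library.
clean = SimpleNamespace(rcvd=SimpleNamespace(code=1000), sent=None)
policy = SimpleNamespace(rcvd=None, sent=SimpleNamespace(code=1008))
```

A reader loop could then treat is_clean_close(exc) as end-of-stream and report anything else as an error.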
89da4f7 to cb1a38c
Review:
Capture the gates that were easy to miss this round: the commitlint body-max-line-length rule (there is no local commit-msg hook, so CI is the first thing to catch a long body line), and a single pre-push checklist covering black, lint, standalone tests, guarder tests and commit messages. Also note the stale .ten/ gotcha: a leftover standalone install makes task check report spurious reformatting and breaks the next install.
Review: feat(tts): add gradium tts support
Solid, well-structured extension. It uses the correct websockets 14 API ( A few things worth considering before merge.
Potential issues
1.

    raw_msg = await asyncio.wait_for(self.ws.recv(), timeout=WS_RECV_TIMEOUT)

But the config exposes
2. Passthrough params can clobber reserved setup fields.

    if self.config.voice_id:
        payload["voice_id"] = self.config.voice_id
    ...
    for key, value in self.config.params.items():
        payload[key] = value

A passthrough param named
3. Possible orphaned reader task on request switch without

    self._reader_task = asyncio.create_task(self._read_audio(t.request_id))

In the normal flow the prior request ends via
Minor
Test coverage
Strong. Covers basic audio, flush, dump byte-for-byte comparison, segmented/single-TTFB metrics, params passthrough +
Overall this is in good shape: items 1–3 are the ones I'd want addressed or consciously waived before merge.
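One possible way to address item 2 (passthrough params clobbering reserved setup fields) is to skip reserved keys when merging. The sketch below is hypothetical: build_setup_payload and the contents of RESERVED_SETUP_KEYS are illustrative names and values, not the extension's code or the vendor's actual field list.

```python
from typing import Any

# Assumed set of reserved setup keys; the real list depends on the vendor protocol.
RESERVED_SETUP_KEYS = frozenset({"type", "voice_id", "output_format", "json_config"})


def build_setup_payload(
    base: dict[str, Any], params: dict[str, Any]
) -> dict[str, Any]:
    """Merge passthrough params into the setup payload without overriding reserved fields."""
    payload = dict(base)  # copy so the caller's dict is not mutated
    for key, value in params.items():
        if key in RESERVED_SETUP_KEYS:
            # Reserved fields are only ever set from typed config, never passthrough.
            continue
        payload[key] = value
    return payload


payload = build_setup_payload(
    {"type": "setup", "voice_id": "v1"},
    {"voice_id": "other", "speed": 1.2},
)
```

Here the passthrough voice_id is ignored while speed is forwarded. Logging a warning for each skipped key would be a reasonable variation.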
Summary
Adds the Gradium TTS extension (gradium_tts_python), a websocket streaming TTS built on AsyncTTS2BaseExtension, plus a voice_assistant_gradium example graph.
- gradium_tts.py: websocket setup → ready → text/end_of_stream → audio/error, with auth-error classification (401/403/1008 → fatal).
- extension.py: streams audio through a single finalize path (_finalize_request → finish_request); request-level TTFB metric (emitted once per request even across multiple vendor segments); audio framing prefers the server-ready sample rate, falling back to config.
- config.py: pydantic model with param normalization, output-format/sample-rate reconciliation, and api-key redaction in logs.
- voice_assistant_gradium predefined graph (deepgram ASR + openai LLM + gradium TTS), manifest dependency, regenerated lockfile.
Tests
Standalone suite mirrors the sibling TTS extensions (basic audio, flush, state machine, robustness, params, metrics, error handling), plus gradium-specific coverage for its text-batching and per-request socket model.
- task test-extension EXTENSION=agents/ten_packages/extension/gradium_tts_python → 27 passed
- task tts-guarder-test EXTENSION=gradium_tts_python CONFIG_DIR=tests/configs → 15 passed, 1 skipped
Follow-ups (non-blocking)
pcm_48000.
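For reference, the auth-error classification mentioned in the summary (401/403/1008 → fatal) could look roughly like this. It is a minimal sketch: the function name is_fatal_auth_error and its two-argument shape are hypothetical, and how the real client obtains the HTTP status or close code is not shown.

```python
from typing import Optional

FATAL_HTTP_STATUSES = frozenset({401, 403})  # rejected during the websocket handshake
POLICY_VIOLATION_CLOSE = 1008  # RFC 6455 "policy violation" close code


def is_fatal_auth_error(
    http_status: Optional[int] = None, close_code: Optional[int] = None
) -> bool:
    """Return True when a failure should be reported as fatal (not retryable)."""
    return http_status in FATAL_HTTP_STATUSES or close_code == POLICY_VIOLATION_CLOSE
```

Anything that does not match (for example a 500 during the handshake or a normal 1000 close) would fall through to the non-fatal path.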