feat: Add extension support for EZ-AI TW TTS #2100

Open: samx81 wants to merge 3 commits into TEN-framework:main from samx81:feat/twtts_ext

samx81 commented Mar 11, 2026

Summary

Add extension for EZ-AI TW TTS.
Implemented to satisfy specific vendor requirements for the Taiwanese user base.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring

Testing

  • Tests added/updated
  • All tests pass
  • Manual testing completed

Documentation

  • Documentation updated
  • Examples provided if needed

Breaking Changes

No Breaking Changes

@github-actions

Code Review — EZ-AI TW TTS extension (ezai_tw_tts_python)

Thanks for the contribution. The extension follows the AsyncTTS2BaseExtension pattern well and the request lifecycle (start → audio data → end → usage metrics) is handled cleanly. I found a few issues worth addressing before merge.

Blocking bugs

  1. to_str() signature mismatch — crashes on init. config.py defines def to_str(self) -> str: (no args), but extension.py calls self.config.to_str(sensitive_handling=True). This raises TypeError on every init, which is swallowed and reported as a fatal init error. Match the sibling extensions (e.g. gradium_tts_python) and accept sensitive_handling: bool = True.

  2. speed, denoise, zh_model are read from the wrong place. update_params() only promotes url, voice, sample_rate, channels, sample_width out of self.params, but request_tts builds the payload with getattr(self.config, 'speed', 0.8), getattr(self.config, 'denoise', False), getattr(self.config, 'zh_model', ''). As a result, speed is not a declared field and is never promoted, so it always falls back to 0.8 and params.speed in property.json is ignored. denoise and zh_model are declared fields, but update_params() never copies the params values into them, so config-file values are ignored and the field defaults win. Net effect: three documented tuning params don't take effect. Pick one source of truth and promote all three in update_params().

  3. denoise default contradicts itself. config.py default is True, property.json says false, payload fallback is False. Reconcile.
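
One possible shape for the config plumbing described in the three items above, as a minimal sketch rather than the extension's actual code. `EzaiTTSConfig`, the `_PROMOTED` table, `_as_bool`, and the `sample_rate` default are all assumptions made for illustration; the real class lives in config.py and may use a different base class. The sketch accepts `sensitive_handling` in `to_str()`, promotes every tunable (including speed, denoise, zh_model) in `update_params()`, and keeps a single `denoise` default that matches the `false` in property.json.

```python
import logging
from dataclasses import dataclass, field
from typing import Any

logger = logging.getLogger(__name__)


def _as_bool(value: Any) -> bool:
    # bool("false") is True, so only accept real booleans here.
    if isinstance(value, bool):
        return value
    raise TypeError(f"expected bool, got {type(value).__name__}")


# Hypothetical stand-in for the extension's config class.
@dataclass
class EzaiTTSConfig:
    url: str = ""
    voice: str = ""
    sample_rate: int = 16000  # placeholder default, not taken from the PR
    speed: float = 0.8        # same value as the payload fallback in the review
    denoise: bool = False     # single default, matching property.json
    zh_model: str = ""
    params: dict[str, Any] = field(default_factory=dict)

    # Keys promoted from params onto typed fields, with their converters.
    # Unannotated, so the dataclass treats it as a plain class attribute.
    _PROMOTED = {
        "url": str,
        "voice": str,
        "sample_rate": int,
        "speed": float,
        "denoise": _as_bool,
        "zh_model": str,
    }

    def update_params(self) -> None:
        for key, convert in self._PROMOTED.items():
            if key not in self.params:
                continue
            raw = self.params.pop(key)
            try:
                setattr(self, key, convert(raw))
            except (TypeError, ValueError):
                # Keep the field default, but leave a trace for debugging.
                logger.warning("ignoring invalid params.%s=%r", key, raw)

    def to_str(self, sensitive_handling: bool = True) -> str:
        # Nothing sensitive is stored yet; the flag is accepted so the
        # call self.config.to_str(sensitive_handling=True) does not raise.
        return f"{self}"
```

With this layout, request_tts could read `self.config.speed` directly instead of going through `getattr` with a second set of fallbacks, which removes the third place a default can disagree.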

Correctness / robustness

  1. cancel_tts can double-send audio_end. cancel_tts sends audio_end(INTERRUPTED) + usage metrics; the in-flight request_tts loop then breaks on the cancel event and its finally block sends a second audio_end(REQUEST_END) + metrics for the same request_id. Guard the finally path with current_request_finished / the cancel event.

  2. Per-segment blocking HTTP hurts latency. Each sentence is a separate synchronous requests.post in asyncio.to_thread, fully buffered before yielding, with a hardcoded 60s timeout. For multi-sentence input this serializes round-trips and inflates TTFB. Consider an async HTTP client (other extensions use aiohttp/websockets) and make the timeout configurable.

  3. Module-level heavy init. opencc.OpenCC(...), ZhNormalizer(), SentenceSegmenter(...) run at import time. Move into on_init (consider asyncio.to_thread) so load failures surface through the normal init error path and don't block the event loop.

  4. Fire-and-forget dump writes. asyncio.create_task(...write(frames)) is never awaited or tracked — tasks can be GC'd, write out of order, or race the flush(). Await or collect-then-await before flush.

  5. Silent drops on bad params. The except (TypeError, ValueError) branches del the key without setting the field or logging. Add a warning so bad property files are debuggable.
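
The double audio_end from item 1 above could be guarded along these lines. This is a self-contained sketch, not the extension's code: `AudioEndGuard`, the `sent` list (standing in for the real send calls), and the simplified `request_tts` / `cancel_tts` coroutines are hypothetical. The idea is to mark the request as finished before the first await, so whichever path reaches audio_end second becomes a no-op.

```python
import asyncio


class AudioEndGuard:
    """Hypothetical helper: lets only the first audio_end per request through."""

    def __init__(self) -> None:
        self._finished: set[str] = set()
        self.sent: list[tuple[str, str]] = []  # stand-in for real send calls

    async def send_audio_end(self, request_id: str, reason: str) -> bool:
        if request_id in self._finished:
            return False  # someone already closed this request
        # Mark before awaiting so a concurrent caller cannot slip in.
        self._finished.add(request_id)
        self.sent.append((request_id, reason))
        await asyncio.sleep(0)  # placeholder for the real async send
        return True


async def request_tts(guard: AudioEndGuard, request_id: str,
                      cancel_event: asyncio.Event) -> None:
    try:
        for _ in range(3):  # placeholder for per-segment synthesis
            if cancel_event.is_set():
                break
            await asyncio.sleep(0)
    finally:
        # Skipped automatically if cancel_tts already ended the request.
        await guard.send_audio_end(request_id, "REQUEST_END")


async def cancel_tts(guard: AudioEndGuard, request_id: str,
                     cancel_event: asyncio.Event) -> None:
    cancel_event.set()
    await guard.send_audio_end(request_id, "INTERRUPTED")


async def demo() -> list[tuple[str, str]]:
    guard = AudioEndGuard()
    cancel_event = asyncio.Event()
    task = asyncio.create_task(request_tts(guard, "req-1", cancel_event))
    await asyncio.sleep(0)  # let the request start
    await cancel_tts(guard, "req-1", cancel_event)
    await task
    return guard.sent
```

Running `asyncio.run(demo())` records a single audio_end for `req-1`. In a long-lived extension the finished set would need pruning once a request is fully closed; the same gate can also cover the usage-metrics send so it is not duplicated either.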

Style / packaging

  1. Unpinned + git dependency. requirements.txt pins nothing and pulls git+https://github.com/samx81/text_utils.git from an unpinned branch. Per repo conventions, pin versions (and pin the git dep to a tag/SHA). The personal-repo git dependency is a supply-chain / reproducibility concern for a vendored extension — worth deciding whether it should be vendored or moved under the org.

  2. README copy-pasted from vibevoice. Title/body still say vibevoice_tts_websocket_python; docs describe a websocket endpoint but the code does HTTP POST over https; the example JSON has a trailing comma after zh_model (invalid JSON); params.speed is documented but not wired up (bug 2). Please regenerate for this extension.

  3. Leftover artifacts. Dump prefix is vibevoice_dump_{id}.pcm (should be ezai-specific); commented-out zh_model line; manifest includes **.tent and BUILD.gn that don't exist here; manifest.json and property.json are missing trailing newlines.

  4. No tests. No tests/ dir; checklist has tests unchecked. Sibling HTTP TTS extensions (e.g. cartesia_tts) ship test_basic / test_params / test_error_msg / test_robustness. Bugs 1 and 2 would have been caught by a basic init/params test — please add at least smoke + params round-trip coverage.
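
As a small illustration of the README point in item 2 and the testing point in item 4, a check like the following would flag the trailing comma. It is a sketch only: `json_error` is a hypothetical helper, and the two sample strings are made-up stand-ins for the README example, not its actual contents.

```python
import json
from typing import Optional


def json_error(text: str) -> Optional[str]:
    """Return None if text parses as JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


# Illustrative snippets: the first has a trailing comma after zh_model.
BROKEN = '{"params": {"voice": "example", "zh_model": "",}}'
FIXED = '{"params": {"voice": "example", "zh_model": ""}}'
```

`json_error(BROKEN)` returns an error description while `json_error(FIXED)` returns None, so a test that runs every JSON example in the README through this helper would catch the problem before review.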

Security

  • Default url points at matcha.ezai-k8s.freeddns.org — confirm this is the intended public default, not an internal host.
  • No auth/token handling. If the endpoint needs a key later, wire it through params with sensitive masking in to_str like the other extensions.
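
If a key is added later, masking could look roughly like this. The key names in `SENSITIVE_KEYS`, the `mask` helper, and `params_to_str` are assumptions for illustration and are not taken from the sibling extensions; the point is only that `sensitive_handling=True` should produce a loggable string without mutating the real params.

```python
import copy

SENSITIVE_KEYS = ("api_key", "token")  # hypothetical key names


def mask(value: str, visible: int = 2) -> str:
    """Keep a few leading and trailing characters, star out the rest."""
    if len(value) <= visible * 2:
        return "*" * len(value)
    hidden = len(value) - visible * 2
    return value[:visible] + "*" * hidden + value[-visible:]


def params_to_str(params: dict, sensitive_handling: bool = True) -> str:
    if not sensitive_handling:
        return str(params)
    safe = copy.deepcopy(params)  # never modify the live config
    for key in SENSITIVE_KEYS:
        if isinstance(safe.get(key), str):
            safe[key] = mask(safe[key])
    return str(safe)
```

Short values are fully starred so that a four-character key does not leak entirely through the visible prefix and suffix.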

Overall the structure is solid and close, but bugs 1 and 2 mean the extension won't initialize and won't honor most of its configuration as written. A focused pass on config plumbing plus a small test would get this in good shape.
