
feat: add Mistral Voxtral TTS extension#2194

Merged
wangyoucao577 merged 6 commits into TEN-framework:main from TiagoAgora:feat/mistral-voxtral-tts
Jul 1, 2026

@TiagoAgora

Contributor

Summary

Adds a new TTS extension, mistral_tts_python, integrating Mistral's Voxtral text-to-speech via the OpenAI-compatible /v1/audio/speech endpoint.

  • Built on AsyncTTS2HttpExtension (HTTP TTS mode), following the extension development guide.
  • Requests a self-describing WAV stream and converts it to PCM16 mono on the fly (handles int16/int24/int32 and IEEE-float payloads); audio output at 24 kHz.
  • Forwards vendor params through unchanged (model, voice_id, ref_audio, …); API-key auth via the Authorization header.
  • Handles cancellation/flush, content-moderation/auth errors, and TTFB metrics.
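The WAV-to-PCM16 conversion described in the second bullet can be sketched roughly as follows. This is a minimal, hypothetical illustration (the names `wav_samples_to_pcm16_mono` and `_decode_sample` are not from the extension). It assumes the `fmt ` chunk has already been parsed into a format tag (1 = integer PCM, 3 = IEEE float), a bit depth, and a channel count, and it downmixes to mono by averaging channels. The second commit below later replaces this path with a headerless float32 stream.

```python
import math
import struct

WAVE_FORMAT_PCM = 1         # integer samples
WAVE_FORMAT_IEEE_FLOAT = 3  # float32 / float64 samples


def _decode_sample(raw: bytes, fmt_tag: int, bits: int) -> float:
    """Decode one little-endian sample into a float, nominally in [-1.0, 1.0]."""
    if fmt_tag == WAVE_FORMAT_IEEE_FLOAT:
        if bits not in (32, 64):
            raise ValueError(f"unsupported float bit depth: {bits}")
        value = struct.unpack("<f" if bits == 32 else "<d", raw)[0]
        return value if math.isfinite(value) else 0.0  # NaN/inf -> silence
    if bits == 16:
        return struct.unpack("<h", raw)[0] / 32768.0
    if bits == 24:
        # No struct code for 3-byte ints; from_bytes handles sign extension.
        return int.from_bytes(raw, "little", signed=True) / 8388608.0
    if bits == 32:
        return struct.unpack("<i", raw)[0] / 2147483648.0
    raise ValueError(f"unsupported PCM bit depth: {bits}")


def wav_samples_to_pcm16_mono(data: bytes, fmt_tag: int, bits: int, channels: int) -> bytes:
    """Convert whole frames of WAV sample data to PCM16 LE mono.

    A trailing partial frame is ignored; a streaming caller would carry it
    over and prepend it to the next chunk.
    """
    if fmt_tag not in (WAVE_FORMAT_PCM, WAVE_FORMAT_IEEE_FLOAT):
        raise ValueError(f"unsupported WAV format tag: {fmt_tag}")
    width = bits // 8
    if width < 1 or channels < 1:
        raise ValueError("bit depth and channel count must be positive")
    frame = width * channels
    out = bytearray()
    for start in range(0, len(data) - frame + 1, frame):
        total = 0.0
        for ch in range(channels):
            off = start + ch * width
            total += _decode_sample(data[off:off + width], fmt_tag, bits)
        mono = max(-1.0, min(1.0, total / channels))  # average, then clamp
        out += struct.pack("<h", int(round(mono * 32767)))
    return bytes(out)
```

Averaging channels keeps the downmix within range without extra headroom logic. Clamping after the average matters only for float input, where samples can legitimately exceed ±1.0.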

Testing

Standalone unit tests pass in the dev container (tman install --standalone + tests/bin/start): 13/13 passing — covering config defaults/validation, URL/base_url resolution, headers, audio dump, flush/cancel, invalid-key handling, reconnect robustness, metrics, and the request state machine.
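The URL/base_url resolution and header tests exercise logic roughly like the following. This is a hedged sketch only: the function names, the default host, the normalization rules (accepting a bare host, a host ending in `/v1`, or the full endpoint), and the `Bearer` scheme are assumptions for illustration and are not taken from the extension's client.

```python
from typing import Dict, Optional

DEFAULT_BASE_URL = "https://api.mistral.ai"  # assumption: default API host
SPEECH_PATH = "/v1/audio/speech"


def resolve_speech_url(base_url: Optional[str] = None) -> str:
    """Accept a bare host, a host ending in /v1, or the full endpoint URL."""
    base = (base_url or DEFAULT_BASE_URL).rstrip("/")
    if base.endswith(SPEECH_PATH):
        return base
    if base.endswith("/v1"):
        base = base[: -len("/v1")]
    return base + SPEECH_PATH


def build_headers(api_key: str) -> Dict[str, str]:
    """Bearer-token auth (assumed scheme) for a JSON request body."""
    if not api_key:
        raise ValueError("api_key is required")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Normalizing the three base_url shapes to a single endpoint lets a self-hosted or proxy URL be configured without caring whether the user included `/v1`.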

Live API validation against api.mistral.ai has not been run yet.

@TiagoAgora TiagoAgora force-pushed the feat/mistral-voxtral-tts branch 2 times, most recently from b313a08 to 1064738 on June 29, 2026 13:07
@YiminW

YiminW commented Jun 29, 2026

Contributor

Hi Tiago, please run tts-guarder against your new TTS extension and post a snapshot here showing all cases passing.

@TiagoAgora

TiagoAgora commented Jun 29, 2026

Contributor Author

TTS guarder results — mistral_tts_python

Ran task tts-guarder-test EXTENSION=mistral_tts_python CONFIG_DIR=tests/configs against the live Mistral / Voxtral API.

14 passed · 1 skipped · 1 failed

[Screenshot: mistral_tts_guarder_results]

⏭️ test_subtitle_alignment — skipped; Voxtral exposes no word-level timing.

test_interleaved_requests — traced to a pre-existing bug in ten_ai_base (tts2.py), not this extension. When finish_request() releases buffered interleaved requests it pre-sets _processing_request_id, so _process_input_queue() skips the QUEUED → PROCESSING transition. The later QUEUED → FINALIZING is then rejected as an invalid transition, so tts_audio_end is never emitted and the request queue stalls until timeout (log: Invalid state transition … queued -> finalizing). With a one-line fix — transition to PROCESSING when the dequeued item is still QUEUED — the test passes in ~30s. This affects every HTTP-TTS extension, so I've kept it out of this PR. Happy to open a separate ten_ai_base issue/PR.
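The failure mode can be reproduced with a toy model of the request state machine. This is a hypothetical sketch: `RequestState`, `RequestTracker`, `on_dequeue`, `release_buffered_request`, and the transition table are illustrative names, not ten_ai_base's `tts2.py` code. Only the QUEUED → PROCESSING → FINALIZING ordering, the pre-set processing id, and the shape of the one-line fix come from the comment above.

```python
from enum import Enum
from typing import Dict, Optional, Set


class RequestState(Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    FINALIZING = "finalizing"
    COMPLETED = "completed"


# Illustrative transition table (hypothetical; the real one may allow more edges).
ALLOWED: Dict[RequestState, Set[RequestState]] = {
    RequestState.QUEUED: {RequestState.PROCESSING},
    RequestState.PROCESSING: {RequestState.FINALIZING},
    RequestState.FINALIZING: {RequestState.COMPLETED},
    RequestState.COMPLETED: set(),
}


class RequestTracker:
    def __init__(self) -> None:
        self.states: Dict[str, RequestState] = {}
        self.processing_request_id: Optional[str] = None

    def transition(self, request_id: str, new_state: RequestState) -> None:
        current = self.states[request_id]
        if new_state not in ALLOWED[current]:
            raise RuntimeError(
                f"Invalid state transition {current.value} -> {new_state.value}"
            )
        self.states[request_id] = new_state

    def on_dequeue(self, request_id: str, apply_fix: bool) -> None:
        is_new = request_id != self.processing_request_id
        still_queued = self.states[request_id] is RequestState.QUEUED
        # Without the fix, only a "new" id triggers QUEUED -> PROCESSING; a
        # pre-set processing_request_id makes the buffered request look old.
        if is_new or (apply_fix and still_queued):
            self.processing_request_id = request_id
            self.transition(request_id, RequestState.PROCESSING)


def release_buffered_request(tracker: RequestTracker, request_id: str, apply_fix: bool) -> None:
    tracker.states[request_id] = RequestState.QUEUED
    # Mirrors the described bug: the id is pre-set before the queue processes it.
    tracker.processing_request_id = request_id
    tracker.on_dequeue(request_id, apply_fix)
    # Text end for this request arrives: move toward emitting tts_audio_end.
    tracker.transition(request_id, RequestState.FINALIZING)
```

With `apply_fix=False` the final step raises on `queued -> finalizing`, which is the stall described above. With `apply_fix=True` the request passes through PROCESSING first.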

Three issues the guarder surfaced in this extension are fixed in this branch:

  • httpx[http2] dependency — the client uses http2=True; without the h2 package it failed to initialize on every request.
  • dump_path added to the basic_audio_setting{1,2} test configs (the guarder reads config["dump_path"]).
  • mistral_tts_python added to the fixed-sample-rate allowlist in tests/bin/start (Voxtral emits a fixed 24 kHz, like openai_tts2 / humeai).

Tiago Peres de Sousa and others added 6 commits June 30, 2026 15:57
Add the mistral_tts_python TTS extension (AsyncTTS2HttpExtension) for
Mistral's OpenAI-compatible /v1/audio/speech endpoint. Streams the WAV
response and converts it to PCM16 mono at 24 kHz. Includes config, client,
addon, manifest/property, README, and unit tests (13 passing).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Switch the requested response_format from wav to pcm. Voxtral's pcm is a
headerless float32 LE stream at 24 kHz mono, so the client now rescales each
float32 sample to int16 (Float32ToPcm16) instead of parsing a WAV container.
This lowers time-to-first-audio (no header to buffer) and drops the WAV
chunk-parsing path. Non-finite samples map to silence so a corrupt stream
can't crash conversion.

Tests updated to stream headerless float32 pcm mocks.
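The float32-to-int16 rescaling in that commit can be sketched as follows. This is a minimal sketch that assumes little-endian float32 input nominally in [-1.0, 1.0]. The class name `Float32StreamToPcm16`, the ×32767 scale, and the clamping are assumptions and not necessarily what the extension's `Float32ToPcm16` does. The sketch also carries a partial sample over to the next chunk, because HTTP chunk boundaries need not align with 4-byte samples.

```python
import math
import struct


class Float32StreamToPcm16:
    """Hypothetical sketch: headerless float32 LE stream -> PCM16 LE."""

    def __init__(self) -> None:
        self._pending = b""  # bytes of an incomplete float32 sample

    def feed(self, chunk: bytes) -> bytes:
        data = self._pending + chunk
        usable = len(data) - len(data) % 4  # whole 4-byte samples only
        self._pending = data[usable:]
        out = bytearray()
        for (sample,) in struct.iter_unpack("<f", data[:usable]):
            if not math.isfinite(sample):
                sample = 0.0  # NaN/inf from a corrupt stream -> silence
            sample = max(-1.0, min(1.0, sample))  # clamp before scaling
            out += struct.pack("<h", int(round(sample * 32767)))
        return bytes(out)
```

Usage: call `feed()` with each response chunk and forward the returned bytes as PCM16 frames. Any leftover bytes at the end of the stream would indicate a truncated sample.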
- requirements: depend on httpx[http2] so the h2 package is installed
  (client uses http2=True; without h2 it failed to initialize on every request)
- tests/configs: add dump/dump_path to basic_audio_setting1/2 (guarder reads
  config["dump_path"])
- tts_guarder: add mistral_tts_python to the fixed-sample-rate allowlist in
  tests/bin/start (Voxtral emits a fixed 24kHz, like openai/humeai)
…rement

- .env.example: add MISTRAL_API_KEY and MISTRAL_TTS_VOICE with a note that the
  available voices vary by account (and how to list them)
- README: the live cloud API requires a voice (or ref_audio); fix the example
  voice (casual_male does not exist) and document the ${env:MISTRAL_TTS_VOICE}
  used by the test configs
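The voice requirement could be enforced during config validation, so a missing voice fails fast instead of at the first API call. This is a hedged sketch: `validate_tts_params` is a hypothetical helper, and the `voice_id` / `ref_audio` key names are taken from the forwarded params listed in the summary as an assumption about the config's shape, not from the extension's validator. Treating an empty string as missing would also cover an unset `MISTRAL_TTS_VOICE`, provided an unset variable resolves to an empty value.

```python
from typing import Any, Mapping


def validate_tts_params(params: Mapping[str, Any]) -> None:
    """Hypothetical check: a live request needs a voice or a reference clip."""
    # Empty strings count as missing (e.g. an env placeholder that resolved to "").
    if not (params.get("voice_id") or params.get("ref_audio")):
        raise ValueError(
            "set params.voice_id (e.g. via ${env:MISTRAL_TTS_VOICE}) or params.ref_audio"
        )
```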
@TiagoAgora TiagoAgora force-pushed the feat/mistral-voxtral-tts branch from 7114e00 to 0538a6b on June 30, 2026 14:57
@wangyoucao577 wangyoucao577 merged commit dc488ff into TEN-framework:main Jul 1, 2026
27 of 41 checks passed
