feat: add Mistral Voxtral TTS extension #2194
Merged
wangyoucao577 merged 6 commits on Jul 1, 2026
YiminW
reviewed
Jun 25, 2026
b313a08 to
1064738
Hi Tiago, please run tts-guarder for your new TTS extension and attach a snapshot here showing that all cases passed.
Add the mistral_tts_python TTS extension (AsyncTTS2HttpExtension) for Mistral's OpenAI-compatible /v1/audio/speech endpoint. Streams the WAV response and converts it to PCM16 mono at 24 kHz. Includes config, client, addon, manifest/property, README, and unit tests (13 passing). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Switch the requested response_format from wav to pcm. Voxtral's pcm is a headerless float32 LE stream at 24 kHz mono, so the client now rescales each float32 sample to int16 (Float32ToPcm16) instead of parsing a WAV container. This lowers time-to-first-audio (no header to buffer) and drops the WAV chunk-parsing path. Non-finite samples map to silence so a corrupt stream can't crash conversion. Tests updated to stream headerless float32 pcm mocks.
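The float32-to-int16 rescaling described in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: the class name `Float32ToPcm16Sketch`, the 32767 scale factor, the clamping to [-1.0, 1.0], and the carry-over of partial samples between chunks are all assumptions made for the example.

```python
import math
import struct


class Float32ToPcm16Sketch:
    """Hypothetical streaming converter: float32 LE bytes in, int16 LE bytes out."""

    def __init__(self) -> None:
        # Bytes of a sample that was split across two network chunks.
        self._carry = b""

    def feed(self, chunk: bytes) -> bytes:
        data = self._carry + chunk
        usable = len(data) - (len(data) % 4)  # keep whole 4-byte samples only
        self._carry = data[usable:]
        count = usable // 4
        if count == 0:
            return b""
        samples = struct.unpack(f"<{count}f", data[:usable])
        out = []
        for s in samples:
            if not math.isfinite(s):
                s = 0.0  # NaN or infinity becomes silence
            s = max(-1.0, min(1.0, s))  # clamp to the nominal float range
            out.append(int(round(s * 32767.0)))  # assumed scale factor
        return struct.pack(f"<{count}h", *out)
```

Because the stream is headerless, each chunk can be converted as soon as it arrives; the only state kept between calls is the trailing bytes of an incomplete sample.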
- requirements: depend on httpx[http2] so the h2 package is installed (the client uses http2=True; without h2 it failed to initialize on every request)
- tests/configs: add dump/dump_path to basic_audio_setting1/2 (the guarder reads config["dump_path"])
- tts_guarder: add mistral_tts_python to the fixed-sample-rate allowlist in tests/bin/start (Voxtral emits a fixed 24 kHz, like openai/humeai)
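The h2 failure mentioned in the requirements item can be detected before a client is created. The sketch below is an alternative for illustration only; the PR fixes the problem by declaring the dependency, not by falling back. The helper names `http2_available` and `client_kwargs` are hypothetical.

```python
import importlib.util


def http2_available() -> bool:
    """Return True if the optional 'h2' package can be imported."""
    return importlib.util.find_spec("h2") is not None


def client_kwargs() -> dict:
    # Hypothetical helper: request HTTP/2 only when h2 is installed,
    # so client construction does not fail when the extra is missing.
    return {"http2": http2_available()}
```

The resulting dictionary could be passed as keyword arguments when constructing an httpx client, for example `httpx.AsyncClient(**client_kwargs())`.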
…rement
- .env.example: add MISTRAL_API_KEY and MISTRAL_TTS_VOICE with a note that the
available voices vary by account (and how to list them)
- README: the live cloud API requires a voice (or ref_audio); fix the example
voice (casual_male does not exist) and document the ${env:MISTRAL_TTS_VOICE}
used by the test configs
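To illustrate what a `${env:MISTRAL_TTS_VOICE}` reference in a test config stands for, here is a small sketch of placeholder substitution. The framework resolves these references itself; the function `resolve_env_refs`, its regular expression, and its error behavior are assumptions for the example and do not describe the framework's actual implementation.

```python
import os
import re

# Matches placeholders of the form ${env:NAME}.
_ENV_REF = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_refs(value: str, env=None) -> str:
    """Replace each ${env:NAME} in value with the variable's content."""
    env = os.environ if env is None else env

    def _sub(match):
        name = match.group(1)
        if name not in env:
            # Fail loudly: a missing voice would otherwise surface as an API error.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _ENV_REF.sub(_sub, value)
```

Since available voices vary by account, keeping the voice in the environment lets the same test config run unchanged for different users.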
7114e00 to
0538a6b
wangyoucao577
approved these changes
Jul 1, 2026
Summary
Adds a new TTS extension, mistral_tts_python, integrating Mistral's Voxtral text-to-speech via the OpenAI-compatible /v1/audio/speech endpoint.
- Based on AsyncTTS2HttpExtension (HTTP TTS mode), following the extension development guide.
- Configurable parameters (model, voice_id, ref_audio, …); API-key auth via the Authorization header.

Testing
Standalone unit tests pass in the dev container (tman install --standalone + tests/bin/start): 13/13 passing, covering config defaults/validation, URL/base_url resolution, headers, audio dump, flush/cancel, invalid-key handling, reconnect robustness, metrics, and the request state machine.
Live API validation against api.mistral.ai has not been run yet.
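A request to the endpoint named in the summary might be assembled as in the sketch below. It only builds the request and sends nothing. The payload field names (`model`, `input`, `voice`, `response_format`) follow the OpenAI-style convention and are assumptions here, as are the `Bearer` scheme and the function name `build_speech_request`; the extension's real client may differ.

```python
import json
import urllib.request


def build_speech_request(api_key, text, model, voice,
                         base_url="https://api.mistral.ai"):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {
        "model": model,            # caller-supplied model name
        "input": text,             # text to synthesize (assumed field name)
        "voice": voice,            # account-specific voice (assumed field name)
        "response_format": "pcm",  # headerless float32 LE stream per the commit notes
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Separating request construction from sending keeps URL resolution and header handling easy to unit test without network access.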