Deepgram Expands Flux to Ten Languages, Adds Mid-Call Switching

Deepgram announced general availability of Flux Multilingual, a single conversational speech recognition model that supports 10 languages and can auto-detect and switch languages mid-call, according to SiliconANGLE and Deepgram's release notes. Per Deepgram's developer documentation, the model is available as flux-general-multi and accepts an optional language_hint parameter. SiliconANGLE and the release notes report model-based turn detection with end-of-turn decisions delivered in under 400 milliseconds. The release is backward-compatible with existing Flux API integrations and is offered via Deepgram's cloud API with an EU endpoint; the developer documentation says SDK and self-hosted support were not yet available for flux-general-multi at publication.
What happened
Deepgram announced the general availability of Flux Multilingual, a single conversational speech recognition model that supports 10 languages, according to SiliconANGLE and the company's release notes. Per Deepgram's developer documentation, the model is exposed as flux-general-multi and can accept an optional language_hint parameter to bias recognition toward one or more languages. SiliconANGLE and the release notes report that the model auto-detects language, supports native code-switching mid-conversation, and uses model-based turn detection to produce end-of-turn decisions in under 400 milliseconds. SiliconANGLE also published a verbatim quote from Deepgram co-founder and CEO Scott Stephenson: "Voice AI agents will soon become the default for how global enterprises interact with customers."
Technical details
Per Deepgram's documentation, flux-general-multi provides the same turn-aware and interruption-aware conversational intelligence as flux-general-en while supporting the following languages:
- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Hindi (hi)
- Russian (ru)
- Portuguese (pt)
- Japanese (ja)
- Italian (it)
- Dutch (nl)
The docs state that language_hint can be supplied once or multiple times to bias the model toward specific languages, or omitted for full auto-detection. The developer guide notes that flux-general-multi uses the same production endpoint and API key as Flux, lists an EU WebSocket endpoint, and says SDK and self-hosted support were not yet available for flux-general-multi at publication.
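For illustration, a minimal Python sketch of what a client connection might look like follows. The endpoint path, audio-encoding query parameters, authorization header format, and message handling are assumptions to verify against Deepgram's Flux documentation; only the model name flux-general-multi and the repeatable language_hint parameter come from the docs cited above.

```python
# Hedged sketch: streaming raw audio to Flux Multilingual over WebSocket.
# The endpoint path, query-parameter wire format, and audio settings below are
# assumptions for illustration -- confirm them against Deepgram's Flux docs.
import asyncio
import websockets  # pip install websockets

DEEPGRAM_API_KEY = "YOUR_API_KEY"  # placeholder

# model=flux-general-multi selects the multilingual Flux model (per the docs).
# language_hint may be repeated to bias recognition toward several languages,
# or omitted entirely for full auto-detection.
URL = (
    "wss://api.deepgram.com/v2/listen"        # assumed production endpoint path
    "?model=flux-general-multi"
    "&language_hint=en&language_hint=es"      # bias toward English and Spanish
    "&encoding=linear16&sample_rate=16000"    # assumed audio settings
)


async def stream_file(path: str) -> None:
    headers = {"Authorization": f"Token {DEEPGRAM_API_KEY}"}
    # websockets >= 14 uses additional_headers; older releases call it extra_headers.
    async with websockets.connect(URL, additional_headers=headers) as ws:

        async def sender() -> None:
            with open(path, "rb") as f:
                while chunk := f.read(3200):   # ~100 ms of 16 kHz 16-bit mono audio
                    await ws.send(chunk)
                    await asyncio.sleep(0.1)   # pace the stream at roughly real time
            # A real client would signal end-of-stream per the Flux protocol here.

        async def receiver() -> None:
            async for message in ws:
                print(message)                 # transcripts / turn events arrive as JSON text

        await asyncio.gather(sender(), receiver())


if __name__ == "__main__":
    asyncio.run(stream_file("call_audio.raw"))
```

Dropping the language_hint parameters from the query string would exercise the full auto-detection mode the docs describe, while a single hint would bias recognition toward one language.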
Industry context
Editorial analysis: Companies building voice agents typically combine separate ASR models, language-identification layers, and routing logic to handle multilingual calls, which adds latency and brittle handoffs. A single conversational model that natively supports code-switching, low-latency turn detection, and interruption handling reduces integration complexity for voice-agent pipelines and cuts the number of inference hops between audio input and agent response. That pattern often improves end-to-end latency and simplifies error handling, but it also concentrates operational dependence on one model and its deployment choices.
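To make that integration difference concrete, here is a deliberately toy sketch of the two shapes; every function in it is a stub invented for this comparison, not a Deepgram or vendor API.

```python
# Illustrative only: a routed multilingual pipeline versus a single
# conversational model, with stubs standing in for real services.

def detect_language(audio: bytes) -> str:
    return "es"  # stub: a separate language-ID model call in the routed shape

def transcribe_monolingual(lang: str, audio: bytes) -> str:
    return f"[{lang} transcript]"  # stub: one ASR model per language

def transcribe_multilingual(audio: bytes, language_hint: list[str] | None = None) -> str:
    return "[code-switched transcript]"  # stub: one model covers all languages

def routed_pipeline(audio: bytes) -> str:
    # Two inference hops plus routing logic between them.
    lang = detect_language(audio)
    return transcribe_monolingual(lang, audio)

def single_model_pipeline(audio: bytes) -> str:
    # One hop; detection and mid-call switching happen inside the model.
    return transcribe_multilingual(audio, language_hint=["en", "es"])

if __name__ == "__main__":
    sample = b"\x00" * 3200
    print(routed_pipeline(sample))
    print(single_model_pipeline(sample))
```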
Significance for practitioners
Editorial analysis: For teams building real-time voice agents, flux-general-multi is relevant because it combines conversational features (turn detection, interruption handling) with multilingual support in one API. This can reduce the engineering effort required to orchestrate multiple models and language-routing logic, and it changes the tradeoffs between using specialist monolingual models versus a single multilingual conversational model with language hints.
What to watch
Editorial analysis: Observers should track independent benchmarks of semantic accuracy and latency across the ten supported languages, comparisons between language_hint accuracy and dedicated monolingual models, and timing for full SDK and on-prem/self-hosted support. Also watch enterprise adoption signals and case studies that demonstrate sustained production reliability when callers code-switch frequently.
Scoring Rationale
A broadly available conversational speech model that supports 10 languages and mid-call switching materially lowers integration complexity for real-time voice agents. The change is notable for practitioners building contact-center and voice-agent systems but does not on its own redefine model class frontiers.