
Speechmatics
Multilingual speech-to-text that doesn't choke on accents or two people talking
What is Speechmatics?
Speechmatics is a speech-to-text and text-to-speech API focused on sub-second, speaker-aware transcription for voice agents, ambient medical scribes, and live captioning. It covers 55+ languages, deploys on-device, on-prem, or in the cloud, and lists ISO 27001, SOC 2 Type II, GDPR, and HIPAA among its compliance credentials. Buyers are developers and ops teams who need accurate transcription of accented, multi-speaker audio without sending data to a US-only cloud.
Category: Voice AI, covering voice agents and conversational AI platforms for calls, qualification, scheduling, support, and audio workflows.
See the full Voice AI guide to compare more tools, buyer criteria, and related workflows.
Use cases to evaluate
Real-time STT for voice AI agents needing sub-second latency
Ambient medical scribe and clinical documentation
Live captioning for broadcast and media
Court transcription and legal proceedings
Fit to evaluate
Voice agent builders needing speaker diarization
Healthcare scribe and meeting note platforms
Broadcasters running live multilingual captioning
European teams that need on-prem or GDPR-clean STT
Business fit
Right for you if you are building a voice agent, scribe, or captioning product and need diarization plus 55+ language coverage with on-prem as an option. Skip if you only need English transcription on small volumes, where cheaper Whisper-based options work fine. The free tier (480 minutes STT, 1M TTS characters monthly) lets you benchmark before committing. Volume discounts kick in automatically above 500 hours per month.
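To benchmark diarization on the free tier, a batch transcription job needs a config that requests speaker labels. The sketch below assumes the JSON shape of Speechmatics' batch job config as we understand it (`type`, `transcription_config.language`, and `transcription_config.diarization` set to `"speaker"`); confirm the field names against the current API reference before relying on them. The helper `build_diarized_job_config` is hypothetical and only builds the JSON; it does not submit a job.

```python
import json


def build_diarized_job_config(language: str = "en") -> str:
    """Return a JSON job config that asks for speaker diarization.

    ASSUMPTION: field names follow Speechmatics' batch job config as we
    understand it; verify against the official API reference.
    """
    config = {
        "type": "transcription",
        "transcription_config": {
            "language": language,      # language code, e.g. "en" or "de"
            "diarization": "speaker",  # label each word with a speaker ID
        },
    }
    return json.dumps(config)


print(build_diarized_job_config("de"))
```

Running the same audio through with and without diarization is a quick way to check whether speaker labels hold up on your own multi-speaker recordings.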
How to evaluate Speechmatics
Use this category when missed calls, slow qualification, or phone support volume affects revenue.
Confirm the exact workflow
Map Speechmatics to one concrete workflow first, such as real-time STT for voice AI agents that need sub-second latency. Avoid buying before the owner, trigger, output, and success metric are clear.
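For a speech-to-text workflow, one common way to make the success metric concrete is word error rate (WER): the word-level edit distance between a vendor's transcript and a human reference, divided by the reference length. A minimal sketch, assuming simple normalization (lowercase, whitespace split, no punctuation stripping, so "mat." and "mat" count as different words); the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words; row 0 is all insertions.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # curr[0]: i deletions
        for j, h in enumerate(hyp, start=1):
            substitution = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,                 # delete a reference word
                curr[j - 1] + 1,             # insert a hypothesis word
                prev[j - 1] + substitution,  # match or substitute
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion ("the") over 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Scoring every shortlisted vendor against the same references from your own accented or multi-speaker audio keeps the comparison about your workflow rather than published benchmarks.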
Check category fit
Test voice quality, latency, interruption handling, and escalation behavior.
Compare practical alternatives
Shortlist Speechmatics against Retell AI, Vapi, and Bland AI so the decision is based on fit, effort, and workflow ownership rather than brand recognition alone.
Validate cost and rollout effort
Free: 480 STT minutes + 1M TTS characters per month, 2 concurrent real-time sessions. Pro: from $0.24/hr STT, 50 concurrent sessions, capped at 6,000 hrs/month. Enterprise: custom with on-prem option. Automatic 20% discount above 500 hrs/month per transcription type. Startup credits of up to $50,000+ are available. Also confirm implementation time, support needs, and whether a medium setup and maintenance effort fits your team's capacity.
Compare Speechmatics with alternatives
Use this quick comparison before booking demos or moving data into a new system.
| Criterion | Speechmatics |
|---|---|
| Primary workflow | Real-time STT for voice AI agents needing sub-second latency; ambient medical scribe and clinical documentation |
| Best-fit team | Voice agent builders needing speaker diarization; healthcare scribe and meeting note platforms |
| Implementation effort | Medium setup and maintenance profile |
| Pricing check | Free plan + paid plans |
| Closest alternatives | Retell AI, Vapi, Bland AI, Synthflow |
Speechmatics pricing
| Item | Details |
|---|---|
| Model | Free plan + paid plans |
| Snapshot | Free: 480 STT minutes + 1M TTS characters per month, 2 concurrent real-time sessions. Pro: from $0.24/hr STT, 50 concurrent sessions, capped at 6,000 hrs/month. Enterprise: custom with on-prem option. Automatic 20% discount above 500 hrs/month per transcription type. Startup credits up to $50,000+ available. |
| Checked | |
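To turn the pricing snapshot into a monthly budget, a small estimator helps. This is a sketch under stated assumptions: it uses the "from $0.24/hr" Pro STT rate (actual rates may be higher depending on model or options), assumes the 20% discount applies only to hours above 500 (the snapshot does not say whether it is marginal or applies to all hours), and treats anything above the 6,000-hour Pro cap as Enterprise territory. The function name `estimate_pro_stt_cost` is hypothetical.

```python
BASE_RATE_PER_HOUR = 0.24       # "from $0.24/hr" Pro STT rate (lowest listed)
DISCOUNT_THRESHOLD_HOURS = 500  # discount kicks in above this monthly volume
DISCOUNT = 0.20                 # automatic 20% volume discount
PRO_CAP_HOURS = 6_000           # Pro plan monthly cap


def estimate_pro_stt_cost(hours: float, rate: float = BASE_RATE_PER_HOUR) -> float:
    """Estimate monthly Pro STT cost in USD.

    ASSUMPTION: the 20% discount applies only to hours above 500.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    if hours > PRO_CAP_HOURS:
        raise ValueError("above the Pro cap; Enterprise pricing is custom")
    full_price_hours = min(hours, DISCOUNT_THRESHOLD_HOURS)
    discounted_hours = max(hours - DISCOUNT_THRESHOLD_HOURS, 0)
    cost = full_price_hours * rate + discounted_hours * rate * (1 - DISCOUNT)
    return round(cost, 2)


print(estimate_pro_stt_cost(1000))  # 216.0
```

Under this marginal-discount assumption, 1,000 hours comes to $216.00 instead of $240.00 at the base rate; if the discount instead applies to every hour once volume passes 500, the same month would cost $192.00, so confirm which rule applies before budgeting.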
Common questions about Speechmatics
What is Speechmatics?
Speechmatics is a speech-to-text and text-to-speech API focused on sub-second, speaker-aware transcription for voice agents, ambient medical scribes, and live captioning. It covers 55+ languages, deploys on-device, on-prem, or in the cloud, and lists ISO 27001, SOC 2 Type II, GDPR, and HIPAA among its compliance credentials. Buyers are developers and ops teams who need accurate transcription of accented, multi-speaker audio without sending data to a US-only cloud.
What is Speechmatics used for?
Common use cases include real-time STT for voice AI agents that need sub-second latency, ambient medical scribes and clinical documentation, live captioning for broadcast and media, and court transcription for legal proceedings.
How much does Speechmatics cost?
Free: 480 STT minutes + 1M TTS characters per month, 2 concurrent real-time sessions. Pro: from $0.24/hr STT, 50 concurrent sessions, capped at 6,000 hrs/month. Enterprise: custom with on-prem option. Automatic 20% discount above 500 hrs/month per transcription type. Startup credits up to $50,000+ available.
Who is Speechmatics best for?
Speechmatics fits voice agent builders that need speaker diarization, healthcare scribe and meeting note platforms, broadcasters running live multilingual captioning, and European teams that need on-prem or GDPR-clean STT. It is right for you if you are building a voice agent, scribe, or captioning product and need diarization plus 55+ language coverage with on-prem as an option. Skip it if you only need English transcription on small volumes, where cheaper Whisper-based options work fine. The free tier (480 minutes STT, 1M TTS characters monthly) lets you benchmark before committing, and volume discounts kick in automatically above 500 hours per month.
What are alternatives to Speechmatics?
Common alternatives to Speechmatics include Retell AI, Vapi, Bland AI, Synthflow, ElevenLabs Conversational AI, and PolyAI.