What does Speechmatics do?

Speechmatics provides speech technology and Voice AI for enterprises, offering accurate Speech-to-Text, Text-to-Speech, and Voice Agent solutions. Our models understand every voice and accent across 53+ languages, helping businesses unlock the full potential of voice data.

How accurate is Speechmatics Speech-to-Text?

Speechmatics delivers best-in-market accuracy, achieving up to 99% word accuracy and 96% medical keyword recall in industry benchmarks. Our models handle multiple accents, noisy environments, and multiple speakers with ease.

What makes Speechmatics Text-to-Speech different?

Our low-latency Text-to-Speech (TTS) delivers lifelike, human-sounding voices with sub-150ms latency, making it ideal for real-time conversations. Developers can stream natural speech in multiple voices and deploy it in the cloud, hybrid, or on-prem for privacy and control.

Can I build real-time voice agents with Speechmatics?

Our voice AI enables developers to build real-time voice agents that listen, understand, and respond naturally. Plug in fast with a flexible API and native integrations to power your AI voice agents.

Which industries use Speechmatics?

Speechmatics is trusted by organizations in media, healthcare, contact centers, finance, education, and accessibility. Our technology powers transcription, translation, call analytics, and voice AI applications worldwide.

There's speech-to-text. Then there's Speechmatics.

An API with a comprehensive range of features, unmatched accuracy, flexible deployment and AI-powered capabilities.

Everything you need to build brilliant voice features and products.

Configuration

Our models are built to deliver for your needs

Get the very best performance and fast transcription whether you choose real-time or batch mode, deployed however suits you.

File transcription

Process thousands of hours of pre-recorded files, whenever you need them, and fast.

Live transcription

Transcribe media as it happens. Get initial transcriptions in milliseconds, with context-driven accuracy improvements over time.

On-Prem

Meet architecture, security and compliance needs by hosting our API in your own environment. Combine with Cloud, deploy using Docker Containers, or preconfigured Virtual Appliances.

Cloud

Get secure and scalable access to our API through our cloud deployment, with instant access to all our new features, languages and updates.

On-Device

Run Speechmatics directly on your devices for ultra-low latency and maximum data privacy. Ideal for use cases where connectivity is limited and data must stay local.

Transcription Features

Everything you need to hit the highest accuracy possible

Our customization options allow you to finely tune your setup to achieve high accuracy with even the most unique words and phrases.

Feature

Custom Dictionary

Boost accuracy for proper nouns, acronyms or industry-specific terms by providing a list of custom words.
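
As a rough illustration of what supplying a custom word list could look like, here is a minimal Python sketch that builds a job configuration. The field names (`transcription_config`, `additional_vocab`, `sounds_like`) and the `build_config` helper are assumptions made for this example, so check the API reference for the exact schema before relying on them.

```python
import json


def build_config(language, custom_words):
    """Build a hypothetical transcription config carrying a custom word list.

    Each entry in custom_words is either a plain string or a
    (word, [pronunciation hints]) pair.
    """
    vocab = []
    for entry in custom_words:
        if isinstance(entry, str):
            vocab.append({"content": entry})
        else:
            content, sounds_like = entry
            vocab.append({"content": content, "sounds_like": list(sounds_like)})
    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "additional_vocab": vocab,  # assumed field name
        },
    }


config = build_config("en", ["Speechmatics", ("CEO", ["C.E.O."])])
print(json.dumps(config, indent=2))
```

Keeping the word list in a single helper like this makes it easy to maintain one shared vocabulary per product or customer and reuse it across jobs.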

Speaker & Channel Diarization

Track who said what and when with speaker labelling for each word, available for both batch and real-time transcription.
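
To show how per-word speaker labels might be used downstream, the sketch below groups consecutive words from the same speaker into turns. The word structure (dicts with `speaker` and `content` keys) is a simplified, assumed shape for illustration and not the exact transcript format.

```python
from itertools import groupby


def speaker_turns(words):
    """Collapse consecutive words with the same speaker label into turns."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        turns.append((speaker, " ".join(w["content"] for w in group)))
    return turns


# Hypothetical per-word output with speaker labels attached.
words = [
    {"speaker": "S1", "content": "Hello"},
    {"speaker": "S1", "content": "there"},
    {"speaker": "S2", "content": "Hi"},
    {"speaker": "S1", "content": "Thanks"},
]

turns = speaker_turns(words)
for speaker, text in turns:
    print(f"{speaker}: {text}")
```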

Numeral Formatting

Identify and correctly format numbers, dates and currencies automatically to improve transcript readability and enable effective post-processing.

Profanity & Disfluency Detection

Aid comprehensibility and compliance by detecting and optionally removing words that are considered profanities or hesitations.

File Formats

Minimize the effort needed to prepare audio or video files, with support for all major audio and video formats and automatic sample rate detection.

Advanced Features

Easily push a variety of media formats to the API

Send audio and video in a variety of formats and get back a rich set of metadata to support your post-processing needs.

Confidence Scores

Collect confidence scores for every word in the transcript to enable efficient human review and editing.
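
One way per-word confidence scores could drive human review is to flag only the words that fall below a threshold. The sketch below assumes each word is a dict with `content`, `confidence` and `start_time` keys, and the 0.8 threshold is an arbitrary example value; both are illustrative assumptions rather than the actual output schema or a recommended setting.

```python
def words_needing_review(words, threshold=0.8):
    """Return the words whose confidence is below the review threshold."""
    return [w for w in words if w["confidence"] < threshold]


# Hypothetical per-word output with confidence scores.
words = [
    {"content": "the", "confidence": 0.99, "start_time": 0.0},
    {"content": "anaphylaxis", "confidence": 0.62, "start_time": 0.4},
    {"content": "risk", "confidence": 0.91, "start_time": 1.1},
]

flagged = words_needing_review(words)
for w in flagged:
    print(f'{w["start_time"]:.1f}s  {w["content"]}  ({w["confidence"]:.2f})')
```

Reviewers can then jump straight to the flagged timestamps instead of reading the whole transcript.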

Industry Language Packs

We're developing English language packs optimized for specific industries, with sector-specific terminology. Finance is available now, with more to follow soon.

Word Timings

Get accurate timestamps for every word in the transcript to support post-processing and a better end-user experience.
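
As an example of the post-processing that word timestamps make possible, the sketch below turns a list of timed words into SubRip (SRT) caption cues. The word structure (`content`, `start`, `end` in seconds) and the fixed seven-words-per-cue rule are assumptions chosen to keep the example short.

```python
def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group timed words into numbered SRT cues of at most max_words each."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0]["start"], chunk[-1]["end"]
        text = " ".join(w["content"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


# Hypothetical timed words.
words = [
    {"content": "Welcome", "start": 0.0, "end": 0.5},
    {"content": "back", "start": 0.5, "end": 0.9},
    {"content": "everyone", "start": 0.9, "end": 1.5},
]

srt = words_to_srt(words, max_words=2)
print(srt)
```

A production captioning pipeline would usually also break cues on punctuation and limit line length, which this sketch leaves out.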

Advanced Punctuation & Casing

Improve readability with language-specific capitalization and punctuation including commas, question marks and exclamation marks.

Audio Events

Improve accessibility and fully automate tedious captioning by using AI to identify and label non-speech sounds in media.

Languages

Partner with Speechmatics to maximize your total addressable market

We deliver for multilingual, multicultural and multinational businesses, covering languages spoken by over half the world’s population across a range of dialects and accents.

Language Coverage

We support 50 languages, covering most native languages with unmatched accuracy.

Accents and dialects

Whether you need Brazilian Portuguese or Canadian French, we have you covered with a single language model that supports all associated accents and dialects.

Translation

Transcribe and translate audio to and from English for over 30 languages using a single API call.

Language Identification

Simplify integration and ensure accurate transcription with automatic detection of the language spoken.

AI Powered Capabilities

Speechmatics combines accurate transcription with advanced speech capabilities, offered as solution bundles for customers. That combination is what makes Speechmatics truly unique.

Translation

With automatic translation in a single API call, you can translate media and provide captions for over half the world’s population.

Summaries

Instantly generate summaries for social and video platforms, so viewers know what to expect, without you having to write them manually.

Sentiment

Don’t just rely on reviews. See how customers are feeling about every aspect of your service by identifying sentiment throughout calls.

Topics

Your audience doesn’t always want to watch long media. Give them the topics discussed and the timestamps so they can engage with what they are most interested in.

Chapters

Media is divided into chapters, and each chapter is summarized and given a heading, making it easy to find the most engaging content.

Ready to Understand Every Voice?

Sign up to our free speech-to-text SaaS Portal and we’ll guide you through the integration of our API.