
There is no single number that answers the question of what word error rate is acceptable in legal transcription.
Legal teams searching for a clean threshold will not find one in court rules, judicial guidance, or professional standards. What they will find is more useful: a way of understanding WER in relation to legal risk, transcript purpose, and review obligations. In U.S. procedure, the question is whether the record “accurately records the witness’s testimony.” In federal court reporting policy, real-time output is treated as draft text rather than a substitute for a certified transcript. In the UK, the clearest numeric benchmark appears in Crown Court contracting, where suppliers are required to deliver transcripts to 99.5% accuracy.
In this article, we explore how the metric works, what legal transcription standards actually require, where automatic speech recognition tends to fail in legal settings, and how legal teams should approach vendor assessment when the stakes are high. The aim is not to chase one magic score. It is to connect transcription accuracy to the real world of depositions, hearings, interviews, and appeals.
Legal transcription is not a forgiving domain. One transcription error in a witness name, a date, a citation, or a speaker label can do more damage than a benchmark percentage suggests. That is why understanding WER matters. A low score can still hide serious risk if the wrong words appear in the wrong places.
This is also where many buyers get misled. ASR systems are often marketed on broad averages, but legal teams do not buy averages. They buy reliability in context. Spoken language relies on nuance, attribution, and context in a way that raw percentages cannot fully capture.
At its simplest, WER is the standard way of scoring automatic speech recognition against a human-checked output. The system transcript is compared with a reference transcript, and the differences are counted as substitutions, deletions, and insertions. That makes the metric useful because it is standardized, comparable, and easy to calculate across speech recognition systems.
A lower WER usually means better transcription accuracy. A higher WER usually means more review work and more risk. But legal teams should pause there for a moment: the metric does not tell you whether the error landed in a filler word or in a party name. It does not tell you whether the speech was assigned to the right person. It does not tell you whether the transcript is fit for filing, disclosure, or appeal.
So the most useful way to think about the measure is not as a verdict but as a starting point.
Calculating word error rate starts with a reference transcript. The system output is aligned against that reference transcript, then substitutions, deletions, and insertions are counted.
WER = (S + D + I) / N
Where:
S = substitutions
D = deletions
I = insertions
N = total words in the reference transcript
This is the standard formula used in ASR research and benchmarking. A 200-word reference transcript with 4 substitutions, 2 deletions, and 1 insertion produces a word error rate of 3.5%. On paper, that looks strong. In legal practice, the story depends on where the miss happened. If the changed terms were a witness surname and a damages figure, the transcript may still be risky to rely on.
That is why calculating word error rate is useful, but not sufficient. The legal question is never just how many mistakes appeared. It is whether those mistakes altered meaning, attribution, or evidentiary value.
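For teams that want to sanity-check a reported figure, the calculation is straightforward to reproduce. The Python sketch below aligns the two transcripts with a standard edit-distance dynamic program and applies the formula above; it is a minimal illustration, not a substitute for an evaluation toolkit that also reports the alignment itself.

```python
# Minimal WER sketch: the edit distance between the word sequences equals
# the minimum number of substitutions, deletions, and insertions (S + D + I).

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER = (S + D + I) / N via word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()

    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]           # match, no edit
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j - 1],                 # substitution
                    dp[i - 1][j],                     # deletion
                    dp[i][j - 1],                     # insertion
                )

    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


# The worked example above: 4 substitutions + 2 deletions + 1 insertion
# over a 200-word reference gives 7 / 200 = 3.5%.
print(f"{(4 + 2 + 1) / 200:.1%}")  # 3.5%
```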
No formal court rule sets one acceptable word error rate for every legal workflow. The legal system tends to describe the duty in qualitative terms: accurate recording, certification, and controlled process. Under FRCP Rule 30, the officer must certify that the deposition accurately records the witness’s testimony. Federal judiciary policy says real-time text may contain errors that affect meaning and does not satisfy the requirement for a certified transcript.
The UK provides the clearest numeric marker. Crown Court transcript supply has been described by government and procurement documents as requiring 99.5% accuracy. That is a useful signal of how exacting official-record work can be, but it is still a contractual service level, not a universal rule for all ASR transcripts or all legal use cases.
Professional bodies are also relevant. NCRA’s RPR sets a 95% accuracy threshold for human skills testing. Those benchmarks matter because they show how demanding formal record work is. But they are competency standards for people, not a blanket legal standard for every transcription platform.
In other words, transcription accuracy in law is judged by whether the record is dependable, reviewable, and defensible.
The table below is the practical part. It shows why one target does not fit every task.
| Use case | Accuracy pressure | What errors matter most | Review need |
|---|---|---|---|
| Official court record | Extremely high | Names, rulings, citations, speaker attribution | Essential |
| Depositions | High | Dates, numbers, exhibits, speaker turns | Required |
| Witness interviews | High if used in evidence | Quotes, chronology, identity | Strongly recommended |
| Law enforcement audio | High | Completeness, chain of custody, identity | Required |
| Legal dictation | Moderate to high | Names, citations, references | Required before filing |
| Internal case notes | Lower | Context-specific | Still needed |
The more formal the record, the less a standalone score tells you. That is why legal teams should not rely on one benchmark in isolation when assessing legal transcription platforms.
General-purpose ASR systems are usually built for meetings, calls, podcasts, and other broad conversational audio. Legal audio is different. It includes interruptions, overlapping speakers, specialist terms, legal citations, proper nouns, and uneven recording conditions.
This is where ASR accuracy becomes context-sensitive. Legal hearings and depositions are not clean demo environments. Background noise, side speech, far-field microphones, and rapid speaker changes all push ASR systems harder than simple single-speaker audio. Research and court guidance both show that audio quality and microphone setup are foundational to results. Poor capture upstream often leads to downstream transcription errors no matter how strong the model looks on paper.
Legal work also exposes the limits of generic language models. Spoken language relies on shared context, but courts, depositions, and investigations add unusual names, Latin phrases, citations, and technical terms. Legal-domain research on Supreme Court hearings found that adapting speech recognition systems with in-domain transcripts and custom vocabulary improved transcription accuracy over generic baselines.
That finding matters because a single mistake in a case citation or surname can be much more serious than a missing filler word. So when vendors talk about average scores, legal buyers should ask where the misses occur.
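One practical way to act on that question is to score case-critical terms separately from the headline rate. The short sketch below simply tests whether a hand-picked list of critical terms survives verbatim in the system output; the terms and transcripts are hypothetical, and a real review would use the matter's own names, figures, and citations.

```python
# Illustrative check: does the output preserve the terms that carry legal
# weight, regardless of the overall WER? Terms and text are made up.

critical_terms = ["Wyszynski", "£48,500", "12 March 2021", "CPR 31.6"]

reference = "Ms Wyszynski agreed the figure of £48,500 on 12 March 2021 under CPR 31.6"
hypothesis = "Ms Wisinski agreed the figure of £48,500 on 12 March 2021 under CPR 31.6"

def missed_terms(terms, hypothesis_text):
    """Return the critical terms that do not appear verbatim in the output."""
    return [t for t in terms if t.lower() not in hypothesis_text.lower()]

print(missed_terms(critical_terms, hypothesis))
# ['Wyszynski'] -> only one miss, but it is the witness's surname
```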
Multi-speaker conditions are another major problem. A transcript can have a relatively low word error rate and still fail if the answer is attributed to the wrong speaker. That is why newer research evaluates diarization separately instead of pretending the metric captures everything. In practice, hearings with interruptions, overlap, and background noise often create a higher word error rate and a separate speaker-attribution problem at the same time.
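To make that distinction concrete, the sketch below counts words that are transcribed correctly but assigned to the wrong speaker. The aligned (speaker, word) pairs are hypothetical and assume a one-to-one word alignment; real diarization scoring works on time segments, but even this simplified view shows how a snippet can score a perfect word error rate and still misattribute the answer.

```python
# Illustrative attribution check on already-aligned (speaker, word) pairs.

reference = [("COUNSEL", "did"), ("COUNSEL", "you"), ("COUNSEL", "sign"),
             ("WITNESS", "no"), ("WITNESS", "i"), ("WITNESS", "did"), ("WITNESS", "not")]

hypothesis = [("COUNSEL", "did"), ("COUNSEL", "you"), ("COUNSEL", "sign"),
              ("COUNSEL", "no"), ("WITNESS", "i"), ("WITNESS", "did"), ("WITNESS", "not")]

word_errors = sum(r[1] != h[1] for r, h in zip(reference, hypothesis))
attribution_errors = sum(r[1] == h[1] and r[0] != h[0] for r, h in zip(reference, hypothesis))

print(word_errors)          # 0 -> a 0% word error rate on this snippet
print(attribution_errors)   # 1 -> "no" credited to counsel instead of the witness
```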
Legal teams should be especially cautious about poor audio quality. Courtrooms, police interviews, hearing rooms, and remote depositions do not always produce clean recordings. Guidance for tribunals and courts stresses microphone arrays, channel separation, and recording standards for a reason. Some forensic research goes further and suggests that sufficiently degraded material is not suitable for automatic analysis at all.
So the issue is not just model design. It is the combination of audio quality, room setup, and workflow discipline.
When legal teams want better outcomes from ASR systems, three things matter most.
First, domain adaptation matters. ASR systems trained or tuned with legal vocabulary, legal transcripts, and legal-style audio perform better than generic systems. Second, speaker handling matters. Because speech recognition systems can recognise words without reliably identifying who said them, speaker diarization needs separate attention. Third, workflow matters. Human oversight remains central wherever the transcript may be relied on formally.
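Where a platform does not expose domain adaptation directly, one light-touch stopgap is to post-correct known legal vocabulary against a custom word list. The sketch below uses simple fuzzy matching from Python's standard library; the vocabulary is hypothetical, and this illustrates the idea rather than replacing proper model adaptation or human review.

```python
import difflib

# Hypothetical custom vocabulary: legal terms a generic model tends to garble.
LEGAL_VOCABULARY = ["estoppel", "subpoena", "certiorari", "tortfeasor"]

def post_correct(word: str, vocabulary=LEGAL_VOCABULARY, cutoff: float = 0.85) -> str:
    """Snap a word to the closest vocabulary entry when it is a near miss."""
    match = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return match[0] if match else word

print(post_correct("estopple"))   # -> "estoppel"
print(post_correct("witness"))    # -> "witness" (left unchanged)
```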
This is also where human transcription still has an essential role. Even where AI improves speed, legal teams often still need review, correction, and certification by trained people. That is not a weakness in the technology story. It is a recognition that the official record is a legal object, not just text.
When evaluating speech recognition systems, the headline benchmark should be the beginning of the conversation, not the end.
Ask what kind of audio was used in testing. Ask whether the vendor can handle multi-speaker conditions, names, citations, and legal vocabulary. Ask whether you can test on your own recordings. Ask what happens under background noise, overlap, and poor audio quality. Ask how the product distinguishes draft output from the reviewed transcript. Ask what security and deployment options exist for sensitive material. Judicial guidance in England and Wales warns that public AI tools should be treated as capable of making public anything entered into them, and the CJIS Security Policy sets the baseline security framework for criminal justice information.
That is a much better way of judging transcription tools than simply comparing one score against another. It is also the only sensible way to connect transcription accuracy to real legal operations.
So what does a “good” score look like?
For internal notes, a modest word error rate may be workable if the audio is clear and the consequences are low. For depositions, interviews, and disclosure-sensitive material, the bar is much higher. For official records, certified transcripts, and evidentiary uses, legal teams should assume that review is indispensable.
In that sense, understanding word error rate means accepting that WER is a baseline, not a verdict. A lower word error rate helps. But the better question is always whether the output can be trusted for the task in front of you.
The best way to think about the metric in law is as a useful but incomplete diagnostic. It tells you something real about the gap between machine output and the words spoken, but it does not tell you everything that matters in legal work. It does not fully capture speaker attribution, legal significance, reviewability, or whether the workflow itself is defensible.
That is why the right standard for legal teams is not simply low error. It is strong transcription accuracy, tested on real legal audio, supported by good audio quality, and backed by human oversight where formal reliance is involved. For any team choosing between transcription tools, the most important question is not just, "What is the score?" It is, "Can this system handle our audio, our risk, and our process?"
What is word error rate?
WER is the standard metric used to compare machine output with a checked transcript by counting substitutions, deletions, and insertions against a reference transcript.
Why do ASR systems struggle in legal settings?
Because legal audio often combines specialist vocabulary, overlap, background noise, and uneven audio quality, all of which increase the likelihood of transcription errors.
How should legal teams approach transcription tools?
By testing transcription tools on real legal audio, looking beyond a headline score, and focusing on workflow, review, and context. This is the practical core of evaluating speech recognition systems in high-stakes settings.
What should buyers ask of speech recognition systems?
They should ask how speech recognition systems perform on their own recordings, how those systems handle speaker changes and legal vocabulary, and how they are deployed for sensitive material.
Can AI replace the formal legal transcript?
Not where a certified record is required. Federal court reporting policy explicitly distinguishes unofficial real-time text from the official certified transcript.