Clinical AI is moving from pilot to production, and the bottleneck is documentation that keeps pace with real conversations.
Physicians speak fast, patients interrupt, and acronyms collide with drug names. When the transcript wobbles, downstream tools misfire.
A recent study in NEJM Catalyst tracked over 7,000 physicians using AI scribes across 2.6 million clinical encounters in a single year. The results are telling:
- 15,700 hours of documentation time saved, equivalent to almost 1,800 workdays
- 84% of doctors said patient interactions improved
- 82% reported better job satisfaction
High‑volume users, particularly in emergency medicine, primary care, and mental health, saw the biggest gains. Even low‑frequency users reported measurable time savings, and not a single patient in the study reported a drop in care quality.
We ran side-by-side tests across multiple clinical test sets to measure what matters in practice.
Headline results:
- 93% general accuracy, measured as a 7.27% word error rate (WER).
- 96.0% medical keyword recall, so critical terms land in the transcript.
- 4.0% keyword error rate (KER), which translates to fewer mistakes on diagnoses, drug names, and timelines.
- ~50% fewer keyword errors on clinical terms and ~17% fewer overall word errors than the next-best system.
Model                       KER       WER       Accuracy
Speechmatics Medical        4.01%     7.27%     93%
ElevenLabs Scribe           8.51%     8.78%     91%
Deepgram Nova‑3 Medical     9.74%     8.88%     91%
AssemblyAI Standard         11.42%    9.21%     91%
OpenAI Whisper‑1            12.46%    11.10%    89%
Microsoft Enhanced          13.98%    12.25%    88%
Amazon Medical Dictation    12.47%    14.15%    86%
Google Medical Dictation    16.50%    17.10%    83%
Across our test sets, Speechmatics leads on both general accuracy and clinical term handling.
Why KER matters: Clinical documentation rides on keywords. A missed allergy, an incorrect dosage, or a wrong laterality can derail care. Tracking keyword error rate alongside WER gives a clearer view of clinical safety, not just raw accuracy.
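To make the two metrics concrete, here is a minimal Python sketch that scores a hypothesis transcript against a reference: WER via word-level edit distance, and a simple KER as the share of reference keywords missing from the hypothesis. The example sentences and keyword list are illustrative; this is not the benchmark harness used for the table above.

```python
def wer(ref_words, hyp_words):
    """Word error rate: Levenshtein edit distance over word tokens / reference length."""
    d = [[0] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
    for i in range(len(ref_words) + 1):
        d[i][0] = i
    for j in range(len(hyp_words) + 1):
        d[0][j] = j
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[-1][-1] / len(ref_words)

def ker(ref_words, hyp_words, keywords):
    """Keyword error rate: fraction of reference keyword occurrences absent from the hypothesis."""
    ref_kw = [w for w in ref_words if w in keywords]
    remaining = {}
    for w in hyp_words:
        remaining[w] = remaining.get(w, 0) + 1
    missed = 0
    for w in ref_kw:
        if remaining.get(w, 0) > 0:
            remaining[w] -= 1
        else:
            missed += 1
    return missed / len(ref_kw)

ref = "patient denies penicillin allergy takes metformin 500 mg daily".split()
hyp = "patient denies penicillin allergy takes metformin 500 milligrams daily".split()
kws = {"penicillin", "metformin", "500", "mg"}
print(round(wer(ref, hyp), 3))   # one substitution over nine words -> 0.111
print(ker(ref, hyp, kws))        # "mg" lost -> 1 of 4 keywords missed -> 0.25
```

Note how a single substitution barely moves WER but, because it hits a dosage unit, costs a full quarter of the keywords; that asymmetry is exactly why KER is tracked separately.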
What’s new?
Four main changes matter most for clinical use:
Vocabulary that speaks healthcare. Coverage for drug names, procedures, and clinical shorthand now lands reliably, including correct formatting for numbers, dosages, dates, and times. See our healthcare transcription support for more info.
Real-time diarization that keeps up. The system distinguishes clinicians, patients, and family members in the room, even with background noise or rapid turn-taking. Notes are easier to attribute, and handovers are cleaner.
Accent-independent by design. Healthcare is global. The model understands diverse accents and overlapping speech without forcing users to slow down or over-enunciate.
Real-time first. You get consistent accuracy whether you are streaming live audio or processing files, so teams do not trade precision for speed.
Together these updates reduce cognitive load and help keep the record clean.
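To show how diarized output typically gets consumed downstream, the sketch below collapses word-level speaker labels into attributed turns. The field names (`word`, `speaker`) are illustrative assumptions, not the actual response schema.

```python
# Hypothetical word-level diarized output; field names are illustrative
# assumptions, not the vendor's real response schema.
words = [
    {"word": "Any", "speaker": "S1"},
    {"word": "chest", "speaker": "S1"},
    {"word": "pain?", "speaker": "S1"},
    {"word": "No,", "speaker": "S2"},
    {"word": "just", "speaker": "S2"},
    {"word": "dizziness.", "speaker": "S2"},
]

def to_turns(words):
    """Collapse consecutive same-speaker words into attributed turns."""
    turns = []
    for w in words:
        if turns and turns[-1][0] == w["speaker"]:
            turns[-1][1].append(w["word"])
        else:
            turns.append((w["speaker"], [w["word"]]))
    return [f'{spk}: {" ".join(toks)}' for spk, toks in turns]

for line in to_turns(words):
    print(line)   # S1: Any chest pain? / S2: No, just dizziness.
```

Attributed turns like these are what make clinician-vs-patient statements separable in the note.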
Architecture matters when milliseconds add up
The new medical model is engineered for low latency and high throughput. It handles live dictation, in-room capture, and telemedicine sessions without choking on domain-specific language.
Batch workloads run at scale for backlogs and historical records. Developers get predictable performance and operational simplicity across deployment environments.
Our models are also built real-time first, so moving from file-based transcription to streaming does not mean an accuracy trade-off.
What the new medical model unlocks
Here is what those gains mean in day-to-day work.
For developers - real-time transcription that stands up to domain pressure. Clean timestamps and entity handling simplify downstream NLP.
For clinicians - less screen time and more face time. Notes that reflect what was said rather than what the model guessed.
For patients - a calmer room. The computer listens, records, and stays out of the way.
When the transcript is right, everything built on top works better.
Getting started
Hands-on testing is the best proof.
Test the Medical Model in the Speechmatics Portal preview, or integrate it directly via the API. Both real-time and batch are supported.
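As a starting point for API integration, here is a minimal batch-job sketch in Python. The endpoint URL, auth header, and config field names follow the shape of the public Speechmatics batch API as we understand it, but treat them as assumptions and confirm everything against the current API reference before relying on it.

```python
import json

# Assumed batch endpoint; verify against the API reference.
API_URL = "https://asr.api.speechmatics.com/v2/jobs"

def build_job_config(language="en"):
    """Transcription job config. Field names are assumptions modelled on the
    documented batch API shape; check the reference for the current schema."""
    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "diarization": "speaker",  # attribute words to speakers
        },
    }

def submit_job(audio_path, api_key, language="en"):
    """Submit an audio file for batch transcription (needs the `requests` package)."""
    import requests
    with open(audio_path, "rb") as f:
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"data_file": f},
            data={"config": json.dumps(build_job_config(language))},
        )
    return resp.json()

print(json.dumps(build_job_config(), indent=2))
```

For streaming, the real-time API uses a WebSocket connection instead of job submission; the same transcription config concepts carry over.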
You can also see our healthcare language coverage via our docs.
If you are heading to HLTH USA in Las Vegas this October, come see it in action.
Bring your toughest audio. Tell us what success looks like and we will help you measure it.
FAQs
Does it handle speaker changes in busy rooms?
Yes. Real-time speaker diarization separates speakers for cleaner attribution in clinical settings.
Where does the model fit in my stack?
Use it for ambient scribing, clinician dictation, telemedicine, and call-based triage. Feed transcripts into EHRs, analytics, and LLM-powered assistants.
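As a toy example of one such downstream step, the snippet below pulls dose mentions out of transcript text with a regex, the kind of structured field an EHR or safety check might consume. It is a sketch, not a clinical NLP pipeline, and the unit list is deliberately tiny.

```python
import re

def extract_dosages(transcript):
    """Pull (amount, unit) dose mentions from free text. Illustrative only:
    the unit list is minimal and real pipelines need far more robust parsing."""
    pattern = re.compile(r"\b(\d+(?:\.\d+)?)\s?(mg|mcg|ml|units?)\b", re.IGNORECASE)
    return [(float(n), u.lower()) for n, u in pattern.findall(transcript)]

t = "Continue metformin 500 mg twice daily and insulin 10 units at bedtime."
print(extract_dosages(t))  # [(500.0, 'mg'), (10.0, 'units')]
```

The cleaner the transcript, the less defensive this kind of extraction code has to be.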
How does it deploy?
Use our managed service or talk to us about enterprise options that meet your security and governance needs.
If you have a question not covered here, reach out and we will get you an answer.