Key Takeaways
- Scribe v2 Realtime achieves the lowest Word Error Rate of any low-latency ASR model on the FLEURS multilingual benchmark across 30 languages, with under 150ms latency — outperforming OpenAI Whisper, Google Gemini Flash, Amazon Transcribe, and Deepgram on both accuracy and multilingual coverage in ElevenLabs’ benchmarks.
- Scribe v2 (batch) is purpose-built for long-form audio — podcasts, legal transcripts, meeting recordings, medical dictation — with speaker diarization for up to 48 distinct speakers, dynamic audio tagging for non-speech events, entity detection across 56 PII categories, and keyterm prompting with up to 1,000 domain-specific terms.
- Four feature upgrades in the March 2026 update: PII auto-redaction during transcription (before storage), No Verbatim mode (automatic filler word removal), keyterm capacity expanded from 100 to 1,000 terms, and mixed-script handling for Indic languages. Together they make Scribe v2 enterprise-ready for healthcare, finance, and customer service compliance workflows.
What Scribe v2 Is — and the Two-Model Architecture
ElevenLabs’ speech-to-text platform consists of two specialised models serving distinct use cases. Scribe v2 (batch) launched January 9, 2026 and is optimised for long-form audio processing: batch transcription, subtitling, captioning, and structured audio analysis at scale. Scribe v2 Realtime launched January 6, 2026 and is purpose-built for live applications such as conversational AI agents, meeting assistants, voice-enabled apps, and real-time captioning, where latency is the primary constraint.
Both models support 90+ languages with automatic detection, handle accents and diverse acoustic conditions without manual configuration, and are available via REST API and WebSocket. Both carry full enterprise compliance coverage: SOC 2, ISO 27001, PCI DSS L1, HIPAA, and GDPR, with EU and India data residency and zero retention mode for regulated environments.
For context on how Scribe fits into the broader STT market alongside Deepgram, AssemblyAI, and Whisper, see our best speech-to-text software comparison for 2026 (https://elevenlabsmagazine.com/best-speech-to-text-software-2026/).
Scribe v2 Realtime: Architecture and Performance
Scribe v2 Realtime uses predictive transcription: it anticipates the most probable next words and punctuation from context rather than waiting for complete utterances. This reduces perceived latency. The model itself processes audio in under 150ms, and because predicted text is displayed as the speaker talks, the experience feels faster than the raw latency figure suggests.
On the FLEURS multilingual benchmark, which measures accuracy across 30 languages, Scribe v2 Realtime achieved the lowest Word Error Rate of any low-latency ASR model, outperforming Gemini Flash, OpenAI’s real-time STT, and Deepgram on this multilingual measure. ElevenLabs’ internal benchmarks, run on hundreds of challenging English conversation samples (poor audio quality, diverse accents, filler words), showed Scribe v2 Realtime capturing user intent more accurately than any competing real-time ASR.
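Word Error Rate is the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. A minimal sketch of that calculation follows. The lower-casing and whitespace tokenisation are assumptions made for illustration, not the normalisation used in the FLEURS results quoted above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the patient takes ten milligrams daily",
                          "the patient takes two milligrams daily"))
```

A single wrong word in a six-word sentence already gives a WER of about 17%, which is why small differences between models matter for dosage, names, and numbers.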
Scribe v2 Realtime: Key Technical Capabilities
| Capability | Detail | Use Case |
| --- | --- | --- |
| Latency | Under 150ms (30–80ms optimised configurations) | Conversational AI agents, live voice apps |
| Languages | 90+ with automatic detection | Multilingual agents, global live captioning |
| Commit control | Developer control over when to finalise transcripts | Custom streaming, fine-tuned accuracy pipelines |
| VAD (Voice Activity Detection) | Automatically detects speech start/stop | Smoother live processing, cleaner segmentation |
| Connection resilience | Continues transcription seamlessly after connection resets | Production-grade live deployments |
| Complex vocabulary | Built-in support for technical language, medications, proper nouns | Healthcare, legal, financial voice agents |
| Agents platform integration | Available as optional upgrade in ElevenLabs Agents | Voice agents requiring highest STT accuracy |
| Pricing | $0.28/hour of audio processed | Scales from startup to enterprise |
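The VAD and commit control rows can be pictured with a small stand-in. Scribe's own voice activity detection is not described in this article, so the sketch below swaps in a plain energy threshold: once speech has been heard, a run of quiet frames marks the point where a client might finalise (commit) the transcript. The function names, the threshold, and the hangover length are all hypothetical.

```python
import math
from typing import Iterable, Iterator, List


def frame_rms(samples: List[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def commit_points(frames: Iterable[List[int]],
                  threshold: float = 500.0,
                  hangover_frames: int = 3) -> Iterator[int]:
    """Yield the index of each frame at which a commit would fire.

    A commit fires after speech has been detected and `hangover_frames`
    consecutive frames then fall below `threshold`.
    """
    in_speech = False
    quiet_run = 0
    for index, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            in_speech = True
            quiet_run = 0
        elif in_speech:
            quiet_run += 1
            if quiet_run >= hangover_frames:
                yield index
                in_speech = False
                quiet_run = 0


if __name__ == "__main__":
    loud, quiet = [1000] * 160, [0] * 160
    stream = [quiet, loud, loud, quiet, quiet, quiet, quiet, loud, quiet, quiet, quiet]
    print(list(commit_points(stream)))
```

A longer hangover avoids cutting a speaker off mid-pause at the cost of later finalisation, which is the trade-off manual commit control lets a developer tune.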
Scribe v2 Batch: Architecture and Performance
Scribe v2 batch is designed for accuracy over speed. It targets long podcast episodes, meeting recordings, legal dictation, medical notes, and video subtitles, where the richness of the output matters more than real-time delivery. It improves on Scribe v1’s stability across long-form audio, handling pauses, tone changes, and extended silences without the accuracy degradation common in models optimised for shorter utterances.
Scribe v2 Batch: Feature Set
| Feature | Capability | Enterprise Value |
| --- | --- | --- |
| Speaker diarization | Up to 48 distinct speakers, intuitive labelling | Meeting transcription, multi-person interviews, call centre recording |
| Dynamic audio tagging | Detects non-speech events: laughter, footsteps, applause | Content indexing, accessibility enrichment, context preservation |
| Entity detection | 56 PII categories with exact timestamps: names, SSNs, credit cards, medical conditions | HIPAA compliance, GDPR data mapping, financial regulation |
| PII redaction | Three modes: complete [REDACTED], categorised [CREDIT_CARD], enumerated [CREDIT_CARD_1] | Automated compliance without manual review |
| Keyterm prompting | Up to 1,000 context-aware domain-specific terms | Technical vocabulary, product names, medical terminology |
| Multi-language in single file | Automatic language detection, no manual segmentation | Multilingual meetings, international interview content |
| No Verbatim mode | Removes filler words (um, uh), repeated phrases, stuttering | Meeting notes, subtitles, polished written records |
| Mixed-script handling | English words remain in Latin script within Indic language audio | Hindi-English, Telugu-English, Kannada-English code-switching |
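Diarized output is usually easier to read once word-level results are merged into speaker turns. The sketch below shows one way to do that, assuming each word arrives with a start time, an end time, and a speaker label. The `Word` and `Turn` shapes and the `speaker_0` style labels are hypothetical and do not reproduce the Scribe response schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float   # seconds from the start of the audio
    end: float     # seconds from the start of the audio
    speaker: str   # hypothetical label such as "speaker_0"


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str


def group_into_turns(words: List[Word]) -> List[Turn]:
    """Merge consecutive words from the same speaker into a single turn."""
    turns: List[Turn] = []
    for word in words:
        if turns and turns[-1].speaker == word.speaker:
            last = turns[-1]
            last.end = word.end
            last.text = f"{last.text} {word.text}"
        else:
            turns.append(Turn(word.speaker, word.start, word.end, word.text))
    return turns


if __name__ == "__main__":
    words = [
        Word("Shall", 0.0, 0.3, "speaker_0"),
        Word("we", 0.3, 0.4, "speaker_0"),
        Word("start?", 0.4, 0.8, "speaker_0"),
        Word("Yes.", 1.1, 1.4, "speaker_1"),
    ]
    for turn in group_into_turns(words):
        print(f"[{turn.start:.1f}-{turn.end:.1f}] {turn.speaker}: {turn.text}")
```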
The March 2026 Feature Upgrades in Detail
1. PII Auto-Redaction During Transcription
PII auto-redaction is the most significant enterprise capability added in the March 2026 update. Redaction now happens during transcription, so sensitive data is removed before it reaches storage or downstream systems. Three redaction modes cover different compliance requirements: complete redaction replaces entities with [REDACTED] for maximum privacy; categorised redaction replaces them with labels such as [CREDIT_CARD] or [SSN] for audit trail purposes; enumerated redaction uses [CREDIT_CARD_1], [CREDIT_CARD_2] for cases where tracking distinct instances matters.
For healthcare teams transcribing patient calls, financial services recording client conversations, and customer support centres capturing personal details, this eliminates a post-processing step that previously required a separate PII-scanning service or manual review. The data never enters storage unredacted.
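The difference between the three redaction modes is easiest to see on a single sentence. The sketch below applies each mode to text with known entity positions. It illustrates the output formats only: in Scribe v2 the redaction happens inside the service during transcription, and the `Entity` shape, the `redact` function, and the mode strings are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entity:
    start: int     # character offset where the entity begins
    end: int       # character offset just past the entity
    category: str  # for example "CREDIT_CARD" or "SSN"


def redact(text: str, entities: List[Entity], mode: str) -> str:
    """Replace each entity span using "complete", "categorised" or "enumerated" labels."""
    if mode not in {"complete", "categorised", "enumerated"}:
        raise ValueError(f"unknown redaction mode: {mode}")
    counters: Dict[str, int] = {}
    pieces: List[str] = []
    cursor = 0
    for entity in sorted(entities, key=lambda e: e.start):
        pieces.append(text[cursor:entity.start])
        if mode == "complete":
            label = "[REDACTED]"
        elif mode == "categorised":
            label = f"[{entity.category}]"
        else:
            # Number each instance within its own category.
            counters[entity.category] = counters.get(entity.category, 0) + 1
            label = f"[{entity.category}_{counters[entity.category]}]"
        pieces.append(label)
        cursor = entity.end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    text = "Card 4111 and card 5500 on file."
    found = [Entity(text.index(n), text.index(n) + len(n), "CREDIT_CARD")
             for n in ("4111", "5500")]
    for mode in ("complete", "categorised", "enumerated"):
        print(mode, "->", redact(text, found, mode))
```

Enumerated labels keep the two cards distinguishable in an audit without exposing either number, which is the property the financial services use case relies on.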
2. No Verbatim Mode
No Verbatim mode automatically removes filler words (um, uh), repeated phrases, and stuttering from transcripts — producing clean, polished written records without manual editing. It is activated per-request in the API. For meeting notes, subtitle generation, executive dictation, and any workflow where the goal is a readable document rather than an exact capture of every spoken sound, No Verbatim mode eliminates significant post-processing time.
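To make the effect concrete, here is a deliberately naive stand-in that drops the two fillers named above and collapses immediately repeated words. It is not how Scribe v2 implements No Verbatim mode, which works on the audio and its context. The function name and filler list are assumptions, and a rule this simple would also collapse legitimate repeats such as "that that".

```python
import re
from typing import List

FILLERS = {"um", "uh"}  # the two fillers named in the article


def _bare(token: str) -> str:
    """Lower-case a token and strip punctuation so 'Um,' compares equal to 'um'."""
    return re.sub(r"[^\w']", "", token).lower()


def clean_verbatim(transcript: str) -> str:
    """Drop filler words and immediately repeated words from a transcript."""
    kept: List[str] = []
    for token in transcript.split():
        bare = _bare(token)
        if bare in FILLERS:
            continue
        if kept and _bare(kept[-1]) == bare:
            continue  # same word as the one just kept
        kept.append(token)
    return " ".join(kept)


if __name__ == "__main__":
    print(clean_verbatim("um I I think we should uh ship it it on Friday"))
```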
3. Keyterm Prompting Expanded to 1,000 Terms
Keyterm capacity expanded tenfold from 100 to 1,000 terms in the March update. Unlike standard custom vocabulary that blindly inserts provided terms, Scribe v2 keyterm prompting is context-aware — the model uses surrounding audio to determine whether a keyterm applies before transcribing it. This prevents false positives where a similar-sounding word would trigger incorrect term substitution. For enterprise deployments with large technical vocabularies, product catalogues, or domain-specific terminology, 1,000 terms provides sufficient coverage for most production use cases. Requests with more than 100 keyterms have a minimum billable unit of 20 seconds.
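The billing note at the end of this section can be checked with a few lines of arithmetic. The sketch below assumes the rule means that any request carrying more than 100 keyterms is billed for at least 20 seconds of audio. That reading, and the function and constant names, are assumptions, so confirm the exact rule against ElevenLabs' pricing documentation.

```python
MAX_KEYTERMS = 1000          # capacity stated in the article
LARGE_LIST_THRESHOLD = 100   # above this count the minimum billable unit applies
MIN_BILLABLE_SECONDS = 20.0  # minimum billable unit stated in the article


def billable_seconds(audio_seconds: float, keyterm_count: int) -> float:
    """Return the seconds billed for one request under the assumed rule."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must not be negative")
    if not 0 <= keyterm_count <= MAX_KEYTERMS:
        raise ValueError(f"keyterm_count must be between 0 and {MAX_KEYTERMS}")
    if keyterm_count > LARGE_LIST_THRESHOLD:
        return max(float(audio_seconds), MIN_BILLABLE_SECONDS)
    return float(audio_seconds)


if __name__ == "__main__":
    print(billable_seconds(5, 150))   # short clip with a large keyterm list
    print(billable_seconds(5, 100))   # same clip at the 100-term boundary
    print(billable_seconds(45, 500))  # longer clip is billed at its real length
```

For workloads made of many very short clips, keeping the list at 100 terms or fewer avoids the minimum entirely under this reading.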
4. Mixed-Script Handling for Indic Languages
Scribe v2 now keeps English words in Latin script when they occur within Indic language audio, covering Hindi-English, Telugu-English, Kannada-English, and other Indic code-switching. Many transcription systems previously transliterated English words into Indic scripts, producing unusable transcripts for bilingual content. The fix works automatically with no language configuration required, which makes Scribe v2 a practical STT choice for India-market deployments where English-Indic code-switching is common in professional settings.
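A quick way to sanity-check mixed-script output is to count which script each token is written in. The sketch below does this with Python's standard `unicodedata` module, relying on the fact that Unicode character names begin with the script name. The helper names are hypothetical, and the check only inspects the first letter of each token.

```python
import unicodedata
from collections import Counter


def token_script(token: str) -> str:
    """Return the script of the first alphabetic character in a token."""
    for ch in token:
        if ch.isalpha():
            # Unicode names lead with the script, e.g. "DEVANAGARI LETTER KA".
            return unicodedata.name(ch, "UNKNOWN").split()[0]
    return "NONE"  # token has no letters (digits or punctuation only)


def script_counts(transcript: str) -> Counter:
    """Count whitespace-separated tokens per script."""
    return Counter(token_script(token) for token in transcript.split())


if __name__ == "__main__":
    # Hindi sentence with the English word "meeting" kept in Latin script.
    counts = script_counts("कल meeting है")
    print(counts["DEVANAGARI"], counts["LATIN"])
```

If a transcript of code-switched audio reports zero Latin tokens, the English words were probably transliterated rather than preserved.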
Scribe v2 vs Competing STT APIs: 2026 Comparison
| Capability | Scribe v2 Realtime | Deepgram Nova-3 | AssemblyAI Universal-2 | OpenAI Whisper (managed) | Google Cloud STT |
| --- | --- | --- | --- | --- | --- |
| Real-time latency | <150ms (30–80ms optimised) | Sub-300ms | Streaming available | ~200ms+ | Fast |
| Multilingual WER (FLEURS) | Lowest of any low-latency model | Excellent on English and noisy audio | Strong across datasets | Wide language support | 125+ languages |
| Languages | 90+ | 50+ | Multilingual | 99 (open source) | 125+ |
| Keyterm prompting | Yes — up to 1,000 terms, context-aware | Custom vocabulary | Custom vocabulary | No | Custom vocabulary |
| PII entity detection | Yes — 56 categories, timestamps, auto-redaction | Limited | Yes (PII redaction) | No | Limited |
| No Verbatim mode | Yes (March 2026) | No | No | No | No |
| Speaker diarization | Yes — up to 48 speakers (batch) | Yes | Yes | No (managed) | Yes |
| Audio tagging (non-speech) | Yes — laughter, footsteps, etc. | No | No | No | No |
| Mixed-script Indic | Yes (March 2026) | Limited | Limited | Partial | Partial |
| HIPAA/SOC2/GDPR | Yes — full enterprise stack | Yes | Yes | Limited | Yes |
| Zero retention mode | Yes | No | No | No | No |
| Pricing (real-time) | $0.28/hr | $4.50/hr (bundled agent) | Pay-as-you-go | Pay-as-you-go | ~$0.024/min |
Pricing: Scribe v2 in 2026
Scribe v2 Realtime is priced at $0.28 per hour of audio processed. That is well below the $4.50/hr bundled agent rate listed for Deepgram in the comparison above, and below Google Cloud STT's roughly $0.024 per minute, which works out to about $1.44 per hour. Enterprise clients benefit from higher concurrency limits (30+ simultaneous streams) and dedicated support. Annual Business plan subscribers receive volume discounts.
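The per-hour rates translate into monthly budgets with simple multiplication. The sketch below uses only the figures quoted in this article. Volume discounts, free tiers, and the bundled nature of some competitor rates are ignored, and the function and constant names are hypothetical.

```python
SCRIBE_REALTIME_USD_PER_HOUR = 0.28  # rate quoted in this article
GOOGLE_USD_PER_MINUTE = 0.024        # approximate figure from the comparison table


def audio_cost(audio_hours: float,
               usd_per_hour: float = SCRIBE_REALTIME_USD_PER_HOUR) -> float:
    """Return the cost in USD for a number of audio hours, rounded to cents."""
    if audio_hours < 0:
        raise ValueError("audio_hours must not be negative")
    return round(audio_hours * usd_per_hour, 2)


if __name__ == "__main__":
    hours = 1000
    print("Scribe v2 Realtime:", audio_cost(hours))
    print("Google Cloud STT:  ", audio_cost(hours, GOOGLE_USD_PER_MINUTE * 60))
```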
For Scribe v2 batch, pricing follows ElevenLabs’ standard API credit structure integrated with the broader platform. Teams using ElevenLabs for TTS, voice cloning, and SFX alongside Scribe benefit from a unified credit system rather than managing separate API accounts and billing for each service.
For the full ElevenLabs credit system and API pricing breakdown, see our ElevenLabs API pricing guide (https://elevenlabsmagazine.com/elevenlabs-api-pricing-guide-2026/).
Production Use Cases: Where Scribe v2 Excels
Conversational AI Agents
Scribe v2 Realtime is the STT layer for ElevenLabs’ own Agents platform — available as an optional upgrade from the default model. For voice agents where the STT accuracy directly determines whether the agent understands what the user said, Scribe v2 Realtime’s performance on accented speech, noisy environments, and technical vocabulary makes it the highest-accuracy option within the ElevenLabs ecosystem. Agent teams handling Spanish, Portuguese, Hindi, and other non-English languages benefit most from Scribe v2’s multilingual accuracy advantage.
For how STT fits into the full voice agent stack, see our ElevenLabs Conversational AI builder’s guide (https://elevenlabsmagazine.com/elevenlabs-conversational-ai-guide-2026/).
Healthcare Transcription — HIPAA Compliance
Scribe v2 removes names, medical conditions, SSNs, and other protected health information during transcription, before storage. Combined with HIPAA compliance, BAA availability, and zero retention mode, this makes it production-ready for clinical dictation, patient call transcription, and medical documentation workflows. Healthcare teams must contact ElevenLabs Sales to sign a BAA before deploying in any HIPAA-regulated context.
Meeting Intelligence and Corporate Notes
Speaker diarization for up to 48 speakers, No Verbatim mode for clean transcripts, entity detection for key information extraction, and integration with ElevenLabs Studio for editing and caption export make Scribe v2 batch the most complete meeting transcription tool within the ElevenLabs ecosystem. For organisations already using ElevenLabs for TTS and voice agent infrastructure, Scribe v2 adds meeting intelligence without adding a separate vendor.
Media Production: Subtitles and Captioning
Scribe v2 is now used in ElevenLabs Studio for automated subtitle and caption generation for podcasts, videos, and interviews. Dynamic audio tagging enriches transcripts with non-speech event markers — [laughter], [applause], [background noise] — providing context that pure word transcription loses. For WCAG-compliant caption production at scale, Scribe v2 batch with audio tagging produces the most complete accessible transcript output available within a single TTS-STT platform.
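For teams exporting captions outside Studio, timed segments map directly onto the SubRip (.srt) format. The sketch below assumes segments are already available as (start, end, text) tuples, with audio tags such as [laughter] carried as ordinary caption text. The tuple shape and function names are hypothetical, not Scribe's output schema.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, caption text)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SubRip files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SubRip cues separated by blank lines."""
    cues = []
    for number, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}")
    return "\n\n".join(cues) + "\n"


if __name__ == "__main__":
    print(to_srt([(0.0, 1.2, "Welcome back."), (1.2, 2.0, "[laughter]")]))
```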
Customer Support — Financial Services
Call centre transcription for financial services requires PII redaction of credit card numbers, account details, and personal identifiers captured during calls. Scribe v2’s enumerated redaction mode ([CREDIT_CARD_1], [CREDIT_CARD_2]) allows compliance teams to track distinct instances across a call transcript for audit purposes, while preventing raw sensitive data from entering storage. PCI DSS L1 compliance covers payment card data handling requirements.
API Integration: Getting Started with Scribe v2
Scribe v2 Realtime uses WebSocket streaming — authenticate with an API key, send audio chunks, and receive partial or final transcripts with configurable VAD and commit controls. Scribe v2 batch uses the standard REST endpoint accepting MP4, MOV, MP3, WAV, and other common formats. Code examples in Python, JavaScript, and other languages are available in ElevenLabs’ documentation.
Keyterm prompting is added via a keyterms array parameter in the API request. PII redaction mode is set per-request via the redaction_config parameter. No Verbatim mode is toggled with a boolean parameter. All features are available via the same endpoint — no separate API configuration required for enterprise features.
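The three per-request options can be assembled and validated before any network call is made. In the sketch below, `keyterms` and `redaction_config` are the parameter names given above. The `no_verbatim` key, the `{"mode": ...}` shape, and the mode strings are assumptions made for illustration, so check ElevenLabs' API reference for the real field names before sending anything.

```python
from typing import Any, Dict, List, Optional

MAX_KEYTERMS = 1000  # capacity stated in the article
REDACTION_MODES = {"complete", "categorised", "enumerated"}  # assumed mode strings


def build_batch_options(keyterms: Optional[List[str]] = None,
                        redaction_mode: Optional[str] = None,
                        no_verbatim: bool = False) -> Dict[str, Any]:
    """Build a dictionary of per-request options for a batch transcription call."""
    terms = list(keyterms or [])
    if len(terms) > MAX_KEYTERMS:
        raise ValueError(f"at most {MAX_KEYTERMS} keyterms are supported")
    if redaction_mode is not None and redaction_mode not in REDACTION_MODES:
        raise ValueError(f"unknown redaction mode: {redaction_mode}")

    options: Dict[str, Any] = {"no_verbatim": no_verbatim}  # assumed key name
    if terms:
        options["keyterms"] = terms
    if redaction_mode is not None:
        options["redaction_config"] = {"mode": redaction_mode}  # assumed shape
    return options


if __name__ == "__main__":
    print(build_batch_options(
        keyterms=["metoprolol", "Scribe v2"],
        redaction_mode="enumerated",
        no_verbatim=True,
    ))
```

Validating the keyterm count and redaction mode client-side surfaces configuration mistakes before audio is uploaded.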
Future of Scribe v2 in 2027
ElevenLabs’ roadmap for Scribe points toward deeper integration with the Agents platform. Scribe v2 Realtime is already an optional upgrade there and will likely become the default model as its quality advantage becomes more pronounced at scale. Speaker diarization for Realtime (currently batch-only) is a logical next capability for meeting assistant applications that need live speaker identification. The mixed-script capability for Indic languages signals investment in emerging market voice deployments, where ElevenLabs’ combination of multilingual TTS and STT creates a tightly integrated non-English voice AI platform.
Key Takeaways
- Use Scribe v2 Realtime for conversational AI agents, meeting assistants, and live captioning — lowest WER of any low-latency model on FLEURS multilingual benchmark, under 150ms latency.
- Use Scribe v2 batch for long-form audio — podcasts, legal transcripts, medical dictation — with 48-speaker diarization, entity detection, and audio tagging.
- PII auto-redaction during transcription (March 2026) makes Scribe v2 enterprise-ready for healthcare and financial services without a separate post-processing step.
- No Verbatim mode eliminates filler words and stuttering automatically — the most practical feature for meeting notes, subtitles, and executive dictation workflows.
- 1,000-term context-aware keyterm prompting gives large domain vocabularies (product catalogues, medical terminology) direct coverage; the other APIs in the comparison above are listed with standard custom vocabulary or none.
- HIPAA deployment requires a BAA with ElevenLabs Sales before production — do not deploy in regulated healthcare contexts without completing this step.
Conclusion
Scribe v2 completes ElevenLabs’ audio loop — the platform can now generate speech with TTS and voice cloning, process it with STT, dub it across languages, and build voice agents around the full pipeline. For teams already in the ElevenLabs ecosystem, Scribe v2 eliminates the need for a separate STT vendor while delivering accuracy that matches or exceeds dedicated STT platforms on multilingual and noisy audio. For enterprise teams specifically needing PII redaction, HIPAA compliance, and large technical vocabulary support, the March 2026 feature set makes Scribe v2 the most capable enterprise STT option within any unified voice AI platform.
Frequently Asked Questions
What is the latency of Scribe v2 Realtime?
Under 150ms in standard configuration, with 30–80ms achievable in optimised deployments. Predictive transcription further reduces perceived latency by displaying partial results as the speaker talks rather than waiting for utterance completion.
How many languages does Scribe v2 support?
90+ languages with automatic detection. No manual language configuration is required — the model detects which language is being spoken and handles code-switching between languages within the same audio file automatically.
Is ElevenLabs Scribe v2 HIPAA compliant?
Yes, with a BAA in place. Healthcare teams must contact ElevenLabs Sales to sign a Business Associate Agreement (BAA) before deploying in any HIPAA-regulated context. Zero Retention mode, where audio is deleted immediately after processing, is available for stricter data control.
What is keyterm prompting in Scribe v2?
A feature allowing up to 1,000 domain-specific words or phrases to bias the model toward accurate transcription of those terms. Unlike standard custom vocabulary, keyterm prompting is context-aware — the model uses surrounding audio to determine whether a keyterm applies before transcribing it, preventing false positives.
What is No Verbatim mode?
A transcription setting that automatically removes filler words (um, uh), repeated phrases, and stuttering from transcripts — producing clean, readable records without manual post-editing. Activated per-request in the API.
How does Scribe v2 compare to Deepgram?
In ElevenLabs’ benchmarks, Scribe v2 Realtime outperforms Deepgram on the FLEURS multilingual benchmark. Deepgram Nova-3 leads on noisy English audio, with a reported 54.2% WER reduction. Scribe v2 has stronger multilingual performance, more extensive PII redaction capabilities, and native integration with the ElevenLabs Agents platform. Deepgram has a broader ecosystem for custom deployment and self-hosted options.
Methodology
Accuracy benchmark data from ElevenLabs’ official Scribe v2 Realtime documentation, FLEURS benchmark results published by ElevenLabs, and GenMediaLab’s independent Scribe v2 analysis (January 2026). Feature specifications from the official Scribe v2 Realtime introduction (January 6, 2026), Scribe v2 batch introduction (January 9, 2026), and the Scribe v2 upgrade blog post (March 2026). Pricing from ElevenLabs documentation and Quasa.io’s launch coverage. This article was drafted with AI assistance and reviewed by the editorial team at ElevenLabsMagazine.com.
References
ElevenLabs. (2026, January 6). Introducing Scribe v2 Realtime. https://elevenlabs.io/blog/introducing-scribe-v2-realtime
ElevenLabs. (2026, January 9). Introducing Scribe v2. https://elevenlabs.io/blog/introducing-scribe-v2
ElevenLabs. (2026, March). Scribe v2 just got an upgrade — four new features. https://elevenlabs.io/blog/scribe-v2-just-got-an-upgrade
ElevenLabs. (2026). Scribe v2 Realtime live in ElevenLabs Agents. https://elevenlabs.io/blog/scribe-v2-realtime-in-elevenlabs-agents
ElevenLabs. (2026). Speech to Text documentation. https://elevenlabs.io/docs/overview/capabilities/speech-to-text
GenMediaLab. (2026). ElevenLabs Launches Scribe v2. https://www.genmedialab.com/news/elevenlabs-scribe-v2-speech-to-text/
