Key Takeaways
- Eleven v3 is ElevenLabs’ first performance-oriented TTS model — built for emotional delivery, directional Audio Tag control, and multi-character dialogue. It supports 70+ languages (up from 29 in Multilingual v2). ElevenLabs is offering 80% off v3 pricing until end of June 2026.
- Audio Tags are bracketed text cues such as [whispers], [excited], [sigh], and [gunshot], placed inline in your script to control emotion, pacing, non-verbal reactions, accents, and inline sound effects within a single generation pass.
- Professional Voice Clones (PVCs) are not yet fully optimised for v3 in alpha. Use Instant Voice Clones (IVCs) or library voices for v3 projects until PVC optimisation, which is on the near-term roadmap, ships.
What Eleven v3 Is — and Why It Is Different
Every ElevenLabs model before v3 optimised for the same core goal: produce the most accurate, natural-sounding rendition of your text. They were readers. Eleven v3 is the first ElevenLabs model built for performance. Previous models processed text and produced audio. Eleven v3 processes text, interprets emotional subtext and delivery intent, and generates audio that reflects that interpretation — moment by moment.
The practical difference is immediately apparent when you try anything beyond neutral narration. Earlier models plateau when you need a character to sound genuinely frightened, a narrator to convey irony, or dialogue to feel spontaneous rather than scripted. Eleven v3 with Audio Tags addresses these requirements directly — delivery is now a first-class input, not a side effect of the text itself.
As of March 2026, Eleven v3 is in alpha research preview — available in the ElevenLabs UI and via public API. The 80% discount until end of June 2026 makes this the right evaluation window before standard pricing applies.
For context on how Eleven v3 fits within the full ElevenLabs platform, see our honest ElevenLabs review for 2026 (https://elevenlabsmagazine.com/elevenlabs-review-2026-honest-assessment/).
Audio Tags: How They Work
Audio Tags are words or phrases wrapped in square brackets, placed directly in the script text. Eleven v3 interprets these as performance directions — they modify how the surrounding speech is delivered without changing the spoken words. A practical example:
[tired] I’ve been working for 14 hours straight. [sigh] I can’t even feel my hands anymore. [nervously] You sure this is going to work? [gulps] Okay… let’s go.
This passage uses four tags of three types: a delivery state ([tired]), two non-verbal reactions ([sigh] and [gulps]), and an emotional modifier ([nervously]). Without tags, a standard TTS model reads this flatly. With Eleven v3, it delivers a continuous emotional arc that shifts line by line, without changing a single word of the script.
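To make the split between direction and spoken words concrete, here is a minimal local sketch that separates the two for the passage above. It does not call ElevenLabs at all. The helper names `extract_tags` and `spoken_text` are hypothetical, and the regular expression simply assumes that any bracketed phrase is a tag.

```python
import re

# Matches a bracketed cue such as [tired] or [sighs softly].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def extract_tags(script: str) -> list[str]:
    """Return the Audio Tags in the order they appear in the script."""
    return TAG_PATTERN.findall(script)


def spoken_text(script: str) -> str:
    """Return the words that would be spoken, with the tags removed."""
    without_tags = TAG_PATTERN.sub("", script)
    # Collapse the extra whitespace left behind where tags were removed.
    return re.sub(r"\s+", " ", without_tags).strip()


script = (
    "[tired] I've been working for 14 hours straight. "
    "[sigh] I can't even feel my hands anymore. "
    "[nervously] You sure this is going to work? "
    "[gulps] Okay... let's go."
)

print(extract_tags(script))  # ['tired', 'sigh', 'nervously', 'gulps']
print(spoken_text(script))
```

A check like this is handy before generation: the tag list shows the direction you are giving, and the stripped text is what a proofreader should review.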
Complete Audio Tag Reference
Emotional State Tags
| Tag | Delivery Effect | Best Use Case |
| [excited] | Elevated energy, faster pace, upward inflection | Product announcements, sports commentary, reveals |
| [nervous] | Hesitant delivery, micro-pauses, tightened tone | Tension scenes, anxiety moments, interviews |
| [frustrated] | Strained tone, clipped phrasing | Conflict scenes, complaint dialogue, arguments |
| [sorrowful] | Slower pace, dropped pitch, weighted delivery | Grief scenes, apologies, loss |
| [calm] | Even pace, neutral tone, reduced dynamics | Meditation, safety announcements, tutorials |
| [tired] | Slower, flatter, slightly breathy | End-of-day scenes, exhaustion, burnout |
| [cheerfully] | Brighter tone, upward inflection | Customer service, morning content, greetings |
Non-Verbal Reaction Tags
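Most of these generate audio sounds rather than modified speech: the model produces the non-verbal audio itself. The exception in this table is [whispers], which changes how the words that follow are delivered.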
| Tag | Audio Generated | Best Use Case |
| [sigh] | Audible breath exhalation | Resignation, exhaustion, relief |
| [laughs] | Natural laugh sound | Comedy, lighthearted scenes |
| [gasps] | Sharp intake of breath | Shock, surprise, horror |
| [gulps] | Audible swallow | Nervousness, fear, tension |
| [whispers] | Quiet, breathy, intimate delivery | Secrets, danger, intimacy |
| [sighs softly] | Gentle exhale | Mild disappointment, quiet reflection |
| [laughs softly] | Quiet, contained laugh | Amusement, suppressed humour |
Delivery Control Tags
| Tag | Effect on Pacing | Best Use Case |
| [pause] | Inserts a beat of silence | Dramatic effect, suspense, listener processing |
| [rushed] | Faster, compressed phrasing | Urgency, panic, excitement |
| [drawn out] | Extended syllables, slower phrasing | Emphasis, reluctance, dramatic weight |
| [stammers] | Broken delivery with repetition | Anxiety, hesitation, cognitive load |
| [hesitates] | Micro-pause before or within speech | Uncertainty, thinking aloud |
| [dramatic tone] | Heightened intensity, slower pace | Storytelling, reveals, climactic moments |
Character Performance Tags
| Tag | Effect | Best Use Case |
| [pirate voice] | Exaggerated accent, gruff delivery | Games, character content, entertainment |
| [French accent] | French-accented English delivery | Character differentiation, language content |
| [Australian accent] | Australian-accented English | Regional character scenes |
| [British accent] | British-accented English | Narrator variation, character scenes |
| [sings] | Melodic singing delivery (experimental) | Children’s content, character intros |
Sound Effect Tags
Eleven v3 can generate non-voice audio events inline within speech — placing sound effects at precise script-level timing without a separate production step.
| Tag | Audio Generated | Best Use Case |
| [gunshot] | Gunshot sound | Action sequences, game dialogue |
| [clapping] | Applause sound | Presentation content, award scenes |
| [explosion] | Explosion audio | Action content, cinematic scenes |
Sound effect tags are more experimental than emotional tags. Test thoroughly before committing to production for critical sequences — consistency varies by voice and context.
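The reference tables above can be turned into a small lookup so that a script is checked before it is sent for generation. This is a sketch only: the category names and the helpers `categorise` and `experimental_tags` are hypothetical, and the sets contain just the tags listed in this article, not an official or complete list. Tags that are not in the tables come back as "unlisted".

```python
# Tags copied from the reference tables in this article (not exhaustive).
TAG_CATEGORIES = {
    "emotional": {"excited", "nervous", "frustrated", "sorrowful", "calm",
                  "tired", "cheerfully"},
    "non_verbal": {"sigh", "laughs", "gasps", "gulps", "whispers",
                   "sighs softly", "laughs softly"},
    "delivery": {"pause", "rushed", "drawn out", "stammers", "hesitates",
                 "dramatic tone"},
    "character": {"pirate voice", "french accent", "australian accent",
                  "british accent", "sings"},
    "sound_effect": {"gunshot", "clapping", "explosion"},
}


def categorise(tag: str) -> str:
    """Map a tag such as '[French accent]' to its table category."""
    name = tag.strip("[] ").lower()
    for category, names in TAG_CATEGORIES.items():
        if name in names:
            return category
    return "unlisted"


def experimental_tags(tags: list[str]) -> list[str]:
    """Return the sound effect tags, which the article says need extra testing."""
    return [tag for tag in tags if categorise(tag) == "sound_effect"]


print(categorise("[French accent]"))                      # character
print(experimental_tags(["[calm]", "[gunshot]", "[pause]"]))  # ['[gunshot]']
```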
Text to Dialogue API: Multi-Character Scenes
Eleven v3 includes a dedicated Text to Dialogue API for generating natural multi-character conversations. Different voices can interrupt, overlap, react, and transition within the same generation pass — producing dialogue that feels spontaneous rather than turn-by-turn scripted. Example:
Marissa: [panicking] Wait, are we crashing? I can’t tell if this is a feature or a—
Chris: [interrupting] Bug!
Marissa: [sighing] Yes, but honestly? [light chuckle] This is kind of fun.
What previously required multiple voice actors, separate recording sessions, and precise timing in an audio editor can now be generated in a single API call. For scripted podcasts, game dialogue, training simulations, and audio drama, this fundamentally changes production economics.
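As a rough illustration of how a "Speaker: line" script like the one above might be prepared for a dialogue request, the sketch below parses it into one entry per turn. Nothing here is sent over the network. The payload field names (`inputs`, `voice_id`, `text`, `model_id`), the model identifier `eleven_v3`, and the placeholder voice IDs are assumptions for illustration, so confirm them against the ElevenLabs API reference before use.

```python
import json


def parse_dialogue(script: str, voices: dict[str, str]) -> list[dict[str, str]]:
    """Turn 'Speaker: line' rows into one entry per dialogue turn."""
    inputs = []
    for line in script.strip().splitlines():
        if not line.strip():
            continue  # Skip blank lines between turns.
        # Split on the first colon only, so colons in the spoken text survive.
        speaker, _, text = line.partition(":")
        speaker = speaker.strip()
        if speaker not in voices:
            raise KeyError(f"No voice configured for speaker {speaker!r}")
        inputs.append({"voice_id": voices[speaker], "text": text.strip()})
    return inputs


# Placeholder voice IDs; replace with real IDs from your own voice library.
voices = {"Marissa": "VOICE_ID_MARISSA", "Chris": "VOICE_ID_CHRIS"}

script = """
Marissa: [panicking] Wait, are we crashing? I can't tell if this is a feature or a-
Chris: [interrupting] Bug!
Marissa: [sighing] Yes, but honestly? [light chuckle] This is kind of fun.
"""

# Assumed request body shape; verify field names in the official docs.
payload = {"model_id": "eleven_v3", "inputs": parse_dialogue(script, voices)}
print(json.dumps(payload, indent=2))
```

Keeping the speaker-to-voice mapping in one dictionary means a recast only touches one line, and a misspelled speaker name fails loudly instead of silently dropping a turn.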
For voice agent and conversational AI applications where multi-character dialogue is most valuable, see our ElevenLabs Conversational AI builder’s guide (https://elevenlabsmagazine.com/elevenlabs-conversational-ai-guide-2026/).
Eleven v3 vs Other ElevenLabs Models
| Use Case | Best Model | Reason |
| Neutral narration, audiobooks | Multilingual v2 | Stable long-form, PVC-compatible, lower credit cost |
| Real-time voice agents | Flash v2.5 | Sub-75ms latency, optimised for streaming |
| Emotional character performance | Eleven v3 | Best expressiveness, Audio Tags, multi-character dialogue |
| Multi-character scripted content | Eleven v3 (Dialogue API) | Only model with native multi-character dialogue generation |
| Long audiobooks (50k+ chars) | Story Studio + Multilingual v2 | v3 has shorter generation limits in alpha |
| Game NPC dialogue | Eleven v3 | Emotional range and performance depth |
Pricing: Eleven v3 in 2026
Eleven v3 consumes approximately 1.5–2x the credits of Multilingual v2 for equivalent character counts. The 80% promotional discount, available until the end of June 2026, brings the effective v3 cost within the range of the standard models during the promotional period. That makes it the right time to evaluate and build v3-specific production pipelines before pricing normalises.
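A quick arithmetic check of these figures, assuming the 80% discount applies multiplicatively to the v3 credit cost. That is an assumption: the article does not say exactly how the discount is applied, so treat the result as an estimate rather than a price quote. The function name `effective_multiplier` is hypothetical.

```python
from fractions import Fraction

# From the text: v3 costs roughly 1.5x to 2x the credits of Multilingual v2.
V3_MULTIPLIER_RANGE = (Fraction(3, 2), Fraction(2))
# From the text: 80% promotional discount.
PROMO_DISCOUNT = Fraction(80, 100)


def effective_multiplier(multiplier: Fraction, discount: Fraction) -> Fraction:
    """Cost relative to Multilingual v2 after the discount (assumed multiplicative)."""
    return multiplier * (1 - discount)


low, high = (effective_multiplier(m, PROMO_DISCOUNT) for m in V3_MULTIPLIER_RANGE)
print(float(low), float(high))  # 0.3 0.4
```

Under that reading, promotional v3 usage would cost about 0.3x to 0.4x of the equivalent Multilingual v2 generation, and the 1.5x to 2x premium returns once the discount ends.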
For the full ElevenLabs credit system, see our ElevenLabs API pricing guide (https://elevenlabsmagazine.com/elevenlabs-api-pricing-guide-2026/).
Practical Prompting Guide
1. Match the voice to the emotional range needed
The base voice you select is more important in v3 than in earlier models. A naturally calm voice asked to deliver [shouting] produces a muted result. Select a voice with natural energy and dynamic range for content requiring emotional extremes.
2. Build emotional arcs with sequential tags
[confident] We’re ready to launch. [pause] But honestly? [nervous] There’s one thing I haven’t told you. [sigh] The timeline just moved up by three weeks.
3. Use delivery tags for comedy timing
[pause] before a punchline, [deadpan] for ironic delivery, and [drawn out] for comedic emphasis are the three most effective comedy tools in the v3 tag set. Comedy timing is sensitive to voice selection — test with short scripts first.
4. Avoid stacking incompatible tags
Stacking contradictory tags within the same sentence — [excited] immediately followed by [sorrowful] — produces unpredictable output. Use tags to transition across sentences or with a [pause] between states.
5. Test PVC compatibility before committing
PVCs are not fully optimised for v3 in alpha. Test your specific PVC against v3 before building a full pipeline. Use an IVC or library voice as a fallback if PVC quality is insufficient for your use case.
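Tip 4 can be checked mechanically before generation. The sketch below flags contradictory tags that sit next to each other inside one sentence, and lets them pass when they fall in different sentences or have another tag such as [pause] between them. The conflict pairs are illustrative assumptions, not an official list, and `find_conflicts` is a hypothetical helper name.

```python
import re

TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")
# Split after sentence-ending punctuation followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

# Illustrative pairs only; extend with the combinations that fail for your voice.
CONFLICTS = {
    frozenset({"excited", "sorrowful"}),
    frozenset({"calm", "rushed"}),
    frozenset({"cheerfully", "frustrated"}),
}


def find_conflicts(script: str) -> list[tuple[str, str, str]]:
    """Return (first_tag, second_tag, sentence) for adjacent conflicting tags."""
    problems = []
    for sentence in SENTENCE_END.split(script):
        tags = [tag.lower() for tag in TAG_PATTERN.findall(sentence)]
        # Compare each tag only with the tag that directly follows it.
        for first, second in zip(tags, tags[1:]):
            if frozenset({first, second}) in CONFLICTS:
                problems.append((first, second, sentence.strip()))
    return problems


stacked = "[excited] We won, [sorrowful] but nobody came."
separated = "[excited] We won! [sorrowful] But nobody came."

print(find_conflicts(stacked))    # one conflict reported
print(find_conflicts(separated))  # []
```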
Where Eleven v3 Changes What Is Possible
Narrative Audiobooks
Eleven v3 enables audiobook production where character dialogue sounds genuinely distinct and emotionally appropriate. A villain sounds menacing. A grieving character sounds genuinely sorrowful. For narrative fiction, v3 is the first ElevenLabs model approaching the expressive range of a skilled human narrator.
For the full audiobook production workflow including ACX compliance, see our AI audiobook creation guide (https://elevenlabsmagazine.com/ai-audiobook-creation-guide-2026/).
Game Dialogue and Interactive Characters
Players are highly attuned to flat delivery in interactive contexts. Eleven v3’s emotional tags and multi-character dialogue capability make it the first ElevenLabs model genuinely suitable for NPC dialogue in narrative games — characters sound surprised, threatened, amused, or exhausted in context.
Scripted Podcasts and Audio Drama
The Text to Dialogue API makes scripted podcast production possible without voice actors: two or more AI characters hold natural-sounding conversations with interruptions, reactions, and emotional shifts. For audio drama, where character performance is the product, v3 substantially lowers production cost.
Current Limitations
Eleven v3 is in alpha, with real constraints to plan around. Generation length is shorter than with Multilingual v2: for single-pass generations exceeding 10,000 characters, use Multilingual v2 until v3 exits alpha. Limited PVC optimisation is a real constraint for users with established cloned-voice workflows. The credit premium returns after the June 2026 promotional period, so calculate post-promotion economics before committing high-volume workflows.
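The length constraint can be encoded as a simple guard in a pipeline. This is a minimal sketch: the 10,000-character threshold comes from the paragraph above, while the model identifiers `eleven_v3` and `eleven_multilingual_v2` and the function `pick_model` are assumptions for illustration that should be checked against the current ElevenLabs model list.

```python
# Threshold taken from the article; actual alpha limits may differ.
V3_SINGLE_PASS_LIMIT = 10_000


def pick_model(script: str, needs_performance: bool) -> str:
    """Choose a model ID for a single-pass generation."""
    if len(script) > V3_SINGLE_PASS_LIMIT:
        # Too long for one v3 pass while in alpha: fall back to Multilingual v2.
        return "eleven_multilingual_v2"
    return "eleven_v3" if needs_performance else "eleven_multilingual_v2"


print(pick_model("[excited] We shipped it!", needs_performance=True))  # eleven_v3
print(pick_model("x" * 10_001, needs_performance=True))  # eleven_multilingual_v2
```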
Key Takeaways
- Eleven v3 is a performance model, not a narration model. Neutral high-volume TTS: Multilingual v2. Emotional performance and character dialogue: Eleven v3.
- Audio Tags give directorial control over emotion, pacing, non-verbals, accents, and inline sound effects — from the script, without changing words.
- Text to Dialogue API generates multi-character scenes with interruptions and emotional shifts from one model — no voice actors or separate recording sessions.
- Use IVCs or library voices for v3 projects — PVC optimisation for v3 is on the near-term roadmap.
- 80% discount on v3 until end of June 2026 — evaluate and build now before standard pricing applies.
Conclusion
Eleven v3 and Audio Tags represent the shift from AI voice that reads to AI voice that performs. Emotionally authentic audiobook narration, game NPC dialogue with genuine character, scripted podcast production without voice actors, and interactive training with realistic emotional range are all now achievable. For creators and developers who need speech that sounds like a performance rather than a transcript reading, v3 is the answer in 2026.
Frequently Asked Questions
What is Eleven v3?
ElevenLabs’ flagship expressive AI voice model as of 2026. It supports Audio Tags for inline performance direction, a Text to Dialogue API for multi-character scenes, 70+ languages, and a deeper contextual architecture that interprets emotional subtext. Currently in alpha research preview.
What are Audio Tags in ElevenLabs?
Bracketed cues — [whispers], [excited], [sigh], [pause], [gunshot] — placed inline in your script. Eleven v3 interprets these as performance directions modifying emotional delivery, pacing, accent, and non-verbal audio without changing the spoken words.
Can I use my Professional Voice Clone with Eleven v3?
PVCs are not yet optimised for v3 in alpha and may produce lower quality than with earlier models. ElevenLabs recommends IVCs or library voices for v3 projects. PVC optimisation is on the near-term roadmap.
What is the Text to Dialogue API?
A dedicated Eleven v3 endpoint for generating multi-character conversations with interruptions, overlapping speech, and emotional continuity across characters — in a single API call from one model.
Methodology
Eleven v3 and Audio Tag data from ElevenLabs’ official blog posts published March 14, 2026. Independent review data from Ecommerce Fastlane’s Eleven v3 review (April 2026) and Webfuse’s v3 analysis. Drafted with AI assistance, reviewed by ElevenLabsMagazine.com editorial team.
References
ElevenLabs. (2026, March 14). What are Eleven v3 Audio Tags and why they matter. https://elevenlabs.io/blog/v3-audiotags
ElevenLabs. (2026, March 14). Eleven v3 Audio Tags: Emotional context in speech. https://elevenlabs.io/blog/eleven-v3-audio-tags-expressing-emotional-context-in-speech
ElevenLabs. (2026, March 14). Eleven v3 Audio Tags: Multi-character dialogue. https://elevenlabs.io/blog/eleven-v3-audio-tags-bringing-multi-character-dialogue-to-life
Ecommerce Fastlane. (2026). ElevenLabs Eleven V3 Review. https://ecommercefastlane.com/elevenlabs-eleven-v3-review/
