ElevenLabs Voice Design is a text-to-voice tool that generates completely new synthetic voice identities from natural language descriptions. Unlike Voice Cloning, which replicates the voice characteristics of a real audio recording, Voice Design creates voices that have never existed — they are generated entirely from the AI model’s interpretation of your text prompt. The output is a genuinely original voice that belongs to your ElevenLabs account and can be used in all ElevenLabs products.
Voice Design v3, powered by the eleven_ttv_v3 model, is the current generation of the tool. It improves on the previous v2 model with broader emotional range, more accurate accent reproduction, better control over character traits like age and speaking style, and support for Audio Tags in preview generations. The tool is accessible directly from the ElevenLabs dashboard under Voices → Voice Design, and is available via the API at the /v1/text-to-voice/design endpoint.
How to Access Voice Design v3
In the Dashboard
Log in to elevenlabs.io → click Voices in the left sidebar → click Voice Design. You will see a prompt input box with a character limit of 20 to 1,000 characters. Type your voice description, optionally add preview text (100–1,000 characters) to hear the voice speaking specific content, and click Generate. Three voice previews are generated. Listen to each, select the one you want, and save it to your voice library. The saved voice is immediately available in Text to Speech, Studio, and API calls.
Via the API
The Voice Design v3 API endpoint is POST /v1/text-to-voice/design.
Required parameters:
- voice_description (string, 20–1,000 characters)
- text (string, 100–1,000 characters): the preview content
Optional parameters:
- model_id (string): specify ‘eleven_ttv_v3’ for the v3 model; defaults to eleven_multilingual_ttv_v2 if not specified
- auto_generate_text (boolean): if true, the model generates appropriate preview text automatically
- guidance_scale (float): controls how closely the model follows the prompt; lower values allow more creative interpretation
- loudness (float, -1 to 1)
- seed (integer): the same seed with the same prompt reproduces the same voice
The response returns three voice previews, each with a generated_voice_id and base64-encoded MP3 audio. Use the generated_voice_id with the /v1/text-to-voice endpoint to save the chosen voice permanently.
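As a rough illustration of the request described above, the following Python sketch validates the inputs against the quoted limits, builds the JSON body, and decodes the returned previews. Several details are assumptions not stated in this article: the base URL, the xi-api-key header name, the response field names previews and audio_base_64, and the choice to omit text when auto_generate_text is true. Check them against the official API reference before relying on them.

```python
import base64
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io"  # assumed base URL, not stated in the article


def build_design_payload(voice_description, text=None, *, model_id="eleven_ttv_v3",
                         guidance_scale=None, loudness=None, seed=None):
    """Validate inputs against the limits quoted above and return a JSON-ready dict."""
    if not 20 <= len(voice_description) <= 1000:
        raise ValueError("voice_description must be 20-1,000 characters")
    payload = {"voice_description": voice_description, "model_id": model_id}
    if text is None:
        # Assumption: with no preview text, ask the model to write its own.
        payload["auto_generate_text"] = True
    else:
        if not 100 <= len(text) <= 1000:
            raise ValueError("text must be 100-1,000 characters")
        payload["text"] = text
    if loudness is not None:
        if not -1 <= loudness <= 1:
            raise ValueError("loudness must be between -1 and 1")
        payload["loudness"] = loudness
    if guidance_scale is not None:
        payload["guidance_scale"] = guidance_scale
    if seed is not None:
        payload["seed"] = seed
    return payload


def design_voice(api_key, payload):
    """POST the payload to the design endpoint and return the parsed JSON response."""
    request = urllib.request.Request(
        API_BASE + "/v1/text-to-voice/design",
        data=json.dumps(payload).encode("utf-8"),
        # Header name is an assumption; confirm it in the API reference.
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def decode_previews(response_json):
    """Return (generated_voice_id, mp3_bytes) pairs; response field names are assumed."""
    return [
        (preview["generated_voice_id"], base64.b64decode(preview["audio_base_64"]))
        for preview in response_json["previews"]
    ]
```

A typical flow would call build_design_payload, pass the result to design_voice with your API key, then write each decoded preview to an .mp3 file for listening before choosing one to save.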
Writing Effective Voice Design Prompts
The quality of Voice Design v3 output depends heavily on the quality and specificity of the prompt. Vague prompts produce generic voices. Specific, multi-dimensional prompts produce distinctive, character-appropriate voices. Include as many of these dimensions as are relevant to your use case:
| Dimension | Example Description | Impact on Output |
| --- | --- | --- |
| Age | A 35-year-old man / an elderly woman in her 70s | Sets fundamental vocal register and age character |
| Gender | Male / female / androgynous | Core voice characteristic |
| Accent/Region | American Midwest / RP British / Australian / thick Scottish | Accent and dialect character |
| Tone | Warm and approachable / cold and authoritative / playful and energetic | Overall emotional character |
| Pace | Deliberate and measured / quick and conversational / relaxed and unhurried | Speaking rhythm |
| Register | Deep and resonant / light and airy / husky and low / crisp and clear | Vocal texture |
| Emotion baseline | Chronically cheerful / perpetually serious / gently melancholic | Default emotional colouring |
| Profession/context | A news anchor / a children’s storyteller / a corporate CEO / a gaming narrator | Style and formality cues |
| Character trait | Confident and assertive / nervous and hesitant / theatrical and expressive | Personality in delivery |
Weak vs Strong Prompt Examples
Weak prompt: ‘A female narrator with a British accent.’ — This produces a generic British female voice with no distinctive character.
Strong prompt: ‘A composed, authoritative British woman in her late 40s with a classic RP accent. Warm but professional tone — the kind of voice you would trust for a documentary about natural history. Measured pace, clear diction, with a subtle hint of dry humour in her delivery. Resonant but not overpowering.’ — This produces a distinctive, characterful voice with real personality.
The difference in output quality between these two prompts is significant. Voice Design v3 responds to specificity: every additional dimension you describe gives the model more information to generate a more precisely tailored voice. Aim for at least 150 to 300 characters for meaningful differentiation.
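One way to keep prompts consistently multi-dimensional is to assemble them from named parts and check the length before generating. The helper below is a minimal sketch; the function names and the keyword labels are hypothetical, and the thresholds simply mirror the 20–1,000 character limit and the 150-character guidance quoted in this article.

```python
MIN_CHARS, MAX_CHARS = 20, 1000  # prompt limits quoted earlier in the article
RECOMMENDED_MIN = 150            # lower end of the 150-300 character guidance


def compose_prompt(**dimensions):
    """Join the supplied dimension descriptions into one prompt, skipping empty ones."""
    parts = [value.strip().rstrip(".") for value in dimensions.values()
             if value and value.strip()]
    if not parts:
        raise ValueError("at least one dimension is required")
    prompt = ". ".join(parts) + "."
    if not MIN_CHARS <= len(prompt) <= MAX_CHARS:
        raise ValueError(f"prompt is {len(prompt)} characters; expected 20-1,000")
    return prompt


def is_detailed_enough(prompt):
    """True when the prompt reaches the recommended minimum length."""
    return len(prompt) >= RECOMMENDED_MIN


strong = compose_prompt(
    age="A composed, authoritative British woman in her late 40s",
    accent="Classic RP accent",
    tone="Warm but professional tone, suited to a natural history documentary",
    pace="Measured pace with clear diction",
    trait="A subtle hint of dry humour in her delivery",
)
```

Length is only a proxy for specificity, so treat is_detailed_enough as a reminder to add dimensions rather than as a quality score.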
Voice Design v3 with Reference Audio
The eleven_ttv_v3 model supports an optional reference audio parameter — a base64-encoded audio clip that the model uses alongside the text prompt to guide voice generation. The prompt_reference_audio_strength parameter (0 to 1) controls the balance: 0 means almost no reference audio influence (prompt dominates), 1 means almost no prompt influence (reference audio dominates).
This hybrid approach is useful when you want a voice that has the acoustic character of an existing recording but with modifications described in text. For example: provide a reference clip of a deep-voiced speaker, set prompt_reference_audio_strength to 0.4, and write a prompt adding ‘but with a British accent and a warmer, more approachable tone’ — the generated voice blends the reference audio’s depth with the prompt’s accent and warmth.
Important constraint: the reference audio must be a recording you have the rights to use. Do not use recordings of other people’s voices without consent. Voice Design with reference audio is for creating new synthetic voices inspired by acoustic qualities you provide — not for cloning specific individuals.
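The hybrid request can be sketched as a small extension of the design request body. In the Python example below, prompt_reference_audio_strength comes from the description above, while the field name reference_audio_base64 is an assumption that should be confirmed in the API reference. The helper name is hypothetical.

```python
import base64


def add_reference_audio(payload, audio_bytes, strength):
    """Return a copy of the design payload with reference audio attached."""
    if not 0 <= strength <= 1:
        raise ValueError("prompt_reference_audio_strength must be between 0 and 1")
    merged = dict(payload)  # copy so the caller's payload is left untouched
    # Field name below is assumed, not confirmed by the article.
    merged["reference_audio_base64"] = base64.b64encode(audio_bytes).decode("ascii")
    merged["prompt_reference_audio_strength"] = strength
    return merged


base_payload = {
    "voice_description": "A deep voice, but with a British accent and a warmer, more approachable tone.",
    "model_id": "eleven_ttv_v3",
    "auto_generate_text": True,
}
```

In practice audio_bytes would come from reading a clip you have the rights to use, for example with open(path, "rb").read(), and a strength of 0.4 would lean slightly toward the prompt.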
Voice Design v3 vs Voice Cloning: Which to Use
| Dimension | Voice Design v3 | Instant Voice Cloning | Professional Voice Cloning |
| --- | --- | --- | --- |
| Source required | Text prompt only, no audio | Short audio sample (1 min+) | 30-minute structured recording session |
| Output type | New synthetic voice that never existed | Replica of existing voice from sample | Near-perfect replica of real voice |
| Uniqueness | Completely original | Depends on source audio | Identical to source speaker |
| Setup time | Under 2 minutes | 15–30 minutes | 2–3 hours (recording + processing) |
| Best for | Fiction, games, branded characters, anonymised voices | Quick voice replication, podcasters | Brand voice, professional production |
| Can be shared | Yes, shareable in Voice Library | Yes, if you have rights to source audio | Yes, with consent from original speaker |
| Affected by default voice sunset | No, designed voices are permanent | No, cloned voices are permanent | No, cloned voices are permanent |
| Cost | Included in all paid plans | Creator plan and above | Creator plan and above |
Use Cases for Voice Design v3
Game and Fiction Character Voices
Voice Design is the most efficient way to create a library of distinct character voices for games, interactive fiction, audiobooks, and animation. Describe each character’s voice as you would describe the character to an actor — their age, background, emotional state, and personality — and generate unique voices for each. A game with ten distinct NPCs can have ten completely unique AI voices generated in under an hour through Voice Design, each saved to the ElevenLabs library and usable via API in the game engine.
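For a cast of characters, the save step can be scripted once previews have been chosen. The sketch below builds one request body per character for the /v1/text-to-voice save endpoint mentioned earlier. The body field names (voice_name, voice_description, generated_voice_id) are assumptions, and the cast sheet and preview IDs are hypothetical placeholders.

```python
def build_save_request(generated_voice_id, voice_name, voice_description):
    """Body for the save call; field names are assumed, not taken from the article."""
    if not generated_voice_id:
        raise ValueError("generated_voice_id is required")
    return {
        "voice_name": voice_name,
        "voice_description": voice_description,
        "generated_voice_id": generated_voice_id,
    }


# Hypothetical cast sheet: character name -> voice description prompt.
CAST = {
    "Innkeeper": "A jovial man in his 50s with a thick Scottish accent, warm and booming, quick to laugh.",
    "Scout": "A nervous young woman in her 20s, light and airy voice, quick and hesitant delivery.",
}


def save_requests_for_cast(cast, chosen_previews):
    """Pair each character with the preview ID picked for it and build one body each."""
    return [
        build_save_request(chosen_previews[name], name, description)
        for name, description in cast.items()
    ]
```

Keeping the cast sheet in one place also gives you a record of the exact prompt behind every saved voice, which helps if a character needs to be regenerated later.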
Branded Content and Podcast Hosts
Brands and creators who want a consistent AI voice identity for their content — without cloning a real person’s voice — use Voice Design to create a branded synthetic voice. The voice becomes the audio identity of the channel or brand: distinctive, consistent, and owned by the creator. Unlike Default voices (which expire December 31, 2026), designed voices are permanent and not subject to platform deprecation decisions.
Anonymised Documentary and Journalism
Documentary producers and journalists who need to represent real subjects while protecting their identity use Voice Design to create appropriate synthetic voices — ‘a nervous young man in his 20s with a regional English accent’ — that match the character without using the subject’s actual voice. This is a legitimate and common use case for Voice Design in journalism, particularly for sensitive or protected source material.
Multilingual Content Creators
Creators producing content across multiple languages can design voices optimised for specific target markets — a voice designed with ‘native Spanish accent, warm and conversational, suited to Latin American podcast audiences’ will perform better in Spanish-language content than a generic English-optimised voice adapted to Spanish. Voice Design v3 supports 32 languages for voice creation.
Guidance Scale: The Most Important Hidden Setting
The guidance_scale parameter in Voice Design v3 controls how strictly the model adheres to the prompt versus exercising creative interpretation.
- High guidance scale (0.7–1.0): the model attempts to match every element of the prompt as literally as possible. The result is often technically accurate but can sound artificial or robotic, particularly for extreme character descriptions.
- Low guidance scale (0.2–0.4): the model uses the prompt as a creative direction rather than a literal specification, producing voices that feel more natural and human even if they do not match every prompt detail precisely.
ElevenLabs explicitly recommends using lower guidance scale with longer, more detailed prompts. The combination of specific prompting and relaxed guidance produces the most natural-sounding, distinctive voices. A 200-character prompt at guidance scale 0.3 typically produces better results than the same prompt at guidance scale 0.9. Test your prompts at multiple guidance scale values and compare the three generated previews before selecting.
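Testing at multiple values is easier with a fixed list of request bodies that differ only in guidance_scale. The sketch below is one way to set that up; the function name is hypothetical, the scale values follow the ranges quoted in this article, and the range the API actually accepts should be checked in the official reference. The seed is held constant so that guidance scale is the only variable.

```python
def guidance_sweep(voice_description, scales=(0.2, 0.3, 0.4, 0.7, 0.9), seed=1234):
    """Build one design request body per guidance_scale value, all sharing one seed."""
    return [
        {
            "voice_description": voice_description,
            "model_id": "eleven_ttv_v3",
            "auto_generate_text": True,
            "guidance_scale": scale,
            "seed": seed,  # fixed so that only guidance_scale changes between runs
        }
        for scale in scales
    ]


requests_to_send = guidance_sweep(
    "A composed, authoritative British woman in her late 40s with a classic RP accent. "
    "Warm but professional tone, measured pace, clear diction, subtle dry humour."
)
```

Each body can then be sent to the design endpoint in turn, with the resulting previews saved under filenames that include the scale value for side-by-side listening.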
Three Insights Most Voice Design Guides Miss
1. Designed Voices Are Not Affected by the Default Voice Sunset
ElevenLabs is retiring all Default voices on December 31, 2026. This has prompted many users to look for replacement voices. Voice Design is one of the best long-term solutions: designed voices are permanently saved to your account and are not subject to platform-level deprecation decisions. Unlike Default voices or even Library voices (which depend on the creator maintaining them), a voice you designed belongs to your workspace indefinitely. For any workflow that currently relies on a Default voice, replacing it with a designed equivalent is the most future-proof migration path.
2. The Seed Parameter Enables Reproducible Voice Generation
The seed parameter in the Voice Design API is rarely documented in guides but is practically important for iterative prompt refinement. When you fix the seed and vary only the prompt, the model produces consistent output that differs only based on prompt changes — allowing direct A/B comparison of prompt variations. When you fix the prompt and vary the seed, the model produces different voice interpretations of the same description — useful for exploring the range of valid interpretations before committing to a prompt direction. Use seed for systematic prompt testing rather than guessing what changes will produce which results.
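The two experiment types described above can be expressed as two small helpers. This is a sketch with hypothetical function names; it only builds request bodies and assumes the seed behaves as this article describes.

```python
def make_request(voice_description, seed):
    """One design request body with an explicit seed."""
    return {
        "voice_description": voice_description,
        "model_id": "eleven_ttv_v3",
        "auto_generate_text": True,
        "seed": seed,
    }


def vary_prompt(prompts, seed):
    """A/B test: same seed, different prompts, so differences come from the wording."""
    return [make_request(prompt, seed) for prompt in prompts]


def vary_seed(prompt, seeds):
    """Exploration: same prompt, different seeds, to hear the range of interpretations."""
    return [make_request(prompt, seed) for seed in seeds]


ab_test = vary_prompt(
    [
        "A warm, measured British woman in her late 40s.",
        "A warm, measured British woman in her late 40s with a hint of dry humour.",
    ],
    seed=42,
)
exploration = vary_seed("A warm, measured British woman in her late 40s.", seeds=range(3))
```

Recording the seed alongside each saved preview makes it possible to return to a promising voice later and refine only the prompt.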
3. Audio Tags Work in Voice Design Preview — Use Them to Test Before Committing
Voice Design v3 previews support Eleven v3 Audio Tags in the preview text parameter. This means you can test how a designed voice handles emotional direction before saving it. Include Audio Tags in your preview text — ‘[whispers] I have a secret to tell you. [excited] And you are not going to believe it!’ — to hear how the designed voice performs emotional range, not just neutral speech. A voice that sounds good reading neutral text may perform poorly with emotional direction, and vice versa. Testing with Audio Tags in the preview saves regeneration credits by catching character-emotion mismatches before the voice is saved and used in production.
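Because preview text must be 100–1,000 characters, a short tagged line like the example above needs extending before it can be submitted. The following sketch checks the length and lists the tags found. The function names are hypothetical and the regular expression is a simplifying assumption that treats any lowercase words in square brackets as an Audio Tag.

```python
import re

# Simplifying assumption: a tag is lowercase words inside square brackets.
TAG_PATTERN = re.compile(r"\[([a-z][a-z ]*)\]")


def extract_tags(text):
    """Return the tag names found in the preview text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def check_preview_text(text):
    """Report the length, whether it meets the 100-1,000 limit, and the tags used."""
    length = len(text)
    return {
        "length": length,
        "within_limits": 100 <= length <= 1000,
        "tags": extract_tags(text),
    }


short_example = (
    "[whispers] I have a secret to tell you. "
    "[excited] And you are not going to believe it!"
)
extended_example = short_example + " [sighs] Fine, I will tell you, but you have to promise to keep it quiet."
```

Running check_preview_text on short_example shows it falls under the 100-character minimum, while extended_example passes and exercises three different emotional directions in one preview.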
Voice Design v3 in 2027
The Voice Design feature appears to be on a clear development trajectory. The eleven_ttv_v3 model is likely to be followed by versions with improved accent fidelity (particularly for non-Western accents that are currently underserved), better reproduction of age-specific vocal characteristics, and finer control over prosodic features like rhythm and stress patterns. The reference audio hybrid generation feature, currently available in the eleven_ttv_v3 model, will likely become more sophisticated, allowing multi-speaker reference audio and more nuanced blending between prompt and reference. The character limit for prompts (currently capped at 1,000 characters) may expand as the model becomes better at interpreting longer, more detailed character descriptions.
Key Takeaways
- Voice Design v3 creates completely original synthetic AI voices from text descriptions alone — no recording required. Specify age, gender, accent, tone, pace, and character traits for best results.
- Use the eleven_ttv_v3 model ID in the API. The guidance_scale parameter at 0.2–0.4 with detailed prompts produces the most natural results.
- Include Audio Tags in preview text to test emotional performance before saving — catch character-emotion mismatches before committing to production use.
- Designed voices are permanent: unlike Default voices (expiring December 31, 2026), they belong to your account indefinitely.
- The reference audio parameter allows hybrid generation, blending the prompt description with an acoustic reference at adjustable strength.
Conclusion
ElevenLabs Voice Design v3 is the most accessible path to a completely unique AI voice identity — no recording session, no audio equipment, no real person’s voice required. For creators building character libraries, branded voice identities, or anonymised content voices, it delivers distinct, quality synthetic voices in under two minutes. The key to strong results is prompt specificity: describe the voice like a casting director — age, background, emotional character, speaking style, and accent — and test with Audio Tags before committing. With Default voices expiring at the end of 2026, Voice Design is the most future-proof replacement option available to any creator or developer currently depending on ElevenLabs’ pre-loaded voice set.
Frequently Asked Questions
What is ElevenLabs Voice Design v3?
Voice Design v3 is ElevenLabs’ text-to-voice generation tool that creates completely original synthetic voices from natural language descriptions. Powered by the eleven_ttv_v3 model, it generates three unique voice previews from a prompt and saves the chosen voice permanently to your account for use in TTS, Studio, and API integrations.
Is Voice Design the same as Voice Cloning?
No. Voice Design creates entirely new synthetic voices that have never existed — generated from text prompts. Voice Cloning (Instant or Professional) replicates the voice characteristics of a real audio recording. Use Voice Design when you want an original voice. Use Voice Cloning when you want to replicate a specific existing voice.
How long do Voice Design voices last?
Permanently. Designed voices are saved to your account and are not subject to deprecation. Unlike Default voices (expiring December 31, 2026) or Library voices (dependent on the creator maintaining them), designed voices belong to your workspace indefinitely.
What model does Voice Design v3 use?
The eleven_ttv_v3 model. In the API, specify model_id: ‘eleven_ttv_v3’. The previous version used eleven_multilingual_ttv_v2, which remains available but produces less expressive and less emotionally nuanced output than v3.
Can Voice Design v3 create accented voices?
Yes. Voice Design v3 supports 32 languages and can generate voices with specific regional accents. Include an explicit accent description in your prompt (‘a native Spanish speaker from Buenos Aires’ or ‘a thick Scottish accent, Edinburgh region’) for the most accurate accent reproduction. Lower guidance scale values produce more natural-sounding accents than high guidance scale values.
Methodology
Voice Design v3 capabilities from ElevenLabs official Voice Design documentation at elevenlabs.io/voice-design and elevenlabs.io/docs/eleven-creative/voices/voice-design. API parameters from ElevenLabs API reference at elevenlabs.io/docs/api-reference/text-to-voice/design. Model comparison from ElevenLabs official voices documentation. Prompt technique guidance from ElevenLabs official prompting documentation and editorial team testing. This article was drafted with AI assistance and reviewed by the editorial team at ElevenLabsMagazine.com.
References
ElevenLabs. (2026). Voice Design. https://elevenlabs.io/voice-design
ElevenLabs. (2026). Voice Design documentation. https://elevenlabs.io/docs/eleven-creative/voices/voice-design
ElevenLabs. (2026). Design a voice API reference. https://elevenlabs.io/docs/api-reference/text-to-voice/design
ElevenLabs. (2026). Voices documentation. https://elevenlabs.io/docs/overview/capabilities/voices
