ElevenLabs Pronunciation Dictionary 2026: Complete Guide to Custom Pronunciations

The ElevenLabs Pronunciation Dictionary lets users define custom pronunciation rules for specific words or phrases, which ElevenLabs TTS models then apply when generating audio containing those words. The models are trained on large text corpora and may mishandle specialised vocabulary; the dictionary supplies explicit pronunciation instructions that override their default inference.

Dictionaries are created and managed in the ElevenLabs dashboard under the Pronunciation Dictionaries section, and can also be created and modified via the API. Each dictionary contains a set of rules — each rule associating a specific word or phrase with either an IPA phoneme representation or an alias substitution. When a TTS request references a dictionary, the model checks the input text against the dictionary rules and applies the specified pronunciation wherever a match is found.

Creating a Pronunciation Dictionary

In the Dashboard

Navigate to ElevenLabs dashboard → Pronunciation Dictionaries (in the left sidebar or under Settings) → Create new dictionary. Give the dictionary a name relevant to its use case (e.g., ‘Medical Terms’, ‘Product Names’, ‘Brand Glossary’). Add rules by entering the word and selecting either IPA phoneme or alias as the method. Save the dictionary — it is now available to apply to any TTS generation in the dashboard or via API.

Via the API

The Pronunciation Dictionary API allows programmatic creation, modification, and application of dictionaries. Key endpoints:

  • POST /v1/pronunciation-dictionaries/add-from-file creates a new dictionary from a PLS (Pronunciation Lexicon Specification) file; POST /v1/pronunciation-dictionaries/add-from-rules creates one from inline rules.
  • GET /v1/pronunciation-dictionaries lists all dictionaries in the workspace.
  • POST /v1/pronunciation-dictionaries/{dictionary_id}/add-rules adds new rules to an existing dictionary.
  • POST /v1/pronunciation-dictionaries/{dictionary_id}/remove-rules removes specific rules.

To apply a dictionary to a TTS request, include the pronunciation_dictionary_locators parameter in the TTS API call: an array of objects, each containing the dictionary's ID (pronunciation_dictionary_id in the API reference) and a version_id. Up to 3 dictionaries can be applied per request.
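The calls described above can be sketched with the standard library alone. The snippet below only builds the HTTP requests and does not send them. The base URL, the add-from-rules path, the xi-api-key header, the rule field names, and every ID shown are assumptions drawn from the public API reference as we understand it; check them against the current documentation before relying on them.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io"  # assumed base URL


def build_create_request(api_key: str, name: str, rules: list[dict]) -> urllib.request.Request:
    """Build (not send) a request that creates a dictionary from inline rules."""
    body = json.dumps({"name": name, "rules": rules}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/pronunciation-dictionaries/add-from-rules",  # assumed path
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def build_tts_request(api_key: str, voice_id: str, text: str,
                      locators: list[dict]) -> urllib.request.Request:
    """Build (not send) a TTS request that references existing dictionaries."""
    body = json.dumps({
        "text": text,
        "pronunciation_dictionary_locators": locators,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/text-to-speech/{voice_id}",  # assumed path
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values for illustration only.
create_req = build_create_request(
    "YOUR_API_KEY",
    "Brand Glossary",
    [{"type": "alias", "string_to_replace": "NGINX", "alias": "engine-X"}],
)
tts_req = build_tts_request(
    "YOUR_API_KEY",
    "YOUR_VOICE_ID",
    "Deploy NGINX behind the load balancer.",
    [{"pronunciation_dictionary_id": "DICT_ID", "version_id": "VERSION_ID"}],
)
```

Passing either object to urllib.request.urlopen would send it. The response to the create call is expected to carry the dictionary ID and version ID that the locator needs.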

IPA Phonemes vs Aliases: Which to Use

Method | How It Works | Best For | Requires Phonetics Knowledge | Example
IPA Phoneme | Defines exact phonetic representation using IPA notation | Technical/medical terms, foreign names, precise phonetic control | Yes (IPA notation required) | ‘GIF’: /dʒɪf/ versus /ɡɪf/
Alias | Replaces the word with a different spelling the model reads naturally | Brand names, abbreviations, technical terms where a phonetic spelling exists | No (just write how it sounds) | ‘SQL’: ‘sequel’ or ‘S Q L’
Alias (number) | Replaces number formatting for correct reading | Phone numbers, codes, model numbers | No | ‘GPT-4’: ‘G P T 4’ or ‘GPT Four’

When to Use IPA

IPA phonemes provide the most precise control over pronunciation and are the correct choice when: the word has a specific phonetic realisation that cannot be approximated by a natural English spelling alternative; you need consistent pronunciation across different voices; or the word is from a language with non-English phonemes that English spelling cannot represent. IPA is particularly useful for medical terminology (pharmaceutical drug names, anatomical terms), foreign proper nouns (names, cities, companies), and technical terms with counter-intuitive pronunciation. Phoneme rules may not be honoured by every model, so confirm support for your chosen model in the ElevenLabs best-practices documentation before relying on IPA.

When to Use Aliases

Aliases are faster to create, require no phonetics knowledge, and work well for most practical creator use cases. The model reads the alias text and applies its natural language inference: if the alias is written to sound like the intended pronunciation, the output will be correct. Use aliases for: brand names with unusual spelling (‘Xiaomi’ → ‘Shao-mee’), abbreviations where you want individual letters or a specific expansion (‘API’ → ‘A P I’ or ‘ay-pee-eye’), product model numbers (‘iPhone 16 Pro’ → ‘iPhone Sixteen Pro’), and technical terms where writing the phonetic spelling is straightforward.
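The two rule types can be represented as small dictionaries before they are sent anywhere. This is a minimal sketch: the field names (type, string_to_replace, alias, phoneme, alphabet) are assumptions about the API's inline rule format, and the helper names are our own.

```python
def alias_rule(word: str, spoken_as: str) -> dict:
    """Alias rule: the model reads `spoken_as` wherever `word` appears."""
    return {"type": "alias", "string_to_replace": word, "alias": spoken_as}


def phoneme_rule(word: str, ipa: str) -> dict:
    """IPA rule: strip the slashes used in prose (/dʒɪf/ becomes dʒɪf)."""
    return {
        "type": "phoneme",
        "string_to_replace": word,
        "phoneme": ipa.strip("/"),
        "alphabet": "ipa",
    }


rules = [
    alias_rule("Xiaomi", "Shao-mee"),
    alias_rule("API", "A P I"),
    phoneme_rule("GIF", "/dʒɪf/"),
]
```

Keeping rules as plain data makes it easy to start with aliases and switch a single word to IPA later if the alias does not sound right.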

Most Common Creator Use Cases

Brand Names and Product Names

Brand names are the most frequent source of AI mispronunciation in creator content. Names that seem obvious to a human reader (Hyundai, Xiaomi, LVMH, Worcestershire, Quay, Adobe, Ikea) are often mispronounced by AI models trained primarily on text rather than on spoken brand context. A brand-name dictionary is one of the first and most useful dictionaries to create for anyone producing content about technology, business, or consumer products.

Example rules: ‘Hyundai’ → alias ‘HUN-day’; ‘Xiaomi’ → alias ‘Shao-mee’; ‘Worcestershire’ → alias ‘WUS-ter-sheer’; ‘LVMH’ → alias ‘L V M H’; ‘Quay’ → alias ‘KEY’; ‘Adobe’ → alias ‘Ah-DOH-bee’ (for contexts where the model defaults to ‘ADD-obe’).

Technical and Medical Terminology

Technical content creators — technology reviewers, medical educators, legal content producers, scientific communicators — regularly encounter terms that AI models mispronounce because they appear rarely in the text corpora on which models are trained. Medical drug names, chemical compounds, legal Latin phrases, and engineering terminology all benefit from pronunciation dictionary entries.

Example rules: ‘GIF’ → IPA /dʒɪf/ (if you prefer the ‘jif’ pronunciation); ‘SQL’ → alias ‘sequel’ or ‘S Q L’ depending on context; ‘NGINX’ → alias ‘engine-X’; ‘kubectl’ → alias ‘kube-control’; ‘Kubernetes’ → alias ‘Koo-ber-NET-eez’; ‘ibuprofen’ → IPA /ˌaɪbjuːˈproʊfən/; ‘cortisol’ → IPA /ˈkɔːrtɪˌsɒl/.

Abbreviations and Acronyms

Abbreviations and acronyms are a particularly common source of mispronunciation because AI models must infer whether to read them as words (‘NASA’), individual letters (‘FBI’), or expansions (‘etc.’). The Pronunciation Dictionary removes this inference entirely by specifying exactly how each abbreviation should be read.

Example rules: ‘etc.’ → alias ‘et cetera’; ‘e.g.’ → alias ‘for example’; ‘i.e.’ → alias ‘that is’; ‘CEO’ → alias ‘C E O’; ‘FAQ’ → alias ‘F A Q’ or ‘fack’ depending on preference; ‘AI’ → alias ‘A I’; ‘API’ → alias ‘A P I’; ‘URL’ → alias ‘U R L’.
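Rules for initialisms follow a mechanical pattern, so they can be generated rather than typed. The sketch below is a hypothetical helper (not part of any ElevenLabs SDK) that produces a word-to-alias mapping for the examples above.

```python
def spell_out(acronym: str) -> str:
    """'CEO' -> 'C E O': separate the letters so each is read on its own."""
    return " ".join(ch for ch in acronym if ch.isalnum())


def build_abbreviation_aliases(initialisms: list[str],
                               expansions: dict[str, str]) -> dict[str, str]:
    """Combine letter-by-letter aliases with fixed expansions."""
    aliases = {word: spell_out(word) for word in initialisms}
    aliases.update(expansions)  # explicit expansions take precedence
    return aliases


abbreviation_aliases = build_abbreviation_aliases(
    ["CEO", "AI", "API", "URL"],
    {"etc.": "et cetera", "e.g.": "for example", "i.e.": "that is"},
)
```

Acronyms that should be read as words, such as ‘NASA’, are simply left out of the list.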

Names from Non-English Languages

Proper nouns from non-English languages — names of people, cities, companies, and places — are consistently among the hardest words for AI models trained primarily on English text. French, German, Japanese, Chinese, Arabic, and other names all have phonetic patterns that English orthography does not represent, and AI models frequently default to English pronunciation rules that produce incorrect output.

Example rules: ‘Elon’ → alias ‘EE-lon’ if the model defaults to ‘ELL-on’ (‘Musk’ is usually read correctly and needs no rule); ‘Mati Staniszewski’ → alias ‘MAH-tee Stah-nee-SHEF-skee’; ‘Piotr Dąbkowski’ → alias ‘PYOH-tr Domp-KOF-skee’; ‘François’ → IPA /fʁɑ̃.swa/; ‘München’ → IPA /ˈmʏnçən/.

Related: For more on ElevenLabs TTS prompting techniques including punctuation and emphasis, see our Eleven v3 guide 2026

Applying Multiple Dictionaries to One Request

Up to 3 pronunciation dictionaries can be applied simultaneously to a single TTS request. This allows creators to maintain separate, specialised dictionaries for different vocabulary categories — a medical terms dictionary, a brand names dictionary, and a proper nouns dictionary — and apply all three to content that contains all three categories.

Dictionary version control: each time a dictionary is modified, a new version is created. The pronunciation_dictionary_locators parameter in the API requires both the dictionary_id and the version_id, meaning you can specify exactly which version of a dictionary to apply. This allows content regenerated at different points in time to use consistent dictionary versions, and allows controlled testing of dictionary updates before applying them to production generation.
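A small helper can enforce the three-dictionary cap and make version pinning explicit when assembling a request. This is a sketch under assumptions: the locator key pronunciation_dictionary_id is taken from our reading of the API reference, and all IDs are placeholders.

```python
from dataclasses import dataclass

MAX_DICTIONARIES_PER_REQUEST = 3


@dataclass(frozen=True)
class DictionaryVersion:
    dictionary_id: str
    version_id: str


def build_locators(dictionaries: list[DictionaryVersion]) -> list[dict]:
    """Turn pinned dictionary versions into the locator array for a TTS call."""
    if len(dictionaries) > MAX_DICTIONARIES_PER_REQUEST:
        raise ValueError(
            f"at most {MAX_DICTIONARIES_PER_REQUEST} dictionaries per request, "
            f"got {len(dictionaries)}"
        )
    return [
        {"pronunciation_dictionary_id": d.dictionary_id, "version_id": d.version_id}
        for d in dictionaries
    ]


# Placeholder IDs for three specialised dictionaries.
locators = build_locators([
    DictionaryVersion("medical-terms-id", "v-med-1"),
    DictionaryVersion("brand-names-id", "v-brand-4"),
    DictionaryVersion("proper-nouns-id", "v-names-2"),
])
```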

PLS File Format for Bulk Dictionary Creation

For creators and developers who need to add large numbers of pronunciation rules, ElevenLabs supports PLS (Pronunciation Lexicon Specification) file upload — a standard XML format for pronunciation dictionaries that can be created in any text editor and uploaded via the API. The PLS format supports both IPA phoneme entries and alias entries, and allows batch creation of hundreds of rules without manual entry through the dashboard interface.

A basic PLS file structure: the root element is a lexicon element with version and alphabet attributes. Each entry is a lexeme element containing a grapheme element (the word to be matched) and either a phoneme element (IPA notation) or an alias element (the substitution text). Multiple grapheme elements in one lexeme allow a single rule to apply to variants of the same word — singular and plural, different capitalisation forms.
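The structure described above can be generated with Python's standard XML library. This is a minimal sketch: the namespace and attributes follow the W3C PLS 1.0 specification, while the entry format passed to build_pls is our own convention and the en-US language tag is an assumption.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_pls(entries: list[dict], lang: str = "en-US") -> str:
    """Build a PLS document; each entry has 'graphemes' plus 'phoneme' or 'alias'."""
    ET.register_namespace("", PLS_NS)  # emit PLS as the default namespace
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", f"{{{XML_NS}}}lang": lang},
    )
    for entry in entries:
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        for grapheme in entry["graphemes"]:  # variants share one rule
            ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        if "phoneme" in entry:
            ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = entry["phoneme"]
        else:
            ET.SubElement(lexeme, f"{{{PLS_NS}}}alias").text = entry["alias"]
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(lexicon, encoding="unicode")


pls_xml = build_pls([
    {"graphemes": ["nginx", "NGINX"], "alias": "engine-X"},
    {"graphemes": ["GIF"], "phoneme": "dʒɪf"},
])
```

Writing pls_xml to a UTF-8 encoded .pls file gives a document suitable for upload.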

Three Insights Most Pronunciation Dictionary Guides Miss

1. Dictionaries Apply to All Models — Including Flash v2.5 for Real-Time Agents

Most documentation examples for Pronunciation Dictionary show it applied to standard TTS generation. The same dictionaries apply to Flash v2.5 voice agent responses — meaning that a voice agent handling customer calls can be configured with a domain-specific pronunciation dictionary so that product names, technical terms, and company-specific terminology are pronounced correctly in real-time conversation. A customer service voice agent for a pharmaceutical company can have drug name pronunciations pre-loaded; a technical support agent can have product model names correctly pronounced. This real-time application of the Pronunciation Dictionary is rarely documented but practically significant for enterprise voice agent deployments.

2. Dictionary Versions Enable Safe A/B Testing of Pronunciation Changes

When a creator adds or modifies a rule in a pronunciation dictionary, a new version is created. If the creator regenerates existing content using the new dictionary version and finds the change has caused unexpected problems in adjacent words, they can revert by specifying the previous version_id in the API call. This version control mechanism makes pronunciation dictionary development safe to iterate on — changes to pronunciation rules cannot corrupt existing approved content because version_id pins the dictionary state used for any given generation.
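On the client side, this pinning amounts to remembering which version_id each piece of content was generated with. The class below is purely local, hypothetical bookkeeping; it does not call the ElevenLabs API, and the version IDs are placeholders.

```python
class VersionPins:
    """Track the dictionary version used for each piece of content."""

    def __init__(self) -> None:
        self._history: dict[str, list[str]] = {}

    def record(self, content_id: str, version_id: str) -> None:
        """Note that content was (re)generated with this dictionary version."""
        self._history.setdefault(content_id, []).append(version_id)

    def current(self, content_id: str) -> str:
        """Version to pass as version_id on the next generation."""
        return self._history[content_id][-1]

    def revert(self, content_id: str) -> str:
        """Drop the latest version and return the one before it."""
        versions = self._history[content_id]
        if len(versions) < 2:
            raise ValueError("no earlier version to revert to")
        versions.pop()
        return versions[-1]


pins = VersionPins()
pins.record("episode-12", "version-a")  # approved narration
pins.record("episode-12", "version-b")  # trial with a new rule
rolled_back_to = pins.revert("episode-12")  # the new rule caused problems
```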

3. The Alias Method Can Fix Model Errors Beyond Pronunciation

Aliases are usually described as a pronunciation tool: you provide a spelling substitution that produces the correct sound. They also fix model errors beyond pronunciation: units of measurement that the model reads incorrectly (‘km/h’ → alias ‘kilometres per hour’), currency formats (‘$1.5M’ → alias ‘1.5 million dollars’), date formats (‘05/07/26’ → alias ‘May seventh, twenty twenty-six’), and mathematical notation (‘x^2’ → alias ‘x squared’). The alias substitution happens before TTS generation, so anything that produces wrong output due to text format ambiguity can be corrected through dictionary rules.
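To preview what text the model would receive after alias substitution, the replacement can be approximated locally. This sketch is our own case-sensitive, whole-token matcher and is not the matching logic ElevenLabs uses; treat it as a drafting aid only.

```python
import re


def preview_aliases(text: str, aliases: dict[str, str]) -> str:
    """Replace each alias key with its spoken form, longest keys first."""
    if not aliases:
        return text
    keys = sorted(aliases, key=len, reverse=True)
    # Lookarounds stop a key from matching inside a longer word or number.
    pattern = re.compile(
        "|".join(r"(?<!\w)" + re.escape(key) + r"(?!\w)" for key in keys)
    )
    return pattern.sub(lambda match: aliases[match.group(0)], text)


format_aliases = {
    "km/h": "kilometres per hour",
    "$1.5M": "1.5 million dollars",
    "x^2": "x squared",
}
preview = preview_aliases(
    "Top speed is 90 km/h and revenue was $1.5M.", format_aliases
)
```

Reading the preview aloud is a quick way to catch an alias that fixes one phrase but reads awkwardly in another.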

Key Takeaways

  • ElevenLabs Pronunciation Dictionary defines custom pronunciation rules for specific words — fixing AI mispronunciations of technical terms, brand names, abbreviations, and foreign proper nouns.
  • Two methods: IPA phonemes (precise, requires phonetic notation knowledge) and aliases (faster, write how it sounds). Aliases handle most practical creator use cases without phonetics knowledge.
  • Up to 3 dictionaries per TTS request. Dictionary versions allow safe iteration. PLS file format enables bulk rule creation.
  • Apply to voice agents as well as standard TTS — Flash v2.5 voice agents can use pronunciation dictionaries for domain-specific terminology in real-time customer conversations.
  • Alias method also fixes text format ambiguities — currency, dates, units, mathematical notation — beyond pronunciation errors.

Conclusion

The ElevenLabs Pronunciation Dictionary is the feature that transforms good AI narration into professional AI narration. Any creator producing content with technical vocabulary, brand names, or proper nouns from non-English languages will encounter AI mispronunciations that break listener trust and require awkward workarounds. The Pronunciation Dictionary solves these systematically and permanently: create the rule once, apply it to every generation of content that contains that word. Start with your most frequently mispronounced terms — typically brand names and technical acronyms — build your first dictionary in under 30 minutes, and expand it as you encounter new mispronunciations in production content.

Frequently Asked Questions

What is the ElevenLabs Pronunciation Dictionary?

A tool that defines custom pronunciation rules for specific words, ensuring ElevenLabs TTS generates them with the correct pronunciation rather than relying on the model’s inference. Supports IPA phoneme notation and alias (text substitution) methods. Up to 3 dictionaries can be applied per TTS request.

How do I fix mispronunciations in ElevenLabs?

Create a Pronunciation Dictionary in the ElevenLabs dashboard → add a rule for the mispronounced word → use either IPA phoneme notation (precise) or an alias substitution (write the phonetic spelling). Apply the dictionary to your TTS request. Alternatively, use phonetic spelling directly in the source text — but a dictionary is more efficient for recurring terms.

What is the difference between IPA phonemes and aliases?

IPA phonemes define the precise phonetic representation of a word using International Phonetic Alphabet notation — highly accurate but requires knowledge of IPA. Aliases substitute the word with alternative text that the model reads naturally — faster to create and requires no phonetics knowledge. For most creator use cases, aliases are sufficient.

Can I apply a pronunciation dictionary to voice agents?

Yes — pronunciation dictionaries can be applied to Flash v2.5 voice agent responses as well as standard TTS generation. This allows domain-specific terminology to be correctly pronounced in real-time customer service and voice agent conversations.

How many rules can a pronunciation dictionary contain?

ElevenLabs does not publish a maximum rule count for pronunciation dictionaries. In practice, dictionaries containing hundreds of rules are supported through PLS file upload. The practical limit is the 3-dictionary maximum per TTS request — if your rule set is very large, organise rules into topic-specific dictionaries and apply the relevant ones per content type.

Methodology

Pronunciation Dictionary features from ElevenLabs official documentation at elevenlabs.io/docs and the Pronunciation Dictionary API reference. IPA versus alias comparison from ElevenLabs official documentation. PLS format support from ElevenLabs API documentation. Voice agent application from ElevenLabs Conversational AI documentation and editorial team testing. Use case examples from editorial team testing and creator community reports. This article was drafted with AI assistance and reviewed by the editorial team at ElevenLabsMagazine.com.

References

ElevenLabs. (2026). Pronunciation Dictionary documentation. https://elevenlabs.io/docs/eleven-creative/voices/pronunciation-dictionaries

ElevenLabs. (2026). Pronunciation Dictionary API reference. https://elevenlabs.io/docs/api-reference/pronunciation-dictionaries

ElevenLabs. (2026). Text to Speech best practices. https://elevenlabs.io/docs/overview/capabilities/text-to-speech/best-practices
