ElevenLabs video generation is a feature within ElevenLabs Studio that allows users to generate video from text prompts or reference images using leading third-party AI video models, then combine those videos with ElevenLabs’ native audio tools in a unified timeline editor. The feature is currently in beta and requires a paid plan; free plan users can access image generation only.
The video generation feature is accessed via the Image & Video section in ElevenLabs Studio. Users write a text prompt describing the video they want, select a model from the available options, configure model-specific settings (aspect ratio, duration, negative prompts, audio options depending on the model), and generate. Generated videos can be exported standalone or imported directly into a Studio project timeline for audio integration.
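A minimal sketch of what one generation's settings look like, expressed as a Python dict. The model identifier, field names, and values are illustrative only; ElevenLabs exposes these options through the Studio UI and has not published a video-generation API payload.

```python
# Illustrative only: these fields mirror the options the Studio UI exposes
# (model, prompt, negative prompt, aspect ratio, duration, audio). This is not
# a documented ElevenLabs API payload; treat every name here as an assumption.
generation_settings = {
    "model": "veo-3.1",              # assumed identifier; Studio lists "Veo 3.1"
    "prompt": (
        "A slow dolly shot through a rain-soaked neon alley at night, "
        "reflections on wet asphalt, cinematic colour grading"
    ),
    "negative_prompt": "text overlays, watermarks, distorted faces",
    "aspect_ratio": "16:9",
    "duration_seconds": 8,           # Veo 3.1 supports 4-8 second clips
    "generate_audio": True,          # Veo 3.1 can produce native ambient audio
}
```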
Available Video Models
| Model | Provider | Resolution | Duration | Credits/Gen | Best For |
|---|---|---|---|---|---|
| Sora 2 Pro | OpenAI | 720p or 1080p | 4, 8, or 12 sec | 12,000 | Cinematic quality, complex motion, physics simulation |
| Sora 2 | OpenAI | 720p | 4, 8, or 12 sec | Lower than Pro | General cinematic content at lower cost |
| Veo 3.1 | Google | Up to 1080p | 4-8 sec | 8,000 | Scene consistency, audio-native, brand storytelling |
| Veo 3 | Google | Up to 1080p | 4-8 sec | Lower than 3.1 | Creative control, negative prompts |
| Kling 2.5 | Kuaishou | Up to 4K | Variable | Lower cost | Best value at scale, native 4K output |
| Seedance 1 Pro | ByteDance | 1080p | Variable | Variable | Lip-sync accuracy, multilingual dialogue |
| Wan 2.5 | Alibaba | 1080p | Variable | Lowest | Open-source option, lowest credit cost |
| Flux 1 Kontext Pro | Black Forest Labs | Image only | N/A | Image credits | Image generation and editing |
Model Selection Guide
Sora 2 Pro — Best for Cinematic Hero Content
OpenAI’s Sora 2 Pro is the highest-quality video generation model available through ElevenLabs, producing cinematic-quality output at 720p or 1080p. Its strengths are physics simulation (realistic fluid, cloth, and object interaction), complex motion (camera movement, multi-character scenes), and overall visual fidelity. The 12,000-credit cost per generation makes it appropriate for hero content — brand films, product launches, key scenes — where quality justifies the premium. Sora 2 Pro does not support end-frame references in the current implementation, limiting precise scene control compared to some alternatives.
Veo 3.1 — Best for Audio-Native Cinematic Content
Google Veo 3.1 generates video with native audio — sound effects and ambient audio are generated simultaneously with the video rather than added in post-production. This makes Veo 3.1 particularly strong for content where the audiovisual feel matters as much as visual quality — brand storytelling, cinematic sequences, nature content. At 8,000 credits per generation (significantly less than Sora 2 Pro), Veo 3.1 offers strong quality at a more accessible price point for production-volume use.
Kling 2.5 — Best Value at Scale
Kling 2.5 from Kuaishou is the most cost-effective option for high-volume video production in ElevenLabs Studio. It is the only model with native 4K output capability, making it technically superior on resolution to competitors that require post-generation upscaling for 4K results. For creators producing large volumes of content — multiple videos per day, social media libraries, course content — Kling 2.5’s pricing makes it the practical production workhorse while premium models are reserved for high-stakes assets.
Seedance 1 Pro — Best for Lip-Sync and Dialogue
ByteDance’s Seedance 1 Pro is the first unified audio-video joint generation model — audio and video are generated simultaneously in a single model pass rather than as separate streams synchronized afterwards. This gives it the most accurate lip-sync of any model in the ElevenLabs suite, particularly for non-English languages where phoneme-level accuracy is technically more difficult. For talking-head content, educational videos with on-screen speakers, and multilingual dialogue content, Seedance 1 Pro produces the most natural lip-sync results.
Wan 2.5 — Open-Source Zero-Cost Option
Wan 2.5 is an open-source video generation model from Alibaba that can also be run locally — but through ElevenLabs Studio, it is accessible without local setup. It is the most affordable option for prototyping and iteration before committing premium credits to final renders. For testing prompt approaches, exploring visual styles, and generating reference variations, Wan 2.5 provides usable output at the lowest credit cost in the suite.
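The selection guide above can be condensed into a small lookup. This is simply the article's recommendations codified for reference; nothing here is an ElevenLabs API or identifier.

```python
# Codifies the model-selection guidance from this article. Use-case labels are
# invented for illustration; the model names are as listed in Studio.
RECOMMENDED_MODEL = {
    "hero_cinematic": "Sora 2 Pro",         # premium quality, 12,000 credits/gen
    "audio_native": "Veo 3.1",              # native ambient audio, 8,000 credits/gen
    "high_volume": "Kling 2.5",             # best value at scale, native 4K
    "lip_sync_dialogue": "Seedance 1 Pro",  # joint audio-video generation
    "prototyping": "Wan 2.5",               # lowest credit cost for iteration
}

def pick_model(use_case: str) -> str:
    """Return the model this guide recommends for a given use case."""
    return RECOMMENDED_MODEL.get(use_case, "Kling 2.5")  # default to the workhorse

print(pick_model("hero_cinematic"))  # -> Sora 2 Pro
```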
The Lip-Sync Workflow
ElevenLabs integrates two lip-sync tools alongside its video generation models: OmniHuman 1.5 (also listed as Omnihuman) for animating static images into talking videos, and Veed LipSync for dubbing existing video footage. These tools connect ElevenLabs’ core audio capabilities — Professional Voice Cloning, TTS narration — with the visual layer.
OmniHuman 1.5: Animate Static Images
OmniHuman 1.5 takes a still image of a person and animates it to match a provided audio track — producing a talking video from a single photo and a voice recording or TTS-generated audio. The workflow: generate narration using ElevenLabs TTS or a cloned voice, upload a portrait image, apply OmniHuman 1.5 to animate the image to the narration. This creates a talking-head video without any camera recording. For faceless YouTube channels, educational content, and branded spokesperson content, OmniHuman 1.5 eliminates the need to appear on camera while maintaining the engagement of a talking presenter.
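A sketch of the audio half of this workflow using the documented ElevenLabs text-to-speech endpoint. The API key, voice ID, and filenames are placeholders, and the OmniHuman animation step remains a manual Studio action because no public API for it is documented.

```python
import requests

# Step 1: generate the narration with the ElevenLabs text-to-speech endpoint.
API_KEY = "your-xi-api-key"
VOICE_ID = "your-voice-id"  # e.g. a Professional Voice Clone

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "text": "Welcome to this week's product update.",
        "model_id": "eleven_multilingual_v2",  # any TTS model you have access to
    },
)
resp.raise_for_status()
with open("narration.mp3", "wb") as f:
    f.write(resp.content)

# Step 2 happens in Studio: upload a portrait image and apply OmniHuman 1.5 to
# animate it to narration.mp3. No public API call for that step is documented,
# so it is not shown here.
```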
Veed LipSync: Dub Existing Video
Veed LipSync applies to existing video footage — it adjusts the on-screen speaker’s lip movements to match a new audio track. The primary use case is dubbing: take an existing English-language video, generate a Spanish, Portuguese, or Japanese voiceover using ElevenLabs Dubbing or TTS, and apply Veed LipSync to synchronise the speaker’s lip movements with the new language audio. The result is a dubbed video where the speaker appears to be speaking the target language natively rather than the audio appearing dubbed over original lip movements.
Related: Complete ElevenLabs Dubbing guide — multilingual content distribution workflow
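A sketch of the dubbing half of that pipeline using the ElevenLabs Dubbing API. The parameter names follow the dubbing endpoint's multipart form but should be verified against the current API reference; the Veed LipSync pass itself still happens inside Studio.

```python
import requests

API_KEY = "your-xi-api-key"

# Create a dubbing project: source English video in, Spanish voiceover out.
with open("talk_en.mp4", "rb") as src:
    resp = requests.post(
        "https://api.elevenlabs.io/v1/dubbing",
        headers={"xi-api-key": API_KEY},
        data={"target_lang": "es", "source_lang": "en"},
        files={"file": ("talk_en.mp4", src, "video/mp4")},
    )
resp.raise_for_status()
print(resp.json())  # contains the dubbing project id to poll and download from

# Final step is Studio-side: import the dubbed video into a project and apply
# Veed LipSync so the speaker's mouth matches the Spanish audio. There is no
# documented public API for that pass, so it is left as a manual action.
```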
4K Upscaling
ElevenLabs integrates Topaz Upscale for post-generation video enhancement, allowing generated videos to be upscaled up to 4x — delivering sharper, higher-fidelity output at 4K resolution from models that generate at 1080p. This addresses the limitation that most AI video models (including Sora 2 Pro and Veo 3.1) do not natively generate at 4K — Topaz Upscale produces 4K output from the 1080p source without the quality degradation of standard interpolation.
Upscaling is applied after generation in the Studio workflow: generate video at the model’s native resolution, review for quality, then apply Topaz Upscale if 4K output is required. This keeps upscaling as an optional step that adds quality for distribution contexts (YouTube 4K, broadcast, large-screen displays) without consuming the additional credits on every generation during iteration.
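The arithmetic behind "up to 4x" is straightforward: UHD 4K is 3840x2160, so a 1080p source needs a 2x upscale and a 720p source needs 3x, both within Topaz Upscale's range. A quick check:

```python
# Smallest uniform scale factor needed to cover UHD 4K from a given source.
TARGET_W, TARGET_H = 3840, 2160

def upscale_factor(width: int, height: int) -> float:
    """Scale factor required to reach 4K from the source resolution."""
    return max(TARGET_W / width, TARGET_H / height)

print(upscale_factor(1920, 1080))  # 2.0 -> 1080p needs a 2x upscale
print(upscale_factor(1280, 720))   # 3.0 -> 720p needs a 3x upscale
```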
Complete Creator Workflow in Studio
Step 1: Generate the visual
In ElevenLabs Studio, navigate to Image & Video. Select your model based on use case: Kling 2.5 for cost-efficient production, Sora 2 Pro for hero content, Seedance 1 Pro for dialogue content. Write a detailed prompt including subject, environment, camera motion, and style. Generate and review. Iterate with varied prompts on lower-cost models before committing to a premium generation.
Step 2: Export to Studio timeline
Click Export to Studio to import the generated video directly into a Studio project timeline. The video populates the video track with the full clip ready for audio layering.
Step 3: Add narration
Add a narration track using ElevenLabs TTS or your Professional Voice Clone. Use Eleven v3 with Audio Tags for emotional delivery direction. The text transcript auto-syncs with the video timeline — editing the transcript regenerates the corresponding audio segment without affecting adjacent clips.
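Audio Tags are written directly into the script text as square-bracket cues; the tags below follow the documented style for Eleven v3, though the exact tag vocabulary may vary.

```python
# Audio Tags are embedded in the script itself; Eleven v3 treats them as
# delivery direction rather than speaking them aloud. Example tags only.
script_with_tags = (
    "[excited] The new release is finally here. "
    "[whispers] And there is one feature nobody saw coming. "
    "[sighs] Let's walk through it."
)

# Send script_with_tags as the "text" field of the same text-to-speech request
# shown earlier, selecting an Eleven v3 model in Studio or via the API.
```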
Step 4: Score with ElevenMusic and SFX V2
Add a music track using ElevenMusic generation from a text prompt. Add sound effects on the SFX track using SFX V2 prompts. Both generate directly within Studio without requiring external tools or downloads.
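For creators who prefer to prepare assets outside Studio, the SFX layer can also be generated with the sound-generation endpoint; the parameter names below follow that API but should be checked against the current reference. The music track is generated from its prompt inside Studio.

```python
import requests

API_KEY = "your-xi-api-key"

# Generate an ambience/SFX clip from a text prompt with the sound-generation API.
resp = requests.post(
    "https://api.elevenlabs.io/v1/sound-generation",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "text": "rain on a tin roof with distant thunder",
        "duration_seconds": 8,
        "prompt_influence": 0.6,
    },
)
resp.raise_for_status()
with open("ambience.mp3", "wb") as f:
    f.write(resp.content)
```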
Step 5: Add lip-sync if needed
Apply OmniHuman 1.5 to animate any static character images to the narration audio, or apply Veed LipSync to dub dialogue content into additional languages.
Step 6: Upscale and export
Apply Topaz Upscale for 4K output if required. Export as MP4 with H.264 or H.265 encoding. The complete project — video, narration, music, SFX — exports as a single production-ready file.
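If a distribution target needs a specific codec profile beyond Studio's export presets, the exported MP4 can be re-encoded locally. This assumes ffmpeg is installed and sits entirely outside the ElevenLabs toolchain.

```python
import subprocess

# Optional post-export step: re-encode the Studio export to H.265 for delivery.
subprocess.run(
    [
        "ffmpeg", "-i", "studio_export.mp4",
        "-c:v", "libx265", "-crf", "23", "-preset", "medium",
        "-tag:v", "hvc1",   # makes the HEVC stream playable in Apple players
        "-c:a", "copy",     # keep the mixed narration/music/SFX track untouched
        "distribution_h265.mp4",
    ],
    check=True,
)
```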
Credit Economics: What Video Generation Costs
| Model | Credits/Generation | Equivalent TTS Characters | Monthly Pro Plan (500k credits) | Generations per Pro Month |
|---|---|---|---|---|
| Sora 2 Pro | 12,000 | 12,000 chars (~10 min audio) | 500,000 credits | ~41 generations |
| Veo 3.1 | 8,000 | 8,000 chars (~6.5 min audio) | 500,000 credits | ~62 generations |
| Kling 2.5 | Lower (est. 3,000-5,000) | Variable | 500,000 credits | ~100-165 generations |
| Seedance 1 Pro | Variable | Variable | 500,000 credits | Variable |
| Wan 2.5 | Lowest (est. 1,000-2,000) | Variable | 500,000 credits | ~250-500 generations |
The practical implication: video generation is not economical on the lower ElevenLabs subscription tiers for regular production use. The Pro plan at $99/month with 500,000 credits provides approximately 41 Sora 2 Pro generations per month, enough for roughly one premium video per day if each final render uses a single generation. The Scale plan at $330/month (2,000,000 credits) provides approximately 166 Sora 2 Pro generations, which is more appropriate for production-volume video content.
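A quick budgeting sketch using the figures above; the Kling 2.5 figure is this article's estimate rather than a published price.

```python
# Generations per month = plan credits // credits per generation.
# Kling 2.5 cost is an estimate from this article, not an official price.
PLAN_CREDITS = {"Pro": 500_000, "Scale": 2_000_000}
COST_PER_GEN = {"Sora 2 Pro": 12_000, "Veo 3.1": 8_000, "Kling 2.5 (est.)": 4_000}

for plan, credits in PLAN_CREDITS.items():
    for model, cost in COST_PER_GEN.items():
        print(f"{plan}: ~{credits // cost} {model} generations/month")
# Pro:   ~41 Sora 2 Pro, ~62 Veo 3.1, ~125 Kling 2.5 (est.)
# Scale: ~166 Sora 2 Pro, ~250 Veo 3.1, ~500 Kling 2.5 (est.)
```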
Related: Full ElevenLabs pricing guide — all subscription tiers and credit limits
Three Insights Most ElevenLabs Video Coverage Misses
1. The Aggregator Strategy Is a Competitive Moat, Not a Weakness
Coverage of ElevenLabs video generation often implies that not building proprietary video models is a competitive weakness. The opposite is more accurate. By aggregating Sora 2 Pro, Veo 3.1, Kling 2.5, and Seedance — models that represent billions of dollars of R&D from OpenAI, Google, and ByteDance — ElevenLabs provides access to enterprise-grade video models that most individual creators could not access directly through their native platforms’ waiting lists and partner programmes. ElevenLabs users access Sora 2 Pro as part of a $99/month Studio subscription that also includes TTS, music, and SFX. Accessing Sora 2 Pro directly through OpenAI’s API would require separate enterprise pricing and integration work.
2. Speech Correction Is the Most Underreported Feature
The Speech Correction workflow in ElevenLabs Studio — where editing a text transcript automatically regenerates the corresponding voiceover segment — is a more significant capability for production workflows than the video generation models themselves. Every video content creator who has experienced the pain of re-recording narration to fix a single mispronounced word understands the value: edit the text, the audio updates. No re-recording, no session rescheduling, no credit consumption on sections that did not change. This feature alone justifies Studio for video content creators even if they never use AI video generation.
3. Veo 3.1’s Native Audio Changes the Multimodal Workflow
Most AI video generation models produce silent video that requires audio to be added in post-production. Veo 3.1 generates ambient audio, sound effects, and environmental sound simultaneously with the video in a single generation pass. For content types where the audiovisual environment is integral — nature content, documentary footage, atmospheric scenes — Veo 3.1’s native audio eliminates an entire production step. The audio generated is not always perfect and often benefits from supplementation with ElevenLabs SFX V2 additions, but the baseline ambient audio from Veo 3.1 is usable for many content types without any additional production work.
ElevenLabs Video Generation in 2027
The trajectory of ElevenLabs’ video platform points toward three developments. First, lower credit costs as competition among video model providers drives pricing down — the current 12,000-credit Sora 2 Pro cost is likely to decline significantly over 12-18 months as more providers enter the market. Second, longer video generation — current models cap at 4-12 seconds per clip, requiring assembly of multiple clips for longer content; future model updates will extend this. Third, end-to-end video generation from script — the workflow currently requires separate generation of visuals, narration, and audio; the direction is toward providing a script and generating a complete video automatically.
Key Takeaways
- ElevenLabs video generation integrates Sora 2 Pro, Veo 3.1, Kling 2.5, Seedance, Wan, and Flux into Studio alongside native audio tools — a complete multimodal production environment.
- Use Kling 2.5 for cost-efficient production volume, Sora 2 Pro for hero cinematic content, Seedance 1 Pro for lip-sync and multilingual dialogue, Veo 3.1 for audio-native atmospheric content.
- Sora 2 Pro costs 12,000 credits per generation — video production requires Pro plan ($99/mo) or higher for regular use.
- OmniHuman 1.5 animates still images to narration audio. Veed LipSync dubs existing video into new languages. Both connect directly to ElevenLabs’ audio pipeline.
- Topaz Upscale delivers 4K output from 1080p generated video without standard interpolation quality loss.
Conclusion
ElevenLabs video generation transforms Studio from an audio production platform into a complete multimodal content creation environment. For creators who already use ElevenLabs for TTS and audio, the video integration provides the visual layer needed for complete video production without adding additional platform subscriptions. The credit economics require realistic planning — video generation is expensive relative to audio generation, and the Pro plan is the practical minimum for regular video production use. Start with Kling 2.5 or Wan 2.5 for prototyping, reserve Sora 2 Pro and Veo 3.1 for final production renders, and build the complete workflow in Studio to take advantage of the unified timeline that makes ElevenLabs video generation more than the sum of its individual model integrations.
Frequently Asked Questions
What video models does ElevenLabs support?
ElevenLabs Studio integrates OpenAI Sora 2 Pro and Sora 2, Google Veo 3.1 and Veo 3, Kling 2.5, Seedance 1 Pro, Wan 2.5, and Flux 1 Kontext Pro for images. Each model has different strengths, resolutions, durations, and credit costs.
How much does ElevenLabs video generation cost?
Sora 2 Pro costs 12,000 credits per generation. Veo 3.1 costs approximately 8,000 credits. Kling 2.5 and Wan 2.5 are lower cost. Video generation requires a paid plan — free plan users can only generate images with a limit of three per day.
Can ElevenLabs generate 4K video?
Kling 2.5 natively supports 4K output. Other models generate at up to 1080p, with Topaz Upscale available to enhance to 4K resolution post-generation.
What is the lip-sync feature in ElevenLabs?
ElevenLabs integrates OmniHuman 1.5, which animates a still portrait image to match audio narration, and Veed LipSync, which synchronises lip movements in existing video to match dubbed audio in a different language.
Is ElevenLabs video generation out of beta?
As of May 2026, video generation remains in beta on paid plans. Feature availability and credit pricing may change as the feature develops toward general availability.
Methodology
Video model specifications from ElevenLabs official video generation documentation at elevenlabs.io/video and elevenlabs.io/docs/creative-platform/playground/image-video. Model capabilities from Winbuzzer ElevenLabs Studio video generation coverage (November 2025). Credit costs from ElevenLabs official pricing documentation and XYZEO ElevenLabs review (February 2026). AI video model comparison from Lushbinary AI video generation comparison (February 2026). Lip-sync tool specifications from ElevenLabs official documentation. This article was drafted with AI assistance and reviewed by the editorial team at ElevenLabsMagazine.com.
References
ElevenLabs. (2026). Video Generation. https://elevenlabs.io/video
ElevenLabs. (2026). Image & Video Documentation. https://elevenlabs.io/docs/creative-platform/playground/image-video
Winbuzzer. (November 2025). ElevenLabs Studio Pivots to Image and Video Generation. https://winbuzzer.com/2025/11/20/elevenlabs-pivots-to-image-and-video-generation/
Lushbinary. (February 2026). AI Video Generation 2026: Sora 2 vs Veo 3.1 vs Kling 3.0 Compared. https://lushbinary.com/blog/ai-video-generation-sora-veo-kling-seedance-comparison/
