Seedance 2.0
by ByteDance · released 2026-02
ByteDance's ground-up rebuild — up to 2K with native synchronized audio and a 12-asset multi-reference input.
When should you use Seedance 2.0?
Use Seedance 2.0 when one shot needs many references at once: a single 4-15s generation accepts up to 12 reference assets in total, capped per type at 9 images, 3 videos, and 3 audio clips, and renders at up to 2K with synchronized audio. It's the highest-resolution audio-native option in the 2026 lineup. Pick Veo 3.1 instead when shot grammar matters more than reference compositing.
TL;DR: Seedance 2.0 wins when a shot needs multiple references at once, with up to 12 assets (at most 9 images, 3 videos, and 3 audio clips) feeding a single 2K generation with synced audio.
Specs
| Spec | Value |
| --- | --- |
| Clip length | 4s to 15s |
| Resolution | Up to 2K (1080p and 720p tiers also available) |
| Frame rate | 24fps or 30fps |
| Aspect ratios | 16:9, 9:16, 1:1, 4:3, 3:4 |
| Native audio | Yes: stereo, synchronized in one pass |
| Reference inputs | Up to 12 assets in total; per-type caps of 9 images, 3 videos, 3 audio clips |
| Modes | Text-to-video, image-to-video, multi-reference |
| Access | ByteDance Seed, fal.ai, Replicate, aggregators (ShortsFast) |
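The limits in the spec table can be checked before a generation is submitted. The sketch below is a minimal pre-flight validator, not part of any Seedance SDK: `ShotSettings`, `validate`, and every field name are hypothetical, and reading "12 assets" as a combined cap on top of the 9/3/3 per-type caps is an assumption.

```python
from dataclasses import dataclass

# Limits copied from the spec table. All names here are hypothetical
# and do not correspond to a real Seedance API.
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
RESOLUTIONS = {"720p", "1080p", "2K"}
FRAME_RATES = {24, 30}
MAX_PER_TYPE = {"image": 9, "video": 3, "audio": 3}
MAX_TOTAL_REFS = 12  # assumption: combined cap across all reference types


@dataclass
class ShotSettings:
    duration_s: int
    resolution: str = "2K"
    fps: int = 24
    aspect_ratio: str = "16:9"
    images: int = 0
    videos: int = 0
    audio: int = 0


def validate(s: ShotSettings) -> list[str]:
    """Return a list of problems; an empty list means the settings fit the table."""
    problems = []
    if not 4 <= s.duration_s <= 15:
        problems.append("duration must be 4-15 seconds")
    if s.resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {s.resolution}")
    if s.fps not in FRAME_RATES:
        problems.append(f"unsupported frame rate: {s.fps}")
    if s.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {s.aspect_ratio}")
    counts = {"image": s.images, "video": s.videos, "audio": s.audio}
    for kind, count in counts.items():
        if count > MAX_PER_TYPE[kind]:
            problems.append(f"too many {kind} references: {count} > {MAX_PER_TYPE[kind]}")
    if sum(counts.values()) > MAX_TOTAL_REFS:
        problems.append(f"at most {MAX_TOTAL_REFS} reference assets in total")
    return problems


if __name__ == "__main__":
    print(validate(ShotSettings(duration_s=10, aspect_ratio="9:16", images=2, audio=1)))
```

Returning a list of problems rather than raising on the first one lets a caller show every issue with a shot in one pass.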
Best for
- Multi-reference compositions: up to 12 assets in one prompt (at most 9 image refs, 3 video refs, and 3 audio refs)
- 2K vertical output for premium TikTok / Reels finals where 1080p shows compression
- Mid-length narrative shots up to 15 seconds with synced ambient + dialogue in a single pass
Weak at
- Strict camera-grammar adherence: Veo 3.1 still follows specific lens/lighting directions more literally
- Generations longer than 15 seconds: that's the per-call ceiling
- Tiny on-frame text (logos, dense UI) at sub-1080p crops
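Because 15 seconds is the per-call ceiling, a longer sequence has to be planned as several generations and joined in an editor afterwards. The sketch below only does the arithmetic of splitting a target runtime into clips that each fit the 4-15s range; `plan_segments` is a hypothetical helper, and nothing in this guide promises visual continuity between separately generated clips.

```python
import math

MIN_CLIP_S = 4   # shortest clip length from the spec table
MAX_CLIP_S = 15  # per-call ceiling from the spec table


def plan_segments(total_s: int) -> list[int]:
    """Split a target runtime into the fewest near-equal clips of 4-15 seconds."""
    if total_s < MIN_CLIP_S:
        raise ValueError(f"total runtime must be at least {MIN_CLIP_S} seconds")
    count = math.ceil(total_s / MAX_CLIP_S)
    base, extra = divmod(total_s, count)
    # The first `extra` clips carry one additional second each.
    return [base + 1] * extra + [base] * (count - extra)


if __name__ == "__main__":
    print(plan_segments(40))
```

Spreading the runtime evenly avoids a short leftover clip at the end, which a greedy 15 + 15 + 10 style split would produce.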
Prompt structure
- Subject — clear noun phrase tied to a reference if you have one
- Action — single beat with a motion endpoint
- Environment — 3 concrete elements, no adjective dump
- Camera — one shot + one move
- Lighting — direction + quality
- Audio — dialogue in quotes, ambient bed described separately
- References — list each attached asset and what it's for ('image_1: subject', 'image_2: location')
Paste-ready recipes
Multi-reference UGC ad (10s, 2K)
Reference image_1: subject_portrait.png. Reference image_2: kitchen_set.png. Reference audio_1: ambient_morning.wav. Animate: the woman from image_1 stands at the counter from image_2, lifts a matte-black mug, and says, "This took me ten seconds." Medium close-up, 35mm, slow push-in. Soft window light from camera-left. Audio bed from audio_1 at -12dB under the dialogue. Style: 2020s Apple commercial, shallow depth of field, 9:16 vertical, 2K.
Note: Name each reference asset by its indexed label (image_1, image_2, audio_1) in the prompt. Seedance 2.0's multi-reference parser binds assets by those labels.
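The indexed labels can be generated from the attached filenames instead of typed by hand. The sketch below assumes labels count up per asset type in attachment order, as in the recipe above; `label_references`, `reference_preamble`, and the file-extension mapping are all hypothetical, and the guide does not state which file formats Seedance 2.0 accepts.

```python
from collections import defaultdict
from pathlib import PurePath

# Assumed mapping from file extension to asset type (not from vendor docs).
KIND_BY_SUFFIX = {
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".mp4": "video", ".mov": "video",
    ".wav": "audio", ".mp3": "audio",
}


def label_references(filenames: list[str]) -> dict[str, str]:
    """Assign image_1, image_2, audio_1, ... labels in attachment order."""
    counters: dict[str, int] = defaultdict(int)
    labels: dict[str, str] = {}
    for name in filenames:
        kind = KIND_BY_SUFFIX.get(PurePath(name).suffix.lower())
        if kind is None:
            raise ValueError(f"unrecognized reference type: {name}")
        counters[kind] += 1
        labels[f"{kind}_{counters[kind]}"] = name
    return labels


def reference_preamble(filenames: list[str]) -> str:
    """Build the 'Reference <label>: <file>.' sentences that open a prompt."""
    labels = label_references(filenames)
    return " ".join(f"Reference {label}: {name}." for label, name in labels.items())


if __name__ == "__main__":
    print(reference_preamble(["subject_portrait.png", "kitchen_set.png", "ambient_morning.wav"]))
```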
Mid-length cinematic (15s)
Wide drone shot of a coastal cliff at sunrise. Beat 1 (0-5s): camera rises from sea level, revealing the cliff face. Beat 2 (5-10s): a single hiker walks along the ridge from left to right. Beat 3 (10-15s): camera pivots into the rising sun, lens flare, hold. 24mm equivalent, smooth gimbal. Cool ambient + warm rim from sun. Audio: wind, distant gulls, no music. Style: nature documentary, 16:9, 2K.
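The beat timestamps in this recipe are an even split of the clip length, which can be computed rather than worked out by hand. In the sketch below, `beat_markers` is a hypothetical helper; it uses integer division, so with lengths that do not divide evenly the last beat absorbs the remainder.

```python
def beat_markers(total_s: int, beats: list[str]) -> str:
    """Prefix each beat description with 'Beat N (start-end s):' using an even split."""
    if not beats:
        raise ValueError("at least one beat is required")
    n = len(beats)
    # Integer edges: beat i runs from edges[i] to edges[i + 1].
    edges = [i * total_s // n for i in range(n + 1)]
    return " ".join(
        f"Beat {i + 1} ({edges[i]}-{edges[i + 1]}s): {text}"
        for i, text in enumerate(beats)
    )


if __name__ == "__main__":
    print(beat_markers(15, [
        "camera rises from sea level, revealing the cliff face.",
        "a single hiker walks along the ridge from left to right.",
        "camera pivots into the rising sun, lens flare, hold.",
    ]))
```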
Start+end product reveal (8s)
Reference image_1: shoe_packshot_front.png. Reference image_2: shoe_packshot_3q.png. Animate: the shoe rotates from front view to a three-quarter angle, then a soft spotlight pans across it. Locked macro, 100mm. Hard rim light from behind, soft fill front, deep black background. Audio: subtle whoosh on the spotlight pan. Style: sneaker drop ad, 1:1, 2K.
Multi-character dialogue (12s)
Two friends in their late 20s sit on a rooftop at golden hour. Beat 1: the first laughs and says, "You actually shipped it." Beat 2: the second smirks and replies, "Day six." Locked medium two-shot, 50mm. Warm 3500K key from camera-right, cool fill from sky. Audio: both voices clear, faint city traffic below, no music. Style: indie short film, 9:16, 2K.
FAQ
Is Seedance 2.0 just Seedance 1.0 with more parameters?
No. ByteDance positions Seedance 2.0 as a ground-up rebuild — different architecture, different training, native audio added in-pass, and a multi-reference input system that 1.0 never had. The 2.0 release shipped February 7, 2026.
How many reference assets can Seedance 2.0 take?
Up to 12 per generation, with per-type caps of 9 images, 3 videos, and 3 audio clips. The caps sum to 15, so not all three types can be maxed out in the same call. This is the largest multi-reference window of any 2026-era video model and the headline reason to pick Seedance for compositional shots that mix character refs, location refs, and audio bedding.
Does Seedance 2.0 generate audio?
Yes — stereo, synchronized in the same generation pass. Both spoken dialogue (when quoted in the prompt) and ambient bed render together. This puts Seedance 2.0 alongside Veo 3.1 and Sora 2 on the audio-native side of the line, opposite Kling 2.5 Turbo.
Seedance 2.0 vs Veo 3.1 — which one?
Pick Seedance 2.0 when you need multiple references in one shot or 2K output. Pick Veo 3.1 when shot grammar (specific lens, specific lighting setup, specific camera move) needs to land literally. Both are audio-native; Seedance has the higher resolution ceiling, Veo has tighter directability.
Use Seedance 2.0 without the per-model subscription
ShortsFast bundles Seedance 2.0 with every other frontier model under one flat $20/mo plan.
Last updated 2026-04-27. ShortsFast has no affiliation with ByteDance. Specs are compiled from the vendor's public documentation and verified against primary sources on the date above.