Veo 3.1
by Google DeepMind · released 2025-10
Google's flagship text-to-video model with synchronized native audio, directable camera, and reference-image conditioning.
When should you reach for Veo 3.1?
Reach for Veo 3.1 when a shot needs cinematic camera grammar — directable lens and movement, natural depth, and synchronized native audio in one 4-8s pass at 720p or 1080p. It's the model to pick when shot direction matters more than narrative arc; for multi-beat storytelling at 15-25s, Sora 2 is the better fit.
TL;DR — Veo 3.1 is the model to reach for when a shot needs to feel cinematic — directable camera, natural depth, and synchronized audio in a single generation.
Specs
| Clip length | 4s, 6s, or 8s |
| Resolution | 720p or 1080p |
| Frame rate | 24fps |
| Aspect ratios | 16:9, 9:16 |
| Native audio | Yes — synchronized dialogue + ambient |
| Modes | Text-to-video, image-to-video, ingredients-to-video |
| API access | Google Cloud Vertex AI (and via aggregators like ShortsFast) |
| Tiers | Standard, Fast, and Lite — Lite ships at $0.05/sec with audio (≈1/4 of Standard) at the same speed as Fast |
Best for
- • Cinematic narrative shots with in-frame dialogue or ambient sound
- • Reference-image conditioning (ingredients-to-video) where a specific character or object must persist
- • Shots where camera grammar matters: dolly, push-in, tracking, rack focus
Weak at
- • Anything longer than 8 seconds — Veo 3.1 caps generation at 8s, so longer scenes require shot stacking
- • Hard physics edge cases (liquid pours, fine hand articulation) when prompts are vague
- • High-frequency strobing or rapid scene cuts inside a single clip
Prompt structure
- Subject — who or what, with one defining attribute
- Action — single verb-driven beat, no compound clauses
- Environment — location + time of day + weather
- Camera — shot size, lens, and movement (dolly / pan / static)
- Lighting — direction, quality (soft, hard), color temperature
- Audio — dialogue in quotes, ambient bed described separately
- Style — film stock, era, or director reference
Paste-ready recipes
Talking-head UGC ad (8s)
A 28-year-old woman in a sunlit kitchen, holding a matte-black coffee mug. She tilts the mug toward camera and says, "This took me ten seconds." Medium close-up, 35mm lens, slow push-in. Soft window light from camera-left, warm 4500K. Ambient: faucet drip, distant traffic. Style: 2020s Apple ad, shallow depth of field.
Note: Quote dialogue verbatim. Veo 3.1 lip-syncs the quoted line.
Cinematic establishing shot (6s)
Wide aerial shot of a fog-covered Norwegian fjord at sunrise. A single fishing boat cuts a wake across mirror-flat water. Camera: drone push-in from 200m altitude descending to 50m. Lighting: cold blue ambient, warm rim from rising sun. Audio: low boat engine hum, distant gulls. Style: Roger Deakins, 35mm anamorphic, slight haze.
Product hero (4s)
A pair of white running shoes spinning on a glossy black turntable in a studio. Camera: locked-off macro, 100mm lens. Lighting: hard rim from behind, soft fill from front, deep black background. Audio: subtle whoosh as the shoe rotates. Style: Nike commercial, high-key product photography, no people.
Image-to-video reference shot (6s)
Reference image: [character_portrait.png]. Animate: the woman from the reference image walks toward camera through a rain-slick Tokyo alley at night. Camera: slow dolly back, 50mm lens. Lighting: neon signs reflecting on wet pavement, magenta and cyan. Audio: rainfall on metal awnings, distant traffic. Style: Blade Runner 2049, anamorphic, shallow focus.
Note: Use ingredients-to-video for character persistence across multiple shots.
FAQ
What is the maximum clip length on Veo 3.1?
Veo 3.1 generates 4-, 6-, or 8-second clips. There is no longer-form mode. Scenes that need to run longer must be assembled by stacking multiple generations and matching motion at the cut.
Does Veo 3.1 generate audio natively?
Yes. Veo 3.1 produces synchronized audio in the same generation — both spoken dialogue (when quoted in the prompt) and ambient soundscape. This is the headline difference vs Kling 2.5 Turbo and Luma Ray 3, which are silent.
Can I use Veo 3.1 for vertical TikTok / Reels content?
Yes. Veo 3.1 supports 9:16 vertical natively at 720p or 1080p. Frame composition adapts — describe the shot for vertical specifically (e.g., 'tall portrait framing, subject centered') for best results.
How is Veo 3.1 priced?
Direct API pricing on Google Cloud Vertex AI is per-second of generation. Aggregators like ShortsFast bundle Veo 3.1 access into a flat $20/mo subscription with credits that also cover Sora 2, Kling 2.5 Turbo, Seedance 2.0, Nano Banana Pro and Flux Pro Ultra.
What is Veo 3.1 Lite and when should I pick it over Fast or Standard?
Veo 3.1 Lite (April 2026) is Google's cheapest Veo tier — $0.05/sec with audio, $0.03/sec without — at the same speed as Fast. Pick Lite for high-volume iteration, social drafts, and rapid concept testing where the marginal quality gain of Fast or Standard isn't worth ~3x the cost. Move up to Fast or Standard for hero shots that ship.
Primary sources
Use Veo 3.1 without the per-model subscription
ShortsFast bundles Veo 3.1 with every other frontier model under one flat $20/mo plan.
Last updated 2026-04-29. ShortsFast has no affiliation with Google DeepMind. Specs are compiled from the vendor's public documentation and verified against primary sources on the date above.