Sora 2 Prompts That Actually Work: 20 Recipes (2026)

20 battle-tested Sora 2 prompt recipes. Each one states the camera shot, beats, lighting, and audio cue so you get a usable clip in one or two generations instead of eight.

By ShortsFast Team

Most Sora 2 prompts fail for the same reason: they describe a vibe instead of a shot. Sora 2’s model has strong cinematography literacy, which means it rewards filmmaker vocabulary (lens, framing, camera move, beat count) and punishes adjective soup (“cinematic, epic, stunning”).

This guide gives you 20 prompt recipes you can paste into Sora 2 and get a usable 10–25 second clip in one or two tries. Each recipe follows the same structure and is compact (under 80 words as written), so you can add subject or brand detail and still land inside the 80–150-word sweet spot where Sora 2 performs best.


What Sora 2 actually is (as of April 2026)

  • Duration: 10 to 15 seconds on Sora 2 standard, up to 25 seconds on Sora 2 Pro.
  • Resolution: 720×1280 or 1280×720 on standard; up to 1080p on Pro via ChatGPT subscriptions, up to 1024p via API.
  • Audio: Sora 2 generates synchronized audio with the video in one pass — ambient sound, lip-synced dialogue, footsteps — which is the single biggest jump from Sora 1.
  • Prompt length sweet spot: 80–150 words. Shorter = more creative freedom; longer = more control but more rigidity.

Source: OpenAI Sora 2 Prompting Guide.

The structure every working Sora 2 prompt uses

Every recipe below follows this five-part structure. Deviate from it only when you know why:

  1. Shot type + framing. Wide, medium, close-up, over-the-shoulder.
  2. Lens + camera move. 35mm, 50mm, handheld, slow dolly forward, static. One move per shot. Two moves confuse the model.
  3. Subject and action, in beats. “Beat 1 (0–3s): X. Beat 2 (3–6s): Y.” Count in seconds so Sora 2 paces correctly.
  4. Lighting + palette. Golden hour, harsh noon, neon, tungsten interior, overcast. Name a color palette if it matters.
  5. Audio cue. Ambient bed + any foreground sound. One sentence.
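If you write many prompts, the five parts above can be kept as separate fields and joined at the end. The sketch below is one way to do that in Python. The `Beat` and `ShotPrompt` names are hypothetical helpers made up for this example, not part of any Sora 2 tooling, and the output is just a string you paste into the prompt box.

```python
from dataclasses import dataclass


@dataclass
class Beat:
    start: int   # seconds
    end: int     # seconds
    action: str  # what happens during this beat


@dataclass
class ShotPrompt:
    framing: str        # part 1: shot type + framing
    lens_and_move: str  # part 2: lens + a single camera move
    scene: str          # part 3a: subject and setting
    beats: list[Beat]   # part 3b: timed action
    lighting: str       # part 4: lighting + palette
    audio: str          # part 5: one-sentence audio cue

    def render(self) -> str:
        # Number the beats automatically so labels never skip or repeat.
        beat_text = " ".join(
            f"Beat {i} ({b.start}\u2013{b.end}s): {b.action}."
            for i, b in enumerate(self.beats, start=1)
        )
        return (
            f"{self.framing}, {self.lens_and_move}. {self.scene}. "
            f"{beat_text} {self.lighting}. Audio: {self.audio}."
        )


# Recipe 6 (mountain sunrise) expressed as fields.
sunrise = ShotPrompt(
    framing="Wide shot",
    lens_and_move="50mm, locked-off static",
    scene="A mountain ridgeline against the sky before dawn",
    beats=[
        Beat(0, 5, "the first sliver of sun crests the ridge, lens flare blooms"),
        Beat(5, 10, "light creeps down the slope revealing textured rock"),
    ],
    lighting="Warm sunrise palette, deep blue shadow",
    audio="high-altitude wind, distant birdcall",
)
print(sunrise.render())
```

Keeping the parts separate makes it easy to swap one field (say, the lighting) and regenerate while everything else stays fixed.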

The recipes below assume 1080×1920 vertical (short-form default). Swap to 1920×1080 if you’re producing for horizontal surfaces.


1. Product hero — beverage can

Medium close-up, 50mm lens, slow dolly-in over 4 seconds. A chilled aluminum can with condensation sits on a black slate counter. Beat 1 (0–2s): a single droplet slides down the side. Beat 2 (2–5s): a hand enters frame and picks up the can. Beat 3 (5–8s): reveal the brand label in sharp focus. Cool tungsten key light from camera-right, deep shadow fill. Audio: low refrigerator hum, soft clink when the can moves.

Why it works: one camera move, three clean beats, a concrete object with physical constraints (condensation, hand) the model knows how to render.

2. Faceless founder intro — hands-on-keyboard

Over-the-shoulder, 35mm, handheld with micro-movements. A laptop screen glows in a dark home office. Beat 1 (0–2s): hands type on a mechanical keyboard, two monitors blur in background. Beat 2 (2–6s): the camera slowly tilts up toward the right monitor which displays a codebase. Beat 3 (6–10s): hand reaches for coffee, lifts out of frame. Warm desk-lamp key, cool monitor fill. Audio: tactile keystrokes, distant traffic through a closed window.

3. UGC-style phone unboxing

Medium shot, handheld iPhone-style, 28mm equivalent, no dolly. A creator in a hoodie unboxes a device on a white desk. Beat 1 (0–2s): pulls off the sleeve with one hand. Beat 2 (2–6s): removes the device, shows it to camera. Beat 3 (6–10s): plugs in the charging cable. Window-lit, soft daylight from camera-left. Audio: cardboard tearing, plastic peeling, quiet room tone — no dialogue.

4. Food shot — espresso pull

Extreme close-up, 100mm macro, static. A portafilter attached to a commercial espresso machine. Beat 1 (0–2s): a single drop of crema forms at the spout. Beat 2 (2–8s): twin streams of espresso pour into a white ceramic cup, crema swirling. Practical lighting from a warm ceiling lamp, steam rising into rim light. Audio: pump whine, soft hiss, liquid against ceramic.

5. Street scene — Tokyo alleyway

Wide shot, 24mm, slow dolly forward at walking pace. A narrow Tokyo alley at dusk, wet asphalt reflecting red and blue neon signs. Beat 1 (0–4s): camera moves down the alley, signs in kanji flicker. Beat 2 (4–8s): a figure in a dark coat crosses frame left-to-right in silhouette. Teal-and-magenta palette, high contrast. Audio: distant train, rain on plastic, muted city hum.

6. Nature B-roll — mountain sunrise

Wide shot, 50mm, locked-off static. A mountain ridgeline against the sky before dawn. Beat 1 (0–5s): the first sliver of sun crests the ridge, lens flare blooms. Beat 2 (5–10s): light creeps down the slope revealing textured rock. Warm sunrise palette, deep blue shadow. Audio: high-altitude wind, distant birdcall.

7. Talking-head lifestyle

Medium close-up, 85mm, subtle handheld sway. A person sits in a linen armchair facing slightly off-camera. Beat 1 (0–4s): they look down, then up toward the interviewer. Beat 2 (4–8s): they smile and begin to speak; lip movement visible but no audible dialogue beyond ambient. Window light from frame-right, soft fill, warm earth-tone palette. Audio: muted room tone, distant HVAC.

8. Architecture — brutalist facade

Wide shot, 35mm, slow vertical crane up, 4 seconds. A raw concrete brutalist apartment block fills the frame. Beat 1 (0–2s): camera starts low at the entrance. Beat 2 (2–6s): rises to reveal the repeating balcony pattern. Overcast sky, neutral gray palette. Audio: city drone, wind past concrete.

9. Kinetic logo-ready — particle swirl

Medium shot, static camera, dark void background. Beat 1 (0–2s): a cluster of glowing blue particles drifts into frame. Beat 2 (2–6s): they swirl together, coalescing into a rough sphere. Beat 3 (6–8s): particles settle into an orb of soft light. Cool blue key, no fill. Audio: low synth rise, subtle granular texture, no tail on the cut.

10. Cinematic portrait — golden hour

Close-up, 85mm, shallow depth of field, static. A person’s face three-quarter to camera, eyes looking off-frame. Beat 1 (0–3s): a breeze moves a strand of their hair. Beat 2 (3–6s): they blink slowly and exhale. Low-angle golden-hour backlight, warm bokeh highlights. Audio: soft outdoor ambient, no dialogue.

11. Animated B-roll — paper plane

Medium shot, 35mm, tracking right at walking speed. A white paper plane flies across a flat-color studio space. Beat 1 (0–3s): plane enters frame left, follows a gentle arc. Beat 2 (3–6s): plane dips and rises. Beat 3 (6–8s): plane exits frame right. Flat studio lighting, pastel mint-green background. Audio: faint paper flutter, no music.

12. Sci-fi interior — empty spacecraft corridor

Wide shot, 24mm, slow dolly forward. A metallic corridor with ribbed walls and recessed floor lighting. Beat 1 (0–4s): camera advances; lights pulse on in sequence. Beat 2 (4–8s): a side door slides open in the distance. Cyan keys and amber fills, high contrast. Audio: low mechanical hum, door pneumatic hiss.

13. Cafe window POV

Medium close-up through rain-streaked glass, 50mm, static. Beat 1 (0–4s): rain traces down the window in focus. Beat 2 (4–8s): rack focus to the street outside — blurred figures with umbrellas pass. Overcast daylight, desaturated palette. Audio: muffled rain, espresso machine from inside, indistinct chatter.

14. Product demo — app in hand

Over-the-shoulder, 28mm, handheld. A phone screen showing a mobile app. Beat 1 (0–3s): finger taps a button, UI animates. Beat 2 (3–6s): a card slides up from the bottom. Beat 3 (6–8s): finger swipes the card away. Soft overhead daylight, cool-neutral palette. Audio: subtle UI ticks, no music bed.

15. Dog at the beach

Medium wide, 50mm, tracking right at a run. A golden retriever runs along wet sand, waves breaking in background. Beat 1 (0–4s): the dog bounds into frame. Beat 2 (4–8s): it splashes through an incoming wave. Golden-hour low sun, backlit spray, warm palette. Audio: wave crash, paw-splashes, distant gulls.

16. Kinetic typography — single word

Medium shot, static camera, off-white paper background. Beat 1 (0–2s): a black serif word types itself letter-by-letter. Beat 2 (2–5s): the word underlines itself with a single brushstroke. High overhead daylight, clean shadow. Audio: typewriter key clacks, pen scrape.

17. Kitchen skill shot — knife cut

Extreme close-up, 100mm macro, static. A chef’s knife on a walnut board. Beat 1 (0–2s): knife begins a rocking chop on a bunch of parsley. Beat 2 (2–6s): rhythmic chopping, herbs fall forward. Practical overhead pendant lamp, warm key. Audio: crisp knife-on-wood, no music.

18. Lifestyle vignette — morning coffee pour

Medium shot, 50mm, very slight handheld drift. A hand pours hot water from a gooseneck kettle into a paper filter filled with coffee grounds. Beat 1 (0–3s): pour begins, water blooms the grounds. Beat 2 (3–8s): steady pour in circular motion. Window-side daylight, off-white palette, warm wood counter. Audio: water into cup, steam hiss, morning ambient.

19. Testimonial B-roll — typing + smile

Medium close-up, 50mm, subtle push-in over 6s. A person at a laptop in a bright home office. Beat 1 (0–3s): they type intently. Beat 2 (3–6s): they read the screen and smile. Beat 3 (6–8s): they glance off to a colleague. Soft window key from camera-left, warm cream palette. Audio: keystrokes, faint background music, no dialogue.

20. Abstract intro — liquid metal

Medium shot, static camera, black background. Beat 1 (0–3s): a pool of liquid silver mercury ripples in place. Beat 2 (3–6s): spikes rise from the surface and collapse back. Beat 3 (6–8s): the pool smooths to a mirror. Hard specular top light, no fill. Audio: low resonant tone, metallic chime tails.


Patterns that separate working prompts from failed ones

Across hundreds of generations, the pattern is consistent:

Working prompts say “Beat 1 (0–3s): X. Beat 2 (3–6s): Y.” The model paces correctly. Failing prompts say “slowly, then quickly” — the model interprets speed inconsistently.
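Beat labels are easy to get wrong by hand: a skipped number or a gap in the timing is hard to spot in a paragraph of text. A minimal checker, assuming your prompts use the exact “Beat N (a–bs)” label format from this guide and that the first beat starts at 0 seconds, might look like this. The `check_beats` function is a hypothetical helper, not something Sora 2 provides.

```python
import re

# Matches labels like "Beat 2 (3–6s)", with either an en dash or a hyphen.
BEAT_RE = re.compile(r"Beat (\d+) \((\d+)[\u2013-](\d+)s\)")


def check_beats(prompt: str) -> list[str]:
    """Return a list of problems found in the beat labels (empty if none)."""
    beats = [tuple(map(int, m.groups())) for m in BEAT_RE.finditer(prompt)]
    if not beats:
        return ["no timed beats found"]
    problems = []
    expected_start = 0  # assumption: the first beat begins at 0s
    for position, (number, start, end) in enumerate(beats, start=1):
        if number != position:
            problems.append(f"beat {position} is labelled Beat {number}")
        if start != expected_start:
            problems.append(
                f"beat {position} starts at {start}s, expected {expected_start}s"
            )
        if end <= start:
            problems.append(f"beat {position} has no duration")
        expected_start = end
    return problems


print(check_beats("Beat 1 (0-3s): pour begins. Beat 2 (4-6s): steady pour."))
```

An empty list means the labels are numbered in order and the time ranges are contiguous. It says nothing about whether the action described fits in the time allotted.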

Working prompts name one camera move. “Slow dolly forward.” Failing prompts chain three moves: “dolly in, then pan left, then tilt up.” Sora 2 will blend them into mud.

Working prompts state lens + lighting explicitly. “50mm, golden hour backlight.” Failing prompts use vibe words: “cinematic, dreamy, beautiful.” The model will pick a default and ignore you.

Working prompts specify audio. Even a one-line audio cue changes output dramatically. Sora 2 generates audio in the same pass as video, and a concrete cue (“espresso machine hiss, no music”) gives you a clip you can drop into an edit unchanged.

Working prompts stay 80–150 words. The OpenAI Sora 2 prompting guide calls this the quality sweet spot; it matches what we’ve seen in production.
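Three of these patterns (word count, vibe words, a missing audio cue) are mechanical enough to check before you spend a credit. The sketch below is a rough lint under stated assumptions: the vibe-word list is just the examples quoted in this guide, the word count is a plain whitespace split, and the audio check only looks for the literal “Audio:” label used in the recipes. `lint_prompt` is a hypothetical helper.

```python
import re

# Vibe words taken from the examples in this guide; extend to taste.
VIBE_WORDS = {"cinematic", "epic", "stunning", "dreamy", "beautiful"}
MIN_WORDS, MAX_WORDS = 80, 150


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for a prompt; an empty list means nothing was flagged."""
    warnings = []

    word_count = len(prompt.split())
    if not MIN_WORDS <= word_count <= MAX_WORDS:
        warnings.append(f"{word_count} words, outside {MIN_WORDS}-{MAX_WORDS}")

    tokens = set(re.findall(r"[a-z]+", prompt.lower()))
    found = sorted(tokens & VIBE_WORDS)
    if found:
        warnings.append("vibe words: " + ", ".join(found))

    if "audio:" not in prompt.lower():
        warnings.append("no 'Audio:' cue")

    return warnings


print(lint_prompt("cinematic, epic shot of a can"))
```

Treat the output as a nudge, not a gate: a short prompt can be a deliberate choice when you want the model to improvise.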

Credit economics — generate small, upscale the winner

A practical tip that can cut your credit spend by roughly 3–4x: generate at 720p first on Sora 2 standard, pick the seed that worked, then re-run on Sora 2 Pro at 1080p with the same prompt and seed. That way you pay standard rates, not Pro rates, for the 10 or so attempts that fail.

This workflow works on Kling 2.5 Turbo, Veo 3.1, and Seedance 2.0 too. Sora 2 ignores some seed-control features, so duplicate the prompt verbatim and keep the first-attempt output as your style reference.
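To see where a 3–4x figure can come from, here is the arithmetic as a small function. The costs are hypothetical placeholders (a Pro generation priced at 5x a standard one), not actual Sora 2 pricing, so plug in your own plan's numbers.

```python
def draft_then_upscale_savings(
    attempts: int, standard_cost: float, pro_cost: float
) -> float:
    """Ratio of all-Pro spend to draft-on-standard spend for the same attempts."""
    # Baseline: every attempt, including the failures, runs on Pro.
    all_pro = attempts * pro_cost
    # Draft workflow: every attempt runs on standard,
    # then only the winner is re-run once on Pro.
    draft_workflow = attempts * standard_cost + pro_cost
    return all_pro / draft_workflow


# 10 failed attempts + 1 winner, with a hypothetical 5:1 Pro-to-standard price.
print(draft_then_upscale_savings(attempts=11, standard_cost=1.0, pro_cost=5.0))
```

With those placeholder numbers the ratio comes out a little under 3.5x. The saving shrinks as the price gap between tiers narrows or as you need fewer attempts to find a winner.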

Where to run these recipes

You can run every prompt above through Sora 2 directly via sora.com, through ChatGPT Pro / Pro Plus, or through an aggregator. If you want to compare the same recipe across Sora 2, Veo 3.1, Kling 2.5 Turbo, and Seedance 2.0 on one account, ShortsFast bundles them under one flat $20/mo subscription.

The aggregator move is specifically useful for this workflow: the same recipe produces very different clips across models, and the model that wins a given shot isn’t always the one you’d guess. If you only have Sora 2, you’ll accept its first output. If you can run four models, you’ll keep the best of four.

FAQ

Does Sora 2 generate audio?

Yes. Sora 2 was the jump that added synchronized audio generation — ambient bed, foreground sound, and lip-synced dialogue — in the same pass as video. This is the single biggest practical upgrade vs Sora 1.

How long can a Sora 2 clip be?

Sora 2 standard generates clips up to about 15 seconds. Sora 2 Pro extends that to 25 seconds. Both are dramatic upgrades from Sora 1’s 6-second cap.

What resolution does Sora 2 output?

Sora 2 standard tops out at 720p (720×1280 portrait or 1280×720 landscape). Sora 2 Pro goes to 1080p via the ChatGPT subscription or 1024p via API.

How many words should a Sora 2 prompt be?

80–150 words is the quality sweet spot. Shorter gives Sora 2 more creative freedom; longer gives you more control at the cost of rigidity. Stay in that range unless you have a specific reason to leave it.

Can I get consistent characters across shots in Sora 2?

Not reliably with text prompts alone. For character consistency, upload a reference image when possible and use the same seed. Sora 2’s character consistency is weaker than Kling 2.5 Turbo’s on human faces; if your short depends on the same person across multiple clips, consider generating in Kling and using Sora 2 for the scenes that don’t require face continuity.


Last updated 2026-05-01. Sora 2 ships fast — if something above breaks, it likely means OpenAI changed a default. Re-check the OpenAI Sora 2 Prompting Guide for the current canonical reference.

Written by ShortsFast Team at ShortsFast.