Veo talking-character scene — lip-sync optimization and the reaction shot technique
Veo's native speech generation is its killer feature. This guide covers three things: a clip template optimized for clean lip-sync (shorter sentences, consonant-heavy words), the reaction-shot technique for variety, and the audio mixing levels that make dialogue sound professional.
Here's how to get the cleanest results from Veo's synced speech.

BASE TEMPLATE:
'[Shot size, see shot size guide] of [character description] [action while speaking], in [setting], [lighting], [mood/style]. The character says: "[YOUR LINE, see lip-sync optimization]" Audio: [ambient sound at low level], natural lip-sync, [tone of voice], [dialogue clear above ambient].'

Example:
'Medium close-up of a friendly barista wiping the counter and looking up with a warm smile, in a cozy morning cafe with exposed brick, warm window light with soft shadows, documentary style. The character says: "Morning! The usual today?" Audio: quiet espresso machine hum, gentle background chatter at low level, natural lip-sync, warm and friendly tone, dialogue clear above ambient.'

LIP-SYNC OPTIMIZATION (what syncs well and what doesn't):
• SHORTER SENTENCES SYNC BETTER: Keep each line to 5-12 words. Longer sentences accumulate sync drift, so if you need more dialogue, split it across separate clips.
• CONSONANT-HEAVY WORDS ARE CLEARER: Sounds with strong lip movements (P, B, M, F, V, W, TH) sync more visibly than vowel-heavy words. 'Perfect morning for a fresh brew' syncs better than 'Oh, I see you are here again.'
• AVOID: Long questions that end on a rising intonation (sync often drifts; short questions are fine), words with silent letters, rapid-fire dialogue, and whispering (the lip movement is too subtle to sync).
• BEST: Declarative sentences, greetings, short questions, exclamations.

THE REACTION SHOT TECHNIQUE (for longer dialogue scenes):
Don't try to fit a full conversation into one clip. Instead:
• Clip 1: Character A speaks their line (medium close-up).
• Clip 2: Character B LISTENS and REACTS (close-up: nodding, smiling, furrowing brow) while Character A's voice continues as voiceover.
• Clip 3: Character B responds.
This mirrors how filmmakers usually cover dialogue, and the reaction shot is often more interesting than the speaking shot.
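If you generate many clips, a small script can fill the base template consistently and flag risky lines before you spend a generation on them. The following is a hypothetical Python sketch and is not part of any Veo SDK. The names `DialogueClip`, `to_prompt`, `visible_consonant_ratio`, and `lip_sync_warnings` are illustrative. The letter matching is a crude, spelling-based stand-in for the lip-visible sounds listed above. The 0.5 ratio threshold is an assumption, not a tested value.

```python
import re
from dataclasses import dataclass

# Letters the guide calls out as having highly visible lip shapes.
# Matching on spelling is a crude stand-in for real phoneme analysis (assumption).
VISIBLE_LETTERS = ("p", "b", "m", "f", "v", "w", "th")


@dataclass
class DialogueClip:
    """Hypothetical container for the base template's slots; not a Veo API type."""
    shot_size: str
    character: str
    action: str
    setting: str
    lighting: str
    style: str
    line: str
    ambient: str
    tone: str

    def to_prompt(self) -> str:
        # Fill the slots in the same order as the base template.
        return (
            f"{self.shot_size} of {self.character} {self.action}, in {self.setting}, "
            f"{self.lighting}, {self.style}. "
            f'The character says: "{self.line}" '
            f"Audio: {self.ambient}, natural lip-sync, {self.tone}, "
            "dialogue clear above ambient."
        )


def visible_consonant_ratio(line: str) -> float:
    """Share of words containing at least one lip-visible letter or 'th'."""
    words = re.findall(r"[a-z']+", line.lower())
    if not words:
        return 0.0
    hits = sum(any(v in w for v in VISIBLE_LETTERS) for w in words)
    return hits / len(words)


def lip_sync_warnings(clip: DialogueClip, min_words: int = 5,
                      max_words: int = 12) -> list[str]:
    """Check a clip's line and tone against the lip-sync rules of thumb."""
    warnings = []
    n = len(re.findall(r"[A-Za-z']+", clip.line))
    if n > max_words:
        warnings.append(f"{n} words: long lines accumulate sync drift; split the clip")
    elif n < min_words:
        warnings.append(f"{n} words: below the {min_words}-{max_words} word range")
    if "whisper" in clip.tone.lower():
        warnings.append("whispering: lip movement too subtle to sync")
    if visible_consonant_ratio(clip.line) < 0.5:  # threshold is an assumption
        warnings.append("few lip-visible consonants: sync will be hard to read")
    return warnings


clip = DialogueClip(
    shot_size="Medium close-up",
    character="a friendly barista",
    action="wiping the counter and looking up with a warm smile",
    setting="a cozy morning cafe with exposed brick",
    lighting="warm window light with soft shadows",
    style="documentary style",
    line="Perfect morning for a fresh brew",
    ambient="quiet espresso machine hum, gentle background chatter at low level",
    tone="warm and friendly tone",
)
print(clip.to_prompt())
print(lip_sync_warnings(clip))
```

Running this prints the filled-in prompt and an empty warning list. If you swap the line for 'Oh, I see you are here again.', only the consonant warning fires.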
The reaction shot has another advantage: it hides lip-sync imperfections, because the listener's mouth isn't supposed to be moving.

SHOT SIZE GUIDE FOR DIALOGUE:
• Close-up (face fills frame): Best for emotional lines, confessions, intensity. Lip-sync gets the most scrutiny at this size, so use your strongest lines.
• Medium close-up (head + shoulders): THE SWEET SPOT. Close enough to read lips and emotion, forgiving enough that minor sync issues aren't distracting.
• Medium shot (waist up): Good when body language matters (gesturing, working while talking). Lip-sync is less critical at this distance.
• Wide shot: Avoid for dialogue. The lips are too small for sync to read.

AUDIO MIXING GUIDANCE:
• Dialogue should sit roughly 6dB above the ambient sound (ambient about -6dB relative to dialogue). In the prompt, express this in words: describe ambient sounds as 'quiet,' 'low level,' or 'subtle background,' and the voice as 'clear,' 'prominent,' or 'warm tone.'
• Always include at least one ambient sound, because pure silence feels uncanny. Even 'room tone, quiet air conditioning hum' adds realism.
• If you add music in post, keep it about 12dB below dialogue. Music should support speech, never compete with it.

Tips:
• Keep dialogue to ONE short line per clip for the cleanest sync.
• Put the spoken words in quotes so Veo treats them as literal speech.
• Describe the voice TONE (warm, authoritative, playful, conspiratorial), not just the words. Delivery matters.
• For a conversation between two characters, generate each character's lines as separate clips and cut them together in editing.
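When you remix in post, the dB offsets above become simple gain arithmetic. A change of d dB scales amplitude by 10^(d/20). That makes 6dB roughly a 2x amplitude difference and 12dB roughly 4x.

The helper below is a minimal, hypothetical Python sketch, and names like `gain_for_offset` and `mix` are illustrative. It assumes the following:
• Tracks are mono, non-silent, and of equal length.
• Samples are floats in [-1, 1].
• RMS is the loudness measure.

Real mixing tools often measure perceptual loudness (LUFS) instead.

```python
import math
from typing import Optional

# Offsets relative to the dialogue track, from the mixing guidance above.
AMBIENT_OFFSET_DB = -6.0  # ambient roughly 6dB under dialogue
MUSIC_OFFSET_DB = -12.0   # post-production music roughly 12dB under dialogue


def db_to_gain(db: float) -> float:
    """Convert a level change in dB into a linear amplitude multiplier."""
    return 10 ** (db / 20)


def rms_db(samples: list[float]) -> float:
    """RMS level in dB relative to full scale (1.0). Assumes a non-silent track."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)


def gain_for_offset(track: list[float], reference: list[float],
                    offset_db: float) -> float:
    """Gain that places `track`'s RMS level `offset_db` dB relative to `reference`."""
    return db_to_gain(rms_db(reference) + offset_db - rms_db(track))


def mix(dialogue: list[float], ambient: list[float],
        music: Optional[list[float]] = None) -> list[float]:
    """Sum equal-length mono tracks with ambient and music set under the dialogue.

    Dialogue stays at unity gain. The result is hard-clipped to [-1, 1];
    a real mix would use a limiter instead.
    """
    layers = [(dialogue, 1.0),
              (ambient, gain_for_offset(ambient, dialogue, AMBIENT_OFFSET_DB))]
    if music is not None:
        layers.append((music, gain_for_offset(music, dialogue, MUSIC_OFFSET_DB)))
    out = []
    for i in range(len(dialogue)):
        total = sum(track[i] * gain for track, gain in layers)
        out.append(max(-1.0, min(1.0, total)))
    return out
```

For example, `db_to_gain(-6)` is about 0.501 and `db_to_gain(-12)` is about 0.251. In other words, the ambient bed ends up at roughly half the dialogue's amplitude and the music at roughly a quarter.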
- Source: promptfork seed
- License: CC-BY-4.0
- Published: 6/22/2026