Run a captions and sound-design pass that lifts watch-through

Takes a finished rough cut and produces a styled-caption spec plus a sound-design map (music ducking, single-moment SFX, beat-sync, loudness target) — the post pass that turns a decent cut into one people finish.

Prompt

You are a senior short-form post-production specialist. Your job is the captions and sound-design pass that turns a decent cut into one people watch to the end.

I have a finished rough cut. Give me a captions + sound-design pass engineered to raise watch-through on short-form video.

Source:
- Rough-cut summary: [WHAT THE VIDEO IS — e.g. 'a 45s product tip, talking head + screen record']
- Platform: [TIKTOK / REELS / YOUTUBE SHORTS]
- Voice: [ON-CAMERA MIC / LAV / VOICEOVER]
- Music: [YES — TEMPO/BPM / NO]
- Caption style today: [AUTO, UNSTYLED / NONE YET / 'HELP ME PICK']
- Watch-through problem: [WHERE PEOPLE DROP OFF, OR 'UNKNOWN']

Produce a two-part pass:

PART 1 — CAPTIONS
1. A caption styling spec: font family and weight, size relative to the frame, color + stroke/outline, word count per caption group (aim for 1-4 words), and hold time per group. Cite the platform's safe zones so captions never sit under the UI or the right-side rail.
2. Emphasis rules: which words get a color swap, scale-up, or highlight per line, and why (keywords, payoff words). Never highlight more than one phrase per group.
3. An animation spec: which CapCut text animation applies to the in/out of each caption (e.g. 'Pop', 'Typewriter') using a single consistent family — no mixing five animations.
4. Punctuation and readability rules: drop periods at line ends when they slow the read; keep source grammar correct; split long sentences into caption-sized chunks at natural breath points.

PART 2 — SOUND DESIGN
1. A music-ducking map: when the music dips under the voice (-6 to -12 dB) and comes back up, with rough timecodes or beat references.
2. An SFX cue list: where a whoosh, pop, riser, or impact punctuates a cut or an on-screen word. One cue per moment; never stack SFX.
3. A beat-sync map: align every emphasis word and every SFX to the music downbeat where possible; note the BPM.
4. A loudness and export note: target LUFS for speech clarity on the platform, and the CapCut tools to hit it (volume keyframing, the ducking feature, the Audio > Sound Effects library).

Constraints:
- Captions must be readable in the hold time at 1x speed. If a group is too dense, split it.
- Sound design supports the cut; it does not mask a weak script. If a section is boring to the ear, flag it for re-cut; do not just 'add more SFX'.
- One SFX type per moment; one caption animation family for the whole video.

Output: PART 1 caption spec, PART 2 sound-design map, then a combined execution note listing the exact CapCut tools for each step.

Success signal: the output is good only if captions are readable at 1x and stay out of UI zones, every SFX has a single mapped moment, and the loudness target is stated for the platform.

Use case

Use when your cut is locked and you need the captions-plus-sound pass that raises completion rate, not another restructuring of the video.

When to use this

After the picture is locked, before export. Not for scripting or re-cutting — this is the polish layer.
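
The numeric parts of the sound-design map (beat references, beat-sync, ducking depth) can be sanity-checked by hand or with a few lines of code. The sketch below is illustrative only and is not part of the prompt or of CapCut: the function names are hypothetical, and it assumes a constant tempo with the first beat at 0 seconds. It shows how a BPM turns into beat timestamps, how a cue time snaps to the nearest beat, and what a -6 to -12 dB duck means as a volume multiplier.

```python
def beat_times(bpm: float, duration_s: float) -> list[float]:
    """Beat timestamps in seconds, assuming constant tempo and a beat at t=0."""
    interval = 60.0 / bpm  # seconds per beat
    count = int(duration_s / interval) + 1
    return [round(i * interval, 3) for i in range(count)]


def snap_to_beat(cue_s: float, bpm: float) -> float:
    """Move a cue (emphasis word or SFX) to the nearest beat on that grid."""
    interval = 60.0 / bpm
    return round(round(cue_s / interval) * interval, 3)


def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear amplitude multiplier."""
    return 10 ** (db / 20)


if __name__ == "__main__":
    bpm = 120  # assumed example tempo
    print(beat_times(bpm, 2.0))        # beat grid for the first two seconds
    print(snap_to_beat(3.3, bpm))      # a whoosh placed at 3.3s, moved onto the grid
    print(round(db_to_gain(-6), 2))    # shallow end of the ducking range
    print(round(db_to_gain(-12), 2))   # deep end of the ducking range
```

At 120 BPM a beat lands every 0.5 seconds, so a cue at 3.3s snaps to 3.5s. Ducking by -6 dB roughly halves the music's amplitude and -12 dB roughly quarters it, which is a useful reference when comparing the model's ducking map against what you hear in the edit.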
