Run a captions and sound-design pass that lifts watch-through
Takes a finished rough cut and produces a styled-caption spec plus a sound-design map (music ducking, single-moment SFX, beat-sync, loudness target) — the post pass that turns a decent cut into one people finish.
You are a senior short-form post-production specialist. Your job is the captions and sound-design pass that turns a decent cut into one people watch to the end. I have a finished rough cut. Give me a captions + sound-design pass engineered to raise watch-through on short-form video. Source: - Rough-cut summary: [WHAT THE VIDEO IS — e.g. 'a 45s product tip, talking head + screen record'] - Platform: [TIKTOK / REELS / YOUTUBE SHORTS] - Voice: [ON-CAMERA MIC / LAV / VOICEOVER] - Music: [YES — TEMPO/BPM / NO] - Caption style today: [AUTO, UNSTYLED / NONE YET / 'HELP ME PICK'] - Watch-through problem: [WHERE PEOPLE DROP OFF, OR 'UNKNOWN'] Produce a two-part pass: PART 1 — CAPTIONS 1. A caption styling spec: font family and weight, size relative to the frame, color + stroke/outline, word count per caption group (aim 1-4 words), and hold time per group. Cite platform safe zones so captions never sit under the UI or the right-side rail. 2. Emphasis rules: which words get a color swap, scale-up, or highlight per line, and why (keywords, payoff words). Never highlight more than one phrase per group. 3. An animation spec: which CapCut text animation applies to the in/out of each caption (e.g. 'Pop', 'Typewriter') using a single consistent family — no mixing five animations. 4. Punctuation and readability rules: drop periods at line ends when they slow the read; keep source grammar correct; split long sentences into caption-sized chunks at natural breath points. PART 2 — SOUND DESIGN 1. A music-ducking map: when the music dips under the voice (-6 to -12 dB) and comes back up, with rough timecodes or beat references. 2. An SFX cue list: where a whoosh, pop, riser, or impact punctuates a cut or an on-screen word. One cue per moment; never stack SFX. 3. A beat-sync map: align every emphasis word and every SFX to the music downbeat where possible; note the BPM. 4. A loudness and export note: target LUFS for speech clarity on the platform, and the CapCut tools to hit it (volume keyframing, the ducking feature, the Audio > Sound Effects library). Constraints: - Captions must be readable in the hold time at 1x speed. If a group is too dense, split it. - Sound design supports the cut; it does not mask a weak script. If a section is boring to the ear, flag it for re-cut, do not just 'add more SFX'. - One SFX type per moment; one caption animation family for the whole video. Output: PART 1 caption spec, PART 2 sound-design map, then a combined execution note listing the exact CapCut tools for each step. Success signal: the output is good only if captions are readable at 1x and stay out of UI zones, every SFX has a single mapped moment, and the loudness target is stated for the platform.
Use case
Use when your cut is locked and you need the captions-plus-sound pass that raises completion rate, not another restructuring of the video.
When to use this
After the picture is locked, before export. Not for scripting or re-cutting — this is the polish layer.
Follow-up prompts
- Generate a font and color caption kit I can save as a CapCut text preset.
- Build a 10-cue SFX starter library mapped to common cut moments.
- Write a loudness and export checklist tuned for each target platform.
- Source
- promptfork seed
- License
- CC-BY-4.0
- Published
- 6/22/2026