Professional AI Video Making Guide

01

Script Writing for AI Video

Foundation of every great AI video: structured, timed, avatar-ready

Script Structure (60-second format)

0:00 – 0:08

HOOK — Grab Attention

Bold question, surprising stat, or relatable pain point. 2–3 punchy sentences. Under 30 words. This makes or breaks your video.

0:08 – 0:20

SETUP — Build the Problem

Identify the core tension. Create curiosity. Make the viewer feel understood. Never rush past this — it sells the solution.

0:20 – 0:45

SOLUTION — Core Value Delivery

4–6 sentences. Specific tools, techniques, stats, or mindset shifts. Each sentence earns its place — cut anything vague.

0:45 – 0:55

CALL TO ACTION

One or two direct instructions. "Follow for Part 2." "Save this." "Try this today." Simple, specific, actionable.

0:55 – 1:00

CLOSING LINE — Mic Drop

8–12 words. Quotable. Emotionally resonant. The line viewers screenshot and share. Make it memorable.

Word Count Targets

📊

Total Script Length

130–150 words for a 60-second video. At natural speaking pace (~150 wpm), this lands perfectly without rushing or dead air.

~150 wpm pace
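
The word-count target follows from simple arithmetic: words divided by pace gives minutes. A minimal sketch, assuming the ~150 wpm pace quoted above; the function names are illustrative and not part of any tool.

```python
def estimate_duration_seconds(script: str, wpm: float = 150.0) -> float:
    """Estimate spoken duration from the word count at a given pace."""
    word_count = len(script.split())
    return word_count / wpm * 60.0


def within_target(script: str, low: int = 130, high: int = 150) -> bool:
    """Check the script against the 130-150 word target for a 60-second video."""
    return low <= len(script.split()) <= high


draft = "word " * 140  # stand-in for a real 140-word script
print(round(estimate_duration_seconds(draft), 1))  # 56.0 seconds
print(within_target(draft))                        # True
```

A 140-word draft lands at about 56 seconds, which leaves a few seconds for pauses.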

✍️

Writing Rules

Second person ("you", "your"). Active voice. Short sentences. No filler words. Specific beats vague every time.

Conversational tone

🎯

Shot Mapping

Mark each sentence with the shot it belongs to. Helps you sync visuals to audio and prevents over-cutting or under-cutting.

Pre-production step

📋 Script Prompt Template for Claude/ChatGPT

Write a 60-second video script about [YOUR TOPIC].
Structure: Hook (0–8s) → Setup (8–20s) → Solution (20–45s) → CTA (45–55s) → Closing Line (55–60s).
Tone: energetic, second person, no filler.
Target: 130–150 words. End with a quotable closing line.
Also provide: avatar gender, outfit, setting, and shot descriptions for each section.
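
If you reuse the template often, filling in the topic programmatically avoids copy-paste slips. A small sketch using the template text above; `build_script_prompt` is a hypothetical helper, and the result is plain text you paste into Claude or ChatGPT yourself.

```python
SCRIPT_PROMPT = (
    "Write a 60-second video script about {topic}.\n"
    "Structure: Hook (0–8s) → Setup (8–20s) → Solution (20–45s) "
    "→ CTA (45–55s) → Closing Line (55–60s).\n"
    "Tone: energetic, second person, no filler.\n"
    "Target: 130–150 words. End with a quotable closing line.\n"
    "Also provide: avatar gender, outfit, setting, "
    "and shot descriptions for each section."
)


def build_script_prompt(topic: str) -> str:
    """Insert the topic into the template, rejecting an empty topic."""
    topic = topic.strip()
    if not topic:
        raise ValueError("topic must not be empty")
    return SCRIPT_PROMPT.format(topic=topic)


print(build_script_prompt("building a morning routine with AI tools"))
```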

Pro tip: Always write your script BEFORE designing your avatar or scenes. The script dictates the emotion, energy, and environments needed — don't design in a vacuum.

02

Avatar Design in Higgsfield

Creating a consistent, believable AI presenter

👤

Ethnicity & Age

Specify exact range (e.g., "South Asian woman, late 20s"). Vague = inconsistent results.

💇

Hair

Color, length, style. "Shoulder-length black hair, slight wave" beats just "dark hair".

👁️

Eyes & Face

Eye color, face shape, defining features. More detail = more consistent renders.

📐

Body Type

Build, height cues. Avoid extremes. "Athletic medium build" works well in Higgsfield.

😊

Expression

Match energy to section: expressive for hook, focused for solution, warm for CTA.

🔁

Consistency

Copy-paste the EXACT same avatar description into every shot. Never vary it.

Higgsfield Avatar Workflow

1

Upload a Reference Photo (Optional but Powerful)

Higgsfield's "Custom Avatar" mode lets you upload a real face photo. Use a clean, well-lit front-facing shot. This massively improves consistency across shots.

2

Use Higgsfield's Character Builder

Navigate to "Create Avatar" → fill in the character card: name, appearance, personality. Higgsfield stores this as a reusable template — use it for every scene in your video.

3

Set Default Pose & Expression

Under Avatar Settings, define a "neutral pose" — standing, slight 3/4 turn toward camera, natural arm position. This serves as your baseline for all non-action shots.

4

Generate Test Frames

Before recording full scenes, generate 5–8 still frames of the avatar in different expressions. Check face consistency. If anything drifts, tighten the description.

Higgsfield tip: Use the "Lock Character" feature after you're happy with your avatar. This prevents Higgsfield from drifting the look across scenes when you change prompts.

03

Avatar Wardrobe & Clothing

Outfit design rules for topic-matched, scene-consistent clothing

Outfit-to-Topic Mapping

Topic Category	Recommended Outfit	Colors to Use
AI & Productivity	Smart casual — fitted shirt, clean chinos, minimal jewelry	White, navy, cream, slate
Health & Wellness	Activewear — fitted top, leggings/joggers, light layers	Soft pastels, earthy tones, athletic grays
Career & Business	Business casual — blazer, neat top, subtle accessories	Charcoal, white, navy, tan
Tech & Gadgets	Streetwear-tech — hoodie, joggers, visible wireless earbuds	Dark neutrals, pops of accent color
Daily Lifestyle	Comfortable lifestyle — cozy top, relaxed jeans, put-together	Warm neutrals, muted tones

How to Describe Clothing in Higgsfield Prompts

❌ Vague (avoid this)

wearing casual clothes, nice outfit, professional attire

✅ Specific (use this)

white linen short-sleeve shirt, fitted navy chinos, clean white low-top sneakers, minimal silver watch on left wrist, no other jewelry, shirt untucked, neat and fresh

Indoor vs Outdoor Clothing Adjustments

🏠

Indoor Wardrobe Notes

Avoid shiny fabrics, which create render artifacts under studio lights. Matte textures (linen, cotton, knit) render most cleanly. Add subtle layering (open overshirt) for visual depth.

🌳

Outdoor Wardrobe Notes

Lighter fabrics that move slightly look more realistic outdoors. Add sunglasses, cap, or jacket depending on time of day. Wind effects work better with looser garments.

Common mistake: Changing outfit between shots of the same video. Always copy-paste your full clothing description into every shot prompt — even if you're just changing the scene, keep the outfit identical.
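
One way to catch this mistake before generating anything is to check that every shot prompt contains the exact avatar and outfit text. The sketch below is a plain string check; the shot names and prompts are made-up examples, and nothing here calls Higgsfield.

```python
def find_inconsistent_shots(prompts, required_phrases):
    """Return {shot_name: [missing phrases]} for prompts that drop a required phrase."""
    problems = {}
    for shot_name, prompt in prompts.items():
        missing = [phrase for phrase in required_phrases if phrase not in prompt]
        if missing:
            problems[shot_name] = missing
    return problems


AVATAR = "South Asian woman, late 20s, shoulder-length black hair, slight wave"
OUTFIT = "white linen short-sleeve shirt, fitted navy chinos, clean white low-top sneakers"

# Hypothetical prompts: the second one swaps the outfit for a vague phrase.
shot_prompts = {
    "shot01_hook": f"{AVATAR}, {OUTFIT}, speaking directly to camera.",
    "shot02_setup": f"{AVATAR}, wearing casual clothes, gesturing while talking.",
}

# Reports shot02_setup as missing the outfit description.
print(find_inconsistent_shots(shot_prompts, [AVATAR, OUTFIT]))
```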

04

Indoor & Outdoor Scene Design

Building cinematic backgrounds that support (not distract from) your avatar

🏢

Modern Office / Studio

White walls, minimal shelving, soft plant. Large window with diffused daylight. Blurred mid-range depth.

Productivity Business

🌿

Outdoor — Golden Hour

Park, rooftop, or street at sunset. Warm directional light. Slight bokeh on background. Subject backlit with rim light.

Lifestyle Wellness

💡

Dark Studio — Neon Accent

Near-black background. Single accent color light source (purple, teal, or amber). High contrast silhouette. Dramatic and modern.

Tech AI content

☕

Cafe / Co-Working Space

Warm ambient light, coffee cups, wooden surfaces, blurred people in background. Creates relatable, approachable energy.

Lifestyle Daily habits

🏙️

City Street / Urban

Busy street, soft focus background pedestrians, directional sunlight or overcast diffusion. Great for establishing shots.

Energy Travel

🌅

Outdoor — Natural / Forest

Morning mist, dappled light through trees, natural greens. Creates calm, authentic atmosphere for wellness or mindfulness topics.

Wellness Mindfulness

Scene Prompt Structure (Higgsfield / Veo 3)

SCENE PROMPT
[Avatar appearance + outfit] [action / pose / gesture].
Scene: [Location] + [time of day] + [atmosphere / mood].
Camera: [Shot type: close-up / medium / wide] + [movement: push in / pan / static].
Lighting: [Light source, quality, color temperature].
Background: [Depth, elements, blur level].
End tags: cinematic, 9:16 vertical, photorealistic, high detail

Scene depth rule: Always specify 3 layers — foreground (avatar zone), midground (immediate environment), background (blurred setting). This creates genuine cinematic depth that separates your content from flat AI video.
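
The prompt structure can be treated as a small record with one field per line, so no part gets forgotten between shots. This is a sketch only: the `ScenePrompt` class and its field names are assumptions made for illustration, and the output is text to paste into the tool, not an API call.

```python
from dataclasses import dataclass


@dataclass
class ScenePrompt:
    avatar: str       # full avatar + outfit description, identical in every shot
    action: str       # action / pose / gesture
    scene: str        # location + time of day + mood
    camera: str       # shot type + movement
    lighting: str     # source, quality, color temperature
    background: str   # depth, elements, blur level
    end_tags: str = "cinematic, 9:16 vertical, photorealistic, high detail"

    def render(self) -> str:
        """Assemble the fields in the order used by the scene prompt template."""
        return "\n".join([
            f"{self.avatar} {self.action.rstrip('.')}.",
            f"Scene: {self.scene.rstrip('.')}.",
            f"Camera: {self.camera.rstrip('.')}.",
            f"Lighting: {self.lighting.rstrip('.')}.",
            f"Background: {self.background.rstrip('.')}.",
            f"End tags: {self.end_tags}",
        ])


hook = ScenePrompt(
    avatar="South Asian woman, late 20s, white linen short-sleeve shirt, fitted navy chinos,",
    action="looking into the camera with an expressive, energetic face",
    scene="modern office, mid-morning, calm and focused",
    camera="close-up, slow push in",
    lighting="large window, soft diffused daylight, neutral color temperature",
    background="desk edge in the midground, white wall and plant softly blurred behind",
)
print(hook.render())
```

The background field is where the midground and blurred background layers from the depth rule are spelled out.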

05

VFX & Visual Effects

Adding motion, transitions, text effects, and atmospheric elements

VFX Types Available in AI Video Tools

🌊

Camera Motion VFX

Slow push-in, orbit, crane up, rack focus. These are the most cinematic and render cleanly in Higgsfield. Always specify direction and speed.

In-prompt control

⚡

Atmospheric Effects

Light rays, lens flare, bokeh, dust particles, fog, heat haze. Layer 1–2 max. More than that overwhelms the avatar. Specify intensity (subtle/strong).

In-prompt

✂️

Transition Effects

Cross-dissolve (universal), whip pan (energy), zoom transition (modern). Handled in CapCut or Premiere after AI generation. Not in Higgsfield prompts.

Post-production

📝

Text Overlays & Captions

Auto-caption in CapCut. Keyword callouts (highlight key stat/word on screen). Lower-third name plates. Use high-contrast, readable fonts — Helvetica or Inter.

Post-production

🌟

Particle & Energy FX

Digital particles, glowing orbs, energy trails. Good for tech/AI topics. Add via CapCut FX pack or RunwayML's VFX layers after generating base footage.

Layered in CapCut/Runway

🎞️

Color Grading

Apply a LUT in post for mood consistency. Teal-and-orange for cinematic warmth. Desaturated gray for serious/corporate. Warm golden for lifestyle topics.

Post-production LUT

VFX Prompt Examples for Higgsfield
Camera VFX
...slow cinematic push-in toward avatar's face, subtle lens flare as sun catches the edge of frame, very slight depth-of-field shift from background to foreground...

Atmospheric VFX
soft volumetric light rays through window, fine floating dust particles in mid-air, warm morning haze on the street behind subject...

06

Duration Control & Shot Planning

Matching visual timing to your voiceover, the core editing skill

Shot Duration Rules

Duration	Shot Type	Best For
6 seconds	Fast cut	Hook opener, closing line, reaction beat, transition moment
8 seconds	Standard	Setup explanation, CTA delivery, mid-video transitions
10 seconds	Extended	Solution demo, complex gesture/action, key insight delivery
12–15 seconds	Slow burn	Emotional moments, closing sequences, dramatic reveals

60-Second Video Shot Map (Example)

6s

Shot 1 — Hook

Fast cut. Close-up. Avatar directly addresses camera with bold opening question. High energy expression.

8s

Shot 2 — Setup

Medium shot. Slight camera push-in. Avatar builds the problem with gestures. Start of emotional connection.

10s

Shot 3 — Solution A

Extended. Avatar demonstrates or explains first key point. Camera at mid-distance, static or slow orbit.

10s

Shot 4 — Solution B

Extended. Cut to complementary scene (different environment or angle). Avatar delivers second key point.

8s

Shot 5 — CTA

Standard. Avatar direct-to-camera. Warm, confident. Specific call to action. Slight zoom-out feeling of openness.

6s

Shot 6 — Closing Line

Fast cut. Close-up again. Avatar holds eye contact. Delivers mic-drop closing line. Static camera — let the words land.

Total Shot Time 6+8+10+10+8+6 = 48s

Add 2 × 6s B-roll cutaways to reach 60s total. For a more relaxed pace, extend both Solution shots to 12s (52s total) and add one 8s cutaway instead.
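
The shot-map arithmetic is easy to check in code before you generate anything. A minimal sketch using the six durations from the example map; the 60-second target and 6-second cutaway length come from the text above, and the variable names are illustrative.

```python
SHOT_MAP = [
    ("Hook", 6),
    ("Setup", 8),
    ("Solution A", 10),
    ("Solution B", 10),
    ("CTA", 8),
    ("Closing Line", 6),
]

TARGET_SECONDS = 60
CUTAWAY_SECONDS = 6

total = sum(duration for _, duration in SHOT_MAP)
gap = TARGET_SECONDS - total
cutaways_needed = gap // CUTAWAY_SECONDS

print(total)            # 48
print(gap)              # 12
print(cutaways_needed)  # 2
```

Rerun the sum whenever you change a shot length so the assembled video still matches the voiceover.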

Duration control in Higgsfield: Use the "Video Length" slider when generating each clip. Generate each shot separately at its target duration, then assemble in CapCut or Premiere. Do not generate the full 60 seconds in one shot — quality drops significantly.

07

AI Voiceover Generation

Creating professional narration that matches your script and avatar energy

Top AI Voiceover Tools

🎙️

ElevenLabs

Most realistic AI voices. Emotion control. Clone your own voice in 30 seconds. Multilingual.

Best Quality

🔊

PlayHT

Huge voice library, fast generation, good for batch production. Affordable pricing tiers.

Good Value

📱

CapCut AI Voice

Built directly into the editing workflow. Fast, free tier available. Lower quality but super convenient.

Most Convenient

🎤

Murf AI

Studio-quality voices with pitch/speed/emphasis control. Good for corporate and explainer content.

Studio Grade

Voiceover Settings for Punchy Short-Form

⚡

Speed Setting

110–115% of normal speed. Short-form audiences expect a slightly faster pace. Too slow = boring. Too fast = hard to follow.

📊

Emphasis Tags

Use <emphasis> tags in ElevenLabs to stress key words. Matches natural speech rhythm. Never let the AI emphasize randomly.

⏸️

Pause Control

Add deliberate 0.3–0.5s pauses after hook question and before closing line. Let powerful statements breathe.

🎵

Music vs Voice Balance

Keep background music 18 to 22dB below the voice (–18dB to –22dB, with the voice as the 0dB reference) whenever the voice is present. Duck the music a further 6dB at the hook and closing line.
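
Some editors show volume as a linear gain or percentage rather than decibels. The standard amplitude conversion is gain = 10^(dB/20); the sketch below applies it to the levels above. The helper name is illustrative.

```python
def db_to_gain(db: float) -> float:
    """Convert a decibel level (relative to the voice at 0 dB) to linear amplitude gain."""
    return 10 ** (db / 20)


for level in (-18, -20, -22):
    print(level, round(db_to_gain(level), 3))
# -18 -> 0.126, -20 -> 0.1, -22 -> 0.079

# Ducking a further 6 dB roughly halves the amplitude again.
print(round(db_to_gain(-6), 3))  # 0.501
```

So music at –20dB sits at about 10% of the voice's amplitude.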

Pro workflow: Generate voiceover FIRST, then time your video shots to the audio waveform — not the other way around. This is how professional video editors work. Audio is the spine; video is the body.

08

Lip Sync & Voice Matching

Making AI avatars speak with believable, matched mouth movement

How Lip Sync Works in AI Video

🧠

AI-Driven Lip Sync

Tools like Higgsfield, D-ID, and HeyGen analyze your audio file and generate matching mouth movements frame-by-frame using phoneme mapping.

🔑

Phoneme Accuracy

Professional lip sync works at the phoneme level — individual mouth sounds like "m", "f", "oh". The more phoneme data available, the more realistic the result.

⚙️

Frame Rate Matters

Generate avatar video at 24 or 30fps. Lip sync tools are calibrated to standard frame rates. 60fps can cause interpolation artifacts in mouth movement.

Step-by-Step Lip Sync Workflow

1

Generate Clean Voiceover Audio

Export from ElevenLabs/Murf as WAV (44.1kHz, 16-bit). Avoid heavy compression or EQ at this stage — lip sync engines work better with clean audio.

2

Generate "Silent" Avatar Video First

In Higgsfield, generate your avatar clips with natural expression but NO audio dependency. You'll add speech movement in the next step.

3

Upload to Lip Sync Tool

Use Higgsfield's built-in Lip Sync (best for Higgsfield avatars), HeyGen Lip Sync, or SyncLabs. Upload: (a) avatar video clip, (b) voiceover audio. The tool maps speech to mouth.

4

Adjust Sync Offset

Most tools have a +/–frame offset control. Set to 0 first, preview, then nudge ±1–2 frames if mouth is slightly ahead or behind the audio. This is the most critical quality step.

5

Check for Artifacts at Cut Points

Lip sync can create mouth-shape errors at the first and last frame of each clip. Add a 2-frame crossfade transition between clips in your editor to mask these.

6

Re-check Full Assembly

After assembling all clips in CapCut/Premiere, watch the full video with headphones. Listen for any audio–visual desync, especially after transitions.
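
Some tools express the sync offset in milliseconds instead of frames. The conversion depends only on the frame rate, as in this minimal sketch; the function name is illustrative and no particular tool's controls are assumed.

```python
def frames_to_ms(frames: int, fps: float) -> float:
    """Convert a frame count to milliseconds at the given frame rate."""
    return frames / fps * 1000.0


for fps in (24, 30):
    one = round(frames_to_ms(1, fps), 1)
    two = round(frames_to_ms(2, fps), 1)
    print(f"{fps}fps: 1 frame = {one} ms, 2 frames = {two} ms")
# 24fps: 1 frame = 41.7 ms, 2 frames = 83.3 ms
# 30fps: 1 frame = 33.3 ms, 2 frames = 66.7 ms
```

A nudge of 1 to 2 frames is therefore a shift of roughly 33 to 83 ms, and a 2-frame crossfade lasts about 67 ms at 30fps.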

Lip sync tip: If the avatar's mouth looks slightly off (common), don't re-render the whole video. Instead, add a very brief cutaway (B-roll, text overlay, or cut to close-up of hands/environment) to cover the desync point. This is what professional editors do.
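
Step 1 of the workflow asks for a WAV at 44.1kHz, 16-bit. Before uploading, you can confirm an exported file matches using Python's standard `wave` module. This is a sketch: `check_wav` and the file name are hypothetical, and it only reads uncompressed PCM WAV files.

```python
import wave


def check_wav(path: str, rate: int = 44100, sample_width_bytes: int = 2) -> list:
    """Return a list of problems; an empty list means the file matches the target format."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {rate} Hz")
        if wav.getsampwidth() != sample_width_bytes:
            bits = wav.getsampwidth() * 8
            problems.append(f"bit depth is {bits}-bit, expected {sample_width_bytes * 8}-bit")
    return problems


# Example usage (hypothetical file name):
# print(check_wav("voiceover_shot01.wav"))
```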

🔄

SyncLabs (sync.so)

Best standalone lip sync tool. Upload any video + any audio. Outputs with matched lip movement. Free tier available.

Recommended

🤖

HeyGen Lip Sync

Excellent for talking head clips. Integrates with their avatar studio. Good multilingual support.

Great for Multi-lang

09

Full Production Workflow

The complete end-to-end process from idea to published video

Phase 1 — Pre-Production

Script → Avatar Design → Shot Plan

Write script with timing markers. Design avatar description card. Create shot-by-shot breakdown with duration targets. Map voiceover sentences to each shot.

Phase 2 — Voice Generation

Record or Generate Voiceover

Generate in ElevenLabs with correct speed, emphasis, and pauses. Export as clean WAV. Listen back fully — fix any mis-pronunciations before moving forward.

Phase 3 — Video Generation

Generate Each Shot in Higgsfield

Generate one shot at a time using your prompt templates. Save each clip with a clear filename (shot01_hook_6s.mp4). Generate 2–3 variations per shot and pick the best.

Phase 4 — Lip Sync

Apply Lip Sync Per Clip

Upload each clip to SyncLabs/Higgsfield with its corresponding voiceover segment. Check sync offset. Export synced clips.

Phase 5 — Assembly

Edit in CapCut / Premiere

Assemble synced clips in order. Add transitions (crossfade 2–3 frames). Layer background music at –20dB. Add text overlays and captions. Apply color LUT.

Phase 6 — Post & Export

Final Quality Check & Export

Watch at full volume on a phone (your audience's primary device). Check: sync, pacing, caption readability, music balance. Export: 1080×1920 (9:16), H.264, 30fps, 10–15 Mbps.
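
The export bitrate determines the file size, which matters for upload limits. A rough estimate is bitrate times duration divided by 8 (bits to bytes). The sketch below covers the video stream only and ignores audio and container overhead; the function name is illustrative.

```python
def estimate_video_size_mb(bitrate_mbps: float, duration_s: float) -> float:
    """Approximate video stream size in megabytes (megabits / 8)."""
    return bitrate_mbps * duration_s / 8.0


print(estimate_video_size_mb(10, 60))  # 75.0
print(estimate_video_size_mb(15, 60))  # 112.5
```

A 60-second export at 10–15 Mbps is therefore roughly 75 to 113 MB.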

Time estimate per video: Script (15 min) → Voice gen (10 min) → Video generation (30–45 min total) → Lip sync (20 min) → Edit and assemble (30 min) → Export (5 min). Total: ~2 hours per professional 60-second video. This drops to under 1 hour once you have a template workflow.
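
The total comes from summing the stage estimates, with video generation given as a range. A minimal sketch using only the figures quoted above:

```python
# (low, high) minutes per stage, from the estimate above
STAGES_MIN = {
    "Script": (15, 15),
    "Voice gen": (10, 10),
    "Video generation": (30, 45),
    "Lip sync": (20, 20),
    "Edit and assemble": (30, 30),
    "Export": (5, 5),
}

low_total = sum(low for low, _ in STAGES_MIN.values())
high_total = sum(high for _, high in STAGES_MIN.values())

print(low_total, high_total)  # 110 125
print(round(low_total / 60, 2), round(high_total / 60, 2))  # 1.83 2.08
```

The range of 110 to 125 minutes matches the quoted figure of about 2 hours.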

10

Complete Professional Tools Stack

Every tool you need, what it does, and where it fits

AI Video Generation

🎬

Higgsfield AI

Primary avatar video generation, scene control, character consistency, in-built lip sync. Core tool for this guide.

Core

🖼️

Veo 3 (Google)

Alternative for high-fidelity photorealistic scene generation. Excellent for outdoor and non-avatar scenes.

Scene Gen

🚀

RunwayML Gen-3

Best for cinematic motion, camera movements, and stylized VFX sequences. Great for B-roll.

Cinematic

Voice & Lip Sync

🎙️

ElevenLabs

Best AI voices. Voice cloning. Emotion and emphasis control. API available for automation.

Top Pick

🔄

SyncLabs

Best standalone lip sync. Upload any video + audio, outputs lip-synced result.

Top Pick

Editing & Post-Production

✂️

CapCut

Free, mobile-friendly, AI auto-caption, music library, FX packs. Best for fast short-form production.

Free

🎞️

Adobe Premiere Pro

Professional editing with full color grading, LUT application, and precise audio control for higher-end production.

Pro

AI Script Writing

✍️

Claude (Anthropic)

Best for structured scripts with shot breakdowns, avatar descriptions, and Veo 3 prompts. Use the template in Section 01.

Recommended

Starter stack (free/low-cost): Claude (free tier) for scripting → Higgsfield (free tier, 3 videos/month) → ElevenLabs (free tier, 10 min/month) → SyncLabs (free tier) → CapCut (free). You can produce professional videos at near-zero cost while learning the workflow.

AI Video Making Masterclass
