🎬 Complete Professional Guide

AI Video Making
Masterclass

From script to final render: Higgsfield, avatar design, scene setup, VFX, lip sync, voiceover matching, and every pro technique in one guide.

10
Core Modules
50+
Pro Tips
8
Prompt Templates
100%
AI-Powered
01
Script Writing for AI Video
Foundation of every great AI video: structured, timed, avatar-ready
Script Structure (60-second format)
0:00 – 0:08
HOOK: Grab Attention

Bold question, surprising stat, or relatable pain point. 2–3 punchy sentences. Under 30 words. This makes or breaks your video.

0:08 – 0:20
SETUP: Build the Problem

Identify the core tension. Create curiosity. Make the viewer feel understood. Never rush past this; it sells the solution.

0:20 – 0:45
SOLUTION: Core Value Delivery

4–6 sentences. Specific tools, techniques, stats, or mindset shifts. Each sentence earns its place, so cut anything vague.

0:45 – 0:55
CALL TO ACTION

One or two direct instructions. "Follow for Part 2." "Save this." "Try this today." Simple, specific, actionable.

0:55 – 1:00
CLOSING LINE: Mic Drop

8–12 words. Quotable. Emotionally resonant. The line viewers screenshot and share. Make it memorable.

Word Count Targets

📊

Total Script Length

130–150 words for a 60-second video. At natural speaking pace (~150 wpm), this lands perfectly without rushing or dead air.

~150 wpm pace
✍️

Writing Rules

Second person ("you", "your"). Active voice. Short sentences. No filler words. Specific beats vague every time.

Conversational tone
🎯

Shot Mapping

Mark each sentence with the shot it belongs to. Helps you sync visuals to audio and prevents over-cutting or under-cutting.

Pre-production step
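The word-count target above can be sanity-checked with simple arithmetic. This is a minimal sketch that assumes a constant speaking pace; the function name `estimated_seconds` is illustrative and not part of any tool mentioned in this guide.

```python
def estimated_seconds(word_count: int, words_per_minute: float = 150.0) -> float:
    """Estimate spoken duration in seconds, assuming a constant pace."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute * 60.0


# The 130-150 word target from the guide, at the ~150 wpm pace it assumes.
for words in (130, 150):
    print(f"{words} words -> about {estimated_seconds(words):.0f} s")
```

At 130 words the estimate leaves a few seconds of headroom for deliberate pauses; at 150 words the script fills the full minute.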
📋 Script Prompt Template for Claude/ChatGPT
Write a 60-second video script about [YOUR TOPIC].
Structure: Hook (0–8s) → Setup (8–20s) → Solution (20–45s) → CTA (45–55s) → Closing Line (55–60s).
Tone: energetic, second person, no filler.
Target: 130–150 words. End with a quotable closing line.
Also provide: avatar gender, outfit, setting, and shot descriptions for each section.
Pro tip: Always write your script BEFORE designing your avatar or scenes. The script dictates the emotion, energy, and environments needed, so don't design in a vacuum.
02
Avatar Design in Higgsfield
Creating a consistent, believable AI presenter
👤
Ethnicity & Age

Specify exact range (e.g., "South Asian woman, late 20s"). Vague = inconsistent results.

💇
Hair

Color, length, style. "Shoulder-length black hair, slight wave" beats just "dark hair".

👁️
Eyes & Face

Eye color, face shape, defining features. More detail = more consistent renders.

📏
Body Type

Build, height cues. Avoid extremes. "Athletic medium build" works well in Higgsfield.

😊
Expression

Match energy to section: expressive for hook, focused for solution, warm for CTA.

🔁
Consistency

Copy-paste the EXACT same avatar description into every shot. Never vary it.

Higgsfield Avatar Workflow

1
Upload a Reference Photo (Optional but Powerful)

Higgsfield's "Custom Avatar" mode lets you upload a real face photo. Use a clean, well-lit front-facing shot. This massively improves consistency across shots.

2
Use Higgsfield's Character Builder

Navigate to "Create Avatar" → fill in the character card: name, appearance, personality. Higgsfield stores this as a reusable template; use it for every scene in your video.

3
Set Default Pose & Expression

Under Avatar Settings, define a "neutral pose": standing, slight 3/4 turn toward camera, natural arm position. This serves as your baseline for all non-action shots.

4
Generate Test Frames

Before recording full scenes, generate 5–8 still frames of the avatar in different expressions. Check face consistency. If anything drifts, tighten the description.

Higgsfield tip: Use the "Lock Character" feature after you're happy with your avatar. This prevents Higgsfield from drifting the look across scenes when you change prompts.
03
Avatar Wardrobe & Clothing
Outfit design rules for topic-matched, scene-consistent clothing
Outfit-to-Topic Mapping
Topic Category | Recommended Outfit | Colors to Use
AI & Productivity | Smart casual: fitted shirt, clean chinos, minimal jewelry | White, navy, cream, slate
Health & Wellness | Activewear: fitted top, leggings/joggers, light layers | Soft pastels, earthy tones, athletic grays
Career & Business | Business casual: blazer, neat top, subtle accessories | Charcoal, white, navy, tan
Tech & Gadgets | Streetwear-tech: hoodie, joggers, visible wireless earbuds | Dark neutrals, pops of accent color
Daily Lifestyle | Comfortable lifestyle: cozy top, relaxed jeans, put-together | Warm neutrals, muted tones
How to Describe Clothing in Higgsfield Prompts
❌ Vague (avoid this)
wearing casual clothes, nice outfit, professional attire
✅ Specific (use this)
white linen short-sleeve shirt, fitted navy chinos, clean white low-top sneakers, minimal silver watch on left wrist, no other jewelry, shirt untucked, neat and fresh

Indoor vs Outdoor Clothing Adjustments

🏠

Indoor Wardrobe Notes

Avoid shiny fabrics (they create render artifacts under studio lights). Matte textures such as linen, cotton, and knit render most cleanly. Add subtle layering (open overshirt) for visual depth.

🌳

Outdoor Wardrobe Notes

Lighter fabrics that move slightly look more realistic outdoors. Add sunglasses, cap, or jacket depending on time of day. Wind effects work better with looser garments.

Common mistake: Changing outfit between shots of the same video. Always copy-paste your full clothing description into every shot prompt. Even if you're just changing the scene, keep the outfit identical.
04
Indoor & Outdoor Scene Design
Building cinematic backgrounds that support (not distract from) your avatar
🏢

Modern Office / Studio

White walls, minimal shelving, soft plant. Large window with diffused daylight. Blurred mid-range depth.

Productivity Business
🌿

Outdoor β€” Golden Hour

Park, rooftop, or street at sunset. Warm directional light. Slight bokeh on background. Subject backlit with rim light.

Lifestyle Wellness
💡

Dark Studio: Neon Accent

Near-black background. Single accent color light source (purple, teal, or amber). High contrast silhouette. Dramatic and modern.

Tech AI content
☕

Cafe / Co-Working Space

Warm ambient light, coffee cups, wooden surfaces, blurred people in background. Creates relatable, approachable energy.

Lifestyle Daily habits
🏙️

City Street / Urban

Busy street, soft focus background pedestrians, directional sunlight or overcast diffusion. Great for establishing shots.

Energy Travel
🌅

Outdoor: Natural / Forest

Morning mist, dappled light through trees, natural greens. Creates calm, authentic atmosphere for wellness or mindfulness topics.

Wellness Mindfulness

Scene Prompt Structure (Higgsfield / Veo 3)

SCENE PROMPT
[Avatar appearance + outfit] [action / pose / gesture].
Scene: [Location] + [time of day] + [atmosphere / mood].
Camera: [Shot type: close-up / medium / wide] + [movement: push in / pan / static].
Lighting: [Light source, quality, color temperature].
Background: [Depth, elements, blur level].
End tags: cinematic, 9:16 vertical, photorealistic, high detail
Scene depth rule: Always specify 3 layers, namely foreground (avatar zone), midground (immediate environment), and background (blurred setting). This creates genuine cinematic depth that separates your content from flat AI video.
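One way to keep the avatar description identical across shots is to store it once and assemble every prompt from the same structure. The sketch below follows the scene prompt structure above; the `Shot` class, `build_prompt` function, and the sample avatar text are hypothetical helpers for illustration, not a Higgsfield or Veo 3 API.

```python
from dataclasses import dataclass

# Hypothetical avatar text for illustration; define it once and reuse it verbatim.
AVATAR = (
    "South Asian woman, late 20s, shoulder-length black hair, slight wave, "
    "white linen short-sleeve shirt, fitted navy chinos"
)
END_TAGS = "cinematic, 9:16 vertical, photorealistic, high detail"


@dataclass(frozen=True)
class Shot:
    action: str
    scene: str
    camera: str
    lighting: str
    background: str


def build_prompt(avatar: str, shot: Shot) -> str:
    """Fill the scene prompt structure, one labelled line per field."""
    return "\n".join([
        f"{avatar} {shot.action}.",
        f"Scene: {shot.scene}.",
        f"Camera: {shot.camera}.",
        f"Lighting: {shot.lighting}.",
        f"Background: {shot.background}.",
        END_TAGS,
    ])


hook = Shot(
    action="speaking directly to camera with an open-palm gesture",
    scene="modern office, mid-morning, calm and focused",
    camera="close-up, slow push-in",
    lighting="diffused daylight from a large window, neutral white",
    background="minimal shelving and a soft plant, blurred mid-range depth",
)
print(build_prompt(AVATAR, hook))
```

Because only the `Shot` fields change between clips, the avatar and outfit text cannot drift by accident from one prompt to the next.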
05
VFX & Visual Effects
Adding motion, transitions, text effects, and atmospheric elements
VFX Types Available in AI Video Tools
🌊

Camera Motion VFX

Slow push-in, orbit, crane up, rack focus. These are the most cinematic and render cleanly in Higgsfield. Always specify direction and speed.

In-prompt control
⚑

Atmospheric Effects

Light rays, lens flare, bokeh, dust particles, fog, heat haze. Layer 1–2 max. More than that overwhelms the avatar. Specify intensity (subtle/strong).

In-prompt
✂️

Transition Effects

Cross-dissolve (universal), whip pan (energy), zoom transition (modern). Handled in CapCut or Premiere after AI generation. Not in Higgsfield prompts.

Post-production
📝

Text Overlays & Captions

Auto-caption in CapCut. Keyword callouts (highlight key stat/word on screen). Lower-third name plates. Use high-contrast, readable fonts such as Helvetica or Inter.

Post-production
🌟

Particle & Energy FX

Digital particles, glowing orbs, energy trails. Good for tech/AI topics. Add via CapCut FX pack or RunwayML's VFX layers after generating base footage.

Layered in CapCut/Runway
🎞️

Color Grading

Apply a LUT in post for mood consistency. Teal-and-orange for cinematic warmth. Desaturated gray for serious/corporate. Warm golden for lifestyle topics.

Post-production LUT
VFX Prompt Examples for Higgsfield
Camera VFX
...slow cinematic push-in toward avatar's face, subtle lens flare as sun catches the edge of frame, very slight depth-of-field shift from background to foreground...
Atmospheric VFX
soft volumetric light rays through window, fine floating dust particles in mid-air, warm morning haze on the street behind subject...
06
Duration Control & Shot Planning
Matching visual timing to your voiceover, the core editing skill
Shot Duration Rules
Duration | Shot Type | Best For
6 seconds | Fast cut | Hook opener, closing line, reaction beat, transition moment
8 seconds | Standard | Setup explanation, CTA delivery, mid-video transitions
10 seconds | Extended | Solution demo, complex gesture/action, key insight delivery
12–15 seconds | Slow burn | Emotional moments, closing sequences, dramatic reveals
60-Second Video Shot Map (Example)
6s
Shot 1: Hook

Fast cut. Close-up. Avatar directly addresses camera with bold opening question. High energy expression.

8s
Shot 2: Setup

Medium shot. Slight camera push-in. Avatar builds the problem with gestures. Start of emotional connection.

10s
Shot 3: Solution A

Extended. Avatar demonstrates or explains first key point. Camera at mid-distance, static or slow orbit.

10s
Shot 4: Solution B

Extended. Cut to complementary scene (different environment or angle). Avatar delivers second key point.

8s
Shot 5: CTA

Standard. Avatar direct-to-camera. Warm, confident. Specific call to action. Slight zoom-out feeling of openness.

6s
Shot 6: Closing Line

Fast cut. Close-up again. Avatar holds eye contact. Delivers mic-drop closing line. Static camera; let the words land.

Total Shot Time 6+8+10+10+8+6 = 48s

Add 2 × 6s B-roll cutaways to reach 60s total. Alternatively, extend each Solution shot to 12s for a more relaxed pace; that totals 52s, so 8s of cutaways are still needed to reach 60s.
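The shot-map arithmetic can be checked with a few lines of code before any clip is generated. This is a minimal sketch using the example durations above; the `plan` helper and the shot labels are illustrative names, not part of any tool.

```python
# (label, seconds) pairs taken from the example shot map.
SHOTS = [
    ("hook", 6), ("setup", 8), ("solution_a", 10),
    ("solution_b", 10), ("cta", 8), ("closing", 6),
]
TARGET_SECONDS = 60
BROLL_SECONDS = 6


def plan(shots, target=TARGET_SECONDS, broll=BROLL_SECONDS):
    """Return (total, gap, cutaways): cutaways is how many B-roll clips fill the gap."""
    total = sum(seconds for _, seconds in shots)
    gap = max(target - total, 0)
    cutaways = -(-gap // broll)  # ceiling division
    return total, gap, cutaways


total, gap, cutaways = plan(SHOTS)
print(f"shots: {total}s, gap: {gap}s, cutaways needed: {cutaways}")

# Running start times, useful when laying clips on the editor timeline.
start = 0
for label, seconds in SHOTS:
    print(f"{start:>2}s  {label} ({seconds}s)")
    start += seconds
```

Changing a duration in `SHOTS` shows immediately how many seconds of B-roll are still needed to hit the target.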

Duration control in Higgsfield: Use the "Video Length" slider when generating each clip. Generate each shot separately at its target duration, then assemble in CapCut or Premiere. Do not generate the full 60 seconds in one shot; quality drops significantly.
07
AI Voiceover Generation
Creating professional narration that matches your script and avatar energy
Top AI Voiceover Tools
🎙️
ElevenLabs
Most realistic AI voices. Emotion control. Clone your own voice in 30 seconds. Multilingual.
Best Quality
🔊
PlayHT
Huge voice library, fast generation, good for batch production. Affordable pricing tiers.
Good Value
📱
CapCut AI Voice
Built directly into the editing workflow. Fast, free tier available. Lower quality but super convenient.
Most Convenient
🎀
Murf AI
Studio-quality voices with pitch/speed/emphasis control. Good for corporate and explainer content.
Studio Grade

Voiceover Settings for Punchy Short-Form

⚑

Speed Setting

110–115% of normal speed. Short-form audiences expect a slightly faster pace. Too slow = boring. Too fast = hard to follow.

📊

Emphasis Tags

Use <emphasis> tags in ElevenLabs to stress key words. Matches natural speech rhythm. Never let the AI emphasize randomly.

⏸️

Pause Control

Add deliberate 0.3–0.5s pauses after hook question and before closing line. Let powerful statements breathe.

🎡

Music vs Voice Balance

Background music at –18dB to –22dB when voice is present. Voice at 0dB reference. Duck the music by a further 6dB at the hook and closing line.
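If your editor exposes gain as a linear multiplier rather than in decibels, the levels above can be converted with the standard amplitude formula, gain = 10^(dB/20). This is a minimal sketch that assumes the dB figures are amplitude levels relative to the voice track; the helper names are illustrative.

```python
def db_to_gain(db: float) -> float:
    """Convert a decibel level to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def ducked_level(music_db: float, extra_duck_db: float = 6.0) -> float:
    """Lower the music level by extra_duck_db for emphasis moments."""
    return music_db - abs(extra_duck_db)


# Voice reference, the two ends of the music range, and a ducked music level.
for level in (0, -18, -22, ducked_level(-18)):
    print(f"{level:>5} dB -> gain {db_to_gain(level):.3f}")
```

The conversion shows that music at these levels sits at roughly a tenth of the voice amplitude, which is why it supports the narration instead of competing with it.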

Pro workflow: Generate voiceover FIRST, then time your video shots to the audio waveform, not the other way around. This is how professional video editors work. Audio is the spine; video is the body.
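Because the shots are timed to the audio, it helps to know how the speed setting changes the voiceover length before planning shot durations. The sketch below assumes that a speed of N% shortens the audio by exactly that factor, which real voice tools may only approximate; both function names are illustrative.

```python
def duration_at_speed(base_seconds: float, speed_percent: float) -> float:
    """Length of a clip after a playback-speed change (100 = unchanged)."""
    if speed_percent <= 0:
        raise ValueError("speed_percent must be positive")
    return base_seconds * 100.0 / speed_percent


def words_that_fit(seconds: float, base_wpm: float = 150.0,
                   speed_percent: float = 100.0) -> int:
    """Whole words that fit in `seconds` at the sped-up pace."""
    return int(seconds / 60.0 * base_wpm * speed_percent / 100.0)


for speed in (100, 110, 115):
    print(
        f"{speed}%: a 60 s read becomes {duration_at_speed(60, speed):.1f} s, "
        f"and {words_that_fit(60, speed_percent=speed)} words fit in 60 s"
    )
```

A script written to the 130–150 word target will therefore finish several seconds early at 110–115% speed, leaving room for the deliberate pauses described above.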
08
Lip Sync & Voice Matching
Making AI avatars speak with believable, matched mouth movement
How Lip Sync Works in AI Video
🧠

AI-Driven Lip Sync

Tools like Higgsfield, D-ID, and HeyGen analyze your audio file and generate matching mouth movements frame-by-frame using phoneme mapping.

🔑

Phoneme Accuracy

Professional lip sync works at the phoneme level: individual mouth sounds like "m", "f", "oh". The more phoneme data available, the more realistic the result.

⚙️

Frame Rate Matters

Generate avatar video at 24 or 30fps. Lip sync tools are calibrated to standard frame rates. 60fps can cause interpolation artifacts in mouth movement.

Step-by-Step Lip Sync Workflow
1
Generate Clean Voiceover Audio

Export from ElevenLabs/Murf as WAV (44.1kHz, 16-bit). Avoid heavy compression or EQ at this stage, because lip sync engines work better with clean audio.

2
Generate "Silent" Avatar Video First

In Higgsfield, generate your avatar clips with natural expression but NO audio dependency. You'll add speech movement in the next step.

3
Upload to Lip Sync Tool

Use Higgsfield's built-in Lip Sync (best for Higgsfield avatars), HeyGen Lip Sync, or SyncLabs. Upload: (a) avatar video clip, (b) voiceover audio. The tool maps speech to mouth.

4
Adjust Sync Offset

Most tools have a +/– frame offset control. Set to 0 first, preview, then nudge ±1–2 frames if the mouth is slightly ahead of or behind the audio. This is the most critical quality step.

5
Check for Artifacts at Cut Points

Lip sync can create mouth-shape errors at the first and last frame of each clip. Add a 2-frame crossfade transition between clips in your editor to mask these.

6
Re-check Full Assembly

After assembling all clips in CapCut/Premiere, watch the full video with headphones. Listen for any audio–visual desync, especially after transitions.

Lip sync tip: If the avatar's mouth looks slightly off (common), don't re-render the whole video. Instead, add a very brief cutaway (B-roll, text overlay, or cut to close-up of hands/environment) to cover the desync point. This is what professional editors do.
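Frame offsets are easier to reason about once converted to milliseconds, since audio editors usually work in time rather than frames. A minimal sketch of that conversion for the two frame rates recommended above; `frames_to_ms` is an illustrative helper name.

```python
def frames_to_ms(frames: int, fps: float) -> float:
    """Convert a frame count to milliseconds at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps * 1000.0


# The 1-2 frame nudge range (and the 2-frame crossfade) at 24 and 30 fps.
for fps in (24, 30):
    for frames in (1, 2):
        print(f"{frames} frame(s) at {fps} fps = {frames_to_ms(frames, fps):.1f} ms")
```

A two-frame nudge is well under a tenth of a second at either frame rate, so preview each change before nudging further.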
🔄
SyncLabs (sync.so)
Best standalone lip sync tool. Upload any video + any audio. Outputs with matched lip movement. Free tier available.
Recommended
🤖
HeyGen Lip Sync
Excellent for talking head clips. Integrates with their avatar studio. Good multilingual support.
Great for Multi-lang
09
Full Production Workflow
The complete end-to-end process from idea to published video
Phase 1: Pre-Production
Script → Avatar Design → Shot Plan

Write script with timing markers. Design avatar description card. Create shot-by-shot breakdown with duration targets. Map voiceover sentences to each shot.

Phase 2: Voice Generation
Record or Generate Voiceover

Generate in ElevenLabs with correct speed, emphasis, and pauses. Export as clean WAV. Listen back fully and fix any mispronunciations before moving forward.

Phase 3: Video Generation
Generate Each Shot in Higgsfield

Generate one shot at a time using your prompt templates. Save each clip with a clear filename (shot01_hook_6s.mp4). Generate 2–3 variations per shot and pick the best.

Phase 4: Lip Sync
Apply Lip Sync Per Clip

Upload each clip to SyncLabs/Higgsfield with its corresponding voiceover segment. Check sync offset. Export synced clips.

Phase 5: Assembly
Edit in CapCut / Premiere

Assemble synced clips in order. Add transitions (crossfade 2–3 frames). Layer background music at –20dB. Add text overlays and captions. Apply color LUT.

Phase 6: Post & Export
Final Quality Check & Export

Watch at full volume on phone (your primary audience's device). Check: sync, pacing, caption readability, music balance. Export: 1080×1920 (9:16), H.264, 30fps, 10–15 Mbps.
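If you export from the command line instead of CapCut or Premiere, the settings above map onto an ffmpeg invocation roughly as follows. This is a sketch under stated assumptions: ffmpeg is installed with the libx264 encoder, the file names are hypothetical, 12M is simply one value inside the 10–15 Mbps range, and the source is already 9:16 (the scale filter would otherwise stretch it). The code only builds and prints the command; it does not run it.

```python
import shlex
from typing import List


def export_command(src: str, dst: str, video_bitrate: str = "12M") -> List[str]:
    """Build an ffmpeg argument list for a 1080x1920, H.264, 30 fps export."""
    return [
        "ffmpeg", "-i", src,
        "-vf", "scale=1080:1920",   # assumes the source is already 9:16
        "-r", "30",                 # output frame rate
        "-c:v", "libx264",          # H.264 encoder
        "-b:v", video_bitrate,      # target video bitrate
        "-pix_fmt", "yuv420p",      # widely compatible pixel format
        "-c:a", "aac",
        dst,
    ]


def estimated_size_mb(bitrate_mbps: float, seconds: float) -> float:
    """Rough video size in megabytes (audio and container overhead ignored)."""
    return bitrate_mbps * seconds / 8.0


cmd = export_command("assembly.mov", "final_9x16.mp4")  # hypothetical file names
print(shlex.join(cmd))
print(f"approx. size for 60 s: {estimated_size_mb(12, 60):.0f} MB")
```

The size estimate is useful for checking the export against platform upload limits before rendering.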

Time estimate per video: Script (15 min) → Voice gen (10 min) → Video generation (30–45 min total) → Lip sync (20 min) → Edit and assemble (30 min) → Export (5 min). Total: ~2 hours per professional 60-second video. This drops to under 1 hour once you have a template workflow.
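The roughly two-hour figure follows from adding up the stage estimates above. A minimal sketch of that sum, with each stage held as a (low, high) range in minutes; the `STAGES` dictionary is an illustrative structure.

```python
# (low, high) minutes per stage, taken from the time estimate above.
STAGES = {
    "script": (15, 15),
    "voice gen": (10, 10),
    "video generation": (30, 45),
    "lip sync": (20, 20),
    "edit and assemble": (30, 30),
    "export": (5, 5),
}

low = sum(lo for lo, _ in STAGES.values())
high = sum(hi for _, hi in STAGES.values())
print(f"total: {low}-{high} min ({low / 60:.1f}-{high / 60:.1f} hours)")
```

Video generation is the only stage with a range, so it is the first place to look when a production runs long.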
10
Complete Professional Tools Stack
Every tool you need, what it does, and where it fits
AI Video Generation
🎬
Higgsfield AI
Primary avatar video generation, scene control, character consistency, in-built lip sync. Core tool for this guide.
Core
🖼️
Veo 3 (Google)
Alternative for high-fidelity photorealistic scene generation. Excellent for outdoor and non-avatar scenes.
Scene Gen
🚀
RunwayML Gen-3
Best for cinematic motion, camera movements, and stylized VFX sequences. Great for B-roll.
Cinematic
Voice & Lip Sync
🎙️
ElevenLabs
Best AI voices. Voice cloning. Emotion and emphasis control. API available for automation.
Top Pick
🔄
SyncLabs
Best standalone lip sync. Upload any video + audio, outputs lip-synced result.
Top Pick
Editing & Post-Production
✂️
CapCut
Free, mobile-friendly, AI auto-caption, music library, FX packs. Best for fast short-form production.
Free
🎞️
Adobe Premiere Pro
Professional editing with full color grading, LUT application, and precise audio control for higher-end production.
Pro
AI Script Writing
✍️
Claude (Anthropic)
Best for structured scripts with shot breakdowns, avatar descriptions, and Veo 3 prompts. Use the template in Module 01.
Recommended
Starter stack (free/low-cost): Claude (free tier) for scripting → Higgsfield (free tier, 3 videos/month) → ElevenLabs (free tier, 10 min/month) → SyncLabs (free tier) → CapCut (free). You can produce professional videos at near-zero cost while learning the workflow.