Why Create Videos with AI Instead of Recording
Professional video production requires recording equipment, good lighting, a suitable space, editing time, and a willingness to appear on camera. AI tools remove most of those barriers.
With the right workflow, you can produce videos that look professionally recorded without a camera, without a studio, and, in many cases, without significant cost.
This tutorial covers the complete five-step workflow: script → voice → avatar/video → editing → publishing.
The Workflow Tools
Before starting, these are the four tools we will use:
| Tool | Function | Free Plan |
|---|---|---|
| ChatGPT / Claude | Create the script | Yes |
| ElevenLabs | Generate the voice | Yes (10K chars/month) |
| HeyGen | AI talking avatar | Yes (1 min/month) |
| CapCut AI | Editing and captions | Yes |
Free alternatives for each step:
- Script: any free LLM (Gemini, Claude free)
- Voice: ElevenLabs free or Murf.ai free
- Video/Avatar: Kling (for video without avatar) or D-ID (avatar)
- Editing: DaVinci Resolve or CapCut
Step 1: The Script
The script is the foundation of everything. An AI video with a good script will usually outperform a recorded video with a poor one.
Prompt to create the script with AI:
Act as an educational video scriptwriter. Create a [DURATION] script for a video about [TOPIC].
Format:
- Hook in the first 5 seconds
- Problem the video solves
- Main content (key points with transitions)
- Final call to action
Tone: [casual/professional/technical]
Audience: [audience description]
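If you reuse this prompt often, it can help to fill the placeholders programmatically. The sketch below is only an illustration: the function name `build_script_prompt` and the tone check are assumptions added here, not part of any tool's API. It returns the prompt as a string that you can paste into ChatGPT, Claude, or another LLM.

```python
# Prompt text copied from the template above, with the bracketed
# placeholders turned into named format fields.
PROMPT_TEMPLATE = """Act as an educational video scriptwriter. Create a {duration} script for a video about {topic}.
Format:
- Hook in the first 5 seconds
- Problem the video solves
- Main content (key points with transitions)
- Final call to action
Tone: {tone}
Audience: {audience}"""

ALLOWED_TONES = {"casual", "professional", "technical"}


def build_script_prompt(duration: str, topic: str, tone: str, audience: str) -> str:
    """Return the script prompt with every placeholder filled in.

    Hypothetical helper: it only formats text and calls no external service.
    """
    if tone not in ALLOWED_TONES:
        raise ValueError(f"tone must be one of {sorted(ALLOWED_TONES)}")
    return PROMPT_TEMPLATE.format(
        duration=duration, topic=topic, tone=tone, audience=audience
    )


if __name__ == "__main__":
    print(build_script_prompt(
        "60-second", "making sourdough bread", "casual", "beginner home bakers"
    ))
```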
Rules for a good AI video script:
- Short sentences. The avatar speaks better with sentences of 15-20 words maximum.
- Avoid words that are difficult for AI to pronounce (unusual acronyms, complex proper names).
- Add pause indicators if the text is very dense: [PAUSE]
- For 1 minute of video: 130-150 words in English.
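The rules above are easy to check automatically before you spend voice credits. The following is a rough sketch, not a definitive linter: it assumes sentences end in `.`, `!`, or `?`, treats `[PAUSE]` as silence rather than a word, and uses the 20-word limit and the 130-150 words-per-minute range from the list. The function names are hypothetical.

```python
import re

MAX_WORDS_PER_SENTENCE = 20      # upper bound from the rules above
WORDS_PER_MINUTE = (130, 150)    # English speaking rate from the rules above


def _spoken_text(script: str) -> str:
    """Drop [PAUSE] markers, which are not spoken words."""
    return script.replace("[PAUSE]", " ")


def split_sentences(script: str) -> list[str]:
    """Naive split on whitespace that follows ., ! or ? (an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", _spoken_text(script).strip())
    return [p for p in parts if p]


def long_sentences(script: str, limit: int = MAX_WORDS_PER_SENTENCE) -> list[str]:
    """Return the sentences that exceed the word limit."""
    return [s for s in split_sentences(script) if len(s.split()) > limit]


def estimated_minutes(script: str) -> tuple[float, float]:
    """Return (shortest, longest) expected duration in minutes."""
    words = len(_spoken_text(script).split())
    slow, fast = WORDS_PER_MINUTE
    return words / fast, words / slow


if __name__ == "__main__":
    draft = "Welcome back. [PAUSE] Today we build a video without a camera."
    print(long_sentences(draft))
    print(estimated_minutes(draft))
```

Any sentence returned by `long_sentences` is a candidate for splitting in two before you generate the voice.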
Step 2: The Voice (ElevenLabs)
ElevenLabs generates voices that sound human. Its free plan covers 10,000 characters per month, enough for 5-7 short videos.
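To see where the "5-7 short videos" estimate comes from, you can work out the character budget yourself. This sketch assumes about 6 characters per English word (including the following space and punctuation) and a speaking rate of 140 words per minute, the midpoint of the range given earlier. Both numbers are assumptions for illustration, so count the characters of your own scripts for a precise figure.

```python
FREE_CHARS_PER_MONTH = 10_000  # free-plan figure quoted above
WORDS_PER_MINUTE = 140         # assumption: midpoint of 130-150
CHARS_PER_WORD = 6             # assumption: average word plus space/punctuation


def chars_for_video(minutes: float) -> int:
    """Approximate number of characters needed to narrate a video."""
    return round(minutes * WORDS_PER_MINUTE * CHARS_PER_WORD)


def videos_per_month(minutes: float, budget: int = FREE_CHARS_PER_MONTH) -> int:
    """How many videos of the given length fit in the monthly budget."""
    return budget // chars_for_video(minutes)


if __name__ == "__main__":
    print(videos_per_month(2.0))  # 5
    print(videos_per_month(1.5))  # 7
```

Under these assumptions, "short" means roughly 1.5 to 2 minutes per video. Regenerating a take also consumes characters, so the real number is often lower.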
Process in ElevenLabs:
- Create account at elevenlabs.io (free)
- Go to "Text to Speech"
- Choose a voice from the catalog (or create one by cloning your voice with 1 minute of audio)
- Paste your script
- Adjust: speed (0.9-1.0 for explainer videos), stability (70-80%) and clarity (75-85%)
- Generate and download the MP3
Tip: try several voices before committing. The voices "Rachel" and "Adam" in English have excellent quality and work well for most content types.
If you need a specific accent or tone, the "Voice Lab" section allows you to fine-tune existing voices or browse voices submitted by the community.
Step 3: The Avatar or Animated Images
You have two options depending on whether you want a presenter or dynamic visual content:
Option A: Talking Avatar (HeyGen)
Best for: tutorials, online courses, corporate presentations, educational content.
- Create account on HeyGen (1 min free/month)
- New video → select stock avatar
- Instead of typing text, upload the audio from ElevenLabs
- Adjust the avatar to sync with the audio
- Choose background (solid, transparent, or image)
- Generate the video (takes 2-5 minutes)
Tip: if you need more minutes than the free plan allows, lay the ElevenLabs audio directly over static or animated images. For many formats the result looks just as professional.
Option B: Animated Images (Runway or Kling)
Best for: product videos, B-roll for YouTube, visual content without a presenter.
- Create images with Midjourney, DALL-E, or Flux
- Upload the image to Runway or Kling
- Describe the movement you want ("gentle zoom toward center", "leaves moving in the wind")
- Generate the 4-10 second clip
- Combine multiple clips in editing
With Kling you can do this for free using daily credits.
Step 4: Editing with CapCut AI
CapCut has a free web and mobile version with AI features that significantly accelerate editing.
Editing workflow:
- Import: upload the avatar video + any additional clips
- Auto captions: CapCut generates captions with AI in seconds. Review them and correct any errors (there are usually few).
- Background music: use CapCut's free library or import your own audio. Lower the music to about -20 dB so it does not drown out the voice.
- Transitions and effects: keep transitions simple (fade, cut). Elaborate effects are distracting.
- Export format: 1080p MP4 for YouTube; 1080x1920 vertical for Instagram/TikTok.
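If your editor shows volume as a percentage or multiplier instead of decibels, the standard amplitude conversion tells you what -20 dB means. The sketch below assumes the -20 dB figure is a gain relative to the track's original level. How a given editor labels its volume control is not something this snippet can confirm.

```python
import math


def db_to_gain(db: float) -> float:
    """Convert a decibel change to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def gain_to_db(gain: float) -> float:
    """Convert a linear amplitude multiplier back to decibels."""
    return 20 * math.log10(gain)


if __name__ == "__main__":
    # -20 dB corresponds to one tenth of the original amplitude.
    print(f"{db_to_gain(-20):.2f}")  # 0.10
```

In other words, the background music plays at roughly 10% of its original amplitude, which leaves the voice clearly in front.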
Particularly useful CapCut AI features:
- Automatic silence removal (saves up to 30% of editing time)
- AI color adjustment (one click for professional look)
- AI thumbnail generation
Step 5: Publishing
The final step is configuring the video correctly before publishing.
For YouTube:
- Title: include the main keyword in the first 60 characters
- Description: first 2 lines are critical (appear in search without expanding)
- Custom thumbnail: generate one with CapCut or Canva
- Chapters: add timestamps if the video is longer than 5 minutes
- AI disclosure: enable the option if the video includes an AI avatar
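Two of the YouTube items above are mechanical enough to script: checking that the keyword falls in the first 60 characters of the title, and producing the chapter timestamps for the description. This is a minimal sketch with hypothetical helper names. It assumes chapters are listed as `M:SS Title` lines and that the first one starts at 0:00, which is my understanding of YouTube's chapter format, so check the current YouTube help pages before relying on it.

```python
def keyword_in_first_60(title: str, keyword: str) -> bool:
    """True if the keyword appears within the first 60 characters of the title."""
    return keyword.lower() in title[:60].lower()


def format_timestamp(seconds: int) -> str:
    """Format seconds as M:SS, or H:MM:SS for videos of an hour or more."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def chapter_lines(chapters: list[tuple[int, str]]) -> str:
    """Build description lines from (start_seconds, title) pairs."""
    ordered = sorted(chapters)
    if not ordered or ordered[0][0] != 0:
        raise ValueError("the first chapter should start at 0:00")
    return "\n".join(f"{format_timestamp(start)} {title}" for start, title in ordered)


if __name__ == "__main__":
    print(chapter_lines([(0, "Intro"), (95, "The script"), (260, "The voice")]))
```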
For Instagram/TikTok:
- First 3 seconds are the hook — make sure they are impactful
- Visible captions (CapCut generates them automatically)
- Relevant hashtags (5-10 on Instagram, 3-5 on TikTok)
Complete Workflow Summary
[Script with ChatGPT/Claude]
↓
[Voice with ElevenLabs]
↓
[Avatar with HeyGen / Animated images with Kling]
↓
[Editing + captions with CapCut AI]
↓
[Publishing on YouTube/Social Media]
Estimated total time:
- First video: 90-120 minutes (while learning the tools)
- Subsequent videos: 30-45 minutes
Paid Tools Worth Considering
If you produce more than 4 videos per month, consider:
- ElevenLabs Starter ($5/month): 30,000 characters + voice cloning + no watermark
- HeyGen Essential ($29/month): 15 minutes/month + all avatars + no watermark
- CapCut Pro ($9.99/month): watermark-free export + advanced AI features
The ROI is clear if you avoid hiring a videographer or voice actor for each video.
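To put a number on that comparison, you can total the three plans and divide by your monthly output. The prices below are the ones listed above and may have changed since. The helper names are hypothetical, and any videographer or voice-actor rate you compare against is your own figure to supply.

```python
# Monthly prices in USD as listed above; verify current pricing before budgeting.
PLANS = {
    "ElevenLabs Starter": 5.00,
    "HeyGen Essential": 29.00,
    "CapCut Pro": 9.99,
}


def monthly_cost(plans: dict[str, float] = PLANS) -> float:
    """Total monthly subscription cost for the given plans."""
    return round(sum(plans.values()), 2)


def cost_per_video(videos_per_month: int, plans: dict[str, float] = PLANS) -> float:
    """Subscription cost spread over the videos produced in a month."""
    if videos_per_month <= 0:
        raise ValueError("videos_per_month must be positive")
    return round(monthly_cost(plans) / videos_per_month, 2)


if __name__ == "__main__":
    print(monthly_cost())     # 43.99
    print(cost_per_video(5))  # 8.8
```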