How AI Podcast Clipper Turns Podcasts Into Shorts
The pipeline combines upload storage, transcription, AI highlight selection, active speaker framing, captions, rendering, and dashboard review.
The processing flow
Each stage has a narrow job. That makes the output easier to understand, inspect, and improve.
- 1. Upload a podcast .mp4
The creator uploads an .mp4 file up to 900 MB. The upload is stored under a per-user S3 prefix so it can be processed without becoming public.
- 2. Transcribe the conversation
WhisperX creates word-level timing data. The timing is used later to keep captions aligned with the rendered clip.
- 3. Select highlight candidates
Gemini evaluates the transcript for self-contained Q&A or discussion moments that can survive as short-form clips.
- 4. Frame the active speaker
Active speaker detection and face-aware framing guide the 1080x1920 vertical crop or background treatment.
- 5. Burn in captions
English captions use transcript timing. Korean captions are generated for the selected run and rendered into the video.
- 6. Review and download
Generated clips appear in the dashboard so the creator can play, download, keep, delete, or rerun the result.
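Two of the stages above come down to simple arithmetic, so a short sketch may help. It is an illustration under stated assumptions, not the app's actual code: the `Word` type, `rebase_words`, and `crop_window` are hypothetical names, the word timings are assumed to be in seconds from the start of the episode, and the crop is assumed to use the full source height with a horizontal position taken from a detected face center.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the start of the source episode (assumed)
    end: float


def rebase_words(words, clip_start, clip_end):
    """Keep words that lie fully inside the clip and shift them to clip-relative time."""
    return [
        Word(w.text, round(w.start - clip_start, 3), round(w.end - clip_start, 3))
        for w in words
        if w.start >= clip_start and w.end <= clip_end
    ]


def crop_window(src_w, src_h, face_cx, out_w=1080, out_h=1920):
    """Return (x, y, w, h) for a 9:16 crop at full source height.

    The window is centered on face_cx and clamped so it stays inside the frame.
    If the source is narrower than the 9:16 window, the full width is returned.
    """
    crop_w = min(src_h * out_w // out_h, src_w)
    x = int(face_cx) - crop_w // 2
    x = max(0, min(x, src_w - crop_w))
    return x, 0, crop_w, src_h


if __name__ == "__main__":
    words = [Word("hi", 9.5, 10.2), Word("there", 10.4, 10.9), Word("friend", 11.0, 12.5)]
    print(rebase_words(words, clip_start=10.0, clip_end=12.0))
    print(crop_window(1920, 1080, face_cx=960))
```

Shifting timestamps by the clip start is what keeps a caption in sync once the clip is cut out of the longer episode. The cropped window would still need to be scaled to 1080x1920 by the renderer, and a source that cannot fill a 9:16 window is the kind of case where a background treatment could apply instead.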
When results tend to work best
The app is strongest when the source episode contains clear, self-contained conversation moments.
Accepts podcast-style .mp4 files up to 900 MB per upload.
Scores transcript segments and selects 1-4 clips per upload based on the requested run settings.
Supports English or Korean captions selected per processing run.
Exports 1080x1920 .mp4 clips that creators can review before posting to YouTube Shorts, Instagram Reels, or TikTok.
Generated clips are played and downloaded from the authenticated dashboard through signed URLs.
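The limits listed above can be read as a small set of checks plus a selection step. The sketch below is one way this could look, not the app's implementation. The names `Segment`, `validate_run`, and `pick_clips` are hypothetical, the language codes `"en"` and `"ko"` are assumed, "900 MB" is assumed to mean 900 MiB, and the greedy non-overlap rule is an assumption added for illustration.

```python
from dataclasses import dataclass

MAX_UPLOAD_BYTES = 900 * 1024 * 1024  # assumption: "900 MB" read as 900 MiB
CAPTION_LANGS = {"en", "ko"}          # assumption: codes for English and Korean


@dataclass(frozen=True)
class Segment:
    start: float  # seconds
    end: float
    score: float  # higher is better (assumed scale)


def validate_run(size_bytes, clip_count, caption_lang):
    """Raise ValueError if the upload or run settings fall outside the stated limits."""
    if not 0 < size_bytes <= MAX_UPLOAD_BYTES:
        raise ValueError("upload must be between 1 byte and 900 MB")
    if not 1 <= clip_count <= 4:
        raise ValueError("clip_count must be between 1 and 4")
    if caption_lang not in CAPTION_LANGS:
        raise ValueError("caption_lang must be 'en' or 'ko'")


def pick_clips(segments, clip_count):
    """Greedy pick: take the highest-scoring segments, skipping any that overlap a chosen one."""
    chosen = []
    for seg in sorted(segments, key=lambda s: s.score, reverse=True):
        if len(chosen) == clip_count:
            break
        if all(seg.end <= c.start or seg.start >= c.end for c in chosen):
            chosen.append(seg)
    # Return the picks in episode order for display.
    return sorted(chosen, key=lambda s: s.start)


if __name__ == "__main__":
    validate_run(size_bytes=300 * 1024 * 1024, clip_count=2, caption_lang="ko")
    candidates = [
        Segment(0, 30, 0.9),
        Segment(20, 50, 0.8),   # overlaps the first candidate
        Segment(60, 90, 0.7),
        Segment(100, 130, 0.4),
    ]
    print(pick_clips(candidates, clip_count=2))
```

Skipping overlapping segments avoids returning two clips that cover the same moment, which matters most when only one to four clips are produced per upload.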
Frequently asked questions
- Does AI Podcast Clipper publish clips automatically?
- No. The app generates clips for review and download. Creators decide what to publish and where to publish it.
- What kind of source material works best?
- Conversation-heavy podcast footage with clear speakers, usable audio, and self-contained discussion moments works best.
- Can I choose the caption language?
- Yes. The processing run can use English or Korean captions depending on the selected setting.