Podcast Clipper Features Built for Short-Form Video Workflows
Highlight detection, word-level captions, vertical framing, selectable caption language, and a single dashboard to review every result.
Capabilities
Six pieces that replace a five-tab workflow
Each feature is automated end-to-end so you never need to leave the app for a separate transcription or cropping tool.
LLM planning
AI Q&A Clipping
Gemini 2.5 reads word-level transcripts and plans 40 to 60 second question-and-answer clips that start and end on full sentence boundaries.
- 1 to 4 clips per upload, controlled at submit time.
- Highlights are scored on conversational tension, not pure keyword density.
- Sentence boundaries are respected, so clips never start or end mid-thought.
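The boundary rule can be illustrated with a minimal sketch. It assumes words arrive as timed tokens and that a sentence ends on a word finishing with ".", "?", or "!". The `Word` class and `snap_clip_end` function are hypothetical names for illustration, not the app's actual planner, which relies on Gemini.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds

SENTENCE_END = (".", "?", "!")  # assumption: punctuation marks a sentence end

def snap_clip_end(words, start_idx, min_len=40.0, max_len=60.0):
    """Return the index of the last word in the clip, or None if no
    sentence boundary falls inside the 40 to 60 second window."""
    clip_start = words[start_idx].start
    best = None
    for i in range(start_idx, len(words)):
        duration = words[i].end - clip_start
        if duration > max_len:
            break  # anything later would exceed the maximum length
        if duration >= min_len and words[i].text.endswith(SENTENCE_END):
            best = i  # keep the latest boundary that still fits
    return best
```

Taking the latest boundary that fits favors fuller answers; a planner could just as reasonably prefer the earliest one for tighter clips.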
Word-level
WhisperX Word Subtitles
WhisperX large-v2 transcribes English audio and aligns every word to precise start and end timings.
- Word JSON makes downstream recuts and syncing painless.
- Caption timing matches actual speech, not paragraph guesses.
- Foundation for English captions or Korean translation, depending on the selected run language.
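As a rough illustration of why word-level JSON helps, the sketch below groups timed words into short caption lines. It assumes a flat list of dicts with `word`, `start`, and `end` keys; the `group_words` helper and its thresholds are illustrative assumptions, not the app's actual caption logic.

```python
def group_words(words, max_words=3, max_gap=0.6):
    """Group timed words into caption lines.

    A new line starts when the current one is full or when the pause
    before the next word exceeds max_gap seconds.
    """
    lines, current = [], []
    for w in words:
        too_long = len(current) >= max_words
        paused = bool(current) and w["start"] - current[-1]["end"] > max_gap
        if current and (too_long or paused):
            lines.append(current)
            current = []
        current.append(w)
    if current:
        lines.append(current)
    # Each caption line inherits its timing from its first and last word.
    return [
        {
            "text": " ".join(w["word"] for w in line),
            "start": line[0]["start"],
            "end": line[-1]["end"],
        }
        for line in lines
    ]
```

Because every line carries the timings of its own words, a recut only needs to filter and shift the word list rather than re-transcribe.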
Face-aware
Auto Vertical Framing
Columbia ASD face tracks steer 1080x1920 crops or blurred backgrounds, rendered with NVENC at 25 fps.
- Active speaker detection per frame so the camera follows the right person.
- Falls back to blurred backdrop when the face track is uncertain.
- Output is publish-ready for YouTube Shorts, Reels, and TikTok.
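The crop-or-fallback decision might look something like the sketch below. It assumes the face tracker reports a horizontal face center in pixels and a confidence score per frame; the `vertical_crop` name and the 0.5 threshold are hypothetical, and the real pipeline may smooth positions across frames.

```python
def vertical_crop(frame_w, frame_h, face_cx, confidence, min_conf=0.5):
    """Return a 9:16 crop window (x, y, width, height) centered on the face,
    or None to signal that the blurred-backdrop fallback should be used."""
    if confidence < min_conf:
        return None  # face track too uncertain to steer the crop
    crop_w = int(frame_h * 9 / 16)
    crop_w -= crop_w % 2            # even widths are safer for video encoders
    crop_w = min(crop_w, frame_w)
    x = int(round(face_cx - crop_w / 2))
    x = max(0, min(x, frame_w - crop_w))  # keep the window inside the frame
    return (x, 0, crop_w, frame_h)
```

For a 1920x1080 source, the window is 606 pixels wide and full height; scaling it up to 1080x1920 is left to the renderer.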
Caption language
English or Korean Captions
Each processing run uses one selected caption language. English captions are sourced from WhisperX; Korean captions come from Gemini translation.
- Anton style for English emphasis lines.
- Noto Sans KR style for Korean lines.
- Choose English or Korean before starting the run.
Signed URLs
Secure S3 Storage
Originals and clips live in a dedicated S3 bucket. The app fetches them only through AWS presigned URLs.
- Per-user prefixes keep uploads isolated.
- Presigned URLs expire in 1 hour by default.
- Cleanup routines remove abandoned drafts.
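Per-user isolation can be sketched as a key-naming convention plus an ownership check before any URL is signed. The `users/<id>/...` layout, the helper names, and the assumption that user IDs contain no slashes are all illustrative; the source only states that prefixes are per-user and that URLs expire in 1 hour by default.

```python
import posixpath
import uuid

DEFAULT_EXPIRY_SECONDS = 3600  # 1 hour, matching the stated default

def object_key(user_id, filename, kind="originals"):
    """Build an object key under the user's own prefix (hypothetical layout)."""
    # Keep only the final path component so "../" cannot escape the prefix.
    safe_name = posixpath.basename(filename.replace("\\", "/"))
    return f"users/{user_id}/{kind}/{uuid.uuid4().hex}/{safe_name}"

def belongs_to(user_id, key):
    """Check that a key sits under the given user's prefix before signing it."""
    return key.startswith(f"users/{user_id}/")
```

The signing step itself would go through the AWS SDK, for example boto3's `generate_presigned_url` with its `ExpiresIn` parameter set to the expiry above.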
Dashboard
Dashboard Review Loop
Upload, request processing, review the clip list, play, download, and delete clips from a single view.
- Status moves from queued to processing to processed without page reloads.
- Per-clip download and delete actions.
- Recoverable upload drafts in case the tab closes mid-flow.
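The status progression can be modeled as a small one-way state machine. This is a sketch with hypothetical names; it covers only the three states named above, since the source does not describe failure or retry states.

```python
from enum import Enum

class Status(Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    PROCESSED = "processed"

# Each status may only move forward to the next one.
ALLOWED = {
    Status.QUEUED: {Status.PROCESSING},
    Status.PROCESSING: {Status.PROCESSED},
    Status.PROCESSED: set(),
}

def advance(current, new):
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new
```

Rejecting backward or skipped transitions keeps a stale update from reverting a finished upload in the dashboard.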
Manual vs automated
Where the time actually goes
Manual short-form workflows fan out into multiple tools. The AI pipeline collapses them into one upload.
| Capability | Manual workflow | AI Podcast Clipper |
|---|---|---|
| Find highlight moments | Scrub through hours of audio and timestamp by hand. | Gemini 2.5 picks 1-4 Q&A moments per upload. |
| Add word-level captions | Hand-time captions or use a generic auto-captioner. | WhisperX word timings burned into the clip automatically. |
| Convert horizontal to vertical | Manually crop and reposition every cut. | Face-aware Columbia ASD crop with blurred backdrop fallback. |
| Choose caption language | Re-cut or re-caption manually when changing language. | English or Korean captions are selected per processing run. |
Frequently asked questions
- How many clips does each run produce?
- You choose 1, 2, 3, or 4 clips per upload. The AI selects the strongest Q&A moments and produces that many vertical clips.
- Is Korean captioning the same quality as English?
- English captions come directly from WhisperX with word-level timing. Korean captions are produced by Gemini translation and styled with Noto Sans KR. Both are usable for publishing, but English captions will track speech more tightly.
- Where are uploads and clips stored?
- All originals and generated clips live in a dedicated AWS S3 bucket under per-user prefixes. The app only ever exposes them through short-lived presigned URLs.
- What is the file size limit?
- Uploads are capped at 900 MB per .mp4. Long episodes still work, but very large files should be exported at a moderate bitrate before upload.
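To make "moderate bitrate" concrete, the sketch below checks a file against the cap and estimates the highest total bitrate (video plus audio) that keeps an episode under it. It assumes 1 MB means 1,000,000 bytes, which the source does not specify, and the function names are hypothetical.

```python
MAX_UPLOAD_BYTES = 900 * 1000 * 1000  # assumption: decimal megabytes

def validate_upload(filename, size_bytes):
    """Raise ValueError if the file is not an .mp4 or exceeds the cap."""
    if not filename.lower().endswith(".mp4"):
        raise ValueError("only .mp4 uploads are accepted")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 900 MB upload cap")

def max_total_bitrate_kbps(duration_seconds):
    """Highest combined video and audio bitrate that fits under the cap."""
    return MAX_UPLOAD_BYTES * 8 / duration_seconds / 1000
```

Under these assumptions, a two-hour episode (7,200 seconds) fits only at about 1,000 kbps total, while a one-hour episode has room for roughly 2,000 kbps.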