Features

Podcast Clipper Features Built for Short-Form Video Workflows

Highlight detection, word-level captions, vertical framing, selectable caption language, and a single dashboard to review every result.

Capabilities

Six pieces that replace a five-tab workflow

Each feature is automated end-to-end so you never need to leave the app for a separate transcription or cropping tool.

LLM planning

AI Q&A Clipping

Gemini 2.5 reads the word-level transcript and plans 40- to 60-second question-and-answer clips that start and end on full sentence boundaries.

1 to 4 clips per upload, controlled at submit time.
Highlights are scored on conversational tension, not pure keyword density.
Sentence boundaries respected so playback never feels abrupt.
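
The planning constraints above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual planner: the `Word` type, the punctuation-based sentence test, and the `longest_valid_end` helper are all assumptions made for the example. It shows how a clip start could be paired with an end word that falls on a sentence boundary while keeping the clip between 40 and 60 seconds.

```python
from dataclasses import dataclass
from typing import Optional

MIN_CLIP_SECONDS = 40.0  # lower bound from the feature description
MAX_CLIP_SECONDS = 60.0  # upper bound from the feature description
SENTENCE_END = (".", "?", "!")  # assumed heuristic for a sentence boundary


@dataclass
class Word:
    """Hypothetical word record: text plus start/end times in seconds."""
    text: str
    start: float
    end: float


def longest_valid_end(words: list[Word], start_idx: int) -> Optional[int]:
    """Return the index of the last word of the longest clip that begins at
    start_idx, ends on a sentence boundary, and lasts 40 to 60 seconds.
    Returns None when no such ending exists."""
    clip_start = words[start_idx].start
    best = None
    for i in range(start_idx, len(words)):
        duration = words[i].end - clip_start
        if duration > MAX_CLIP_SECONDS:
            break  # every later word would make the clip too long
        if duration >= MIN_CLIP_SECONDS and words[i].text.endswith(SENTENCE_END):
            best = i  # keep the latest boundary that still fits
    return best
```

Returning `None` rather than a truncated clip keeps the sentence-boundary rule strict: a candidate with no clean ending in the window is simply skipped.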

Word-level

WhisperX Word Subtitles

WhisperX large-v2 transcribes English audio and aligns every word to precise start and end timings.

Word-level JSON (text plus start and end times for every word) simplifies downstream recuts and syncing.
Caption timing matches actual speech, not paragraph guesses.
Foundation for English captions or Korean translation, depending on the selected run language.
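
As a rough sketch of how word-level timings can drive captions, the snippet below groups words into short caption lines. The record shape (`word`, `start`, `end` keys) and the `max_words` and `max_gap` thresholds are assumptions for illustration, not the app's confirmed settings.

```python
def group_words(words: list[dict], max_words: int = 4, max_gap: float = 0.6) -> list[dict]:
    """Group word records into caption lines.

    A new line starts when the current one is full or when the pause
    before the next word exceeds max_gap seconds.
    """
    lines: list[dict] = []
    current: list[dict] = []

    def flush() -> None:
        # Emit the buffered words as one caption line with its own timing.
        lines.append({
            "text": " ".join(w["word"] for w in current),
            "start": current[0]["start"],
            "end": current[-1]["end"],
        })
        current.clear()

    for w in words:
        too_long = len(current) >= max_words
        long_pause = bool(current) and w["start"] - current[-1]["end"] > max_gap
        if current and (too_long or long_pause):
            flush()
        current.append(w)

    if current:
        flush()
    return lines
```

Because each line inherits the start of its first word and the end of its last, caption timing follows the speech itself rather than a paragraph-level estimate.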

Face-aware

Auto Vertical Framing

Face tracks from Columbia ASD (active speaker detection) steer each 1080x1920 crop, with a blurred background as the fallback. Clips are rendered with NVENC at 25 fps.

Active speaker detection per frame so the camera follows the right person.
Falls back to blurred backdrop when the face track is uncertain.
Output is publish-ready for YouTube Shorts, Reels, and TikTok.
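
The crop-or-fallback decision described above might look like the sketch below. It is an assumption-laden example: the `crop_window` name, the `min_confidence` threshold of 0.5, and the even-width rounding are illustrative choices, not details confirmed by the product.

```python
from typing import Optional


def crop_window(
    src_w: int,
    src_h: int,
    face_cx: Optional[float],
    confidence: float,
    min_confidence: float = 0.5,  # hypothetical threshold
) -> Optional[tuple[int, int, int, int]]:
    """Return (left, top, width, height) of a 9:16 crop centred on the
    speaker's face, or None to signal the blurred-backdrop fallback."""
    if face_cx is None or confidence < min_confidence:
        return None  # uncertain face track: use the blurred backdrop

    # Full-height 9:16 window, rounded down to an even pixel width.
    crop_w = (src_h * 9 // 16) // 2 * 2
    if crop_w > src_w:
        return None  # source is too narrow for a full-height crop

    # Centre on the face, then clamp so the window stays inside the frame.
    left = int(face_cx - crop_w / 2)
    left = max(0, min(left, src_w - crop_w))
    return (left, 0, crop_w, src_h)
```

For a 1920x1080 source this yields a 606x1080 window, which a renderer would then scale up to the 1080x1920 output size.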

Caption language

English or Korean Captions

Each processing run uses one selected caption language. English captions are sourced from WhisperX; Korean captions come from Gemini translation.

Anton style for English emphasis lines.
Noto Sans KR style for Korean lines.
Choose English or Korean before starting the run.

Signed URLs

Secure S3 Storage

Originals and clips live in a dedicated S3 bucket. The app fetches them only through AWS presigned URLs.

Per-user prefixes keep uploads isolated.
Presigned URLs expire in 1 hour by default.
Cleanup routines remove abandoned drafts.
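
For readers curious how per-user prefixes and one-hour presigned URLs fit together, here is a minimal sketch. The `users/<user_id>/<filename>` key layout and both helper names are hypothetical. The only real API used is the boto3 S3 client's `generate_presigned_url` method, and the client is passed in so the example does not assume any particular credentials setup.

```python
PRESIGN_TTL_SECONDS = 3600  # 1 hour, matching the stated default


def object_key(user_id: str, filename: str) -> str:
    """Build an S3 key under a per-user prefix (hypothetical layout)."""
    for part in (user_id, filename):
        # Reject anything that could escape the user's own prefix.
        if "/" in part or part in ("", ".", ".."):
            raise ValueError(f"unsafe key component: {part!r}")
    return f"users/{user_id}/{filename}"


def presign_download(s3_client, bucket: str, key: str,
                     expires_in: int = PRESIGN_TTL_SECONDS) -> str:
    """Return a time-limited download URL for one object.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

Keeping key construction in one function means every read and write goes through the same prefix check, which is one simple way to keep uploads isolated per user.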

Dashboard

Dashboard Review Loop

Upload, request processing, review the clip list, play, download, and delete clips from a single view.

Status moves from queued to processing to processed without page reloads.
Per-clip download and delete actions.
Recoverable upload drafts in case the tab closes mid-flow.
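
The status progression above is a small one-way state machine. The sketch below models only the three states named on this page; the `Status` enum and `advance` helper are illustrative names, and any additional states the real app may have are not shown.

```python
from enum import Enum


class Status(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    PROCESSED = "processed"


# Each state has at most one successor; PROCESSED is terminal.
NEXT_STATUS = {
    Status.QUEUED: Status.PROCESSING,
    Status.PROCESSING: Status.PROCESSED,
}


def advance(status: Status) -> Status:
    """Move a run to its next status, refusing to go past the final one."""
    if status not in NEXT_STATUS:
        raise ValueError(f"{status.value} is a terminal status")
    return NEXT_STATUS[status]
```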

Manual vs automated

Where the time actually goes

Manual short-form workflows fan out into multiple tools. The AI pipeline collapses them into one upload.

Capability	Manual workflow	AI Podcast Clipper
Find highlight moments	Scrub through hours of audio and timestamp by hand.	Gemini 2.5 picks 1-4 Q&A moments per upload.
Add word-level captions	Hand-time captions or use a generic auto-captioner.	WhisperX word timings burned into the clip automatically.
Convert horizontal to vertical	Manually crop and reposition every cut.	Face-aware Columbia ASD crop with blurred backdrop fallback.
Choose caption language	Re-cut or re-caption manually when changing language.	English or Korean captions are selected per processing run.

Pipeline

Want the processing details?

The how-it-works page explains upload storage, transcription, highlight selection, active speaker framing, captions, rendering, and dashboard review.

Read how it works

Frequently asked questions

How many clips does each run produce?

You choose 1, 2, 3, or 4 clips per upload. The AI selects the strongest Q&A moments and produces that many vertical clips.

Is Korean captioning the same quality as English?

English captions come directly from WhisperX with word-level timing. Korean captions are produced by Gemini translation and styled with Noto Sans KR. Both are usable for publishing, but English captions will track speech more tightly.

Where are uploads and clips stored?

All originals and generated clips live in a dedicated AWS S3 bucket under per-user prefixes. The app only ever exposes them through short-lived presigned URLs.

What is the file size limit?

Uploads are capped at 900 MB per .mp4. Long episodes still work, but very large files should be exported at a moderate bitrate before upload.
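
To put "moderate bitrate" in numbers, the arithmetic below estimates the highest total bitrate (video plus audio) that keeps an episode under the 900 MB cap. It assumes 1 MB means 1,000,000 bytes and ignores container overhead, so treat the result as a ceiling and export somewhat below it. The function name is illustrative.

```python
UPLOAD_LIMIT_MB = 900  # cap stated in the FAQ


def max_total_bitrate_kbps(duration_minutes: float,
                           limit_mb: float = UPLOAD_LIMIT_MB) -> float:
    """Highest average bitrate in kbit/s that fits limit_mb for the duration.

    Assumes decimal megabytes and no container overhead.
    """
    if duration_minutes <= 0:
        raise ValueError("duration must be positive")
    total_bits = limit_mb * 1_000_000 * 8
    seconds = duration_minutes * 60
    return total_bits / seconds / 1000


# A 60-minute episode fits at up to about 2000 kbit/s;
# a 120-minute episode at up to about 1000 kbit/s.
one_hour = max_total_bitrate_kbps(60)
two_hours = max_total_bitrate_kbps(120)
```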