Capski ~ Audio-to-Karaoke Video Tool
Capski is a command-line tool written in Rust that turns audio or video files into stylized karaoke videos whose subtitles highlight each word as it is spoken. It handles transcription, optional translation, and subtitle rendering with custom styling, which makes it a good fit for content creators and educators.
Tech Stack
- Rust
- whisper-rs (Rust bindings for whisper.cpp) for transcription
- FFmpeg for video processing
- Advanced SubStation Alpha (ASS) for subtitles
Key Features
- Converts WAV/MP3/MP4 into karaoke-style videos
- Whisper-based transcription with word-level timing
- Optional translation into English from multiple source languages
- Styled subtitles via JSON config
- Burn subtitles directly into video using FFmpeg
- Simple, ergonomic CLI with helpful flags
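To make the word-highlighting feature concrete, here is a minimal sketch of how word-level timestamps could be turned into a single ASS karaoke event. The `Word` type and both function names are hypothetical and chosen for illustration; they are not taken from Capski's source. The sketch relies on two properties of the ASS format: timestamps are written as `H:MM:SS.cc`, and the `\k<n>` override tag holds a segment for `n` centiseconds before the highlight moves on.

```rust
/// One transcribed word with start/end times in milliseconds.
/// Hypothetical type for illustration; not Capski's actual data model.
pub struct Word {
    pub text: String,
    pub start_ms: u64,
    pub end_ms: u64,
}

/// Format milliseconds as an ASS timestamp: H:MM:SS.cc (centiseconds).
pub fn ass_time(ms: u64) -> String {
    let cs = ms / 10;
    let (h, m, s, c) = (cs / 360_000, (cs / 6_000) % 60, (cs / 100) % 60, cs % 100);
    format!("{}:{:02}:{:02}.{:02}", h, m, s, c)
}

/// Build one ASS `Dialogue` event whose words highlight in sequence.
/// Returns `None` when there are no words to show.
pub fn karaoke_line(words: &[Word], style: &str) -> Option<String> {
    let first = words.first()?;
    let last = words.last()?;
    let mut text = String::new();
    // Highlight position in centiseconds, relative to nothing but itself:
    // only the differences between positions end up in the output.
    let mut cursor = first.start_ms / 10;
    for w in words {
        let start = w.start_ms / 10;
        let end = w.end_ms / 10;
        // A pause before the word becomes an empty timed segment.
        if start > cursor {
            text.push_str(&format!("{{\\k{}}}", start - cursor));
            cursor = start;
        }
        // saturating_sub guards against overlapping word timestamps.
        let dur = end.saturating_sub(cursor);
        text.push_str(&format!("{{\\k{}}}{} ", dur, w.text));
        cursor = cursor.max(end);
    }
    // Field order: Layer,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text
    Some(format!(
        "Dialogue: 0,{},{},{},,0,0,0,,{}",
        ass_time(first.start_ms),
        ass_time(last.end_ms),
        style,
        text.trim_end()
    ))
}
```

For two words, "Hello" at 1.0-1.4 s and "world" at 1.5-2.0 s, this yields `Dialogue: 0,0:00:01.00,0:00:02.00,Default,,0,0,0,,{\k40}Hello {\k10}{\k50}world`. Truncating to centiseconds is a simplification; a real implementation might round or carry the remainder forward to avoid drift on long lines.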
What I Did
- Designed the CLI and modular Rust architecture
- Integrated Whisper and FFmpeg
- Implemented ASS subtitle formatting and rendering
- Built translation and real-time word highlighting
- Wrote the engineering requirements documentation
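As an illustration of the ASS formatting work, the sketch below maps a small set of style settings onto an ASS `Style:` line. The `StyleConfig` struct and its field names are hypothetical: they stand in for whatever the parsed JSON config actually contains, and the JSON parsing itself is left out. What the sketch does take from the ASS format is the colour encoding (`&HAABBGGRR`, with alpha `00` meaning opaque) and the fact that, with karaoke tags, text is drawn in the secondary colour until it is highlighted and in the primary colour afterwards.

```rust
/// Hypothetical style settings, as they might look after parsing a JSON config.
pub struct StyleConfig {
    pub name: String,
    pub font: String,
    pub size: u32,
    pub highlight_rgb: u32, // 0xRRGGBB, colour of words already sung
    pub base_rgb: u32,      // 0xRRGGBB, colour of words not yet sung
    pub outline_rgb: u32,   // 0xRRGGBB, outline colour
}

/// Convert 0xRRGGBB plus an alpha byte into ASS notation: &HAABBGGRR.
/// In ASS, alpha 0x00 is fully opaque and 0xFF is fully transparent.
pub fn ass_colour(rgb: u32, alpha: u8) -> String {
    let r = (rgb >> 16) & 0xFF;
    let g = (rgb >> 8) & 0xFF;
    let b = rgb & 0xFF;
    format!("&H{:02X}{:02X}{:02X}{:02X}", alpha, b, g, r)
}

/// Render a V4+ `Style:` line. Everything after the colours is a fixed,
/// assumed default: no bold/italic/underline/strikeout, 100% scale,
/// outline width 2, no shadow, bottom-centre alignment, 10 px margins.
pub fn style_line(cfg: &StyleConfig) -> String {
    format!(
        "Style: {},{},{},{},{},{},{},0,0,0,0,100,100,0,0,1,2,0,2,10,10,10,1",
        cfg.name,
        cfg.font,
        cfg.size,
        ass_colour(cfg.highlight_rgb, 0), // PrimaryColour
        ass_colour(cfg.base_rgb, 0),      // SecondaryColour
        ass_colour(cfg.outline_rgb, 0),   // OutlineColour
        ass_colour(0x000000, 0),          // BackColour
    )
}
```

Keeping the colour conversion in its own function makes the byte-order swap (RGB in the config, BGR in the subtitle file) easy to test in isolation, since that swap is a common source of wrong colours.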
Challenges
- Learning Rust
- Handling real-time word-level timestamping
- Subtitle syncing and overlay with FFmpeg
- Keeping the CLI intuitive for non-technical users
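The FFmpeg overlay step can be sketched roughly as follows. This is an assumption about how such a step might look, not Capski's actual invocation: the function names are hypothetical, the input is assumed to already contain a video stream (audio-only inputs would first need a generated background, which is not shown), and the `ass` filter is only available in FFmpeg builds that include libass.

```rust
use std::io;
use std::path::Path;
use std::process::{Command, ExitStatus};

/// Build the FFmpeg arguments for burning an ASS file into a video.
/// Note: the subtitle path is passed to the filter unescaped, so paths
/// containing characters such as ':' or '\'' would need extra escaping.
pub fn burn_args(input: &Path, subs: &Path, output: &Path) -> Vec<String> {
    vec![
        "-y".to_string(), // overwrite the output file if it exists
        "-i".to_string(),
        input.display().to_string(),
        "-vf".to_string(), // video filter: render the subtitles onto each frame
        format!("ass={}", subs.display()),
        "-c:a".to_string(), // leave the audio stream untouched
        "copy".to_string(),
        output.display().to_string(),
    ]
}

/// Run FFmpeg and return its exit status. Assumes `ffmpeg` is on PATH.
pub fn burn_subtitles(input: &Path, subs: &Path, output: &Path) -> io::Result<ExitStatus> {
    Command::new("ffmpeg")
        .args(burn_args(input, subs, output))
        .status()
}
```

Separating argument construction from process execution keeps the command testable without FFmpeg installed. Because the subtitles are drawn into the frames, the video stream is re-encoded, while `-c:a copy` passes the audio through unchanged so its timing relative to the subtitles is preserved.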
Demo
The project is open source and licensed under the GNU General Public License v3.0.