Capski ~ Audio-to-Karaoke Video Tool

Capski is a command-line tool written in Rust that turns audio or video files into karaoke-style videos whose subtitles highlight each word in time with the audio. It supports transcription, translation, and subtitle rendering with custom styling, which makes it useful for content creators and educators.

Tech Stack

  • Rust
  • whisper-rs (Rust bindings for whisper.cpp) for transcription
  • FFmpeg for video processing
  • Advanced SubStation Alpha (ASS) for subtitles

Key Features

  • Converts WAV/MP3/MP4 into karaoke-style videos
  • Whisper-based transcription with word-level timing
  • Optional translation into English from multiple source languages
  • Styled subtitles via JSON config
  • Burn subtitles directly into video using FFmpeg
  • Simple, ergonomic CLI with helpful flags
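
To make the burn-in step concrete, here is a minimal sketch of how a Rust tool could hand an ASS file to FFmpeg's `ass` video filter. It is an illustration under stated assumptions, not Capski's actual code: the helper names `burn_in_args` and `burn_subtitles` are hypothetical, FFmpeg is assumed to be on the PATH and built with libass, and the subtitle path is assumed to contain no characters that need escaping inside a filter expression.

```rust
use std::io;
use std::process::Command;

/// Builds the FFmpeg argument list for burning an ASS file into a video.
/// Hypothetical helper for illustration; not taken from the Capski source.
fn burn_in_args(input: &str, ass_path: &str, output: &str) -> Vec<String> {
    vec![
        "-y".to_string(),                 // overwrite the output file if it exists
        "-i".to_string(),
        input.to_string(),
        "-vf".to_string(),
        format!("ass={}", ass_path),      // render subtitles onto the video frames
        "-c:a".to_string(),
        "copy".to_string(),               // keep the audio stream untouched
        output.to_string(),
    ]
}

/// Runs FFmpeg and reports whether it exited successfully.
/// Assumes an `ffmpeg` binary is available on the PATH.
fn burn_subtitles(input: &str, ass_path: &str, output: &str) -> io::Result<bool> {
    let status = Command::new("ffmpeg")
        .args(burn_in_args(input, ass_path, output))
        .status()?;
    Ok(status.success())
}
```

Keeping the argument list in its own function makes it testable without invoking FFmpeg. A real tool would also need to escape special characters in the subtitle path before placing it in the filter string.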

What I Did

  • Designed the CLI and modular Rust architecture
  • Integrated Whisper and FFmpeg
  • Implemented ASS subtitle formatting and rendering
  • Built translation and real-time word highlighting
  • Wrote engineering requirement documentation
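
As an illustration of how word highlighting can be expressed in ASS, the sketch below turns a list of timed words into a single `Dialogue` event using `\k` karaoke tags, whose durations are given in centiseconds. The `Word` type, the function names, and the choice to let each word's highlight run until the next word starts are assumptions made for this example and may differ from what Capski does.

```rust
/// A transcribed word with start and end times in milliseconds.
/// Hypothetical type for illustration.
struct Word {
    text: String,
    start_ms: u64,
    end_ms: u64,
}

/// Formats milliseconds as an ASS timestamp, H:MM:SS.cc (centiseconds).
fn ass_timestamp(ms: u64) -> String {
    let cs = ms / 10;
    format!(
        "{}:{:02}:{:02}.{:02}",
        cs / 360_000,
        (cs / 6_000) % 60,
        (cs / 100) % 60,
        cs % 100
    )
}

/// Builds one Dialogue event in which words are highlighted in turn.
/// Returns None when there are no words to show.
fn karaoke_dialogue(words: &[Word], style: &str) -> Option<String> {
    let first = words.first()?;
    let last = words.last()?;
    let mut text = String::new();
    for (i, word) in words.iter().enumerate() {
        // Highlight lasts until the next word starts, or until this word ends
        // if it is the last one, so short pauses do not desynchronize the line.
        let until_ms = words.get(i + 1).map_or(word.end_ms, |next| next.start_ms);
        let centis = (until_ms / 10).saturating_sub(word.start_ms / 10);
        if i > 0 {
            text.push(' ');
        }
        text.push_str(&format!("{{\\k{}}}{}", centis, word.text));
    }
    // Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    Some(format!(
        "Dialogue: 0,{},{},{},,0,0,0,,{}",
        ass_timestamp(first.start_ms),
        ass_timestamp(last.end_ms),
        style,
        text
    ))
}
```

For two words, "Hello" at 1000 to 1400 ms and "world" at 1500 to 2200 ms, this yields `Dialogue: 0,0:00:01.00,0:00:02.20,Default,,0,0,0,,{\k50}Hello {\k70}world`.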

Challenges

  • Learning Rust
  • Handling real-time word-level timestamping
  • Subtitle syncing and overlay with FFmpeg
  • Keeping the CLI intuitive for non-technical users
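
One part of the syncing problem is deciding where one subtitle line ends and the next begins. The sketch below shows one possible approach, assuming word-level timestamps are already available: start a new line when the pause before a word exceeds a threshold or when the line would grow past a character limit. The `TimedWord` type, the function name, and both thresholds are hypothetical and chosen only for illustration.

```rust
/// A word with start and end times in milliseconds.
/// Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq)]
struct TimedWord {
    text: String,
    start_ms: u64,
    end_ms: u64,
}

/// Groups words into subtitle lines. A new line starts when the silence
/// before a word exceeds `max_gap_ms` or when adding the word (plus a
/// separating space) would exceed `max_chars`.
fn split_into_lines(
    words: &[TimedWord],
    max_chars: usize,
    max_gap_ms: u64,
) -> Vec<Vec<TimedWord>> {
    let mut lines: Vec<Vec<TimedWord>> = Vec::new();
    let mut current: Vec<TimedWord> = Vec::new();
    let mut current_len = 0usize;

    for word in words {
        let word_len = word.text.chars().count();
        if let Some(prev) = current.last() {
            let gap = word.start_ms.saturating_sub(prev.end_ms);
            let too_long = current_len + 1 + word_len > max_chars;
            if gap > max_gap_ms || too_long {
                // Close the current line and start a fresh one.
                lines.push(std::mem::take(&mut current));
                current_len = 0;
            }
        }
        if !current.is_empty() {
            current_len += 1; // account for the space between words
        }
        current_len += word_len;
        current.push(word.clone());
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

Each resulting group can then be rendered as its own subtitle event, with the event's start and end taken from the first and last word in the group.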

The project is open source and licensed under the GNU General Public License v3.0.

  • Capski Engineering Requirement Documentation
  • Capski Repository