Krunkó - Automatic Karaoke Generator

Krunkó turns any YouTube video into a karaoke video, automatically! Give it a link to a song and it downloads the audio, separates the vocals from the backing track, transcribes the lyrics with timing information, corrects them using an AI model, and renders a video with synchronized on-screen lyrics. It even adds a little bouncing ball to help you follow along.

The whole pipeline can run locally.

Works well with Icelandic songs, and with many other languages too!

Example output

The example output video was generated by Krunkó, using the song "Yfir til þín" by Spaugstofan.

Pipeline

  • Download audio from YouTube
  • Extract stems (bass, drums, other, vocals) from the audio
  • Transcribe the vocals using OpenAI's Whisper model
  • Automatically search for the correct lyrics online if a reference is not provided
  • Correct the transcription with AI
  • Combine the non-vocal audio (bass/drums/other) into a single instrumental track
  • Generate a karaoke video with the corrected lyrics
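
The stem-combining step can be illustrated without any audio library. This is only a minimal sketch and not Krunkó's actual code (the project uses pydub for audio handling). It assumes each stem is a list of signed 16-bit PCM samples at the same sample rate, and `mix_stems` is a hypothetical helper name.

```python
from itertools import zip_longest

INT16_MIN, INT16_MAX = -32768, 32767


def mix_stems(*stems):
    """Sum stems sample by sample and clamp the result to the 16-bit range."""
    mixed = []
    # zip_longest pads shorter stems with silence (0), so lengths may differ.
    for samples in zip_longest(*stems, fillvalue=0):
        total = sum(samples)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


# Toy sample values, purely illustrative.
bass = [1000, -2000, 30000]
drums = [500, 500, 5000]
other = [0, 100]  # shorter stem, padded with silence

instrumental = mix_stems(bass, drums, other)
print(instrumental)  # [1500, -1400, 32767]
```

The vocal stem is simply left out of the sum, which is what turns the mix into an instrumental track. Clamping avoids integer overflow when loud stems add up, at the cost of clipping.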

Installation

First install yt-dlp, and make sure it is available in your system's PATH.

Then install the required dependencies:

pip install -r requirements.txt
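
Before the first run, it can be worth confirming that yt-dlp is actually reachable on PATH. The snippet below is a small sketch using only the Python standard library; `find_executable` is a hypothetical helper name and is not part of Krunkó.

```python
import shutil


def find_executable(name="yt-dlp"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


path = find_executable()
if path is None:
    print("yt-dlp was not found on PATH; install it before running Krunkó.")
else:
    print(f"yt-dlp found at {path}")
```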

Setup and configuration

AI model for lyric correction

For the best results, use an LLM to fix the transcription errors from the Whisper model. Use --correction-model to specify the model to use for correction. The default is openai:gpt-5.5. For that to work, supply an OpenAI API key as an environment variable before running the CLI tool:

export OPENAI_API_KEY="your_api_key_here"

You can also use other models, including local ones: anything supported by pydantic-ai works. See the pydantic-ai documentation for more details.

The AI lyric correction step significantly improves the quality of the transcribed lyrics. The tool still works without this step, but the transcription may contain more errors, especially for non-English songs.
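
A missing API key tends to surface only once the correction step is reached, so a quick check up front can save a wasted run. The following is a hedged sketch, not Krunkó's own logic: it assumes model names follow the `provider:model` form of the default value, and `missing_api_key` is a hypothetical helper. Only the OpenAI case described above is checked.

```python
import os


def missing_api_key(correction_model, env=None):
    """Return the name of a required but unset environment variable, or None."""
    env = os.environ if env is None else env
    # Assumption: the provider is the part before the first colon.
    provider = correction_model.partition(":")[0]
    # Only the OpenAI case is documented here; other providers are not checked.
    if provider == "openai" and not env.get("OPENAI_API_KEY"):
        return "OPENAI_API_KEY"
    return None


# Passing an explicit dict instead of os.environ keeps the example repeatable.
print(missing_api_key("openai:gpt-5.5", env={}))  # OPENAI_API_KEY
print(missing_api_key("openai:gpt-5.5", env={"OPENAI_API_KEY": "sk-example"}))  # None
```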

Usage

Minimal example

Pass a YouTube URL and Krunkó handles the rest:

python krunko_cli.py https://www.youtube.com/watch?v=tqYoJB5WYRY

It is strongly recommended to always specify --lang with the song's language code. Whisper's language auto-detection can fail or pick the wrong language, leading to garbled transcriptions. Even a correct guess is slower than an explicit hint:

python krunko_cli.py https://www.youtube.com/watch?v=tqYoJB5WYRY --lang is

Customization options

Here is an example using several common parameters:

python krunko_cli.py https://www.youtube.com/watch?v=tqYoJB5WYRY \
  --lang is \
  --title-text "Heyr mína bæn" \
  --end-time 185 \
  --font ./fonts/DejaVuSans-Bold.ttf \
  --show-vocal-waveform \
  --no-bouncing-ball

  • --title-text — Override the song title shown on the opening screen and used as the default output filename. Useful when the YouTube title contains extra noise like channel names or upload dates.
  • --start-time / --end-time — Crop the source audio to a specific time window (in seconds). Handy for skipping intros, outros, or focusing on a single section.
  • --font — Path to a TTF font file to use for the rendered lyrics.
  • --show-vocal-waveform — Display the vocal waveform below the progress bar for an extra visual aid.
  • --no-bouncing-ball — Disable the bouncing ball if you prefer a simpler karaoke style.

There are many more parameters available. Run the following to see all of them:

python krunko_cli.py --help
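
When generating several videos, it may be convenient to assemble the command line in a script. The sketch below builds the argument list for one run using only the flags described in this section. `build_command` and its parameter names are hypothetical, and the final `subprocess.run` call is left commented out so that nothing is executed by accident.

```python
import shlex


def build_command(url, lang=None, title_text=None, start_time=None,
                  end_time=None, font=None, show_vocal_waveform=False,
                  bouncing_ball=True):
    """Build the argument list for a single krunko_cli.py invocation."""
    cmd = ["python", "krunko_cli.py", url]
    if lang:
        cmd += ["--lang", lang]
    if title_text:
        cmd += ["--title-text", title_text]
    # Compare against None so that a start time of 0 seconds is still passed.
    if start_time is not None:
        cmd += ["--start-time", str(start_time)]
    if end_time is not None:
        cmd += ["--end-time", str(end_time)]
    if font:
        cmd += ["--font", font]
    if show_vocal_waveform:
        cmd.append("--show-vocal-waveform")
    if not bouncing_ball:
        cmd.append("--no-bouncing-ball")
    return cmd


cmd = build_command(
    "https://www.youtube.com/watch?v=tqYoJB5WYRY",
    lang="is",
    end_time=185,
    bouncing_ball=False,
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
# import subprocess; subprocess.run(cmd, check=True)  # uncomment to actually run
```

Passing the arguments as a list rather than one long string avoids shell quoting problems with titles that contain spaces or non-ASCII characters.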

Improving results

If the initial output has lyric errors, the following options can help:

  • --lang <code> — Always the first thing to try. Supplying the correct ISO 639-1 language code (e.g. is for Icelandic, de for German) prevents Whisper from misidentifying the language and avoids a significant class of transcription errors.

  • --lyrics-reference <file> — If the automatic lyric search did not find the correct lyrics, or if you have a better source, point Krunkó at a plain-text file containing the correct lyrics. The AI correction step uses this as a reference to align the transcription, which greatly improves word accuracy and spelling.

    python krunko_cli.py https://www.youtube.com/watch?v=yvpEcXBcrY8 \
      --lang is \
      --lyrics-reference lyrics/yfirtilthin.txt
    
  • --auto-whisper-initial-prompt — Automatically constructs an initial prompt for the Whisper model based on the video title and passes it in before transcription begins. This can nudge Whisper toward the correct vocabulary and style for the song, which sometimes reduces hallucinations or fixes recurring word errors. Worth trying when transcription quality is inconsistent.

    python krunko_cli.py https://www.youtube.com/watch?v=yvpEcXBcrY8 \
      --lang is \
      --auto-whisper-initial-prompt
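
Since --lang expects a two-letter ISO 639-1 code, a wrapper script can catch obvious slips such as a three-letter code or a full language name before starting a long run. This is a format check only, under the assumption that the code is given as two lowercase ASCII letters. It does not verify that the code is a real language or that Whisper supports it, and `looks_like_iso639_1` is a hypothetical helper.

```python
import re


def looks_like_iso639_1(code):
    """Return True if `code` has the shape of an ISO 639-1 code (two lowercase letters)."""
    return re.fullmatch(r"[a-z]{2}", code) is not None


for candidate in ["is", "de", "isl", "Icelandic"]:
    print(candidate, looks_like_iso639_1(candidate))
```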
    

Manual lyric correction

If all else fails, you can manually edit the transcriptions.

Simple

Copy and edit the lyrics.txt file generated in the output directory. This is a plain text file containing the transcribed lyrics without timing information. Then use that as the reference for correction in the next run, using the --lyrics-reference parameter.

This can fix spelling errors and improve the overall quality of the lyrics, but it will not fix timing issues, misaligned words, or hallucinated words and sentences.

Advanced

JSON transcription files are stored in the cache/ directory after the first run. They contain text and timing information. Find the corresponding .json file for the song and edit it, then run the CLI tool again with the same URL. It will use your corrected transcription.
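
To locate the right file, listing the most recently modified JSON files in the cache is often enough, since the song you just processed will be near the top. The sketch below makes no assumption about the file naming or the JSON schema. It searches cache/ recursively in case files are nested, and `newest_transcriptions` is a hypothetical helper name.

```python
from pathlib import Path


def newest_transcriptions(cache_dir="cache", limit=5):
    """Return up to `limit` .json files under cache_dir, newest first."""
    root = Path(cache_dir)
    if not root.is_dir():
        return []
    files = sorted(
        root.rglob("*.json"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return files[:limit]


for path in newest_transcriptions():
    print(path)
```

Making a backup copy of the file before editing is a sensible precaution, since a malformed JSON file may not load on the next run.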

Upload to YouTube

There is also a separate script for uploading the generated karaoke video to YouTube, youtube_upload.py. But I can't be bothered to write documentation for that.

Acknowledgements

This project builds on a set of free and open source tools:

  • yt-dlp — downloading audio from YouTube.
  • demucs — source separation for extracting vocals, bass, drums, and other stems from the downloaded audio.
  • Whisper — automatic speech recognition used to transcribe the vocal stem into timing-aware text.
  • faster-whisper — a faster Whisper implementation that can speed up transcription on local GPUs and CPUs.
  • pydantic-ai — AI model integration for correcting and formatting the raw transcription into polished lyrics.
  • pydub — audio processing and manipulation for combining stems and preparing the instrumental track.
  • moviepy — video generation for rendering the karaoke visuals, text, and final output video.
  • DuckDuckGo Search API — used for automatic lyric search when a reference is not provided.
  • BeautifulSoup — web scraping for automatic lyric search when a reference is not provided.