Making Your Own Karaoke with Free Software

Karaoke is a big deal for me. As an anxious introvert, I found it helped me break out of my shell when I moved to a new city, and it gave me an excuse to sing my favorite songs in front of an audience. Singing together with a community of friends and strangers alike can feel convivial: we get to share our love of music together in one space. It’s wonderful.

As a budding amateur musician and unabashed neckbeard, I’ve become increasingly curious about how karaoke songs are made, and whether it might be possible to turn some of my own music into something people could sing along to. Granted, it feels a bit awkward to assume that my music is so good that other people might want to sing it, but I think there’s something to the idea of celebrating the amateur music made by friends in this way.

It turns out, it’s actually really easy to make custom karaoke from scratch!

Karaoke Software

There’s actually a surprising amount of libre karaoke software out there already. I haven’t explored them all yet, but here’s what I’ve managed to find:

  • Karaoke Mugen – An anime-themed karaoke platform that offers a client and a server, and is built with React.
  • OpenKJ – A Qt-based app that wouldn’t look out of place in a KDE setup. This one’s notable because the project also has an OpenKJ Songbook project, which allows you to integrate song catalogs from the leading karaoke vendor platforms like KaraFun and Sound Choice. The integrated database also makes it possible to lease specific tracks through those vendors.
  • Vocaluxe – An “Open Source game inspired by Sony’s SingStar”
  • UltraStar Deluxe – A sibling project to Vocaluxe, but with a slightly different interface.
  • Performous – A wild hybrid that’s more like StepMania meets Rock Band. Yes, it counts as karaoke because of the singing portion, but it encompasses far more than just singing.

Out of these selections, I ended up going with Karaoke Mugen for a few reasons: from a KJ (Karaoke Jockey) perspective, it’s the most polished, it seems to be the most approachable, and it appears to have a fairly active community.

Karaoke Mugen’s interface consists of a navigation tool split up into several panes. Each one opens up a distinct management interface, ranging from a jukebox with a playlist, to a public interface for requesting songs, to a robust backend for managing tracks and data repositories.

One thing that makes Karaoke Mugen super useful is that it also offers a Public interface, meaning that you can point people to an address, and they can see information about the current song, make a request, and see where they are in the singer queue.

Karaoke Mugen also comes with a built-in video player, which is a nice little wrapper around MPV. This is used for syncing a video file with the corresponding lyrics.

Making Karaoke

Karaoke Mugen offers a great guide for creating a karaoke that goes super in-depth with what needs to be done. I’ll do my best to summarize what I did for my own song with screenshots and explanations, but the guide is really, really good.

Creating the lyrics

Lyrics are a good starting point for this process, but they are also the most tedious and time-consuming part. Karaoke Mugen builds its lyrics from subtitles, which need some notation for the timing and emphasis of words and syllables.

For this, we’ll use the venerable Aegisub, which is primarily known for its use in providing subtitles for anime. Aegisub uses the Advanced SubStation Alpha format, which has its own markup notation for karaoke. Yes, it’s called ASS, and yes, I also giggled a bit.

Thanks to this notation, we can do a wide variety of things like add bold or italics and line breaks, but we can also alter how timing is represented, along with how text is shown on the screen.

The process here is fairly straightforward – basically, you should have two separate audio files: one with a singing track, and one without. For now, we’re going to load in the version with singing, and transcribe the lyrics manually.

To do this, we have to play the song and listen very carefully for each line. When the video player arrives at a new line, pause, then right-click the timeline and choose “Insert at video time (after)”.

When you left-click on a cell in the timeline, the spectrograph on the right side of the app jumps to a highlighted segment. Usually, the highlighted duration of time is a little bit off, so we have to manually fix it by tweaking the Start Time and End Time cells for the player. If you’re not totally sure that you got it right, you can click either the red or blue player button to repeat the segment.

Eventually, you’ll get to a point where all the lyrics are covered for the first pass of the song. Now, it’s time to go back and adjust the emphasis on specific words and syllables, so that the right parts are highlighted in time with the music. On the spectrograph, click the toolbar button on the far right, titled “Karaoke Mode”.

Karaoke Mode lets you indicate how much time each part of a line should get. The line is automatically split at its spaces, so every word starts out as its own segment.

After that, you’ll drag the yellow dotted lines around to match up with where they belong in that part of the song. This takes some trial and error, but you’ll eventually have something that lines up really well with the sung lyrics, with metadata to tell the player the correct timing and emphasis.

One note when editing the timing: for drawn-out parts that are sung for more than a second or two, you can change a syllable’s notation tag from {\k} to {\kf}, and the highlighting will fill in gradually instead of switching all at once. Used sparingly, it can be very effective for dramatic moments. Thanks, ASS Notation!
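To make the notation a bit more concrete, here is a minimal Python sketch of how the text part of a karaoke line is put together. It relies on two things from the ASS format: karaoke tags are written with a backslash inside curly braces, and their durations are counted in centiseconds. The function name karaoke_line and the sample lyric are made up for illustration; Aegisub writes these tags for you when you drag the markers in Karaoke Mode.

```python
def karaoke_line(syllables):
    """Build the text part of an ASS karaoke line.

    `syllables` is a list of (text, centiseconds, gradual) tuples.
    gradual=False uses \\k (instant highlight),
    gradual=True uses \\kf (highlight fills in over the duration).
    """
    parts = []
    for text, centiseconds, gradual in syllables:
        tag = "kf" if gradual else "k"
        # Each syllable is prefixed with its own override block, e.g. {\k40}
        parts.append("{\\" + tag + str(centiseconds) + "}" + text)
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical lyric: two short syllables, then a long held word.
    line = karaoke_line([
        ("Ne", 40, False),
        ("on ", 35, False),
        ("lights", 180, True),
    ])
    print(line)  # {\k40}Ne{\k35}on {\kf180}lights
```

Note that the trailing space belongs to the syllable before it ("on "), which is how a line split at its spaces ends up looking in the subtitle file.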

The end result looks like this:

Also worth mentioning: if you need to do something more complex with timing in your lyrics, you can manually split up parts phonetically and adjust them.

Like I said, this process is kind of tedious and takes a while. It’s not particularly difficult to do, but requires attention to detail. When you’re done, export the subtitles as an .ass file, and get ready for the next phase.

Adding a video

It’s fairly standard in Karaoke Mugen to pair a video with the subtitles we just made. Since we already have the lyrics set up, and we’ve been using a blank Dummy Video for the process, it might be fun to add some visual flair. The sky is the limit on what you can do!

I’ll be the first to admit that I’m no filmmaker, and have a shoestring budget. So, I turned to Pexels to gather some free-to-use stock footage. I figured that, since my song had a sort of retro-synth sound to it, something involving cyberpunk could create a goofy and fun aesthetic.

I started downloading clips and assembling them together in Kdenlive, using the same audio track from the lyrics process to figure out visual timing and emphasis for my shots. When I was satisfied, I swapped out the audio for the karaoke version of the track, which doesn’t have any vocals, and rendered it as an MP4.

Putting it all together

Now, it’s time to finally load our creation into Karaoke Mugen itself. We have our song, mixed into the video file. We have our subtitles, saved separately. Start up Karaoke Mugen, and select System Panel > Karaokes > New from the main window.

From here, you’ll add all the metadata your song needs, including the files we just created in the last two processes. Save it, switch over to your Operator Interface, and your new creation will live in your library, ready to play. If you run a song repository server for Karaoke Mugen, it’s also possible to put your song into a database for clients to parse. The KM Project even offers a public instance here, with guidance on how to submit songs.

The end result of all our hard work looks like this:

Whoa! I made karaoke!

The Elephant in the Room

While this was a fun and interesting exercise, there’s one other hurdle worth talking about, and that’s KaraFun.

KaraFun dominates the modern karaoke scene because it offers convenience to a lot of Karaoke Jockeys (KJs). All a KJ has to do is pay a subscription fee to access the company’s vast but limited cloud-based song library. Juggling more than one app during a show isn’t appealing to a KJ, who doesn’t really want the audience watching them fiddle with a bunch of programs.

It looks a little worse than usual because I’m running it in Wine on my Linux computer

But, it’s technically possible to play your own files, and it’s also possible to upload custom songs to the community for other paid members to use. You could just throw an MP4 at your KJ friend and have them put it in their local library. But, is there a more native approach for playing your custom karaoke song in KaraFun? After all, shouldn’t your songs look like every other one in there?

Oof, that’s super different. Left: ASS format, Right: LRC format

The supported file formats for KaraFun are fairly expansive, but I found that they didn’t cover the ASS subtitles that I put together. Instead, we have to opt for LRC subtitles, which don’t have the benefits of k-time notation built in, so there’s no special emphasis on syllables or longer parts of the line. So, basically I had to start over from scratch and just denote the timestamp of when each line starts in the track.
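I redid the timestamps by hand, but since the line start times already live in the .ass file, a small script could carry them over. Here is a rough Python sketch, assuming the standard ten-field ASS "Dialogue:" line layout and the common [mm:ss.xx] LRC line format. The function names and the sample line are hypothetical, and I haven’t checked how KaraFun treats every LRC variant.

```python
import re

TAG_RE = re.compile(r"\{[^}]*\}")  # ASS override blocks such as {\k40}


def ass_time_to_lrc(stamp):
    """Convert an ASS timestamp (H:MM:SS.cc) into an LRC one ([mm:ss.xx])."""
    hours, minutes, seconds = stamp.split(":")
    total_minutes = int(hours) * 60 + int(minutes)
    return f"[{total_minutes:02d}:{float(seconds):05.2f}]"


def dialogue_to_lrc(line):
    """Turn one ASS 'Dialogue:' line into an LRC line, or None for other lines."""
    if not line.startswith("Dialogue:"):
        return None
    # Text is the tenth field and may contain commas itself, so split only 9 times.
    fields = line[len("Dialogue:"):].split(",", 9)
    if len(fields) < 10:
        return None
    start, text = fields[1].strip(), fields[9]
    # Drop the karaoke/override tags and turn hard line breaks into spaces.
    plain = TAG_RE.sub("", text).replace("\\N", " ").strip()
    return ass_time_to_lrc(start) + plain


if __name__ == "__main__":
    sample = r"Dialogue: 0,0:00:12.50,0:00:15.05,Default,,0,0,0,,{\k40}Ne{\k35}on {\kf180}lights, all night"
    print(dialogue_to_lrc(sample))  # [00:12.50]Neon lights, all night
```

The per-syllable timing is simply thrown away here, which is exactly the loss described above: only the moment each line starts survives the trip to LRC.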

The good news, though: it works! Less good news: it’s janky. LRC’s emphasis timing is just not as good, and you have to manually add the folder for a given song. That said, all of your custom songs will show up under the My Computer section of the player. So, you’ll have a way to supply your KJs with a bunch of custom songs that aren’t in the KaraFun catalogue!

Other Open Source Karaoke Apps

I’ll close out this article by saying that, while it was a worthwhile experiment, there are definitely things I want to improve on. The other big stumbling block I’ve found, after creating all this, was trying to get other open source karaoke systems to support the files I put into Karaoke Mugen.

The sad truth is, different programs support different formats – there isn’t one shared standard yet. UltraStar Deluxe and Vocaluxe come from the same family tree, and are loosely based around SingStar. Performous is supposedly able to use those too. Karaoke Mugen can support all of the above, but the other programs don’t recognize its own recommended method for making karaoke. OpenKJ tries to stick very closely to a particular standard intended for proprietary karaoke machines, and doesn’t open anything I throw into it, regardless of program.

In hindsight, maybe I should’ve used the Composer tool that comes with Performous for the lyrics. The main issue with songs based around the Vocaluxe / UltraStar family tree is that they have to also incorporate pitch along with timing and emphasis, leading to a very different kind of creation process. This highlights a divide between Eastern and Western styles of karaoke, though.

I’ll keep exploring ways to make something that works with all of these programs, but having good support in at least one Free platform and one proprietary one is good enough for now.

So, there you have it – it’s possible to create your own karaoke and host it yourself! I hope you got some value out of this article and maybe feel inspired to put some of your own music out there in this format for people to enjoy. For me, personally, I’m going to keep at it, and also try to convert some Creative Commons / Public Domain music over. Maybe I’ll even explore adding StepMania and Frets of Fire data and see what it looks like in Performous!

Until next time!

Free Software Audio Production, Part 1: Setting up JACK

A few months ago, I became very curious about audio production. The idea of writing, recording, mixing, and distributing my own original music has long been an enticing fantasy. But most of all, I wanted to know: is it possible to use only Free Software to produce good music?

The short answer is, ultimately: yes, you absolutely can! It’s entirely possible to use a Linux distribution as a daily driver, and leverage a workflow that only contains Free Software applications.

The longer answer is, again, yes. But there are some fundamental quirks to consider, and you’ll need to expand your knowledge a lot in order to get a usable workflow. I’d love to talk about my own setup, what tools I use, and what things I had to do to get the whole stack working.

Today, we’re going to focus on the basics: getting JACK audio set up with a desktop GNU/Linux distribution.

Linux Distribution

There are a couple of Linux distributions available today that focus specifically on audio production, such as Ubuntu Studio or the more contemporary KXStudio. That being said, you can pretty much use any general Linux distribution you please.

In my case, I ended up sticking with elementaryOS, which is a great generalist distro that focuses on visual design and usability.

Low-Latency Kernel

One important tweak that I made was installing the lowlatency Linux kernel, which is a must if you’re working with real-time audio applications. Most deb-based distributions have a package for this, which makes installation easy:

sudo apt-get install linux-lowlatency

After installation, you can load that kernel specifically by rebooting and choosing your kernel options in GRUB. You should boot to this kernel whenever you intend to work on audio production.
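If you’re not sure which kernel you ended up booting, a quick check can save some head-scratching later. This is a small Python sketch that assumes the lowlatency package follows the Ubuntu-style naming, where the kernel release string contains the word "lowlatency" (for example 5.15.0-91-lowlatency). Other distributions may name theirs differently, and is_lowlatency is just a name I picked.

```python
import platform


def is_lowlatency(release):
    """Return True if a kernel release string names a lowlatency build.

    Assumption: the release string contains the word "lowlatency",
    as Ubuntu-style lowlatency kernels do.
    """
    return "lowlatency" in release.lower()


if __name__ == "__main__":
    release = platform.release()  # the same string that `uname -r` prints
    if is_lowlatency(release):
        print(f"Running {release}: good to go for audio work.")
    else:
        print(f"Running {release}: reboot and pick the lowlatency kernel in GRUB.")
```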

Understanding JACK

The sound system situation in Linux has historically been a mess in one way or another. Despite the existence of backends such as PulseAudio, or perhaps because of them, users can find it challenging to get different audio applications to play nice with each other.

For example, you might be using ALSA to play an audio track in a music sequencer, while simultaneously using a PortAudio application to record microphone input. If you’re trying to use real-time microphone monitoring, this can confuse applications such as Audacity!

One great way to work around this is to use JACK, which is specifically geared for audio production work. The project describes itself as infrastructure for audio applications to communicate with each other and audio hardware.

Have you ever wanted to take the audio output of one piece of software and send it to another? How about taking the output of that same program and send it to two others, then record the result in the first program? Or maybe you’re a programmer who writes real-time audio and music applications and who is looking for a cross-platform API that enables not only device sharing but also inter-application audio routing, and is incredibly easy to learn and use? If so, JACK may be what you’ve been looking for.

It’s a bit of a beast to work with, and can initially seem intimidating to newcomers, but JACK is extremely versatile and makes up for a lot of shortcomings one might feel when trying to use ALSA or PulseAudio.

There are a few steps to getting JACK set up, so let’s run through them.

Step 1 – Install JACK

JACK is pretty widely packaged across Linux distributions, so you should be able to easily obtain it from your distribution’s package manager.

In a deb-based system, you’ll likely want to run:

sudo apt-get install jack-tools qjackctl

Step 2 – Add Real-Time Permissions

This part is a little hairier, but we only have to address it once. Basically, we need to make sure that the system gives the JACK daemon permissions for real-time capabilities. The official JACK site has a really great guide that makes this process pretty cut-and-dried. Here’s what I did specifically:

Create a file at /etc/security/limits.d/99-realtime.conf

Give the file these contents:

@realtime   -  rtprio     99
@realtime   -  memlock    unlimited

Then create a realtime group and add your user to it:

sudo groupadd realtime
sudo usermod -a -G realtime yourUserID

Where yourUserID is your system username.

After that, just log out and log back in!
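Once you’re logged back in, it’s worth confirming that the new limits actually took effect. Below is a minimal Python sketch, assuming a Linux system and the group name and values from the config file above (rtprio 99, memlock unlimited). The helper names limit_at_least and report are my own, and this is only a sanity check, not something JACK itself provides.

```python
import grp
import os
import resource


def limit_at_least(value, wanted):
    """Compare an rlimit value with a wanted minimum.

    resource.RLIM_INFINITY stands for "unlimited" on either side.
    """
    if value == resource.RLIM_INFINITY:
        return True
    if wanted == resource.RLIM_INFINITY:
        return False
    return value >= wanted


def report(group="realtime"):
    """Print whether the current login session picked up the realtime settings."""
    names = set()
    for gid in os.getgroups():
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            pass  # a group id without a name entry; skip it
    # Soft limits are the ones that apply to the current session.
    rtprio_soft, _ = resource.getrlimit(resource.RLIMIT_RTPRIO)    # Linux-only
    memlock_soft, _ = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print("in", group, "group:", group in names)
    print("rtprio >= 99:", limit_at_least(rtprio_soft, 99))
    print("memlock unlimited:", limit_at_least(memlock_soft, resource.RLIM_INFINITY))


if __name__ == "__main__":
    report()
```

If any of the three lines prints False, the usual suspects are a typo in the limits file or a session that started before the group change.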

Step 3 – Use QJackCtl

The easiest way to use the JACK system is to use a frontend – otherwise, you’re going to have to do everything in the terminal. qjackctl is a great app that makes this process mostly transparent.