Mixing Multiple Audio Files With SoX

SoX, the “Swiss Army knife of sound processing”, is awesome. I’ve been using it a lot lately for a project I’m working on, and I encountered a situation not quite covered in their documentation.

I wanted to mix multiple audio files together to create a new file. Let’s say I have a file with a really cool beat and I want to completely ruin it by adding another file with Nicki Minaj rapping.

sox -m sick-beat.wav awful-lyrics.wav output.wav

Very straightforward. Now I have a beat with the sounds of a pregnant wildebeest being tortured in a child’s night terror. But what if I wanted her lyrics to start a few seconds after the beat begins? SoX provides the pad effect, which in its simplest form takes two lengths in seconds: the silence to add before the audio, and the silence to add after it. Awesome!

sox -m sick-beat.wav awful-lyrics.wav output.wav pad 3 0

That should delay her rhymes from starting for 3 seconds, right? Well, no. Instead, it shifts BOTH files, because SoX applies effects to the audio after the inputs have been mixed, not to an individual input. One solution is to pad her lyrics first, then mix that intermediary file with the other:

sox awful-lyrics.wav offset-awful-lyrics.wav pad 3 0
sox -m sick-beat.wav offset-awful-lyrics.wav output.wav
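
If the leftover file bothers you, the same two steps can be wrapped in a small function that pads into a temporary directory and removes it afterwards. This is only a sketch under some assumptions: mix_delayed and the SOX variable are names I made up for illustration (SOX just lets you point at a different sox binary), and mktemp -d is assumed to be available on your system.

```shell
#!/bin/sh
# Sketch only: mix_delayed and SOX are illustrative names, not part of SoX.
# Usage: mix_delayed BASE OVERLAY SECONDS OUTPUT
SOX=${SOX:-sox}

# The body runs in a subshell, so its variables and EXIT trap stay local.
mix_delayed() (
  base=$1 overlay=$2 delay=$3 output=$4
  tmp_dir=$(mktemp -d) || exit 1
  trap 'rm -rf "$tmp_dir"' EXIT          # delete the intermediary when done
  padded=$tmp_dir/padded.wav
  # Step 1: add SECONDS of silence in front of the overlay track.
  # Step 2: mix the padded overlay with the base track.
  "$SOX" "$overlay" "$padded" pad "$delay" 0 &&
    "$SOX" -m "$base" "$padded" "$output"
)
```

For the example above, that would be called as: mix_delayed sick-beat.wav awful-lyrics.wav 3 output.wav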

That two-step version does exactly what we want. But what if you wanted to add the stylings of a more competent rapper to drown out her wailing? You’d have to add yet another line, creating yet another file, before mixing them all. That’s not too bad for two files, but it gets more cumbersome as it scales: the growth is only linear, yet every extra track means another intermediary file that you’ll probably want to delete once you’re done. There’s a smarter way to do it!

sox good-rapper.wav -p pad 3 0 | sox - -m awful-lyrics.wav -p pad 3 0 | sox - -m sick-beat.wav combined.wav

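SoX provides the useful -p option, which goes where the output file name would and sends the audio to standard output in SoX’s own format, so the next SoX command in the pipe can read it as its - input. Each pad 3 0 applies to everything mixed so far, which is why the delays stack up. In this case, the beat starts at 0:00, Nicki ruins it at 0:03, and finally a competent artist like Nas or Mos Def comes in at 0:06 and makes things listenable.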
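
Because the delays in a chained pipe accumulate, it can be easier to give every track its own absolute offset. SoX also accepts an input written as "|command", which tells it to read that command's standard output, so each track can be padded independently inside a single mix. Here is a rough sketch of that idea. mix_with_offsets and DRY_RUN are names I made up, not SoX features; it assumes at least two inputs (sox -m requires that), file names without spaces or shell metacharacters, and a SoX build that supports "|command" inputs.

```shell
#!/bin/sh
# Sketch only: mix_with_offsets and DRY_RUN are illustrative names, not part of SoX.
# Usage: mix_with_offsets OUTPUT FILE:SECONDS FILE:SECONDS [...]
# Assumes file names without spaces or shell metacharacters, because each
# "|sox ..." string is handed to a shell by SoX.
mix_with_offsets() (
  output=$1
  shift
  count=$#                      # number of FILE:SECONDS specs
  for spec in "$@"; do
    file=${spec%:*}             # text before the last colon
    offset=${spec##*:}          # text after the last colon
    # An input written as "|command" makes SoX read that command's stdout.
    set -- "$@" "|sox $file -p pad $offset 0"
  done
  shift "$count"                # drop the specs, keep the "|sox ..." inputs
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' sox -m "$@" "$output"   # print the command, one argument per line
  else
    sox -m "$@" "$output"
  fi
)
```

The three-track mix from above would then be: mix_with_offsets combined.wav sick-beat.wav:0 awful-lyrics.wav:3 good-rapper.wav:6. Setting DRY_RUN=1 first prints the sox command instead of running it, which is handy for checking the offsets.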

You can also check out avconv and ffmpeg.