Mixing Multiple Audio Files With SoX

SoX, the “Swiss Army knife of sound processing”, is awesome. I’ve been using it a lot lately for a project I’m working on, and I encountered a situation not quite covered in their documentation.

I wanted to mix multiple audio files together to create a new file. Let's say I have a file with a really cool beat and I want to add another file with vocals.

sox -m sick-beat.wav awful-lyrics.wav output.wav

Very straightforward. Now I have a beat with vocals. But what if I wanted to start the vocals a few seconds after the first track begins? SoX provides the pad effect, which here takes two parameters: the amount of silence to add before the audio, and the amount to add after it (both in seconds).

sox -m sick-beat.wav awful-lyrics.wav output.wav pad 3 0

That should delay the vocals from starting for 3 seconds, right? Well, no. Instead, it shifts BOTH files, because SoX applies effects to the mixed result rather than to an individual input. One solution is to pad the vocals first, then mix that intermediary file with the other:

sox awful-lyrics.wav offset-awful-lyrics.wav pad 3 0
sox -m sick-beat.wav offset-awful-lyrics.wav output.wav
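
The same two steps can be wrapped in a small shell function that cleans up after itself. This is only a sketch: mix_with_delay is a hypothetical helper name (not something SoX provides), and it assumes mktemp -d is available on your system.

```shell
# Hypothetical helper, not part of SoX.
# usage: mix_with_delay DELAY DELAYED_FILE BASE_FILE OUTPUT
mix_with_delay() {
    # Private scratch directory for the padded copy.
    tmpdir=$(mktemp -d) || return 1
    # Step 1: pad the delayed track. Step 2: mix it with the base track.
    sox "$2" "$tmpdir/padded.wav" pad "$1" 0 &&
        sox -m "$3" "$tmpdir/padded.wav" "$4"
    rc=$?
    # Remove the intermediate file whether or not SoX succeeded.
    rm -rf "$tmpdir"
    return "$rc"
}

# Example: mix_with_delay 3 awful-lyrics.wav sick-beat.wav output.wav
```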

That does exactly what we want. But what if you wanted to add another audio track? You'd have to add yet another command, creating yet another file, before mixing them. It doesn't seem too bad for two files, but it gets more cumbersome as it scales. Granted, the growth is only linear, but every extra track leaves behind an intermediary file that you'll probably want to delete once you're done. There's a smarter way to do it!

sox good-rapper.wav -p pad 3 0 | sox - -m awful-lyrics.wav -p pad 3 0 | sox - -m sick-beat.wav combined.wav

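SoX provides the useful -p option, which you use in place of an output filename so that the command's output can be piped into another SoX command. Each pad applies to everything mixed so far, so the delays add up: the beat starts at 0:00, awful-lyrics.wav comes in at 0:03, and good-rapper.wav comes in at 0:06.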
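
SoX also accepts a quoted "|command" in place of an input filename and reads that command's output as the input. That allows a variant where every delay is an absolute offset from the start instead of accumulating along the pipeline. Here is a rough sketch: mix_at_offsets is a made-up helper name, and it assumes file names without spaces or quotes, since SoX hands each "|sox ..." string to a shell.

```shell
# Hypothetical helper, not part of SoX.
# usage: mix_at_offsets OUTPUT BASE_FILE OFFSET FILE [OFFSET FILE]...
mix_at_offsets() {
    # Need an output, a base file, and at least one complete OFFSET FILE pair.
    if [ $# -lt 4 ] || [ $(( $# % 2 )) -ne 0 ]; then
        echo "usage: mix_at_offsets OUTPUT BASE_FILE OFFSET FILE..." >&2
        return 2
    fi
    out=$1
    base=$2
    shift 2
    pairs=$(( $# / 2 ))
    # Rebuild the argument list: append the base file, then turn each
    # leading OFFSET FILE pair into one "|sox FILE -p pad OFFSET 0" input.
    set -- "$@" "$base"
    while [ "$pairs" -gt 0 ]; do
        set -- "$@" "|sox $2 -p pad $1 0"
        shift 2
        pairs=$(( pairs - 1 ))
    done
    sox -m "$@" "$out"
}

# Example: beat at 0:00, awful-lyrics.wav at 0:03, good-rapper.wav at 0:06
# mix_at_offsets combined.wav sick-beat.wav 3 awful-lyrics.wav 6 good-rapper.wav
```

With the example arguments, the helper ends up running sox -m sick-beat.wav "|sox awful-lyrics.wav -p pad 3 0" "|sox good-rapper.wav -p pad 6 0" combined.wav, so adding a track means adding one more offset and file name.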

You can also check out avconv and ffmpeg.