Why use μ-law encoding?
If you are working with audio data in machine learning and want the model to work with discrete inputs, the first idea that comes to mind is linear encoding: just map each floating-point sample between -1 and +1 to an integer between 0 and 255.
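In case a concrete picture helps, here is a minimal sketch of that linear scheme in NumPy (the function names are mine, not from any particular library):

```python
import numpy as np

def linear_encode(samples, bits=8):
    """Uniformly quantize floats in [-1, 1] to integers in [0, 2**bits - 1]."""
    levels = 2 ** bits
    # Shift [-1, 1] to [0, 1], scale to the number of levels, and clip.
    quantized = np.floor((samples + 1.0) / 2.0 * levels)
    return np.clip(quantized, 0, levels - 1).astype(np.uint8)

def linear_decode(encoded, bits=8):
    """Map integers back to floats in [-1, 1], using the centre of each bin."""
    levels = 2 ** bits
    return (encoded.astype(np.float32) + 0.5) / levels * 2.0 - 1.0
```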
However, there’s a much better way to do this: μ-law encoding. You can read about the theoretical details in the linked Wikipedia article, but I wasn’t convinced until I heard the difference for myself.
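For comparison, here is a sketch of 8-bit μ-law encoding using the standard companding formula F(x) = sign(x) · ln(1 + μ|x|) / ln(1 + μ) with μ = 255; again, the function names are just illustrative:

```python
import numpy as np

def mu_law_encode(samples, mu=255):
    """Compand floats in [-1, 1] with the mu-law curve, then uniformly quantize to [0, mu]."""
    companded = np.sign(samples) * np.log1p(mu * np.abs(samples)) / np.log1p(mu)
    # companded is still in [-1, 1]; quantize it the same way as in linear encoding.
    quantized = np.floor((companded + 1.0) / 2.0 * (mu + 1))
    return np.clip(quantized, 0, mu).astype(np.uint8)

def mu_law_decode(encoded, mu=255):
    """Invert the quantization, then invert the mu-law companding."""
    companded = (encoded.astype(np.float32) + 0.5) / (mu + 1) * 2.0 - 1.0
    return np.sign(companded) * np.expm1(np.abs(companded) * np.log1p(mu)) / mu
```

The companding step spends more of the 256 levels on quiet samples and fewer on loud ones, which is why the quantization noise is less audible.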
So, here it is: the same audio file encoded with the linear and the μ-law algorithm. It’s the first 8 seconds of this YouTube video (the YouTube Mix dataset), sampled at 16 kHz.
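If you want to reproduce the comparison yourself, something along these lines should work, assuming librosa and soundfile are installed and using the functions sketched above ("youtube_mix.wav" is a placeholder for the downloaded clip):

```python
import librosa
import soundfile as sf

# Load the first 8 seconds of the clip, resampled to 16 kHz mono.
audio, sr = librosa.load("youtube_mix.wav", sr=16000, duration=8.0)

# Round-trip through both 8-bit schemes and write the results for listening.
sf.write("linear_8bit.wav", linear_decode(linear_encode(audio)), sr)
sf.write("mulaw_8bit.wav", mu_law_decode(mu_law_encode(audio)), sr)
```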
The quantization noise is much more prominent in the linear encoding than in the μ-law version.