Why use μ-law encoding?
If you are working with audio data in machine learning and want the model to work with discrete inputs, the first idea that comes to mind is linear encoding: just map each floating-point sample between -1 and +1 to an integer between 0 and 255.
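In case a concrete picture helps, here is a minimal sketch of that linear scheme in NumPy (the function names are mine, not from any particular library):

```python
import numpy as np

def linear_encode(samples, bits=8):
    """Uniformly quantize floats in [-1, 1] to integers in [0, 2**bits - 1]."""
    levels = 2 ** bits
    # Shift [-1, 1] to [0, 1], scale to the number of levels, and clip.
    quantized = np.floor((samples + 1.0) / 2.0 * levels)
    return np.clip(quantized, 0, levels - 1).astype(np.uint8)

def linear_decode(encoded, bits=8):
    """Map integers back to floats in [-1, 1], using the centre of each bin."""
    levels = 2 ** bits
    return (encoded.astype(np.float32) + 0.5) / levels * 2.0 - 1.0
```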
However, there’s a much better way to do this: μ-law encoding. You can read about the theoretical details in the linked Wikipedia article, but I wasn’t convinced until I heard the difference for myself.
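For comparison, here is a sketch of 8-bit μ-law encoding using the standard companding formula F(x) = sign(x) · ln(1 + μ|x|) / ln(1 + μ) with μ = 255; again, the function names are just illustrative:

```python
import numpy as np

def mu_law_encode(samples, mu=255):
    """Compand floats in [-1, 1] with the mu-law curve, then uniformly quantize to [0, mu]."""
    companded = np.sign(samples) * np.log1p(mu * np.abs(samples)) / np.log1p(mu)
    # companded is still in [-1, 1]; quantize it the same way as in linear encoding.
    quantized = np.floor((companded + 1.0) / 2.0 * (mu + 1))
    return np.clip(quantized, 0, mu).astype(np.uint8)

def mu_law_decode(encoded, mu=255):
    """Invert the quantization, then invert the mu-law companding."""
    companded = (encoded.astype(np.float32) + 0.5) / (mu + 1) * 2.0 - 1.0
    return np.sign(companded) * np.expm1(np.abs(companded) * np.log1p(mu)) / mu
```

The companding step spends more of the 256 levels on quiet samples and fewer on loud ones, which is why the quantization noise is less audible.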
So, here it is: the same audio file encoded with the linear and the μ-law algorithm. It’s the first 8 seconds of this YouTube video (the YouTube Mix dataset), sampled at 16 kHz.
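If you want to reproduce the comparison yourself, something along these lines should work, assuming librosa and soundfile are installed and using the functions sketched above ("youtube_mix.wav" is a placeholder for the downloaded clip):

```python
import librosa
import soundfile as sf

# Load the first 8 seconds of the clip, resampled to 16 kHz mono.
audio, sr = librosa.load("youtube_mix.wav", sr=16000, duration=8.0)

# Round-trip through both 8-bit schemes and write the results for listening.
sf.write("linear_8bit.wav", linear_decode(linear_encode(audio)), sr)
sf.write("mulaw_8bit.wav", mu_law_decode(mu_law_encode(audio)), sr)
```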
The quantization noise is much more prominent in the linear encoding than in the μ-law version.