Kraft inequality: Nonlinear Function
Created: April 12, 2022
Modified: April 12, 2022

Kraft inequality

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

The Kraft inequality in information theory states (roughly?) that, for any probability distribution p(x)p(x), there is a prefix code C under which each xx is encoded with a codeword of length log2p(x)\lceil\log_2 p(x)\rceil bits, and that this code minimizes the expected code length.

Conversely, any code can be seen as representing a data distribution, where each element xx with codeword C(x)C(x) has probability 2C(x)2^{-|C(x)|}.

Reference: https://arxiv.org/abs/math/0406077