(my understanding of information theory is very rudimentary)

Let's say you want to send symbols as messages through a binary channel (0s and 1s). Let pᵢ denote the relative frequency of the i-th symbol. Then, to use the smallest number of bits per transmission on average, you should assign log₂(1/pᵢ) bits to the i-th symbol (afaik this is essentially Shannon's source coding theorem: these lengths are exactly optimal when the log₂(1/pᵢ) are integers, and within one bit of optimal on average otherwise).
The entropy is just the expected number of bits per transmission under this optimal encoding:
∑ᵢ pᵢ log₂(1/pᵢ).
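The formula above can be checked numerically; here's a minimal sketch in Python (the function name `entropy` is mine, just for illustration):

```python
import math

def entropy(p):
    # Expected bits per symbol under the optimal code for p:
    # each symbol with frequency p_i gets log2(1/p_i) bits.
    # Terms with p_i = 0 contribute nothing, so skip them.
    return sum(pi * math.log2(1 / pi) for pi in p if pi > 0)

# A fair coin needs 1 bit per toss on average:
print(entropy([0.5, 0.5]))              # → 1.0
# Four equally likely symbols need 2 bits each:
print(entropy([0.25, 0.25, 0.25, 0.25]))  # → 2.0
```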


[cont.] Cross-entropy.

By that interpretation, in contrast to entropy, cross-entropy is the expected number of bits per transmission under a potentially suboptimal encoding {log₂(1/q₁), log₂(1/q₂), ...}, which is based on a potentially inaccurate distribution {q₁, q₂, ...} for the symbols. That is, mathematically cross-entropy is given by:

H(p, q) = ∑ᵢ pᵢ log₂(1/qᵢ).
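Same idea in code, continuing the sketch (again, the function name is just for illustration): symbols actually occur with frequencies p, but the code lengths were chosen as if the distribution were q.

```python
import math

def cross_entropy(p, q):
    # Expected bits per symbol when symbols occur with frequencies p
    # but code lengths log2(1/q_i) were chosen for an assumed
    # distribution q. Terms with p_i = 0 contribute nothing.
    return sum(pi * math.log2(1 / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]    # true frequencies: a fair coin
q = [0.25, 0.75]  # mistaken assumption about the coin
# The mismatched code costs more than the 1 bit the optimal code needs:
print(cross_entropy(p, q))  # ≈ 1.2075 bits per transmission
```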

[cont.] Kullback–Leibler divergence.

Following that line of thought, the Kullback–Leibler divergence between p and q is simply the difference between cross-entropy(p, q) and entropy(p), i.e. the expected number of extra bits per transmission you pay for using the code based on q instead of the optimal one:

D(p||q) = ∑ᵢ pᵢ log₂(pᵢ / qᵢ)
        = ∑ᵢ pᵢ log₂(1/qᵢ) − ∑ᵢ pᵢ log₂(1/pᵢ)
        = H(p, q) − H(p).
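Putting the three quantities together in one self-contained sketch (all names mine), we can check the identity on a concrete pair of distributions:

```python
import math

def entropy(p):
    # Expected bits per symbol under the optimal code for p.
    return sum(pi * math.log2(1 / pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # Expected bits per symbol using code lengths chosen for q.
    return sum(pi * math.log2(1 / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # Extra bits per symbol paid for assuming q instead of p.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
# D(p||q) = H(p, q) - H(p) holds term by term:
print(kl_divergence(p, q))                        # → 0.25
print(cross_entropy(p, q) - entropy(p))           # → 0.25
```

Note D(p||q) is never negative: a suboptimal code can't beat the optimal one on average (Gibbs' inequality).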
