KL-Divergence

What is Kullback-Leibler Divergence

KL-Divergence is defined as the entropy difference between two probability distributions. In other words, it measures how similar (or how different) two distributions are.

Cross-entropy gets smaller as the two distributions become more similar. Let's take a look at the definition of cross-entropy.

H(P, Q) = -\sum p \cdot \log_2(q)
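As a quick numerical illustration (a minimal NumPy sketch; the distributions p, q_far, and q_near below are made-up examples over the same three outcomes), cross-entropy is large when q is far from p and approaches the entropy H(P) as q approaches p:

```python
import numpy as np

def cross_entropy(p, q):
    """H(P, Q) = -sum_i p_i * log2(q_i) for discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log2(q))

p      = np.array([0.5, 0.3, 0.2])    # ground-truth distribution P (toy example)
q_far  = np.array([0.1, 0.1, 0.8])    # distribution far from P
q_near = np.array([0.45, 0.35, 0.2])  # distribution close to P

print(cross_entropy(p, q_far))   # larger value
print(cross_entropy(p, q_near))  # smaller value, close to H(P) ≈ 1.485
```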

Definition of KL-Divergence

Using cross-entropy, we can express KL-Divergence:

KL(P||Q) = H(P, Q) - H(P) = -\sum p \cdot \log_2 \frac{q}{p}

In almost every case in ML, the distribution P is the ground-truth distribution, which means that H(P) is a constant. Subtracting it from the cross-entropy therefore does not change what we optimize: minimizing H(P, Q) is the same as minimizing KL(P||Q).
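To make the relation concrete, here is a small sketch (assuming strictly positive discrete distributions; the arrays p and q are arbitrary toy examples) that computes KL(P||Q) both as H(P, Q) - H(P) and directly from the definition; both routes give the same number:

```python
import numpy as np

def entropy(p):
    """H(P) = -sum_i p_i * log2(p_i)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log2(p))

def cross_entropy(p, q):
    """H(P, Q) = -sum_i p_i * log2(q_i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log2(q))

def kl_divergence(p, q):
    """KL(P||Q) = sum_i p_i * log2(p_i / q_i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log2(p / q))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.4, 0.4])

print(cross_entropy(p, q) - entropy(p))  # H(P, Q) - H(P)
print(kl_divergence(p, q))               # direct definition, same value
```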

Characteristics of KL-Divergence

KL(P||Q) \ge 0

Since the cross-entropy H(P, Q) is always greater than or equal to the entropy H(P), the KL-divergence is always non-negative. This follows directly from the definition above.

KL(P||Q) \ne KL(Q||P)

KL-divergence is not symmetric: swapping P and Q generally gives a different value, so it is not a true distance metric.
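A quick numerical check of both properties, reusing the toy distributions and the kl_divergence helper sketched above (not a library function):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P||Q) = sum_i p_i * log2(p_i / q_i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log2(p / q))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.4, 0.4])

print(kl_divergence(p, q))  # non-negative
print(kl_divergence(q, p))  # non-negative, but a different value: KL is not symmetric
print(kl_divergence(p, p))  # exactly 0 when the two distributions are identical
```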

Jensen-Shannon divergence

JS-divergence is also used to express the entropy difference between two probability distributions. It is defined as follows:

JSD(P||Q) = \frac{1}{2} KL(P||M) + \frac{1}{2} KL(Q||M), \quad \text{where } M = \frac{1}{2}(P+Q)

JS-divergence is expressed in terms of KL-divergence, but it is not used as often as KL-divergence.
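A short sketch of this definition with the same toy distributions, showing that JSD, unlike KL-divergence, is symmetric:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P||Q) = sum_i p_i * log2(p_i / q_i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log2(p / q))

def js_divergence(p, q):
    """JSD(P||Q) = 1/2 KL(P||M) + 1/2 KL(Q||M), where M = (P + Q) / 2."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.4, 0.4])

print(js_divergence(p, q))  # symmetric:
print(js_divergence(q, p))  # same value as above
```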

References

[1] https://hyunw.kim/blog/2017/10/27/KL_divergence.html