KL-Divergence
KL-divergence is defined as the entropy difference between two probability distributions. In other words, it measures how similar (or dissimilar) two distributions are.
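For discrete distributions P and Q defined over the same support, the standard definition is:

$$
D_{KL}(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}
$$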
Cross-entropy gets smaller as the two distributions become more similar. Let's take a look at the definition of cross-entropy.
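For discrete distributions, cross-entropy (and, for comparison, entropy) is written as:

$$
H(P, Q) = -\sum_x P(x) \log Q(x), \qquad H(P) = -\sum_x P(x) \log P(x)
$$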
Using cross-entropy, we can express KL-divergence:
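With the definitions above:

$$
D_{KL}(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)} = H(P, Q) - H(P)
$$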
In most ML cases, P is the ground-truth distribution, which means its entropy H(P) is a constant. It is therefore fine to subtract it from the cross-entropy: minimizing cross-entropy is equivalent to minimizing KL-divergence. This is clear from the expression of KL-divergence above.
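As a quick numerical check, here is a minimal NumPy sketch (the two distributions are arbitrary example values) verifying that KL-divergence equals cross-entropy minus entropy:

```python
import numpy as np

# Two example discrete distributions over the same support (arbitrary values).
p = np.array([0.6, 0.3, 0.1])   # ground-truth distribution P
q = np.array([0.4, 0.4, 0.2])   # model distribution Q

entropy = -np.sum(p * np.log(p))          # H(P)
cross_entropy = -np.sum(p * np.log(q))    # H(P, Q)
kl = np.sum(p * np.log(p / q))            # D_KL(P || Q)

# KL-divergence is cross-entropy minus entropy (up to floating-point error).
print(kl, cross_entropy - entropy)        # prints the same value twice (~0.0877)
```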
JS-divergence is also used to express the entropy difference between two probability distributions. It is defined as follows:
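Writing M for the mixture of the two distributions:

$$
D_{JS}(P \| Q) = \frac{1}{2} D_{KL}(P \| M) + \frac{1}{2} D_{KL}(Q \| M), \qquad M = \frac{1}{2}(P + Q)
$$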
JS-divergence is expressed using KL-divergence, but it is not used as often as KL-divergence.
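Continuing the NumPy sketch above, JS-divergence can be computed from a KL helper; the function names below are just for illustration:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) for discrete distributions given as NumPy arrays."""
    return np.sum(p * np.log(p / q))

def js_divergence(p, q):
    """JS-divergence: the average KL-divergence of P and Q to their mixture M."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.4, 0.4, 0.2])

# Unlike KL-divergence, JS-divergence is symmetric in its arguments.
print(js_divergence(p, q), js_divergence(q, p))
```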
Since cross-entropy is always greater than or equal to entropy (Gibbs' inequality), KL-divergence is always non-negative.
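One way to see this non-negativity directly is the standard argument via Jensen's inequality (not part of the original page, but it makes the claim concrete): since log is concave,

$$
-D_{KL}(P \| Q) = \sum_x P(x) \log \frac{Q(x)}{P(x)} \le \log \sum_x P(x) \frac{Q(x)}{P(x)} = \log \sum_x Q(x) = \log 1 = 0,
$$

so D_KL(P || Q) is at least 0, with equality exactly when P = Q.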