Cross Entropy
Let's say there are two machines, Machine P and Machine Q.
Machine P outputs the characters "A", "B", "C", "D" with probabilities 0.25, 0.25, 0.25, 0.25.
Machine Q outputs the characters "A", "B", "C", "D" with probabilities 0.5, 0.125, 0.125, 0.25.
As we went through at https://github.com/jinho-choi123/ball-gitbook-repo/blob/main/machine-learning/broken-reference/README.md, each machine uses a different strategy to express its characters as bits.
We use the following Strategy R for Machine P.
We use the following Strategy S for Machine Q.
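The strategies themselves are just prefix codes. One concrete choice (an assumption here; any optimal prefix code for the given probabilities would do) is:

Strategy R (for Machine P): A = 00, B = 01, C = 10, D = 11, so every character costs 2 bits.
Strategy S (for Machine Q): A = 0, D = 10, B = 110, C = 111, so the expected cost is 0.5·1 + 0.25·2 + 0.125·3 + 0.125·3 = 1.75 bits.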
What if we apply Strategy S to Machine P, or Strategy R to Machine Q?
For two distributions P and Q, the expected number of bits required when the output of one machine is encoded with the other machine's strategy is the cross-entropy.
We can calculate the cross-entropy as follows:
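With the assumed codes above:

Applying Strategy S to Machine P costs 0.25·1 + 0.25·3 + 0.25·3 + 0.25·2 = 2.25 bits per character.
Applying Strategy R to Machine Q costs 2 bits per character, since every codeword in Strategy R is 2 bits long.

In both cases the mismatched strategy needs at least as many bits as the machine's own strategy (2 bits for Machine P, 1.75 bits for Machine Q).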
In general, for two probability distributions P and Q, the cross-entropy can be expressed as follows:
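$$H(P, Q) = -\sum_{x} P(x) \log_2 Q(x)$$

Here x runs over the characters, P(x) is the true probability, and Q(x) is the probability the encoding strategy assumes. Plugging in the machines above gives H(P, Q) = 2.25 bits and H(Q, P) = 2 bits, matching the bit counts we computed from the codes.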
As the two distributions P and Q become more similar, the cross-entropy gets smaller; it reaches its minimum, the entropy of P, when Q equals P. This is the key reason why ML uses cross-entropy so often.
For a ground-truth distribution P and a learnable distribution Q (the output of the model), we train Q to be similar to P. The training objective is simply: make the cross-entropy smaller!
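As a minimal sketch (plain Python, not tied to any ML framework), here is how this objective can be computed; the distributions reuse Machine P (ground truth) and Machine Q (model output) from above:

```python
import math

def cross_entropy(p, q):
    """Expected bits to encode samples from p using a code optimized for q."""
    return -sum(p_x * math.log2(q_x) for p_x, q_x in zip(p, q) if p_x > 0)

# Ground-truth distribution P and model output Q over "A", "B", "C", "D"
p = [0.25, 0.25, 0.25, 0.25]
q = [0.5, 0.125, 0.125, 0.25]

print(cross_entropy(p, p))  # 2.0  -> entropy of P, the lower bound
print(cross_entropy(p, q))  # 2.25 -> larger, because Q differs from P
```

Training pushes Q toward P, which pushes cross_entropy(p, q) down toward the entropy of P.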