Cross Entropy


Recall Entropy

Let's say there are two machines, Machine P and Machine Q.

Machine P outputs the characters "A", "B", "C", "D" with probabilities 0.25, 0.25, 0.25, 0.25.

Machine Q outputs the characters "A", "B", "C", "D" with probabilities 0.5, 0.125, 0.125, 0.25.

As we went through in the Entropy post (https://github.com/jinho-choi123/ball-gitbook-repo/blob/main/machine-learning/broken-reference/README.md), each machine uses a different strategy to express characters as bits.

We use the following strategy R for Machine P. Since every character is equally likely, R assigns each of "A", "B", "C", "D" a 2-bit code.

[Figure: Strategy R to express characters in Machine P]

We use the following strategy S for Machine Q. S gives shorter codes to more frequent characters: 1 bit for "A", 3 bits each for "B" and "C", and 2 bits for "D".

[Figure: Strategy S to express characters in Machine Q]
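As a quick sanity check, here is a minimal Python sketch (the helper name `entropy` is mine, and it assumes each strategy assigns a character $-\log_2(p)$ bits, i.e. the optimal codes shown above). It recovers each machine's expected code length under its own strategy:

```python
import math

# Output probabilities of the two machines
P = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
Q = {"A": 0.5, "B": 0.125, "C": 0.125, "D": 0.25}

# An optimal strategy assigns each character -log2(p) bits, so the
# expected code length under a machine's own strategy is its entropy.
def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values())

print(entropy(P))  # 2.0 bits/character  (Strategy R)
print(entropy(Q))  # 1.75 bits/character (Strategy S)
```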

Cross Entropy

What if we apply Strategy S to Machine P, or Strategy R to Machine Q?

For two distributions P and Q, the cross-entropy is the expected number of bits required to express one machine's output using the other machine's strategy.

Strategy S to Machine P

We can calculate the cross-entropy as follows:

$$H(P, Q) = -\big(0.25 \cdot \log_2(0.5) + 0.25 \cdot \log_2(0.125) + 0.25 \cdot \log_2(0.125) + 0.25 \cdot \log_2(0.25)\big) = 2.25 \text{ bits}$$

Strategy R to Machine Q

We can calculate the cross-entropy as follows:

$$H(Q, P) = -\big(0.5 \cdot \log_2(0.25) + 0.125 \cdot \log_2(0.25) + 0.125 \cdot \log_2(0.25) + 0.25 \cdot \log_2(0.25)\big) = 2 \text{ bits}$$

Cross Entropy for P, Q

For two probability distributions P and Q, the cross-entropy can be expressed as follows:

$$H(P, Q) = -\sum_{x} p(x) \cdot \log_2(q(x))$$
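The following sketch (the helper name `cross_entropy` is my own) plugs the two machines from above into this formula and reproduces the numbers we computed by hand:

```python
import math

P = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
Q = {"A": 0.5, "B": 0.125, "C": 0.125, "D": 0.25}

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) * log2(q(x))
    return -sum(p[x] * math.log2(q[x]) for x in p)

print(cross_entropy(P, Q))  # 2.25 bits -- Strategy S applied to Machine P
print(cross_entropy(Q, P))  # 2.0 bits  -- Strategy R applied to Machine Q
print(cross_entropy(P, P))  # 2.0 bits  -- equals H(P): own strategy is optimal
```

Note that H(P, Q) ≥ H(P) always holds, with equality when Q = P; the gap between them is exactly the KL-divergence.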

Why is cross-entropy important in ML?

As the two distributions P and Q get more similar, the cross-entropy gets smaller: for a fixed P, H(P, Q) is minimized exactly when Q = P, where it equals the entropy H(P). This is the key reason ML uses cross-entropy so often.

For a ground-truth distribution P and a learnable distribution Q, we train Q (the output of the learnable model) to be similar to P. The training objective is simply: make the cross-entropy smaller! See the sketch below.
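Here is a minimal sketch of that idea (the candidate distributions are made up for illustration): with the ground truth P fixed, the cross-entropy loss shrinks as the model distribution Q moves toward P:

```python
import math

# Hypothetical ground-truth distribution P over 4 classes
P = [0.25, 0.25, 0.25, 0.25]

def cross_entropy(p, q):
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q))

# Candidate model outputs Q, moving from far away to identical to P
for Q in [[0.7, 0.1, 0.1, 0.1],
          [0.4, 0.2, 0.2, 0.2],
          [0.25, 0.25, 0.25, 0.25]]:
    print(Q, round(cross_entropy(P, Q), 3))

# The loss drops from ~2.62 to ~2.07 and bottoms out at 2.0 = H(P),
# so minimizing cross-entropy pushes Q toward the ground truth P.
```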
