Mamba Model

Summary

Mamba is an advanced version of S4 that adds a selection mechanism and a hardware-aware algorithm.

Selection Mechanism

The selection mechanism lets the model choose which inputs to ignore and which to keep. This adds content-awareness to the Mamba model.

The attention in the Transformer already has this feature: attending to specific parts of the sequence based on their content can be called a selection mechanism.

To add a selection mechanism to S4, Mamba makes $B$, $C$, and $\Delta$ change based on the input, generating a separate $B$, $C$, $\Delta$ for every step of the sequence length $L$.

Since a different $B$, $C$, $\Delta$ is used at every time step, we can no longer compute the output with a convolution kernel, so we use recurrent computation instead.

Figure: Inference process for SSM vs. SSM+Selection. From [2]
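As a concrete illustration, here is a minimal PyTorch sketch of input-dependent parameter generation. The module name `SelectiveParams` and the dimension names (`d_model`, `d_state`) are my own illustrative choices; the paper uses linear projections for $B$ and $C$ and a softplus-parameterized $\Delta$, which this follows in spirit:

```python
import torch
import torch.nn as nn

class SelectiveParams(nn.Module):
    """Generate input-dependent B, C, Delta for each time step (a sketch)."""
    def __init__(self, d_model: int, d_state: int):
        super().__init__()
        self.B_proj = nn.Linear(d_model, d_state)  # s_B(x)
        self.C_proj = nn.Linear(d_model, d_state)  # s_C(x)
        self.dt_proj = nn.Linear(d_model, 1)       # s_Delta(x)

    def forward(self, x: torch.Tensor):
        # x: (batch, L, d_model) -- one B, C, Delta per sequence step
        B = self.B_proj(x)                                      # (batch, L, d_state)
        C = self.C_proj(x)                                      # (batch, L, d_state)
        delta = torch.nn.functional.softplus(self.dt_proj(x))   # (batch, L, 1), positive step size
        return B, C, delta
```

Because $B$, $C$, $\Delta$ now depend on the content of $x_t$, the state update can amplify or suppress each input individually, which is exactly the selection behavior described above.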

Parallel Scan

To maximize parallelism in the recurrent computation, we use a parallel-scan algorithm.

Mamba first multiplies the input by $\bar{B}$. In the figure, an arrow means addition, and $\bar{A}$ means multiplying $\bar{A}$ with the first element in the box.

Figure: Parallel scan in Mamba. From [2]
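The key fact that makes a scan possible is that the recurrence $h_t = \bar{A} h_{t-1} + \bar{B} x_t$ composes associatively. Below is a minimal sketch (not Mamba's fused kernel) of a Hillis-Steele style scan over such a recurrence; `linear_recurrence_scan` and its shapes are illustrative assumptions:

```python
import torch

def linear_recurrence_scan(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Solve h[t] = a[t] * h[t-1] + b[t] (with h[-1] = 0) in O(log L) steps.

    Each position carries a pair (A, B) representing the affine map
    h -> A * h + B over a segment. Two segments compose associatively:
        (A2, B2) after (A1, B1) = (A1 * A2, A2 * B1 + B2)
    which is what lets us replace the sequential loop with a scan.
    a, b: shape (L,). In Mamba, a plays the role of Abar and b of Bbar * x.
    """
    L = a.shape[0]
    A, B = a.clone(), b.clone()
    shift = 1
    while shift < L:
        # Identity map (A=1, B=0) for positions with no left neighbor.
        A_prev = torch.ones_like(A)
        B_prev = torch.zeros_like(B)
        A_prev[shift:] = A[:-shift]
        B_prev[shift:] = B[:-shift]
        # Tuple assignment: both right-hand sides use the OLD A and B.
        A, B = A_prev * A, A * B_prev + B
        shift *= 2
    return B  # B[t] now holds h[t]

# Sanity check against the sequential loop.
a, b = torch.rand(8), torch.rand(8)
h, ref = torch.zeros(()), []
for t in range(8):
    h = a[t] * h + b[t]
    ref.append(h)
assert torch.allclose(linear_recurrence_scan(a, b), torch.stack(ref))
```

Each round halves the remaining dependency chain, so the whole sequence is resolved in $\log_2 L$ parallel steps instead of $L$ sequential ones.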

Hardware-aware Algorithm

On modern GPUs, the main bottleneck for AI computation is the memory bandwidth of HBM. All computation happens in "fast" SRAM, while data is stored in "slow" HBM.

Figure: GPU memory architecture

The typical GPU bottleneck arises from frequent data copies between SRAM and HBM. To deal with this, Mamba keeps the hidden state only temporarily in SRAM: whenever the hidden state is consumed by an addition or multiplication, it is updated in place in SRAM instead of being written back to HBM.

You might expect the intermediate hidden states to be needed for backpropagation, but Mamba shows that simply recomputing them during the backward pass is enough. This avoids SRAM-HBM copies and keeps the GPU fully utilized.
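Mamba implements this recomputation inside a fused CUDA kernel, but the same store-nothing, recompute-in-backward idea can be sketched with PyTorch's gradient checkpointing. The toy `ssm_scan` below is my own illustration, not Mamba's kernel:

```python
import torch
from torch.utils.checkpoint import checkpoint

def ssm_scan(x, A_bar, B_bar):
    """Toy sequential SSM recurrence: h[t] = A_bar * h[t-1] + B_bar * x[t].

    When this call is wrapped in checkpoint(), the intermediate h[t]
    values are NOT saved for backward; they are recomputed instead.
    """
    h = torch.zeros_like(x[0])
    outs = []
    for t in range(x.shape[0]):
        h = A_bar * h + B_bar * x[t]
        outs.append(h)
    return torch.stack(outs)

x = torch.randn(16, 32, requires_grad=True)
A_bar = torch.full((32,), 0.9, requires_grad=True)
B_bar = torch.ones(32, requires_grad=True)

# use_reentrant=False: recompute ssm_scan's intermediates in backward
y = checkpoint(ssm_scan, x, A_bar, B_bar, use_reentrant=False)
y.sum().backward()  # hidden states rebuilt on the fly, not stored
```

The trade is extra FLOPs for less memory traffic, which is a win on hardware where bandwidth, not compute, is the bottleneck.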

Hands-on Mamba

I made a simple tutorial of Mamba using the CIFAR-10 dataset. You can find the code in the following repo: https://github.com/jinho-choi123/Mamba_CIFAR10/tree/main

Wrapup

The concept behind Mamba is similar to the Transformer, but it is built on SSMs.

I strongly believe Mamba will contribute to unveiling AGI.

References

I recommend the following materials on Mamba: the original paper [1] and a fantastic article [2].

[1] Gu, A. and Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. https://arxiv.org/abs/2312.00752
[2] Grootendorst, M. A Visual Guide to Mamba and State Space Models. https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mamba-and-state