Mamba Model
Mamba is an advanced version of S4 that adds a selection mechanism and a hardware-aware algorithm.
The selection mechanism lets the model decide which inputs to ignore and which to keep, which makes Mamba content-aware.
Attention in the Transformer already has this property: attending to specific parts of the sequence based on their content is exactly what a selection mechanism does.
To maximize parallelism in the recurrent computation, Mamba uses a parallel-scan algorithm.
On modern GPUs, the main bottleneck for AI workloads is memory bandwidth to HBM: computation runs in "fast" SRAM, while data is stored in "slow" HBM.
The typical GPU bottleneck comes from frequent data copies between SRAM and HBM. To address this, Mamba keeps the hidden state only temporarily in SRAM: whenever the hidden state is consumed by an addition or multiplication, it is overwritten in place in SRAM instead of being written back to HBM.
You might think the intermediate hidden states are needed for backpropagation, but Mamba shows that simply recomputing them during the backward pass is enough. This avoids SRAM-HBM copies and keeps the GPU fully utilized.
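Mamba performs this recomputation inside its fused CUDA kernel, but the same trade-off (discard intermediates in the forward pass, recompute them in the backward pass) can be sketched in plain PyTorch with gradient checkpointing. The toy `RecurrentBlock` below is a hypothetical stand-in, not Mamba's actual scan:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class RecurrentBlock(nn.Module):
    """Toy recurrent block; stands in for one selective-scan step."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, x):
        # One recurrence step: combine previous hidden state with the input.
        return torch.tanh(self.linear(h) + x)

block = RecurrentBlock(dim=16)
x = torch.randn(8, 64, 16, requires_grad=True)   # (batch, seq_len, dim)
h = torch.zeros(8, 16)

outputs = []
for t in range(x.shape[1]):
    # checkpoint() discards intermediate activations after the forward pass
    # and recomputes them during backward, trading compute for memory --
    # the same idea Mamba applies to its intermediate hidden states.
    h = checkpoint(block, h, x[:, t], use_reentrant=False)
    outputs.append(h)

loss = torch.stack(outputs, dim=1).sum()
loss.backward()
```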
I made a simple Mamba tutorial with the CIFAR-10 dataset. You can find the code in the following repo:
The concept of Mamba is similar to the Transformer, but it is based on an SSM.
I strongly believe Mamba will contribute to the path toward AGI.
I recommend reading about Mamba.
To add the selection mechanism to S4, Mamba makes the SSM parameters depend on the input: it generates $B_t$, $C_t$, and $\Delta_t$ for every sequence step $t$.
Since the parameters differ at every time step, we can no longer precompute a single convolution kernel as in S4, so we use recurrent computation instead.
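Here is a minimal sketch of this idea in PyTorch. The projection names and the simplified (Euler-style) discretization are illustrative assumptions, not Mamba's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveSSM(nn.Module):
    """Toy selective SSM: B, C and the step size delta are functions of the input."""
    def __init__(self, d_model, d_state):
        super().__init__()
        self.A = nn.Parameter(-torch.rand(d_model, d_state))  # A stays input-independent
        self.to_B = nn.Linear(d_model, d_state)
        self.to_C = nn.Linear(d_model, d_state)
        self.to_delta = nn.Linear(d_model, d_model)

    def forward(self, x):                                  # x: (batch, seq_len, d_model)
        B_t = self.to_B(x)                                 # (b, l, d_state)
        C_t = self.to_C(x)                                 # (b, l, d_state)
        delta = F.softplus(self.to_delta(x))               # (b, l, d_model), positive step size

        # Discretize per step: A_bar and B_bar now differ at every t,
        # which is why a single precomputed convolution kernel no longer works.
        A_bar = torch.exp(delta.unsqueeze(-1) * self.A)    # (b, l, d_model, d_state)
        B_bar = delta.unsqueeze(-1) * B_t.unsqueeze(2)     # (b, l, d_model, d_state)

        h = x.new_zeros(x.shape[0], x.shape[2], self.A.shape[1])  # (b, d_model, d_state)
        ys = []
        for t in range(x.shape[1]):                        # sequential recurrence for clarity
            h = A_bar[:, t] * h + B_bar[:, t] * x[:, t].unsqueeze(-1)
            ys.append((h * C_t[:, t].unsqueeze(1)).sum(-1))  # y_t = C_t · h_t
        return torch.stack(ys, dim=1)                      # (batch, seq_len, d_model)

y = SelectiveSSM(d_model=16, d_state=8)(torch.randn(2, 32, 16))
print(y.shape)  # torch.Size([2, 32, 16])
```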
To explain the parallel scan in Mamba: the input is first multiplied by $\bar{B}_t$. In the figure, an arrow denotes addition, and $\bar{A}$ denotes multiplying $\bar{A}$ with the first element in the box.
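In code, the scan works because the per-step update $h_t = \bar{A}_t h_{t-1} + \bar{B}_t x_t$ composes associatively. The sketch below shows that combine operator and checks it against the plain recurrence; a sequential loop stands in for the log-depth parallel tree a real kernel would use:

```python
import torch

def combine(left, right):
    """Associative operator for the scan.
    Each element is a pair (a, b) representing the map h -> a * h + b."""
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)

# Per-step coefficients of the recurrence h_t = a_t * h_{t-1} + b_t,
# where in Mamba a_t = A_bar_t and b_t = B_bar_t * x_t.
a = torch.rand(6)
b = torch.randn(6)

# Left-to-right scan using the associative combine (a parallel implementation
# would instead merge pairs tree-wise across GPU threads).
acc = (a[0], b[0])
hs = [acc[1]]                      # h_0 = a_0 * 0 + b_0, assuming h_{-1} = 0
for t in range(1, len(a)):
    acc = combine(acc, (a[t], b[t]))
    hs.append(acc[1])

# Check against the plain sequential recurrence.
h, ref = torch.tensor(0.0), []
for t in range(len(a)):
    h = a[t] * h + b[t]
    ref.append(h)
assert torch.allclose(torch.stack(hs), torch.stack(ref))
```

Because `combine` is associative, the pairs can be merged in any grouping, which is what lets the recurrence be split across parallel workers instead of being computed strictly step by step.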