Linear State Space Layer
LSSL (Linear State Space Layer) is a sequence-to-sequence model, similar in purpose to RNNs and Transformers. However, it is built on an SSM (State-Space Model), which distinguishes it from existing RNN and Transformer architectures.
LSSL can be viewed as both a convolution and a recurrence. At training time it is treated as a convolution, which allows parallelization; at inference time it is treated as a recurrence, which enables online inference.
The LSSL model involves a lot of math, and I will not go through all of it here. Please read The Annotated S4 blog post for more details.
It uses an intermediate hidden state to store the input history.
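For reference, the underlying continuous-time SSM maps an input signal $u(t)$ to an output $y(t)$ through the hidden state $x(t)$:

$$
x'(t) = A x(t) + B u(t), \qquad y(t) = C x(t) + D u(t)
$$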
LSSL views the SSM in three ways: discrete, convolutional, and recurrent.
Using a discretization method, LSSL turns the continuous-time system into a discrete-time one. By default, the SSM is discretized with the Generalized Bilinear Transform (GBT).
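As a minimal sketch (in NumPy; the function name is my own), GBT discretizes $(A, B)$ with step size $\Delta$ as $\bar{A} = (I - \alpha \Delta A)^{-1}(I + (1-\alpha)\Delta A)$ and $\bar{B} = \Delta (I - \alpha \Delta A)^{-1} B$, where $\alpha = 1/2$ recovers the bilinear method:

```python
import numpy as np

def discretize_gbt(A, B, dt, alpha=0.5):
    """Generalized Bilinear Transform; alpha=0.5 is the bilinear method."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - dt * alpha * A)
    A_bar = inv @ (I + dt * (1 - alpha) * A)
    B_bar = dt * inv @ B
    return A_bar, B_bar
```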
The discretized SSM takes the form of a recurrence.
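Concretely, the recurrence is $x_k = \bar{A} x_{k-1} + \bar{B} u_k$, $y_k = C x_k$ (dropping the $D u_k$ skip term for simplicity). A minimal single-input single-output sketch, continuing the NumPy example above:

```python
def ssm_recurrence(A_bar, B_bar, C, u):
    """Run the discrete SSM step by step: x_k = A_bar x_{k-1} + B_bar u_k."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_k in u:                      # one step per input element
        x = A_bar @ x + B_bar * u_k    # B_bar has shape (N,) for scalar input
        ys.append(C @ x)               # y_k = C x_k, scalar for C of shape (N,)
    return np.array(ys)
```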
I think this is the most interesting part.
If we set the initial state $x_{-1} = 0$, then unrolling the recurrence lets us express $y_k$ as follows:

$$
y_k = C\bar{A}^{k}\bar{B}\,u_0 + C\bar{A}^{k-1}\bar{B}\,u_1 + \cdots + C\bar{B}\,u_k
$$

This is exactly applying the kernel

$$
\bar{K} = \left(C\bar{B},\; C\bar{A}\bar{B},\; \dots,\; C\bar{A}^{L-1}\bar{B}\right)
$$

to the input sequence as a convolution, $y = \bar{K} * u$. As a result, given all inputs up front, we can compute every $y_k$ in parallel.
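A sketch of materializing the kernel and applying it as a causal convolution (naively, via repeated multiplication by $\bar{A}$; the helper names are mine):

```python
def ssm_kernel(A_bar, B_bar, C, L):
    """Kernel K = (C B_bar, C A_bar B_bar, ..., C A_bar^{L-1} B_bar)."""
    K, x = [], B_bar
    for _ in range(L):
        K.append(C @ x)     # C A_bar^j B_bar
        x = A_bar @ x
    return np.array(K)

def ssm_convolution(A_bar, B_bar, C, u):
    """Compute all outputs at once via convolution with the kernel."""
    K = ssm_kernel(A_bar, B_bar, C, len(u))
    return np.convolve(u, K)[: len(u)]   # truncate to keep it causal
```

On the same inputs, `ssm_convolution` should agree with `ssm_recurrence` up to floating-point error; the convolution form is what gets parallelized at training time.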
So far, I have described the generic SSM. What makes LSSL special compared to the original SSM?
It uses the HiPPO framework to express the latent state.
It makes the kernel computation (especially computing powers of $\bar{A}$) efficient.
The matrix $A$ in the SSM plays a key role. Previous SSMs used a random matrix $A$, which made model performance poor. Using the $A$ matrix from the HiPPO framework lets the model remember its past input history and improves performance. So LSSL applies this result.
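As a reference point, the HiPPO-LegS matrix (the variant constructed in The Annotated S4 post) can be built like this:

```python
def make_hippo(N):
    """HiPPO-LegS matrix: A[n, k] = -sqrt(2n+1)*sqrt(2k+1) for n > k,
    -(n + 1) on the diagonal, and 0 above the diagonal."""
    p = np.sqrt(1 + 2 * np.arange(N))
    A = p[:, np.newaxis] * p[np.newaxis, :]
    A = np.tril(A) - np.diag(np.arange(N))
    return -A
```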
A surprising fact is that the $A$ matrix from the HiPPO framework is quasiseparable!
The paper shows that MVM (Matrix-Vector Multiplication) with a quasiseparable matrix can be computed efficiently, which is what makes the kernel computation fast.