HiPPO Framework
Much real-world data is sequential. For example, sensor readings are time-series data.
Transformers and RNNs use the history (previous data) to predict the next value. However, Transformers suffer from a large computational cost and RNNs from vanishing gradients. The fundamental problem is the memory needed to store the history.
The HiPPO framework offers a new approach: it stores the history in compressed form, using much less memory.
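Suppose there is a function $f(t)$ that we want to memorize. At any time $t$, HiPPO approximates its history $f_{\le t} = f(x)\big|_{x \le t}$.
This is done as follows:
Define N orthonormal basis functions $g_0^{(t)}, \dots, g_{N-1}^{(t)}$ of the function space, orthonormal with respect to a time-dependent measure $\mu^{(t)}$.
The coefficients $c_n(t) = \langle f_{\le t}, g_n^{(t)} \rangle_{\mu^{(t)}}$ evolve as a dynamical system (ODE): $\frac{d}{dt} c(t) = A(t)\, c(t) + B(t)\, f(t)$.
If we discretize the ODE, we get the coefficients $c_k$ at every timestep $k$.
These coefficients are the compressed information of $f_{\le t}$: the whole history is compressed into N coefficients per timestep.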
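To make these steps concrete, here is a minimal NumPy sketch of one instance, HiPPO-LegS from the original paper. Its basis is the scaled Legendre polynomials $g_n^{(t)}(x) = \sqrt{2n+1}\,P_n(2x/t - 1)$ under the uniform measure on $[0, t]$, and its ODE is $\frac{d}{dt} c(t) = -\frac{1}{t} A c(t) + \frac{1}{t} B f(t)$. The bilinear discretization, the sampling at $t_k = k\,\Delta t$, the test signal and the helper names (`legs_matrices`, `hippo_legs`, `project_legs`) are my own assumptions for illustration, not the paper's reference code.

```python
import numpy as np
from numpy.polynomial import legendre


def legs_matrices(N):
    """HiPPO-LegS matrices: A[n,k] = sqrt((2n+1)(2k+1)) if n > k, n+1 if n == k, 0 if n < k."""
    q = np.sqrt(2 * np.arange(N) + 1.0)
    A = np.tril(np.outer(q, q), k=-1) + np.diag(np.arange(1, N + 1))
    B = q.copy()
    return A, B


def hippo_legs(f_samples, N):
    """Online update of N coefficients via a bilinear discretization of
    dc/dt = -(1/t) A c + (1/t) B f, consuming one sample at a time (sample k at t = k*dt)."""
    A, B = legs_matrices(N)
    I = np.eye(N)
    c = np.zeros(N)
    for k, f_k in enumerate(f_samples, start=1):
        # dt / t = 1 / k, so the step size cancels (LegS is timescale-invariant)
        lhs = I + A / (2 * k)
        rhs = (I - A / (2 * k)) @ c + B * f_k / k
        c = np.linalg.solve(lhs, rhs)
    return c


def project_legs(f, T, N, n_quad=64):
    """Reference: c_n = (1/T) * integral_0^T f(x) sqrt(2n+1) P_n(2x/T - 1) dx (Gauss-Legendre)."""
    v, w = legendre.leggauss(n_quad)      # nodes and weights on [-1, 1]
    fx = f(T * (v + 1) / 2)               # map nodes to [0, T]
    return np.array([
        0.5 * np.sum(w * fx * np.sqrt(2 * n + 1) * legendre.Legendre.basis(n)(v))
        for n in range(N)
    ])


T, L, N = 10.0, 10_000, 8
t = np.linspace(T / L, T, L)          # sample times t_k = k * T / L, k = 1..L
c_rec = hippo_legs(np.sin(t), N)      # streaming: sees one sample at a time
c_ref = project_legs(np.sin, T, N)    # offline: integrates the whole history at once
print(np.max(np.abs(c_rec - c_ref)))
```

The recurrence keeps only N numbers and never revisits old samples, yet its final state should closely match the coefficients obtained by projecting the whole history at once. The paper discusses several discretizations; bilinear is used here only because it is simple and stable.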
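In Appendix C, the coefficients are written as $c_n(t) = \int f(x)\, p_n(t, x)\, \frac{\omega}{\chi}(t, x)\, dx$, where $p_n$ are orthogonal polynomials, $\omega$ is the weight function (the density of the measure) and $\chi$ is a scaling function. Differentiating gives
$$\frac{d}{dt} c_n(t) = \int f(x) \left( \frac{\partial}{\partial t} p_n(t, x) \right) \frac{\omega}{\chi}(t, x)\, dx + \int f(x)\, p_n(t, x) \left( \frac{\partial}{\partial t} \frac{\omega}{\chi}(t, x) \right) dx \qquad (20)$$
and the paper claims that this can be reduced to dynamics of the form
$$\frac{d}{dt} c(t) = A(t)\, c(t) + B(t)\, f(t).$$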
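This conversion needs some insight.
Since $\frac{\partial}{\partial t} p_n(t, x)$ is a polynomial in $x$ of degree $n-1$, it can be expressed as a linear combination of $p_0, \dots, p_{n-1}$. So the first term in equation (20) can be expressed as a linear combination of $c_0, \dots, c_{n-1}$.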
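As a quick numerical check of this step, the sketch below assumes a sliding-window (translated) Legendre polynomial $P_n(z)$ with $z = 2(x - t)/\theta + 1$ and window length $\theta$, which is my choice for illustration. By the chain rule, $\frac{\partial}{\partial t} P_n(z) = -\frac{2}{\theta} P_n'(z)$, and NumPy's `legder` rewrites $P_n'$ in the Legendre basis, showing that only $P_0, \dots, P_{n-1}$ appear. The helper name `dt_translated_legendre` is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def dt_translated_legendre(n, theta=1.0):
    """Legendre-basis coefficients a_k with d/dt P_n(z) = sum_k a_k P_k(z),
    where z = 2(x - t)/theta + 1, so dz/dt = -2/theta."""
    e_n = np.zeros(n + 1)
    e_n[n] = 1.0                                  # P_n written as a Legendre series
    return -(2.0 / theta) * legendre.legder(e_n)  # derivative has degree n - 1


for n in range(1, 6):
    a = dt_translated_legendre(n)
    print(n, "highest degree:", len(a) - 1, "coefficients:", a)
```

For $n = 3$ and $\theta = 1$ the coefficients are $[-2, 0, -10]$, i.e. $-2(P_0 + 5P_2)$: a combination of lower-degree polynomials only, which is why the first term maps onto lower-index coefficients.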
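For many weight functions $\omega$, we can find a scaling function $\chi$ such that $\frac{\partial}{\partial t} \frac{\omega}{\chi}$ can be expressed using $\frac{\omega}{\chi}$ itself.
But I still can't see why the second term in equation (20) can be expressed as a linear combination of the coefficients $c_0, \dots, c_{N-1}$.
If you know the answer, please leave a comment!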
As we use a higher N (the number of orthonormal basis polynomials, so the maximum degree is N-1), the input is approximated more accurately.
Each coefficient is obtained by projecting the input onto the corresponding basis function (an inner product).
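To see the effect of N numerically, here is a minimal NumPy sketch that projects a test signal on $[0, T]$ onto the first N scaled Legendre polynomials $g_n(x) = \sqrt{2n+1}\,P_n(2x/T - 1)$ (orthonormal under the uniform measure, as in HiPPO-LegS at time $T$), computes each inner product with Gauss-Legendre quadrature, and measures the reconstruction error. The test signal, quadrature order and helper names are assumptions for illustration.

```python
import numpy as np
from numpy.polynomial import legendre

N_QUAD = 128  # Gauss-Legendre nodes; exact for polynomials up to degree 255


def legendre_coeffs(f, T, N):
    """Project f on [0, T] onto g_n(x) = sqrt(2n+1) P_n(2x/T - 1), n < N.
    The g_n are orthonormal under the uniform probability measure on [0, T],
    so c_n = (1/T) * integral of f(x) g_n(x) dx."""
    v, w = legendre.leggauss(N_QUAD)      # nodes and weights on [-1, 1]
    fx = f(T * (v + 1) / 2)               # map nodes to [0, T]
    scale = np.sqrt(2 * np.arange(N) + 1.0)
    P = np.stack([legendre.Legendre.basis(n)(v) for n in range(N)])  # P[n, i] = P_n(v_i)
    return 0.5 * (scale[:, None] * P) @ (w * fx)


def reconstruct(c, T, x):
    """Evaluate the approximation sum_n c_n g_n(x)."""
    scale = np.sqrt(2 * np.arange(len(c)) + 1.0)
    return legendre.legval(2 * x / T - 1, c * scale)


def l2_error(f, T, c):
    """Root-mean-square error under the uniform measure on [0, T]."""
    v, w = legendre.leggauss(N_QUAD)
    x = T * (v + 1) / 2
    return np.sqrt(0.5 * np.sum(w * (f(x) - reconstruct(c, T, x)) ** 2))


def signal(x):
    return np.sin(x) + 0.5 * np.cos(3 * x)


T = 10.0
errs = []
for N in (4, 8, 16, 32):
    c = legendre_coeffs(signal, T, N)
    errs.append(l2_error(signal, T, c))
    print(f"N={N:2d}  RMS error={errs[-1]:.2e}")
```

Because each reconstruction is an orthogonal projection, adding basis functions can never increase this RMS error; for a smooth signal it drops sharply once N is large enough to resolve the signal's fastest oscillation.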
I will not explain the derivation of the HiPPO framework here. Please read Appendix C of the original paper or . It shows the math for deriving the ODE for the coefficients $c(t)$.