Finetuning PaliGemma using LoRA
I made an image captioning model using PaliGemma.
Since PaliGemma is a huge model, I used the LoRA finetuning technique. If you are not familiar with LoRA, please look at LoRA: Low-Rank Adaptation.
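As a rough idea of what this looks like, here is a minimal sketch of attaching LoRA adapters to PaliGemma with the peft library. The rank, alpha, and target module names below are illustrative assumptions, not necessarily the exact values I used:

```python
from transformers import PaliGemmaForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Base checkpoint; "google/paligemma-3b-pt-224" is one of the public PaliGemma weights.
model = PaliGemmaForConditionalGeneration.from_pretrained("google/paligemma-3b-pt-224")

# LoRA injects small low-rank matrices into the attention projections,
# so only a tiny fraction of the parameters is trained.
lora_config = LoraConfig(
    r=8,                    # rank of the low-rank update (assumed value)
    lora_alpha=16,          # scaling factor (assumed value)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints the trainable-vs-total parameter ratio
```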
In this post, I will explain the key takeaways from this project.
At first, I tried to use a Jupyter notebook for this project. However, Jupyter notebooks are awful for modularizing code, so I switched to .py files and directly executed and imported the Python scripts.
In Google Colab, I executed the code as follows:
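Roughly like this, in a Colab cell (the repository URL and script name are placeholders for illustration):

```python
# Clone the repo and run the training script from a Colab cell.
!git clone https://github.com/<user>/<repo>.git
%cd <repo>
!python train.py
```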
Since I was a Colab Pro subscriber, I used the terminal to execute the training script.
The benefit of using a Python script is that you don't have to manually restart the kernel and re-run a bunch of cells.
My development process was as follows (steps 3 and 4 are sketched below):

1. Run the code in Colab.
2. Find an error and fix the code locally.
3. Push to the remote repo.
4. Pull in the Colab environment.
5. Go back to (1).
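In a Colab cell, steps 3 and 4 of the loop boil down to something like this (the repo path and script name are placeholders):

```python
# Pull the fix that was just pushed, then re-run training
# without restarting the Colab runtime.
!git -C /content/<repo> pull
!python /content/<repo>/train.py
```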
Printing all the metrics (train_loss, average_train_loss, val_loss, bleu_loss, etc.) makes it hard to analyze the model during training. Instead, I used wandb to visualize all the metrics.
Look how beautifully it organizes the metrics!
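The logging itself only takes a few lines. A minimal sketch, where the project name is a placeholder and random values stand in for the real losses:

```python
import random
import wandb

# Project name is a placeholder; use your own.
wandb.init(project="paligemma-lora-captioning")

for step in range(100):
    # In the real training loop these come from the model;
    # random values stand in here so the snippet runs on its own.
    wandb.log(
        {"train_loss": random.random(), "val_loss": random.random()},
        step=step,
    )

wandb.finish()
```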
Hugging Face is the most popular framework for Transformer models. I enjoy using it, but its documentation is poor: it was hard to find documentation for specific methods or classes, and sometimes I had to read the raw source code.
I was impressed by the final result of the finetuned model. I set the prompt for the downstream task to
Explain the image in detail
The original PaliGemma model captioned the image as
bron james
However, my finetuned model captioned the image as
basketball player in a yellow jersey with the number 6 is holding a basketball on a court. The player has tattoos on his arms and legs, and is wearing a bracelet on his left wrist. The background shows a blurred view of other players and a referee, with a focus on the player in the foreground
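For reference, a caption like this can be generated with transformers plus the saved LoRA adapter, roughly as sketched below. The checkpoint name, adapter path, and image file are placeholders, not my exact setup:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from peft import PeftModel

base_id = "google/paligemma-3b-pt-224"  # placeholder base checkpoint
model = PaliGemmaForConditionalGeneration.from_pretrained(base_id, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(base_id)

# Load the trained LoRA adapter on top of the frozen base model.
model = PeftModel.from_pretrained(model, "path/to/lora-adapter")  # placeholder path

image = Image.open("player.jpg")  # placeholder image file
prompt = "Explain the image in detail"
inputs = processor(text=prompt, images=image, return_tensors="pt")

out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
caption = processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(caption)
```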