Finetuning PaliGemma using LoRA


Summary

I made an image-captioning model using PaliGemma.

Since PaliGemma is a huge model, I used the LoRA finetuning technique. If you are not familiar with LoRA, please look at LoRA: Low-Rank Adaptation.
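If it helps, here is a minimal sketch of what the LoRA setup looks like with Huggingface peft. The checkpoint id, target modules, and hyperparameters below are assumptions for illustration, not the exact values from my repo:

```python
import torch
from transformers import PaliGemmaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "google/paligemma-3b-pt-224"  # assumption: base checkpoint

model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)

# LoRA trains small low-rank adapter matrices while the base weights stay frozen.
lora_config = LoraConfig(
    r=8,                                  # assumption: adapter rank
    lora_alpha=16,                        # assumption: scaling factor
    target_modules=["q_proj", "v_proj"],  # assumption: adapt the attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of weights are trainable
```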

In this post, I will explain the key points I took away from this project.

No more Jupyter notebook!

At first, I tried to use a Jupyter notebook for this project. However, Jupyter notebooks are awful for modularizing code. Instead, I used .py files and directly executed & imported the Python scripts.

In Google Colab, I executed the code as follows:
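The exact command block is not preserved on this page; roughly, it amounted to cloning the repo and launching the training script (the script name train.py is an assumption):

```python
# In a Colab cell, "!" runs a shell command; from the Colab Pro terminal
# you would type the same commands without the leading "!".
!git clone https://github.com/jinho-choi123/PaliGemma-ImageCaptioning.git
%cd PaliGemma-ImageCaptioning
!python train.py  # assumption: the training entry point
```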

Since I was a Colab Pro subscriber, I used the terminal to execute the train script.

The benefit of using Python scripts is that you don't have to manually restart the kernel and click through a bunch of cells.

My development process is as follows:

  1. Run the code in Colab

  2. Find an error, fix the code locally

  3. Git push to the remote repo

  4. Git pull in the Colab environment

  5. Go to (1)

Use Wandb (Weights & Biases) for logging

Printing all the metrics (train_loss, average_train_loss, val_loss, bleu_loss, etc.) makes it hard to analyze the model during training. Instead, I used wandb to visualize all the metrics.

Look how beautifully it organizes the metrics!
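The logging itself is only a couple of wandb calls. A minimal sketch, where the project name and the metric values are dummies for illustration:

```python
import wandb

# Assumption: project/run names are illustrative; the metric values below
# are dummies standing in for the real training loop.
wandb.init(project="paligemma-image-captioning", name="lora-finetune")

for step in range(100):
    train_loss = 1.0 / (step + 1)  # dummy value
    val_loss = 1.2 / (step + 1)    # dummy value
    wandb.log({"train_loss": train_loss, "val_loss": val_loss}, step=step)

wandb.finish()
```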

Huggingface documentation isn't organized

Huggingface is the most popular framework for Transformer models. I enjoy using it, but it has poor documentation. It was hard to find documentation for specific methods or classes, and sometimes I had to read the raw source code.

Finetuning makes the model fit the downstream task!

I was impressed by the final result of the finetuned model. I set the downstream task prompt to "Explain the image in detail".

The original PaliGemma model captioned the image as:

bron james

However, my finetuned model captioned the image as:

basketball player in a yellow jersey with the number 6 is holding a basketball on a court. The player has tattoos on his arms and legs, and is wearing a bracelet on his left wrist. The background shows a blurred view of other players and a referee, with a focus on the player in the foreground
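For reference, generating a caption with the Huggingface API looks roughly like the following. The checkpoint id, image path, and generation settings are assumptions, not the exact ones from my project:

```python
import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

model_id = "google/paligemma-3b-pt-224"      # assumption: base (not finetuned) checkpoint
processor = PaliGemmaProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()

image = Image.open("basketball_player.jpg")  # assumption: any local image
prompt = "Explain the image in detail"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
prompt_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
caption = processor.decode(output[0][prompt_len:], skip_special_tokens=True)
print(caption)
```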

References

[1] https://arxiv.org/abs/2407.07726

[2] https://medium.com/@danushidk507/customizing-paligemma-a-guide-to-fine-tuning-for-targeted-applications-636a522536cc

[3] https://www.youtube.com/watch?v=hDa-M91MSGU&ab_channel=NielsRogge

[4] https://github.com/jinho-choi123/PaliGemma-ImageCaptioning