Finetuning PaliGemma using LoRA


Summary

I made an image-captioning model using PaliGemma.

Since PaliGemma is a huge model, I used the LoRA finetuning technique. If you are not familiar with LoRA, please look at LoRA: Low-Rank Adaptation.
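If it helps, here is a minimal sketch of what the LoRA setup looks like with Huggingface peft. The checkpoint id, target modules, and hyperparameters below are assumptions for illustration, not the exact values from my repo:

```python
import torch
from transformers import PaliGemmaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "google/paligemma-3b-pt-224"  # assumption: base checkpoint

model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)

# LoRA trains small low-rank adapter matrices while the base weights stay frozen.
lora_config = LoraConfig(
    r=8,                                  # assumption: adapter rank
    lora_alpha=16,                        # assumption: scaling factor
    target_modules=["q_proj", "v_proj"],  # assumption: adapt the attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of weights are trainable
```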

In this post, I will explain the key points I took away from this project.

No more Jupyter notebook!

At first, I tried to use a Jupyter notebook for this project. However, Jupyter notebooks are awful for modularizing code. Instead, I used .py files and directly executed & imported the Python scripts.

In Google Colab, I executed the code as follows:
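The exact command block is not preserved on this page; roughly, it amounted to cloning the repo and launching the training script (the script name train.py is an assumption):

```python
# In a Colab cell, "!" runs a shell command; from the Colab Pro terminal
# you would type the same commands without the leading "!".
!git clone https://github.com/jinho-choi123/PaliGemma-ImageCaptioning.git
%cd PaliGemma-ImageCaptioning
!python train.py  # assumption: the training entry point
```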

Since I was a Colab Pro subscriber, I used the terminal to execute the train script.

The benefit of using Python scripts is that you don't have to manually restart the kernel and click through a bunch of cells.

My development process is as follows:

  1. Run the code in Colab

  2. Find an error, fix the code locally

  3. Git push to the remote repo

  4. Git pull in the Colab environment

  5. Go to (1)

Use Wandb (Weights & Biases) for logging

Printing all the metrics (train_loss, average_train_loss, val_loss, bleu_loss, etc.) makes it hard to analyze the model during training. Instead, I used wandb to visualize all the metrics.

Look how beautifully it organizes the metrics!
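The logging itself is only a couple of wandb calls. A minimal sketch, where the project name and the metric values are dummies for illustration:

```python
import wandb

# Assumption: project/run names are illustrative; the metric values below
# are dummies standing in for the real training loop.
wandb.init(project="paligemma-image-captioning", name="lora-finetune")

for step in range(100):
    train_loss = 1.0 / (step + 1)  # dummy value
    val_loss = 1.2 / (step + 1)    # dummy value
    wandb.log({"train_loss": train_loss, "val_loss": val_loss}, step=step)

wandb.finish()
```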

Huggingface documentation isn't organized

Huggingface is the most popular framework for Transformer models. I enjoy using it, but it has poor documentation. It was hard to find documentation for specific methods or classes, and sometimes I had to read the raw source code.

Finetuning makes the model fit the downstream task!

I was impressed by the final result of the finetuned model. I set the downstream task prompt to "Explain the image in detail".

The original PaliGemma model captioned the image as:

bron james

However, my finetuned model captioned the image as:

basketball player in a yellow jersey with the number 6 is holding a basketball on a court. The player has tattoos on his arms and legs, and is wearing a bracelet on his left wrist. The background shows a blurred view of other players and a referee, with a focus on the player in the foreground
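For reference, generating a caption with the Huggingface API looks roughly like the following. The checkpoint id, image path, and generation settings are assumptions, not the exact ones from my project:

```python
import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

model_id = "google/paligemma-3b-pt-224"      # assumption: base (not finetuned) checkpoint
processor = PaliGemmaProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()

image = Image.open("basketball_player.jpg")  # assumption: any local image
prompt = "Explain the image in detail"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
prompt_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
caption = processor.decode(output[0][prompt_len:], skip_special_tokens=True)
print(caption)
```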

References

[1] https://arxiv.org/abs/2407.07726

[2] https://medium.com/@danushidk507/customizing-paligemma-a-guide-to-fine-tuning-for-targeted-applications-636a522536cc

[3] https://www.youtube.com/watch?v=hDa-M91MSGU&ab_channel=NielsRogge

[4] https://github.com/jinho-choi123/PaliGemma-ImageCaptioning