ball-blog
  • Welcome, I'm ball
  • Machine Learning - Basic
    • Entropy
    • Cross Entropy
    • KL-Divergence
    • Monte-Carlo Method
    • Variational Auto Encoder
    • SVM
    • Adam Optimizer
    • Batch Normalization
    • Tokenizer
    • Rotary Positional Encoding
    • Vector Quantized VAE(VQ-VAE)
    • DALL-E
    • Diffusion Model
    • Memory Layers at Scale
    • Chain-of-Thought
  • Einsum
  • Linear Algebra
    • Linear Transformation
    • Determinant
    • Eigen-Value Decomposition(EVD)
    • Singular-Value Decomposition(SVD)
  • AI Accelerator
    • CachedAttention
    • SGLang
    • CacheBlend
  • Reinforcement Learning
    • Markov
  • Policy-Improvement Algorithm
  • Machine Learning - Transformer
    • Attention is All you need
    • Why do we need a mask in Transformer
    • Linear Transformer
    • kr2en Translator using Tranformer
    • Segment Anything
    • MNIST, CIFAR10 Classifier using ViT
    • Finetuning PaliGemma using LoRA
    • LoRA: Low-Rank Adaptation
  • EGTR: Extracting Graph from Transformer for SGG
  • Machine Learning - Mamba
    • Function Space(Hilbert Space)
    • HIPPO Framework
    • Linear State Space Layer
    • S4(Structures Space for Sequence Model)
    • Parallel Scan Algorithm
    • Mamba Model
  • Computer System
    • Memory Ordering: Release/Acquire 1
    • Memory Ordering: Release/Acquire 2
    • BUDAlloc
    • Lock-free Hash Table
    • Address Sanitizer
  • App development
    • Bluetooth connection in linux
    • I use Bun!
    • Using Tanstack-query in Frontend
    • Deploying AI Service using EC2
  • Problem Solving
    • Bipartite Graph
    • Shortest Path Problem in Graph
    • Diameter of a Tree
  • Scribbles
Powered by GitBook
On this page
  • Memory Ordering: Release/Acquire Series
  • Motivation
  • Enforce the memory ordering

Was this helpful?

Edit on GitHub
  1. Computer System

Memory Ordering: Release/Acquire 1

Last updated 3 months ago

Was this helpful?

Memory Ordering: Release/Acquire Series


Motivation

In concurrent programming, non-determinism comes from 1. Thread interleaving 2. Reordering by Compiler & CPU

For example, let's look at following error-prone code.

// thread A
DATA=42
FLAG=1

// thread B
if FLAG {
	assert(DATA==42)
}

The code would likely panic at assert(DATA=42) fail. This panic happens due to reordering. In a sequential perspective, instructions can execute as following:

FLAG = 1 // #1 thread A
if FLAG // #2 thread B
assert(DATA==42) // #3 thread B
DATA=42 // #4 thread A

Reordering happens in thread A because DATA and FLAG is independent, and thread A doesn't know the existence of thread B.

How can we prevent this code to panic? In other words, how can we prevent the reordering?

Lock/Unlock prevents reordering

Most easiest way to prevent reordering is using a lock.

If we consist the program as following, panic won't happen:

// thread A
lock.acquire()
DATA=42
FLAG=1
lock.release()

// thread B
lock.acquire()
if FLAG {
	assert(DATA==42)
}
lock.release()

This is because thread interleave doesn't happen while executing critical region.

But how can we ensure that code in thread A won't be reordered outside of lock scope?

Enforce the memory ordering

We can enforce the order of memory operations! For example, Acquire Ordering and Release Ordering.

Acquire Ordering

Ordering::Acquire is used with load memory operation. All post-memory operations cannot be reordered in prior order.

In the following program, if #2 is reordered to prior of #1, then A would load the wrong value.

A = load(0($R1), Ordering::Acquire); // #1
store(0($R1), $R3); // #2 store value of $R3 into 0($R1)

Release Ordering

Ordering::Release is used with store memory operation. All pre-memory operations cannot be reordered in post order.

In the following program, if #1 is reordered to post of #2, then A would load a wrong value.

A = load(0($R1), $R5); // #1
//...
store(0($R1), $R3, Ordering::Release); // #2 store value of $R3 into 0($R1)

Spin lock example with Acquire/Release Ordering

The following program is part of spinlock implementation with Rust.

unsafe impl RawLock for SpinLock {
    type Token = ();

    fn lock(&self) {
        let backoff = Backoff::new();

        while self
            .inner
            .compare_exchange(false, true, Acquire, Relaxed)
            .is_err()
        {
            backoff.snooze();
        }
    }

    unsafe fn unlock(&self, _token: ()) {
        self.inner.store(false, Release);
    }
}

Let's see the following execution flow.

memory instructions in #A can move to #B. But memory instructions in #B cannot move to #A, because lock.acquire has Ordering::Acquire.

Also memory instructions in #C can move to #B. But memory instructions in #B cannot move to #C, because lock.release has Ordering::Release

MIPS memory offset 0($R1) implies memory address. Please read this for more information.

Memory Ordering: Release/Acquire 1
Memory Ordering: Release/Acquire 2
post