ball-blog
  • Welcome, I'm ball
  • Machine Learning - Basic
    • Entropy
    • Cross Entropy
    • KL-Divergence
    • Monte-Carlo Method
    • Variational Auto Encoder
    • SVM
    • Adam Optimizer
    • Batch Normalization
    • Tokenizer
    • Rotary Positional Encoding
    • Vector Quantized VAE(VQ-VAE)
    • DALL-E
    • Diffusion Model
    • Memory Layers at Scale
    • Chain-of-Thought
  • Einsum
  • Linear Algebra
    • Linear Transformation
    • Determinant
    • Eigen-Value Decomposition(EVD)
    • Singular-Value Decomposition(SVD)
  • AI Accelerator
    • CachedAttention
    • SGLang
    • CacheBlend
  • Reinforcement Learning
    • Markov
  • Policy-Improvement Algorithm
  • Machine Learning - Transformer
    • Attention is All you need
    • Why do we need a mask in Transformer
    • Linear Transformer
    • kr2en Translator using Tranformer
    • Segment Anything
    • MNIST, CIFAR10 Classifier using ViT
    • Finetuning PaliGemma using LoRA
    • LoRA: Low-Rank Adaptation
  • EGTR: Extracting Graph from Transformer for SGG
  • Machine Learning - Mamba
    • Function Space(Hilbert Space)
    • HIPPO Framework
    • Linear State Space Layer
    • S4(Structures Space for Sequence Model)
    • Parallel Scan Algorithm
    • Mamba Model
  • Computer System
    • Memory Ordering: Release/Acquire 1
    • Memory Ordering: Release/Acquire 2
    • BUDAlloc
    • Lock-free Hash Table
    • Address Sanitizer
  • App development
    • Bluetooth connection in linux
    • I use Bun!
    • Using Tanstack-query in Frontend
    • Deploying AI Service using EC2
  • Problem Solving
    • Bipartite Graph
    • Shortest Path Problem in Graph
    • Diameter of a Tree
  • Scribbles
Powered by GitBook
On this page
  • Memory Ordering: Release/Acquire Series
  • Example 1
  • Example 2
  • Example 3
  • happens before relation with Release/Acquire ordering
  • References

Was this helpful?

Edit on GitHub
  1. Computer System

Memory Ordering: Release/Acquire 2

Last updated 3 months ago

Was this helpful?

Memory Ordering: Release/Acquire Series


We talked about relaxed memory ordering, and what release/acquire memory orderings are.

Let's deep dive into realistic examples that these memory orderings are used.

Example 1

static X: AtomicI32 = AtomicI32::new(0);
static Y: AtomicI32 = AtomicI32::new(0);

fn a() { // thread 1
    X.store(10, Relaxed); 1
    Y.store(20, Relaxed); 2
}

fn b() { // thread 2
    let y = Y.load(Relaxed); 3
    let x = X.load(Relaxed); 4
    println!("{x} {y}");
}

In the following code, we can't ensure what will be printed in thread b. This is because all the stores and loads are using Relaxed memory ordering.

Example 2

#[test]
fn test_concurrent_relaxed_memory_ordering() {
    loom::model(|| {
        loom::lazy_static! {
            static ref X: AtomicU32 = AtomicU32::new(0);
        }
        let t1 = thread::spawn(|| {
            X.fetch_add(5, Relaxed); 
            X.fetch_add(10, Relaxed); 
        });

        let t2 = thread::spawn(|| {
            let a = X.load(Relaxed);
            let b = X.load(Relaxed);
            let c = X.load(Relaxed);
            let d = X.load(Relaxed);
            assert!(a == 0 || a == 5 || a == 15, "a = {}", a);
            assert!(b == 0 || b == 5 || b == 15, "b = {}", a);
            assert!(c == 0 || c == 5 || c == 15, "c = {}", a);
            assert!(d == 0 || d == 5 || d == 15, "d = {}", a);

            assert!(d >= c || d >= b || d >= a, "d err");
            assert!(c >= a || c >= b || d >= c, "d err");
            assert!(b >= a || c >= b || d >= b, "b err");
            assert!(b >= a || c >= a || d >= a, "a err");
            // println!("{a} {b} {c} {d}");
            // as example 0 5 0 15 or 0 0 10 15 is impossible
        });

        t1.join().unwrap();
        t2.join().unwrap();
    });
}

In the following code, only one thread modifies the value of X. So in thread t1, it guarentees the ordering of value X: 0 -> 5 -> 10.

the print result can be

0 0 0 5
0 5 10 15
0 15 15 15
...

Example 3

Let's look at examples of using Release/Acquire memory ordering.

#[test]
fn test_concurrent_release_and_acquire_memory_ordering() {
    loom::model(|| {
        loom::lazy_static! {
            static ref DATA: AtomicU64 = AtomicU64::new(0);
            static ref READY: AtomicBool = AtomicBool::new(false);
        }

        thread::spawn(|| {
            DATA.store(123, Relaxed);
            READY.store(true, Release); // Everything from before this store ..
        });

        while !READY.load(Acquire) { // .. is visible after this loads `true`.
            loom::thread::yield_now();
        }
        assert_eq!(123, DATA.load(Relaxed));
        println!("{}", DATA.load(Relaxed));
    });
}

In spawned thread, Release and Acquire memory gives happens before ordering to READY.store and READY.load in different thread. So every instructions before READY.store will happen before READY.load. Also every instructions after READY.load will not happen before READY.store.

But you have to know that the happens before is not magically done by memory ordering. Memory ordering is just restricting memory operations within a single thread. It has nothing to do with happens before relation with other thread.

happens before relation with Release/Acquire ordering

Maybe this part is the key in our blog post. How can we achieve happens before relation using Release and Acquire memory ordering?

Simplest way is to use a flag, and store/load on flag with Release and Acquire memory ordering.

unsafe impl RawLock for SpinLock {
    type Token = ();

    fn lock(&self) {
        let backoff = Backoff::new();

        while self
            .inner
            .compare_exchange(false, true, Acquire, Relaxed)
            .is_err()
        {
            backoff.snooze();
        }
    }

    unsafe fn unlock(&self, _token: ()) {
        self.inner.store(false, Release);
    }
}

self.inner is a flag that tells if lock is acquireable or not. There is a Acquire in lock, and Release in unlock.

If the store in unlock() success, then it will store false with Release ordering.

And if the CAS in lock() success, then it will swap the value from false->true with Acquire ordering.

If self.inner is true, then lock() will spin on CAS loop.

If both unlock() and lock() success, then we can ensure that memory instructions before unlock will happen before instructions after lock().

References

\

The following code is part of spin lock implementation from

[1] [2]

Memory Ordering: Release/Acquire 1
Memory Ordering: Release/Acquire 2
KAIST-CS431
https://medium.com/@omid.jn/rust-release-and-acquire-memory-ordering-by-example-d8de58ef4e36
https://davekilian.com/acquire-release.html