Locks
Prev: interlude-thread-api Next: lock-based-concurrent-data-structures
Locks: The Basic Idea
Locks are meant to prevent multiple threads from updating the same variable:
lock_t mutex; // some globally-allocated lock ’mutex’
lock(&mutex);
balance = balance + 1;
unlock(&mutex);
Pthread locks
The most common locks API is the POSIX threads library (Pthreads).
The previous example would look like this:
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
Pthread_mutex_lock(&lock); // wrapper; exits on failure
balance = balance + 1;
Pthread_mutex_unlock(&lock);
Building a Lock
To build a lock, we’ll need some hardware support
Evaluating Locks
Locks must provide the following:
- Mutual Exclusion
- Fairness (does no lock starve?)
- Performance (it should not cost much)
Controlling Interrupts
To provide mutual exclusion, the OS can’t interrupt a thread while it is in a critical section. One solution is to disable interrupts while locking:
void lock() { DisableInterrupts(); }
void unlock() { EnableInterrupts(); }
This doesn’t work:
- Is a privileged operation, which applications will abuse
- If an application is coded improperly, the OS will starve
- Misses interrupts if locking for too long
- Slow performance
A Failed Attempt: Just Using Loads/Stores
Another attempt would look like this:
typedef struct __lock_t { int flag; } lock_t;
void init(lock_t *mutex) {
// 0 -> lock is available, 1 -> held
mutex->flag = 0;
}
void lock(lock_t *mutex) {
while (mutex->flag == 1) // TEST the flag
; // spin-wait (do nothing)
mutex->flag = 1; // now SET it!
}
void unlock(lock_t *mutex) {
mutex->flag = 0;
}
This is neither efficient nor correct:
With this interleaving, two threads could enter the critical section:
Thread 1 | Thread 2 |
---|---|
call lock() | |
while (flag == 1) | |
Interrupt: switch to T2 | |
call lock() | |
while (flag == 1) | |
flag = 1; | |
interrupt: switch to T1 | |
flag = 1; |
The correctness is because a while loop is a spin-lock, which is inefficient (it waits forever on its slice, when it should be sleeping and only wake when something changes).
Building Working Spin Locks with Test-And-Set
The simplest hardware support to build locks uses test-and-set
or
atomic-exchange
:
int TestAndSet(int *old_ptr, int new) {
int old = *old_ptr; // fetch old value at old_ptr
*old_ptr = new; // store ’new’ into old_ptr
return old; // return the old value
}
As well, there’s Dekker’s and Peterson’s algorithms to create a lock without hardware support:
int flag[2];
int turn;
void init() {
// indicate you intend to hold the lock w/ ’flag’
flag[0] = flag[1] = 0;
// whose turn is it? (thread 0 or 1)
turn = 0;
}
void lock() {
// ’self’ is the thread ID of caller
flag[self] = 1;
// make it other thread’s turn
turn = 1 - self;
while ((flag[1-self] == 1) && (turn == 1 - self))
; // spin-wait while it’s not your turn
}
void unlock() {
// simply undo your intent
flag[self] = 0;
}
We can use Test-and-set for a simple spin lock:
typedef struct __lock_t {
int flag;
} lock_t;
void init(lock_t *lock) {
// 0: lock is available, 1: lock is held
lock->flag = 0;
}
void lock(lock_t *lock) {
while (TestAndSet(&lock->flag, 1) == 1)
; // spin-wait (do nothing)
}
void unlock(lock_t *lock) {
lock->flag = 0;
}
Evaluating Spin Locks
To evaluate our lock, assess its correctness, fairness, and performance
- correctness: provides mutual exclusion
- fairness: spin locks are not fair and not starvation-free
- performance: extremely inefficient
This approach isn’t terrible for multi-processors, though.
Compare-And-Swap
compare-and-swap
or compare-and-exchange
allows us to build a lock
too:
void lock(lock_t *lock) {
while (CompareAndSwap(&lock->flag, 0, 1) == 1)
; // spin
}
Load-Linked and Store-Conditional
Some platforms provide a pair of instructions for synchronization, instead of just one (like test-and-set or compare-and-swap).
Alpha, PowerPC, and ARM use LL and SC. A Load-linked instruction is like a load instruction, but a Store-Conditional instruction only succeeds if no store has taken place to the address in the LL.
This can create a lock like so:
void lock(lock_t *lock) {
while (LoadLinked(&lock->flag) || !StoreConditional(&lock->flag, 1))
; // spin
}
Fetch-And-Add
The final hardware primitive is fetch-and-add
, which atomically
increments a value, and returns its old value:
int FetchAndAdd(int *ptr) {
int old = *ptr;
*ptr = old + 1;
return old;
}
This allows us to build a ticket lock, which is a starvation-free lock.
Too Much Spinning: What Now?
A ticket lock would look like this: the first thread gets its turn, enters the critical section, then the next, and so forth. This is now starvation-free, but still inefficient (it spin-waits too much).
typedef struct __lock_t {
int ticket;
int turn;
} lock_t;
void lock_init(lock_t *lock) {
lock->ticket = 0;
lock->turn = 0;
}
void lock(lock_t *lock) {
int myturn = FetchAndAdd(&lock->ticket);
while (lock->turn != myturn)
; // spin
}
void unlock(lock_t *lock) {
lock->turn = lock->turn + 1;
}
A Simple Approach: Just Yield, Baby
We need OS support for another system call: yielding. This allows for a cooperative approach: any thread that finds it is locked out will yield to the OS, allowing other threads to make progress.
However there’s no starvation-freedom, and still inefficiencies to prune.
void init() {
flag = 0;
}
void lock() {
while (TestAndSet(&flag, 1) == 1)
yield(); // give up the CPU
}
void unlock() {
flag = 0;
}
Using Queues: Sleeping Instead of Spinning
It’s possible to avoid all the pitfalls previously by adding two system
calls: park()
, which puts a calling thread to sleep, and
unpark(threadID)
, which wakes the thread provided. This allows a lock
that puts a caller to sleep if it tries to acquire a held lock and wakes
when the lock is free.
typedef struct __lock_t {
int flag;
int guard;
queue_t *q;
} lock_t;
void lock_init(lock_t *m) {
m->flag = 0;
m->guard = 0;
queue_init(m->q);
}
void lock(lock_t *m) {
while (TestAndSet(&m->guard, 1) == 1)
; // acquire guard lock by spinning
if (m->flag == 0) {
m->flag = 1; // lock is acquired
m->guard = 0;
} else {
queue_add(m->q, gettid());
m->guard = 0;
park();
}
}
void unlock(lock_t *m) {
while (TestAndSet(&m->guard, 1) == 1)
; // acquire guard lock by spinning
if (queue_empty(m->q))
m->flag = 0; // let go of lock; no one wants it
else
unpark(queue_remove(m->q)); // hold lock
// (for next thread!)
m->guard = 0;
}
Different OS, Different Support
Linux provides futex
, which is like un/park
. futex_wait(address, expected)
puts the calling thread to sleep if the address == expected.
If not, it returns. futex_wake(address)
wakes one thread that is
waiting on the queue.
Two-Phase Locks
A Two-phase lock spins once, trying to acquire the lock. If it does not, it enters a sleep, where it is woken up when the lock becomes free later.
Prev: interlude-thread-api Next: lock-based-concurrent-data-structures