Locks ensure that a critical section, such as an update to a shared variable, executes as if it were a single atomic step:
```c
lock_t mutex; // some globally-allocated lock 'mutex'
lock(&mutex);
balance = balance + 1;
unlock(&mutex);
```
The most common lock API is the POSIX threads library (Pthreads).
The previous example would look like this:
```c
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
Pthread_mutex_lock(&lock); // wrapper; exits on failure
balance = balance + 1;
Pthread_mutex_unlock(&lock);
```
To build a lock, we'll need some hardware support.
Locks must provide the following: mutual exclusion (correctness), fairness among waiting threads, and good performance.
To provide mutual exclusion, the OS can’t interrupt a thread while it is in a critical section. One solution is to disable interrupts while locking:
```c
void lock()   { DisableInterrupts(); }
void unlock() { EnableInterrupts(); }
```
This doesn't work: it requires trusting every program with a privileged operation (a greedy or buggy program could monopolize the CPU), it does nothing on multiprocessors (threads on other CPUs can still enter the critical section), and interrupts arriving while disabled can be lost.
Another attempt would look like this:
```c
typedef struct __lock_t { int flag; } lock_t;

void init(lock_t *mutex) {
    // 0 -> lock is available, 1 -> held
    mutex->flag = 0;
}

void lock(lock_t *mutex) {
    while (mutex->flag == 1) // TEST the flag
        ; // spin-wait (do nothing)
    mutex->flag = 1; // now SET it!
}

void unlock(lock_t *mutex) {
    mutex->flag = 0;
}
```
This is neither efficient nor correct:
With this interleaving, two threads could enter the critical section:
| Thread 1 | Thread 2 |
|---|---|
| call lock() | |
| while (flag == 1) | |
| interrupt: switch to T2 | |
| | call lock() |
| | while (flag == 1) |
| | flag = 1; |
| | interrupt: switch to T1 |
| flag = 1; | |
The correctness problem comes from the interleaving above; the inefficiency comes from the while loop, which makes this a spin lock: a locked-out thread burns its entire time slice re-checking the flag, when it should sleep and wake only when the lock is released.
The simplest hardware support to build locks uses test-and-set (also known as atomic exchange):
```c
int TestAndSet(int *old_ptr, int new) {
    int old = *old_ptr; // fetch old value at old_ptr
    *old_ptr = new;     // store 'new' into old_ptr
    return old;         // return the old value
}
```
As well, Dekker's and Peterson's algorithms create a lock without hardware support (assuming, unrealistically on modern hardware, that loads and stores are not reordered). Peterson's algorithm for two threads:
```c
int flag[2];
int turn;

void init() {
    // indicate you intend to hold the lock w/ 'flag'
    flag[0] = flag[1] = 0;
    // whose turn is it? (thread 0 or 1)
    turn = 0;
}

void lock() {
    // 'self' is the thread ID of caller
    flag[self] = 1;
    // make it other thread's turn
    turn = 1 - self;
    while ((flag[1-self] == 1) && (turn == 1 - self))
        ; // spin-wait while it's not your turn
}

void unlock() {
    // simply undo your intent
    flag[self] = 0;
}
```
We can use test-and-set to build a simple spin lock:
```c
typedef struct __lock_t {
    int flag;
} lock_t;

void init(lock_t *lock) {
    // 0: lock is available, 1: lock is held
    lock->flag = 0;
}

void lock(lock_t *lock) {
    while (TestAndSet(&lock->flag, 1) == 1)
        ; // spin-wait (do nothing)
}

void unlock(lock_t *lock) {
    lock->flag = 0;
}
```
To evaluate a lock, assess its correctness, fairness, and performance. The spin lock is correct, but offers no fairness guarantee (a spinning thread may spin forever) and performs poorly on a single CPU, where a waiting thread wastes its whole time slice. On multiprocessors, though, this approach isn't terrible: if critical sections are short, the lock quickly becomes free for a thread spinning on another CPU.
Compare-and-swap (called compare-and-exchange on x86) allows us to build a lock too:
```c
void lock(lock_t *lock) {
    while (CompareAndSwap(&lock->flag, 0, 1) == 1)
        ; // spin
}
```
Some platforms provide a pair of instructions for synchronization, instead of a single one like test-and-set or compare-and-swap. Alpha, PowerPC, and ARM use load-linked (LL) and store-conditional (SC): load-linked behaves like an ordinary load, but the paired store-conditional succeeds only if no store to that address has occurred since the load-linked. The pair can create a lock like so:
```c
void lock(lock_t *lock) {
    while (LoadLinked(&lock->flag) || !StoreConditional(&lock->flag, 1))
        ; // spin
}
```
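The semantics of the pair can be sketched in pseudocode; the "no update since the load-linked" check is tracked by hardware (e.g. a link register) and cannot be expressed in portable C:

```
int LoadLinked(int *ptr) {
    return *ptr;
}

int StoreConditional(int *ptr, int value) {
    if (no store to *ptr has occurred since the LoadLinked to this address) {
        *ptr = value;
        return 1; // success: value stored
    } else {
        return 0; // failure: nothing stored; caller must retry
    }
}
```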
The final hardware primitive is fetch-and-add, which atomically increments a value and returns its old value:
```c
int FetchAndAdd(int *ptr) {
    int old = *ptr;
    *ptr = old + 1;
    return old;
}
```
This allows us to build a ticket lock, which is starvation-free: each thread grabs a ticket with fetch-and-add, the thread whose ticket matches the turn enters the critical section, and unlock advances the turn to the next ticket. Every waiting thread is eventually served in order, but the lock still spin-waits too much.
```c
typedef struct __lock_t {
    int ticket;
    int turn;
} lock_t;

void lock_init(lock_t *lock) {
    lock->ticket = 0;
    lock->turn = 0;
}

void lock(lock_t *lock) {
    int myturn = FetchAndAdd(&lock->ticket);
    while (lock->turn != myturn)
        ; // spin
}

void unlock(lock_t *lock) {
    lock->turn = lock->turn + 1;
}
```
We need OS support in the form of another system call: yield(). This allows a cooperative approach: any thread that finds the lock held gives up the CPU, letting other threads (including the lock holder) make progress.
However, this provides no starvation-freedom (a thread can yield forever while others race past it), and there are still inefficiencies to prune: every locked-out thread still pays for a context switch.
```c
void init() {
    flag = 0;
}

void lock() {
    while (TestAndSet(&flag, 1) == 1)
        yield(); // give up the CPU
}

void unlock() {
    flag = 0;
}
```
It's possible to avoid all of the previous pitfalls by adding two system calls: park(), which puts the calling thread to sleep, and unpark(threadID), which wakes the thread provided. This allows a lock that puts a caller to sleep if it tries to acquire a held lock, and wakes it when the lock is free.
```c
typedef struct __lock_t {
    int flag;
    int guard;
    queue_t *q;
} lock_t;

void lock_init(lock_t *m) {
    m->flag = 0;
    m->guard = 0;
    queue_init(m->q);
}

void lock(lock_t *m) {
    while (TestAndSet(&m->guard, 1) == 1)
        ; // acquire guard lock by spinning
    if (m->flag == 0) {
        m->flag = 1; // lock is acquired
        m->guard = 0;
    } else {
        queue_add(m->q, gettid());
        m->guard = 0;
        park();
    }
}

void unlock(lock_t *m) {
    while (TestAndSet(&m->guard, 1) == 1)
        ; // acquire guard lock by spinning
    if (queue_empty(m->q))
        m->flag = 0; // let go of lock; no one wants it
    else
        unpark(queue_remove(m->q)); // hold lock (for next thread!)
    m->guard = 0;
}
```
Linux provides futex, which is similar to park/unpark. futex_wait(address, expected) puts the calling thread to sleep if the value at address equals expected; if not, it returns immediately. futex_wake(address) wakes one thread that is waiting on the queue associated with that address.
A two-phase lock spins for a while in the first phase, hoping to acquire the lock quickly; if it fails, a second phase puts the caller to sleep, and it is woken when the lock becomes free.