Operating systems
Locks
Prashanth L.A.
2023-10-09
Lecture 28
What are locks
- Want: critical section executes as if it were
a single atomic instruction
- Solution: Add locks around critical section
The basic idea
balance = balance + 1;
lock_t mutex; // some globally-allocated lock 'mutex'
...
lock(&mutex);
balance = balance + 1;
unlock(&mutex);
Lock variable
- available (or unlocked or
free): No thread holds the lock
- acquired (or locked or
held): Exactly one thread holds the lock and presumably
is in a critical section
Semantics
- Call lock() to acquire the lock
- If no other thread holds the lock, the thread will
acquire the lock and enter the critical
section, becoming the owner of the lock
- Other threads cannot enter the critical section
- When the owner calls unlock(), one of the waiting threads will
acquire the lock
Pthread Locks
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

Pthread_mutex_lock(&lock); // wrapper; exits on failure
balance = balance + 1;
Pthread_mutex_unlock(&lock);
Note: different locks to protect different variables
The crux
How can we build an efficient lock?
Efficient locks provide mutual exclusion at low cost, and also might attain a few other properties we discuss below.
What hardware support is needed?
What OS support?
Evaluating locks
Mutual exclusion
- Does the lock prevent multiple threads from entering a critical
section?
Fairness
- Does each thread contending for the lock get a fair shot at
acquiring it once it is free?
- Does any thread contending for the lock starve while doing so, thus
never obtaining it?
Performance
- The time overhead added by using the lock
Mutual exclusion by controlling interrupts
- Disable Interrupts for critical sections
void lock() {
DisableInterrupts();
}
void unlock() {
EnableInterrupts();
}
Problems? (poll)
- requires trusting programs: a greedy or buggy program could disable
interrupts and monopolize the CPU
- does not work on multiprocessors: threads on other CPUs can still
enter the critical section
First attempt: A simple flag
typedef struct __lock_t { int flag; } lock_t;
void init(lock_t *mutex) {
// 0 -> lock is available, 1 -> held
mutex->flag = 0;
}
void lock(lock_t *mutex) {
while (mutex->flag == 1) // __TEST__ the flag
; // spin-wait (do nothing)
mutex->flag = 1; // now __SET__ it !
}
void unlock(lock_t *mutex) {
mutex->flag = 0;
}
Does this code ensure mutual exclusion?
Problems with first attempt
- No mutual exclusion: with an untimely interrupt between the test
(the while loop sees flag == 0) and the set (flag = 1), both threads
can set the flag and enter the critical section
Any other problems?
- Spin-wait wastes time waiting for another thread
Lecture 29
Solution: Test and set (with HW support)
- An atomic instruction to support the creation of
simple locks
int TestAndSet(int *old_ptr, int new) {
int old = *old_ptr; // fetch old value at old_ptr
*old_ptr = new; // store ’new’ into old_ptr
return old; // return the old value
}
return the old value pointed to by the old ptr, and
simultaneously update said value to new
Test and set is atomic
Build a spin lock with test-and-set
typedef struct __lock_t {
// whatever data structs you need goes here
} lock_t;
void init(lock_t *lock) { // init code goes here
}
void lock(lock_t *lock) {
// lock acquire code goes here
}
void unlock(lock_t *lock) {
// lock release code goes here
}
A Simple Spin Lock using test-and-set
typedef struct __lock_t {
int flag;
} lock_t;
void init(lock_t *lock) {
// 0: lock is available, 1: lock is held
lock->flag = 0;
}
void lock(lock_t *lock) {
while (TestAndSet(&lock->flag, 1) == 1)
; // spin-wait (do nothing)
}
void unlock(lock_t *lock) {
lock->flag = 0;
}
Evaluating Spin Locks
Correctness
- The spin lock only allows a single thread to enter the critical
section
Fairness
- Spin locks don’t provide any fairness guarantees
Performance
- Single CPU: high performance overhead
- Multiple CPUs: If #threads roughly equals #CPUs, spin locks work
reasonably well
Compare-And-Swap
- Test whether the value at addr is equal to expected
- If yes, update the memory location pointed to by addr with the new
value, else do nothing
- In either case, return the old value at that memory
location
int CompareAndSwap(int *addr, int expected, int new) {
int old = *addr;
if (old == expected)
*addr = new;
return old;
}
Build a spin lock with compare-and-swap
typedef struct __lock_t {
// whatever data structs you need goes here
} lock_t;
void init(lock_t *lock) { // init code goes here
}
void lock(lock_t *lock) {
// lock acquire code goes here
}
void unlock(lock_t *lock) {
// lock release code goes here
}
Compare-and-swap based lock
void lock(lock_t *lock) {
while ( CompareAndSwap(&lock->flag, 0, 1) == 1)
; // spin
}
Load-Linked and Store-Conditional
- The store-conditional only succeeds if no intermittent
store to the address has taken place
- success : return 1 and update the value at
ptr to value
- fail : the value at ptr is not updates and
0 is returned
int LoadLinked(int *ptr) {
return *ptr;
}
int StoreConditional(int *ptr, int value) {
if (no one has updated *ptr since the LoadLinked to this address) {
*ptr = value;
return 1; // success!
} else {
return 0; // failed to update
}
}
Build a spin lock with Load-Linked and Store-Conditional
typedef struct __lock_t {
// whatever data structs you need goes here
} lock_t;
void init(lock_t *lock) { // init code goes here
}
void lock(lock_t *lock) {
// lock acquire code goes here
}
void unlock(lock_t *lock) {
// lock release code goes here
}
Spin-lock with LL/SC
void lock(lock_t *lock) {
while (1) {
while (LoadLinked(&lock->flag) == 1)
; // spin until it’s zero
if (StoreConditional(&lock->flag, 1) == 1)
return; // if set-it-to-1 was a success: all done
// otherwise: try it all over again
}
}
void unlock(lock_t *lock) {
lock->flag = 0;
}
Lecture 30
Problems (poll)
- Too much spinning: What happens when one thread
grabs a lock but is interrupted, and there are many other threads
sitting there trying to spin and grab the lock?
- Fairness as a problem: a thread could spin forever
even as other threads acquire and release the lock.
- Priority inversion: can you think of a scenario
with two threads of priorities \(T2 > T1\) (T2
is scheduled whenever runnable), where the lower-priority
\(T1\) holds the lock \(T2\) spins on, so \(T2\) never makes progress?
- Avoiding spin-locks won’t solve the problem: can you think of a
scenario with three threads of priorities \(T3 > T2 > T1\), where \(T2\) runs ahead of \(T3\)?
- How to fix priority inversion?
Fetch-And-Add
Atomically increment a value while returning the old value
at a particular address
int FetchAndAdd(int *ptr) {
int old = *ptr;
*ptr = old + 1;
return old;
}
Ticket Lock using fetch-and-add
typedef struct __lock_t {
int ticket;
int turn;
} lock_t;
void lock_init(lock_t *lock) {
lock->ticket = 0;
lock->turn = 0;
}
void lock(lock_t *lock) {
int myturn = FetchAndAdd(&lock->ticket);
while (lock->turn != myturn)
; // spin
}
void unlock(lock_t *lock) {
FetchAndAdd(&lock->turn);
}
Why is ticket lock better than test-and-set?
- It ensures progress: once a thread is assigned its ticket value, it
is guaranteed to acquire the lock at some point in the future (threads
enter in ticket order), so no thread starves
So Much Spinning
- HW-based locks are simple and ensure mutual
exclusion
- However, they are inefficient, e.g., a thread may
spin and waste an entire time slice
To avoid spinning, OS needs to do some work
A Simple Approach: Just Yield
- When you are going to spin, give up the CPU to another
thread
- yield() is a system call that moves the caller from the running
state to the ready state
void init() {
flag = 0;
}
void lock() {
while (TestAndSet(&flag, 1) == 1)
yield(); // give up the CPU
}
void unlock() {
flag = 0;
}
- Problems
- Context switch cost
- Starvation
Using Queues: Sleeping Instead of Spinning
- Use a queue to keep track of which threads are
waiting to acquire the lock
Solaris solution
park() // put the calling thread to sleep
unpark(threadID) // wake the thread designated by threadID
typedef struct __lock_t {
int flag;
int guard;
queue_t *q;
} lock_t;
void lock_init(lock_t *m) {
m->flag = 0;
m->guard = 0;
queue_init(m->q);
}
void lock(lock_t *m) {
while (TestAndSet(&m->guard, 1) == 1)
; // acquire guard lock by spinning
if (m->flag == 0) {
m->flag = 1; // lock is acquired
m->guard = 0;
} else {
queue_add(m->q, gettid());
m->guard = 0;
park();
}
}
void unlock(lock_t *m) {
while (TestAndSet(&m->guard, 1) == 1)
; // acquire guard lock by spinning
if (queue_empty(m->q))
m->flag = 0; // let go of lock; no one wants it
else
unpark(queue_remove(m->q)); // hold lock (for next thread!)
m->guard = 0;
}
Lecture 31
Wakeup/waiting race
- If thread A releases the lock (and calls unpark) just after thread B
has queued itself but before B actually calls park(), the wakeup is
lost and thread B could sleep forever
Solaris solution: setpark()
- By calling this routine, a thread can indicate it is about
to park
- If it happens to be interrupted and another thread calls unpark
before park is actually called, the subsequent park returns immediately
instead of sleeping
// lock() code changed a bit
queue_add(m->q, gettid());
setpark(); // new code
m->guard = 0;
Other stuff
Linux-based Futex Locks
- Linux provides a futex (similar to Solaris’s park and
unpark)
- futex_wait(address, expected) puts the caller to sleep if the value
at address equals expected; futex_wake(address) wakes one waiting
thread: read up on these on your own
Two-Phase Locks
- A two-phase lock realizes that spinning can be useful if
the lock is about to be released
- In the first phase, the caller spins for a while, hoping to
acquire the lock.
- If the lock is not acquired during the first spin phase, a second
phase is entered, where the caller is put to sleep, and only woken up
when the lock becomes free later.
Data structures with locks
Lock-based Concurrent Data structure
- Add locks to a data structure
- Ensure thread safety.
The crux
When given a particular data structure, how should we add locks to it, in order to make it work correctly?
How do we add locks such that the data structure yields high performance, enabling many threads concurrent access?
Example: Concurrent Counters without Locks
typedef struct __counter_t {
int value;
} counter_t;
void init(counter_t *c) {
c->value = 0;
}
void increment(counter_t *c) {
c->value++;
}
void decrement(counter_t *c) {
c->value--;
}
int get(counter_t *c) {
return c->value;
}
Simple, but not thread-safe: concurrent increments race and updates can be lost
Counter with locks
typedef struct __counter_t {
int value;
pthread_mutex_t lock;
} counter_t;
void init(counter_t *c) {
c->value = 0;
Pthread_mutex_init(&c->lock, NULL);
}
void increment(counter_t *c) {
Pthread_mutex_lock(&c->lock);
c->value++;
Pthread_mutex_unlock(&c->lock);
}
Sloppy counter
- Many local counters, one per CPU
- A single global counter
How to count sloppily?
- A thread running on a core increments its local counter.
- The local values are periodically transferred to the global
counter.
- How often should the local-to-global transfer occur? Use a
threshold \(S\)
- Think about small vs. big values for \(S\): a small \(S\) keeps the
global count accurate but behaves like a non-scalable counter; a big
\(S\) scales well but lets the global count lag further behind
Access control
- Each local counter is protected by its own local lock; the global
counter is protected by a global lock
Sloppy counter example
(figure: trace of threads incrementing local counters and transferring to the global counter at threshold \(S\))
Choosing sloppiness parameter \(S\)
(figure: accuracy vs. performance trade-off as \(S\) varies)
Non-concurrent linked list
typedef struct __node_t {
int key;
struct __node_t *next;
} node_t;
typedef struct __list_t {
node_t *head;
} list_t;
void List_Init(list_t *L) {
L->head = NULL;
}
void List_Insert(list_t *L, int key) {
node_t *new = malloc(sizeof(node_t));
if (new == NULL) { perror("malloc"); return; }
new->key = key;
new->next = L->head;
L->head = new;
}
int List_Lookup(list_t *L, int key) {
node_t *tmp = L->head;
while (tmp) {
if (tmp->key == key)
return 1;
tmp = tmp->next;
}
return 0;
}
void List_Print(list_t *L) {
node_t *tmp = L->head;
while (tmp) {
printf("%d ", tmp->key);
tmp = tmp->next;
}
printf("\n");
}
int main(int argc, char *argv[]) {
list_t mylist;
List_Init(&mylist);
List_Insert(&mylist, 10);
List_Insert(&mylist, 30);
List_Insert(&mylist, 5);
List_Print(&mylist);
printf("In List: 10? %d 20? %d\n",
List_Lookup(&mylist, 10), List_Lookup(&mylist, 20));
return 0;
}
Concurrent linked list with locks
void List_Insert(list_t *L, int key) {
pthread_mutex_lock(&L->lock);
node_t *new = malloc(sizeof(node_t));
if (new == NULL) { perror("malloc"); pthread_mutex_unlock(&L->lock); return; }
new->key = key;
new->next = L->head;
L->head = new;
pthread_mutex_unlock(&L->lock);
}
int List_Lookup(list_t *L, int key) {
pthread_mutex_lock(&L->lock);
node_t *curr = L->head;
while (curr) {
if (curr->key == key){
pthread_mutex_unlock(&L->lock);
return 1;
}
curr = curr->next;
}
pthread_mutex_unlock(&L->lock);
return 0;
}
Improvement: lock only the necessary code segments
void List_Insert(list_t *L, int key) {
// synchronization not needed
node_t *new = malloc(sizeof(node_t));
if (new == NULL) {
perror("malloc");
return; }
new->key = key;
// just lock critical section
pthread_mutex_lock(&L->lock);
new->next = L->head;
L->head = new;
pthread_mutex_unlock(&L->lock);
}
int List_Lookup(list_t *L, int key) {
int rv = -1;
pthread_mutex_lock(&L->lock);
node_t *curr = L->head;
while (curr) {
if (curr->key == key) {
rv = 0;
break; }
curr = curr->next;
}
pthread_mutex_unlock(&L->lock);
return rv; // now both success and failure
}
A better alternative
Hand-over-hand locking (lock coupling)
- Instead of using a single lock for the entire list, the technique
employs a lock for each node in the list. While traversing the list,
before moving to the next node, the code acquires the lock of the next
node and releases the lock of the current node.