Operating systems
Locks
Prashanth L.A.
2023-10-09
Lecture 28
What are locks
- Want: critical section executes as if it were
a single atomic instruction
- Solution: Add locks around critical section
The basic idea
balance = balance + 1;
lock_t mutex; // some globally-allocated lock 'mutex'
...
lock(&mutex);
balance = balance + 1;
unlock(&mutex);
Lock variable
- available (or unlocked or
free): No thread holds the lock
- acquired (or locked or
held): Exactly one thread holds the lock and presumably
is in a critical section
Semantics
- Call lock() to acquire the lock
- If no other thread holds the lock, the thread will
acquire the lock and enter the critical
section, becoming the owner of the lock
- Other threads cannot enter the critical section
- When the owner calls unlock(), one of the waiting threads will
acquire the lock
Pthread Locks
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

Pthread_mutex_lock(&lock); // wrapper; exits on failure
balance = balance + 1;
Pthread_mutex_unlock(&lock);
Note: different locks to protect different variables
The crux
How can we build an efficient lock?
Efficient locks provide mutual exclusion at low cost, and also might attain a few other properties we discuss below.
What hardware support is needed?
What OS support?
Evaluating locks
Mutual exclusion
- Does the lock prevent multiple threads from entering a critical
section?
Fairness
- Does each thread contending for the lock get a fair shot at
acquiring it once it is free?
- Does any thread contending for the lock starve while doing so, thus
never obtaining it?
Performance
- The time overhead added by using the lock
Mutual exclusion by controlling interrupts
- Disable Interrupts for critical sections
void lock() {
DisableInterrupts();
}
void unlock() {
EnableInterrupts();
}
Problems? (poll)
- requires trusting programs: a greedy or buggy program could disable
interrupts and monopolize the CPU
- does not work on multiprocessors: threads on other CPUs can still
enter the critical section
First attempt: A simple flag
typedef struct __lock_t { int flag; } lock_t;
void init(lock_t *mutex) {
// 0 -> lock is available, 1 -> held
mutex->flag = 0;
}
void lock(lock_t *mutex) {
while (mutex->flag == 1) // __TEST__ the flag
; // spin-wait (do nothing)
mutex->flag = 1; // now __SET__ it !
}
void unlock(lock_t *mutex) {
mutex->flag = 0;
}
Does this code ensure mutual exclusion?
Problems with first attempt
- No mutual exclusion: with an untimely interrupt between the test
(the while loop sees flag == 0) and the set (flag = 1), both threads
can set the flag and enter the critical section
Any other problems?
- Spin-wait wastes time waiting for another thread
Lecture 29
Solution: Test and set (with HW support)
- An atomic instruction to support the creation of
simple locks
int TestAndSet(int *old_ptr, int new) {
int old = *old_ptr; // fetch old value at old_ptr
*old_ptr = new; // store ’new’ into old_ptr
return old; // return the old value
}
return the old value pointed to by the old ptr, and
simultaneously update said value to new
Test and set is atomic
Build a spin lock with test-and-set
typedef struct __lock_t {
// whatever data structs you need goes here
} lock_t;
void init(lock_t *lock) { // init code goes here
}
void lock(lock_t *lock) {
// lock acquire code goes here
}
void unlock(lock_t *lock) {
// lock release code goes here
}
A Simple Spin Lock using test-and-set
typedef struct __lock_t {
int flag;
} lock_t;
void init(lock_t *lock) {
// 0: lock is available, 1: lock is held
lock->flag = 0;
}
void lock(lock_t *lock) {
while (TestAndSet(&lock->flag, 1) == 1)
; // spin-wait (do nothing)
}
void unlock(lock_t *lock) {
lock->flag = 0;
}
Evaluating Spin Locks
Correctness
- The spin lock only allows a single thread to enter the critical
section
Fairness
- Spin locks don’t provide any fairness guarantees
Performance
- Single CPU: high performance overhead
- Multiple CPUs: If #threads roughly equals #CPUs, spin locks work
reasonably well
Compare-And-Swap
- Test whether the value at addr is equal to expected
- If yes, update the memory location pointed to by addr with the new
value, else do nothing
- In either case, return the old value at that memory
location
int CompareAndSwap(int *addr, int expected, int new) {
int old = *addr;
if (old == expected)
*addr = new;
return old;
}
Build a spin lock with compare-and-swap
typedef struct __lock_t {
// whatever data structs you need goes here
} lock_t;
void init(lock_t *lock) { // init code goes here
}
void lock(lock_t *lock) {
// lock acquire code goes here
}
void unlock(lock_t *lock) {
// lock release code goes here
}
Compare-and-swap based lock
void lock(lock_t *lock) {
while ( CompareAndSwap(&lock->flag, 0, 1) == 1)
; // spin
}
Load-Linked and Store-Conditional
- The store-conditional only succeeds if no intermittent
store to the address has taken place
- success : return 1 and update the value at
ptr to value
- fail : the value at ptr is not updates and
0 is returned
int LoadLinked(int *ptr) {
return *ptr;
}
int StoreConditional(int *ptr, int value) {
if (no one has updated *ptr since the LoadLinked to this address) {
*ptr = value;
return 1; // success!
} else {
return 0; // failed to update
}
}
Build a spin lock with Load-Linked and Store-Conditional
typedef struct __lock_t {
// whatever data structs you need goes here
} lock_t;
void init(lock_t *lock) { // init code goes here
}
void lock(lock_t *lock) {
// lock acquire code goes here
}
void unlock(lock_t *lock) {
// lock release code goes here
}
Spin-lock with LL/SC
void lock(lock_t *lock) {
while (1) {
while (LoadLinked(&lock->flag) == 1)
; // spin until it’s zero
if (StoreConditional(&lock->flag, 1) == 1)
return; // if set-it-to-1 was a success: all done
// otherwise: try it all over again
}
}
void unlock(lock_t *lock) {
lock->flag = 0;
}
Lecture 30
Problems (poll)
- Too much spinning: What happens when one thread
grabs a lock but is interrupted, and there are many other threads
sitting there trying to spin and grab the lock?
- Fairness as a problem: a thread could spin forever
even as other threads acquire and release the lock.
- Priority inversion: can you think of a scenario
with two threads of priorities \(T2 > T1\) (T2
is scheduled whenever runnable), where the lower-priority
\(T1\) holds the lock \(T2\) spins on, so \(T2\) never makes progress?
- Avoiding spin-locks won’t solve the problem: can you think of a
scenario with three threads of priorities \(T3 > T2 > T1\), where \(T2\) runs ahead of \(T3\)?
- How to fix priority inversion?
Fetch-And-Add
Atomically increment a value while returning the old value
at a particular address
int FetchAndAdd(int *ptr) {
int old = *ptr;
*ptr = old + 1;
return old;
}
Ticket Lock using fetch-and-add
typedef struct __lock_t {
int ticket;
int turn;
} lock_t;
void lock_init(lock_t *lock) {
lock->ticket = 0;
lock->turn = 0;
}
void lock(lock_t *lock) {
int myturn = FetchAndAdd(&lock->ticket);
while (lock->turn != myturn)
; // spin
}
void unlock(lock_t *lock) {
FetchAndAdd(&lock->turn);
}
Why is ticket lock better than test-and-set?
- It ensures progress: once a thread is assigned its ticket value, it
is guaranteed to acquire the lock at some point in the future (threads
enter in ticket order), so no thread starves
So Much Spinning
- HW-based locks are simple and ensure mutual
exclusion
- However, they are inefficient, e.g., a thread may
spin and waste an entire time slice
To avoid spinning, OS needs to do some work
A Simple Approach: Just Yield
- When you are going to spin, give up the CPU to another
thread
- yield() is a system call that moves the caller from the running
state to the ready state
void init() {
flag = 0;
}
void lock() {
while (TestAndSet(&flag, 1) == 1)
yield(); // give up the CPU
}
void unlock() {
flag = 0;
}
- Problems
- Context switch cost
- Starvation
Using Queues: Sleeping Instead of Spinning
- Use a queue to keep track of which threads are
waiting to acquire the lock
Solaris solution
park() // put the calling thread to sleep
unpark(threadID) // wake the thread designated by threadID
typedef struct __lock_t {
int flag;
int guard;
queue_t *q;
} lock_t;
void lock_init(lock_t *m) {
m->flag = 0;
m->guard = 0;
queue_init(m->q);
}
void lock(lock_t *m) {
while (TestAndSet(&m->guard, 1) == 1)
; // acquire guard lock by spinning
if (m->flag == 0) {
m->flag = 1; // lock is acquired
m->guard = 0;
} else {
queue_add(m->q, gettid());
m->guard = 0;
park();
}
}
void unlock(lock_t *m) {
while (TestAndSet(&m->guard, 1) == 1)
; // acquire guard lock by spinning
if (queue_empty(m->q))
m->flag = 0; // let go of lock; no one wants it
else
unpark(queue_remove(m->q)); // hold lock (for next thread!)
m->guard = 0;
}
Lecture 31
Wakeup/waiting race
- If thread A releases the lock (and calls unpark) just after thread B
has queued itself but before B actually calls park(), the wakeup is
lost and thread B could sleep forever
Solaris solution: setpark()
- By calling this routine, a thread can indicate it is about
to park
- If it happens to be interrupted and another thread calls unpark
before park is actually called, the subsequent park returns immediately
instead of sleeping
// lock() code changed a bit
queue_add(m->q, gettid());
setpark(); // new code
m->guard = 0;
Other stuff
Linux-based Futex Locks
- Linux provides a futex (similar to Solaris’s park and
unpark)
- futex_wait(address, expected) puts the caller to sleep if the value
at address equals expected; futex_wake(address) wakes one waiting
thread: read up on these on your own
Two-Phase Locks
- A two-phase lock realizes that spinning can be useful if
the lock is about to be released
- In the first phase, the caller spins for a while, hoping to
acquire the lock.
- If the lock is not acquired during the first spin phase, a second
phase is entered, where the caller is put to sleep, and only woken up
when the lock becomes free later.
Data structures with locks
Lock-based Concurrent Data structure
- Add locks to a data structure
- Ensure thread safety.
The crux
When given a particular data structure, how should we add locks to it, in order to make it work correctly?
How do we add locks such that the data structure yields high performance, enabling many threads concurrent access?
Example: Concurrent Counters without Locks
typedef struct __counter_t {
int value;
} counter_t;
void init(counter_t *c) {
c->value = 0;
}
void increment(counter_t *c) {
c->value++;
}
void decrement(counter_t *c) {
c->value--;
}
int get(counter_t *c) {
return c->value;
}
Simple, but not thread-safe: concurrent increments race and updates can be lost
Counter with locks
typedef struct __counter_t {
int value;
pthread_mutex_t lock;
} counter_t;
void init(counter_t *c) {
c->value = 0;
Pthread_mutex_init(&c->lock, NULL);
}
void increment(counter_t *c) {
Pthread_mutex_lock(&c->lock);
c->value++;
Pthread_mutex_unlock(&c->lock);
}
Sloppy counter
- Many local counters, one per CPU
- A single global counter
How to count sloppily?
- A thread running on a core increments its local counter.
- The local values are periodically transferred to the global
counter.
- How often should the local-to-global transfer occur? Use a
threshold \(S\)
- Think about small vs. big values for \(S\): a small \(S\) keeps the
global count accurate but behaves like a non-scalable counter; a big
\(S\) scales well but lets the global count lag further behind
Access control
- Each local counter is protected by its own local lock; the global
counter is protected by a global lock
Sloppy counter example
(figure: trace of threads incrementing local counters and transferring to the global counter at threshold \(S\))
Choosing sloppiness parameter \(S\)
(figure: accuracy vs. performance trade-off as \(S\) varies)
Non-concurrent linked list
typedef struct __node_t {
int key;
struct __node_t *next;
} node_t;
typedef struct __list_t {
node_t *head;
} list_t;
void List_Init(list_t *L) {
L->head = NULL;
}
void List_Insert(list_t *L, int key) {
node_t *new = malloc(sizeof(node_t));
if (new == NULL) { perror("malloc"); return; }
new->key = key;
new->next = L->head;
L->head = new;
}
int List_Lookup(list_t *L, int key) {
node_t *tmp = L->head;
while (tmp) {
if (tmp->key == key)
return 1;
tmp = tmp->next;
}
return 0;
}
void List_Print(list_t *L) {
node_t *tmp = L->head;
while (tmp) {
printf("%d ", tmp->key);
tmp = tmp->next;
}
printf("\n");
}
int main(int argc, char *argv[]) {
list_t mylist;
List_Init(&mylist);
List_Insert(&mylist, 10);
List_Insert(&mylist, 30);
List_Insert(&mylist, 5);
List_Print(&mylist);
printf("In List: 10? %d 20? %d\n",
List_Lookup(&mylist, 10), List_Lookup(&mylist, 20));
return 0;
}
Concurrent linked list with locks
void List_Insert(list_t *L, int key) {
pthread_mutex_lock(&L->lock);
node_t *new = malloc(sizeof(node_t));
if (new == NULL) { perror("malloc"); pthread_mutex_unlock(&L->lock); return; }
new->key = key;
new->next = L->head;
L->head = new;
pthread_mutex_unlock(&L->lock);
}
int List_Lookup(list_t *L, int key) {
pthread_mutex_lock(&L->lock);
node_t *curr = L->head;
while (curr) {
if (curr->key == key){
pthread_mutex_unlock(&L->lock);
return 1;
}
curr = curr->next;
}
pthread_mutex_unlock(&L->lock);
return 0;
}
Improvement: lock only the necessary code segments
void List_Insert(list_t *L, int key) {
// synchronization not needed
node_t *new = malloc(sizeof(node_t));
if (new == NULL) {
perror("malloc");
return; }
new->key = key;
// just lock critical section
pthread_mutex_lock(&L->lock);
new->next = L->head;
L->head = new;
pthread_mutex_unlock(&L->lock);
}
int List_Lookup(list_t *L, int key) {
int rv = -1;
pthread_mutex_lock(&L->lock);
node_t *curr = L->head;
while (curr) {
if (curr->key == key) {
rv = 0;
break; }
curr = curr->next;
}
pthread_mutex_unlock(&L->lock);
return rv; // now both success and failure
}
A better alternative
Hand-over-hand locking (lock coupling)
- Instead of using a single lock for the entire list, the technique
employs a lock for each node in the list. While traversing the list,
before moving to the next node, the code acquires the lock of the next
node and releases the lock of the current node.