1. Why Use Concurrency?

1.1 Parallel Execution

  • Two programs execute in parallel if they execute at exactly the same time
  • At time t, an instruction is being performed for both P1 and P2
  • Need replicated hardware
  • Why Use Parallel Execution?
    • Tasks may complete more quickly
    • Some tasks must be performed sequentially
    • Others are parallelizable, so not every task benefits

1.2 Von Neumann Bottleneck

Speedup without Parallelism
  • Can we achieve speedup without parallelism?
  • Design faster processors
    • Get speedup without changing software
  • Design processors with more on-chip memory
    • Reduces the Von Neumann bottleneck: the CPU is limited by how fast it can move data to and from memory
    • Cache access time = 1 clock cycle
    • Main memory access time = ~100 clock cycles
    • Increasing on-chip cache improves performance
Moore’s Law
  • Predicted that transistor density would double every two years
  • Smaller transistors switch faster
  • Not a physical law, just an observation
  • Exponential increase in density would lead to exponential increase in speed

1.3 Power Wall

Power/Temperature Problem
  • Transistors consume power when they switch
  • Increasing transistor density leads to increased power consumption
    • Smaller transistors use less power, but density scaling is much faster
  • High power leads to high temperature
  • Air cooling (fans) can only remove so much heat
Dynamic Power
  • P = α · C · F · V²
  • α is the activity factor (fraction of time spent switching)
  • C is the capacitance (related to transistor size)
  • F is the clock frequency
  • V is the voltage swing (from low to high)
    • Voltage matters most, since power grows with V²
    • A 0-to-5 V swing uses roughly (5/1.3)² ≈ 15 times the power of a 0-to-1.3 V swing
Dennard Scaling
  • Voltage should scale down with transistor size
  • Keeps power consumption and temperature low
  • Problem: voltage can't go too low
    • Must stay above the threshold voltage
    • Noise problems occur
  • Problem: doesn't account for leakage power
  • Dennard scaling had to stop
Multi-Core Systems
  • P = α · C · F · V²
  • Cannot increase frequency
  • Can still add processor cores, without increasing frequency
    • Trend is apparent today
  • Parallel execution is needed to exploit multi-core systems
  • Code made to execute on multiple cores
  • Different programs on different cores

1.4 Concurrent vs Parallel

Concurrent execution
  • Concurrent execution is not necessarily the same as parallel execution
  • Concurrent: start and end times overlap
  • Parallel: execute at exactly the same time

  • Parallel tasks must be executed on different hardware
  • Concurrent tasks may be executed on the same hardware
    • Only one task actually executed at a time
  • Mapping from tasks to hardware is not directly controlled by the programmer
    • At least not in Go

1.5 Concurrent Programming

  • Programmer determines which tasks can be executed in parallel
  • Mapping tasks to hardware
    • Operating system
    • Go runtime scheduler
Benefit 1: Hiding Latency
  • Concurrency improves performance, even without parallelism
  • Tasks must periodically wait for something
    • e.g., waiting for memory
    • X = Y + Z requires reading Y and Z from memory
    • May wait 100+ clock cycles
  • Other concurrent tasks can operate while one task is waiting
Benefit 2: Hardware Mapping

Hardware Mapping in Go:

  • Programmer does not determine the hardware mapping
  • Programmer makes parallelism possible
  • Hardware mapping depends on many factors
    • Where is the data?
    • What are the communication costs?

2. Concurrency Basics

2.1 Processes

  • An instance of a running program
  • Things unique to a process
    • Memory
      • Virtual address space
      • Code, stack, heap, shared libraries
    • Registers
      • Program counter, data registers, stack pointer, …
Operating Systems
  • Allows many processes to execute concurrently
  • Processes are switched quickly (on the order of every 20 ms)
  • User has the impression of parallelism
  • Operating system must give processes fair access to resources

2.2 Scheduling

Scheduling Processes
  • Operating system schedules processes for execution
  • Gives the illusion of parallel execution

  • OS gives fair access to CPU, memory, etc.
Context Switch
  • Control flow changes from one process to another

  • Process “context” must be swapped

2.3 Threads and Goroutines

Threads vs Processes
  • Threads share some context
  • Many threads can exist in one process
  • OS schedules threads rather than processes

Goroutines
  • Like a thread in Go
  • Many Goroutines execute within a single OS thread

Go Runtime Scheduler
  • Schedules goroutines inside an OS thread
  • Like a little OS inside a single OS thread
  • Each logical processor is mapped to an OS thread
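
A minimal sketch of inspecting this mapping with the standard runtime package (NumCPU reports the logical CPUs; GOMAXPROCS(0) queries, and a positive argument sets, how many OS threads may run Go code simultaneously):

package main

import (
    "fmt"
    "runtime"
)

func main() {
    fmt.Println("CPUs:", runtime.NumCPU())            // logical CPUs on this machine
    fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) // current logical processor count
}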

2.4 Interleavings

  • Order of execution within a task is known
  • Order of execution between concurrent tasks is unknown
  • Interleaving of instructions between tasks is unknown

  • Many interleavings are possible

  • Must consider all possibilities
  • Ordering is non-deterministic
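
A minimal sketch of this non-determinism (sync.WaitGroup is covered in 3.3; here it only keeps main from exiting early):

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    wg.Add(2)
    go func() { fmt.Println("task 1"); wg.Done() }()
    go func() { fmt.Println("task 2"); wg.Done() }()
    wg.Wait() // "task 1" and "task 2" may print in either order
}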

2.5 Race Conditions

  • Outcome depends on non-deterministic ordering
  • Races occur due to communication

2.6 Communication Between Tasks

  • Threads are largely independent but not completely independent
  • Web server, one thread per client

  • Image processing, one thread per pixel block

3. Threads in Go

3.1 Goroutines

Creating a Goroutine
  • One goroutine is created automatically to execute main()
  • Other goroutines are created using the go keyword
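
A minimal sketch (the function name foo is illustrative):

package main

import "fmt"

func foo() {
    fmt.Println("in goroutine")
}

func main() {
    go foo()            // creates a new goroutine running foo()
    fmt.Println("main") // main does not wait for foo()
}

Note that main may finish before foo() prints anything, which is exactly the problem discussed next.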

Exiting a Goroutine
  • A goroutine exits when its code is complete
  • When the main goroutine is complete, all other goroutines exit
  • A goroutine may not complete its execution because main completes early

  • Adding a delay to wait for a goroutine is bad practice: timing assumptions may be wrong, and timing is nondeterministic

  • Need formal synchronization constructs

3.2 Basic Synchronization

  • Synchronization uses global events whose execution is viewed by all threads simultaneously

  • Example (sketched below):

    • A global event is viewed by all tasks at the same time
    • The print must occur after the update of x
    • Synchronization is used to restrict bad interleavings
    • Synchronization reduces performance and efficiency, but it is necessary here
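
A minimal sketch of the idea, using a channel (introduced in 3.4) as the global event; the names x and update are illustrative:

package main

import "fmt"

var x int

func update(done chan bool) {
    x = x + 1
    done <- true // the global event: x has been updated
}

func main() {
    done := make(chan bool)
    go update(done)
    <-done         // main blocks until the event occurs
    fmt.Println(x) // the print is guaranteed to happen after the update
}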

3.3 Wait Groups

sync.WaitGroup
  • The sync package contains functions to synchronize between goroutines
  • sync.WaitGroup forces a goroutine to wait for other goroutines

  • Contains an internal counter

    • Increment counter for each goroutine to wait for
    • Decrement counter when each goroutine completes
    • Waiting goroutine cannot continue until counter is 0
Using WaitGroup

  • Add() increments the counter
  • Done() decrements the counter
  • Wait() blocks until counter == 0
WaitGroup Example
package main

import (
    "fmt"
    "sync"
)

func foo(wg *sync.WaitGroup) {
    fmt.Printf("New routine")
    wg.Done() // decrement the counter
}

func main() {
    var wg sync.WaitGroup
    wg.Add(1) // one goroutine to wait for
    go foo(&wg)
    wg.Wait() // block until the counter reaches 0
    fmt.Printf("Main routine")
}

3.4 Communication

Goroutine Communication
  • Goroutines usually work together to perform a bigger task
  • Often need to send data to collaborate
  • Example: Find the product of 4 integers
    • Make 2 goroutines, each multiplies a pair
    • Main goroutine multiplies the 2 results
    • Need to send ints from main routine to the two sub-routines
    • Need to send results from sub-routines back to main routine
Channels
  • Transfer data between goroutines
  • Channels are typed
  • Use make() to create a channel
c := make(chan int)
  • Send and receive data using the <- operator
  • Send data on a channel
c <- 3
  • Receive data from a channel
x := <-c
  • Channel example:
package main

import "fmt"

func prod(v1 int, v2 int, c chan int) {
    c <- v1 * v2 // send the partial product
}

func main() {
    c := make(chan int)
    go prod(1, 2, c)
    go prod(3, 4, c)
    a := <-c // receive the two partial products, in either order
    b := <-c
    fmt.Println(a * b)
}

3.5 Blocking in Channels

Unbuffered Channel
  • Unbuffered channels cannot hold data in transit
    • Default is unbuffered
  • Sending blocks until data is received
  • Receiving blocks until data is sent
Blocking and Synchronization
  • Channel communication is synchronous
  • Blocking is the same as waiting for communication
  • Receiving and ignoring the result is the same as a Wait()
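
A minimal sketch of receive-as-Wait (the names worker and done are illustrative):

package main

import "fmt"

func worker(done chan bool) {
    fmt.Println("working")
    done <- true // the value does not matter, only the communication
}

func main() {
    done := make(chan bool)
    go worker(done)
    <-done // ignore the result; this is effectively a Wait()
    fmt.Println("main exits after worker")
}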

3.6 Buffered Channels

Channel Capacity
  • Channels can contain a limited number of objects
    • Default size is 0 (unbuffered)
  • Capacity is the number of objects it can hold in transit
  • Optional argument to make() defines channel capacity
c := make(chan int, 3)
  • Sending only blocks if buffer is full
  • Receiving only blocks if buffer is empty
Channel Blocking, Receive
  • Channel with capacity 1

  • First receive blocks until send occurs
  • Second receive blocks forever
Channel Blocking, Send

  • Second send blocks until receive is done
  • Receive can block until first send is done
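
A minimal sketch of both cases with a capacity-1 channel (the Sleep only makes the ordering visible; timing-based code like this is for illustration, not real synchronization):

package main

import (
    "fmt"
    "time"
)

func main() {
    c := make(chan int, 1)
    go func() {
        c <- 1 // fits in the buffer: does not block
        c <- 2 // buffer full: blocks until main receives
    }()
    time.Sleep(time.Second)
    fmt.Println(<-c) // frees a slot, unblocking the second send
    fmt.Println(<-c)
}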
Use of Buffering
  • Sender and receiver do not need to operate at exactly the same speed

  • Speed mismatch is acceptable

4. Synchronized Communication

4.1 Blocking on Channels

Iterating Through a Channel
  • Common to iteratively read from a channel
for i := range c {
    fmt.Println(i)
}
  • Continues to read from channel c
  • One iteration each time a new value is received
  • i is assigned the received value
  • Iteration ends when the sender calls close(c)
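
A minimal sketch of a sender that closes the channel (the name gen is illustrative):

package main

import "fmt"

func gen(c chan int) {
    for i := 0; i < 3; i++ {
        c <- i
    }
    close(c) // signals no more data; ends the receiver's range loop
}

func main() {
    c := make(chan int)
    go gen(c)
    for i := range c { // one iteration per received value; stops after close
        fmt.Println(i)
    }
}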
Receiving from Multiple Goroutines
  • Multiple channels may be used to receive from multiple sources

  • Data from both sources may be needed
  • Read sequentially
a := <-c1
b := <-c2
fmt.Println(a * b)
Select Statement
  • May have a choice of which data to use
    • e.g., first-come, first-served
  • Use the select statement to wait on the first data from a set of channels
select {
case a = <-c1:
    fmt.Println(a)
case b = <-c2:
    fmt.Println(b)
}

4.2 Select

Select Send or Receive
  • May select either send or receive operations
select {
case a = <-inchan:
    fmt.Println("Received a")
case outchan <- b:
    fmt.Println("Sent b")
}
Select with an Abort channel
  • May want to receive data until an abort signal is received
  • Use select with a separate abort channel
for {
    select {
    case a = <-c:
        fmt.Println(a)
    case <-abort:
        return
    }
}
Default Select
  • May want a default operation to avoid blocking
select {
case a = <-c1:
    fmt.Println(a)
case b = <-c2:
    fmt.Println(b)
default:
    fmt.Println("nop")
}

4.3 Mutual Exclusion

Goroutines Sharing Variables
  • Sharing variables concurrently can cause problems
  • Two goroutines writing to a shared variable can interfere with each other

  • A function is concurrency-safe if it can be invoked concurrently without interfering with other goroutines

Variable Sharing Example
package main

import (
    "fmt"
    "sync"
)

var i int = 0
var wg sync.WaitGroup

func inc() {
    i = i + 1 // unprotected write to the shared variable
    wg.Done()
}

func main() {
    wg.Add(2)
    go inc()
    go inc()
    wg.Wait()
    fmt.Println(i)
}
  • Two goroutines write to i
  • i should equal 2, but this doesn't always happen, because some interleavings produce a different result
Granularity of Concurrency
  • Concurrency is at the machine code level
  • i = i + 1 might be three machine instructions

  • Interleaving machine instructions causes unexpected problems

  • With an unlucky interleaving of those instructions, both tasks read 0 as the value of i, and one update is lost (see the sketch below)
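
A sketch of the instruction-level view (pseudo machine code in comments; the exact instructions depend on the architecture):

// i = i + 1 is roughly three machine operations:
//     read  i into a register
//     add   1 to the register
//     write the register back to i
//
// A lost-update interleaving, with i starting at 0:
//     task 1: read i        -> register holds 0
//     task 2: read i        -> register holds 0
//     task 1: add, write i  -> i = 1
//     task 2: add, write i  -> i = 1   (should have been 2)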

4.4 Mutex

Correct Sharing
  • Don't let two goroutines write to a shared variable at the same time!
  • Need to restrict possible interleavings
  • Access to shared variables cannot be interleaved
  • Mutual exclusion: code segments in different goroutines that cannot execute concurrently
  • Writing to shared variables should be mutually exclusive
sync.Mutex
  • A Mutex ensures mutual exclusion
  • Uses a binary semaphore

  • Flag up - shared variable is in use
  • Flag down - shared variable is available

4.5 Mutex Methods

sync.Mutex Methods

  • Lock() method puts the flag up
    • Shared variable in use
  • If the lock is already taken by another goroutine, Lock() blocks until the flag is put down

  • Unlock() method puts the flag down

    • Done using shared variable
  • When Unlock() is called, a blocked Lock() can proceed
Using sync.Mutex
  • The increment operation is now mutually exclusive
var i int = 0
var mut sync.Mutex

func inc() {
    mut.Lock() // flag up: i is in use
    i = i + 1
    mut.Unlock() // flag down: i is available
}

4.6 Once Synchronization

Synchronous Initialization
  • Initialization
    • must happen once
    • must happen before everything else
  • How do you perform initialization with multiple goroutines?
  • One option: perform initialization before starting the goroutines
sync.Once
  • Has one method, once.Do(f)
  • Function f is executed only one time
    • Even if it is called in multiple goroutines
  • All calls to once.Do() block until the first returns
    • Ensures that initialization executes first

sync.Once Example

  • Make two goroutines, but perform initialization only once
  • Each goroutine executes dostuff()
var wg sync.WaitGroup

func main() {
    wg.Add(2)
    go dostuff()
    go dostuff()
    wg.Wait()
}
  • Using sync.Once

    • setup() should execute only once

    • “hello” should not print until setup() returns

      var on sync.Once

      func setup() {
          fmt.Println("Init")
      }

      func dostuff() {
          on.Do(setup) // setup() runs only once, no matter how many goroutines call this
          fmt.Println("hello")
          wg.Done()
      }
  • Execution Result

      Init  // result of setup()
      hello // result of one goroutine
      hello // result of the other goroutine

    • Init appears only once
    • Init appears before hello is printed

4.7 Deadlock

Synchronization Dependencies
  • Synchronization causes the execution of different goroutines to depend on each other

  • G2 cannot continue until G1 does something
Deadlock
  • Circular dependencies cause all involved goroutines to block
    • G1 waits for G2
    • G2 waits for G1
  • Can be caused by waiting on channels
Deadlock Example
func dostuff(c1 chan int, c2 chan int) {
    <-c1    // blocks until another goroutine writes to c1
    c2 <- 1 // blocks until another goroutine reads from c2
    wg.Done()
}
  • Read from first channel
    • Wait for write onto first channel
  • Write to second channel
    • Wait for read from second channel
func main() {
    ch1 := make(chan int)
    ch2 := make(chan int)
    wg.Add(2)
    go dostuff(ch1, ch2)
    go dostuff(ch2, ch1)
    wg.Wait()
}
  • dostuff() argument order is swapped
  • Each goroutine blocked on channel read
Deadlock Detection
  • The Go runtime automatically detects when all goroutines are deadlocked

  • Cannot detect when a subset of goroutines are deadlocked
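
A minimal example that triggers the detector: main receives on a channel that no goroutine can ever send on, and the runtime aborts with "fatal error: all goroutines are asleep - deadlock!":

package main

func main() {
    c := make(chan int)
    <-c // no other goroutine exists to send: total deadlock
}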

4.8 Dining Philosophers

Dining Philosophers Problem
  • Classic problem involving concurrency and synchronization

  • Problem:

    • 5 philosophers sitting at a round table
    • 1 chopstick is placed between each adjacent pair
    • Each wants to eat rice from their plate, but needs two chopsticks to do so
    • Only one philosopher can hold a given chopstick at a time
    • Not enough chopsticks for everyone to eat at once

  • Each chopstick is a mutex

  • Each philosopher is associated with a goroutine and two chopsticks

  • Chopsticks and Philosophers

      type ChopS struct {
          sync.Mutex
      }

      type Philo struct {
          leftCS, rightCS *ChopS
      }
  • Philosopher Eat Method

      func (p Philo) eat() {
          for {
              p.leftCS.Lock()
              p.rightCS.Lock()

              fmt.Println("eating")

              p.rightCS.Unlock()
              p.leftCS.Unlock()
          }
      }
  • Initialization in Main

      CSticks := make([]*ChopS, 5)
      for i := 0; i < 5; i++ {
          CSticks[i] = new(ChopS)
      }
      philos := make([]*Philo, 5)
      for i := 0; i < 5; i++ {
          philos[i] = &Philo{CSticks[i], CSticks[(i+1)%5]}
      }
    • Initialize chopsticks and philosophers

    • Notice the (i + 1) % 5 wraparound: philosopher 4's right chopstick is chopstick 0

  • Start the Dining in Main

      for i := 0; i < 5; i++ {
          go philos[i].eat()
      }
    • Start each philosopher eating

    • Main would also need to wait, e.g., as sketched below
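
    • One sketch of that wait: since eat() above loops forever, main can simply block

      // after starting the philosophers:
      select {} // blocks forever; if eat() returned, sync.WaitGroup would be the usual choice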

  • Deadlock Problem

      p.leftCS.Lock()
      p.rightCS.Lock()
      fmt.Println("eating")
      p.leftCS.Unlock()
      p.rightCS.Unlock()
    • All philosophers might lock their left chopsticks concurrently

    • All chopsticks would be locked

    • No one can lock their right chopstick

  • Deadlock Solution

    • Each philosopher picks up the lowest-numbered chopstick first

      for i := 0; i < 5; i++ {
          if i < (i+1)%5 {
              philos[i] = &Philo{CSticks[i], CSticks[(i+1)%5]}
          } else {
              philos[i] = &Philo{CSticks[(i+1)%5], CSticks[i]}
          }
      }
    • Philosopher 4 picks up chopstick 0 before chopstick 4

    • Philosopher 4 blocks, allowing philosopher 3 to eat

    • No deadlock, but philosopher 4 may starve