Concurrency in Go
1. Why Use Concurrency?
1.1 Parallel Execution
- Two programs execute in parallel if they execute at exactly the same time
- At time t, an instruction is being performed for both P1 and P2
- Need replicated hardware
- Why Use Parallel Execution?
- Tasks may complete more quickly
- Some tasks must be performed sequentially
- Only the parallelizable tasks can be sped up
1.2 Von Neumann Bottleneck
Speedup without Parallelism
- Can we achieve speedup without parallelism?
- Design faster processors
- Get speedup without changing software
- Design processors with more on-chip memory
- Reduces the Von Neumann Bottleneck (the CPU waiting on main memory)
- Cache access time = 1 clock cycle
- Main memory access time = ~100 clock cycles
- Increasing on-chip cache improves performance
Moore’s Law
- Predicted that transistor density would double every two years
- Smaller transistors switch faster
- Not a physical law, just an observation
- Exponential increase in density would lead to exponential increase in speed
1.3 Power Wall
Power/Temperature Problem
- Transistors consume power when they switch
- Increasing transistor density leads to increased power consumption
- Smaller transistors use less power, but density scaling is much faster
- High power leads to high temperature
- Air cooling (fans) can only remove so much heat
Dynamic Power
- P = α · C · F · V²
- α is the fraction of time spent switching (activity factor)
- C is the capacitance (related to transistor size)
- F is the clock frequency
- V is the voltage swing (from low to high)
- Voltage matters most, since power grows with the square of V
- 0 to 5 V uses much more power than 0 to 1.3 V (see the worked comparison below)
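As a worked comparison (with α, C, and F held fixed): power scales with V², so a 0-to-5 V swing dissipates roughly (5 / 1.3)² ≈ 15 times the power of a 0-to-1.3 V swing.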
Dennard Scaling
- Voltage should scale with transistor size
- Keeps power consumption and temperature low
- Problem: Voltage can’t go too low
- Must stay above threshold voltage
- Noise problems occur
- Problem: Doesn’t consider leakage power
- Dennard scaling must stop
Multi-Core Systems
- P = α · C · F · V²
- Frequency can no longer be increased (power wall)
- But processor cores can still be added without raising frequency
- Trend is apparent today
- Parallel execution is needed to exploit multi-core systems
- Code made to execute on multiple cores
- Different programs on different cores
1.4 Concurrent vs Parallel
Concurrent execution
- Concurrent execution is not necessarily the same as parallel execution
- Concurrent: start and end times overlap
- Parallel: execute at exactly the same time
- Parallel tasks must be executed on different hardware
- Concurrent tasks may be executed on the same hardware
- Only one task actually executed at a time
- Mapping from tasks to hardware is not directly controlled by the programmer
- At least not in Go
1.5 Concurrent Programming
- Programmer determines which tasks can be executed in parallel
- Mapping tasks to hardware is done by
- The operating system
- The Go runtime scheduler
Benefit 1: Hiding Latency
- Concurrency improves performance, even without parallelism
- Tasks must periodically wait for something
- i.e. wait for memory
- `X = Y + Z` must read Y and Z from memory
- The wait may be 100+ clock cycles
- Other concurrent tasks can operate while one task is waiting
Benefit 2: Hardware Mapping
Hardware Mapping in Go:
- Programmer does not determine the hardware mapping
- Programmer makes parallelism possible
- Hardware mapping depends on many factors
- Where is the data?
- What are the communication costs?
2. Concurrency Basics
2.1 Processes
- An instance of a running program
- Things unique to a process
- Memory
  - Virtual address space
  - Code, stack, heap, shared libraries
- Registers
  - Program counter, data registers, stack pointer, …
Operating Systems
- Allows many processes to execute concurrently
- Processes are switched quickly (every ~20 ms)
- User has the impression of parallelism
- Operating system must give processes fair access to resources
2.2 Scheduling
Scheduling Processes
- Operating system schedules processes for execution
- Gives the illusion of parallel execution
- OS gives fair access to CPU, memory, etc.
Context Switch
- Control flow changes from one process to another
- Process "context" must be swapped
2.3 Threads and Goroutines
Threads vs Processes
- Threads share some context
- Many threads can exist in one process
- OS schedules threads rather than processes
Goroutines
- Like a thread in Go
- Many Goroutines execute within a single OS thread
Go Runtime Scheduler
- Schedules goroutines inside an OS thread
- Like a little OS inside a single OS thread
- A logical processor is mapped to an OS thread (introspection sketch below)
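As an illustrative aside (not from the source), the runtime package can report the quantities behind this mapping: runtime.NumCPU() gives the number of logical CPUs, and runtime.GOMAXPROCS queries or sets how many OS threads may execute Go code simultaneously.

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	// Passing 0 queries the current GOMAXPROCS value without changing it.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```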
2.4 Interleavings
- Order of execution within a task is known
- Order of execution between concurrent tasks is unknown
- Interleaving of instructions between tasks is unknown
- Many interleavings are possible
- Must consider all possibilities
- Ordering is non-deterministic
2.5 Race Conditions
- Outcome depends on the non-deterministic ordering of operations
- Races occur due to communication between tasks (see the sketch below)
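A minimal sketch of a race (the example values are illustrative, not from the source): two goroutines communicate through the shared variable x, and the printed value depends on the non-deterministic schedule. Go's race detector (`go run -race`) will flag it.

```go
package main

import "fmt"

func main() {
	x := 0
	go func() { x = 1 }() // concurrent write from a second goroutine
	fmt.Println(x)        // concurrent read: prints 0 or 1, non-deterministically
}
```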
2.6 Communication Between Tasks
- Threads are largely independent, but not completely independent
- Web server: one thread per client (sketched below)
- Image processing: one thread per pixel block
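A sketch of the web-server pattern (the TCP details and the handle function are illustrative assumptions, not from the source): each accepted connection is served by its own goroutine.

```go
package main

import (
	"log"
	"net"
)

// handle is a hypothetical per-client handler.
func handle(conn net.Conn) {
	defer conn.Close()
	// ... read the request and write a response ...
}

func main() {
	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		go handle(conn) // one goroutine per client
	}
}
```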
3. Threads in Go
3.1 Goroutines
Creating a Goroutine
- One goroutine is created automatically to execute main()
- Other goroutines are created using the go keyword
Exiting a Goroutine
- A goroutine exits when its code is complete
- When the main goroutine is complete, all other goroutines exit
- A goroutine may not complete its execution if main completes early
- Adding a delay to wait for a goroutine is bad practice: the timing assumptions may be wrong, and timing is non-deterministic (see the sketch below)
- Need formal synchronization constructs
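A minimal sketch of the problem (the sleep duration is an arbitrary choice): main may exit before the new goroutine runs, and a time.Sleep only hides the race behind a timing assumption.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	go fmt.Println("new routine")      // may never print if main exits first
	time.Sleep(100 * time.Millisecond) // bad fix: the timing assumption may be wrong
	fmt.Println("main routine")
}
```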
3.2 Basic Synchronization
- Synchronization uses global events whose execution is viewed by all threads simultaneously
Example:
- A global event is viewed by all tasks at the same time
- The print must occur after the update of x
- Synchronization is used to restrict bad interleavings (a channel-based sketch follows)
- Synchronization reduces performance and efficiency, but here it is necessary
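A minimal sketch of the idea, using an unbuffered channel send as the global event (channel mechanics are covered in section 3.4; this particular pairing of send and receive is an illustration):

```go
package main

import "fmt"

var x int

func main() {
	event := make(chan struct{})
	go func() {
		x = 1
		event <- struct{}{} // the global event: x has been updated
	}()
	<-event        // wait until the event is seen
	fmt.Println(x) // guaranteed to print after the update of x
}
```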
3.3 Wait Groups
sync.WaitGroup
- The sync package contains functions to synchronize between goroutines
- sync.WaitGroup forces a goroutine to wait for other goroutines
- Contains an internal counter
- Increment the counter for each goroutine to wait for
- Decrement the counter when each goroutine completes
- The waiting goroutine cannot continue until the counter is 0
Using WaitGroup
- `Add()` increments the counter
- `Done()` decrements the counter
- `Wait()` blocks until the counter == 0
WaitGroup Example
```go
func foo(wg *sync.WaitGroup) {
	fmt.Println("New routine")
	wg.Done()
}

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go foo(&wg)
	wg.Wait()
	fmt.Println("Main routine")
}
```
3.4 Communication
Goroutine Communication
- Goroutines usually work together to perform a bigger task
- Often need to send data to collaborate
- Example: Find the product of 4 integers
- Make 2 goroutines, each multiplies a pair
- Main goroutine multiplies the 2 results
- Need to send ints from the main goroutine to the two sub-goroutines
- Need to send results from the sub-goroutines back to the main goroutine
Channels
- Transfer data between goroutines
- Channels are typed
- Use `make()` to create a channel:

```go
c := make(chan int)
```
- Send and receive data using the `<-` operator
- Send data on a channel:

```go
c <- 3
```

- Receive data from a channel:

```go
x := <-c
```
- Channel example (computing the product of four integers with two goroutines):

```go
func prod(v1 int, v2 int, c chan int) {
	c <- v1 * v2
}

func main() {
	c := make(chan int)
	go prod(1, 2, c)
	go prod(3, 4, c)
	a := <-c
	b := <-c
	fmt.Println(a * b)
}
```
3.5 Blocking in Channels
Unbuffered Channel
- Unbuffered channels cannot hold data in transit
- Default is unbuffered
- Sending blocks until data is received
- Receiving blocks until data is sent
Blocking and Synchronization
- Channel communication is synchronous
- Blocking is the same as waiting for communication
- Receiving and ignoring the result is the same as a `Wait()` (sketched below)
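A sketch of that pattern (the work inside the goroutine is a stand-in): main receives from the channel purely for synchronization and discards the value.

```go
package main

import "fmt"

func main() {
	done := make(chan bool)
	go func() {
		fmt.Println("working") // stand-in for real work
		done <- true           // signal completion
	}()
	<-done // receive and ignore the result: behaves like Wait()
}
```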
3.6 Buffered Channels
Channel Capacity
- Channels can contain a limited number of objects
- Default size is 0 (unbuffered)
- Capacity is the number of objects it can hold in transit
- An optional argument to `make()` defines the channel capacity:

```go
c := make(chan int, 3)
```
- Sending only blocks if buffer is full
- Receiving only blocks if buffer is empty
Channel Blocking, Receive
- Channel with capacity 1
- The first receive blocks until a send occurs
- A second receive blocks forever if no second send occurs
Channel Blocking, Send
- Second send blocks until receive is done
- Receive can block until first send is done
Use of Buffering
- Sender and receiver do not need to operate at exactly the same speed
- A speed mismatch is acceptable (see the sketch below)
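A minimal sketch (the values are arbitrary): with capacity 3, the sender can run ahead of the receiver by up to three values before blocking.

```go
package main

import "fmt"

func main() {
	c := make(chan int, 3)
	// Three sends complete even though no receiver has run yet:
	// the buffer absorbs the speed mismatch.
	c <- 1
	c <- 2
	c <- 3
	// A fourth send here would block until a receive frees a slot.
	fmt.Println(<-c, <-c, <-c)
}
```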
4. Synchronized Communication
4.1 Blocking on Channels
Iterating Through a Channel
- Common to iteratively read from a channel
```go
for i := range c {
	fmt.Println(i)
}
```

- Continues to read from channel c
- One iteration each time a new value is received
- i is assigned the received value
- Iteration ends when the sender calls `close(c)`
Receiving from Multiple Goroutines
- Multiple channels may be used to receive from multiple sources
- Data from both sources may be needed
- Read sequentially
```go
a := <-c1
b := <-c2
fmt.Println(a * b)
```
Select Statement
- May have a choice of which data to use
- e.g. first-come, first-served
- Use the `select` statement to wait on the first data from a set of channels:

```go
select {
case a := <-c1:
	fmt.Println(a)
case b := <-c2:
	fmt.Println(b)
}
```
4.2 Select
Select Send or Receive
- May select either send or receive operations:

```go
select {
case a := <-inchan:
	fmt.Println("Received", a)
case outchan <- b:
	fmt.Println("Sent", b)
}
```
Select with an Abort channel
- Use select with a separate abort channel
- May want to receive data until an abort signal is received
```go
for {
	select {
	case a := <-c:
		fmt.Println(a)
	case <-abort:
		return
	}
}
```
Default Select
- May want a default operation to avoid blocking
```go
select {
case a := <-c1:
	fmt.Println(a)
case b := <-c2:
	fmt.Println(b)
default:
	fmt.Println("nop")
}
```
4.3 Mutual Exclusion
Goroutines Sharing Variables
- Sharing variables concurrently can cause problems
- Two goroutines writing to a shared variable can interfere with each other
- A function that can be invoked concurrently without interfering with other goroutines is called concurrency-safe
Variable Sharing Example
```go
var i int = 0
var wg sync.WaitGroup

func inc() {
	i = i + 1
	wg.Done()
}

func main() {
	wg.Add(2)
	go inc()
	go inc()
	wg.Wait()
	fmt.Println(i)
}
```

- Two goroutines write to i
- i should equal 2, but it does not always, because some interleavings interfere
Granularity of Concurrency
- Concurrency is at the machine code level
- i = i + 1 might be three machine instructions
- Interleaving machine instructions causes unexpected problems
- Both tasks may read 0 as the value of i (see the trace below)
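To illustrate (the register names are assumptions), i = i + 1 can compile to a read, an increment, and a write; one bad interleaving of two goroutines, starting from i == 0, loses an update:

```
G1: read  r1 <- i      // r1 == 0
G2: read  r2 <- i      // r2 == 0 (both tasks read 0)
G1: write i  <- r1 + 1 // i == 1
G2: write i  <- r2 + 1 // i == 1, not 2: one increment is lost
```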
4.4 Mutex
Correct Sharing
- Don't let two goroutines write to a shared variable at the same time!
- Need to restrict the possible interleavings
- Access to shared variables cannot be interleaved
- Mutual exclusion: code segments in different goroutines which cannot execute concurrently
- Writing to shared variables should be mutually exclusive
sync.Mutex
- A Mutex ensures mutual exclusion
- Uses a binary semaphore
- Flag up - shared variable is in use
- Flag down - shared variable is available
4.5 Mutex Methods
sync.Mutex Methods
- `Lock()` puts the flag up (shared variable is in use)
- If the lock is already taken by another goroutine, `Lock()` blocks until the flag is put down
- `Unlock()` puts the flag down (done using the shared variable)
- When `Unlock()` is called, a blocked `Lock()` can proceed
Using sync.Mutex
- The increment operation is now mutually exclusive:

```go
var i int = 0
var mut sync.Mutex

func inc() {
	mut.Lock()
	i = i + 1
	mut.Unlock()
}
```
4.6 Once Synchronization
Synchronous Initialization
- Initialization
- must happen once
- must happen before everything else
- How do you perform initialization with multiple goroutines?
- One option: perform initialization before starting the goroutines
sync.Once
- Has one method, `once.Do(f)`
- The function f is executed only one time, even if `once.Do(f)` is called in multiple goroutines
- All calls to `once.Do()` block until the first returns
- Ensures that initialization executes first
sync.Once Example
- Make two goroutines, but initialization happens only once
- Each goroutine executes `dostuff()`:

```go
var wg sync.WaitGroup

func main() {
	wg.Add(2)
	go dostuff()
	go dostuff()
	wg.Wait()
}
```
Using sync.Once
- `setup()` should execute only once
- "hello" should not print until `setup()` returns

```go
var on sync.Once

func setup() {
	fmt.Println("Init")
}

func dostuff() {
	on.Do(setup)
	fmt.Println("hello")
	wg.Done()
}
```
Execution Result
```
Init   // result of setup()
hello  // result of one goroutine
hello  // result of the other goroutine
```

- `Init` appears only once
- `Init` appears before `hello` is printed
4.7 Deadlock
Synchronization Dependencies
- Synchronization causes the execution of different goroutines to depend on each other
- G2 cannot continue until G1 does something
Deadlock
- Circular dependencies cause all involved goroutines to block
- G1 waits for G2
- G2 waits for G1
- Can be caused by waiting on channels
Deadlock Example
```go
func dostuff(c1 chan int, c2 chan int) {
	<-c1
	c2 <- 1
}
```
- Read from first channel
- Wait for write onto first channel
- Write to second channel
- Wait for read from second channel
```go
func main() {
	ch1 := make(chan int)
	ch2 := make(chan int)
	go dostuff(ch1, ch2)
	go dostuff(ch2, ch1)
}
```

- The `dostuff()` argument order is swapped between the two goroutines
- Each goroutine is blocked on its channel read
Deadlock Detection
- The Go runtime automatically detects when all goroutines are deadlocked (example below)
- It cannot detect when only a subset of goroutines is deadlocked
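A minimal example: when every goroutine is blocked, the runtime aborts the program with "fatal error: all goroutines are asleep - deadlock!".

```go
package main

func main() {
	c := make(chan int)
	<-c // no sender exists: main blocks forever, and the runtime reports the deadlock
}
```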
4.8 Dining Philosophers
Dining Philosophers Problem
Classic problem involving concurrency and synchronization
Problem:
- 5 philosophers sitting at a round table
- 1 chopstick is placed between each adjacent pair
- Each philosopher wants to eat rice from their plate, but needs two chopsticks
- Only one philosopher can hold a chopstick at a time
- Not enough chopsticks for everyone to eat at once
- Each chopstick is a mutex
- Each philosopher is associated with a goroutine and two chopsticks
Chopsticks and Philosophers
```go
type ChopS struct {
	sync.Mutex
}

type Philo struct {
	leftCS, rightCS *ChopS
}
```
Philosopher Eat Method
```go
func (p Philo) eat() {
	for {
		p.leftCS.Lock()
		p.rightCS.Lock()
		fmt.Println("eating")
		p.rightCS.Unlock()
		p.leftCS.Unlock()
	}
}
```
Initialization in Main
```go
CSticks := make([]*ChopS, 5)
for i := 0; i < 5; i++ {
	CSticks[i] = new(ChopS)
}
philos := make([]*Philo, 5)
for i := 0; i < 5; i++ {
	philos[i] = &Philo{CSticks[i], CSticks[(i+1)%5]}
}
```

- Initialize chopsticks and philosophers
- Notice the `(i + 1) % 5` wraparound pairing the last philosopher with the first chopstick
Start the Dining in Main
```go
for i := 0; i < 5; i++ {
	go philos[i].eat()
}
```

- Start each philosopher eating
- main would also need to wait (e.g. on a WaitGroup), or the program exits immediately
Deadlock Problem
```go
p.leftCS.Lock()
p.rightCS.Lock()
fmt.Println("eating")
p.leftCS.Unlock()
p.rightCS.Unlock()
```

- All philosophers might lock their left chopsticks concurrently
- All chopsticks would then be locked
- No one can lock their right chopstick: deadlock
Deadlock Solution
- Each philosopher picks up the lowest-numbered chopstick first:

```go
for i := 0; i < 5; i++ {
	if i < (i+1)%5 {
		philos[i] = &Philo{CSticks[i], CSticks[(i+1)%5]}
	} else {
		philos[i] = &Philo{CSticks[(i+1)%5], CSticks[i]}
	}
}
```

- Philosopher 4 picks up chopstick 0 before chopstick 4
- Philosopher 4 blocks, allowing philosopher 3 to eat
- No deadlock, but philosopher 4 may starve