More Research Problems of Implementing Go
Dmitry Vyukov
Google
https://go.dev/
* About Go
Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.
Design began in late 2007.
- Robert Griesemer, Rob Pike, Ken Thompson
- Russ Cox, Ian Lance Taylor
Became open source in November 2009.
Developed entirely in the open; very active community.
Language stable as of Go 1, early 2012.
Work continues.
* Motivation for Go
.image research2/datacenter.jpg
* Motivation for Go
Started as an answer to software problems at Google:
- multicore processors
- networked systems
- massive computation clusters
- scale: 10⁷⁺ lines of code
- scale: 10³⁺ programmers
- scale: 10⁶⁺ machines (design point)
Deployed: parts of YouTube, dl.google.com, Blogger, Google Code, Google Fiber, ...
* Go
A simple but powerful and fun language.
- start with C, remove complex parts
- add interfaces, concurrency
- also: garbage collection, closures, reflection, strings, ...
For more background on design:
- [[http://commandcenter.blogspot.com/2012/06/less-is-exponentially-more.html][Less is exponentially more]]
- [[/talks/2012/splash.article][Go at Google: Language Design in the Service of Software Engineering]]
* Research and Go
Go is designed for building production systems at Google.
- Goal: make that job easier, faster, better.
- Non-goal: break new ground in programming language research
Plenty of research questions about how to implement Go well.
- Concurrency
- Scheduling
- Garbage collection
- Race and deadlock detection
- Testing of the implementation
* Concurrency
.image research2/busy.jpg
* Concurrency
Go provides two important concepts:
A goroutine is a thread of control within the program, with its own local variables and stack. Cheap, easy to create.
A channel carries typed messages between goroutines.
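A minimal sketch of the two concepts together (illustrative only):

    package main

    import "fmt"

    func main() {
        c := make(chan string) // a channel carries typed messages
        go func() {            // a goroutine: its own stack and locals, cheap to create
            c <- "hello from a goroutine"
        }()
        fmt.Println(<-c) // receive blocks until the message arrives
    }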
* Concurrency
.play research2/hello.go
* Concurrency: CSP
Channels adopted from Hoare's Communicating Sequential Processes.
- Orthogonal to rest of language
- Can keep familiar model for computation
- Focus on _composition_ of regular code
Go _enables_ simple, safe concurrent programming.
It doesn't _forbid_ bad programming.
Caveat: not purely memory safe; sharing is legal.
Passing a pointer over a channel is idiomatic.
Experience shows this is practical.
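A sketch of that idiom: the sender hands a pointer over a channel and, by convention, stops using it, so ownership travels with the message (the Job type and channel names here are illustrative):

    package main

    import "fmt"

    // Job is an illustrative message type; ownership travels with the pointer.
    type Job struct {
        ID     int
        Result string
    }

    func worker(jobs, done chan *Job) {
        for j := range jobs {
            j.Result = fmt.Sprintf("job %d done", j.ID) // the current owner may mutate freely
            done <- j                                   // ownership passes back with the send
        }
    }

    func main() {
        jobs, done := make(chan *Job), make(chan *Job)
        go worker(jobs, done)
        jobs <- &Job{ID: 1} // by convention, the sender stops using the pointer here
        fmt.Println((<-done).Result)
    }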
* Concurrency
Sequential network address resolution, given a work list:
.play research2/addr1.go /lookup/+1,/^}/-1
* Concurrency
Concurrent network address resolution, given a work list:
.play research2/addr2.go /lookup/+1,/^}/-1
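The played file is not reproduced here; a sketch in the same spirit, one goroutine per lookup with results collected over a channel, might look like:

    package main

    import (
        "fmt"
        "net"
    )

    func main() {
        work := []string{"go.dev", "golang.org", "example.com"}
        type result struct {
            addr string
            ips  []string
        }
        c := make(chan result)
        for _, addr := range work {
            go func(addr string) {
                ips, _ := net.LookupHost(addr) // one lookup per goroutine; errors ignored for brevity
                c <- result{addr, ips}
            }(addr)
        }
        for range work {
            r := <-c
            fmt.Println(r.addr, r.ips)
        }
    }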
* Concurrency
Select statements: switch for communication.
.play research2/select.go /select/,/^}/-1
It is select that makes an efficient implementation difficult.
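A minimal select sketch (illustrative, not the played file): whichever communication can proceed first is chosen, which is exactly what complicates the implementation.

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        in := make(chan int)
        out := make(chan int, 1)
        go func() { in <- 7 }()

        // Proceed with whichever communication becomes ready first.
        select {
        case v := <-in:
            fmt.Println("received", v)
        case out <- 42:
            fmt.Println("sent")
        case <-time.After(time.Second):
            fmt.Println("timed out")
        }
    }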
* Implementing Concurrency
Challenge: Make channel communication scale
- start with one global channel lock
- per-channel locks, locked in address order for multi-channel operations
Research question: lock-free channels?
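To make the address-order idea above concrete, a sketch outside the runtime, with two plain mutexes standing in for the channels involved in a multi-channel operation (lockPair and unlockPair are illustrative names):

    package main

    import (
        "fmt"
        "sync"
        "unsafe"
    )

    // lockPair acquires two locks in address order, so concurrent
    // multi-lock operations on the same pair cannot deadlock on each other.
    func lockPair(a, b *sync.Mutex) {
        if uintptr(unsafe.Pointer(a)) > uintptr(unsafe.Pointer(b)) {
            a, b = b, a
        }
        a.Lock()
        if a != b {
            b.Lock()
        }
    }

    func unlockPair(a, b *sync.Mutex) {
        a.Unlock()
        if a != b {
            b.Unlock()
        }
    }

    func main() {
        var m1, m2 sync.Mutex
        lockPair(&m1, &m2)
        fmt.Println("both locked, always in the same order")
        unlockPair(&m1, &m2)
    }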
* Scheduling
.image research2/gophercomplex6.jpg
* Scheduling
On the one hand we have arbitrary user programs:
- fine-grained goroutines, coarse-grained goroutines or a mix of both
- computational goroutines, IO-bound goroutines or a mix of both
- arbitrary dynamic communication patterns
- busy, idle, bursty programs
No user hints!
* Scheduling
On the other hand we have complex hardware topology:
- per-core caches
- caches shared between cores
- cores shared between hyper threads (HT)
- multiple processors with non-uniform memory access (NUMA)
* Scheduling
Challenge: make it all magically work efficiently
- start with one global lock for all scheduler state
- distributed work-stealing scheduler with per-"processor" state
- integrated network poller into scheduler
- lock-free work queues
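A toy sketch of the work-stealing shape described above (the real scheduler uses lock-free rings, steals half a queue at a time, and parks idle threads; everything below is a simplification with illustrative names):

    package main

    import (
        "fmt"
        "math/rand"
        "sync"
    )

    type task func()

    // proc is a toy "P": a local run queue that other workers may steal from.
    type proc struct {
        mu sync.Mutex
        q  []task
    }

    func (p *proc) push(t task) {
        p.mu.Lock()
        p.q = append(p.q, t)
        p.mu.Unlock()
    }

    // take removes a task: owners pop LIFO (temporal locality), thieves steal FIFO.
    func (p *proc) take(steal bool) (task, bool) {
        p.mu.Lock()
        defer p.mu.Unlock()
        if len(p.q) == 0 {
            return nil, false
        }
        var t task
        if steal {
            t, p.q = p.q[0], p.q[1:]
        } else {
            t, p.q = p.q[len(p.q)-1], p.q[:len(p.q)-1]
        }
        return t, true
    }

    func run(self *proc, all []*proc, wg *sync.WaitGroup) {
        defer wg.Done()
        for {
            t, ok := self.take(false)
            if !ok {
                // Try one random victim; a real scheduler retries and then parks.
                if t, ok = all[rand.Intn(len(all))].take(true); !ok {
                    return
                }
            }
            t()
        }
    }

    func main() {
        procs := []*proc{{}, {}, {}, {}}
        for i := 0; i < 20; i++ {
            i := i
            procs[0].push(func() { fmt.Println("task", i) }) // all work starts on one P
        }
        var wg sync.WaitGroup
        for _, p := range procs {
            wg.Add(1)
            go run(p, procs, &wg)
        }
        wg.Wait()
    }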
* Scheduling
Current scheduler:
    ┌─┐        ┌─┐        ┌─┐        ┌─┐        ┌─┐
    │ │        │ │        │ │        │ │        │ │
    ├─┤        ├─┤        ├─┤        ├─┤        ├─┤  Global
    │ │        │G│        │ │        │ │        │ │  state
    ├─┤        ├─┤        ├─┤        ├─┤        ├─┤
    │G│        │G│        │G│        │ │        │G│
    ├─┤        ├─┤        ├─┤        ├─┤        ├─┤
    │G│        │G│        │G│        │G│        │G│
    └┬┘        └┬┘        └┬┘        └┬┘        └─┘
     │          │          │          │
     ↓          ↓          ↓          ↓
    ┌─┬──────┐ ┌─┬──────┐ ┌─┬──────┐ ┌─┬──────┐ ┌────┐┌──────┐┌───────┐
    │P│mcache│ │P│mcache│ │P│mcache│ │P│mcache│ │heap││timers││netpoll│
    └┬┴──────┘ └┬┴──────┘ └┬┴──────┘ └┬┴──────┘ └────┘└──────┘└───────┘
     │          │          │          │
     ↓          ↓          ↓          ↓
    ┌─┐        ┌─┐        ┌─┐        ┌─┐        ┌─┐  ┌─┐  ┌─┐
    │M│        │M│        │M│        │M│        │M│  │M│  │M│
    └─┘        └─┘        └─┘        └─┘        └─┘  └─┘  └─┘
G - goroutine; P - logical processor; M - OS thread (machine)
* Scheduling
Want:
- temporal locality to exploit caches
- spatial locality to exploit NUMA
- schedule mostly LIFO but ensure weak fairness
- allocate local memory and stacks
- scan local memory in GC
- collocate communicating goroutines
- distribute non-communicating goroutines
- distribute timers and network poller
- poll network on the same core where last read was issued
* Garbage Collection
* Garbage Collection
Garbage collection simplifies APIs.
- In C and C++, too much API design (and too much programming effort!) is about memory management.
Fundamental to concurrency: too hard to track ownership otherwise.
Fundamental to interfaces: memory management details do not bifurcate otherwise-similar APIs.
Of course, adds cost, latency, complexity in run time system.
* Garbage Collection
Plenty of research about garbage collection, mostly in Java context.
- Parallel stop-the-world
- CMS: concurrent mark-and-sweep, stop-the-world compaction
- G1: region-based incremental copying collector
Java collectors usually:
- are generational/incremental because allocation rate is high
- compact memory to support generations
- have pauses because concurrent compaction is tricky and slow
* Garbage Collection
But Go is very different!
- User can avoid lots of allocations by embedding objects:
    type Point struct {
        X, Y int
    }

    type Rectangle struct {
        Min, Max Point
    }
- Fewer pointers.
- Lots of stack allocations.
- Interior pointers are allowed:
    p := &rect.Max
- Hundreds of thousands of stacks (goroutines)
- No object headers so far
* Implementing Garbage Collection
Current GC: stop the world, parallel mark, start the world, concurrent sweep.
Concurrent mark is almost ready.
Cannot reuse Java GC algorithms directly.
Research question: what GC algorithm is the best fit for Go?
Do we need generations? Do we need compaction? What are efficient data structures that support interior pointers?
* Race and deadlock detection
.image research2/race.png 160 600
* Race detection
Based on the ThreadSanitizer runtime, which originally targeted mostly C/C++.
A traditional happens-before race detector based on vector clocks (the devil is in the details!).
Works fine for Go, except:
    $ go run -race lots_of_goroutines.go
    race: limit on 8192 simultaneously alive goroutines is exceeded, dying
Research question: race detector that efficiently supports hundreds of thousands of goroutines?
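The core of a happens-before detector, heavily simplified: each access is stamped with a vector clock, and two accesses race if neither clock precedes the other. The clocks grow with the number of goroutines, which is part of why hundreds of thousands of them are expensive. A sketch with illustrative names:

    package main

    import "fmt"

    // vclock maps a goroutine id to the latest logical time observed from it.
    type vclock map[int]uint64

    // happensBefore reports whether a precedes b, i.e. a <= b componentwise.
    func happensBefore(a, b vclock) bool {
        for id, t := range a {
            if b[id] < t {
                return false
            }
        }
        return true
    }

    // racy: two accesses race if neither happens before the other
    // and at least one of them is a write.
    func racy(a, b vclock, aWrite, bWrite bool) bool {
        return (aWrite || bWrite) && !happensBefore(a, b) && !happensBefore(b, a)
    }

    func main() {
        w := vclock{1: 3} // write by goroutine 1 at time 3
        r := vclock{2: 5} // read by goroutine 2 that never synchronized with goroutine 1
        fmt.Println(racy(w, r, true, false)) // true: concurrent write and read
    }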
* Deadlock detection
Deadlock on mutexes due to lock order inversion:
    // thread 1                   // thread 2
    pthread_mutex_lock(&m1);      pthread_mutex_lock(&m2);
    pthread_mutex_lock(&m2);      pthread_mutex_lock(&m1);
    ...                           ...
    pthread_mutex_unlock(&m2);    pthread_mutex_unlock(&m1);
    pthread_mutex_unlock(&m1);    pthread_mutex_unlock(&m2);
Lock order inversions are easy to detect:
- build "M1 is locked under M2" relation.
- if it becomes cyclic, there is a potential deadlock.
- whenever a new edge is added to the graph, do DFS to find cycles.
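A sketch of that detector (illustrative names; a real implementation interns lock identities and bounds the graph):

    package main

    import "fmt"

    // lockGraph records "locked-under" edges between mutex identities.
    type lockGraph struct {
        edges map[string][]string
    }

    func newLockGraph() *lockGraph { return &lockGraph{edges: map[string][]string{}} }

    // addEdge records that next was acquired while held was held, then reports
    // whether the new edge closes a cycle (a potential deadlock).
    func (g *lockGraph) addEdge(held, next string) bool {
        g.edges[held] = append(g.edges[held], next)
        return g.reachable(next, held) // cycle iff we can get back to held from next
    }

    func (g *lockGraph) reachable(from, to string) bool {
        seen := map[string]bool{}
        var dfs func(n string) bool
        dfs = func(n string) bool {
            if n == to {
                return true
            }
            if seen[n] {
                return false
            }
            seen[n] = true
            for _, m := range g.edges[n] {
                if dfs(m) {
                    return true
                }
            }
            return false
        }
        return dfs(from)
    }

    func main() {
        g := newLockGraph()
        fmt.Println(g.addEdge("m1", "m2")) // thread 1: m2 under m1 -> false
        fmt.Println(g.addEdge("m2", "m1")) // thread 2: m1 under m2 -> true, potential deadlock
    }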
* Deadlock detection
Go has channels and mutexes. Channels are semaphores. A mutex can be unlocked in
a different goroutine, so it is essentially a binary semaphore too.
Deadlock example:
    // Parallel file tree walk.
    func worker(pendingItems chan os.FileInfo) {
        for f := range pendingItems {
            if f.IsDir() {
                filepath.Walk(f.Name(), func(path string, info os.FileInfo, err error) error {
                    pendingItems <- info
                    return nil
                })
            } else {
                visit(f)
            }
        }
    }
The pendingItems channel has limited capacity, so all workers can end up blocked on the send to pendingItems.
* Deadlock detection
Another deadlock example:
    var (
        c   = make(chan T, 100)
        mtx sync.RWMutex
    )

    // goroutine 1      // goroutine 2      // goroutine 3
    // does send        // does receive     // "resizes" the channel
    mtx.RLock()         mtx.RLock()         mtx.Lock()
    c <- v              v := <-c            tmp := make(chan T, 200)
    mtx.RUnlock()       mtx.RUnlock()       copyAll(c, tmp)
                                            c = tmp
                                            mtx.Unlock()
RWMutex is fair to both readers and writers: once a writer arrives, new readers are not allowed to enter the critical section.
Goroutine 1 blocks on chan send; then goroutine 3 blocks on mtx.Lock; then goroutine 2 blocks on mtx.RLock.
* Deadlock detection
Research question: how to detect deadlocks on semaphores?
No known theory to date.
* Testing of the implementation
.image research2/gopherswrench.jpg 240 405
* Testing of the implementation
So now we have a new language with several complex implementations:
- lexer
- parser
- transformation and optimization passes
- code generation
- linker
- channel and map operations
- garbage collector
- ...
*How*do*you*test*it?*
* Testing of the implementation
Csmith is a tool that generates random C programs that statically and dynamically conform to the C99 standard.
.image research2/csmith.png
* Testing of the implementation
Gosmith is a tool that generates random Go programs that statically and dynamically conform to the Go specification.
This turned out to be much simpler than for C: no undefined behavior all around!
- no uninitialized variables
- no unsequenced modifications between sequence points (x[i++] = --i)
- no UB during signed overflow
- total 191 kinds of undefined behavior and 52 kinds of unspecified behavior in C
* Testing of the implementation
But it generates uninteresting programs from an execution point of view: most of them deadlock or crash on a nil dereference.
Trophies:
- 31 bugs in gc compiler
- 18 bugs in gccgo compiler
- 5 bugs in llgo compiler
- 1 bug in gofmt
- 3 bugs in the spec
.image research2/emoji.png
* Testing of the implementation
Research question: how to generate random *interesting*concurrent* Go programs?
Must:
- create and wait for goroutines
- communicate over channels
- protect data with mutexes (reader-writer)
- pass data ownership between goroutines (explicitly and implicitly)
Must not:
- deadlock
- cause data races
- have non-deterministic results
* Research and Go
Plenty of research questions about how to implement Go well.
- Concurrency
- Scheduling
- Garbage collection
- Race and deadlock detection
- Testing of the implementation
- [Polymorphism]
- [Program translation]