talks: add "More Research Problems of Implementing Go" talk
Change-Id: I5a741d7cf7341fb243480d297506581f978aa377
Reviewed-on: https://go-review.googlesource.com/1382
Reviewed-by: Russ Cox <rsc@golang.org>
diff --git a/2014/research2.slide b/2014/research2.slide
new file mode 100644
index 0000000..b947170
--- /dev/null
+++ b/2014/research2.slide
@@ -0,0 +1,440 @@
+More Research Problems of Implementing Go
+
+Dmitry Vyukov
+Google
+
+http://golang.org/
+
+* About Go
+
+Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.
+
+Design began in late 2007.
+
+- Robert Griesemer, Rob Pike, Ken Thompson
+- Russ Cox, Ian Lance Taylor
+
+Became open source in November 2009.
+
+Developed entirely in the open; very active community.
+Language stable as of Go 1, early 2012.
+Work continues.
+
+* Motivation for Go
+
+.image research2/datacenter.jpg
+
+* Motivation for Go
+
+Started as an answer to software problems at Google:
+
+- multicore processors
+- networked systems
+- massive computation clusters
+- scale: 10⁷⁺ lines of code
+- scale: 10³⁺ programmers
+- scale: 10⁶⁺ machines (design point)
+
+Deployed: parts of YouTube, dl.google.com, Blogger, Google Code, Google Fiber, ...
+
+* Go
+
+A simple but powerful and fun language.
+
+- start with C, remove complex parts
+- add interfaces, concurrency
+- also: garbage collection, closures, reflection, strings, ...
+
+For more background on design:
+
+- [[http://commandcenter.blogspot.com/2012/06/less-is-exponentially-more.html][Less is exponentially more]]
+- [[http://talks.golang.org/2012/splash.article][Go at Google: Language Design in the Service of Software Engineering]]
+
+* Research and Go
+
+Go is designed for building production systems at Google.
+
+- Goal: make that job easier, faster, better.
+- Non-goal: break new ground in programming language research
+
+Plenty of research questions about how to implement Go well.
+
+- Concurrency
+- Scheduling
+- Garbage collection
+- Race and deadlock detection
+- Testing of the implementation
+
+
+
+
+* Concurrency
+
+.image research2/busy.jpg
+
+* Concurrency
+
+Go provides two important concepts:
+
+A goroutine is a thread of control within the program, with its own local variables and stack. Cheap, easy to create.
+
+A channel carries typed messages between goroutines.
+
+* Concurrency
+
+.play research2/hello.go
+
+* Concurrency: CSP
+
+Channels adopted from Hoare's Communicating Sequential Processes.
+
+- Orthogonal to rest of language
+- Can keep familiar model for computation
+- Focus on _composition_ of regular code
+
+Go _enables_ simple, safe concurrent programming.
+It doesn't _forbid_ bad programming.
+
+Caveat: not purely memory safe; sharing is legal.
+Passing a pointer over a channel is idiomatic.
+
+Experience shows this is practical.
+
+* Concurrency
+
+Sequential network address resolution, given a work list:
+
+.play research2/addr1.go /lookup/+1,/^}/-1
+
+* Concurrency
+
+Concurrent network address resolution, given a work list:
+
+.play research2/addr2.go /lookup/+1,/^}/-1
+
+* Concurrency
+
+Select statements: switch for communication.
+
+.play research2/select.go /select/,/^}/-1
+
+That's select that makes efficient implementation difficult.
+
+* Implementing Concurrency
+
+Challenge: Make channel communication scale
+
+- start with one global channel lock
+- per-channel locks, locked in address order for multi-channel operations
+
+Research question: lock-free channels?
+
+
+
+
+* Scheduling
+
+.image research2/gophercomplex6.jpg
+
+* Scheduling
+
+On the one hand we have arbitrary user programs:
+
+- fine-grained goroutines, coarse-grained goroutines or a mix of both
+- computational goroutines, IO-bound goroutines or a mix of both
+- arbitrary dynamic communication patterns
+- busy, idle, bursty programs
+
+No user hints!
+
+* Scheduling
+
+On the other hand we have complex hardware topology:
+
+- per-core caches
+- caches shared between cores
+- cores shared between hyper threads (HT)
+- multiple processors with non-uniform memory access (NUMA)
+
+* Scheduling
+
+Challenge: make it all magically work efficiently
+
+- start with one global lock for all scheduler state
+- distributed work-stealing scheduler with per-"processor" state
+- integrated network poller into scheduler
+- lock-free work queues
+
+* Scheduling
+
+Current scheduler:
+
+ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐
+ │ │ │ │ │ │ │ │ │ │
+ ├─┤ ├─┤ ├─┤ ├─┤ ├─┤ Global
+ │ │ │G│ │ │ │ │ │ │ state
+ ├─┤ ├─┤ ├─┤ ├─┤ ├─┤
+ │G│ │G│ │G│ │ │ │G│
+ ├─┤ ├─┤ ├─┤ ├─┤ ├─┤
+ │G│ │G│ │G│ │G│ │G│
+ └┬┘ └┬┘ └┬┘ └┬┘ └─┘
+ │ │ │ │
+ ↓ ↓ ↓ ↓
+ ┌─┬──────┐ ┌─┬──────┐ ┌─┬──────┐ ┌─┬──────┐ ┌────┐┌──────┐┌───────┐
+ │P│mcache│ │P│mcache│ │P│mcache│ │P│mcache│ │heap││timers││netpoll│
+ └┬┴──────┘ └┬┴──────┘ └┬┴──────┘ └┬┴──────┘ └────┘└──────┘└───────┘
+ │ │ │ │
+ ↓ ↓ ↓ ↓
+ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐
+ │M│ │M│ │M│ │M│ │M│ │M│ │M│
+ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘
+
+G - goroutine; P - logical processor; M - OS thread (machine)
+
+* Scheduling
+
+Want:
+
+- temporal locality to exploit caches
+- spatial locality to exploit NUMA
+- schedule mostly LIFO but ensure weak fairness
+- allocate local memory and stacks
+- scan local memory in GC
+- collocate communicating goroutines
+- distribute non-communicating goroutines
+- distribute timers and network poller
+- poll network on the same core where last read was issued
+
+
+
+
+
+* Garbage Collection
+
+* Garbage Collection
+
+Garbage collection simplifies APIs.
+
+- In C and C++, too much API design (and too much programming effort!) is about memory management.
+
+Fundamental to concurrency: too hard to track ownership otherwise.
+
+Fundamental to interfaces: memory management details do not bifurcate otherwise-similar APIs.
+
+Of course, adds cost, latency, complexity in run time system.
+
+* Garbage Collection
+
+Plenty of research about garbage collection, mostly in Java context.
+
+- Parallel stop-the-world
+- CMS: concurrent mark-and-sweep, stop-the-world compaction
+- G1: region-based incremental copying collector
+
+Java collectors usually:
+
+- are generational/incremental because allocation rate is high
+- compact memory to support generations
+- have pauses because concurrent compaction is tricky and slow
+
+* Garbage Collection
+
+But Go is very different!
+
+- User can avoid lots of allocations by embedding objects:
+
+ type Point struct {
+ X, Y int
+ }
+ type Rectangle struct {
+ Min, Max Point
+ }
+
+- Less pointers.
+- Lots of stack allocations.
+- Interior pointers are allowed:
+
+ p := &rect.Max
+
+- Hundreds of thousands of stacks (goroutines)
+- No object headers so far
+
+* Implementing Garbage Collection
+
+Current GC: stop the world, parallel mark, start the world, concurrent sweep.
+Concurrent mark is almost ready.
+
+Cannot reuse Java GC algorithms directly.
+
+Research question: what GC algorithm is the best fit for Go?
+Do we need generations? Do we need compaction? What are efficient data structures that support interior pointers?
+
+
+
+
+* Race and deadlock detection
+
+.image research2/race.png 160 600
+
+* Race detection
+
+Based on ThreadSanitizer runtime, originally mainly targeted C/C++.
+Traditional happens-before race detector based on vector clocks (devil in details!).
+Works fine for Go, except:
+
+ $ go run -race lots_of_goroutines.go
+ race: limit on 8192 simultaneously alive goroutines is exceeded, dying
+
+Research question: race dectector that efficiently supports hundreds of thousands of goroutines?
+
+* Deadlock detection
+
+Deadlock on mutexes due to lock order inversion:
+
+ // thread 1 // thread 2
+ pthread_mutex_lock(&m1); pthread_mutex_lock(&m2);
+ pthread_mutex_lock(&m2); pthread_mutex_lock(&m1);
+ ... ...
+ pthread_mutex_unlock(&m2); pthread_mutex_unlock(&m1);
+ pthread_mutex_unlock(&m1); pthread_mutex_unlock(&m2);
+
+Lock order inversions are easy to detect:
+
+- build "M1 is locked under M2" relation.
+- if it becomes cyclic, there is a potential deadlock.
+- whenever a new edge is added to the graph, do DFS to find cycles.
+
+* Deadlock detection
+
+Go has channels and mutexes. Channels are semaphores. A mutex can be unlocked in
+a different goroutine, so it is essentially a binary semaphore too.
+
+Deadlock example:
+
+ // Parallel file tree walk.
+ func worker(pendingItems chan os.FileInfo)
+ for f := range pendingItems {
+ if f.IsDir() {
+ filepath.Walk(f.Name(), func(path string, info os.FileInfo, err error) error {
+ pendingItems <- info
+ })
+ } else {
+ visit(f)
+ }
+ }
+ }
+
+pendingItems channel has limited capacity. All workers can block on send to pendingItems.
+
+* Deadlock detection
+
+Another deadlock example:
+
+ var (
+ c = make(chan T, 100)
+ mtx sync.RWMutex
+ )
+
+ // goroutine 1 // goroutine 2 // goroutine 3
+ // does send // does receive // "resizes" the channel
+ mtx.RLock() mtx.RLock() mtx.Lock()
+ c <- v v := <-c tmp := make(chan T, 200)
+ mtx.RUnlock() mtx.RUnlock() copyAll(c, tmp)
+ c = tmp
+ mtx.Unlock()
+
+RWMutex is fair for both readers and writers: when a writer arrives, new readers are not let to enter the critical section.
+Goroutine 1 blocks on chan send; then goroutine 3 blocks on mtx.Lock; then goroutine 2 blocks on mtx.RLock.
+
+* Deadlock detection
+
+Research question: how to detect deadlocks on semaphores?
+
+No known theory to date.
+
+
+
+
+* Testing of the implementation
+
+.image research2/gopherswrench.jpg 240 405
+
+* Testing of the implementation
+
+So now we have a new language with several complex implementations:
+
+- lexer
+- parser
+- transformation and optimization passes
+- code generation
+- linker
+- channel and map operations
+- garbage collector
+- ...
+
+*How*do*you*test*it?*
+
+* Testing of the implementation
+
+Csmith is a tool that generates random C programs that statically and dynamically conform to the C99 standard.
+
+.image research2/csmith.png
+
+* Testing of the implementation
+
+Gosmith is a tool that generates random Go programs that statically and dynamically conform to the Go standard.
+
+Turned out to be much simpler than C: no undefined behavior all around!
+
+- no unitialized variables
+- no concurrent mutations between sequence points (x[i++] = --i)
+- no UB during signed overflow
+- total 191 kinds of undefined behavior and 52 kinds of unspecified behavior in C
+
+* Testing of the implementation
+
+But generates uninteresting programs from execution point of view: most of them deadlock or crash on nil deref.
+
+Trophies:
+
+- 31 bugs in gc compiler
+- 18 bugs in gccgo compiler
+- 5 bugs in llgo compiler
+- 1 bug in gofmt
+- 3 bugs in the spec
+
+.image research2/emoji.png
+
+* Testing of the implementation
+
+Research question: how to generate random *interesting*concurrent* Go programs?
+
+Must:
+
+- create and wait for goroutines
+- communicate over channels
+- protect data with mutexes (reader-writer)
+- pass data ownership between goroutines (explicitly and implicitly)
+
+Must not:
+
+- deadlock
+- cause data races
+- have non-deterministic results
+
+
+
+
+
+* Research and Go
+
+Plenty of research questions about how to implement Go well.
+
+- Concurrency
+- Scheduling
+- Garbage collection
+- Race and deadlock detection
+- Testing of the implementation
+- [Polymorphism]
+- [Program translation]
+
diff --git a/2014/research2/addr1.go b/2014/research2/addr1.go
new file mode 100644
index 0000000..7ff3986
--- /dev/null
+++ b/2014/research2/addr1.go
@@ -0,0 +1,83 @@
+// +build OMIT
+
+package main
+
+import (
+ "fmt"
+ "math/rand"
+ "time"
+)
+
+func lookup() {
+ for _, w := range worklist {
+ w.addrs, w.err = LookupHost(w.host)
+ }
+}
+
+func main() {
+ rand.Seed(time.Now().UnixNano())
+
+ t0 := time.Now()
+ lookup()
+
+ fmt.Printf("\n")
+ for _, w := range worklist {
+ if w.err != nil {
+ fmt.Printf("%s: error: %v\n", w.host, w.err)
+ continue
+ }
+ fmt.Printf("%s: %v\n", w.host, w.addrs)
+ }
+ fmt.Printf("total lookup time: %.3f seconds\n", time.Since(t0).Seconds())
+}
+
+var worklist = []*Work{
+ {host: "fast.com"},
+ {host: "slow.com"},
+ {host: "fast.missing.com"},
+ {host: "slow.missing.com"},
+}
+
+type Work struct {
+ host string
+ addrs []string
+ err error
+}
+
+func LookupHost(name string) (addrs []string, err error) {
+ t0 := time.Now()
+ defer func() {
+ fmt.Printf("lookup %s: %.3f seconds\n", name, time.Since(t0).Seconds())
+ }()
+ h := hosts[name]
+ if h == nil {
+ h = failure
+ }
+ return h(name)
+}
+
+type resolver func(string) ([]string, error)
+
+var hosts = map[string]resolver{
+ "fast.com": delay(10*time.Millisecond, fixedAddrs("10.0.0.1")),
+ "slow.com": delay(2*time.Second, fixedAddrs("10.0.0.4")),
+ "fast.missing.com": delay(10*time.Millisecond, failure),
+ "slow.missing.com": delay(2*time.Second, failure),
+}
+
+func fixedAddrs(addrs ...string) resolver {
+ return func(string) ([]string, error) {
+ return addrs, nil
+ }
+}
+
+func delay(d time.Duration, f resolver) resolver {
+ return func(name string) ([]string, error) {
+ time.Sleep(d/2 + time.Duration(rand.Int63n(int64(d/2))))
+ return f(name)
+ }
+}
+
+func failure(name string) ([]string, error) {
+ return nil, fmt.Errorf("unknown host %v", name)
+}
diff --git a/2014/research2/addr2.go b/2014/research2/addr2.go
new file mode 100644
index 0000000..4f9078f
--- /dev/null
+++ b/2014/research2/addr2.go
@@ -0,0 +1,92 @@
+// +build OMIT
+
+package main
+
+import (
+ "fmt"
+ "math/rand"
+ "time"
+)
+
+func lookup() {
+ done := make(chan bool)
+
+ for _, w := range worklist {
+ go func(w *Work) {
+ w.addrs, w.err = LookupHost(w.host)
+ done <- true
+ }(w)
+ }
+
+ for range worklist {
+ <-done
+ }
+}
+
+func main() {
+ rand.Seed(time.Now().UnixNano())
+
+ t0 := time.Now()
+ lookup()
+
+ fmt.Printf("\n")
+ for _, w := range worklist {
+ if w.err != nil {
+ fmt.Printf("%s: error: %v\n", w.host, w.err)
+ continue
+ }
+ fmt.Printf("%s: %v\n", w.host, w.addrs)
+ }
+ fmt.Printf("total lookup time: %.3f seconds\n", time.Since(t0).Seconds())
+}
+
+var worklist = []*Work{
+ {host: "fast.com"},
+ {host: "slow.com"},
+ {host: "fast.missing.com"},
+ {host: "slow.missing.com"},
+}
+
+type Work struct {
+ host string
+ addrs []string
+ err error
+}
+
+func LookupHost(name string) (addrs []string, err error) {
+ t0 := time.Now()
+ defer func() {
+ fmt.Printf("lookup %s: %.3f seconds\n", name, time.Since(t0).Seconds())
+ }()
+ h := hosts[name]
+ if h == nil {
+ h = failure
+ }
+ return h(name)
+}
+
+type resolver func(string) ([]string, error)
+
+var hosts = map[string]resolver{
+ "fast.com": delay(10*time.Millisecond, fixedAddrs("10.0.0.1")),
+ "slow.com": delay(2*time.Second, fixedAddrs("10.0.0.4")),
+ "fast.missing.com": delay(10*time.Millisecond, failure),
+ "slow.missing.com": delay(2*time.Second, failure),
+}
+
+func fixedAddrs(addrs ...string) resolver {
+ return func(string) ([]string, error) {
+ return addrs, nil
+ }
+}
+
+func delay(d time.Duration, f resolver) resolver {
+ return func(name string) ([]string, error) {
+ time.Sleep(d/2 + time.Duration(rand.Int63n(int64(d/2))))
+ return f(name)
+ }
+}
+
+func failure(name string) ([]string, error) {
+ return nil, fmt.Errorf("unknown host %v", name)
+}
diff --git a/2014/research2/busy.jpg b/2014/research2/busy.jpg
new file mode 100644
index 0000000..35ce9f4
--- /dev/null
+++ b/2014/research2/busy.jpg
Binary files differ
diff --git a/2014/research2/csmith.png b/2014/research2/csmith.png
new file mode 100644
index 0000000..6f2e3d7
--- /dev/null
+++ b/2014/research2/csmith.png
Binary files differ
diff --git a/2014/research2/datacenter.jpg b/2014/research2/datacenter.jpg
new file mode 100644
index 0000000..c2de0f3
--- /dev/null
+++ b/2014/research2/datacenter.jpg
Binary files differ
diff --git a/2014/research2/emoji.png b/2014/research2/emoji.png
new file mode 100644
index 0000000..aae97c0
--- /dev/null
+++ b/2014/research2/emoji.png
Binary files differ
diff --git a/2014/research2/gophercomplex6.jpg b/2014/research2/gophercomplex6.jpg
new file mode 100644
index 0000000..7963c1b
--- /dev/null
+++ b/2014/research2/gophercomplex6.jpg
Binary files differ
diff --git a/2014/research2/gopherswrench.jpg b/2014/research2/gopherswrench.jpg
new file mode 100644
index 0000000..690fdb0
--- /dev/null
+++ b/2014/research2/gopherswrench.jpg
Binary files differ
diff --git a/2014/research2/hello.go b/2014/research2/hello.go
new file mode 100644
index 0000000..6e52b38
--- /dev/null
+++ b/2014/research2/hello.go
@@ -0,0 +1,14 @@
+// +build OMIT
+
+package main
+
+import "fmt"
+
+func main() {
+ c := make(chan string)
+ go func() {
+ c <- "Hello"
+ c <- "World"
+ }()
+ fmt.Println(<-c, <-c)
+}
diff --git a/2014/research2/race.png b/2014/research2/race.png
new file mode 100644
index 0000000..8934a64
--- /dev/null
+++ b/2014/research2/race.png
Binary files differ
diff --git a/2014/research2/select.go b/2014/research2/select.go
new file mode 100644
index 0000000..ae1084b
--- /dev/null
+++ b/2014/research2/select.go
@@ -0,0 +1,23 @@
+package main
+
+import (
+ "fmt"
+ "time"
+)
+
+func main() {
+ c1 := make(chan int, 1)
+ c2 := make(chan int, 1)
+ c1 <- 42
+
+ select {
+ case v := <-c1:
+ fmt.Println("received from c1: ", v)
+ case c2 <- 1:
+ fmt.Println("sent to c2")
+ case <-time.After(time.Second):
+ fmt.Println("timed out")
+ default:
+ fmt.Println("nothing ready at the moment")
+ }
+}