talks: add "More Research Problems of Implementing Go" talk

Change-Id: I5a741d7cf7341fb243480d297506581f978aa377
Reviewed-on: https://go-review.googlesource.com/1382
Reviewed-by: Russ Cox <rsc@golang.org>
diff --git a/2014/research2.slide b/2014/research2.slide
new file mode 100644
index 0000000..b947170
--- /dev/null
+++ b/2014/research2.slide
@@ -0,0 +1,440 @@
+More Research Problems of Implementing Go
+
+Dmitry Vyukov
+Google
+
+http://golang.org/
+
+* About Go
+
+Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.
+
+Design began in late 2007.
+
+- Robert Griesemer, Rob Pike, Ken Thompson
+- Russ Cox, Ian Lance Taylor
+
+Became open source in November 2009.
+
+Developed entirely in the open; very active community.
+Language stable as of Go 1, early 2012.
+Work continues.
+
+* Motivation for Go
+
+.image research2/datacenter.jpg
+
+* Motivation for Go
+
+Started as an answer to software problems at Google:
+
+- multicore processors
+- networked systems
+- massive computation clusters
+- scale: 10⁷⁺ lines of code
+- scale: 10³⁺ programmers
+- scale: 10⁶⁺ machines (design point)
+
+Deployed: parts of YouTube, dl.google.com, Blogger, Google Code, Google Fiber, ...
+
+* Go
+
+A simple but powerful and fun language.
+
+- start with C, remove complex parts
+- add interfaces, concurrency
+- also: garbage collection, closures, reflection, strings, ...
+
+For more background on design:
+
+- [[http://commandcenter.blogspot.com/2012/06/less-is-exponentially-more.html][Less is exponentially more]]
+- [[http://talks.golang.org/2012/splash.article][Go at Google: Language Design in the Service of Software Engineering]]
+
+* Research and Go
+
+Go is designed for building production systems at Google.
+
+- Goal: make that job easier, faster, better.
+- Non-goal: break new ground in programming language research
+
+Plenty of research questions about how to implement Go well.
+
+- Concurrency
+- Scheduling
+- Garbage collection
+- Race and deadlock detection
+- Testing of the implementation
+
+
+
+
+* Concurrency
+
+.image research2/busy.jpg
+
+* Concurrency
+
+Go provides two important concepts:
+
+A goroutine is a thread of control within the program, with its own local variables and stack. Cheap, easy to create.
+
+A channel carries typed messages between goroutines.
+
+* Concurrency
+
+.play research2/hello.go
+
+* Concurrency: CSP
+
+Channels adopted from Hoare's Communicating Sequential Processes.
+
+- Orthogonal to rest of language
+- Can keep familiar model for computation
+- Focus on _composition_ of regular code
+
+Go _enables_ simple, safe concurrent programming.
+It doesn't _forbid_ bad programming.
+
+Caveat: not purely memory safe; sharing is legal.
+Passing a pointer over a channel is idiomatic.
+
+Experience shows this is practical.
+
+* Concurrency
+
+Sequential network address resolution, given a work list:
+
+.play research2/addr1.go /lookup/+1,/^}/-1
+
+* Concurrency
+
+Concurrent network address resolution, given a work list:
+
+.play research2/addr2.go /lookup/+1,/^}/-1
+
+* Concurrency
+
+Select statements: switch for communication.
+
+.play research2/select.go /select/,/^}/-1
+
+That's select that makes efficient implementation difficult.
+
+* Implementing Concurrency
+
+Challenge: Make channel communication scale
+
+- start with one global channel lock
+- per-channel locks, locked in address order for multi-channel operations
+
+Research question: lock-free channels?
+
+
+
+
+* Scheduling
+
+.image research2/gophercomplex6.jpg
+
+* Scheduling
+
+On the one hand we have arbitrary user programs:
+
+- fine-grained goroutines, coarse-grained goroutines or a mix of both
+- computational goroutines, IO-bound goroutines or a mix of both
+- arbitrary dynamic communication patterns
+- busy, idle, bursty programs
+
+No user hints!
+
+* Scheduling
+
+On the other hand we have complex hardware topology:
+
+- per-core caches
+- caches shared between cores
+- cores shared between hyper threads (HT)
+- multiple processors with non-uniform memory access (NUMA)
+
+* Scheduling
+
+Challenge: make it all magically work efficiently
+
+- start with one global lock for all scheduler state
+- distributed work-stealing scheduler with per-"processor" state
+- integrated network poller into scheduler
+- lock-free work queues
+
+* Scheduling
+
+Current scheduler:
+
+ ┌─┐         ┌─┐         ┌─┐         ┌─┐                  ┌─┐
+ │ │         │ │         │ │         │ │                  │ │
+ ├─┤         ├─┤         ├─┤         ├─┤                  ├─┤ Global
+ │ │         │G│         │ │         │ │                  │ │ state
+ ├─┤         ├─┤         ├─┤         ├─┤                  ├─┤
+ │G│         │G│         │G│         │ │                  │G│
+ ├─┤         ├─┤         ├─┤         ├─┤                  ├─┤
+ │G│         │G│         │G│         │G│                  │G│
+ └┬┘         └┬┘         └┬┘         └┬┘                  └─┘
+  │           │           │           │
+  ↓           ↓           ↓           ↓
+ ┌─┬──────┐  ┌─┬──────┐  ┌─┬──────┐  ┌─┬──────┐     ┌────┐┌──────┐┌───────┐
+ │P│mcache│  │P│mcache│  │P│mcache│  │P│mcache│     │heap││timers││netpoll│
+ └┬┴──────┘  └┬┴──────┘  └┬┴──────┘  └┬┴──────┘     └────┘└──────┘└───────┘
+  │           │           │           │
+  ↓           ↓           ↓           ↓
+ ┌─┐         ┌─┐         ┌─┐         ┌─┐               ┌─┐ ┌─┐ ┌─┐
+ │M│         │M│         │M│         │M│               │M│ │M│ │M│
+ └─┘         └─┘         └─┘         └─┘               └─┘ └─┘ └─┘
+
+G - goroutine; P - logical processor; M - OS thread (machine)
+
+* Scheduling
+
+Want:
+
+- temporal locality to exploit caches
+- spatial locality to exploit NUMA
+- schedule mostly LIFO but ensure weak fairness
+- allocate local memory and stacks
+- scan local memory in GC
+- collocate communicating goroutines
+- distribute non-communicating goroutines
+- distribute timers and network poller
+- poll network on the same core where last read was issued
+
+
+
+
+
+* Garbage Collection
+
+* Garbage Collection
+
+Garbage collection simplifies APIs.
+
+- In C and C++, too much API design (and too much programming effort!) is about memory management.
+
+Fundamental to concurrency: too hard to track ownership otherwise.
+
+Fundamental to interfaces: memory management details do not bifurcate otherwise-similar APIs.
+
+Of course, adds cost, latency, complexity in run time system.
+
+* Garbage Collection
+
+Plenty of research about garbage collection, mostly in Java context.
+
+- Parallel stop-the-world
+- CMS: concurrent mark-and-sweep, stop-the-world compaction
+- G1: region-based incremental copying collector
+
+Java collectors usually:
+
+- are generational/incremental because allocation rate is high
+- compact memory to support generations
+- have pauses because concurrent compaction is tricky and slow
+
+* Garbage Collection
+
+But Go is very different!
+
+- User can avoid lots of allocations by embedding objects:
+
+	type Point struct {
+		X, Y int
+	}
+	type Rectangle struct {
+		Min, Max Point
+	}
+
+- Less pointers.
+- Lots of stack allocations.
+- Interior pointers are allowed:
+
+	p := &rect.Max
+
+- Hundreds of thousands of stacks (goroutines)
+- No object headers so far
+
+* Implementing Garbage Collection
+
+Current GC: stop the world, parallel mark, start the world, concurrent sweep.
+Concurrent mark is almost ready.
+
+Cannot reuse Java GC algorithms directly.
+
+Research question: what GC algorithm is the best fit for Go?
+Do we need generations? Do we need compaction? What are efficient data structures that support interior pointers?
+
+
+
+
+* Race and deadlock detection
+
+.image research2/race.png 160 600
+
+* Race detection
+
+Based on ThreadSanitizer runtime, originally mainly targeted C/C++.
+Traditional happens-before race detector based on vector clocks (devil in details!).
+Works fine for Go, except:
+
+ $ go run -race lots_of_goroutines.go
+ race: limit on 8192 simultaneously alive goroutines is exceeded, dying
+
+Research question: race dectector that efficiently supports hundreds of thousands of goroutines?
+
+* Deadlock detection
+
+Deadlock on mutexes due to lock order inversion:
+
+ // thread 1                       // thread 2
+ pthread_mutex_lock(&m1);          pthread_mutex_lock(&m2);
+ pthread_mutex_lock(&m2);          pthread_mutex_lock(&m1);
+ ...                               ...
+ pthread_mutex_unlock(&m2);        pthread_mutex_unlock(&m1);
+ pthread_mutex_unlock(&m1);        pthread_mutex_unlock(&m2);
+
+Lock order inversions are easy to detect:
+
+- build "M1 is locked under M2" relation.
+- if it becomes cyclic, there is a potential deadlock.
+- whenever a new edge is added to the graph, do DFS to find cycles.
+
+* Deadlock detection
+
+Go has channels and mutexes. Channels are semaphores. A mutex can be unlocked in
+a different goroutine, so it is essentially a binary semaphore too.
+
+Deadlock example:
+
+	// Parallel file tree walk.
+	func worker(pendingItems chan os.FileInfo)
+		for f := range pendingItems {
+			if f.IsDir() {
+				filepath.Walk(f.Name(), func(path string, info os.FileInfo, err error) error {
+					pendingItems <- info
+				})
+			} else {
+				visit(f)
+			}
+		}
+	}
+
+pendingItems channel has limited capacity. All workers can block on send to pendingItems.
+
+* Deadlock detection
+
+Another deadlock example:
+
+ var (
+ 	c = make(chan T, 100)
+ 	mtx sync.RWMutex
+  )
+ 
+ // goroutine 1      // goroutine 2         // goroutine 3
+ // does send        // does receive        // "resizes" the channel
+ mtx.RLock()         mtx.RLock()            mtx.Lock()
+ c <- v              v := <-c               tmp := make(chan T, 200)
+ mtx.RUnlock()       mtx.RUnlock()          copyAll(c, tmp)
+                                            c = tmp
+                                            mtx.Unlock()
+
+RWMutex is fair for both readers and writers: when a writer arrives, new readers are not let to enter the critical section.
+Goroutine 1 blocks on chan send; then goroutine 3 blocks on mtx.Lock; then goroutine 2 blocks on mtx.RLock.
+
+* Deadlock detection
+
+Research question: how to detect deadlocks on semaphores?
+
+No known theory to date.
+
+
+
+
+* Testing of the implementation
+
+.image research2/gopherswrench.jpg 240 405
+
+* Testing of the implementation
+
+So now we have a new language with several complex implementations:
+
+- lexer
+- parser
+- transformation and optimization passes
+- code generation
+- linker
+- channel and map operations
+- garbage collector
+- ...
+
+*How*do*you*test*it?*
+
+* Testing of the implementation
+
+Csmith is a tool that generates random C programs that statically and dynamically conform to the C99 standard.
+
+.image research2/csmith.png
+
+* Testing of the implementation
+
+Gosmith is a tool that generates random Go programs that statically and dynamically conform to the Go standard.
+
+Turned out to be much simpler than C: no undefined behavior all around!
+
+- no unitialized variables
+- no concurrent mutations between sequence points (x[i++] = --i)
+- no UB during signed overflow
+- total 191 kinds of undefined behavior and 52 kinds of unspecified behavior in C
+
+* Testing of the implementation
+
+But generates uninteresting programs from execution point of view: most of them deadlock or crash on nil deref.
+
+Trophies:
+
+- 31 bugs in gc compiler
+- 18 bugs in gccgo compiler
+- 5 bugs in llgo compiler
+- 1 bug in gofmt
+- 3 bugs in the spec
+
+.image research2/emoji.png
+
+* Testing of the implementation
+
+Research question: how to generate random *interesting*concurrent* Go programs?
+
+Must:
+
+- create and wait for goroutines
+- communicate over channels
+- protect data with mutexes (reader-writer)
+- pass data ownership between goroutines (explicitly and implicitly)
+
+Must not:
+
+- deadlock
+- cause data races
+- have non-deterministic results
+
+
+
+
+
+* Research and Go
+
+Plenty of research questions about how to implement Go well.
+
+- Concurrency
+- Scheduling
+- Garbage collection
+- Race and deadlock detection
+- Testing of the implementation
+- [Polymorphism]
+- [Program translation]
+
diff --git a/2014/research2/addr1.go b/2014/research2/addr1.go
new file mode 100644
index 0000000..7ff3986
--- /dev/null
+++ b/2014/research2/addr1.go
@@ -0,0 +1,83 @@
+// +build OMIT
+
+package main
+
+import (
+	"fmt"
+	"math/rand"
+	"time"
+)
+
+func lookup() {
+	for _, w := range worklist {
+		w.addrs, w.err = LookupHost(w.host)
+	}
+}
+
+func main() {
+	rand.Seed(time.Now().UnixNano())
+
+	t0 := time.Now()
+	lookup()
+
+	fmt.Printf("\n")
+	for _, w := range worklist {
+		if w.err != nil {
+			fmt.Printf("%s: error: %v\n", w.host, w.err)
+			continue
+		}
+		fmt.Printf("%s: %v\n", w.host, w.addrs)
+	}
+	fmt.Printf("total lookup time: %.3f seconds\n", time.Since(t0).Seconds())
+}
+
+var worklist = []*Work{
+	{host: "fast.com"},
+	{host: "slow.com"},
+	{host: "fast.missing.com"},
+	{host: "slow.missing.com"},
+}
+
+type Work struct {
+	host  string
+	addrs []string
+	err   error
+}
+
+func LookupHost(name string) (addrs []string, err error) {
+	t0 := time.Now()
+	defer func() {
+		fmt.Printf("lookup %s: %.3f seconds\n", name, time.Since(t0).Seconds())
+	}()
+	h := hosts[name]
+	if h == nil {
+		h = failure
+	}
+	return h(name)
+}
+
+type resolver func(string) ([]string, error)
+
+var hosts = map[string]resolver{
+	"fast.com":         delay(10*time.Millisecond, fixedAddrs("10.0.0.1")),
+	"slow.com":         delay(2*time.Second, fixedAddrs("10.0.0.4")),
+	"fast.missing.com": delay(10*time.Millisecond, failure),
+	"slow.missing.com": delay(2*time.Second, failure),
+}
+
+func fixedAddrs(addrs ...string) resolver {
+	return func(string) ([]string, error) {
+		return addrs, nil
+	}
+}
+
+func delay(d time.Duration, f resolver) resolver {
+	return func(name string) ([]string, error) {
+		time.Sleep(d/2 + time.Duration(rand.Int63n(int64(d/2))))
+		return f(name)
+	}
+}
+
+func failure(name string) ([]string, error) {
+	return nil, fmt.Errorf("unknown host %v", name)
+}
diff --git a/2014/research2/addr2.go b/2014/research2/addr2.go
new file mode 100644
index 0000000..4f9078f
--- /dev/null
+++ b/2014/research2/addr2.go
@@ -0,0 +1,92 @@
+// +build OMIT
+
+package main
+
+import (
+	"fmt"
+	"math/rand"
+	"time"
+)
+
+func lookup() {
+	done := make(chan bool)
+
+	for _, w := range worklist {
+		go func(w *Work) {
+			w.addrs, w.err = LookupHost(w.host)
+			done <- true
+		}(w)
+	}
+
+	for range worklist {
+		<-done
+	}
+}
+
+func main() {
+	rand.Seed(time.Now().UnixNano())
+
+	t0 := time.Now()
+	lookup()
+
+	fmt.Printf("\n")
+	for _, w := range worklist {
+		if w.err != nil {
+			fmt.Printf("%s: error: %v\n", w.host, w.err)
+			continue
+		}
+		fmt.Printf("%s: %v\n", w.host, w.addrs)
+	}
+	fmt.Printf("total lookup time: %.3f seconds\n", time.Since(t0).Seconds())
+}
+
+var worklist = []*Work{
+	{host: "fast.com"},
+	{host: "slow.com"},
+	{host: "fast.missing.com"},
+	{host: "slow.missing.com"},
+}
+
+type Work struct {
+	host  string
+	addrs []string
+	err   error
+}
+
+func LookupHost(name string) (addrs []string, err error) {
+	t0 := time.Now()
+	defer func() {
+		fmt.Printf("lookup %s: %.3f seconds\n", name, time.Since(t0).Seconds())
+	}()
+	h := hosts[name]
+	if h == nil {
+		h = failure
+	}
+	return h(name)
+}
+
+type resolver func(string) ([]string, error)
+
+var hosts = map[string]resolver{
+	"fast.com":         delay(10*time.Millisecond, fixedAddrs("10.0.0.1")),
+	"slow.com":         delay(2*time.Second, fixedAddrs("10.0.0.4")),
+	"fast.missing.com": delay(10*time.Millisecond, failure),
+	"slow.missing.com": delay(2*time.Second, failure),
+}
+
+func fixedAddrs(addrs ...string) resolver {
+	return func(string) ([]string, error) {
+		return addrs, nil
+	}
+}
+
+func delay(d time.Duration, f resolver) resolver {
+	return func(name string) ([]string, error) {
+		time.Sleep(d/2 + time.Duration(rand.Int63n(int64(d/2))))
+		return f(name)
+	}
+}
+
+func failure(name string) ([]string, error) {
+	return nil, fmt.Errorf("unknown host %v", name)
+}
diff --git a/2014/research2/busy.jpg b/2014/research2/busy.jpg
new file mode 100644
index 0000000..35ce9f4
--- /dev/null
+++ b/2014/research2/busy.jpg
Binary files differ
diff --git a/2014/research2/csmith.png b/2014/research2/csmith.png
new file mode 100644
index 0000000..6f2e3d7
--- /dev/null
+++ b/2014/research2/csmith.png
Binary files differ
diff --git a/2014/research2/datacenter.jpg b/2014/research2/datacenter.jpg
new file mode 100644
index 0000000..c2de0f3
--- /dev/null
+++ b/2014/research2/datacenter.jpg
Binary files differ
diff --git a/2014/research2/emoji.png b/2014/research2/emoji.png
new file mode 100644
index 0000000..aae97c0
--- /dev/null
+++ b/2014/research2/emoji.png
Binary files differ
diff --git a/2014/research2/gophercomplex6.jpg b/2014/research2/gophercomplex6.jpg
new file mode 100644
index 0000000..7963c1b
--- /dev/null
+++ b/2014/research2/gophercomplex6.jpg
Binary files differ
diff --git a/2014/research2/gopherswrench.jpg b/2014/research2/gopherswrench.jpg
new file mode 100644
index 0000000..690fdb0
--- /dev/null
+++ b/2014/research2/gopherswrench.jpg
Binary files differ
diff --git a/2014/research2/hello.go b/2014/research2/hello.go
new file mode 100644
index 0000000..6e52b38
--- /dev/null
+++ b/2014/research2/hello.go
@@ -0,0 +1,14 @@
+// +build OMIT
+
+package main
+
+import "fmt"
+
+func main() {
+	c := make(chan string)
+	go func() {
+		c <- "Hello"
+		c <- "World"
+	}()
+	fmt.Println(<-c, <-c)
+}
diff --git a/2014/research2/race.png b/2014/research2/race.png
new file mode 100644
index 0000000..8934a64
--- /dev/null
+++ b/2014/research2/race.png
Binary files differ
diff --git a/2014/research2/select.go b/2014/research2/select.go
new file mode 100644
index 0000000..ae1084b
--- /dev/null
+++ b/2014/research2/select.go
@@ -0,0 +1,23 @@
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+func main() {
+	c1 := make(chan int, 1)
+	c2 := make(chan int, 1)
+	c1 <- 42
+
+	select {
+	case v := <-c1:
+		fmt.Println("received from c1: ", v)
+	case c2 <- 1:
+		fmt.Println("sent to c2")
+	case <-time.After(time.Second):
+		fmt.Println("timed out")
+	default:
+		fmt.Println("nothing ready at the moment")
+	}
+}