go.talks: add 2012/waza
Concurrency is not Parallelism talk from Waza 2012.

R=golang-dev, campoy
CC=golang-dev
https://golang.org/cl/6847062
diff --git a/2012/waza.slide b/2012/waza.slide
new file mode 100644
index 0000000..b574eef
--- /dev/null
+++ b/2012/waza.slide
@@ -0,0 +1,447 @@
+# This presentation was the closing keynote of the Heroku Waza conference in January 2012.
+# It has been slightly modified here for clarity and for use in the "present" format; the original
+# used a precursor to that tool.
+
+Concurrency is not Parallelism
+Waza Jan 11, 2012
+
+Rob Pike
+r@golang.org
+
+* The modern world is parallel
+
+Multicore.
+
+Networks.
+
+Clouds of CPUs.
+
+Loads of users.
+
+Our technology should help.
+That's where concurrency comes in.
+
+* Go supports concurrency
+
+Go provides:
+
+- concurrent execution (goroutines)
+- synchronization and messaging (channels)
+- multi-way concurrent control (select)
+
+* Concurrency is cool! Yay parallelism!!
+
+NO! A fallacy.
+
+When Go was announced, many were confused by the distinction.
+
+"I ran the prime sieve with 4 processors and it got slower!"
+
+* Concurrency
+
+Programming as the composition of independently executing processes.
+
+(Processes in the general sense, not Linux processes. Famously hard to define.)
+
+* Parallelism
+
+Programming as the simultaneous execution of (possibly related) computations.
+
+* Concurrency vs. parallelism
+
+Concurrency is about dealing with lots of things at once.
+
+Parallelism is about doing lots of things at once.
+
+Not the same, but related.
+
+Concurrency is about structure, parallelism is about execution.
+
+Concurrency provides a way to structure a solution to a problem that may (but need not) be parallelizable.
+
+* An analogy
+
+Concurrent: Mouse, keyboard, display, and disk drivers.
+
+Parallel: Vector dot product.
+
+* Concurrency plus communication
+
+Concurrency is a way to structure a program by breaking it into pieces that can be executed independently.
+
+Communication is the means to coordinate the independent executions.
+
+This is the Go model and (like Erlang and others) it's based on CSP:
+
+C. A. R. Hoare: Communicating Sequential Processes (CACM 1978)
+
+* Gophers
+
+This is too abstract. Let's get concrete.
+
+* Our problem
+
+Move a pile of obsolete language manuals to the incinerator.
+
+.image waza/gophersimple1.jpg
+
+With only one gopher this will take too long.
+
+* More gophers!
+
+.image waza/gophersimple3.jpg
+
+More gophers are not enough; they need more carts.
+
+* More gophers and more carts
+
+.image waza/gophersimple2.jpg
+
+This will go faster, but there will be bottlenecks at the pile and incinerator.
+Also need to synchronize the gophers.
+A message (that is, a communication between the gophers) will do.
+
+* Double everything
+
+Remove the bottleneck; make them really independent.
+
+.image waza/gophersimple4.jpg
+
+This will consume input twice as fast.
+
+* Concurrent composition
+
+.image waza/gophersimple4.jpg
+The concurrent composition of two gopher procedures.
+
+* Concurrent composition
+
+This design is not automatically parallel!
+
+What if only one gopher is moving at a time?
+Then it's still concurrent (that's in the design), just not parallel.
+
+However, it's automatically parallelizable!
+
+Moreover the concurrent composition suggests other models.
+
+* Another design
+
+.image waza/gophercomplex0.jpg
+
+Three gophers in action, but with likely delays.
+Each gopher is an independently executing procedure,
+plus coordination (communication).
+
+* Finer-grained concurrency
+
+Add another gopher procedure to return the empty carts.
+
+.image waza/gophercomplex1.jpg
+
+Four gophers in action for better flow, each doing one simple task.
+
+If we arrange everything right (implausible but not impossible), that's four times faster than our original one-gopher design.
+
+* Observation
+
+We improved performance by adding a concurrent procedure to the existing design.
+
+More gophers doing more work; it runs better.
+
+This is a deeper insight than mere parallelism.
+
+* Concurrent procedures
+
+Four distinct gopher procedures:
+
+- load books onto cart
+- move cart to incinerator
+- unload cart into incinerator
+- return empty cart
+
+Different concurrent designs enable different ways to parallelize.
+
+* More parallelization!
+
+We can now parallelize on the other axis; the concurrent design makes it easy. Eight gophers, all busy.
+
+.image waza/gophercomplex2.jpg
+
+* Or maybe no parallelization at all
+
+Keep in mind, even if only one gopher is active at a time (zero parallelism), it's still a correct and concurrent solution.
+
+.image waza/gophercomplex2.jpg
+
+* Another design
+
+Here's another way to structure the problem as the concurrent composition of gopher procedures.
+
+Two gopher procedures, plus a staging pile.
+
+.image waza/gophercomplex3.jpg
+
+* Parallelize the usual way
+
+Run more concurrent procedures to get more throughput.
+
+.image waza/gophercomplex4.jpg
+
+* Or a different way
+
+Bring the staging pile to the multi-gopher concurrent model:
+
+.image waza/gophercomplex5.jpg
+
+* Full-on optimization
+
+Use all our techniques. Sixteen gophers hard at work!
+
+.image waza/gophercomplex6.jpg
+
+* Lesson
+
+There are many ways to break the processing down.
+
+That's concurrent design.
+
+Once we have the breakdown, parallelization can fall out and correctness is easy.
+
+* Back to Computing
+
+In our book transport problem, substitute:
+
+- book pile => web content
+- gopher => CPU
+- cart => marshaling, rendering, or networking
+- incinerator => proxy, browser, or other consumer
+
+It becomes a concurrent design for a scalable web service.
+Gophers serving web content.
+
+* A little background about Go
+
+Not the place for a tutorial, just quick highlights.
+
+* Goroutines
+
+A goroutine is a function running independently in the same address space as other goroutines.
+
+.code waza/snippets /f.runs/
+
+.code waza/snippets /f.starts.running/,/return/
+
+Like launching a function with shell's `&` notation.
+
+* Goroutines are not threads 
+
+(They're a bit like threads, but they're much cheaper.)
+
+Goroutines are multiplexed onto OS threads as required.
+
+When a goroutine blocks, that thread blocks but no other goroutine blocks.
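+
+A rough sketch of that property (not from the original talk): the blocked goroutine is parked while the others keep going.
+
+	ch := make(chan int)         // nothing is ever sent on ch
+	go func() { <-ch }()         // blocks forever, but holds up nothing else
+	go fmt.Println("unaffected") // keeps running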
+
+* Channels
+
+Channels are typed values that allow goroutines to synchronize and exchange information.
+
+.code waza/snippets /make.*chan/,/completedAt/
+
+* Select
+
+The `select` statement is like a `switch`, but the decision is based on the ability to communicate rather than on equal values.
+
+.code waza/snippets /select/,/}/
+
+* Go really supports concurrency
+
+Really.
+
+It's routine to create thousands of goroutines in one program.
+(Once debugged a program after it had created 1.3 million.)
+
+Stacks start small, but grow and shrink as required.
+
+Goroutines aren't free, but they're very cheap.
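+
+As a rough illustration (not part of the talk), launching a hundred thousand goroutines and waiting for them all is unremarkable:
+
+	var wg sync.WaitGroup
+	for i := 0; i < 100000; i++ {
+		wg.Add(1)
+		go func() {
+			defer wg.Done() // each goroutine does a tiny bit of work
+		}()
+	}
+	wg.Wait()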
+
+* Closures are also part of the story
+
+Make some concurrent calculations easier to express.
+
+They are just local functions.
+Here's a non-concurrent example:
+
+.code waza/snippets /Compose/,/sin,/
+
+* Some examples
+
+Learn concurrent Go by osmosis.
+
+* Launching daemons
+
+Use a closure to wrap a background operation.
+
+This copies items from the input channel to the output channel:
+
+.code waza/snippets /copy.input/,/^}/
+
+The `for` `range` operation runs until the channel is drained.
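+
+The loop (and the daemon) ends once `input` is closed and emptied; a hypothetical producer might finish like this:
+
+	go func() {
+		for _, v := range values { // values: a hypothetical slice of work items
+			input <- v
+		}
+		close(input) // lets the copying goroutine's range loop finish
+	}()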
+
+* A simple load balancer (1)
+
+A unit of work:
+
+.code waza/load1 /type/,/^}/
+
+* A simple load balancer (2)
+
+A worker task:
+
+.code waza/load1 /worker/,/^}/
+
+Must make sure other workers can run when one blocks.
+
+* A simple load balancer (3)
+
+The runner:
+
+.code waza/load1 /Run/,/^}/
+
+An easy problem, but one that's hard to solve concisely without concurrency.
+
+* Concurrency enables parallelism
+
+The load balancer is implicitly parallel and scalable.
+
+`NumWorkers` could be huge.
+
+The tools of concurrency make it almost trivial to build a safe, working, scalable, parallel design.
+
+* Concurrency simplifies synchronization
+
+No explicit synchronization needed.
+
+The structure of the program is implicitly synchronized.
+
+* That was too easy
+
+Let's do a more realistic load balancer.
+
+* Load balancer
+
+.image waza/gopherchart.jpg
+
+* Request definition
+
+The requester sends Requests to the balancer.
+
+.code waza/load2 /^type.Request/,/^}/
+
+Note the return channel inside the request.
+Channels are first-class values.
+
+* Requester function
+
+An artificial but illustrative simulation of a requester, a load generator.
+
+.code waza/load2 /^func.requester/,/^}/
+
+* Worker definition
+
+A channel of requests, plus some load tracking data.
+
+.code waza/load2 /type.Worker/,/^}/
+
+* Worker
+
+The balancer sends each request to the most lightly loaded worker.
+
+.code waza/load2 /^func.*work.*done/,/^}/
+
+The channel of requests (`w.requests`) delivers requests to each worker. The balancer tracks the number of pending requests as a measure of load.
+Each response goes directly to its requester.
+
+Could run the loop body as a goroutine for parallelism.
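+
+A sketch of that variant (not the code used here):
+
+	for {
+		req := <-w.requests
+		go func(req Request) { // handle each request in its own goroutine
+			req.c <- req.fn()
+			done <- w
+		}(req)
+	}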
+
+* Balancer definition
+
+The load balancer needs a pool of workers and a single channel on which the workers report task completion.
+
+.code waza/load2 /type.Pool/,/^}/
+
+* Balancer function
+
+Easy!
+
+.code waza/load2 /func.*balance/,/^}/
+
+Just need to implement `dispatch` and `completed`.
+
+* A heap of channels
+
+Make `Pool` an implementation of `heap.Interface` by providing a few methods such as:
+
+.code waza/load2 /func.*Less/,/^}/
+
+Now we balance by making the `Pool` a heap ordered by load.
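+
+The other `heap.Interface` methods (`Len`, `Swap`, `Push`, `Pop`) are just as small. A sketch of two of them, written against the `Worker` shown earlier (runnable versions are in balance.go in this directory):
+
+	func (p Pool) Len() int { return len(p) }
+	func (p Pool) Swap(i, j int) {
+		p[i], p[j] = p[j], p[i]
+		p[i].index = i // keep each worker's heap index current
+		p[j].index = j
+	}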
+
+* Dispatch
+
+All the pieces are in place.
+
+.code waza/load2 /Send.Request/,/^}/
+
+* Completed
+
+.code waza/load2 /Job.is.complete/,/^}/
+
+* Lesson
+
+A complex problem can be broken down into easy-to-understand components.
+
+The pieces can be composed concurrently.
+
+The result is easy to understand, efficient, scalable, and correct.
+
+Maybe even parallel.
+
+* One more example
+
+We have a replicated database and want to minimize latency by asking all the replicas and returning the first response to arrive.
+
+* Query a replicated database
+
+.code waza/snippets /func.Query/,/^}/
+Concurrent tools and garbage collection make this an easy solution to a subtle problem.
+
+(Teardown of late finishers is left as an exercise.)
+
+
+* Conclusion
+
+
+Concurrency is powerful.
+
+Concurrency is not parallelism.
+
+Concurrency enables parallelism.
+
+Concurrency makes parallelism (and scaling and everything else) easy.
+
+* For more information
+
+Go: golang.org
+
+Some history: swtch.com/~rsc/thread/
+
+A previous talk (video): tinyurl.com/newsqueak
+
+Parallelism is not concurrency (Harper): tinyurl.com/pincharper
+
+A concurrent window system (Pike): tinyurl.com/pikecws
+
+Concurrent power series (McIlroy): tinyurl.com/powser
+
+And finally, parallel but not concurrent:
+research.google.com/archive/sawzall.html
\ No newline at end of file
diff --git a/2012/waza/balance.go b/2012/waza/balance.go
new file mode 100644
index 0000000..ec30cea
--- /dev/null
+++ b/2012/waza/balance.go
@@ -0,0 +1,160 @@
+package main
+
+import (
+	"container/heap"
+	"fmt"
+	"math/rand"
+	"time"
+)
+
+const nRequester = 100
+const nWorker = 10
+
+// Simulation of some work: just sleep for a while and report how long.
+func op() int {
+	n := rand.Int63n(int64(time.Second))
+	time.Sleep(time.Duration(nWorker * n))
+	return int(n)
+}
+
+type Request struct {
+	fn func() int
+	c  chan int
+}
+
+func requester(work chan Request) {
+	c := make(chan int)
+	for {
+		time.Sleep(time.Duration(rand.Int63n(int64(nWorker * 2 * time.Second))))
+		work <- Request{op, c}
+		<-c
+	}
+}
+
+type Worker struct {
+	i        int
+	requests chan Request
+	pending  int
+}
+
+func (w *Worker) work(done chan *Worker) {
+	for {
+		req := <-w.requests
+		req.c <- req.fn()
+		done <- w
+	}
+}
+
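+// Pool is a heap of Workers ordered by pending count; it implements heap.Interface.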
+type Pool []*Worker
+
+func (p Pool) Len() int { return len(p) }
+
+func (p Pool) Less(i, j int) bool {
+	return p[i].pending < p[j].pending
+}
+
+func (p *Pool) Swap(i, j int) {
+	a := *p
+	a[i], a[j] = a[j], a[i]
+	a[i].i = i
+	a[j].i = j
+}
+
+func (p *Pool) Push(x interface{}) {
+	a := *p
+	n := len(a)
+	a = a[0 : n+1]
+	w := x.(*Worker)
+	a[n] = w
+	w.i = n
+	*p = a
+}
+
+func (p *Pool) Pop() interface{} {
+	a := *p
+	*p = a[0 : len(a)-1]
+	w := a[len(a)-1]
+	w.i = -1 // for safety
+	return w
+}
+
+type Balancer struct {
+	pool Pool
+	done chan *Worker
+	i    int
+}
+
+func NewBalancer() *Balancer {
+	done := make(chan *Worker, nWorker)
+	b := &Balancer{make(Pool, 0, nWorker), done, 0}
+	for i := 0; i < nWorker; i++ {
+		w := &Worker{requests: make(chan Request, nRequester)}
+		heap.Push(&b.pool, w)
+		go w.work(b.done)
+	}
+	return b
+}
+
+func (b *Balancer) balance(work chan Request) {
+	for {
+		select {
+		case req := <-work:
+			b.dispatch(req)
+		case w := <-b.done:
+			b.completed(w)
+		}
+		b.print()
+	}
+}
+
+func (b *Balancer) print() {
+	sum := 0
+	sumsq := 0
+	for _, w := range b.pool {
+		fmt.Printf("%d ", w.pending)
+		sum += w.pending
+		sumsq += w.pending * w.pending
+	}
+	avg := float64(sum) / float64(len(b.pool))
+	variance := float64(sumsq)/float64(len(b.pool)) - avg*avg
+	fmt.Printf(" %.2f %.2f\n", avg, variance)
+}
+
+func (b *Balancer) dispatch(req Request) {
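+	// Disabled alternative: round-robin dispatch instead of least-loaded.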
+	if false {
+		w := b.pool[b.i]
+		w.requests <- req
+		w.pending++
+		b.i++
+		if b.i >= len(b.pool) {
+			b.i = 0
+		}
+		return
+	}
+
+	w := heap.Pop(&b.pool).(*Worker)
+	w.requests <- req
+	w.pending++
+	//	fmt.Printf("started %p; now %d\n", w, w.pending)
+	heap.Push(&b.pool, w)
+}
+
+func (b *Balancer) completed(w *Worker) {
+	if false {
+		w.pending--
+		return
+	}
+
+	w.pending--
+	//	fmt.Printf("finished %p; now %d\n", w, w.pending)
+	heap.Remove(&b.pool, w.i)
+	heap.Push(&b.pool, w)
+}
+
+func main() {
+	work := make(chan Request)
+	for i := 0; i < nRequester; i++ {
+		go requester(work)
+	}
+	NewBalancer().balance(work)
+}
diff --git a/2012/waza/gopherchart.jpg b/2012/waza/gopherchart.jpg
new file mode 100644
index 0000000..5465da8
--- /dev/null
+++ b/2012/waza/gopherchart.jpg
Binary files differ
diff --git a/2012/waza/gophercomplex0.jpg b/2012/waza/gophercomplex0.jpg
new file mode 100644
index 0000000..3ea0738
--- /dev/null
+++ b/2012/waza/gophercomplex0.jpg
Binary files differ
diff --git a/2012/waza/gophercomplex1.jpg b/2012/waza/gophercomplex1.jpg
new file mode 100644
index 0000000..5813956
--- /dev/null
+++ b/2012/waza/gophercomplex1.jpg
Binary files differ
diff --git a/2012/waza/gophercomplex2.jpg b/2012/waza/gophercomplex2.jpg
new file mode 100644
index 0000000..45d2314
--- /dev/null
+++ b/2012/waza/gophercomplex2.jpg
Binary files differ
diff --git a/2012/waza/gophercomplex3.jpg b/2012/waza/gophercomplex3.jpg
new file mode 100644
index 0000000..9cbf16d
--- /dev/null
+++ b/2012/waza/gophercomplex3.jpg
Binary files differ
diff --git a/2012/waza/gophercomplex4.jpg b/2012/waza/gophercomplex4.jpg
new file mode 100644
index 0000000..d87123e
--- /dev/null
+++ b/2012/waza/gophercomplex4.jpg
Binary files differ
diff --git a/2012/waza/gophercomplex5.jpg b/2012/waza/gophercomplex5.jpg
new file mode 100644
index 0000000..35ce9f4
--- /dev/null
+++ b/2012/waza/gophercomplex5.jpg
Binary files differ
diff --git a/2012/waza/gophercomplex6.jpg b/2012/waza/gophercomplex6.jpg
new file mode 100644
index 0000000..7963c1b
--- /dev/null
+++ b/2012/waza/gophercomplex6.jpg
Binary files differ
diff --git a/2012/waza/gophersimple1.jpg b/2012/waza/gophersimple1.jpg
new file mode 100644
index 0000000..0663f6b
--- /dev/null
+++ b/2012/waza/gophersimple1.jpg
Binary files differ
diff --git a/2012/waza/gophersimple2.jpg b/2012/waza/gophersimple2.jpg
new file mode 100644
index 0000000..d626b63
--- /dev/null
+++ b/2012/waza/gophersimple2.jpg
Binary files differ
diff --git a/2012/waza/gophersimple3.jpg b/2012/waza/gophersimple3.jpg
new file mode 100644
index 0000000..ed10996
--- /dev/null
+++ b/2012/waza/gophersimple3.jpg
Binary files differ
diff --git a/2012/waza/gophersimple4.jpg b/2012/waza/gophersimple4.jpg
new file mode 100644
index 0000000..bb7ddd7
--- /dev/null
+++ b/2012/waza/gophersimple4.jpg
Binary files differ
diff --git a/2012/waza/load1 b/2012/waza/load1
new file mode 100644
index 0000000..b5a265d
--- /dev/null
+++ b/2012/waza/load1
@@ -0,0 +1,20 @@
+type Work struct {
+    x, y, z int
+}
+
+func worker(in <-chan *Work, out chan<- *Work) {
+    for w := range in {
+        w.z = w.x * w.y
+        Sleep(w.z)
+        out <- w
+    }
+}
+
+func Run() {
+    in, out := make(chan *Work), make(chan *Work)
+    for i := 0; i < NumWorkers; i++ {
+        go worker(in, out)
+    }
+    go sendLotsOfWork(in)
+    receiveLotsOfResults(out)
+}
diff --git a/2012/waza/load2 b/2012/waza/load2
new file mode 100644
index 0000000..438a018
--- /dev/null
+++ b/2012/waza/load2
@@ -0,0 +1,76 @@
+type Request struct {
+    fn func() int  // The operation to perform.
+    c  chan int    // The channel to return the result.
+}
+
+func requester(work chan<- Request) {
+    c := make(chan int)
+    for {
+    	// Kill some time (fake load).
+        Sleep(rand.Int63n(nWorker * 2 * Second))
+        work <- Request{workFn, c} // send request
+        result := <-c              // wait for answer
+        furtherProcess(result)
+    }
+}
+
+func (w *Worker) work(done chan *Worker) {
+    for {
+        req := <-w.requests // get Request from balancer
+        req.c <- req.fn()   // call fn and send result
+        done <- w           // we've finished this request
+    }
+}
+
+type Pool []*Worker
+
+type Balancer struct {
+	pool Pool
+	done chan *Worker
+}
+
+func (b *Balancer) balance(work chan Request) {
+    for {
+        select {
+        case req := <-work: // received a Request...
+            b.dispatch(req) // ...so send it to a Worker
+        case w := <-b.done: // a worker has finished ...
+            b.completed(w)  // ...so update its info
+        }
+    }
+}
+func (p Pool) Less(i, j int) bool {
+    return p[i].pending < p[j].pending
+}
+
+type Worker struct {
+    requests chan Request // work to do (buffered channel)
+    pending  int          // count of pending tasks
+    index    int          // index in the heap
+}
+
+// Send Request to worker
+func (b *Balancer) dispatch(req Request) {
+    // Grab the least loaded worker...
+    w := heap.Pop(&b.pool).(*Worker)
+    // ...send it the task.
+    w.requests <- req
+    // One more in its work queue.
+    w.pending++
+    // Put it into its place on the heap.
+    heap.Push(&b.pool, w)
+}
+
+// Job is complete; update heap
+func (b *Balancer) completed(w *Worker) {
+    // One fewer in the queue.
+    w.pending--
+    // Remove it from the heap.
+    heap.Remove(&b.pool, w.index)
+    // Put it into its place on the heap.
+    heap.Push(&b.pool, w)
+}
diff --git a/2012/waza/snippets b/2012/waza/snippets
new file mode 100644
index 0000000..6ac4e6a
--- /dev/null
+++ b/2012/waza/snippets
@@ -0,0 +1,64 @@
+f("hello", "world") // f runs; we wait
+
+go f("hello", "world") // f starts running
+g() // does not wait for f to return
+
+timerChan := make(chan time.Time)
+go func() {
+    time.Sleep(deltaT)
+    timerChan <- time.Now() // send time on timerChan
+}()
+// Do something else; when ready, receive.
+// Receive will block until timerChan delivers.
+// Value sent is other goroutine's completion time.
+completedAt := <-timerChan
+
+select {
+case v := <-ch1:
+    fmt.Println("channel 1 sends", v)
+case v := <-ch2:
+    fmt.Println("channel 2 sends", v)
+default: // optional
+    fmt.Println("neither channel was ready")
+}
+
+func Query(conns []Conn, query string) Result {
+    ch := make(chan Result, len(conns))  // buffered
+    for _, conn := range conns {
+        go func(c Conn) {
+            ch <- c.DoQuery(query)
+        }(conn)
+    }
+    return <-ch
+}
+
+func XQuery(conns []Conn, query string) Result {
+    ch := make(chan Result, 1)  // buffer of 1 item
+    for _, conn := range conns {
+        go func(c Conn) {
+            select {
+            case ch <- c.DoQuery(query):
+                // nothing to do
+            default: // executes if ch is blocked
+                // nothing to do
+            }
+        }(conn)
+    }
+    return <-ch
+}
+
+
+func Compose(f, g func(x float64) float64) func(x float64) float64 {
+    return func(x float64) float64 {
+        return f(g(x))
+    }
+}
+
+print(Compose(sin, cos)(0.5))
+
+go func() { // copy input to output
+	for val := range input {
+		output <- val
+	}
+}()