_content/talks/2012/waza.slide - website - Git at Google

 #_This presentation was the closing keynote of the Heroku Waza conference in January, 2012.
 #_It has been slightly modified here for clarity and for use in the "present" format; the original
 #_used a precursor to that tool.

 Concurrency is not Parallelism
 Waza Jan 11, 2012

 Rob Pike
 r@golang.org

 * Video

 This talk was presented at Heroku's Waza conference in January 2012.

 .link http://vimeo.com/49718712 Watch the talk on Vimeo

 * The modern world is parallel

 Multicore.

 Networks.

 Clouds of CPUs.

 Loads of users.

 Our technology should help.
 That's where concurrency comes in.

 * Go supports concurrency

 Go provides:

 - concurrent execution (goroutines)
 - synchronization and messaging (channels)
 - multi-way concurrent control (select)

 * Concurrency is cool! Yay parallelism!!

 NO! A fallacy.

 When Go was announced, many were confused by the distinction.

 "I ran the prime sieve with 4 processors and it got slower!"

 * Concurrency

 Programming as the composition of independently executing processes.

 (Processes in the general sense, not Linux processes. Famously hard to define.)

 * Parallelism

 Programming as the simultaneous execution of (possibly related) computations.

 * Concurrency vs. parallelism

 Concurrency is about dealing with lots of things at once.

 Parallelism is about doing lots of things at once.

 Not the same, but related.

 Concurrency is about structure, parallelism is about execution.

 Concurrency provides a way to structure a solution to solve a problem that may (but not necessarily) be parallelizable.

 * An analogy

 Concurrent: Mouse, keyboard, display, and disk drivers.

 Parallel: Vector dot product.

 * Concurrency plus communication

 Concurrency is a way to structure a program by breaking it into pieces that can be executed independently.

 Communication is the means to coordinate the independent executions.

 This is the Go model and (like Erlang and others) it's based on CSP:

 C. A. R. Hoare: Communicating Sequential Processes (CACM 1978)

 * Gophers

 This is too abstract. Let's get concrete.

 * Our problem

 Move a pile of obsolete language manuals to the incinerator.

 .image waza/gophersimple1.jpg

 With only one gopher this will take too long.

 * More gophers!

 .image waza/gophersimple3.jpg

 More gophers are not enough; they need more carts.

 * More gophers and more carts

 .image waza/gophersimple2.jpg

 This will go faster, but there will be bottlenecks at the pile and incinerator.
 Also need to synchronize the gophers.
 A message (that is, a communication between the gophers) will do.

 * Double everything

 Remove the bottleneck; make them really independent.

 .image waza/gophersimple4.jpg

 This will consume input twice as fast.

 * Concurrent composition

 .image waza/gophersimple4.jpg
 The concurrent composition of two gopher procedures.

 * Concurrent composition

 This design is not automatically parallel!

 What if only one gopher is moving at a time?
 Then it's still concurrent (that's in the design), just not parallel.

 However, it's automatically parallelizable!

 Moreover the concurrent composition suggests other models.

 * Another design

 .image waza/gophercomplex0.jpg

 Three gophers in action, but with likely delays.
 Each gopher is an independently executing procedure,
 plus coordination (communication).

 * Finer-grained concurrency

 Add another gopher procedure to return the empty carts.

 .image waza/gophercomplex1.jpg

 Four gophers in action for better flow, each doing one simple task.

 If we arrange everything right (implausible but not impossible), that's four times faster than our original one-gopher design.

 * Observation

 We improved performance by adding a concurrent procedure to the existing design.

 More gophers doing more work; it runs better.

 This is a deeper insight than mere parallelism.

 * Concurrent procedures

 Four distinct gopher procedures:

 - load books onto cart
 - move cart to incinerator
 - unload cart into incinerator
 - return empty cart

 Different concurrent designs enable different ways to parallelize.

 * More parallelization!

 We can now parallelize on the other axis; the concurrent design makes it easy. Eight gophers, all busy.

 .image waza/gophercomplex2.jpg

 * Or maybe no parallelization at all

 Keep in mind, even if only one gopher is active at a time (zero parallelism), it's still a correct and concurrent solution.

 .image waza/gophercomplex2.jpg

 * Another design

 Here's another way to structure the problem as the concurrent composition of gopher procedures.

 Two gopher procedures, plus a staging pile.

 .image waza/gophercomplex3.jpg

 * Parallelize the usual way

 Run more concurrent procedures to get more throughput.

 .image waza/gophercomplex4.jpg

 * Or a different way

 Bring the staging pile to the multi-gopher concurrent model:

 .image waza/gophercomplex5.jpg

 * Full on optimization

 Use all our techniques. Sixteen gophers hard at work!

 .image waza/gophercomplex6.jpg

 * Lesson

 There are many ways to break the processing down.

 That's concurrent design.

 Once we have the breakdown, parallelization can fall out and correctness is easy.

 * Back to Computing

 In our book transport problem, substitute:

 - book pile => web content
 - gopher => CPU
 - cart => marshaling, rendering, or networking
 - incinerator => proxy, browser, or other consumer

 It becomes a concurrent design for a scalable web service.
 Gophers serving web content.

 * A little background about Go

 Not the place for a tutorial, just quick highlights.

 * Goroutines

 A goroutine is a function running independently  in the same address space as other goroutines

 .code waza/snippets /f.runs/

 .code waza/snippets /f.starts.running/,/return/

 Like launching a function with shell's `&` notation.

 * Goroutines are not threads

 (They're a bit like threads, but they're much cheaper.)

 Goroutines are multiplexed onto OS threads as required.

 When a goroutine blocks, that thread blocks but no other goroutine blocks.

 * Channels

 Channels are typed values that allow goroutines to synchronize and exchange information.

 .code waza/snippets /make.*chan/,/completedAt/

 * Select

 The `select` statement is like a `switch`, but the decision is based on ability to communicate rather than equal values.

 .code waza/snippets /select/,/}/

 * Go really supports concurrency

 Really.

 It's routine to create thousands of goroutines in one program.
 (Once debugged a program after it had created 1.3 million.)

 Stacks start small, but grow and shrink as required.

 Goroutines aren't free, but they're very cheap.

 * Closures are also part of the story

 Make some concurrent calculations easier to express.

 They are just local functions.
 Here's a non-concurrent example:

 .code waza/snippets /Compose/,/sin,/

 * Some examples

 Learn concurrent Go by osmosis.

 * Launching daemons

 Use a closure to wrap a background operation.

 This copies items from the input channel to the output channel:

 .code waza/snippets /copy.input/,/^}/

 The `for` `range` operation runs until channel is drained.

 * A simple load balancer (1)

 A unit of work:

 .code waza/load1 /type/,/^}/

 * A simple load balancer (2)

 A worker task

 .code waza/load1 /worker/,/^}/

 Must make sure other workers can run when one blocks.

 * A simple load balancer (3)

 The runner

 .code waza/load1 /Run/,/^}/

 Easy problem but also hard to solve concisely without concurrency.

 * Concurrency enables parallelism

 The load balancer is implicitly parallel and scalable.

 `NumWorkers` could be huge.

 The tools of concurrency make it almost trivial to build a safe, working, scalable, parallel design.

 * Concurrency simplifies synchronization

 No explicit synchronization needed.

 The structure of the program is implicitly synchronized.

 * That was too easy

 Let's do a more realistic load balancer.

 * Load balancer

 .image waza/gopherchart.jpg

 * Request definition

 The requester sends Requests to the balancer

 .code waza/load2 /^type.Request/,/^}/

 Note the return channel inside the request.
 Channels are first-class values.

 * Requester function

 An artificial but illustrative simulation of a requester, a load generator.

 .code waza/load2 /^func.requester/,/^}/

 * Worker definition

 A channel of requests, plus some load tracking data.

 .code waza/load2 /type.Worker/,/^}/

 * Worker

 Balancer sends request to most lightly loaded worker

 .code waza/load2 /^func.*work.*done/,/^}/

 The channel of requests (`w.requests`) delivers requests to each worker.  The balancer tracks the number of pending requests as a measure of load.
 Each response goes directly to its requester.

 Could run the loop body as a goroutine for parallelism.

 * Balancer definition

 The load balancer needs a pool of workers and a single channel to which requesters can report task completion.

 .code waza/load2 /type.Pool/,/^}/

 * Balancer function

 Easy!

 .code waza/load2 /func.*balance/,/^}/

 Just need to implement dispatch and completed.

 * A heap of channels

 Make Pool an implementation of the `Heap` interface by providing a few methods such as:

 .code waza/load2 /func.*Less/,/^}/

 Now we balance by making the `Pool` a heap tracked by load.

 * Dispatch

 All the pieces are in place.

 .code waza/load2 /Send.Request/,/^}/

 * Completed

 .code waza/load2 /Job.is.complete/,/^}/

 * Lesson

 A complex problem can be broken down into easy-to-understand components.

 The pieces can be composed concurrently.

 The result is easy to understand, efficient, scalable, and correct.

 Maybe even parallel.

 * One more example

 We have a replicated database and want to minimize latency by asking them all and returning the first response to arrive.

 * Query a replicated database

 .code waza/snippets /func.Query/,/^}/
 Concurrent tools and garbage collection make this an easy solution to a subtle problem.

 (Teardown of late finishers is left as an exercise.)


 * Conclusion


 Concurrency is powerful.

 Concurrency is not parallelism.

 Concurrency enables parallelism.

 Concurrency makes parallelism (and scaling and everything else) easy.

 * For more information

 Go: golang.org

 Some history: swtch.com/~rsc/thread/

 A previous talk (video): tinyurl.com/newsqueak1

 Parallelism is not concurrency (Harper): tinyurl.com/pincharper

 A concurrent window system (Pike): tinyurl.com/pikecws

 Concurrent power series (McIlroy): tinyurl.com/powser

 And finally, parallel but not concurrent:
 research.google.com/archive/sawzall.html
	#_This presentation was the closing keynote of the Heroku Waza conference in January, 2012.
	#_It has been slightly modified here for clarity and for use in the "present" format; the original
	#_used a precursor to that tool.

	Concurrency is not Parallelism
	Waza Jan 11, 2012

	Rob Pike
	r@golang.org

	* Video

	This talk was presented at Heroku's Waza conference in January 2012.

	.link http://vimeo.com/49718712 Watch the talk on Vimeo

	* The modern world is parallel

	Multicore.

	Networks.

	Clouds of CPUs.

	Loads of users.

	Our technology should help.
	That's where concurrency comes in.

	* Go supports concurrency

	Go provides:

	- concurrent execution (goroutines)
	- synchronization and messaging (channels)
	- multi-way concurrent control (select)

	* Concurrency is cool! Yay parallelism!!

	NO! A fallacy.

	When Go was announced, many were confused by the distinction.

	"I ran the prime sieve with 4 processors and it got slower!"

	* Concurrency

	Programming as the composition of independently executing processes.

	(Processes in the general sense, not Linux processes. Famously hard to define.)

	* Parallelism

	Programming as the simultaneous execution of (possibly related) computations.

	* Concurrency vs. parallelism

	Concurrency is about dealing with lots of things at once.

	Parallelism is about doing lots of things at once.

	Not the same, but related.

	Concurrency is about structure, parallelism is about execution.

	Concurrency provides a way to structure a solution to solve a problem that may (but not necessarily) be parallelizable.

	* An analogy

	Concurrent: Mouse, keyboard, display, and disk drivers.

	Parallel: Vector dot product.

	* Concurrency plus communication

	Concurrency is a way to structure a program by breaking it into pieces that can be executed independently.

	Communication is the means to coordinate the independent executions.

	This is the Go model and (like Erlang and others) it's based on CSP:

	C. A. R. Hoare: Communicating Sequential Processes (CACM 1978)

	* Gophers

	This is too abstract. Let's get concrete.

	* Our problem

	Move a pile of obsolete language manuals to the incinerator.

	.image waza/gophersimple1.jpg

	With only one gopher this will take too long.

	* More gophers!

	.image waza/gophersimple3.jpg

	More gophers are not enough; they need more carts.

	* More gophers and more carts

	.image waza/gophersimple2.jpg

	This will go faster, but there will be bottlenecks at the pile and incinerator.
	Also need to synchronize the gophers.
	A message (that is, a communication between the gophers) will do.

	* Double everything

	Remove the bottleneck; make them really independent.

	.image waza/gophersimple4.jpg

	This will consume input twice as fast.

	* Concurrent composition

	.image waza/gophersimple4.jpg
	The concurrent composition of two gopher procedures.

	* Concurrent composition

	This design is not automatically parallel!

	What if only one gopher is moving at a time?
	Then it's still concurrent (that's in the design), just not parallel.

	However, it's automatically parallelizable!

	Moreover the concurrent composition suggests other models.

	* Another design

	.image waza/gophercomplex0.jpg

	Three gophers in action, but with likely delays.
	Each gopher is an independently executing procedure,
	plus coordination (communication).

	* Finer-grained concurrency

	Add another gopher procedure to return the empty carts.

	.image waza/gophercomplex1.jpg

	Four gophers in action for better flow, each doing one simple task.

	If we arrange everything right (implausible but not impossible), that's four times faster than our original one-gopher design.

	* Observation

	We improved performance by adding a concurrent procedure to the existing design.

	More gophers doing more work; it runs better.

	This is a deeper insight than mere parallelism.

	* Concurrent procedures

	Four distinct gopher procedures:

	- load books onto cart
	- move cart to incinerator
	- unload cart into incinerator
	- return empty cart

	Different concurrent designs enable different ways to parallelize.

	* More parallelization!

	We can now parallelize on the other axis; the concurrent design makes it easy. Eight gophers, all busy.

	.image waza/gophercomplex2.jpg

	* Or maybe no parallelization at all

	Keep in mind, even if only one gopher is active at a time (zero parallelism), it's still a correct and concurrent solution.

	.image waza/gophercomplex2.jpg

	* Another design

	Here's another way to structure the problem as the concurrent composition of gopher procedures.

	Two gopher procedures, plus a staging pile.

	.image waza/gophercomplex3.jpg

	* Parallelize the usual way

	Run more concurrent procedures to get more throughput.

	.image waza/gophercomplex4.jpg

	* Or a different way

	Bring the staging pile to the multi-gopher concurrent model:

	.image waza/gophercomplex5.jpg

	* Full on optimization

	Use all our techniques. Sixteen gophers hard at work!

	.image waza/gophercomplex6.jpg

	* Lesson

	There are many ways to break the processing down.

	That's concurrent design.

	Once we have the breakdown, parallelization can fall out and correctness is easy.

	* Back to Computing

	In our book transport problem, substitute:

	- book pile => web content
	- gopher => CPU
	- cart => marshaling, rendering, or networking
	- incinerator => proxy, browser, or other consumer

	It becomes a concurrent design for a scalable web service.
	Gophers serving web content.

	* A little background about Go

	Not the place for a tutorial, just quick highlights.

	* Goroutines

	A goroutine is a function running independently in the same address space as other goroutines

	.code waza/snippets /f.runs/

	.code waza/snippets /f.starts.running/,/return/

	Like launching a function with shell's `&` notation.

	* Goroutines are not threads

	(They're a bit like threads, but they're much cheaper.)

	Goroutines are multiplexed onto OS threads as required.

	When a goroutine blocks, that thread blocks but no other goroutine blocks.

	* Channels

	Channels are typed values that allow goroutines to synchronize and exchange information.

	.code waza/snippets /make.*chan/,/completedAt/

	* Select

	The `select` statement is like a `switch`, but the decision is based on ability to communicate rather than equal values.

	.code waza/snippets /select/,/}/

	* Go really supports concurrency

	Really.

	It's routine to create thousands of goroutines in one program.
	(Once debugged a program after it had created 1.3 million.)

	Stacks start small, but grow and shrink as required.

	Goroutines aren't free, but they're very cheap.

	* Closures are also part of the story

	Make some concurrent calculations easier to express.

	They are just local functions.
	Here's a non-concurrent example:

	.code waza/snippets /Compose/,/sin,/

	* Some examples

	Learn concurrent Go by osmosis.

	* Launching daemons

	Use a closure to wrap a background operation.

	This copies items from the input channel to the output channel:

	.code waza/snippets /copy.input/,/^}/

	The `for` `range` operation runs until channel is drained.

	* A simple load balancer (1)

	A unit of work:

	.code waza/load1 /type/,/^}/

	* A simple load balancer (2)

	A worker task

	.code waza/load1 /worker/,/^}/

	Must make sure other workers can run when one blocks.

	* A simple load balancer (3)

	The runner

	.code waza/load1 /Run/,/^}/

	Easy problem but also hard to solve concisely without concurrency.

	* Concurrency enables parallelism

	The load balancer is implicitly parallel and scalable.

	`NumWorkers` could be huge.

	The tools of concurrency make it almost trivial to build a safe, working, scalable, parallel design.

	* Concurrency simplifies synchronization

	No explicit synchronization needed.

	The structure of the program is implicitly synchronized.

	* That was too easy

	Let's do a more realistic load balancer.

	* Load balancer

	.image waza/gopherchart.jpg

	* Request definition

	The requester sends Requests to the balancer

	.code waza/load2 /^type.Request/,/^}/

	Note the return channel inside the request.
	Channels are first-class values.

	* Requester function

	An artificial but illustrative simulation of a requester, a load generator.

	.code waza/load2 /^func.requester/,/^}/

	* Worker definition

	A channel of requests, plus some load tracking data.

	.code waza/load2 /type.Worker/,/^}/

	* Worker

	Balancer sends request to most lightly loaded worker

	.code waza/load2 /^func.work.done/,/^}/

	The channel of requests (`w.requests`) delivers requests to each worker. The balancer tracks the number of pending requests as a measure of load.
	Each response goes directly to its requester.

	Could run the loop body as a goroutine for parallelism.

	* Balancer definition

	The load balancer needs a pool of workers and a single channel to which requesters can report task completion.

	.code waza/load2 /type.Pool/,/^}/

	* Balancer function

	Easy!

	.code waza/load2 /func.*balance/,/^}/

	Just need to implement dispatch and completed.

	* A heap of channels

	Make Pool an implementation of the `Heap` interface by providing a few methods such as:

	.code waza/load2 /func.*Less/,/^}/

	Now we balance by making the `Pool` a heap tracked by load.

	* Dispatch

	All the pieces are in place.

	.code waza/load2 /Send.Request/,/^}/

	* Completed

	.code waza/load2 /Job.is.complete/,/^}/

	* Lesson

	A complex problem can be broken down into easy-to-understand components.

	The pieces can be composed concurrently.

	The result is easy to understand, efficient, scalable, and correct.

	Maybe even parallel.

	* One more example

	We have a replicated database and want to minimize latency by asking them all and returning the first response to arrive.

	* Query a replicated database

	.code waza/snippets /func.Query/,/^}/
	Concurrent tools and garbage collection make this an easy solution to a subtle problem.

	(Teardown of late finishers is left as an exercise.)


	* Conclusion


	Concurrency is powerful.

	Concurrency is not parallelism.

	Concurrency enables parallelism.

	Concurrency makes parallelism (and scaling and everything else) easy.

	* For more information

	Go: golang.org

	Some history: swtch.com/~rsc/thread/

	A previous talk (video): tinyurl.com/newsqueak1

	Parallelism is not concurrency (Harper): tinyurl.com/pincharper

	A concurrent window system (Pike): tinyurl.com/pikecws

	Concurrent power series (McIlroy): tinyurl.com/powser

	And finally, parallel but not concurrent:
	research.google.com/archive/sawzall.html