blob: dc1acd4e212a7eb6883fa2e4b20eee6bf528c9f0 [file] [log] [blame]
Go, for Distributed Systems
Russ Cox
Google
* About the Talk
I gave variants of this talk three times in 2013, once at SOSP's Programming Languages and Operating Systems (PLOS) workshop, once at MIT Lincoln Lab's annual Software Engineering Symposium, and once at Twitter's Cambridge, Massachusetts office.
The talk assumes an audience familiar with the basic problems of building distributed systems. It presents Go's approach to solving some of those problems.
* Go
Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.
* History
Design began in late 2007.
- Robert Griesemer, Rob Pike, Ken Thompson
- Ian Lance Taylor, Russ Cox
Became open source in November 2009.
Developed entirely in the open; very active community.
Language stable as of Go 1, early 2012.
* Motivation
Started as an answer to software problems at Google:
- multicore processors
- networked systems
- massive computation clusters
- scale: 10 lines of code
- scale: 10³ programmers
- scale: 10⁶⁺ machines (design point)
* Go
A simple but powerful and fun language.
- start with C, remove complex parts
- add interfaces, concurrency
- also: garbage collection, closures, reflection, strings, ...
For more background on design:
- [[http://commandcenter.blogspot.com/2012/06/less-is-exponentially-more.html][Less is exponentially more]]
- [[http://talks.golang.org/2012/splash.article][Go at Google: Language Design in the Service of Software Engineering]]
* This Talk
Engineering
Interfaces
Concurrency
* Engineering
* Engineering: Imports
.play distsys/hello0.go
import "fmt" guaranteed to read exactly one file.
* Engineering: Imports
$ go get github.com/golang/glog
.play distsys/hello1.go
Still guaranteed to read exactly one file.
Import name space is decentralized.
* Engineering: Program Rewrites
$ gofmt -r 'glog.Infof -> glog.Errorf' hello1.go
package main
import (
"flag"
"github.com/golang/glog"
)
func main() {
flag.Set("logtostderr", "true")
glog.Errorf("hello, world")
}
$
* Engineering: Garbage Collection
In C and C++, too much programming _and_ API design is about memory management.
Go has garbage collection, only.
Fundamental for interfaces: memory management details do not bifurcate otherwise-similar APIs.
Fundamental for concurrency: too hard to track ownership otherwise.
Of course, adds cost, latency, complexity in run time system.
* Engineering: Garbage Collection
Experience with Java: Uncontrollable cost, too much tuning.
Go lets you limit allocation by controlling memory layout.
Examples:
type Ring struct {
R, W int
Data [512]byte
}
type Point struct {
X, Y int
}
type Rectangle struct {
Min, Max Point
}
* Engineering: Garbage Collection
Garbage collector implementation remains an active area of work and research.
Design decision: Interior pointers are allowed, as are foreign pointers.
- Cannot reuse Java GC algorithms directly.
- But gives _programmer_ more control over allocation.
Current design: parallel mark-and-sweep.
With care to use memory wisely, works well in production.
* Interfaces
* Interfaces
An interface defines a set of methods.
package io
type Writer interface {
Write(data []byte) (n int, err error)
}
* Interfaces
A type implements the interface by implementing the methods.
package bytes
type Buffer struct {
...
}
func (b *Buffer) Write(data []byte) (n int, err error) {
...
}
* Interfaces
An implementation of an interface can be assigned to a variable of that interface type.
package fmt
func Fprintf(w io.Writer, format string, args ...interface{})
* Interfaces
.play distsys/writebuffer.go /^func.main/+1,/^}/-1
* Interfaces
Reader is the obvious counterpart.
package io
type Reader interface {
Read(data []byte) (n int, err error)
}
func Copy(dst Writer, src Reader) (n int64, err error)
* Interfaces
.play distsys/writebuffer2.go /^func.main/+1,/^}/-1
* Interfaces
Reader and Writer turn out to be very useful.
Adapters
package io
func MultiWriter(writers ...Writer) Writer
MultiWriter creates a writer that duplicates its writes to all the
provided writers, similar to the Unix tee(1) command.
Chaining
package gzip // compress/gzip
func NewWriter(w io.Writer) *Writer
Also: buffered writers, encrypted writers, limited writers, HTTP responses.
* Interfaces
Networking:
package net
type Conn interface {
Read(b []byte) (n int, err error)
Write(b []byte) (n int, err error)
Close() error
LocalAddr() Addr
RemoteAddr() Addr
SetDeadline(t time.Time) error
SetReadDeadline(t time.Time) error
SetWriteDeadline(t time.Time) error
}
func Dial(network, address string) (Conn, error)
* Interfaces
Networking example:
.play distsys/finger.go /^func.finger/+1,/^}/-1
* Interfaces
Networking client as adapter function:
package smtp
func NewClient(conn net.Conn, host string) (*Client, error)
Other implementations of net.Conn: testing, SSL, ...
* Interface Lessons
Key advantages:
- no dependence between interface and implementation
- expressive composition
- easy testing
- avoids overdesign, rigid hierarchy of inheritance-based OO
The source of all generality in the Go language.
* Concurrency
* Concurrency vs Parallelism
Concurrency is about dealing with lots of things at once.
Parallelism is about doing lots of things at once.
Concurrency is about structure, parallelism is about execution.
Concurrency provides a way to structure a solution to solve a problem that may be parallelizable (or not).
* Concurrency vs Parallelism
Concurrent: mouse, keyboard, display, and disk drivers in operating system.
Parallel: vector dot product, matrix multiply.
Concurrency can enable parallelism but is useful on its own: modern programs must deal with many things at once.
* Concurrency
Go provides two important concepts:
A goroutine is a thread of control within the program, with its own local variables and stack. Cheap, easy to create.
A channel carries typed messages between goroutines.
* Concurrency
.play distsys/hello.go
* Concurrency: CSP
Channels adopted from Hoare's Communicating Sequential Processes.
- Orthogonal to rest of language
- Can keep familiar model for computation
- Focus on _composition_ of regular code
Go _enables_ simple, safe concurrent programming.
It doesn't _forbid_ bad programming.
Caveat: not purely memory safe; sharing is legal.
Passing a pointer over a channel is idiomatic.
Experience shows this is practical.
* Concurrency
Sequential network address resolution, given a work list:
.play distsys/addr1.go /lookup/+1,/^}/-1
* Concurrency
Parallel network address resolution, given a work list:
.play distsys/addr2.go /lookup/+1,/^}/-1
* Concurrency
Aside: can abstract this pattern.
.play distsys/addr3.go /lookup/+1,/^}/-1
* Concurrency
Aside: can abstract this pattern further (hypothetical):
var par ParallelDo
for _, w := range worklist {
w := w // copy iteration variable
par.Do(func() {
w.addrs, w.err = net.LookupHost(w.host)
})
)
par.Wait()
But it's still useful to be able to construct alternate patterns.
* Concurrency
Bounded parallelism:
.play distsys/addr4.go /lookup/+1,/^}/-1
* Concurrency
Bounded parallelism, 2:
.play distsys/addr5.go /lookup/+1,/^}/-1
* Concurrency:
Aside: can abstract (still hypothetical):
par.Limit(10)
for _, w := range work {
w := w // copy iteration variable
par.Do(func() {
w.addrs, w.err = net.LookupHost(w.host)
})
)
par.Wait()
* Concurrency
Example: replicated storage with read and write quorums.
const (
F = 2
N = 5 // >= 2F + 1
ReadQuorum = F + 1
WriteQuorum = N - F
)
* Concurrency
Replicated write, returning after enough writes have succeeded.
.play distsys/replwrite.go /^func.Write/+2,/if.delay/-2/
* Concurrency
Replicated read, returning after enough reads have been gathered.
.play distsys/replread.go /^func.Read/+2,/if.delay/-2/
* Concurrency
Select allows choosing between multiple channel operations.
Example, chat program:
for {
select {
case event := <-ui:
// process user interface event
case msg := <-server:
// process server message
case t := <-tick:
// time has elapsed
}
}
* Concurrency Lessons
- Key feature for building distributed systems.
- Supported by closures and garbage collection.
- Message passing inside program, also outside program.
Most important:
- Do not communicate by sharing memory.
- Instead, share memory by communicating.
* Production Use
vitess/vtocc, MySQL query balancer
- serves all of YouTube's MySQL queries
- months of crash-free and leak-free operation
groupcache
- distributed in-memory immutable key-value cache
- used by (parts of) dl.google.com, Blogger, Google Code, Google Fiber, production monitoring systems
* More information and related talks
http://golang.org/
rsc@golang.org
Videos:
- Concurrency is not parallelism [[http://vimeo.com/49718712][video]]
- Go concurrency patterns [[http://www.youtube.com/watch?v=f6kdp27TYZs][video]]
- Advanced Go concurrency patterns [[http://www.youtube.com/watch?v=QDDwwePbDtw][video]]
Questions?