blob: 4498d7c9e2fff66e4d5b927824ff857575572a85 [file] [log] [blame]
The Research Problems of Implementing Go
Russ Cox
Google
http://golang.org/
* About the Talk
I gave this talk at Google's Cambridge, Massachusetts office at an event for area Ph.D. students. The purpose of the event and the talk was to give a sense of some of the research that goes on at Google. The talk presents some research questions motivated by Go. We have answered some well, but others remain open.
* About Go
Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.
Design began in late 2007.
- Robert Griesemer, Rob Pike, Ken Thompson
- Russ Cox, Ian Lance Taylor
Became open source in November 2009.
Developed entirely in the open; very active community.
Language stable as of Go 1, early 2012.
Work continues.
* Motivation for Go
.image ../2012/splash/datacenter.jpg
* Motivation for Go
Started as an answer to software problems at Google:
- multicore processors
- networked systems
- massive computation clusters
- scale: 10⁷ lines of code
- scale: 10³ programmers
- scale: 10⁶⁺ machines (design point)
Deployed: parts of YouTube, dl.google.com, Blogger, Google Code, Google Fiber, ...
* Go
A simple but powerful and fun language.
- start with C, remove complex parts
- add interfaces, concurrency
- also: garbage collection, closures, reflection, strings, ...
For more background on design:
- [[http://commandcenter.blogspot.com/2012/06/less-is-exponentially-more.html][Less is exponentially more]]
- [[http://talks.golang.org/2012/splash.article][Go at Google: Language Design in the Service of Software Engineering]]
* Research and Go
Go is designed for building production systems at Google.
- Goal: make that job easier, faster, better.
- Non-goal: break new ground in programming language research
Plenty of research questions about how to implement Go well.
- Concurrency
- Polymorphism
- Garbage collection
- Program translation
* Concurrency
* Concurrency
Go provides two important concepts:
A goroutine is a thread of control within the program, with its own local variables and stack. Cheap, easy to create.
A channel carries typed messages between goroutines.
* Concurrency
.play ../2013/distsys/hello.go
* Concurrency: CSP
Channels adopted from Hoare's Communicating Sequential Processes.
- Orthogonal to rest of language
- Can keep familiar model for computation
- Focus on _composition_ of regular code
Go _enables_ simple, safe concurrent programming.
It doesn't _forbid_ bad programming.
Caveat: not purely memory safe; sharing is legal.
Passing a pointer over a channel is idiomatic.
Experience shows this is practical.
* Concurrency
Sequential network address resolution, given a work list:
.play ../2013/distsys/addr1.go /lookup/+1,/^}/-1
* Concurrency
Parallel network address resolution, given a work list:
.play ../2013/distsys/addr2.go /lookup/+1,/^}/-1
* Implementing Concurrency
Challenge: Make channel communication scale
- start with one global channel lock
- per-channel locks, locked in address order for multi-channel operations
Research question: lock-free channels?
* Polymorphism
* Interfaces
An interface defines a set of methods.
package io
type Writer interface {
Write(data []byte) (n int, err error)
}
* Interfaces
A type implements the interface by implementing the methods.
package bytes
type Buffer struct {
...
}
func (b *Buffer) Write(data []byte) (n int, err error) {
...
}
* Interfaces
An implementation of an interface can be assigned to a variable of that interface type.
package fmt
func Fprintf(w io.Writer, format string, args ...interface{})
* Interfaces
.play ../2013/distsys/writebuffer.go /^func.main/+1,/^}/-1
* Interface Advantages
- no dependence between interface and implementation
- easy testing
- avoids overdesign, rigid hierarchy of inheritance-based OO
The source of all generality in the Go language.
* Implementing Interfaces
How do you make method dispatch efficient?
b := new(bytes.Buffer)
var w io.Writer
w = b
fmt.Fprintf(w, "hello, %s\n", "world")
... w.Write(text) // what happens here?
At w.Write call, how does the runtime find the method to call?
* Implementing Interfaces
How do you make method dispatch efficient?
b := new(bytes.Buffer)
var w io.Writer
w = b // do the work here!
fmt.Fprintf(w, "hello, %s\n", "world")
... w.Write(text) // plain indirect function call
Interface holds two words: "itable" and actual value (or pointer to value).
Itable contains type information plus list of function pointers for methods in interface.
w.itable.fn[1](w.data, text)
Conversion sites usually trivial to cache.
* Interfaces for Algorithms
package sort
type Interface interface {
Len() int // return number of elements, len(x)
Less(i, j int) bool // report whether x[i] < x[j]
Swap(i, j int) // x[i], x[j] = x[j], x[i]
}
func Sort(data Interface)
Requires some boilerplate for each use:
type bySubject []Thread
func (x bySubject) Less(i, j int) bool { return x[i].Subject < x[j].Subject }
func (x bySubject) Swap(i, j int) { x[i], x[j] = x[j], x[i] }
func (x bySubject) Len() int { return len(x) }
sort.Sort(bySubject(threads))
* Polymorphism: Can we do better?
func Sort(data []T, less func(x, y *T) bool)
sort.Sort(threads, func(x, y *Thread) bool {
return x.Subject < y.Subject
})
Research question: what's a reasonable semantics?
Research question: what's a reasonable implementation?
- C says don't bother.
- C++ makes many copies of the same function.
- Java boxes everything implicitly: one function, but expensive data model.
- Java discards run-time type information.
Do you want slow programmers, slow compilers and bloated binaries, or slow execution?
* Garbage Collection
* Garbage Collection
Garbage collection simplifies APIs.
- In C and C++, too much API design (and too much programming effort!) is about memory management.
Fundamental to concurrency: too hard to track ownership otherwise.
Fundamental to interfaces: memory management details do not bifurcate otherwise-similar APIs.
Of course, adds cost, latency, complexity in run time system.
* Avoiding Garbage Collection
Observation: garbage collection is a service, and like any service it can be overloaded, oversubscribed.
Go lets you limit allocation by controlling memory layout.
type Point struct {
X, Y int
}
type Rectangle struct {
Min, Max Point
}
* Implementing Garbage Collection
Language decision: interior pointers are allowed, as are foreign pointers
- Cannot reuse Java GC algorithms directly.
- But gives _programmer_ more control over allocation.
Allocator: objects are allocated in pages with other objects of the same size.
Current GC: stop the world, parallel mark, start the world, background sweep.
Research question: how to make collector lower latency, possibly incremental?
* Program Translation
* Program Translation
Go programs can be parsed without context (unlike C and C++).
Go ships with a standard program formatter.
Makes automated edits indistinguishable from manual edits.
$ cat x.go
package main
var b bytes.Buffer
$ gofmt -r 'bytes.Buffer -> bytes.Writer' x.go
package main
var b bytes.Writer
$
More advanced rewrites: "go fix" for API adjustments.
* Program Translation
Research Question: What about translating other programs to Go?
Exploring the conversion of C programs to Go today.
- Decide return type (for example, int vs bool).
- Decide which variables are pointers vs arrays.
- Decide which functions are really methods.
- Decide natural package boundaries.
What about other languages?
* Research and Go
Plenty of research questions about how to implement Go well.
- Concurrency
- Polymorphism
- Garbage collection
- Program translation