blob: cf9f3532e427b9fddeb2dc90f3c103232bb29a60 [file] [log] [blame]
Go at Google
SPLASH, Tucson, Oct 25, 2012
Rob Pike
Google, Inc.
https://go.dev
* Preamble
* What is Go?
Go is:
- open source
- concurrent
- garbage-collected
- efficient
- scalable
- simple
- fun
- boring (to some)
.link / go.dev
* History
Design began in late 2007.
Key players:
- Robert Griesemer, Rob Pike, Ken Thompson
- Later: Ian Lance Taylor, Russ Cox
Became open source in November 2009.
Developed entirely in the open; very active community.
Language stable as of Go 1, early 2012.
* Go at Google
Go is a programming language designed by Google to help solve Google's problems.
Google has big problems.
* Big hardware
.image splash/datacenter.jpg
* Big software
- C++ (mostly) for servers, plus lots of Java and Python
- thousands of engineers
- gazillions of lines of code
- distributed build system
- one tree
And of course:
- zillions of machines, which we treat as a modest number of compute clusters
Development at Google can be slow, often clumsy.
But it _is_ effective.
* The reason for Go
Goals:
- eliminate slowness
- eliminate clumsiness
- improve effectiveness
- maintain (even improve) scale
Go was designed by and for people who write—and read and debug and maintain—large software systems.
Go's purpose is _not_ research into programming language design.
Go's purpose is to make its designers' programming lives better.
* Today's theme
A talk about software engineering more than language design.
To be more accurate:
- Language design in the service of software engineering.
In short:
- How does a language help software engineering?
* Features?
Reaction upon launch:
My favorite feature isn't in Go! Go Sucks!
This misses the point.
* Pain
What makes large-scale development hard with C++ or Java (at least):
- slow builds
- uncontrolled dependencies
- each programmer using a different subset of the language
- poor program understanding (documentation, etc.)
- duplication of effort
- cost of updates
- version skew
- difficulty of automation (auto rewriters etc.): tooling
- cross-language builds
Language _features_ don't usually address these.
* Focus
In the design of Go, we tried to focus on solutions to _these_ problems.
Example: indentation for structure vs. C-like braces
* Dependencies in C and C++
* A personal history of dependencies in C
`#ifdef`
`#ifndef` "guards":
#ifndef _SYS_STAT_H_
1984: `<sys/stat.h>` times 37
ANSI C and `#ifndef` guards:
- dependencies accumulate
- throw includes at the program until it compiles
- no way to know what can be removed
* A personal history of dependencies in Plan 9's C
Plan 9, another take:
- no `#ifdefs` (or `#ifndefs`)
- documentation and topological sort
- easy to find out what can be removed
Need to document dependencies, but much faster compilation.
In short:
- _ANSI_C_made_a_costly_mistake_ in requiring `#ifndef` guards.
* A personal history of dependencies in C++
C++ exacerbated the problem:
- one `#include` file per class
- code (not just declarations) in `#include` files
- `#ifndef` guards persist
2004: Mike Burrows and Chubby: `<xxx>` times 37,000
1984: Tom Cargill and pi
* A personal history of dependencies at Google
Plan 9 demo: a story
Early Google: one `Makefile`
2003: `Makefile` generated from per-directory `BUILD` files
- explicit dependencies
- 40% smaller binaries
Dependencies still not checkable!
* Result
To build a large Google binary on a single computer is impractical.
In 2007, instrumented the build of a major Google binary:
- 2000 files
- 4.2 megabytes
- 8 gigabytes delivered to compiler
- 2000 bytes sent to compiler for every C++ source byte
- it's real work too: `<string>` for example
- hours to build
* Tools can help
New distributed build system:
- no more `Makefile` (still uses `BUILD` files)
- many buildbots
- much caching
- much complexity (a large program in its own right)
Even with Google's massive distributed build system, a large build still takes minutes.
(In 2007 that binary took 45 minutes; today, 27 minutes.)
Poor quality of life.
* Enter Go
While that build runs, we have time to think.
Want a language to improve the quality of life.
And dependencies are only one such problem....
* Primary considerations
Must work at scale:
- large programs
- large teams
- large number of dependencies
Must be familiar, roughly C-like
* Modernize
The old ways are _old_.
Go should be:
- suitable for multicore machines
- suitable for networked machines
- suitable for web stuff
* The design of Go
From a software engineering perspective.
* Dependencies in Go
* Dependencies
Dependencies are defined (syntactically) in the language.
Explicit, clear, computable.
import "encoding/json"
Unused dependencies cause error at compile time.
Efficient: dependencies traversed once per source file...
* Hoisting dependencies
Consider:
`A` imports `B` imports `C` but `A` does not directly import `C`.
The object code for `B` includes all the information about `C` needed to import `B`.
Therefore in `A` the line
import "B"
does not require the compiler to read `C` when compiling `A`.
Also, the object files are designed so the "export" information comes first; compiler doing import does not need to read whole file.
Exponentially less data read than with `#include` files.
With Go in Google, about 40X fanout (recall C++ was 2000x)
Plus in C++ it's general code that must be parsed; in Go it's just export data.
* No circular imports
Circular imports are illegal in Go.
The big picture in a nutshell:
- occasional minor pain,
- but great reduction in annoyance overall
- structural typing makes it less important than with type hierarchies
- keeps us honest!
Forces clear demarcation between packages.
Simplifies compilation, linking, initialization.
* API design
Through the design of the standard library, great effort spent on controlling dependencies.
It can be better to copy a little code than to pull in a big library for one function.
(A test in the system build complains if new core dependencies arise.)
Dependency hygiene trumps code reuse.
Example:
The (low-level) `net` package has own `itoa` to avoid dependency on the big formatted I/O package.
* Packages
* Packages
Every Go source file, e.g. `"encoding/json/json.go"`, starts
package json
where `json` is the "package name", an identifier.
Package names are usually concise.
To use a package, need to identify it by path:
import "encoding/json"
And then the package name is used to qualify items from package:
var dec = json.NewDecoder(reader)
Clarity: can always tell if name is local to package from its syntax: `Name` vs. `pkg.Name`.
(More on this later.)
Package combines properties of library, name space, and module.
* Package paths are unique, not package names
The path is `"encoding/json"` but the package name is `json`.
The path identifies the package and must be unique.
Project or company name at root of name space.
import "google/base/go/log"
Package name might not be unique; can be overridden. These are both `package`log`:
import "log" // Standard package
import googlelog "google/base/go/log" // Google-specific package
Every company might have its own `log` package; no need to make the package name unique.
Another: there are many `server` packages in Google's code base.
* Remote packages
Package path syntax works with remote repositories.
The import path is just a string.
Can be a file, can be a URL:
go get github.com/4ad/doozer // Command to fetch package
import "github.com/4ad/doozer" // Doozer client's import statement
var client doozer.Conn // Client's use of package
* Go's Syntax
* Syntax
Syntax is not important...
- unless you're programming
- or writing tools
Tooling is essential, so Go has a clean syntax.
Not super small, just clean:
- regular (mostly)
- only 25 keywords
- straightforward to parse (no type-specific context required)
- easy to predict, reason about
* Declarations
Uses Pascal/Modula-style syntax: name before type, more type keywords.
var fn func([]int) int
type T struct { a, b int }
not
int (*fn)(int[]);
struct T { int a, b; }
Easier to parse—no symbol table needed. Tools become easier to write.
One nice effect: can drop `var` and derive type of variable from expression:
var buf *bytes.Buffer = bytes.NewBuffer(x) // explicit
buf := bytes.NewBuffer(x) // derived
For more information:
.link /s/decl-syntax go.dev/s/decl-syntax
* Function syntax
Function on type `T`:
func Abs(t T) float64
Method of type `T`:
func (t T) Abs() float64
Variable (closure) of type `T`:
negAbs := func(t T) float64 { return -Abs(t) }
In Go, functions can return multiple values. Common case: `error`.
func ReadByte() (c byte, err error)
c, err := ReadByte()
if err != nil { ... }
More about errors later.
* No default arguments
Go does not support default function arguments.
Why not?
- too easy to throw in defaulted args to fix design problems
- encourages too many args
- too hard to understand the effect of the fn for different combinations of args
Extra verbosity may happen but that encourages extra thought about names.
Related: Go has easy-to-use, type-safe support for variadic functions.
* Naming
* Export syntax
Simple rule:
- upper case initial letter: `Name` is visible to clients of package
- otherwise: `name` (or `_Name`) is not visible to clients of package
Applies to variables, types, functions, methods, constants, fields....
That Is It.
Not an easy decision.
One of the most important things about the language.
Can see the visibility of an identifier without discovering the declaration.
Clarity.
* Scope
Go has very simple scope hierarchy:
- universe
- package
- file (for imports only)
- function
- block
* Locality of naming
Nuances:
- no implicit `this` in methods (receiver is explicit); always see `rcvr.Field`
- package qualifier always present for imported names
- (first component of) every name is always declared in current package
No surprises when importing:
- adding an exported name to my package cannot break your package!
Names do not leak across boundaries.
In C, C++, Java the name `y` could refer to anything
In Go, `y` (or even `Y`) is always defined within the package.
In Go, `x.Y` is clear: find `x` locally, `Y` belongs to it.
* Function and method lookup
Method lookup by name only, not type.
A type cannot have two methods with the same name, ever.
Easy to identify which function/method is referred to.
Simple implementation, simpler program, fewer surprises.
Given a method `x.M`, there's only ever one `M` associated with `x`.
* Semantics
* Basics
Generally C-like:
- statically typed
- procedural
- compiled
- pointers etc.
Should feel familiar to programmers from the C family.
* But...
Many small changes in the aid of robustness:
- no pointer arithmetic
- no implicit numeric conversions
- array bounds checking
- no type aliases
- `++` and `--` as statements not expressions
- assignment not an expression
- legal (encouraged even) to take address of stack variable
- many more
Plus some big ones...
* Bigger things
Some elements of Go step farther from C, even C++ and Java:
- concurrency
- garbage collection
- interface types
- reflection
- type switches
* Concurrency
* Concurrency
Important to modern computing environment.
Not well served by C++ or even Java.
Go embodies a variant of CSP with first-class channels.
Why CSP?
- The rest of the language can be ordinary and familiar.
Must be able to couple concurrency with computation.
Example: concurrency and cryptography.
* CSP is practical
For a web server, the canonical Go program, the model is a great fit.
Go _enables_ simple, safe concurrent programming.
It doesn't _forbid_ bad programming.
Focus on _composition_ of regular code.
Caveat: not purely memory safe; sharing is legal.
Passing a pointer over a channel is idiomatic.
Experience shows this is a practical design.
* Garbage collection
* The need for garbage collection
Too much programming in C and C++ is about memory allocation.
But also the design revolves too much about memory management.
Leaky abstractions, leaky dependencies.
Go has garbage collection, only.
Needed for concurrency: tracking ownership too hard otherwise.
Important for abstraction: separate behavior from resource management.
A key part of scalability: APIs remain local.
Use of the language is much simpler because of GC.
Adds run-time cost, latency, complexity to the implementation.
Day 1 design decision.
* Garbage collection in Go
A garbage-collected systems language is heresy!
Experience with Java: Uncontrollable cost, too much tuning.
But Go is different.
Go lets you limit allocation by controlling memory layout.
Example:
type X struct {
a, b, c int
buf [256]byte
}
Example: Custom arena allocator with free list.
* Interior pointers
Early decision: allow interior pointers such as `X.buf` from previous slide.
Tradeoff: Affects which GC algorithms that can be used, but in return reduces pressure on the collector.
Gives the _programmer_ tools to control GC overhead.
Experience, compared to Java, shows it has significant effect on memory pressure.
GC remains an active subject.
Current design: parallel mark-and-sweep.
With care to use memory wisely, works well in production.
* Interfaces
Composition not inheritance
* Object orientation and big software
Go is object-oriented.
Go does not have classes or subtype inheritance.
What does this mean?
* No type hierarchy
O-O is important because it provides uniformity of interface.
Outrageous example: the Plan 9 kernel.
Problem: subtype inheritance encourages _non-uniform_ interfaces.
* O-O and program evolution
Design by type inheritance oversold.
Generates brittle code.
Early decisions hard to change, often poorly informed.
Makes every programmer an interface designer.
(Plan 9 was built around a single interface everything needed to satisfy.)
Therefore encourages overdesign early on: predict every eventuality.
Exacerbates the problem, complicates designs.
* Go: interface composition
In Go an interface is _just_ a set of methods:
type Hash interface {
Write(p []byte) (n int, err error)
Sum(b []byte) []byte
Reset()
Size() int
BlockSize() int
}
No `implements` declaration.
All hash implementations satisfy this implicitly. (Statically checked.)
* Interfaces in practice: composition
Tend to be small: one or two methods are common.
Composition falls out trivially. Easy example, from package `io`:
type Reader interface {
Read(p []byte) (n int, err error)
}
`Reader` (plus the complementary `Writer`) makes it easy to chain:
- files, buffers, networks, encryptors, compressors, GIF, JPEG, PNG, ...
Dependency structure is not a hierarchy; these also implement other interfaces.
Growth through composition is _natural_, does not need to be pre-declared.
And that growth can be _ad_hoc_ and linear.
* Compose with functions, not methods
Hard to overstate the effect that Go's interfaces have on program design.
One big effect: functions with interface arguments.
func ReadAll(r io.Reader) ([]byte, error)
Wrappers:
func LoggingReader(r io.Reader) io.Reader
func LimitingReader(r io.Reader, n int64) io.Reader
func ErrorInjector(r io.Reader) io.Reader
The designs are nothing like hierarchical, subtype-inherited methods.
Much looser, organic, decoupled, independent.
* Errors
* Error handling
Multiple function return values inform the design for handling errors.
Go has no `try-catch` control structures for exceptions.
Return `error` instead: built-in interface type that can "stringify" itself:
type error interface { Error() string }
Clear and simple.
Philosophy:
Forces you think about errors—and deal with them—when they arise.
Errors are _normal_. Errors are _not_exceptional_.
Use the existing language to compute based on them.
Don't need a sublanguage that treats them as exceptional.
Result is better code (if more verbose).
* (OK, not all errors are normal. But most are.)
.image splash/fire.jpg
* Tools
* Tools
Software engineering requires tools.
Go's syntax, package design, naming, etc. make tools easy to write.
Standard library includes lexer and parser; type checker nearly done.
* Gofmt
Always intended to do automatic code formatting.
Eliminates an entire class of argument.
Runs as a "presubmit" to the code repositories.
Training:
- The community has always seen `gofmt` output.
Sharing:
- Uniformity of presentation simplifies sharing.
Scaling:
- Less time spent on formatting, more on content.
Often cited as one of Go's best features.
* Gofmt and other tools
Surprise: The existence of `gofmt` enabled _semantic_ tools:
Can rewrite the tree; `gofmt` will clean up output.
Examples:
- `gofmt`-r`'a[b:len(a)]`->`a[b:]'`
- `gofix`
And good front-end libraries enable ancillary tools:
- `godoc`
- `go`get`, `go`build`, etc.
- `api`
* Gofix
The `gofix` tool allowed us to make sweeping changes to APIs and language features leading up to the release of Go 1.
- change to map deletion syntax
- new time API
- many more
Also allows us to _update_ code even if the old code still works.
Recent example:
Changed Go's protocol buffer implementation to use getter functions; updated _all_ Google Go code to use them with `gofix`.
* Conclusion
* Go at Google
Go's use is growing inside Google.
Several big services use it:
- golang.org
- youtube.com
- dl.google.com
Many small ones do, many using Google App Engine.
* Go outside Google
Many outside companies use it, including:
- BBC Worldwide
- Canonical
- Heroku
- Nokia
- SoundCloud
* What's wrong?
Not enough experience yet to know if Go is truly successful.
Not enough big programs.
Some minor details wrong. Examples:
- declarations still too fussy
- `nil` is overloaded
- lots of library details
`Gofix` and `gofmt` gave us the opportunity to fix many problems, ranging from eliminating semicolons to redesigning the `time` package.
But we're still learning (and the language is frozen for now).
The implementation still needs work, the run-time system in particular.
But all indicators are positive.
* Summary
Software engineering guided the design.
But a productive, fun language resulted because that design enabled productivity.
Clear dependencies
Clear syntax
Clear semantics
Composition not inheritance
Simplicity of model (GC, concurrency)
Easy tooling (the `go` tool, `gofmt`, `godoc`, `gofix`)
* Try it!
.link / go.dev
.image splash/appenginegophercolor.jpg