_content/talks/2012/splash.article - website - Git at Google

 Go at Google: Language Design in the Service of Software Engineering

 Rob Pike
 Google, Inc.
 @rob_pike
 http://golang.org/s/plusrob
 http://golang.org

 * Abstract

 (This is a modified version of the keynote talk given by Rob Pike
 at the SPLASH 2012 conference in Tucson, Arizona, on October 25, 2012.)

 The Go programming language was conceived in late 2007 as an answer to
 some of the problems we were seeing developing software infrastructure
 at Google.
 The computing landscape today is almost unrelated to the environment
 in which the languages being used, mostly C++, Java, and Python, had
 been created.
 The problems introduced by multicore processors, networked systems,
 massive computation clusters, and the web programming model were being
 worked around rather than addressed head-on.
 Moreover, the scale has changed: today's server programs comprise tens
 of millions of lines of code, are worked on by hundreds or even
 thousands of programmers, and are updated literally every day.
 To make matters worse, build times, even on large compilation
 clusters, have stretched to many minutes, even hours.

 Go was designed and developed to make working in this environment more
 productive.
 Besides its better-known aspects such as built-in concurrency and
 garbage collection, Go's design considerations include rigorous
 dependency management, the adaptability of software architecture as
 systems grow, and robustness across the boundaries between components.

 This article explains how these issues were addressed while building
 an efficient, compiled programming language that feels lightweight and
 pleasant.
 Examples and explanations will be taken from the real-world problems
 faced at Google.

 * Introduction

 Go is a compiled, concurrent, garbage-collected, statically typed language
 developed at Google.
 It is an open source project: Google
 imports the public repository rather than the other way around.

 Go is efficient, scalable, and productive. Some programmers find it fun
 to work in; others find it unimaginative, even boring.
 In this article we
 will explain why those are not contradictory positions.
 Go was designed to address the problems faced in software development
 at Google, which led to a language that is not a breakthrough research language
 but is nonetheless an excellent tool for engineering large software projects.

 * Go at Google

 Go is a programming language designed by Google to help solve Google's problems, and Google has big problems.

 The hardware is big and the software is big.
 There are many millions of lines of software, with servers mostly in C++
 and lots of Java and Python for the other pieces.
 Thousands of engineers work on the code,
 at the "head" of a single tree comprising all the software,
 so from day to day there are significant changes to all levels of the tree.
 A large
 [[http://google-engtools.blogspot.com/2011/06/build-in-cloud-accessing-source-code.html][custom-designed distributed build system]]
 makes development at this scale feasible, but it's still big.

 And of course, all this software runs on zillions of machines, which are treated as a modest number of independent, networked compute clusters.

 .image splash/datacenter.jpg

 In short, development at Google is big, can be slow, and is often clumsy. But it _is_ effective.

 The goals of the Go project were to eliminate the slowness and clumsiness of software development at Google,
 and thereby to make the process more productive and scalable.
 The language was designed by and for people who write—and read and debug and maintain—large software systems.

 Go's purpose is therefore _not_ to do research into programming language design;
 it is to improve the working environment for its designers and their coworkers.
 Go is more about software engineering than programming language research.
 Or to rephrase, it is about language design in the service of software engineering.

 But how can a language help software engineering?
 The rest of this article is an answer to that question.

 * Pain points

 When Go launched, some claimed it was missing particular features or methodologies that were regarded as _de_rigueur_ for a modern language.
 How could Go be worthwhile in the absence of these facilities?
 Our answer to that is that the properties Go _does_ have address the issues that make large-scale software development difficult.
 These issues include:

 - slow builds
 - uncontrolled dependencies
 - each programmer using a different subset of the language
 - poor program understanding (code hard to read, poorly documented, and so on)
 - duplication of effort
 - cost of updates
 - version skew
 - difficulty of writing automatic tools
 - cross-language builds

 Individual features of a language don't address these issues.
 A larger view of software engineering is required, and
 in the design of Go we tried to focus on solutions to _these_ problems.

 As a simple, self-contained example, consider the representation of program structure.
 Some observers objected to Go's C-like block structure with braces, preferring the use of spaces for indentation, in the style of Python or Haskell.
 However, we have had extensive experience tracking down build and test failures caused by cross-language builds where a Python snippet embedded in another language,
 for instance through a SWIG invocation,
 is subtly and _invisibly_ broken by a change in the indentation of the surrounding code.
 Our position is therefore that, although spaces for indentation is nice for small programs, it doesn't scale well,
 and the bigger and more heterogeneous the code base, the more trouble it can cause.
 It is better to forgo convenience for safety and dependability, so Go has brace-bounded blocks.

 * Dependencies in C and C++

 A more substantial illustration of scaling and other issues arises in the handling of package dependencies.
 We begin the discussion with a review of how they work in C and C++.

 ANSI C, first standardized in 1989, promoted the idea of  `#ifndef` "guards" in the standard header files.
 The idea, which is ubiquitous now, is that each header file be bracketed with a conditional compilation clause so that the file may be included multiple times without error.
 For instance, the Unix header file `<sys/stat.h>` looks schematically like this:

 	/* Large copyright and licensing notice */
 	#ifndef _SYS_STAT_H_
 	#define _SYS_STAT_H_
 	/* Types and other definitions */
 	#endif

 The intent is that the C preprocessor reads in the file but disregards the contents on
 the second and subsequent
 readings of the file.
 The symbol `_SYS_STAT_H_`, defined the first time the file is read, "guards" the invocations that follow.

 This design has some nice properties, most important that each header file can safely `#include`
 all its dependencies, even if other header files will also include them.
 If that rule is followed, it permits orderly code that, for instance, sorts the `#include`
 clauses alphabetically.

 But it scales very badly.

 In 1984, a compilation of `ps.c`, the source to the Unix `ps` command, was observed
 to `#include` `<sys/stat.h>` 37 times by the time all the preprocessing had been done.
 Even though the contents are discarded 36 times while doing so, most C
 implementations would open the file, read it, and scan it all 37 times.
 Without great cleverness, in fact, that behavior is required by the potentially
 complex macro semantics of the C preprocessor.

 The effect on software is the gradual accumulation of `#include` clauses in C programs.
 It won't break a program to add them, and it's very hard to know when they are no
 longer needed.
 Deleting a `#include` and compiling the program again isn't even sufficient to test that,
 since another `#include` might itself contain a `#include` that pulls it in anyway.

 Technically speaking, it does not have to be like that.
 Realizing the long-term problems with the use of `#ifndef` guards, the designers
 of the Plan 9 libraries took a different, non-ANSI-standard approach.
 In Plan 9, header files were forbidden from containing further `#include` clauses; all
 `#includes` were required to be in the top-level C file.
 This required some discipline, of course—the programmer was required to list
 the necessary dependencies exactly once, in the correct order—but documentation
 helped and in practice it worked very well.
 The result was that, no matter how many dependencies a C source file had,
 each `#include` file was read exactly once when compiling that file.
 And, of course, it was also easy to see if an `#include` was necessary by taking
 it out: the edited program would compile if and only if the dependency was unnecessary.

 The most important result of the Plan 9 approach was much faster compilation: the amount of
 I/O the compilation requires can be dramatically less than when compiling a program
 using libraries with `#ifndef` guards.

 Outside of Plan 9, though, the "guarded" approach is accepted practice for C and C++.
 In fact, C++ exacerbates the problem by using the same approach at finer granularity.
 By convention, C++ programs are usually structured with one header file per class, or perhaps
 small set of related classes, a grouping much smaller than, say, `<stdio.h>`.
 The dependency tree is therefore much more intricate, reflecting not library dependencies but the full type hierarchy.
 Moreover, C++ header files usually contain real code—type, method, and template
 declarations—not just the simple constants and function signatures typical of a C header file.
 Thus not only does C++ push more to the compiler, what it pushes is harder to compile,
 and each invocation of the compiler must reprocess this information.
 When building a large C++ binary, the compiler might be taught thousands of times how to
 represent a string by processing the header file `<string>`.
 (For the record, around 1984 Tom Cargill observed that the use of the
 C preprocessor for dependency management would be a long-term liability for C++ and
 should be addressed.)

 The construction of a single C++ binary at Google can open and read hundreds of individual header files
 tens of thousands of times.
 In 2007, build engineers at Google instrumented the compilation of a major Google binary.
 The file contained about two thousand files that, if simply concatenated together, totaled 4.2 megabytes.
 By the time the `#includes` had been expanded, over 8 gigabytes were being delivered to the input of the compiler, a blow-up of 2000 bytes for every C++ source byte.

 As another data point, in 2003 Google's build system was moved from a single Makefile to a per-directory design
 with better-managed, more explicit dependencies.
 A typical binary shrank about 40% in file size, just from having more accurate dependencies recorded.
 Even so, the properties of C++ (or C for that matter) make it impractical to verify those dependencies automatically,
 and today we still do not have an accurate understanding of the dependency requirements
 of large Google C++ binaries.

 The consequence of these uncontrolled dependencies and massive scale is that it is
 impractical to build Google server binaries on a single computer, so
 a large distributed compilation system was created.
 With this system, involving many machines, much caching, and
 much complexity (the build system is a large program in its own right), builds at
 Google are practical, if still cumbersome.

 Even with the distributed build system, a large Google build can still take many minutes.
 That 2007 binary took 45 minutes using a precursor distributed build system; today's
 version of the same program takes 27 minutes, but of course the program and its
 dependencies have grown in the interim.
 The engineering effort required to scale up the build system has barely been able
 to stay ahead of the growth of the software it is constructing.

 * Enter Go

 When builds are slow, there is time to think.
 The origin myth for Go states that it was during one of those 45 minute builds
 that Go was conceived. It was believed to be worth trying to design a new language
 suitable for writing large Google programs such as web servers,
 with software engineering considerations that would improve the quality
 of life of Google programmers.

 Although the discussion so far has focused on dependencies,
 there are many other issues that need attention.
 The primary considerations for any language to succeed in this context are:

 - It must work at scale, for large programs with large numbers of dependencies, with large teams of programmers working on them.

 - It must be familiar, roughly C-like. Programmers working at Google are early in their careers and are most familiar with procedural languages, particularly from the C family. The need to get programmers productive quickly in a new language means that the language cannot be too radical.

 - It must be modern. C, C++, and to some extent Java are quite old, designed before the advent of multicore machines, networking, and web application development. There are features of the modern world that are better met by newer approaches, such as built-in concurrency.

 With that background, then, let us look at the design of Go from a software engineering perspective.

 * Dependencies in Go

 Since we've taken a detailed look at dependencies in C and C++, a good place to start
 our tour is to see how Go handles them.
 Dependencies are defined, syntactically and semantically, by the language.
 They are explicit, clear, and "computable", which is to say, easy to write tools to analyze.

 The syntax is that, after the `package` clause (the subject of the next section),
 each source file may have one or more import statements, comprising the
 `import` keyword and a string constant identifying the package to be imported
 into this source file (only):

 	import "encoding/json"

 The first step to making Go scale, dependency-wise, is that the _language_ defines
 that unused dependencies are a compile-time error (not a warning, an _error_).
 If the source file imports a package it does not use, the program will not compile.
 This guarantees by construction that the dependency tree for any Go program
 is precise, that it has no extraneous edges. That, in turn, guarantees that no
 extra code will be compiled when building the program, which minimizes
 compilation time.

 There's another step, this time in the implementation of the compilers, that
 goes even further to guarantee efficiency.
 Consider a Go program with three packages and this dependency graph:

 -  package `A` imports package `B`;
 -  package `B` imports package `C`;
 -  package `A` does _not_ import package `C`

 This means that package `A` uses `C` only transitively through its use of `B`;
 that is, no identifiers from `C` are mentioned in the source code to `A`,
 even if some of the items `A` is using from `B` do mention `C`.
 For instance, package `A` might reference a  `struct` type defined in `B` that has a field with
 a type defined in `C` but that `A` does not reference itself.
 As a motivating example, imagine that `A` imports a formatted I/O package
 `B` that uses a buffered I/O implementation provided by `C`, but that `A` does
 not itself invoke buffered I/O.

 To build this program, first, `C` is compiled;
 dependent packages must be built before the packages that depend on them.
 Then `B` is compiled; finally `A` is compiled, and then the program can be linked.

 When `A` is compiled, the compiler reads the object file for `B`, not its source code.
 That object file for `B` contains all the type information necessary for the compiler
 to execute the

 	import "B"

 clause in the source code for `A`. That information includes whatever information
 about `C` that clients of `B` will need at compile time.
 In other words, when `B` is compiled, the generated object file includes type
 information for all dependencies of `B` that affect the public interface of `B`.

 This design has the important
 effect that when the compiler executes an import clause,
 _it_opens_exactly_one_file_, the object file identified by the string in the import clause.
 This is, of course, reminiscent of the Plan 9 C (as opposed to ANSI C)
 approach to dependency management, except that, in effect, the compiler
 writes the header file when the Go source file is compiled.
 The process is more automatic and even
 more efficient than in Plan 9 C, though: the data being read when evaluating the import is just
 "exported" data, not general program source code. The effect on overall
 compilation time can be huge, and scales well as
 the code base grows. The time to execute the dependency graph, and
 hence to compile, can be exponentially less than in the "include of
 include file" model of C and C++.

 It's worth mentioning that this general approach to dependency management
 is not original; the ideas go back to the 1970s and flow through languages like
 Modula-2 and Ada. In the C family Java has elements of this approach.

 To make compilation even more efficient, the object file is arranged so the export
 data is the first thing in the file, so the compiler can stop reading as soon
 as it reaches the end of that section.

 This approach to dependency management is the single biggest reason
 why Go compilations are faster than C or C++ compilations.
 Another factor is that Go places the export data in the object file; some
 languages require the author to write or the compiler to
 generate a second file with that information. That's twice as many files
 to open. In Go there is only one file to open to import a package.
 Also, the single file approach means that the export data (or header
 file, in C/C++) can never go out of date relative to the object file.

 For the record, we measured the compilation of a large Google program
 written in Go to see how the source code fanout compared to the C++
 analysis done earlier. We found it was about 40X, which is
 fifty times better than C++ (as well as being simpler and hence faster
 to process), but it's still bigger than we expected. There are two reasons for
 this. First, we found a bug: the Go compiler was generating a substantial
 amount of data in the export section that did not need to be there. Second,
 the export data uses a verbose encoding that could be improved.
 We plan to address these issues.

 Nonetheless, a factor of fifty less to do turns minutes into seconds,
 coffee breaks into interactive builds.

 Another feature of the Go dependency graph is that it has no cycles.
 The language defines that there can be no circular imports in the graph,
 and the compiler and linker both check that they do not exist.
 Although they are occasionally useful, circular imports introduce
 significant problems at scale.
 They require the compiler to deal with larger sets of source files
 all at once, which slows down incremental builds.
 More important, when allowed, in our experience such imports end up
 entangling huge swaths of the source tree into large subpieces that are
 difficult to manage independently, bloating binaries and complicating
 initialization, testing, refactoring, releasing, and other tasks of
 software development.

 The lack of circular imports causes occasional annoyance but keeps the tree clean,
 forcing a clear demarcation between packages. As with many of the
 design decisions in Go, it forces the programmer to think earlier about a
 larger-scale issue (in this case, package boundaries) that if left until
 later may never be addressed satisfactorily.

 Through the design of the standard library, great effort was spent on controlling
 dependencies. It can be better to copy a little code than to pull in a big
 library for one function. (A test in the system build complains if new core
 dependencies arise.) Dependency hygiene trumps code reuse.
 One example of this in practice is that
 the (low-level) `net` package has its own integer-to-decimal conversion routine
 to avoid depending on the bigger and dependency-heavy formatted I/O package.
 Another is that the string conversion package `strconv` has a private implementation
 of the definition of 'printable' characters rather than pull in the large Unicode
 character class tables; that `strconv` honors the Unicode standard is verified by the
 package's tests.

 * Packages

 The design of Go's package system combines some of the properties of libraries,
 name spaces, and modules into a single construct.

 Every Go source file, for instance `"encoding/json/json.go"`, starts with a package clause, like this:

 	package json

 where `json` is the "package name", a simple identifier.
 Package names are usually concise.

 To use a package, the importing source file identifies it by its _package_path_
 in the import clause.
 The meaning of "path" is not specified by the language, but in
 practice and by convention it is the slash-separated directory path of the
 source package in the repository, here:

 	import "encoding/json"

 Then the package name (as distinct from path) is used to qualify items from
 the package in the importing source file:

 	var dec = json.NewDecoder(reader)

 This design provides clarity.
 One may always tell whether a name is local to package from its syntax: `Name` vs. `pkg.Name`.
 (More on this later.)

 For our example, the package path is `"encoding/json"` while the package name is `json`.
 Outside the standard repository, the convention is to place the
 project or company name at the root of the name space:

 	import "google/base/go/log"

 It's important to recognize that package _paths_ are unique,
 but there is no such requirement for package _names_.
 The path must uniquely identify the package to be imported, while the
 name is just a convention for how clients of the package can refer to its
 contents.
 The package name need not be unique and can be overridden
 in each importing source file by providing a local identifier in the
 import clause. These two imports both reference packages that
 call themselves `package` `log`, but to import them in a single source
 file one must be (locally) renamed:

 	import "log"                          // Standard package
 	import googlelog "google/base/go/log" // Google-specific package

 Every company might have its own `log` package but
 there is no need to make the package name unique.
 Quite the opposite: Go style suggests keeping package names short and clear
 and obvious in preference to worrying about collisions.

 Another example: there are many `server` packages in Google's code base.

 * Remote packages

 An important property of Go's package system is that the package path,
 being in general an arbitrary string, can be co-opted to refer to remote
 repositories by having it identify the URL of the site serving the repository.

 Here is how to use the `doozer` package from `github`. The `go` `get` command
 uses the `go` build tool to fetch the repository from the site and install it.
 Once installed, it can be imported and used like any regular package.

 	$ go get github.com/4ad/doozer // Shell command to fetch package

 	import "github.com/4ad/doozer" // Doozer client's import statement

 	var client doozer.Conn         // Client's use of package

 It's worth noting that the `go` `get` command downloads dependencies
 recursively, a property made possible only because the dependencies are
 explicit.
 Also, the allocation of the space of import paths is delegated to URLs,
 which makes the naming of packages decentralized and therefore scalable,
 in contrast to centralized registries used by other languages.

 * Syntax

 Syntax is the user interface of a programming language. Although it has
 limited effect on the semantics of the language, which is arguably the
 more important component, syntax determines the readability and hence
 clarity of the language. Also, syntax is critical to tooling: if the language
 is hard to parse, automated tools are hard to write.

 Go was therefore designed with clarity and tooling in mind, and has
 a clean syntax.
 Compared to other languages in the C family, its
 grammar is modest in size, with only 25 keywords (C99 has
 37; C++11 has 84; the numbers continue to grow).
 More important,
 the grammar is regular and therefore easy to parse (mostly; there
 are a couple of quirks we might have fixed but didn't discover early
 enough).
 Unlike C and Java and especially C++, Go can be parsed without
 type information or a symbol table;
 there is no type-specific context. The grammar is
 easy to reason about and therefore tools are easy to write.

 One of the details of Go's syntax that surprises C programmers is that
 the declaration syntax is closer to Pascal's than to C's.
 The declared name appears before the type and there are more keywords:

 	var fn func([]int) int
 	type T struct { a, b int }

 as compared to C's

 	int (*fn)(int[]);
 	struct T { int a, b; }

 Declarations introduced by keyword are easier to parse both for people and
 for computers, and having the type syntax not be the expression syntax
 as it is in C has a significant effect on parsing: it adds grammar
 but eliminates ambiguity.
 But there is a nice side effect, too: for initializing declarations,
 one can drop the `var` keyword and just take the type of the variable
 from that of the expression. These two declarations are equivalent;
 the second is shorter and idiomatic:

 	var buf *bytes.Buffer = bytes.NewBuffer(x) // explicit
 	buf := bytes.NewBuffer(x)                  // derived

 There is a blog post at [[http://golang.org/s/decl-syntax][golang.org/s/decl-syntax]] with more detail about the syntax of declarations in Go and
 why it is so different from C.

 Function syntax is straightforward for simple functions.
 This example declares the function  `Abs`, which accepts a single
 variable `x` of type `T` and returns a single `float64` value:

 	func Abs(x T) float64

 A method is just a function with a special parameter, its _receiver_,
 which can be passed to the function using the standard "dot" notation.
 Method declaration syntax places the receiver in parentheses before the
 function name. Here is the same function, now as a method of type `T`:

 	func (x T) Abs() float64

 And here is a variable (closure) with a type `T` argument; Go has first-class
 functions and closures:

 	negAbs := func(x T) float64 { return -Abs(x) }

 Finally, in Go functions can return multiple values. A common case is to
 return the function result and an `error` value as a pair, like this:

 	func ReadByte() (c byte, err error)

 	c, err := ReadByte()
 	if err != nil { ... }

 We'll talk more about errors later.

 One feature missing from Go is that it
 does not support default function arguments. This was a deliberate
 simplification. Experience tells us that defaulted arguments make it
 too easy to patch over API design flaws by adding more arguments,
 resulting in too many arguments with interactions that are
 difficult to disentangle or even understand.
 The lack of default arguments requires more functions or methods to be defined,
 as one function cannot hold the entire interface,
 but that leads to a clearer API that is easier to understand.
 Those functions all need separate names, too, which makes it clear
 which combinations exist, as well as encouraging more
 thought about naming, a critical aspect of clarity and readability.

 One mitigating factor for the lack of default arguments is that Go
 has easy-to-use, type-safe support for variadic functions.

 * Naming

 Go takes an unusual approach to defining the _visibility_ of an identifier,
 the ability for a client of a package to use the item named by the identifier.
 Unlike, for instance, `private` and `public` keywords, in Go the name itself
 carries the information: the case of the initial letter of the identifier
 determines the visibility. If the initial character is an upper case letter,
 the identifier is _exported_ (public); otherwise it is not:

 - upper case initial letter: `Name` is visible to clients of package
 - otherwise: `name` (or `_Name`) is not visible to clients of package

 This rule applies to variables, types, functions, methods, constants, fields...
 everything. That's all there is to it.

 This was not an easy design decision.
 We spent over a year struggling to
 define the notation to specify an identifier's visibility.
 Once we settled on using the case of the name, we soon realized it had
 become one of the most important properties about the language.
 The name is, after all, what clients of the package use; putting
 the visibility in the name rather than its type means that it's always
 clear when looking at an identifier whether it is part of the public API.
 After using Go for a while, it feels burdensome when going back to
 other languages that require looking up the declaration to discover
 this information.

 The result is, again, clarity: the program source text expresses the
 programmer's meaning simply.

 Another simplification is that Go has a very compact scope hierarchy:

 - universe (predeclared identifiers such as `int` and `string`)
 - package (all the source files of a package live at the same scope)
 - file (for package import renames only; not very important in practice)
 - function (the usual)
 - block (the usual)

 There is no scope for name space or class or other wrapping
 construct. Names come from very few places in Go, and all names
 follow the same scope hierarchy: at any given location in the source,
 an identifier denotes exactly one language object, independent of how
 it is used. (The only exception is statement labels, the targets of `break`
 statements and the like; they always have function scope.)

 This has consequences for clarity. Notice for instance that methods
 declare an explicit receiver and that it must be used to access fields and
 methods of the type. There is no implicit `this`. That is, one always
 writes

 	rcvr.Field

 (where rcvr is whatever name is chosen for the receiver variable)
 so all the elements of the type always appear lexically bound to
 a value of the receiver type. Similarly, a package qualifier is always present
 for imported names; one writes `io.Reader` not `Reader`.
 Not only is this clear, it frees up the identifier `Reader` as a useful
 name to be used in any package. There are in fact multiple exported
 identifiers in the standard library with name `Reader`, or `Printf`
 for that matter, yet which one is being referred to is always unambiguous.

 Finally, these rules combine to guarantee that, other than the top-level
 predefined names such as `int`, (the first component of) every name is
 always declared in the current package.

 In short, names are local. In C, C++, or Java the name `y` could refer to anything.
 In Go, `y` (or even `Y`) is always defined within the package,
 while the interpretation of `x.Y` is clear: find `x` locally, `Y` belongs to it.

 These rules provide an important property for scaling because they guarantee
 that adding an exported name to a package can never break a client
 of that package. The naming rules decouple packages, providing
 scaling, clarity, and robustness.

 There is one more aspect of naming to be mentioned: method lookup
 is always by name only, not by signature (type) of the method.
 In other words, a single type can never have two methods with the same name.
 Given a method `x.M`, there's only ever one `M` associated with `x`.
 Again, this makes it easy to identify which method is referred to given
 only the name.
 It also makes the implementation of method invocation simple.

 * Semantics

 The semantics of Go statements is generally C-like. It is a compiled, statically typed,
 procedural language with pointers and so on. By design, it should feel
 familiar to programmers accustomed to languages in the C family.
 When launching a new language
 it is important that the target audience be able to learn it quickly; rooting Go
 in the C family helps make sure that young programmers, most of whom
 know Java, JavaScript, and maybe C, should find Go easy to learn.

 That said, Go makes many small changes to C semantics, mostly in the
 service of robustness. These include:

 - there is no pointer arithmetic
 - there are no implicit numeric conversions
 - array bounds are always checked
 - there are no type aliases (after `type`X`int`, `X` and `int` are distinct types not aliases)
 - `++` and `--` are statements not expressions
 - assignment is not an expression
 - it is legal (encouraged even) to take the address of a stack variable
 - and many more

 There are some much bigger changes too, stepping far from the traditional
 C, C++, and even Java models. These include linguistic support for:

 - concurrency
 - garbage collection
 - interface types
 - reflection
 - type switches

 The following sections provide brief discussions of two of these topics in Go,
 concurrency and garbage collection,
 mostly from a software engineering perspective.
 For a full discussion of the language semantics and uses see the many
 resources on the [[http://golang.org]] web site.

 * Concurrency

 Concurrency is important to the modern computing environment with its
 multicore machines running web servers with multiple clients,
 what might be called the typical Google program.
 This kind of software is not especially well served by C++ or Java,
 which lack sufficient concurrency support at the language level.

 Go embodies a variant of CSP with first-class channels.
 CSP was chosen partly due to familiarity (one of us had worked on
 predecessor languages that built on CSP's ideas), but also because
 CSP has the property that it is easy to add to a procedural programming
 model without profound changes to that model.
 That is, given a C-like language, CSP can be added to the language
 in a mostly orthogonal way, providing extra expressive power without
 constraining the language's other uses. In short, the rest of the
 language can remain "ordinary".

 The approach is thus the composition of independently executing
 functions of otherwise regular procedural code.

 The resulting language allows us to couple concurrency with computation
 smoothly. Consider a web server that must verify security certificates for
 each incoming client call; in Go it is easy to construct the software using
 CSP to manage the clients as independently executing procedures but
 to have the full power of an efficient compiled language available for
 the expensive cryptographic calculations.

 In summary, CSP is practical for Go and for Google. When writing
 a web server, the canonical Go program, the model is a great fit.

 There is one important caveat: Go is not purely memory safe in the presence
 of concurrency. Sharing is legal and passing a pointer over a channel is idiomatic
 (and efficient).

 Some concurrency and functional programming experts are disappointed
 that Go does not take a write-once approach to value semantics
 in the context of concurrent computation, that Go is not more like
 Erlang for example.
 Again, the reason is largely about familiarity and suitability for the
 problem domain. Go's concurrent features work well in a context
 familiar to most programmers.
 Go _enables_ simple, safe concurrent
 programming but does not _forbid_ bad programming.
 We compensate by convention, training programmers to think
 about message passing as a version of ownership control. The motto is,
 "Don't communicate by sharing memory, share memory by communicating."

 Our limited experience with programmers new to both Go and concurrent
 programming shows that this is a practical approach. Programmers
 enjoy the simplicity that support for concurrency brings to network
 software, and simplicity engenders robustness.

 * Garbage collection

 For a systems language, garbage collection can be a controversial feature,
 yet we spent very little time deciding that Go would be a
 garbage-collected language.
 Go has no explicit memory-freeing operation: the only way allocated
 memory returns to the pool is through the garbage collector.

 It was an easy decision to make because memory management
 has a profound effect on the way a language works in practice.
 In C and C++, too much programming effort is spent on memory allocation
 and freeing.
 The resulting designs tend to expose details of memory management
 that could well be hidden; conversely memory considerations
 limit how they can be used. By contrast, garbage collection makes interfaces
 easier to specify.

 Moreover, in a concurrent object-oriented language it's almost essential
 to have automatic memory management because the ownership of a piece
 of memory can be tricky to manage as it is passed around among concurrent
 executions. It's important to separate behavior from resource management.

 The language is much easier to use because of garbage collection.

 Of course, garbage collection brings significant costs: general overhead,
 latency, and complexity of the implementation. Nonetheless, we believe
 that the benefits, which are mostly felt by the programmer, outweigh
 the costs, which are largely borne by the language implementer.

 Experience with Java in particular as a server language has made some
 people nervous about garbage collection in a user-facing system.
 The overheads are uncontrollable, latencies can be large, and much
 parameter tuning is required for good performance.
 Go, however, is different. Properties of the language mitigate some of these
 concerns. Not all of them of course, but some.

 The key point is that Go gives the programmer tools to limit allocation
 by controlling the layout of data structures. Consider this simple
 type definition of a data structure containing a buffer (array) of bytes:

 	type X struct {
 		a, b, c int
 		buf [256]byte
 	}

 In Java, the `buf` field would require a second allocation and accesses
 to it a second level of indirection. In Go, however, the buffer is allocated
 in a single block of memory along with the containing struct and no
 indirection is required. For systems programming, this design can have a
 better performance as well as reducing the number
 of items known to the collector. At scale it can make a significant
 difference.

 As a more direct example, in Go it is easy and efficient to provide
 second-order allocators, for instance an arena allocator that allocates
 a large array of structs and links them together with a free list.
 Libraries that repeatedly use many small structures like this can,
 with modest prearrangement, generate no garbage yet
 be efficient and responsive.

 Although Go is a garbage collected language, therefore, a knowledgeable
 programmer can limit the pressure placed on the collector and thereby
 improve performance. (Also, the Go installation comes with good tools
 for studying the dynamic memory performance of a running program.)

 To give the programmer this flexibility, Go must support
 what we call _interior_pointers_ to objects
 allocated in the heap. The `X.buf` field in the example above lives
 within the struct but it is legal to capture the address of this inner field,
 for instance to pass it to an I/O routine. In Java, as in many garbage-collected
 languages, it is not possible to construct an interior pointer like this,
 but in Go it is idiomatic.
 This design point affects which collection algorithms can be used,
 and may make them more difficult, but after careful thought we decided
 that it was necessary to allow interior pointers because of the benefits
 to the programmer and the ability to reduce pressure on the (perhaps
 harder to implement) collector.
 So far, our experience comparing similar Go and Java programs shows
 that use of interior pointers can have a significant effect on total arena size,
 latency, and collection times.

 In summary, Go is garbage collected but gives the programmer
 some tools to control collection overhead.

 The garbage collector remains an active area of development.
 The current design is a parallel mark-and-sweep collector and there remain
 opportunities to improve its performance or perhaps even its design.
 (The language specification does not mandate any particular implementation
 of the collector.)
 Still, if the programmer takes care to use memory wisely,
 the current implementation works well for production use.

 * Composition not inheritance

 Go takes an unusual approach to object-oriented programming, allowing
 methods on any type, not just classes, but without any form of type-based inheritance
 like subclassing.
 This means there is no type hierarchy.
 This was an intentional design choice.
 Although type hierarchies have been used to build much successful
 software, it is our opinion that the model has been overused and that it
 is worth taking a step back.

 Instead, Go has _interfaces_, an idea that has been discussed at length elsewhere (see
 [[http://research.swtch.com/interfaces]]
 for example), but here is a brief summary.

 In Go an interface is _just_ a set of methods. For instance, here is the definition
 of the `Hash` interface from the standard library.

 	type Hash interface {
 		Write(p []byte) (n int, err error)
 		Sum(b []byte) []byte
 		Reset()
 		Size() int
 		BlockSize() int
 	}

 All data types that implement these methods satisfy this interface implicitly;
 there is no `implements` declaration.
 That said, interface satisfaction is statically checked at compile time
 so despite this decoupling interfaces are type-safe.

 A type will usually satisfy many interfaces, each corresponding
 to a subset of its methods. For example, any type that satisfies the `Hash`
 interface also satisfies the `Writer` interface:

 	type Writer interface {
 		Write(p []byte) (n int, err error)
 	}

 This fluidity of interface satisfaction encourages a different approach
 to software construction. But before explaining that, we should explain
 why Go does not have subclassing.

 Object-oriented programming provides a powerful insight: that the
 _behavior_ of data can be generalized independently of the
 _representation_ of that data.
 The model works best when the behavior (method set) is fixed,
 but once you subclass a type and add a method,
 _the_behaviors_are_no_longer_identical_.
 If instead the set of behaviors is fixed, such as in Go's statically
 defined interfaces, the uniformity of behavior enables data and
 programs to be composed uniformly, orthogonally, and safely.

 One extreme example is the Plan 9 kernel, in which all system data items
 implemented exactly the same interface, a file system API defined
 by 14 methods.
 This uniformity permitted a level of object composition seldom
 achieved in other systems, even today.
 Examples abound. Here's one: A system could import (in Plan 9 terminology) a TCP
 stack to a computer that didn't have TCP or even Ethernet, and over that network
 connect to a machine with a different CPU architecture, import its `/proc` tree,
 and run a local debugger to do breakpoint debugging of the remote process.
 This sort of operation was workaday on Plan 9, nothing special at all.
 The ability to do such things fell out of the design; it required no special
 arrangement (and was all done in plain C).

 We argue that this compositional style of system construction has been
 neglected by the languages that push for design by type hierarchy.
 Type hierarchies result in brittle code.
 The hierarchy must be designed early, often as the first step of
 designing the program, and early decisions can be difficult to change once
 the program is written.
 As a consequence, the model encourages early overdesign as the
 programmer tries to predict every possible use the software might
 require, adding layers of type and abstraction just in case.
 This is upside down.
 The way pieces of a system interact should adapt as it grows,
 not be fixed at the dawn of time.

 Go therefore encourages _composition_ over inheritance, using
 simple, often one-method interfaces to define trivial behaviors
 that serve as clean, comprehensible boundaries between components.

 Consider the `Writer` interface shown above, which is defined in
 package `io`: Any item that has a `Write` method with this
 signature works well with the complementary `Reader` interface:

 	type Reader interface {
 		Read(p []byte) (n int, err error)
 	}

 These two complementary methods allow type-safe chaining
 with rich behaviors, like generalized Unix pipes.
 Files, buffers, networks,
 encryptors, compressors, image encoders, and so on can all be
 connected together.
 The `Fprintf` formatted I/O routine takes an `io.Writer` rather than,
 as in C, a `FILE*`.
 The formatted printer has no knowledge of what it is writing to; it may
 be a image encoder that is in turn writing to a compressor that
 is in turn writing to an encryptor that is in turn writing to a network
 connection.

 Interface composition is a different style of programming, and
 people accustomed to type hierarchies need to adjust their thinking to
 do it well, but the result is an adaptability of
 design that is harder to achieve through type hierarchies.

 Note too that the elimination of the type hierarchy also eliminates
 a form of dependency hierarchy.
 Interface satisfaction allows the program to grow organically without
 predetermined contracts.
 And it is a linear form of growth; a change to an interface affects
 only the immediate clients of that interface; there is no subtree to update.
 The lack of `implements` declarations disturbs some people but
 it enables programs to grow naturally, gracefully, and safely.

 Go's interfaces have a major effect on program design.
 One place we see this is in the use of functions that take interface
 arguments. These are _not_ methods, they are functions.
 Some examples should illustrate their power.
 `ReadAll` returns a byte slice (array) holding all the data that can
 be read from an `io.Reader`:

 	func ReadAll(r io.Reader) ([]byte, error)

 Wrappers—functions that take an interface and return an interface—are
 also widespread.
 Here are some prototypes.
 `LoggingReader` logs every `Read` call on the incoming `Reader`.
 `LimitingReader` stops reading after `n` bytes.
 `ErrorInjector` aids testing by simulating I/O errors.
 And there are many more.

 	func LoggingReader(r io.Reader) io.Reader
 	func LimitingReader(r io.Reader, n int64) io.Reader
 	func ErrorInjector(r io.Reader) io.Reader

 The designs are nothing like hierarchical, subtype-inherited methods.
 They are looser (even _ad_hoc_), organic, decoupled, independent, and therefore scalable.

 * Errors

 Go does not have an exception facility in the conventional sense,
 that is, there is no control structure associated with error handling.
 (Go does provide mechanisms for handling exceptional situations
 such as division by zero. A pair of built-in functions
 called `panic` and `recover` allow the programmer to protect
 against such things. However, these functions
 are intentionally clumsy, rarely used, and not integrated
 into the library the way, say, Java libraries use exceptions.)

 The key language feature for error handling is a pre-defined
 interface type called `error` that represents a value that has an
 `Error` method returning a string:

 	type error interface {
 		Error() string
 	}

 Libraries use the `error` type to return a description of the error.
 Combined with the ability for functions to return multiple
 values, it's easy to return the computed result along with an
 error value, if any.
 For instance, the equivalent
 to C's `getchar` does not return an out-of-band value at EOF,
 nor does it throw an exception; it just returns an `error` value
 alongside the character, with a `nil` `error` value signifying success.
 Here is the signature of the `ReadByte` method of the buffered
 I/O package's `bufio.Reader` type:

 	func (b *Reader) ReadByte() (c byte, err error)

 This is a clear and simple design, easily understood.
 Errors are just values and programs compute with
 them as they would compute with values of any other type.

 It was a deliberate choice not to incorporate exceptions in Go.
 Although a number of critics disagree with this decision, there
 are several reasons we believe it makes for better software.

 First, there is nothing truly exceptional about errors in computer programs.
 For instance, the inability to open a file is a common issue that
 does not deserve special linguistic constructs; `if` and `return` are fine.

 	f, err := os.Open(fileName)
 	if err != nil {
 		return err
 	}

 Also, if errors use special control structures, error handling distorts
 the control flow for a program that handles errors.
 The Java-like style of `try-catch-finally` blocks interlaces multiple overlapping flows
 of control that interact in complex ways.
 Although in contrast Go makes it more
 verbose to check errors, the explicit design keeps the flow of control
 straightforward—literally.

 There is no question the resulting code can be longer,
 but the clarity and simplicity of such code offsets its verbosity.
 Explicit error checking forces the programmer to think about
 errors—and deal with them—when they arise. Exceptions make
 it too easy to _ignore_ them rather than _handle_ them, passing
 the buck up the call stack until it is too late to fix the problem or
 diagnose it well.

 * Tools

 Software engineering requires tools.
 Every language operates in an environment with other languages
 and myriad tools to compile, edit, debug, profile, test, and run programs.

 Go's syntax, package system, naming conventions, and other features
 were designed to make tools easy to write, and the library
 includes a lexer, parser, and type checker for the language.

 Tools to manipulate Go programs are so easy to write that
 many such tools have been created,
 some with interesting consequences for software engineering.

 The best known of these is `gofmt`, the Go source code formatter.
 From the beginning of the project, we  intended Go programs
 to be formatted by machine, eliminating an entire class of argument
 between programmers: how do I lay out my code?
 `Gofmt` is run on all Go programs we write, and most of the open
 source community uses it too.
 It is run as a "presubmit" check for the code repositories to
 make sure that all checked-in Go programs are formatted the same.

 `Gofmt` is often cited by users as one of Go's best features even
 though it is not part of the language.
 The existence and use of `gofmt` means that
 from the beginning, the community has always
 seen Go code as `gofmt` formats it, so Go programs have a single
 style that is now familiar to everyone. Uniform presentation
 makes code easier to read and therefore faster to work on.
 Time not spent on formatting is time saved.
 `Gofmt` also affects scalability: since all code looks the same,
 teams find it easier to work together or with others' code.

 `Gofmt` enabled another class of tools that we did not foresee as clearly.
 The program works by parsing the source code and reformatting it
 from the parse tree itself.
 This makes it possible to _edit_ the parse tree before formatting it,
 so a suite of automatic refactoring tools sprang up.
 These are easy to write, can be semantically rich because they work
 directly on the parse tree, and automatically produce canonically
 formatted code.

 The first example was a `-r` (rewrite) flag on `gofmt` itself, which
 uses a simple pattern-matching language to enable expression-level
 rewrites. For instance, one day we introduced a default value for the
 right-hand side of a slice expression: the length itself. The entire
 Go source tree was updated to use this default with the single
 command:

 	gofmt -r 'a[b:len(a)] -> a[b:]'

 A key point about this transformation is that, because the input and
 output are both in the canonical format, the only changes made to
 the source code are semantic ones.

 A similar but more intricate process allowed `gofmt` to be used to
 update the tree when the language no longer required semicolons
 as statement terminators if the statement ended at a newline.

 Another important tool is `gofix`, which runs tree-rewriting modules
 written in Go itself that are therefore are capable of more advanced
 refactorings.
 The `gofix` tool allowed us to make sweeping changes to APIs and language
 features leading up to the release of Go 1, including a change to the syntax
 for deleting entries from a map, a radically different API for manipulating
 time values, and many more.
 As these changes rolled out, users could update all their code by running
 the simple command

 	gofix

 Note that these tools allow us to _update_ code even if the old code still
 works.
 As a result, Go repositories are easy to keep up to date as libraries evolve.
 Old APIs can be deprecated quickly and automatically so only one version
 of the API needs to be maintained.
 For example, we recently changed Go's protocol buffer implementation to use
 "getter" functions, which were not in the interface before.
 We ran `gofix` on _all_ of Google's Go code to update all programs that
 use protocol buffers, and now there is only one version of the API in use.
 Similar sweeping changes to the C++ or Java libraries are almost infeasible
 at the scale of Google's code base.

 The existence of a parsing package in the standard Go library has enabled
 a number of other tools as well. Examples include the `go` tool, which
 manages program construction including acquiring packages from
 remote repositories;
 the `godoc` document extractor,
 a program to verify that the API compatibility contract is maintained as
 the library is updated, and many more.

 Although tools like these are rarely mentioned in the context of language
 design, they are an integral part of a language's ecosystem and the fact
 that Go was designed with tooling in mind has a huge effect on the
 development of the language, its libraries, and its community.

 * Conclusion

 Go's use is growing inside Google.

 Several big user-facing services use it, including `youtube.com` and `dl.google.com`
 (the download server that delivers Chrome, Android and other downloads),
 as well as our own [[http://golang.org][golang.org]].
 And of course many small ones do, mostly
 built using Google App Engine's native support for Go.

 Many other companies use Go as well; the list is very long, but a few of the
 better known are:

 - BBC Worldwide
 - Canonical
 - Heroku
 - Nokia
 - SoundCloud

 It looks like Go is meeting its goals. Still, it's too early to declare it a success.
 We don't have enough experience yet, especially with big programs (millions
 of lines of code) to know whether the attempts to build a scalable language
 have paid off. All the indicators are positive though.

 On a smaller scale, some minor things aren't quite right and might get
 tweaked in a later (Go 2?) version of the language. For instance, there are
 too many forms of variable declaration syntax, programmers are
 easily confused by the behavior of nil values inside non-nil interfaces,
 and there are many library and interface details that could use another
 round of design.

 It's worth noting, though, that `gofix` and `gofmt` gave us the opportunity to
 fix many other problems during the leadup to Go version 1.
 Go as it is today is therefore much closer to what the designers wanted
 than it would have been without these tools, which were themselves
 enabled by the language's design.

 Not everything was fixed, though. We're still learning (but the language
 is frozen for now).

 A significant weakness of the language is that the implementation still
 needs work. The compilers' generated code and the performance of the
 runtime in particular should be better, and work continues on them.
 There is progress already; in fact some benchmarks show a
 doubling of performance with the development version today compared
 to the first release of Go version 1 early in 2012.

 * Summary

 Software engineering guided the design of Go.
 More than most general-purpose
 programming languages, Go was designed to address a set of software engineering
 issues that we had been exposed to in the construction of large server software.
 Offhand, that might make Go sound rather dull and industrial, but in fact
 the focus on clarity, simplicity and composability throughout the design
 instead resulted in a productive, fun language that many programmers
 find expressive and powerful.

 The properties that led to that include:

 - Clear dependencies
 - Clear syntax
 - Clear semantics
 - Composition over inheritance
 - Simplicity provided by the programming model (garbage collection, concurrency)
 - Easy tooling (the `go` tool, `gofmt`, `godoc`, `gofix`)

 If you haven't tried Go already, we suggest you do.


 .link http://golang.org http://golang.org

 .image splash/appenginegophercolor.jpg