Go generate: A Proposal

Author: Rob Pike

Accepted in the Go 1.4 release.

Introduction

The go build command automates the construction of Go programs but sometimes preliminary processing is required, processing that go build does not support. Motivating examples include:

yacc: generating .go files from yacc grammar (.y) files
protobufs: generating .pb.go files from protocol buffer definition (.proto) files
Unicode: generating tables from UnicodeData.txt
HTML: embedding .html files into Go source code
bindata: translating binary files such as JPEGs into byte arrays in Go source

There are other processing steps one can imagine:

string methods: generating String() string methods for types used as enumerated constants
macros: generating customized implementations given generalized packages, such as sort.Ints from ints

This proposal offers a design for smooth automation of such processing.

Non-goal

It is not a goal of this proposal to build a generalized build system like the Unix make(1) utility. We deliberately avoid doing any dependency analysis. The tool does what is asked of it, nothing more.

It is hoped, however, that it may replace many existing uses of make(1) in the Go repo at least.

Design

There are two basic elements, a new subcommand for the go command, called go generate, and directives inside Go source files that control generation.

When go generate runs, it scans Go source files looking for those directives, and for each one executes a generator that typically creates a new Go source file. The go generate tool also sets the build tag “generate” so that files may be examined by go generate but ignored during build.

The usage is:

go generate [-run regexp] [file.go...|packagePath...]

(Plus the usual -x, -n, -v and -tags options.) If packages are named, each Go source file in each package is scanned for generator directives, and for each directive, the specified generator is run; if files are named, they must be Go source files and generation happens only for directives in those files. Given no arguments, generator processing is applied to the Go source files in the current directory.

The -run flag takes a regular expression, analogous to that of the go test subcommand, that restricts generation to those directives whose command (see below) matches the regular expression.

Generator directives may appear anywhere in the Go source file and are processed sequentially (no parallelism) in source order as presented to the tool. Each directive is a // comment beginning a line, with syntax

//go:generate command arg...

where command is the generator (such as yacc) to be run, corresponding to an executable file that can be run locally; it must either be in the shell path (gofmt) or fully qualified (/usr/you/bin/mytool) and is run in the package directory.

The arguments are space-separated tokens (or double-quoted strings) passed to the generator as individual arguments when it is run. Shell-like variable expansion is available for any environment variables such as $HOME. Also, the special variable $GOFILE refers to the name of the file containing the directive. (We may need other special variables such as $GOPACKAGE. When the generator is run, these are also provided in the shell environment.) No other special processing, such as globbing, is provided.

No further generators are run if any generator returns an error exit status.

As an example, say we have a package my/own/gopher that includes a yacc grammar in file gopher.y. Inside main.go (not gopher.y) we place the directive

//go:generate yacc -o gopher.go gopher.y

(More about what yacc means in the next section.) Whenever we need to update the generated file, we give the shell command,

% go generate my/own/gopher

or, if we are already in the source directory,

% go generate

If we want to make sure that only the yacc generator is run, we execute

% go generate -run yacc

If we have fixed a bug in yacc and want to update all yacc-generated files in our tree, we can run

% go generate -run yacc all

The typical cycle for a package author developing software that uses go generate is

% edit …
% go generate
% go test

and once things are settled, the author commits the generated files to the source repository, so that they are available to clients that use go get:

% git add *.go
% git commit

Commands

The yacc program is of course not the standard version, but is accessed from the command line by

go tool yacc args...

To make it easy to use tools like yacc that are not installed in $PATH, have complex access methods, or benefit from extra flags or other wrapping, there is a special directive that defines a shorthand for a command. It is a go:generate directive followed by the keyword/flag -command and which generator it defines; the rest of the line is substituted for the command name when the generator is run. Thus to define yacc as a generator command we access normally by running go tool yacc, we first write the directive

//go:generate -command yacc go tool yacc

and then all other generator directives using yacc that follow in that file (only) can be written as above:

//go:generate yacc -o gopher.go gopher.y

which will be translated to

go tool yacc -o gopher.go gopher.y

when run.

Discussion

This design is unusual but is driven by several motivating principles.

First, go generate is intended[^1] to be run by the author of a package, not the client of it. The author of the package generates the required Go files and includes them in the package; the client does a regular go get or go build. Generation through go generate is not part of the build, just a tool for package authors. This avoids complicating the dependency analysis done by Go build.

[^1]: One can imagine scenarios where the author wishes the client to run the generator, but in such cases the author must guarantee that the client has the generator available. Regardless, go get will not automate the running of the processor, so further installation instructions will need to be provided by the author.

Second, go build should never cause generation to happen automatically by the client of the package. Generators should run only when explicitly requested.

Third, the author of the package should have great freedom in what generator to use (that is a key goal of the proposal), but the client might not have that processor available. As a simple example, if it is a shell script, it will not run on Windows. It is important that automated generation not break clients but be invisible to them, which is another reason it should be run only by the author of the package.

Finally, it must fit well with the existing go command, which means it applies only to Go source files and packages. This is why the directives are in Go files but not, for example, in the .y file holding a yacc grammar.

Examples

Here are some hypothetical worked examples. There are countless more possibilities.

String methods

We wish to generate a String method for a named constant type. We write a tool, say strmeth, that reads a definition for a single constant type and values and prints a complete Go source file containing a method definition for that type.

In our Go source file, main.go, we decorate each constant declaration like this (with some blank lines interposed so the generator directive does not appear in the doc comment):

//go:generate strmeth Day -o day_string.go $GOFILE

// Day represents the day of the week
type Day int
const (
	Sunday Day = iota
	Monday
	...
)

The strmeth generator parses the Go source to find the definition of the Day type and its constants, and writes out a String() string method for that type. For the user, generation of the string method is trivial: just run go generate.

Yacc

As outlined above, we define a custom command

//go:generate -command yacc go tool yacc

and then anywhere in main.go (say) we write

//go:generate yacc -o foo.go foo.y

Protocol buffers

The process is the same as with yacc. Inside main.go, we write, for each protocol buffer file we have, a line like

//go:generate protoc -go_out=. file.proto

Because of the way protoc works, we could generate multiple proto definitions into a single .pb.go file like this:

//go:generate protoc -go_out=. file1.proto file2.proto

Since no globbing is provided, one cannot say *.proto, but this is intentional, for simplicity and clarity of dependency.

Caveat: The protoc program must be run at the root of the source tree; we would need to provide a -cd option to it or wrap it somehow.

Binary data

A tool that converts binary files into byte arrays that can be compiled into Go binaries would work similarly. Again, in the Go source we write something like

//go:generate bindata -o jpegs.go pic1.jpg pic2.jpg pic3.jpg

This is also demonstrates another reason the annotations are in Go source: there is no easy way to inject them into binary files.

Sort

One could imagine a variant sort implementation that allows one to specify concrete types that have custom sorters, just by automatic rewriting of macro-like sort definition. To do this, we write a sort.go file that contains a complete implementation of sort on an explicit but undefined type spelled, say, TYPE. In that file we provide a build tag so it is never compiled (TYPE is not defined, so it won't compile) but is processed by go generate:

// +build generate

Then we write an generator directive for each type for which we want a custom sort:

//go:generate rename TYPE=int
//go:generate rename TYPE=strings

or perhaps

//go:generate rename TYPE=int TYPE=strings

The rename processor would be a simple wrapping of gofmt -r, perhaps written as a shell script.

There are many more possibilities, and it is a goal of this proposal to encourage experimentation with pre-build-time code generation.