The cover story
2 Dec 2013
Tags: tools, coverage, testing
Rob Pike
* Introduction
From the beginning of the project, Go was designed with tools in mind.
Those tools include some of the most iconic pieces of Go technology such as
the documentation presentation tool
[[https://golang.org/cmd/godoc][godoc]],
the code formatting tool
[[https://golang.org/cmd/gofmt][gofmt]],
and the API rewriter
[[https://golang.org/cmd/fix][gofix]].
Perhaps most important of all is the
[[https://golang.org/cmd/go][`go` command]],
the program that automatically installs, builds, and tests Go programs
using nothing more than the source code as the build specification.
The release of Go 1.2 introduces a new tool for test coverage that takes an
unusual approach to the way it generates coverage statistics, an approach
that builds on the technology laid down by godoc and friends.
* Support for tools
First, some background: What does it mean for a
[[https://talks.golang.org/2012/splash.article#TOC_17.][language to support good tooling]]?
It means that the language makes it easy to write good tools and that its ecosystem
supports the construction of tools of all flavors.
There are a number of properties of Go that make it suitable for tooling.
For starters, Go has a regular syntax that is easy to parse.
The grammar aims to be free of special cases that require complex machinery to analyze.
Where possible, Go uses lexical and syntactic constructs to make semantic properties
easy to understand.
Examples include the use of upper-case letters to define exported names
and the radically simplified scoping rules compared to other languages in the C tradition.
Finally, the standard library comes with production-quality packages to lex and parse Go source code.
It also includes, more unusually, a production-quality package to pretty-print Go syntax trees.
These packages in combination form the core of the gofmt tool, but the pretty-printer is worth singling out.
Because it can take an arbitrary Go syntax tree and output standard-format, human-readable, correct
code, it creates the possibility to build tools that transform the parse tree and output modified but
correct and easy-to-read code.
One example is the gofix tool, which automates the
rewriting of code to use new language features or updated libraries.
Gofix let us make fundamental changes to the language and libraries in the
[[https://blog.golang.org/the-path-to-go-1][run-up to Go 1.0]],
with the confidence that users could just run the tool to update their source to the newest version.
Inside Google, we have used gofix to make sweeping changes in a huge code repository that would be almost
unthinkable in the other languages we use.
There's no need any more to support multiple versions of some API; we can use gofix to update
the entire company in one operation.
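To make the parse, transform, print cycle concrete, here is a minimal sketch built on the standard `go/parser`, `go/ast`, and `go/printer` packages.
It is not how gofix itself is implemented: the `renameIdent` helper and the sample source are invented for illustration, and the rename is purely syntactic, ignoring scoping.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
)

// renameIdent is a hypothetical helper: it parses src, renames every
// identifier spelled from to to, and pretty-prints the modified tree.
// It returns the new source and the number of identifiers it changed.
// The rename is purely syntactic; it does not consult scopes or types.
func renameIdent(src, from, to string) (string, int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		return "", 0, err
	}

	// Walk the syntax tree and edit matching identifiers in place.
	changed := 0
	ast.Inspect(file, func(n ast.Node) bool {
		if id, ok := n.(*ast.Ident); ok && id.Name == from {
			id.Name = to
			changed++
		}
		return true
	})

	// The pretty-printer turns the modified tree back into source text.
	var buf bytes.Buffer
	if err := printer.Fprint(&buf, fset, file); err != nil {
		return "", 0, err
	}
	return buf.String(), changed, nil
}

func main() {
	const src = `package demo

func oldName(x int) int { return x + 1 }

func caller() int { return oldName(41) }
`
	out, n, err := renameIdent(src, "oldName", "newName")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("renamed %d identifiers\n", n)
	fmt.Print(out)
}
```

A real rewriting tool would add type or scope information before deciding what to change, but the overall shape stays the same: parse, edit the tree, print.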
It's not just these big tools that these packages enable, of course.
They also make it easy to write more modest programs, such as IDE plugins.
All these items build on each other, making the Go environment
more productive by automating many tasks.
* Test coverage
Test coverage is a term that describes how much of a package's code is exercised by running the package's tests.
If executing the test suite causes 80% of the package's source statements to be run, we say that the test coverage is 80%.
The program that provides test coverage in Go 1.2 is the latest to exploit the tooling support in the Go ecosystem.
The usual way to compute test coverage is to instrument the binary.
For instance, the GNU [[http://gcc.gnu.org/onlinedocs/gcc/Gcov.html][gcov]] program sets breakpoints at branches
executed by the binary.
As each branch executes, the breakpoint is cleared and the target statements of the branch are marked as 'covered'.
This approach is successful and widely used. An early test coverage tool for Go even worked the same way.
But it has problems.
It is difficult to implement, as analysis of the execution of binaries is challenging.
It also requires a reliable way of tying the execution trace back to the source code, which can be just as difficult,
as any user of a source-level debugger can attest.
Problems there include inaccurate debugging information and issues such as in-lined functions complicating
the analysis.
Most important, this approach is very non-portable.
It needs to be done afresh for every architecture, and to some extent for every
operating system since debugging support varies greatly from system to system.
It does work, though, and for instance if you are a user of gccgo, the gcov tool can give you test coverage
information.
However, if you're a user of gc, the more commonly used Go compiler suite, until Go 1.2 you were out of luck.
* Test coverage for Go
For the new test coverage tool for Go, we took a different approach that avoids dynamic debugging.
The idea is simple: Rewrite the package's source code before compilation to add instrumentation,
compile and run the modified source, and dump the statistics.
The rewriting is easy to arrange because the `go` command controls the flow
from source to test to execution.
Here's an example. Say we have a simple, one-file package like this:
.code cover/pkg.go
and this test:
.code cover/pkg_test.go
To get the test coverage for the package,
we run the test with coverage enabled by providing the `-cover` flag to `go` `test`:
% go test -cover
PASS
coverage: 42.9% of statements
ok size 0.026s
%
Notice that the coverage is 42.9%, which isn't very good.
Before we ask how to raise that number, let's see how that was computed.
When test coverage is enabled, `go` `test` runs the "cover" tool, a separate program included
with the distribution, to rewrite the source code before compilation. Here's what the rewritten
`Size` function looks like:
.code cover/pkg.cover /func/,/^}/
Each executable section of the program is annotated with an assignment statement that,
when executed, records that that section ran.
The counter is tied to the original source position of the statements it counts
through a second read-only data structure that is also generated by the cover tool.
When the test run completes, the counters are collected and the percentage is computed
by seeing how many were set.
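The scheme just described can be imitated by hand.
The sketch below is not the cover tool's actual output: the `sign` function, the `counters` and `blocks` tables, and the line numbers in them are all made up for illustration.
It shows only the idea of one counter per block, a parallel read-only table describing each block, and a percentage computed from the counters that were set.

```go
package main

import "fmt"

// block describes one instrumented region of the original source.
// The layout is hypothetical; the real tool's table differs.
type block struct {
	startLine, endLine int // source range the counter stands for
	numStmt            int // number of statements in that range
}

// blocks is the read-only table; entry i describes the region
// whose execution is recorded in counters[i].
var blocks = [5]block{
	{2, 2, 1}, // if n < 0
	{3, 3, 1}, // return "negative"
	{5, 5, 1}, // if n == 0
	{6, 6, 1}, // return "zero"
	{8, 8, 1}, // return "positive"
}

// counters holds one slot per block, set to 1 when the block runs.
var counters [5]uint32

// sign is instrumented by hand, in the spirit of the rewritten code:
// each block starts with an assignment that records that it ran.
func sign(n int) string {
	counters[0] = 1
	if n < 0 {
		counters[1] = 1
		return "negative"
	}
	counters[2] = 1
	if n == 0 {
		counters[3] = 1
		return "zero"
	}
	counters[4] = 1
	return "positive"
}

// percentCovered reports the percentage of statements whose
// block counter was set.
func percentCovered() float64 {
	var covered, total int
	for i, b := range blocks {
		total += b.numStmt
		if counters[i] != 0 {
			covered += b.numStmt
		}
	}
	if total == 0 {
		return 0
	}
	return 100 * float64(covered) / float64(total)
}

func main() {
	sign(5) // a "test" that exercises only the positive case
	fmt.Printf("coverage: %.1f%% of statements\n", percentCovered())
	// prints "coverage: 60.0% of statements"
}
```

Calling `sign` with a negative number and with zero as well would set the remaining two counters and bring the figure to 100%.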
Although that annotating assignment might look expensive, it compiles to a single "move" instruction.
Its run-time overhead is therefore modest, adding only about 3% when running a typical (more realistic) test.
That makes it reasonable to include test coverage as part of the standard development pipeline.
* Viewing the results
The test coverage for our example was poor.
To discover why, we ask `go` `test` to write a "coverage profile" for us, a file that holds
the collected statistics so we can study them in more detail.
That's easy to do: use the `-coverprofile` flag to specify a file for the output:
% go test -coverprofile=coverage.out
PASS
coverage: 42.9% of statements
ok size 0.030s
%
(The `-coverprofile` flag automatically sets `-cover` to enable coverage analysis.)
The test runs just as before, but the results are saved in a file.
To study them, we run the test coverage tool ourselves, without `go` `test`.
As a start, we can ask for the coverage to be broken down by function,
although that's not going to illuminate much in this case since there's
only one function:
% go tool cover -func=coverage.out
size.go: Size 42.9%
total: (statements) 42.9%
%
A much more interesting way to see the data is to get an HTML presentation
of the source code decorated with coverage information.
This display is invoked by the `-html` flag:
% go tool cover -html=coverage.out
When this command is run, a browser window pops up, showing the covered (green),
uncovered (red), and uninstrumented (grey) source.
Here's a screen dump:
.image cover/set.png
With this presentation, it's obvious what's wrong: we neglected to test several
of the cases!
And we can see exactly which ones they are, which makes it easy to
improve our test coverage.
* Heat maps
A big advantage of this source-level approach to test coverage is that it's
easy to instrument the code in different ways.
For instance, we can ask not only whether a statement has been executed,
but how many times.
The `go` `test` command accepts a `-covermode` flag to set the coverage mode
to one of three settings:
- set: did each statement run?
- count: how many times did each statement run?
- atomic: like count, but counts precisely in parallel programs
The default is 'set', which we've already seen.
The `atomic` setting is needed only when accurate counts are required
when running parallel algorithms. It uses atomic operations from the
[[https://golang.org/pkg/sync/atomic/][sync/atomic]] package,
which can be quite expensive.
For most purposes, though, the `count` mode works fine and, like
the default `set` mode, is very cheap.
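As a rough illustration of how the three modes differ, the sketch below updates one counter in each style.
The exact statements the cover tool emits are not shown here, so treat the three update forms as assumptions: a plain assignment for `set`, a plain increment for `count`, and a call into `sync/atomic` for `atomic`.
The `simulate` function is a hypothetical helper; only the atomic counter is touched from several goroutines, since the plain increment would be a data race there.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// simulate is a hypothetical helper. It runs a "block" calls times
// sequentially for the set and count styles, then calls times in each
// of workers goroutines for the atomic style.
func simulate(calls, workers int) (set, count, atomicCount uint32) {
	for i := 0; i < calls; i++ {
		set = 1 // set: did the block run at all?
		count++ // count: how many times did it run?
	}

	// atomic: like count, but safe when blocks run concurrently.
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < calls; i++ {
				atomic.AddUint32(&atomicCount, 1)
			}
		}()
	}
	wg.Wait()

	return set, count, atomic.LoadUint32(&atomicCount)
}

func main() {
	set, count, atomicCount := simulate(1000, 4)
	fmt.Printf("set=%d count=%d atomic=%d\n", set, count, atomicCount)
	// prints "set=1 count=1000 atomic=4000"
}
```

The atomic update is the only one that stays exact under concurrency, which is why it costs more than the other two.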
Let's try counting statement execution for a standard package, the `fmt` formatting package.
We run the test and write out a coverage profile so we can present the information
nicely afterwards.
% go test -covermode=count -coverprofile=count.out fmt
ok fmt 0.056s coverage: 91.7% of statements
%
That's a much better test coverage ratio than for our previous example.
(The coverage ratio is not affected by the coverage mode.)
We can display the function breakdown:
% go tool cover -func=count.out
fmt/format.go: init 100.0%
fmt/format.go: clearflags 100.0%
fmt/format.go: init 100.0%
fmt/format.go: computePadding 84.6%
fmt/format.go: writePadding 100.0%
fmt/format.go: pad 100.0%
...
fmt/scan.go: advance 96.2%
fmt/scan.go: doScanf 96.8%
total: (statements) 91.7%
The big payoff happens in the HTML output:
% go tool cover -html=count.out
Here's what the `pad` function looks like in that presentation:
.image cover/count.png
Notice how the intensity of the green changes. Brighter-green
statements have higher execution counts; less saturated greens
represent lower execution counts.
You can even hover the mouse over the statements to see the
actual counts pop up in a tool tip.
At the time of writing, the counts come out like this
(we've moved the counts from the tool tips to beginning-of-line
markers to make them easier to show):
2933 if !f.widPresent || f.wid == 0 {
2985 f.buf.Write(b)
2985 return
2985 }
56 padding, left, right := f.computePadding(len(b))
56 if left > 0 {
37 f.writePadding(left, padding)
37 }
56 f.buf.Write(b)
56 if right > 0 {
13 f.writePadding(right, padding)
13 }
That's a lot of information about the execution of the function,
information that might be useful in profiling.
* Basic blocks
You might have noticed that the counts in the previous example
were not what you expected on the lines with closing braces.
That's because, as always, test coverage is an inexact science.
What's going on here is worth explaining, though. We'd like the
coverage annotations to be demarcated by branches in the program,
the way they are when the binary is instrumented in the traditional
method.
It's hard to do that by rewriting the source, though, since
the branches don't appear explicitly in the source.
What the coverage annotation does is instrument blocks, which
are typically bounded by brace brackets.
Getting this right in general is very hard.
A consequence of the algorithm used is that the closing
brace looks like it belongs to the block it closes, while the
opening brace looks like it belongs outside the block.
A more interesting consequence is that in an expression like
f() && g()
there is no attempt to separately instrument the calls to `f` and `g`. Regardless of
the facts, it will always look like they both ran the same
number of times, the number of times `f` ran.
To be fair, even `gcov` has trouble here. That tool gets the
instrumentation right but the presentation is line-based and
can therefore miss some nuances.
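The effect is easy to reproduce by hand.
In the sketch below, `blockCount` stands in for the single counter a block-level rewriter would attach to the statement, while `fCalls` and `gCalls` track the truth separately.
All the names are invented for illustration, and the instrumentation is written by hand rather than produced by the cover tool.

```go
package main

import "fmt"

var (
	blockCount     int // the one counter for the enclosing block
	fCalls, gCalls int // ground truth, tracked by hand for comparison
)

func f(n int) bool { fCalls++; return n%2 == 0 }
func g(n int) bool { gCalls++; return n > 5 }

// check contains the expression of interest. The block gets a single
// counter, so short-circuit evaluation of && is invisible to it.
func check(n int) bool {
	blockCount++
	return f(n) && g(n)
}

// run resets the counters, calls check for 0..n-1, and reports
// the block counter next to the real call counts.
func run(n int) (block, fc, gc int) {
	blockCount, fCalls, gCalls = 0, 0, 0
	for i := 0; i < n; i++ {
		check(i)
	}
	return blockCount, fCalls, gCalls
}

func main() {
	block, fc, gc := run(10)
	fmt.Printf("block counter: %d, f calls: %d, g calls: %d\n", block, fc, gc)
	// prints "block counter: 10, f calls: 10, g calls: 5"
}
```

The block counter matches the number of calls to `f`, but `g` ran only when `f` returned true, and no block-level count reveals that.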
* The big picture
That's the story about test coverage in Go 1.2.
A new tool with an interesting implementation enables not only
test coverage statistics, but easy-to-interpret presentations
of them and even the possibility to extract profiling information.
Testing is an important part of software development, and test
coverage is a simple way to add discipline to your testing strategy.
Go forth, test, and cover.