| # Proposal: Go Benchmark Data Format |
| |
| Authors: Russ Cox, Austin Clements |
| |
| Last updated: February 2016 |
| |
| Discussion at [golang.org/issue/14313](https://golang.org/issue/14313). |
| |
| ## Abstract |
| |
| We propose to make the current output of `go test -bench` the defined format for recording all Go benchmark data. |
| Having a defined format allows benchmark measurement programs |
| and benchmark analysis programs to interoperate while |
| evolving independently. |
| |
| ## Background |
| |
| ### Benchmark data formats |
| |
| We are unaware of any standard formats for recording raw benchmark data, |
| and we've been unable to find any using web searches. |
| One might expect that a standard benchmark suite such as SPEC CPU2006 would have |
| defined a format for raw results, but that appears not to be the case. |
| The [collection of published results](https://www.spec.org/cpu2006/results/) |
| includes only analyzed data ([example](https://www.spec.org/cpu2006/results/res2011q3/cpu2006-20110620-17230.txt)), not raw data. |
| |
| Go has a de facto standard format for benchmark data: |
| the lines generated by the testing package when using `go test -bench`. |
| For example, running compress/flate's benchmarks produces this output: |
| |
| BenchmarkDecodeDigitsSpeed1e4-8 100 154125 ns/op 64.88 MB/s 40418 B/op 7 allocs/op |
| BenchmarkDecodeDigitsSpeed1e5-8 10 1367632 ns/op 73.12 MB/s 41356 B/op 14 allocs/op |
| BenchmarkDecodeDigitsSpeed1e6-8 1 13879794 ns/op 72.05 MB/s 52056 B/op 94 allocs/op |
| BenchmarkDecodeDigitsDefault1e4-8 100 147551 ns/op 67.77 MB/s 40418 B/op 8 allocs/op |
| BenchmarkDecodeDigitsDefault1e5-8 10 1197672 ns/op 83.50 MB/s 41508 B/op 13 allocs/op |
| BenchmarkDecodeDigitsDefault1e6-8 1 11808775 ns/op 84.68 MB/s 53800 B/op 80 allocs/op |
| BenchmarkDecodeDigitsCompress1e4-8 100 143348 ns/op 69.76 MB/s 40417 B/op 8 allocs/op |
| BenchmarkDecodeDigitsCompress1e5-8 10 1185527 ns/op 84.35 MB/s 41508 B/op 13 allocs/op |
| BenchmarkDecodeDigitsCompress1e6-8 1 11740304 ns/op 85.18 MB/s 53800 B/op 80 allocs/op |
| BenchmarkDecodeTwainSpeed1e4-8 100 143665 ns/op 69.61 MB/s 40849 B/op 15 allocs/op |
| BenchmarkDecodeTwainSpeed1e5-8 10 1390359 ns/op 71.92 MB/s 45700 B/op 31 allocs/op |
| BenchmarkDecodeTwainSpeed1e6-8 1 12128469 ns/op 82.45 MB/s 89336 B/op 221 allocs/op |
| BenchmarkDecodeTwainDefault1e4-8 100 141916 ns/op 70.46 MB/s 40849 B/op 15 allocs/op |
| BenchmarkDecodeTwainDefault1e5-8 10 1076669 ns/op 92.88 MB/s 43820 B/op 28 allocs/op |
| BenchmarkDecodeTwainDefault1e6-8 1 10106485 ns/op 98.95 MB/s 71096 B/op 172 allocs/op |
| BenchmarkDecodeTwainCompress1e4-8 100 138516 ns/op 72.19 MB/s 40849 B/op 15 allocs/op |
| BenchmarkDecodeTwainCompress1e5-8 10 1227964 ns/op 81.44 MB/s 43316 B/op 25 allocs/op |
| BenchmarkDecodeTwainCompress1e6-8 1 10040347 ns/op 99.60 MB/s 72120 B/op 173 allocs/op |
| BenchmarkEncodeDigitsSpeed1e4-8 30 482808 ns/op 20.71 MB/s |
| BenchmarkEncodeDigitsSpeed1e5-8 5 2685455 ns/op 37.24 MB/s |
| BenchmarkEncodeDigitsSpeed1e6-8 1 24966055 ns/op 40.05 MB/s |
| BenchmarkEncodeDigitsDefault1e4-8 20 655592 ns/op 15.25 MB/s |
| BenchmarkEncodeDigitsDefault1e5-8 1 13000839 ns/op 7.69 MB/s |
| BenchmarkEncodeDigitsDefault1e6-8 1 136341747 ns/op 7.33 MB/s |
| BenchmarkEncodeDigitsCompress1e4-8 20 668083 ns/op 14.97 MB/s |
| BenchmarkEncodeDigitsCompress1e5-8 1 12301511 ns/op 8.13 MB/s |
| BenchmarkEncodeDigitsCompress1e6-8 1 137962041 ns/op 7.25 MB/s |
| |
The testing package always reports ns/op; each benchmark can additionally request MB/s (throughput) as well as B/op and allocs/op (allocation statistics).
| |
| ### Benchmark processors |
| |
| Multiple tools have been written that process this format, |
| most notably [benchcmp](https://godoc.org/golang.org/x/tools/cmd/benchcmp) |
| and its more statistically valid successor [benchstat](https://godoc.org/rsc.io/benchstat). |
| There is also [benchmany](https://godoc.org/github.com/aclements/go-misc/benchmany)'s plot subcommand |
| and likely more unpublished programs. |
| |
| ### Benchmark runners |
| |
| Multiple tools have also been written that generate this format. |
| In addition to the standard Go testing package, |
| [compilebench](https://godoc.org/rsc.io/compilebench) |
| generates this data format based on runs of the Go compiler, |
| and Austin's unpublished shellbench generates this data format |
| after running an arbitrary shell command. |
| |
| The [golang.org/x/benchmarks/bench](https://golang.org/x/benchmarks/bench) benchmarks |
| are notable for _not_ generating this format, |
| which has made all analysis of those results |
| more complex than we believe it should be. |
| We intend to update those benchmarks to generate the standard format, |
| once a standard format is defined. |
| Part of the motivation for the proposal is to avoid |
| the need to process custom output formats in future benchmarks. |
| |
| ## Proposal |
| |
| A Go benchmark data file is a UTF-8 textual file consisting of a sequence of lines. |
| Configuration lines and benchmark result lines, described below, |
| have semantic meaning in the reporting of benchmark results. |
| |
| All other lines in the data file, including but not limited to |
| blank lines and lines beginning with a # character, are ignored. |
| For example, the testing package prints test results above benchmark data, |
| usually the text `PASS`. That line is neither a configuration line nor a benchmark |
| result line, so it is ignored. |
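
To make this concrete, here is a minimal, non-normative sketch of a reader that dispatches on the two semantic line types and ignores everything else. The two checks are deliberately crude, ASCII-only simplifications of the rules defined in the next two sections:

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strings"
    )

    func main() {
        scanner := bufio.NewScanner(os.Stdin)
        for scanner.Scan() {
            line := scanner.Text()
            switch {
            case strings.HasPrefix(line, "Benchmark"):
                fmt.Println("result line:", line) // see Benchmark Results below
            case len(line) > 0 && line[0] >= 'a' && line[0] <= 'z' && strings.Contains(line, ":"):
                fmt.Println("configuration line:", line) // see Configuration Lines below
            default:
                // Ignored: blank lines, PASS, lines beginning with #, and so on.
            }
        }
    }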
| |
| ### Configuration Lines |
| |
| A configuration line is a key-value pair of the form |
| |
| key: value |
| |
| where key begins with a lower case character (as defined by `unicode.IsLower`), |
contains no space characters (as defined by `unicode.IsSpace`)
or upper case characters (as defined by `unicode.IsUpper`),
| and one or more ASCII space or tab characters separate “key:” from “value.” |
| Conventionally, multiword keys are written with the words |
| separated by hyphens, as in cpu-speed. |
| There are no restrictions on value, except that it cannot contain a newline character. |
| Value can be omitted entirely, in which case the colon must still be |
| present, but need not be followed by a space. |
| |
| The interpretation of a key/value pair is up to tooling, but the key/value pair |
| is considered to describe all benchmark results that follow, |
| until overwritten by a configuration line with the same key. |
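
As a sketch of how tooling might apply these rules, the following function parses one configuration line; the function name and signature are illustrative, not part of the proposal:

    import (
        "strings"
        "unicode"
        "unicode/utf8"
    )

    // parseConfigLine splits a configuration line into its key and value.
    // It reports ok=false if the line is not a configuration line.
    func parseConfigLine(line string) (key, value string, ok bool) {
        i := strings.Index(line, ":")
        if i < 1 {
            return "", "", false
        }
        key = line[:i]
        if r, _ := utf8.DecodeRuneInString(key); !unicode.IsLower(r) {
            return "", "", false
        }
        for _, r := range key {
            if unicode.IsUpper(r) || unicode.IsSpace(r) {
                return "", "", false
            }
        }
        // The value is whatever follows the colon and any ASCII spaces
        // or tabs; it may be empty.
        value = strings.TrimLeft(line[i+1:], " \t")
        return key, value, true
    }

For example, `parseConfigLine("commit: 7cd9055")` returns `("commit", "7cd9055", true)`.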
| |
| ### Benchmark Results |
| |
| A benchmark result line has the general form |
| |
| <name> <iterations> <value> <unit> [<value> <unit>...] |
| |
| The fields are separated by runs of space characters (as defined by `unicode.IsSpace`), |
| so the line can be parsed with `strings.Fields`. |
| The line must have an even number of fields, and at least four. |
| |
| The first field is the benchmark name, which must begin with `Benchmark` |
| followed by an upper case character (as defined by `unicode.IsUpper`) |
| or the end of the field, |
| as in `BenchmarkReverseString` or just `Benchmark`. |
| Tools displaying benchmark data conventionally omit the `Benchmark` prefix. |
| The same benchmark name can appear on multiple result lines, |
| indicating that the benchmark was run multiple times. |
| |
| The second field gives the number of iterations run. |
| For most processing this number can be ignored, although |
| it may give some indication of the expected accuracy |
| of the measurements that follow. |
| |
| The remaining fields report value/unit pairs in which the value |
| is a float64 that can be parsed by `strconv.ParseFloat` |
| and the unit explains the value, as in “64.88 MB/s”. |
The units reported are typically normalized so that they can be
interpreted without reference to the number of iterations.
| In the example, the CPU cost is reported per-operation and the |
| throughput is reported per-second; neither is a total that |
| depends on the number of iterations. |
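
A corresponding sketch for parsing one result line, again with illustrative names, might look like this:

    import (
        "fmt"
        "strconv"
        "strings"
        "unicode"
        "unicode/utf8"
    )

    // parseResultLine parses a benchmark result line into its name,
    // iteration count, and value/unit pairs.
    func parseResultLine(line string) (name string, iters int, values map[string]float64, err error) {
        f := strings.Fields(line)
        if len(f) < 4 || len(f)%2 != 0 {
            return "", 0, nil, fmt.Errorf("wrong number of fields: %q", line)
        }
        name = f[0]
        const prefix = "Benchmark"
        if !strings.HasPrefix(name, prefix) {
            return "", 0, nil, fmt.Errorf("not a benchmark name: %q", name)
        }
        if rest := name[len(prefix):]; rest != "" {
            if r, _ := utf8.DecodeRuneInString(rest); !unicode.IsUpper(r) {
                return "", 0, nil, fmt.Errorf("not a benchmark name: %q", name)
            }
        }
        if iters, err = strconv.Atoi(f[1]); err != nil {
            return "", 0, nil, fmt.Errorf("bad iteration count: %q", f[1])
        }
        values = make(map[string]float64)
        for i := 2; i < len(f); i += 2 {
            v, err := strconv.ParseFloat(f[i], 64)
            if err != nil {
                return "", 0, nil, fmt.Errorf("bad value: %q", f[i])
            }
            values[f[i+1]] = v // e.g. values["ns/op"] = 154125
        }
        return name, iters, values, nil
    }

Keying the map by unit assumes each unit appears at most once per line, which is true of the testing package's output.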
| |
| ### Value Units |
| |
| A value's unit string is expected to specify not only the measurement unit |
| but also, as needed, a description of what is being measured. |
| For example, a benchmark might report its overall execution time |
as well as cache miss times with three units “ns/op,” “L1-miss-ns/op,” and “L2-miss-ns/op.”
| |
| Tooling can expect that the unit strings are identical for all runs to be compared; |
| for example, a result reporting “ns/op” need not be considered comparable |
| to one reporting “µs/op.” |
| |
However, tooling may assume that the measurement unit is the final
hyphen-separated word of the unit string and may recognize
and rescale known measurement units.
| For example, consistently large “ns/op” or “L1-miss-ns/op” |
| might be rescaled to “ms/op” or “L1-miss-ms/op” for display. |
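
For instance, a display tool might rescale nanosecond measurements along these lines; the function name and the rescaling threshold are arbitrary choices for the sketch:

    import "strings"

    // rescale converts consistently large "ns/op" values, including
    // prefixed variants such as "L1-miss-ns/op", to milliseconds.
    func rescale(value float64, unit string) (float64, string) {
        prefix, meas := "", unit
        if i := strings.LastIndex(unit, "-"); i >= 0 {
            prefix, meas = unit[:i+1], unit[i+1:] // e.g. "L1-miss-", "ns/op"
        }
        if meas == "ns/op" && value >= 1e6 {
            return value / 1e6, prefix + "ms/op"
        }
        return value, unit
    }

Here `rescale(13879794, "ns/op")` returns `13.879794, "ms/op"`.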
| |
| ### Benchmark Name Configuration |
| |
| In the current testing package, benchmark names correspond to Go identifiers: |
| each benchmark must be written as a different Go function. |
| [Work targeted for Go 1.7](https://github.com/golang/proposal/blob/master/design/12166-subtests.md) will allow tests and benchmarks |
to define sub-tests and sub-benchmarks programmatically,
| in particular to vary interesting parameters both when |
| testing and when benchmarking. |
| That work uses a slash to separate the name of a benchmark |
| collection from the description of a sub-benchmark. |
| |
We propose that sub-benchmarks adopt the convention of
using key=value pairs as their names,
and that benchmark data processors treat the slash-prefixed
key=value pairs in a benchmark name as per-benchmark
configuration values.
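
Under that convention, a processor could recover per-benchmark configuration from a name with a sketch like the following. Note that it first strips a trailing GOMAXPROCS suffix such as "-8", which the testing package appends to the full name; the helper name is illustrative:

    import "strings"

    // splitNameConfig splits a sub-benchmark name such as
    // "BenchmarkDecode/text=digits/level=speed/size=1e4-8" into its
    // base name and key=value configuration pairs.
    func splitNameConfig(name string) (base string, config map[string]string) {
        // Strip a trailing GOMAXPROCS suffix like "-8", if present.
        if i := strings.LastIndex(name, "-"); i >= 0 {
            if n := name[i+1:]; n != "" && strings.Trim(n, "0123456789") == "" {
                name = name[:i]
            }
        }
        parts := strings.Split(name, "/")
        base = parts[0]
        config = make(map[string]string)
        for _, part := range parts[1:] {
            if i := strings.Index(part, "="); i >= 0 {
                config[part[:i]] = part[i+1:]
            }
        }
        return base, config
    }

For the name above, the sketch returns base "BenchmarkDecode" and the pairs text=digits, level=speed, size=1e4.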
| |
| ### Example |
| |
| The benchmark output given in the background section above |
| is already in the format proposed here. |
| That is a key feature of the proposal. |
| |
| However, a future run of the benchmark might add configuration lines, |
| and the benchmark might be rewritten to use sub-benchmarks, |
| producing this output: |
| |
| commit: 7cd9055 |
| commit-time: 2016-02-11T13:25:45-0500 |
| goos: darwin |
| goarch: amd64 |
| cpu: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz |
| cpu-count: 8 |
| cpu-physical-count: 4 |
| os: Mac OS X 10.11.3 |
| mem: 16 GB |
| |
| BenchmarkDecode/text=digits/level=speed/size=1e4-8 100 154125 ns/op 64.88 MB/s 40418 B/op 7 allocs/op |
| BenchmarkDecode/text=digits/level=speed/size=1e5-8 10 1367632 ns/op 73.12 MB/s 41356 B/op 14 allocs/op |
| BenchmarkDecode/text=digits/level=speed/size=1e6-8 1 13879794 ns/op 72.05 MB/s 52056 B/op 94 allocs/op |
| BenchmarkDecode/text=digits/level=default/size=1e4-8 100 147551 ns/op 67.77 MB/s 40418 B/op 8 allocs/op |
| BenchmarkDecode/text=digits/level=default/size=1e5-8 10 1197672 ns/op 83.50 MB/s 41508 B/op 13 allocs/op |
| BenchmarkDecode/text=digits/level=default/size=1e6-8 1 11808775 ns/op 84.68 MB/s 53800 B/op 80 allocs/op |
| BenchmarkDecode/text=digits/level=best/size=1e4-8 100 143348 ns/op 69.76 MB/s 40417 B/op 8 allocs/op |
| BenchmarkDecode/text=digits/level=best/size=1e5-8 10 1185527 ns/op 84.35 MB/s 41508 B/op 13 allocs/op |
| BenchmarkDecode/text=digits/level=best/size=1e6-8 1 11740304 ns/op 85.18 MB/s 53800 B/op 80 allocs/op |
| BenchmarkDecode/text=twain/level=speed/size=1e4-8 100 143665 ns/op 69.61 MB/s 40849 B/op 15 allocs/op |
| BenchmarkDecode/text=twain/level=speed/size=1e5-8 10 1390359 ns/op 71.92 MB/s 45700 B/op 31 allocs/op |
| BenchmarkDecode/text=twain/level=speed/size=1e6-8 1 12128469 ns/op 82.45 MB/s 89336 B/op 221 allocs/op |
| BenchmarkDecode/text=twain/level=default/size=1e4-8 100 141916 ns/op 70.46 MB/s 40849 B/op 15 allocs/op |
| BenchmarkDecode/text=twain/level=default/size=1e5-8 10 1076669 ns/op 92.88 MB/s 43820 B/op 28 allocs/op |
| BenchmarkDecode/text=twain/level=default/size=1e6-8 1 10106485 ns/op 98.95 MB/s 71096 B/op 172 allocs/op |
| BenchmarkDecode/text=twain/level=best/size=1e4-8 100 138516 ns/op 72.19 MB/s 40849 B/op 15 allocs/op |
| BenchmarkDecode/text=twain/level=best/size=1e5-8 10 1227964 ns/op 81.44 MB/s 43316 B/op 25 allocs/op |
| BenchmarkDecode/text=twain/level=best/size=1e6-8 1 10040347 ns/op 99.60 MB/s 72120 B/op 173 allocs/op |
| BenchmarkEncode/text=digits/level=speed/size=1e4-8 30 482808 ns/op 20.71 MB/s |
| BenchmarkEncode/text=digits/level=speed/size=1e5-8 5 2685455 ns/op 37.24 MB/s |
| BenchmarkEncode/text=digits/level=speed/size=1e6-8 1 24966055 ns/op 40.05 MB/s |
| BenchmarkEncode/text=digits/level=default/size=1e4-8 20 655592 ns/op 15.25 MB/s |
| BenchmarkEncode/text=digits/level=default/size=1e5-8 1 13000839 ns/op 7.69 MB/s |
| BenchmarkEncode/text=digits/level=default/size=1e6-8 1 136341747 ns/op 7.33 MB/s |
| BenchmarkEncode/text=digits/level=best/size=1e4-8 20 668083 ns/op 14.97 MB/s |
| BenchmarkEncode/text=digits/level=best/size=1e5-8 1 12301511 ns/op 8.13 MB/s |
| BenchmarkEncode/text=digits/level=best/size=1e6-8 1 137962041 ns/op 7.25 MB/s |
| |
| Using sub-benchmarks has benefits beyond this proposal, namely that it would |
| avoid the current repetitive code: |
| |
| func BenchmarkDecodeDigitsSpeed1e4(b *testing.B) { benchmarkDecode(b, digits, speed, 1e4) } |
| func BenchmarkDecodeDigitsSpeed1e5(b *testing.B) { benchmarkDecode(b, digits, speed, 1e5) } |
| func BenchmarkDecodeDigitsSpeed1e6(b *testing.B) { benchmarkDecode(b, digits, speed, 1e6) } |
| func BenchmarkDecodeDigitsDefault1e4(b *testing.B) { benchmarkDecode(b, digits, default_, 1e4) } |
| func BenchmarkDecodeDigitsDefault1e5(b *testing.B) { benchmarkDecode(b, digits, default_, 1e5) } |
| func BenchmarkDecodeDigitsDefault1e6(b *testing.B) { benchmarkDecode(b, digits, default_, 1e6) } |
| func BenchmarkDecodeDigitsCompress1e4(b *testing.B) { benchmarkDecode(b, digits, compress, 1e4) } |
| func BenchmarkDecodeDigitsCompress1e5(b *testing.B) { benchmarkDecode(b, digits, compress, 1e5) } |
| func BenchmarkDecodeDigitsCompress1e6(b *testing.B) { benchmarkDecode(b, digits, compress, 1e6) } |
| func BenchmarkDecodeTwainSpeed1e4(b *testing.B) { benchmarkDecode(b, twain, speed, 1e4) } |
| func BenchmarkDecodeTwainSpeed1e5(b *testing.B) { benchmarkDecode(b, twain, speed, 1e5) } |
| func BenchmarkDecodeTwainSpeed1e6(b *testing.B) { benchmarkDecode(b, twain, speed, 1e6) } |
| func BenchmarkDecodeTwainDefault1e4(b *testing.B) { benchmarkDecode(b, twain, default_, 1e4) } |
| func BenchmarkDecodeTwainDefault1e5(b *testing.B) { benchmarkDecode(b, twain, default_, 1e5) } |
| func BenchmarkDecodeTwainDefault1e6(b *testing.B) { benchmarkDecode(b, twain, default_, 1e6) } |
| func BenchmarkDecodeTwainCompress1e4(b *testing.B) { benchmarkDecode(b, twain, compress, 1e4) } |
| func BenchmarkDecodeTwainCompress1e5(b *testing.B) { benchmarkDecode(b, twain, compress, 1e5) } |
| func BenchmarkDecodeTwainCompress1e6(b *testing.B) { benchmarkDecode(b, twain, compress, 1e6) } |
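
For comparison, a sub-benchmark version of the decode benchmarks might look roughly like the sketch below, assuming a variant of `benchmarkDecode` that takes the text, level, and size as ordinary parameters:

    func BenchmarkDecode(b *testing.B) {
        sizes := []struct {
            name string
            n    int
        }{{"1e4", 1e4}, {"1e5", 1e5}, {"1e6", 1e6}}
        for _, text := range []string{"digits", "twain"} {
            for _, level := range []string{"speed", "default", "best"} {
                for _, size := range sizes {
                    name := "text=" + text + "/level=" + level + "/size=" + size.name
                    b.Run(name, func(b *testing.B) {
                        benchmarkDecode(b, text, level, size.n)
                    })
                }
            }
        }
    }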
| |
More importantly for this proposal, using sub-benchmarks also makes the possible
comparison axes clear: digits vs twain, speed vs default vs best, size 1e4 vs 1e5 vs 1e6.
| |
| ## Rationale |
| |
| As discussed in the background section, |
| we have already developed a number of analysis programs |
| that assume this proposal's format, |
| as well as a number of programs that generate this format. |
| Standardizing the format should encourage additional work |
| on both kinds of programs. |
| |
| [Issue 12826](https://golang.org/issue/12826) suggests a different approach, |
| namely the addition of a new `go test` option `-benchformat`, to control |
| the format of benchmark output. In fact it gives the lack of standardization |
| as the main justification for a new option: |
| |
| > Currently `go test -bench .` prints out benchmark results in a |
| > certain format, but there is no guarantee that this format will not |
| > change. Thus a tool that parses go test output may break if an |
| > incompatible change to the output format is made. |
| |
| Our approach is instead to guarantee that the format will not change, |
| or rather that it will only change in ways allowed by this design. |
| An analysis tool that parses the output specified here will not break |
| in future versions of Go, |
| and a tool that generates the output specified here will work |
| with all such analysis tools. |
| Having one agreed-upon format enables broad interoperation; |
| the ability for one tool to generate arbitrarily many different formats |
| does not achieve the same result. |
| |
| The proposed format also seems to be extensible enough to accommodate |
| anticipated future work on benchmark reporting. |
| |
The main known issue with the current `go test -bench` output is that
we'd like to emit finer-grained detail about runs, for linearity testing
and more robust statistics (see [issue 10669](https://golang.org/issue/10669)).
| This proposal allows that by simply printing more result lines. |
| |
| Another known issue is that we may want to add custom outputs |
| such as garbage collector statistics to certain benchmark runs. |
| This proposal allows that by adding more value-unit pairs. |
| |
| ## Compatibility |
| |
Tools consuming the existing benchmark format may need trivial changes
to ignore lines that are not benchmark results or to cope with additional value-unit pairs
in benchmark results.
| |
| ## Implementation |
| |
| The benchmark format described here is already generated by `go test -bench` |
| and expected by tools like `benchcmp` and `benchstat`. |
| |
| The format is trivial to generate, and it is |
| straightforward but not quite trivial to parse. |
| |
| We anticipate that the [new x/perf subrepo](https://github.com/golang/go/issues/14304) will include a library for loading |
| benchmark data from files, although the format is also simple enough that |
| tools that want a different in-memory representation might reasonably |
| write separate parsers. |
| |