sweet/README.md - benchmarks - Git at Google

 # Sweet: Benchmarking Suite for Go Implementations

 Sweet is a set of benchmarks derived from the Go community which are intended
 to represent a breadth of real-world applications. The primary use-case of this
 suite is to perform an evaluation of the difference in CPU and memory
 performance between two Go implementations.

 If you use this benchmarking suite for any measurements, please ensure you use
 a versioned release and note the version in the release.

 ## Quickstart

 ### Supported Platforms

 * linux/amd64

 ### Dependencies

 The `sweet` tool only depends on having a stable version of Go and `git`.

 Some benchmarks, however, have various requirements for building. Notably
 they are:

 * `make` (tile38)
 * `bash` (tile38)
 * `binutils` (tile38)

 Please ensure your system has these tools installed and available in your
 system's PATH.

 Furthermore, some benchmarks are able to produce additional information
 on some platforms. For instance, running on platforms where systemd is available
 adds an average RSS measurement for the go-build benchmark.

 #### gVisor

 The gVisor benchmark has additional requirements:
 * The target platform must be `linux/amd64`. Nothing else is supported or ever
   will be.
 * The `ptrace` API must be enabled on your system. Set
   `/proc/sys/kernel/yama/ptrace_scope` appropriately (0 and 1 work, 2 might,
   3 will not).

 ### Build

 ```sh
 $ go build ./cmd/sweet
 ```

 ### Getting Assets

 ```sh
 $ ./sweet get
 ```

 ### Running the benchmarks

 Create a configuration file called `config.toml` with the following contents:

 ```toml
 [[config]]
   name = "myconfig"
   goroot = "<insert some GOROOT here>"
 ```

 Run the benchmarks by running:

 ```sh
 $ ./sweet run -shell config.toml
 ```

 Benchmark results will appear in the `results` directory.

 `-shell` will cause the tool to print each action it performs as a shell
 command. Note that while the shell commands are valid for many systems, they
 may depend on tools being available on your system that `sweet` does not
 require.

 Note that by default `sweet run` expects to be executed in
 `/path/to/x/benchmarks/sweet`, that is, the root of the Sweet subdirectory in
 the `x/benchmarks` repository.
 To execute it from somewhere else, point `-bench-dir` at
 `/path/to/x/benchmarks/sweet/benchmarks`.

 ## Tips and Rules of Thumb

 * If you're not confident if your experimental Go toolchain will work with all
   the benchmarks, try the `-short` flag to run to get much faster feedback on
   whether each benchmark builds and runs.
 * You can expect the benchmarks to take a few hours to run with the default
   settings on a somewhat sizable Linux box.
 * If a benchmark fails to build, run with `-shell` and copy and re-run the
   last command to get full output.
   TODO(mknyszek): Dump the output to the terminal.
 * If a benchmark fails to run, the output should have been captured in the
   corresponding results file (e.g. if biogo-igor failed, check
   `/path/to/results/biogo-igor/myconfig.results`) which is really just the
   stderr (and usually stdout too) of the benchmark. You can also try to re-run
   it yourself with the output of `-shell`.

 ## Memory Requirements

 These benchmarks generally try to stress the Go runtime in interesting ways, and
 some may end up with very large heaps. Therefore, it's recommended to run the
 suite on a system with at least 16 GiB of RAM available to minimize the chance
 that results are lost due to an out-of-memory error.

 ## Configuration Format

 The configuration is TOML-based and a more detailed description of fields may
 be found in the help docs for the `run` subcommand:

 ```sh
 $ ./sweet help run
 ```

 ## Results Format

 Results are produced into a single directory containing each benchmark as a
 sub-directory. Within each sub-directory is one file per configuration
 containing the stderr (and usually combined stdout) of the benchmark run,
 which also doubles as the benchmark output format.

 All results are reported in the standard Go testing package format, such that
 results may be compared using the
 [benchstat](https://godoc.org/golang.org/x/perf/cmd/benchstat) tool.

 Results then may also be composed together for easy viewing. For example, if
 one runs sweet with two configurations named `config1` and `config2`, then to
 quickly compare all results, do:

 ```sh
 $ cat results/*/config1.results > config1.results
 $ cat results/*/config2.results > config2.results
 $ benchstat config1.results config2.results
 ```

 ## Noise

 This benchmark suite tries to keep noise low in measurements where possible.
 * Each measurement is taken against a fresh OS process.
 * Benchmarks have been modified to reduce noise from the input.
   * All inputs are deterministic, including implicit inputs, such as querying an
 	existing database.
   * Inputs are loaded into memory when possible instead of streamed from disk.
 * The suite mitigates external effects we can control (e.g. the suite is aware
   of its co-tenancy with the benchmark and throttles itself when the benchmarks
   are running).

 ### Tips for Reducing Noise

 * Sweet should be run on a dedicated machine where a [perflock
   daemon](https://github.com/aclements/perflock) is running (to avoid noise due
   to CPU throttling).
 * Avoid running these benchmarks in cloud environments if possible. Generally
   the noise inherent to those environments can skew A/B tests and hide small
   changes in performance. See [this paper](https://peerj.com/preprints/3507.pdf)
   for more details. Try to use dedicated hardware instead.

 *Do not* compare results produced by separate invocations of the `sweet` tool.
	# Sweet: Benchmarking Suite for Go Implementations

	Sweet is a set of benchmarks derived from the Go community which are intended
	to represent a breadth of real-world applications. The primary use-case of this
	suite is to perform an evaluation of the difference in CPU and memory
	performance between two Go implementations.

	If you use this benchmarking suite for any measurements, please ensure you use
	a versioned release and note the version in the release.

	## Quickstart

	### Supported Platforms

	* linux/amd64

	### Dependencies

	The `sweet` tool only depends on having a stable version of Go and `git`.

	Some benchmarks, however, have various requirements for building. Notably
	they are:

	* `make` (tile38)
	* `bash` (tile38)
	* `binutils` (tile38)

	Please ensure your system has these tools installed and available in your
	system's PATH.

	Furthermore, some benchmarks are able to produce additional information
	on some platforms. For instance, running on platforms where systemd is available
	adds an average RSS measurement for the go-build benchmark.

	#### gVisor

	The gVisor benchmark has additional requirements:
	* The target platform must be `linux/amd64`. Nothing else is supported or ever
	will be.
	* The `ptrace` API must be enabled on your system. Set
	`/proc/sys/kernel/yama/ptrace_scope` appropriately (0 and 1 work, 2 might,
	3 will not).

	### Build

	```sh
	$ go build ./cmd/sweet
	```

	### Getting Assets

	```sh
	$ ./sweet get
	```

	### Running the benchmarks

	Create a configuration file called `config.toml` with the following contents:

	```toml
	[[config]]
	name = "myconfig"
	goroot = "<insert some GOROOT here>"
	```

	Run the benchmarks by running:

	```sh
	$ ./sweet run -shell config.toml
	```

	Benchmark results will appear in the `results` directory.

	`-shell` will cause the tool to print each action it performs as a shell
	command. Note that while the shell commands are valid for many systems, they
	may depend on tools being available on your system that `sweet` does not
	require.

	Note that by default `sweet run` expects to be executed in
	`/path/to/x/benchmarks/sweet`, that is, the root of the Sweet subdirectory in
	the `x/benchmarks` repository.
	To execute it from somewhere else, point `-bench-dir` at
	`/path/to/x/benchmarks/sweet/benchmarks`.

	## Tips and Rules of Thumb

	* If you're not confident if your experimental Go toolchain will work with all
	the benchmarks, try the `-short` flag to run to get much faster feedback on
	whether each benchmark builds and runs.
	* You can expect the benchmarks to take a few hours to run with the default
	settings on a somewhat sizable Linux box.
	* If a benchmark fails to build, run with `-shell` and copy and re-run the
	last command to get full output.
	TODO(mknyszek): Dump the output to the terminal.
	* If a benchmark fails to run, the output should have been captured in the
	corresponding results file (e.g. if biogo-igor failed, check
	`/path/to/results/biogo-igor/myconfig.results`) which is really just the
	stderr (and usually stdout too) of the benchmark. You can also try to re-run
	it yourself with the output of `-shell`.

	## Memory Requirements

	These benchmarks generally try to stress the Go runtime in interesting ways, and
	some may end up with very large heaps. Therefore, it's recommended to run the
	suite on a system with at least 16 GiB of RAM available to minimize the chance
	that results are lost due to an out-of-memory error.

	## Configuration Format

	The configuration is TOML-based and a more detailed description of fields may
	be found in the help docs for the `run` subcommand:

	```sh
	$ ./sweet help run
	```

	## Results Format

	Results are produced into a single directory containing each benchmark as a
	sub-directory. Within each sub-directory is one file per configuration
	containing the stderr (and usually combined stdout) of the benchmark run,
	which also doubles as the benchmark output format.

	All results are reported in the standard Go testing package format, such that
	results may be compared using the
	[benchstat](https://godoc.org/golang.org/x/perf/cmd/benchstat) tool.

	Results then may also be composed together for easy viewing. For example, if
	one runs sweet with two configurations named `config1` and `config2`, then to
	quickly compare all results, do:

	```sh
	$ cat results/*/config1.results > config1.results
	$ cat results/*/config2.results > config2.results
	$ benchstat config1.results config2.results
	```

	## Noise

	This benchmark suite tries to keep noise low in measurements where possible.
	* Each measurement is taken against a fresh OS process.
	* Benchmarks have been modified to reduce noise from the input.
	* All inputs are deterministic, including implicit inputs, such as querying an
	existing database.
	* Inputs are loaded into memory when possible instead of streamed from disk.
	* The suite mitigates external effects we can control (e.g. the suite is aware
	of its co-tenancy with the benchmark and throttles itself when the benchmarks
	are running).

	### Tips for Reducing Noise

	* Sweet should be run on a dedicated machine where a [perflock
	daemon](https://github.com/aclements/perflock) is running (to avoid noise due
	to CPU throttling).
	* Avoid running these benchmarks in cloud environments if possible. Generally
	the noise inherent to those environments can skew A/B tests and hide small
	changes in performance. See [this paper](https://peerj.com/preprints/3507.pdf)
	for more details. Try to use dedicated hardware instead.

	Do not compare results produced by separate invocations of the `sweet` tool.