TestFailures.md - wiki - Git at Google

 ---
 title: Test Failures
 ---

 If you notice a failure in a test in the Go project, what should you do?

 ## Goals of testing

 The goal of writing (and running) tests for a Go package is to learn about the
 behavior of that package and its dependencies.

 A test failure for a Go package may give us information about:
 * implementation defects in the package or its dependencies,
 * mistaken assumptions about the package API,
 * bugs in the test itself (such as invalid assumptions about timing),
 * unexpectedly high resource requirements (such as RAM or CPU time), or
 * bugs in the underlying platform or defects in the test infrastructure (which
   may need to be escalated or worked around).

 In some cases, the cause of a test failure might not be clear: it might be
 caused by _more than one_ of the above conditions. Much like repeating a
 scientific experiment, allowing a test to fail multiple times can sometimes
 provide more information from the specific pattern of failures.

 However, if a test fails without telling us anything new, then the test is not
 fulfilling its purpose.

 ## Finding test failures

 A test failure is typically noticed from:
 * the [Go build dashboard](https://build.golang.org), especially during [builder
   triage](https://go.dev/issue/52653);
 * TryBot or [SlowBot](/wiki/SlowBots) failures on a pending
   change;
 * running `go test` on a specific package or packages, either working within a
   Go project repository or as part of (say) `go test all` in a user's own
   module;
 * or running `all.bash` or `all.bat` when [installing from
   source](https://go.dev/doc/install/source#install) or [testing a contributed
   change](https://go.dev/doc/contribute#testing).

 ## Triaging a test failure

 Once we have noticed a failure, we need to _triage_ it.
 The goal of triage is to identify:

 1. Is the information from the failure new?
 2. Who is best equipped to analyze the new information from the failure?

 ### Identifying new information

 Start by searching the [open issues](https://go.dev/issue) for key details from
 the failure, such as the name of the failing test and/or other distinctive
 fragments of the error text (such as error codes).

 If you find an existing issue, first check the issue discussion to see whether
 the failure has already been fixed, and whether the information from your failure
 contributes relevant new information. If so, _comment on it_ with details:
 * describe what Go version you were testing (`go version`)
 * how and where you were running the test, such as:
    * `go env` output
    * your machine and OS configuration
    * your network configuration and status
 * whether or how often you are able to reproduce the failure.

 Ideally, attach or link to the full test logs (possibly in a `<details>` block).

 If you don't find an existing issue, file a new one.

 ### Filing an issue

 Paste in enough of the test log for future reporters to be able to search for
 the issue, including the name of the test and any distinctive parts of its log
 output. (If the test log is long — such as if it contains a large goroutine
 dump — consider posting a shorter excerpt and/or enclosing the complete failure
 message in a `<details>` block.)

 You can use the
 [`fetchlogs`](https://pkg.go.dev/golang.org/x/build/cmd/fetchlogs) and
 [`greplogs`](https://pkg.go.dev/golang.org/x/build/cmd/greplogs) tools to
 search for similar failures in the build dashboard:

 ```
 # download recent logs
 fetchlogs -n 1024 -repo all
 ```
 ```
 # search logs for some regexp describing the failure message
 greplogs -l -e $FAILURE_REGEXP
 ```

 If the failure appears to be specific to a package, consult
 https://dev.golang.org/owners to find the maintainers of that package and mention
 them on the issue. (If no owners are listed for the package, check the recent
 history of the package at https://cs.opensource.google/go and/or escalate to
 someone who can help to identify a relevant owner — and consider updating the
 [owners
 table](https://cs.opensource.google/go/x/build/+/master:devapp/owners/table.go)
 with what you learn!)

 If the failure appears to be specific to a `GOOS` or `GOARCH` label the issue
 with the corresponding GOOS and/or GOARCH labels, and mention the relevant
 subteam(s) of
 [@golang/port-maintainers](https://github.com/orgs/golang/teams/port-maintainers)
 on the issue.

 If the failure appears to affect at least one
 [first class port](/wiki/PortingPolicy#first-class-ports),
 add the issue to the current release milestone and label it
 [`release-blocker`](https://github.com/golang/go/labels/release-blocker).
 Otherwise, add the issue to the `Backlog` milestone.

 If the failure appears to be specific to a builder (such as a network
 connectivity issue, or a platform bug requiring a system update), consult
 [`x/build/dashboard/builders.go`](https://cs.opensource.google/go/x/build/+/master:dashboard/builders.go)
 to find the maintainer for that builder and mention them on the issue.
 (For builders without an explicit maintainer listed, instead mention
 the [@golang/release](https://github.com/orgs/golang/teams/release) team.)

 ## Addressing a test failure

 Once an issue has been filed for a test failure, the relevant package, port,
 and/or builder maintainers should examine the information gleaned from the test
 failure and decide how to _address_ it, by doing one or more of:

 * _revert_ a change to the code or the test infrastructure believed to have
   introduced the problem,
 * _fix_ (or apply a workaround for) the root cause of the failure,
 * _collect_ more information, by subscribing to issue updates and/or running
   more tests,
 * _report_ an underlying defect in a dependency, platform, or test
   infrastructure, and/or
 * _deprioritize_ the failure, by skipping the failure on affected platforms (or
   marking those platforms as broken) and then moving the issue to a future or
   `Backlog` milestone and/or removing the `release-blocker` label.

 When a maintainer decides to deprioritize a test failure, they determine that
 _additional failures of the test will not provide useful new information_. At
 that point, the test no longer fulfills its purpose, and the maintainer should
 suppress the failure — typically by adding a call to
 [`testenv.SkipFlaky`](https://pkg.go.dev/internal/testenv#SkipFlaky) or
 [`t.Skipf`](https://pkg.go.dev/testing#F.Skipf).

 ### Skipping a test failure

 When we add a call to `testenv.SkipFlaky`, our goal is to eliminate failure
 modes that do not provide new information while still preserving as much of the
 value of the test as is feasible.

 * If the observed failure is only one of several possible failure modes for
   the test, skip the test only for that failure mode.

    * For example, if the error is always something specific like
      `syscall.ECONNRESET`, use `errors.Is` to check for that specific error.

 * If the failure is believed to affect all versions of a particular `GOOS`
   and/or `GOARCH`, or the affected versions cannot be identified, check against
   `runtime.GOOS` and/or `runtime.GOARCH` and skip only the affected platform.

 * If the failure is due to a bug on a _specific version_ of a platform, skip the
   test based on `testenv.Builder` or the `GO_BUILDER_NAME` environment variable.
   (If the test fails for external Go users, they have the option to upgrade to
   an unaffected version of the platform — and they probably ought to see the
   test failure to find out that the bug exists!)

    * Also consider adding another environment variable that users and contributors
      can set to acknowledge the bug and suppress the failure.

 ### Marking a builder or port as broken

 A platform bug or bug in a core package (such as `os`, `net`, or `runtime`) may
 impact so many tests that the failures are not feasible to skip, or may manifest
 as a failure at compile or link time. If such a bug occurs, the options are more
 limited: if we cannot revert a change or fix or work around the root cause, and
 don't need to collect more information, we can only deprioritize the failure by
 marking the _entire builder or platform_ as broken.

 To mark a builder as broken, edit its configuration in
 [`x/build/dashboard/builders.go`](https://cs.opensource.google/go/x/build/+/master:dashboard/builders.go)
 to add an issue in the `KnownIssue` field; note that builders with known issues
 will generally be skipped during dashboard triage.

 A broken builder for a
 [first class port](/wiki/PortingPolicy#first-class-ports)
 should have its known issue(s) labeled `release-blocker`, pending a decision
 to either fix the builder or drop support for the affected version of the
 platform.

 If all of the builders for a secondary port are broken, the port itself may be
 considered broken. [Discussion
 #53060](https://github.com/golang/go/discussions/53060)
 aims to resolve the question of how broken secondary ports should be handled.
	---
	title: Test Failures
	---

	If you notice a failure in a test in the Go project, what should you do?

	## Goals of testing

	The goal of writing (and running) tests for a Go package is to learn about the
	behavior of that package and its dependencies.

	A test failure for a Go package may give us information about:
	* implementation defects in the package or its dependencies,
	* mistaken assumptions about the package API,
	* bugs in the test itself (such as invalid assumptions about timing),
	* unexpectedly high resource requirements (such as RAM or CPU time), or
	* bugs in the underlying platform or defects in the test infrastructure (which
	may need to be escalated or worked around).

	In some cases, the cause of a test failure might not be clear: it might be
	caused by _more than one_ of the above conditions. Much like repeating a
	scientific experiment, allowing a test to fail multiple times can sometimes
	provide more information from the specific pattern of failures.

	However, if a test fails without telling us anything new, then the test is not
	fulfilling its purpose.

	## Finding test failures

	A test failure is typically noticed from:
	* the [Go build dashboard](https://build.golang.org), especially during [builder
	triage](https://go.dev/issue/52653);
	* TryBot or [SlowBot](/wiki/SlowBots) failures on a pending
	change;
	* running `go test` on a specific package or packages, either working within a
	Go project repository or as part of (say) `go test all` in a user's own
	module;
	* or running `all.bash` or `all.bat` when [installing from
	source](https://go.dev/doc/install/source#install) or [testing a contributed
	change](https://go.dev/doc/contribute#testing).

	## Triaging a test failure

	Once we have noticed a failure, we need to _triage_ it.
	The goal of triage is to identify:

	1. Is the information from the failure new?
	2. Who is best equipped to analyze the new information from the failure?

	### Identifying new information

	Start by searching the [open issues](https://go.dev/issue) for key details from
	the failure, such as the name of the failing test and/or other distinctive
	fragments of the error text (such as error codes).

	If you find an existing issue, first check the issue discussion to see whether
	the failure has already been fixed, and whether the information from your failure
	contributes relevant new information. If so, _comment on it_ with details:
	* describe what Go version you were testing (`go version`)
	* how and where you were running the test, such as:
	* `go env` output
	* your machine and OS configuration
	* your network configuration and status
	* whether or how often you are able to reproduce the failure.

	Ideally, attach or link to the full test logs (possibly in a `<details>` block).

	If you don't find an existing issue, file a new one.

	### Filing an issue

	Paste in enough of the test log for future reporters to be able to search for
	the issue, including the name of the test and any distinctive parts of its log
	output. (If the test log is long — such as if it contains a large goroutine
	dump — consider posting a shorter excerpt and/or enclosing the complete failure
	message in a `<details>` block.)

	You can use the
	[`fetchlogs`](https://pkg.go.dev/golang.org/x/build/cmd/fetchlogs) and
	[`greplogs`](https://pkg.go.dev/golang.org/x/build/cmd/greplogs) tools to
	search for similar failures in the build dashboard:

	```
	# download recent logs
	fetchlogs -n 1024 -repo all
	```
	```
	# search logs for some regexp describing the failure message
	greplogs -l -e $FAILURE_REGEXP
	```

	If the failure appears to be specific to a package, consult
	https://dev.golang.org/owners to find the maintainers of that package and mention
	them on the issue. (If no owners are listed for the package, check the recent
	history of the package at https://cs.opensource.google/go and/or escalate to
	someone who can help to identify a relevant owner — and consider updating the
	[owners
	table](https://cs.opensource.google/go/x/build/+/master:devapp/owners/table.go)
	with what you learn!)

	If the failure appears to be specific to a `GOOS` or `GOARCH` label the issue
	with the corresponding GOOS and/or GOARCH labels, and mention the relevant
	subteam(s) of
	[@golang/port-maintainers](https://github.com/orgs/golang/teams/port-maintainers)
	on the issue.

	If the failure appears to affect at least one
	[first class port](/wiki/PortingPolicy#first-class-ports),
	add the issue to the current release milestone and label it
	[`release-blocker`](https://github.com/golang/go/labels/release-blocker).
	Otherwise, add the issue to the `Backlog` milestone.

	If the failure appears to be specific to a builder (such as a network
	connectivity issue, or a platform bug requiring a system update), consult
	[`x/build/dashboard/builders.go`](https://cs.opensource.google/go/x/build/+/master:dashboard/builders.go)
	to find the maintainer for that builder and mention them on the issue.
	(For builders without an explicit maintainer listed, instead mention
	the [@golang/release](https://github.com/orgs/golang/teams/release) team.)

	## Addressing a test failure

	Once an issue has been filed for a test failure, the relevant package, port,
	and/or builder maintainers should examine the information gleaned from the test
	failure and decide how to _address_ it, by doing one or more of:

	* _revert_ a change to the code or the test infrastructure believed to have
	introduced the problem,
	* _fix_ (or apply a workaround for) the root cause of the failure,
	* _collect_ more information, by subscribing to issue updates and/or running
	more tests,
	* _report_ an underlying defect in a dependency, platform, or test
	infrastructure, and/or
	* _deprioritize_ the failure, by skipping the failure on affected platforms (or
	marking those platforms as broken) and then moving the issue to a future or
	`Backlog` milestone and/or removing the `release-blocker` label.

	When a maintainer decides to deprioritize a test failure, they determine that
	_additional failures of the test will not provide useful new information_. At
	that point, the test no longer fulfills its purpose, and the maintainer should
	suppress the failure — typically by adding a call to
	[`testenv.SkipFlaky`](https://pkg.go.dev/internal/testenv#SkipFlaky) or
	[`t.Skipf`](https://pkg.go.dev/testing#F.Skipf).

	### Skipping a test failure

	When we add a call to `testenv.SkipFlaky`, our goal is to eliminate failure
	modes that do not provide new information while still preserving as much of the
	value of the test as is feasible.

	* If the observed failure is only one of several possible failure modes for
	the test, skip the test only for that failure mode.

	* For example, if the error is always something specific like
	`syscall.ECONNRESET`, use `errors.Is` to check for that specific error.

	* If the failure is believed to affect all versions of a particular `GOOS`
	and/or `GOARCH`, or the affected versions cannot be identified, check against
	`runtime.GOOS` and/or `runtime.GOARCH` and skip only the affected platform.

	* If the failure is due to a bug on a _specific version_ of a platform, skip the
	test based on `testenv.Builder` or the `GO_BUILDER_NAME` environment variable.
	(If the test fails for external Go users, they have the option to upgrade to
	an unaffected version of the platform — and they probably ought to see the
	test failure to find out that the bug exists!)

	* Also consider adding another environment variable that users and contributors
	can set to acknowledge the bug and suppress the failure.

	### Marking a builder or port as broken

	A platform bug or bug in a core package (such as `os`, `net`, or `runtime`) may
	impact so many tests that the failures are not feasible to skip, or may manifest
	as a failure at compile or link time. If such a bug occurs, the options are more
	limited: if we cannot revert a change or fix or work around the root cause, and
	don't need to collect more information, we can only deprioritize the failure by
	marking the _entire builder or platform_ as broken.

	To mark a builder as broken, edit its configuration in
	[`x/build/dashboard/builders.go`](https://cs.opensource.google/go/x/build/+/master:dashboard/builders.go)
	to add an issue in the `KnownIssue` field; note that builders with known issues
	will generally be skipped during dashboard triage.

	A broken builder for a
	[first class port](/wiki/PortingPolicy#first-class-ports)
	should have its known issue(s) labeled `release-blocker`, pending a decision
	to either fix the builder or drop support for the affected version of the
	platform.

	If all of the builders for a secondary port are broken, the port itself may be
	considered broken. [Discussion
	#53060](https://github.com/golang/go/discussions/53060)
	aims to resolve the question of how broken secondary ports should be handled.