| --- |
| title: Test Failures |
| --- |
| |
| If you notice a failure in a test in the Go project, what should you do? |
| |
| ## Goals of testing |
| |
| The goal of writing (and running) tests for a Go package is to learn about the |
| behavior of that package and its dependencies. |
| |
| A test failure for a Go package may give us information about: |
| * implementation defects in the package or its dependencies, |
| * mistaken assumptions about the package API, |
| * bugs in the test itself (such as invalid assumptions about timing), |
| * unexpectedly high resource requirements (such as RAM or CPU time), or |
| * bugs in the underlying platform or defects in the test infrastructure (which |
| may need to be escalated or worked around). |
| |
| In some cases, the cause of a test failure might not be clear: it might be |
| caused by _more than one_ of the above conditions. Much like repeating a |
| scientific experiment, allowing a test to fail multiple times can sometimes |
| provide more information from the specific pattern of failures. |
| |
| However, if a test fails without telling us anything new, then the test is not |
| fulfilling its purpose. |
| |
| ## Finding test failures |
| |
| A test failure is typically noticed from: |
| * the [Go build dashboard](https://build.golang.org), especially during [builder |
| triage](https://go.dev/issue/52653); |
| * TryBot or [SlowBot](/wiki/SlowBots) failures on a pending |
| change; |
| * running `go test` on a specific package or packages, either working within a |
| Go project repository or as part of (say) `go test all` in a user's own |
| module; |
| * or running `all.bash` or `all.bat` when [installing from |
| source](https://go.dev/doc/install/source#install) or [testing a contributed |
| change](https://go.dev/doc/contribute#testing). |
| |
| ## Triaging a test failure |
| |
| Once we have noticed a failure, we need to _triage_ it. |
| The goal of triage is to identify: |
| |
| 1. Is the information from the failure new? |
| 2. Who is best equipped to analyze the new information from the failure? |
| |
| ### Identifying new information |
| |
| Start by searching the [open issues](https://go.dev/issue) for key details from |
| the failure, such as the name of the failing test and/or other distinctive |
| fragments of the error text (such as error codes). |
| |
| If you find an existing issue, first check the issue discussion to see whether |
| the failure has already been fixed, and whether the information from your failure |
| contributes relevant new information. If so, _comment on it_ with details: |
| * describe what Go version you were testing (`go version`) |
| * how and where you were running the test, such as: |
| * `go env` output |
| * your machine and OS configuration |
| * your network configuration and status |
| * whether or how often you are able to reproduce the failure. |
| |
| Ideally, attach or link to the full test logs (possibly in a `<details>` block). |
| |
| If you don't find an existing issue, file a new one. |
| |
| ### Filing an issue |
| |
| Paste in enough of the test log for future reporters to be able to search for |
| the issue, including the name of the test and any distinctive parts of its log |
| output. (If the test log is long — such as if it contains a large goroutine |
| dump — consider posting a shorter excerpt and/or enclosing the complete failure |
| message in a `<details>` block.) |
| |
| You can use the |
| [`fetchlogs`](https://pkg.go.dev/golang.org/x/build/cmd/fetchlogs) and |
| [`greplogs`](https://pkg.go.dev/golang.org/x/build/cmd/greplogs) tools to |
| search for similar failures in the build dashboard: |
| |
| ``` |
| # download recent logs |
| fetchlogs -n 1024 -repo all |
| ``` |
| ``` |
| # search logs for some regexp describing the failure message |
| greplogs -l -e $FAILURE_REGEXP |
| ``` |
| |
| If the failure appears to be specific to a package, consult |
| https://dev.golang.org/owners to find the maintainers of that package and mention |
| them on the issue. (If no owners are listed for the package, check the recent |
| history of the package at https://cs.opensource.google/go and/or escalate to |
| someone who can help to identify a relevant owner — and consider updating the |
| [owners |
| table](https://cs.opensource.google/go/x/build/+/master:devapp/owners/table.go) |
| with what you learn!) |
| |
| If the failure appears to be specific to a `GOOS` or `GOARCH` label the issue |
| with the corresponding GOOS and/or GOARCH labels, and mention the relevant |
| subteam(s) of |
| [@golang/port-maintainers](https://github.com/orgs/golang/teams/port-maintainers) |
| on the issue. |
| |
| If the failure appears to affect at least one |
| [first class port](/wiki/PortingPolicy#first-class-ports), |
| add the issue to the current release milestone and label it |
| [`release-blocker`](https://github.com/golang/go/labels/release-blocker). |
| Otherwise, add the issue to the `Backlog` milestone. |
| |
| If the failure appears to be specific to a builder (such as a network |
| connectivity issue, or a platform bug requiring a system update), consult |
| [`x/build/dashboard/builders.go`](https://cs.opensource.google/go/x/build/+/master:dashboard/builders.go) |
| to find the maintainer for that builder and mention them on the issue. |
| (For builders without an explicit maintainer listed, instead mention |
| the [@golang/release](https://github.com/orgs/golang/teams/release) team.) |
| |
| ## Addressing a test failure |
| |
| Once an issue has been filed for a test failure, the relevant package, port, |
| and/or builder maintainers should examine the information gleaned from the test |
| failure and decide how to _address_ it, by doing one or more of: |
| |
| * _revert_ a change to the code or the test infrastructure believed to have |
| introduced the problem, |
| * _fix_ (or apply a workaround for) the root cause of the failure, |
| * _collect_ more information, by subscribing to issue updates and/or running |
| more tests, |
| * _report_ an underlying defect in a dependency, platform, or test |
| infrastructure, and/or |
| * _deprioritize_ the failure, by skipping the failure on affected platforms (or |
| marking those platforms as broken) and then moving the issue to a future or |
| `Backlog` milestone and/or removing the `release-blocker` label. |
| |
| When a maintainer decides to deprioritize a test failure, they determine that |
| _additional failures of the test will not provide useful new information_. At |
| that point, the test no longer fulfills its purpose, and the maintainer should |
| suppress the failure — typically by adding a call to |
| [`testenv.SkipFlaky`](https://pkg.go.dev/internal/testenv#SkipFlaky) or |
| [`t.Skipf`](https://pkg.go.dev/testing#F.Skipf). |
| |
| ### Skipping a test failure |
| |
| When we add a call to `testenv.SkipFlaky`, our goal is to eliminate failure |
| modes that do not provide new information while still preserving as much of the |
| value of the test as is feasible. |
| |
| * If the observed failure is only one of several possible failure modes for |
| the test, skip the test only for that failure mode. |
| |
| * For example, if the error is always something specific like |
| `syscall.ECONNRESET`, use `errors.Is` to check for that specific error. |
| |
| * If the failure is believed to affect all versions of a particular `GOOS` |
| and/or `GOARCH`, or the affected versions cannot be identified, check against |
| `runtime.GOOS` and/or `runtime.GOARCH` and skip only the affected platform. |
| |
| * If the failure is due to a bug on a _specific version_ of a platform, skip the |
| test based on `testenv.Builder` or the `GO_BUILDER_NAME` environment variable. |
| (If the test fails for external Go users, they have the option to upgrade to |
| an unaffected version of the platform — and they probably ought to see the |
| test failure to find out that the bug exists!) |
| |
| * Also consider adding another environment variable that users and contributors |
| can set to acknowledge the bug and suppress the failure. |
| |
| ### Marking a builder or port as broken |
| |
| A platform bug or bug in a core package (such as `os`, `net`, or `runtime`) may |
| impact so many tests that the failures are not feasible to skip, or may manifest |
| as a failure at compile or link time. If such a bug occurs, the options are more |
| limited: if we cannot revert a change or fix or work around the root cause, and |
| don't need to collect more information, we can only deprioritize the failure by |
| marking the _entire builder or platform_ as broken. |
| |
| To mark a builder as broken, edit its configuration in |
| [`x/build/dashboard/builders.go`](https://cs.opensource.google/go/x/build/+/master:dashboard/builders.go) |
| to add an issue in the `KnownIssue` field; note that builders with known issues |
| will generally be skipped during dashboard triage. |
| |
| A broken builder for a |
| [first class port](/wiki/PortingPolicy#first-class-ports) |
| should have its known issue(s) labeled `release-blocker`, pending a decision |
| to either fix the builder or drop support for the affected version of the |
| platform. |
| |
| If all of the builders for a secondary port are broken, the port itself may be |
| considered broken. [Discussion |
| #53060](https://github.com/golang/go/discussions/53060) |
| aims to resolve the question of how broken secondary ports should be handled. |