title: Watchflakes

Watchflakes is a program that triages apparent test flakes on the build.golang.org dashboards.

An apparent test flake is a failure that:

  • is not on a completely failing builder.
  • is not on an excluded builder.
  • is not running a commit that failed on 4 or more builders.
  • is not part of a run of 4 or more failing commits on its builder.
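
The four conditions above can be read as a single predicate. The following is a minimal sketch in Go, assuming a hypothetical `Failure` summary type; the field names are illustrative and are not watchflakes' actual data structures.

```go
package main

import "fmt"

// Failure is a hypothetical summary of one dashboard failure.
// The fields are assumptions made for this sketch.
type Failure struct {
	BuilderAllFailing bool // the builder is completely failing
	BuilderExcluded   bool // the builder is on the exclusion list
	CommitFailCount   int  // number of builders on which this commit failed
	FailingRunLength  int  // length of the run of failing commits on this builder
}

// isApparentFlake applies the four conditions listed above.
func isApparentFlake(f Failure) bool {
	return !f.BuilderAllFailing &&
		!f.BuilderExcluded &&
		f.CommitFailCount < 4 &&
		f.FailingRunLength < 4
}

func main() {
	fmt.Println(isApparentFlake(Failure{CommitFailCount: 1, FailingRunLength: 1}))
	fmt.Println(isApparentFlake(Failure{CommitFailCount: 4, FailingRunLength: 1}))
}
```

The first call describes an isolated failure and reports true; the second describes a commit that failed on four builders, so it is treated as a real breakage rather than a flake.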

Watchflakes posts every apparent test flake to an issue in the Test Flakes project.

Every issue description in the Test Flakes project starts with a pattern for the failures relevant to that issue. For example, the markdown for #55260's description starts with:

post <- pkg == "cmd/go" && test == "" && `unexpected files left in tmpdir`

Watchflakes matches every apparent test flake against the patterns in the issues:

  • If a flake matches a pattern in an issue, it is posted to that issue.
  • If a flake matches a pattern in multiple issues, it is posted to the lowest-numbered issue.
  • If a flake does not match a pattern in any issue, watchflakes creates a new issue with a pattern matching the package and test case that failed.

The newly created issue's pattern is often too broad and should be edited to make it more specific to the actual failure. Sending a failure to the lowest-numbered matching issue ensures that creating a broad default pattern for a new failure does not “steal” failures from earlier issues, nor does it spam the new issue with unrelated failures in the same test that are already separately tracked.

Watchflakes places newly created issues in the Test Flakes project and adds the NeedsInvestigation label. These issues start out with no status (not Active, not Done). Issues with no status need to be inspected by a person, who should usually refine the pattern to capture the salient information about the failure. Issues that have been checked can then be moved to Active. GitHub automatically moves issues from Active to Done when they are closed.

Watchflakes considers issues of any status when matching a new failure. If it finds a new failure for a closed issue, it will post the failure and reopen the issue. So it is okay to close an issue when a fix lands, instead of having to wait a few weeks to see if the failure is really gone: if a new failure arrives, the issue will be reopened automatically.

Watchflakes maintains no state of its own: all the state is in the GitHub issues. Every time it runs, it considers the past 60 days of build dashboard failures and makes sure that every apparent flake is accounted for in the Test Flakes project. If a failure matching an issue has already been posted to that issue, watchflakes doesn’t post it again, of course. And if an issue is edited to update its pattern to exclude certain failures, watchflakes doesn’t remove its old posts, but it does look for a different matching issue for those failures, including possibly creating a new one.


The watchflakes stanza in each issue must appear at the top of the issue description. It must be a code block (either fenced with ``` or indented), and the first line must be #!watchflakes, to keep watchflakes from misinterpreting unrelated code blocks.

The rest of the block is a small watchflakes script. Comments to the end of the line are introduced with #. The script is a sequence of rules, each of which has the form action <- pattern (send matches for pattern to the action).
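
As a rough illustration of the script layout, the sketch below splits the contents of a code block into rules. It is not the real parser: the `Rule` type and both helpers are hypothetical, escaped quotes inside string literals are not handled, and a line without `<-` is assumed to continue the previous rule.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a hypothetical representation of one "action <- pattern" rule.
type Rule struct {
	Action  string
	Pattern string
}

// stripComment removes a trailing # comment, ignoring any # that
// appears inside a "..." or `...` literal.
func stripComment(line string) string {
	var quote rune
	for i, r := range line {
		switch {
		case quote != 0:
			if r == quote {
				quote = 0
			}
		case r == '"' || r == '`':
			quote = r
		case r == '#':
			return line[:i]
		}
	}
	return line
}

// parseScript reports false unless the block starts with #!watchflakes.
// Otherwise it returns the rules found in the rest of the block.
func parseScript(block string) ([]Rule, bool) {
	lines := strings.Split(strings.TrimSpace(block), "\n")
	if strings.TrimSpace(lines[0]) != "#!watchflakes" {
		return nil, false
	}
	var rules []Rule
	for _, line := range lines[1:] {
		line = strings.TrimSpace(stripComment(line))
		if line == "" {
			continue
		}
		if action, pattern, ok := strings.Cut(line, "<-"); ok {
			rules = append(rules, Rule{strings.TrimSpace(action), strings.TrimSpace(pattern)})
		} else if len(rules) > 0 {
			// Assumed continuation of the previous rule's pattern.
			rules[len(rules)-1].Pattern += " " + line
		}
	}
	return rules, true
}

func main() {
	block := "#!watchflakes\n# a comment\npost <- goos == \"windows\" &&\n    (`a` || `b`) # trailing\n"
	rules, ok := parseScript(block)
	fmt.Println(ok, len(rules))
	for _, r := range rules {
		fmt.Printf("%s: %s\n", r.Action, r.Pattern)
	}
}
```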


The actions are:

  • post posts the failure to the issue in which the script appears.
  • skip ignores the failure, throwing it on the floor. This action should be used only rarely (for example, to set policy like in #55166).
  • default is a lower-priority version of post. If an issue has a post or skip matching the failure, watchflakes does that instead. But if there are no other matches, watchflakes considers the default patterns. (And then if there aren't any default matches, watchflakes creates a new issue.)


The input to the pattern is a record with named fields, each of which has a string value:

  • pkg is the full import path of the package that failed to build or that failed its test.

  • test is the name of the test function in the package that failed.

  • mode is build or test depending on whether this is a build failure or a test failure.

  • output is the output from the failing test. This output stops just before the final FAIL line printed when the test binary exits. It does not include output from other test cases that also failed in the same run, nor any context that was printed by all.bash or the buildlet before the test started.

  • log is the entire failed build log.

  • snippet is the shortened form of output that will be posted to the issue itself. Matches should almost always use output instead.

  • builder is the name of the builder that ran the test (like dragonfly-amd64-622).

  • repo is the name of the repo being tested (go, net, tools, ...).

  • goos is the GOOS value (linux, windows, ...).

  • goarch is the GOARCH value (amd64, mips64le, ...).

  • date is the date of the commit being tested, in the form 2006-01-02T15:04:05. There is no date comparison logic; use string comparisons instead. Comparing dates should be used rarely.

  • section is the section of the build log in which the failure occurred. In all.bash output, the section is introduced by #####, and each of the Building lines during bootstrap is considered its own section as well. In subrepos, the :: Running lines each introduce a section named for the go command being run (for example go test golang.org/x/tools/...).

    Most patterns don't need to use section. It is most helpful for tests in the main repo that rerun tests with an alternate execution environment.
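
String comparison works for the date field because timestamps in the form 2006-01-02T15:04:05 are fixed-width with the most significant part first, so lexical order agrees with chronological order. The sketch below checks that claim in Go; the `before` helper and the sample dates are made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// layout is the date form used by the date field.
const layout = "2006-01-02T15:04:05"

// before compares two date strings the way a pattern such as
// date < "2022-10-01T00:00:00" would: as plain strings.
func before(a, b string) bool { return a < b }

func main() {
	a, b := "2022-09-30T23:59:59", "2022-10-01T00:00:00"
	ta, errA := time.Parse(layout, a)
	tb, errB := time.Parse(layout, b)
	if errA != nil || errB != nil {
		panic("sample dates do not parse")
	}
	// The string order and the parsed time order agree.
	fmt.Println(before(a, b), ta.Before(tb))
	// A shorter prefix also works as a cutoff.
	fmt.Println(before(a, "2022-10"), before(b, "2022-10"))
}
```

A prefix such as "2022-10" acts as a cutoff at the start of that month: every earlier date sorts before it, and every date in or after October 2022 sorts after it.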


The pattern is a boolean expression in a Go-like syntax allowing ||, &&, !, (, and ) for building complex expressions; ==, !=, <, <=, >, and >= for comparing fields against string literals; and ~ and !~ for matching against regular expressions.

All string comparisons must have a field name on the left and a double-quoted string literal on the right, as in builder == "linux-amd64-alpine".

All regular expression matches must have a field name on the left and a back-quoted string literal on the right, as in builder ~ `corellium`.

A back-quoted string literal by itself is taken to be a comparison against the output field, which is appropriate for the vast majority of regular expressions in patterns.
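
To make the semantics concrete, the sketch below hand-translates one pattern into Go. It assumes that regular expressions behave like Go's regexp package and match anywhere in the field (unanchored); the `Record` type, its `matches` method, and the `signal: killed` literal are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Record holds the named string fields described above.
type Record map[string]string

// matches reports whether the regular expression re matches anywhere
// in the named field (assumed unanchored, as with Go's regexp package).
func (r Record) matches(field, re string) bool {
	return regexp.MustCompile(re).MatchString(r[field])
}

// pattern is a hand translation of the watchflakes pattern
//
//	(builder == "linux-amd64-alpine" || builder ~ `corellium`) && `signal: killed`
//
// The bare back-quoted literal is matched against the output field.
func pattern(r Record) bool {
	return (r["builder"] == "linux-amd64-alpine" || r.matches("builder", `corellium`)) &&
		r.matches("output", `signal: killed`)
}

func main() {
	fmt.Println(pattern(Record{"builder": "linux-amd64-alpine", "output": "exit: signal: killed"}))
	fmt.Println(pattern(Record{"builder": "android-arm64-corellium", "output": "signal: killed"}))
	fmt.Println(pattern(Record{"builder": "windows-amd64", "output": "signal: killed"}))
}
```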


Putting this all together, here are some example scripts.

post <- pkg == "net/http" && test == "TestHandlerAbortRacesBodyRead"

This script in #55277 was created automatically by watchflakes in response to a build run that failed in http.TestHandlerAbortRacesBodyRead. The specific failure that prompted the issue creation was a timeout. If more failures with different root causes were found in that test, it might become appropriate to add && `panic: test timed out` or otherwise refine the pattern.

post <- goos == "openbsd" && `unlinkat .*: operation not permitted`

This script in #49751 collects failures on openbsd caused by unexpected EPERM errors from os.Remove calling unlinkat. These failures cause problems in a variety of tests, so there is no condition on pkg or test.

post <- pkg ~ `^cmd/go` && `appspot.com.*: 503`

This script in #54608 tracks network problems with 503 responses from appspot.com in any tests in the cmd/go/... package hierarchy, not just cmd/go itself.
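
The effect of the `^cmd/go` prefix match can be checked directly, assuming Go regexp semantics. The sample output line and the `matches` helper are made up for illustration. Because the expression anchors only the start of the package path, it also accepts any other path beginning with cmd/go, such as cmd/gofmt.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	pkgRE    = regexp.MustCompile(`^cmd/go`)
	outputRE = regexp.MustCompile(`appspot.com.*: 503`)
)

// matches is a hand translation of the pattern above: the first
// expression is applied to pkg, the bare one to output.
func matches(pkg, output string) bool {
	return pkgRE.MatchString(pkg) && outputRE.MatchString(output)
}

func main() {
	// Hypothetical failure output.
	out := "go: reading https://example.appspot.com/foo: 503 Service Unavailable"
	for _, pkg := range []string{"cmd/go", "cmd/go/internal/modfetch", "cmd/gofmt", "cmd/compile"} {
		fmt.Println(pkg, matches(pkg, out))
	}
}
```

If only the cmd/go hierarchy were wanted, a stricter expression such as `^cmd/go(/|$)` would exclude cmd/gofmt.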

post <- goos == "windows" &&
        (`dnsquery: DNS server failure` || `getaddrinfow: This is usually a temporary error`)

This script in #55165 matches specific DNS failures in any test on builders running Windows.

post <- builder == "darwin-arm64-12" && pkg == "" && test == ""

This script in #55312 was created automatically by watchflakes to track failures on the darwin-arm64-12 builder that happen before a specific package test can run.

# note: sometimes the URL is printed with one /
default <- `(Get|read) "https://?(goproxy.io|proxy.golang.com.cn|goproxy.cn)`

This script in #55163 matches errors using certain non-standard Go proxies. It uses default to allow other issues to take ownership of more specific failures caused by these proxies. Failures not matching other issues go to #55163 instead of creating new issues.
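
The `/?` in the expression is what covers the case mentioned in the comment, where the URL is printed with a single slash. A quick check, assuming Go regexp semantics; the sample output lines and the `usesListedProxy` helper are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// proxyRE is the regular expression from the script above.
var proxyRE = regexp.MustCompile(`(Get|read) "https://?(goproxy.io|proxy.golang.com.cn|goproxy.cn)`)

// usesListedProxy reports whether the output mentions one of the listed proxies.
func usesListedProxy(output string) bool { return proxyRE.MatchString(output) }

func main() {
	// Hypothetical failure output lines.
	fmt.Println(usesListedProxy(`Get "https://goproxy.io/golang.org/x/text/@v/list": EOF`))
	fmt.Println(usesListedProxy(`read "https:/goproxy.cn/golang.org/x/text/@v/list": connection reset`))
	fmt.Println(usesListedProxy(`Get "https://proxy.golang.org/golang.org/x/text/@v/list": EOF`))
}
```

The first two lines match, with two slashes and one slash respectively. The third does not, because proxy.golang.org is not one of the listed proxies.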

default <- `: internal compiler error:`

This script in #55257 matches compiler failures in any build, no matter what package or repo is being tested. It uses default for the same reasons as the previous example: so that issues matching specific compiler errors can still be filed, but failures not matching other issues are grouped into #55257 instead of creating new issues assigned to the specific test that happened to trigger the problem.