blob: 401867732f5d22c78dd7210ad532f42a66fbe3e1 [file] [log] [blame] [view]
# Proposal: Less Error-Prone Loop Variable Scoping
David Chase \
Russ Cox \
May 2023
Discussion at https://go.dev/issue/60078. \
Pre-proposal discussion at https://go.dev/issue/56010.
## Abstract
Last fall, we had a GitHub discussion at #56010 about changing for
loop variables declared with `:=` from one-instance-per-loop to
one-instance-per-iteration. Based on that discussion and further work
on understanding the implications of the change, we propose that we
make the change in an appropriate future Go version, perhaps Go 1.22
if the stars align, and otherwise a later version.
## Background
This proposal is about changing for loop variable scoping semantics,
so that loop variables are per-iteration instead of per-loop. This
would eliminate accidental sharing of variables between different
iterations, which happens far more than intentional sharing does. The
proposal would fix [#20733](https://go.dev/issue/20733).
Briefly, the problem is that loops like this one don’t do what they
look like they do:
var ids []*int
for i := 0; i < 10; i++ {
ids = append(ids, &i)
}
That is, this code has a bug. After this loop executes, `ids` contains
10 identical pointers, each pointing at the value 10, instead of 10
distinct pointers to 0 through 9. This happens because the item
variable is per-loop, not per-iteration: `&i` is the same on every
iteration, and `item` is overwritten on each iteration. The usual fix
is to write this instead:
var ids []*int
for i := 0; i < 10; i++ {
i := i
ids = append(ids, &i)
}
This bug also often happens in code with closures that capture the
address of item implicitly, like:
var prints []func()
for _, v := range []int{1, 2, 3} {
prints = append(prints, func() { fmt.Println(v) })
}
for _, print := range prints {
print()
}
This code prints 3, 3, 3, because all the closures print the same v,
and at the end of the loop, v is set to 3. Note that there is no
explicit &v to signal a potential problem. Again the fix is the same:
add v := v.
The same bug exists in this version of the program, with the same fix:
var prints []func()
for i := 1; i <= 3; i++ {
prints = append(prints, func() { fmt.Println(i) })
}
for _, print := range prints {
print()
}
Another common situation where this bug arises is in subtests using t.Parallel:
func TestAllEvenBuggy(t *testing.T) {
testCases := []int{1, 2, 4, 6}
for _, v := range testCases {
t.Run("sub", func(t *testing.T) {
t.Parallel()
if v&1 != 0 {
t.Fatal("odd v", v)
}
})
}
}
This test passes, because each all four subtests check that 6 (the
final test case) is even.
Goroutines are also often involved in this kind of bug, although as
these examples show, they need not be. See also the [Go FAQ
entry](https://go.dev/doc/faq#closures_and_goroutines).
Russ [talked at Gophercon once](https://go.dev/blog/toward-go2#explaining-problems)
about how we need agreement about the existence of a problem before we
move on to solutions. When we examined this issue in the run up to Go 1,
it did not seem like enough of a problem. The general consensus was
that it was annoying but not worth changing. Since then, we suspect
every Go programmer in the world has made this mistake in one program
or another.
We have talked for a long time about redefining these semantics, to
make loop variables _per-iteration_ instead of _per-loop_. That is,
the change would effectively be to add an implicit “x := x” at the
start of every loop body for each iteration variable x, just like
people do manually today. Making this change would remove the bugs
from the programs above.
This proposal does exactly that. Using the `go` version lines in
`go.mod` files, it only applies the new semantics to new programs, so
that existing programs are guaranteed to continue to execute exactly
as before.
Before writing this proposal, we collected feedback in a GitHub
Discussion in October 2022, [#56010](https://go.dev/issue/56010). The
vast majority of the feedback was positive, although a couple people
did say they see no problem with the current semantics and discourage
a change. Here are a few representative quotes from that discussion:
> One thing to notice in this discussion is that even after having this
> problem explained multiple times by different people multiple
> developers were still having trouble understanding exactly what caused
> it and how to avoid it, and even the people that understood the
> problem in one context often failed to notice it would affect other
> types of for loops.
>
> So we could argue that the current semantics make the learning curve for Go steeper.
>
> PS: I have also had problems with this multiple times, once in
> production, thus, I am very in favor of this change even considering
> the breaking aspect of it.
>
> — [@VinGarcia](https://github.com/golang/go/discussions/56010#discussioncomment-3789371)
> This exactly matches my experience. It's relatively easy to understand
> the first example (taking the same address each time), but somewhat
> trickier to understand in the closure/goroutine case. And even when
> you do understand it, one forgets (apparently even Russ forgets!). In
> addition, issues with this often don't show up right away, and then
> when debugging an issue, I find it always takes a while to realize
> that it's "that old loop variable issue again".
>
> — [@benhoyt](https://github.com/golang/go/discussions/56010#discussioncomment-3791004)
> Go's unusual loop semantics are a consistent source of problems and
> bugs in the real world. I've been a professional go developer for
> roughly six years, and I still get bit by this bug, and it's a
> consistent stumbling block for newer Go programmers. I would strongly
> encourage this change.
>
> — [@efronlicht](https://github.com/golang/go/discussions/56010#discussioncomment-3798957)
> I really do not see this as a useful change. These changes always have
> the best intentions, but the reality is that the language works just
> fine now. This well intended change slowly creep in over time, until
> you wind up with the C++ language yet again. If someone can't
> understand a relatively simple design decision like this one, they are
> not going to understand how to properly use channels and other
> language features of Go.
>
> Burying a change to the semantics of the language in go.mod is absolutely bonkers.
>
> — [@hydrogen18](https://github.com/golang/go/discussions/56010#discussioncomment-3851670)
Overall, the discussion included 72 participants and 291 total
comments and replies. As a rough measure of user sentiment, the
discussion post received 671 thumbs up, 115 party, and 198 heart emoji
reactions, and not a single thumbs down reaction.
Russ also presented the idea of making this change at GopherCon 2022,
shortly after the discussion, and then again at Google Open Source
Live's Go Day 2022. Feedback from both talks was entirely positive:
not a single person suggested that we should not make this change.
## Proposal
We propose to change for loop scoping in a future version of Go to be
per-iteration instead of per-loop. For the purposes of this document,
we are calling that version Go 1.30, but the change would land in
whatever version of Go it is ready for. The earliest version of Go
that could include the change would be Go 1.22.
This change includes four major parts:
(1) the language specification,
(2) module-based and file-based language version selection,
(3) tooling to help users in the transition,
(4) updates to other parts of the Go ecosystem.
The implementation of these parts spans the compiler, the `go` command,
the `go` `vet` command, and other tools.
### Language Specification
In <https://go.dev/ref/spec#For_clause>, the text currently reads:
> The init statement may be a short variable declaration, but the post
> statement must not. Variables declared by the init statement are
> re-used in each iteration.
This would be replaced with:
> The init statement may be a short variable declaration (`:=`), but the
> post statement must not. Each iteration has its own separate declared
> variable (or variables). The variable used by the first iteration is
> declared by the init statement. The variable used by each subsequent
> iteration is declared implicitly before executing the post statement
> and initialized to the value of the previous iteration's variable at
> that moment.
>
> var prints []func()
> for i := 0; i < 3; i++ {
> prints = append(prints, func() { println(i) })
> }
> for _, p := range prints {
> p()
> }
>
> // Output:
> // 0
> // 1
> // 2
>
> Prior to Go 1.30, iterations shared one set of variables instead of
> having their own separate variables.
(Remember that in this document, we are using Go 1.30 as the placeholder
for the release that will ship the new semantics.)
For precision in this proposal, the spec example would compile to a
form semantically equivalent to this Go program:
{
i_outer := 0
first := true
for {
i := i_outer
if first {
first = false
} else {
i++
}
if !(i < 3) {
break
}
prints = append(prints, func() { println(i) })
i_outer = i
}
}
Of course, a compiler can write the code less awkwardly, since it need
not limit the translation output to valid Go source code. In
particular a compiler is likely to have the concept of the current
memory location associated with `i` and be able to update it just
before the post statement.
In <https://go.dev/ref/spec#For_range>, the text currently reads:
> The iteration variables may be declared by the "range" clause using a
> form of short variable declaration (`:=`). In this case their types
> are set to the types of the respective iteration values and their
> scope is the block of the "for" statement; they are re-used in each
> iteration. If the iteration variables are declared outside the "for"
> statement, after execution their values will be those of the last
> iteration.
This would be replaced with:
> The iteration variables may be declared by the "range" clause using a
> form of short variable declaration (`:=`). In this case their types
> are set to the types of the respective iteration values and their
> scope is the block of the "for" statement; each iteration has its own
> separate variables. If the iteration variables are declared outside
> the "for" statement, after execution their values will be those of the
> last iteration.
>
> var prints []func()
> for _, s := range []string{"a", "b", "c"} {
> prints = append(prints, func() { println(s) })
> }
> for _, p := range prints {
> p()
> }
>
> // Output:
> // a
> // b
> // c
>
> Prior to Go 1.30, iterations shared one set of variables instead of
> having their own separate variables.
For precision in this proposal, the spec example would compile to a
form semantically equivalent to this Go program:
{
var s_outer string
for _, s_outer = range []string{"a", "b", "c"} {
s := s_outer
prints = append(prints, func() { println(s) })
}
}
Note that in both 3-clause and range forms, this proposal is a
complete no-op for loops with no `:=` in the loop header and loops
with no variable capture in the loop body. In particular, a loop like
the following example, modifying the loop variable during the loop
body, continues to execute as it always has:
for i := 0;; i++ {
if i >= len(s) || s[i] == '"' {
return s[:i]
}
if s[i] == '\\' { // skip escaped char, potentially a quote
i++
}
}
### Language Version Selection
The change in language specification will fix far more programs than
it breaks, but it may break a very small number of programs. To make
the potential breakage completely user controlled, the rollout would
decide whether to use the new semantics based on the `go` line in each
package’s `go.mod` file. This is the same line already used for
enabling language features; for example, to use generics in a package,
the `go.mod` must say `go 1.18` or later. As a special case, for this
proposal, we would use the `go` line for changing semantics instead of
for adding or removing a feature.
Modules that say `go 1.30` or later would have for loops using
per-iteration variables, while modules declaring earlier versions have
for loops with per-loop variables:
<img width="734" alt="Code in modules that say go 1.30 gets per-iteration variable semantics; code in modules that say earlier Go versions gets per-loop semantics." src="https://user-images.githubusercontent.com/104030/193599987-19d8f564-cb40-488e-beaa-5093a4823ee0.png">
This mechanism would allow the change to be [deployed
gradually](https://go.dev/talks/2016/refactor.article) in a given code
base. Each module can update to the new semantics independently,
avoiding a bifurcation of the ecosystem.
The [forward compatibility work in #57001](https://go.dev/issue/57001),
which will land in Go 1.21, ensures that Go 1.21 and later will not
attempt to compile code marked `go 1.30`. Even if this change lands in
Go 1.22, the previous (and only other supported) Go release would be
Go 1.21, which would understand not to compile `go 1.22` code. So code
opting in to the new loop semantics would never miscompile in older Go
releases, because it would not compile at all. If the changes were to
be slated for Go 1.22, it might make sense to issue a Go 1.20 point
release making its `go` command understand not to compile `go 1.22`
code. Strictly speaking, that point release is unnecessary, because if
Go 1.22 has been released, Go 1.20 is unsupported and we don't need to
worry about its behavior. But in practice people do use older Go
releases for longer than they are supported, and if they keep up with
point releases we can help them avoid this potential problem.
The forward compatibility work also allows a per-file language version
selection using `//go:build` directives. Specifically, if a file in a
`go 1.29` module says `//go:build go1.30`, it gets the Go 1.30
language semantics, and similarly if a file in a `go 1.30` module says
`//go:build go1.29`, it gets the Go 1.29 language semantics. This
general rule would apply to loop semantics as well, so the files in a
module could be converted one at a time in a per-file gradual code
repair if necessary.
Vendoring of other Go modules already records the Go version listed in
each vendored module's `go.mod` file, to implement the general
language version selection rule. That existing support would also
ensure that old vendored modules keep their old loop semantics even in
a newer overall module.
### Transition Support Tooling
We expect that this change will fix far more programs than it breaks,
but it will break some programs. The most common programs that break
are buggy tests (see the [“fixes buggy code” section below](#fixes)
for details). Users who observe a difference in their programs need
support to pinpoint the change. We plan to provide two kinds of
support tooling, one static and one dynamic.
The static support tooling is a compiler flag that reports every loop
that is compiling differently due to the new semantics. Our prototype
implementation does a very good job of filtering out loops that are
provably unaffected by the change in semantics, so in a typical
program very few loops are reported. The new compiler flag,
`-d=loopvar=2`, can be invoked by adding an option to the `go` `build`
or `go` `test` command line: either `-gcflags=-d=loopvar=2` for
reports about the current package only, or `-gcflags=all=-d=loopvar=2`
for reports about all packages.
The dynamic support tooling is a new program called bisect that, with
help from the compiler, runs a test repeatedly with different sets of
loops opted in to the new semantics. By using a binary search-like
algorithm, bisect can pinpoint the exact loop or loops that, when
converted to the new semantics, cause a test failure. Once you have a
test that fails with the new semantics but passes with the old
semantics, you run:
bisect -compile=loopvar go test
We have used this dynamic tooling in a conversion of Google's internal
monorepo to the new loop semantics. The rate of test failure caused by
the change was about 1 in 8,000. Many of these tests took a long time
to run and contained complex code that we were unfamiliar with. The
bisect tool is especially important in this situation: it runs the
search while you are at lunch or doing other work, and when you return
it has printed the source file and line number of the loop that causes
the test failure when compiled with the new semantics. At that point,
it is trivial to rewrite the loop to pre-declare a per-loop variable
and no longer use `:=`, preserving the old meaning even in the new
semantics. We also found that code owners were far more likely to see
the actual problem when we could point to the specific line of code.
As noted in the [“fixes buggy code” section below](#fixes), all but
one of the test failures turned out to be a buggy test.
### Updates to the Go Ecosystem
Other parts of the Go ecosystem will need to be updated to understand
the new loop semantics.
Vet and the golang.org/x/tools/go/analysis framework are being updated
as part of #57001 to have access to the per-file language version
information. Analyses like the vet loopclosure check will need to
tailor their diagnostics based on the language version: in files using
the new semantics, there won't be `:=` loop variable problems anymore.
Other analyzers, like staticcheck and golangci-lint, may need updates
as well. We will notify the authors of those tools and work with them
to make sure they have the information they need.
## Rationale and Compatibility
In most Go design documents, Rationale and Compatibility are two
distinct sections. For this proposal, considerations of compatibility
are so fundamental that it makes sense to address them as part of the
rationale. To be completely clear: _this is a breaking change to Go_.
However, the specifics of how we plan to roll out the change follow
the spirit of the compatibility guidelines if not the “letter of the
law.”
In the [Go 2 transitions document](https://github.com/golang/proposal/blob/master/design/28221-go2-transitions.md#language-changes)
we gave the general rule that language redefinitions like what we just
described are not permitted, giving this very proposal as an example
of something that violates the general rule. We still believe that
that is the right general rule, but we have come to also believe that
the for loop variable case is strong enough to motivate a one-time
exception to that rule. Loop variables being per-loop instead of
per-iteration is the only design decision we know of in Go that makes
programs incorrect more often than it makes them correct. Since it is
the only such design decision, we do not see any plausible candidates
for additional exceptions.
The rest of this section presents the rationale and compatibility
considerations.
### A decade of experience shows the cost of the current semantics
Russ [talked at Gophercon once](https://go.dev/blog/toward-go2#explaining-problems)
about how we need agreement about the existence of a problem before we
move on to solutions. When we examined this issue in the run up to Go
1, it did not seem like enough of a problem. The general consensus was
that it was annoying but not worth changing.
Since then, we suspect every Go programmer in the world has made this
mistake in one program or another. Russ certainly has done it
repeatedly over the past decade, despite being the one who argued for
the current semantics and then implemented them. (Apologies!)
The current cures for this problem are worse than the disease.
We ran a program to process the git logs of the top 14k modules, from
about 12k git repos and looked for commits with diff hunks that were
entirely “x := x” lines being added. We found about 600 such commits.
On close inspection, approximately half of the changes were
unnecessary, done probably either at the insistence of inaccurate
static analysis, confusion about the semantics, or an abundance of
caution. Perhaps the most striking was this pair of changes from
different projects:
```
for _, informer := range c.informerMap {
+ informer := informer
go informer.Run(stopCh)
}
```
```
for _, a := range alarms {
+ a := a
go a.Monitor(b)
}
```
One of these two changes is unnecessary and the other is a real bug
fix, but you can’t tell which is which without more context. (In one,
the loop variable is an interface value, and copying it has no effect;
in the other, the loop variable is a struct, and the method takes a
pointer receiver, so copying it ensures that the receiver is a
different pointer on each iteration.)
And then there are changes like this one, which is unnecessary
regardless of context (there is no opportunity for hidden
address-taking):
```
for _, scheme := range artifact.Schemes {
+ scheme := scheme
Runtime.artifactByScheme[scheme.ID] = id
Runtime.schemesByID[scheme.ID] = scheme
}
```
This kind of confusion and ambiguity is the exact opposite of the
readability we are aiming for in Go.
People are clearly having enough trouble with the current semantics
that they choose overly conservative tools and adding “x := x” lines
by rote in situations not flagged by tools, preferring that to
debugging actual problems. This is an entirely rational choice, but it
is also an indictment of the current semantics.
We’ve also seen production problems caused in part by these semantics,
both inside Google and at other companies (for example,
[this problem at Let’s Encrypt](https://bugzilla.mozilla.org/show_bug.cgi?id=1619047)).
It seems likely that, world-wide, the current semantics have easily
cost many millions of dollars in wasted developer time and production
outages.
### Old code is unaffected, compiling exactly as before
The go lines in go.mod give us a way to guarantee that all old code is
unaffected, even in a build that also contains new code. Only when you
change your go.mod line do the packages in that module get the new
semantics, and you control that. In general this one reason is not
sufficient, as laid out in the Go 2 transitions document. But it is a
key property that contributes to the overall rationale, with all the
other reasons added in.
### Changing the semantics globally would disallow gradual code repair
As noted earlier, [gradual code repair](https://go.dev/talks/2016/refactor.article)
is an important technique for deploying any potentially breaking
change: it allows focusing on one part of the code base at a time,
instead of having to consider all of it together. The per-module
go.mod go lines and the per-file `//go:build` directives enable
gradual code repair.
Some people have suggested we simply make the change unconditionally
when using Go 1.30, instead of allowing this fine-grained selection.
Given the low impact we expect from the change, this “all at once”
approach may be viable even for sizable code bases. However, it leaves
no room for error and creates the possibility of a large problem that
cannot be broken into smaller problems. A forced global change removes
the safety net that the gradual approach provides. From an engineering
and risk reduction point of view, that seems unwise. The safer, more
gradual path is the better one.
### Changing the semantics is usually a no-op, and when it’s not, it fixes buggy code far more often than it breaks correct code {#fixes}
As mentioned above, we have recently (as of May 2023) enabled the new
loop semantics in Google's internal Go toolchain. In order to do
that, we ran all of our tests, found the specific loops that needed
not to change behavior in order to pass (using `bisect` on each newly
failing test), rewrote the specific loops not to use `:=`, and then
changed the semantics globally. For Google's internal code base, we
did make a global change, even for open-source Go libraries written
for older Go versions. One reason for the global change was pragmatic:
there is of course no code marked as “Go 1.30” in the world now, so if
not for the global change there would be no change at all. Another
reason was that we wanted to find out how much total work it would
require to change all code. The process was still gradual, in the
sense that we tested the entirety of Google's Go code many times with
a compiler flag enabling the change just for our own builds, and fixed
all broken code, before we made the global change that affected all
our users.
People who want to experiment with a global change in their code bases
can build with `GOEXPERIMENT=loopvar` using the current development
copy of Go. That experimental mode will also ship in the Go 1.21
release.
The vast majority of newly failing tests were table-driven tests using
[t.Parallel](https://pkg.go.dev/testing/#T.Parallel). The usual
pattern is to have a test that reduces to something like `TestAllEvenBuggy`
from the start of the document:
```
func TestAllEven(t *testing.T) {
testCases := []int{1, 2, 4, 6}
for _, v := range testCases {
t.Run("sub", func(t *testing.T) {
t.Parallel()
if v&1 != 0 {
t.Fatal("odd v", v)
}
})
}
}
```
This test aims to check that all the test cases are even (they are
not!), but it passes with current Go toolchains. The problem is that
`t.Parallel` stops the closure and lets the loop continue, and then it
runs all the closures in parallel when ‘TestAllEven’ returns. By the
time the if statement in the closure executes, the loop is done, and v
has its final iteration value, 6. All four subtests now continue
executing in parallel, and they all check that 6 is even, instead of
checking each of the test cases. There is no race in this code,
because `t.Parallel` orders all the `v&1` tests after the final update
to `v` during the range loop, so the test passes even using `go test
-race`. Of course, real-world examples are typically much more
complex.
Another common form of this bug is preparing test case data by
building slices of pointers. For example this code, similar to an example at
the start of the document, builds a `[]*int32` for use as a repeated int32
in a protocol buffer:
```
func GenerateTestIDs() {
var ids []*int32
for i := int32(0); i < 10; i++ {
ids = append(ids, &i)
}
}
```
This loop aims to create a slice of ten different pointers to the
values 0 through 9, but instead it creates a slice of ten of the same
pointer, each pointing to 10.
For any of these loops, there are two useful rewrites. The first is to
remove the use of `:=`. For example:
```
func TestAllEven(t *testing.T) {
testCases := []int{1, 2, 4, 6}
var v int // TODO: Likely loop scoping bug!
for _, v = range testCases {
t.Run("sub", func(t *testing.T) {
t.Parallel()
if v&1 != 0 {
t.Fatal("odd v", v)
}
})
}
}
```
or
```
func GenerateTestIDs() {
var ids []*int32
var i int32 // TODO: Likely loop scoping bug!
for i = int32(0); i < 10; i++ {
ids = append(ids, &i)
}
}
```
This kind of rewrite keeps tests passing even if compiled using the
proposed loop semantics. Of course, most of the time the tests are
passing incorrectly; this just preserves the status quo.
The other useful rewrite is to add an explicit `x := x` assignment, as
discussed in the [Go FAQ](https://go.dev/doc/faq#closures_and_goroutines).
For example:
```
func TestAllEven(t *testing.T) {
testCases := []int{1, 2, 4, 6}
for _, v := range testCases {
v := v // TODO: This makes the test fail. Why?
t.Run("sub", func(t *testing.T) {
t.Parallel()
if v&1 != 0 {
t.Fatal("odd v", v)
}
})
}
}
```
or
```
func GenerateTestIDs() {
var ids []*int32
for i := int32(0); i < 10; i++ {
i := i // TODO: This makes the test fail. Why?
ids = append(ids, &i)
}
}
```
This kind of rewrite makes the test break using the current loop
semantics, and they will stay broken if compiled with the proposed
loop semantics. This rewrite is most useful for sending to the owners
of the code for further debugging.
Out of all the failing tests, only one affected loop was not in
test code. That code looked like:
```
var original *mapping
for _, saved := range savedMappings {
if saved.start <= m.start && m.end <= saved.end {
original = &saved
break
}
}
...
```
Unfortunately, this code was in a very low-level support program that
is invoked when a program is crashing, and a test checks that the code
contains no allocations or even runtime write barriers. In the old
loop semantics, both `original` and `saved` were function-scoped
variables, so the assignment `original = &saved` does not cause
`saved` to escape to the heap. In the new loop semantics, `saved` is
per-iteration, so `original = &saved` makes it escape the iteration
and therefore require heap allocation. The test failed because the
code is disallowed from allocating, yet it was now allocating. The fix
was to do the first kind of rewrite, declaring `saved` before the loop
and moving it back to function scope.
Similar code might change from allocating one variable per loop to
allocating N variables per loop. In some cases, that extra allocation
is inherent to fixing a latent bug. For example, `GenerateTestIDs`
above is now allocating 10 int32s instead of one – the price of
correctness. In a very frequently executed already-correct loop, the
new allocations may be unnecessary and could potentially cause more
garbage collector pressure and a measurable performance difference. If
so, standard monitoring and allocation profiles (`pprof
--alloc_objects`) should pinpoint the location easily, and the fix is
trivial: declare the variable above the loop. Benchmarking of the
public “bent” bench suite showed no statistically significant
performance difference over all, so we expect most programs to be
unaffected.
Not all the failing tests used code as obvious as the examples above.
One failing test that didn't use t.Parallel reduced to:
```
var once sync.Once
for _, tt := range testCases {
once.Do(func() {
http.HandleFunc("/handler", func(w http.ResponseWriter, r *http.Request) {
w.Write(tt.Body)
})
})
result := get("/handler")
if result != string(tt.Body) {
...
}
}
```
This strange loop registers an HTTP handler on the first iteration and
then makes a request served by the handler on every iteration. For the
handler to serve the expected data, the `tt` captured by the handler
closure must be the same as the `tt` for the current iteration. With a
per-loop `tt`, that's true. With a per-iteration `tt`, it's not: the
handler keeps using the first iteration's `tt` even in later
iterations, causing the failure.
As difficult as that example may be to puzzle through, it is a
simplified version of the original. The bisect tool pinpointing the
exact loop was a huge help in finding the problem.
Our experience supports the claim that the new semantics fixes buggy
code far more often than it breaks correct code. The new semantics
only caused test failures in about 1 of every 8,000 test packages,
but running the updated Go 1.20 `loopclosure` vet check over our entire
code base flagged tests at a much higher rate: 1 in 400 (20 in 8,000).
The loopclosure checker has no false positives: all the reports are buggy
uses of t.Parallel in our source tree.
That is, about 5% of the flagged tests were like `TestAllEvenBuggy`;
the other 95% were like `TestAllEven`: not (yet) testing what it intended,
but a correct test of correct code even with the loop variable bug fixed.
Of course, there is always the possibility that Google’s tests may not
be representative of the overall ecosystem’s tests in various ways,
and perhaps this is one of them. But there is no indication from this
analysis of _any_ common idiom at all where per-loop semantics are
required. Also, Google's tests include tests of open source Go
libraries that we use, and there were only two failures, both reported
upstream. Finally, the git log analysis points in the same direction:
parts of the ecosystem are adopting tools with very high false
positive rates and doing what the tools say, with no apparent
problems.
To be clear, it _is_ possible to write artificial examples of code
that is “correct” with per-loop variables and “incorrect” with
per-iteration variables but these are contrived.
One example, with credit to Tim King, is a convoluted way to sum the
numbers in a slice:
```
func sum(list []int) int {
m := make(map[*int]int)
for _, x := range list {
m[&x] += x
}
for _, sum := range m {
return sum
}
return 0
}
```
In the current semantics there is only one `&x` and therefore only one
map entry. With the new semantics there are many `&x` and many map
entries, so the map does not accumulate a sum.
Another example, with credit to Matthew Dempsky, is a non-racy loop:
```
for i, x := 0, 0; i < 1; i++ {
go func() { x++ }()
}
```
This loop only executes one iteration (starts just one goroutine), and
`x` is not read or written in the loop condition or post-statement.
Therefore the one created goroutine is the only code reading or
writing `x`, making the program race-free. The rewritten semantics
would have to make a new copy of `x` for the next iteration when it
runs `i++`, and that copying of `x` would race with the `x++` in the
goroutine, making the rewritten program have a race. This example
shows that it is possible for the new semantics to introduce a race
where there was no race before. (Of course, there would be a race in
the old semantics if the loop iterated more than once.)
These examples show that it is technically possible for the
per-iteration semantics to change the correctness of existing code,
even if the examples are contrived. This is more evidence for the
gradual code repair approach.
### Changing 3-clause loops keeps all for loops consistent and fixes real-world bugs
Some people suggested only applying this change to range loops, not
three-clause for loops like `i := 0; i < n; i++`.
Adjusting the 3-clause form may seem strange to C programmers, but the
same capture problems that happen in range loops also happen in
three-clause for loops. Changing both forms eliminates that bug from
the entire language, not just one place, and it keeps the loops
consistent in their variable semantics. That consistency means that if
you change a loop from using range to using a 3-clause form or vice
versa, you only have to think about whether the iteration visits the
same items, not whether a subtle change in variable semantics will
break your code. It is also worth noting that JavaScript is using
per-iteration semantics for 3-clause for loops using let, with no
problems.
In Google's own code base, at least a few of the newly failing tests
were due to buggy 3-clause loops, like in the `GenerateTestIDs` example.
These 3-clause bugs happen less often, but they still happen at a high
enough rate to be worth fixing. The consistency arguments only add to
the case.
### Good tooling can help users identify exactly the loops that need the most scrutiny during the transition
As noted in the [transition discussion](#transition), our experience
analyzing the failures in Google’s Go tests shows that we can use
compiler instrumentation to identify loops that may be compiling
differently, because the compiler thinks the loop variables escape.
Almost all the time, this identifies a very small number of loops, and
one of those loops is right next to the failure. The automated bisect
tool removes even that small amount of manual effort.
### Static analysis is not a viable alternative
Whether a particular loop is “buggy” due to the current behavior
depends on whether the address of an iteration value is taken _and
then that pointer is used after the next iteration begins_. It is
impossible in general for analyzers to see where the pointer lands and
what will happen to it. In particular, analyzers cannot see clearly
through interface method calls or indirect function calls. Different
tools have made different approximations. Vet recognizes a few
definitely bad patterns in its `loopclosure` checker, and we added a
new pattern checking for mistakes using t.Parallel in Go 1.20. To
avoid false positives, `loopclosure` also has many false negatives.
Other checkers in the ecosystem err in the other direction. The commit
log analysis showed some checkers were producing over 90% false
positive rates in real code bases. (That is, when the checker was
added to the code base, the “corrections” submitted at the same time
were not fixing actual problems over 90% of the time in some commits.)
There is no perfect way to catch these bugs statically. Changing the
semantics, on the other hand, eliminates all the bugs.
### Mechanical rewriting to preserve old semantics is possible but mostly unnecessary churn
People have suggested writing a tool that rewrites _every_ for loop
flagged as changing by the compiler, preserving the old semantics by
removing the use of `:=`. Then a person could revert the loop changess
one at a time after careful code examination. A variant of this tool
might simply add a comment to each loop along with a `//go:build
go1.29` directive at the top of the file, leaving less for the person
to undo. This kind of tool is definitely possible to write, but our
experience with real Go code suggests that it would cause far more
churn than is necessary, since 95% of definitely buggy loops simply
became correct loops with the new semantics. The approach also assumes
that careful code examination will identify all buggy code, which in
our experience is an overly optimistic assumption. Even after bisected
test failures proved that specific loops were definitely buggy,
identifying the exact bug was quite challenging. And with the
rewriting tool, you don't even know for sure that the loop is buggy,
just that the compiler would treat it differently.
All in all, we believe that the combination of being able to generate
the compiler's report of changed positions is sufficient on the static
analysis side, along with the bisect tool to track down the source of
recognized misbehaviors. Of course, users who want a rewriting tool
can easily use the compiler's report to write one, especially if the
rewrite only adds comments and `//go:build` directives.
### Changing loop syntax entirely would cause unnecessary churn
We have talked in the past about introducing a different syntax for
loops (for example, #24282), and then giving the new syntax the new
semantics while deprecating the current syntax. Ultimately this would
cause a very significant amount of churn disproportionate to the
benefit: the vast majority of existing loops are correct and do not
need any fixes. In Google's Go source tree, rate of buggy loops was
about 1 per 20,000. It would be a truly extreme response to force an
edit of every for loop that exists today, invalidate all existing
documentation, and then have two different for loops that Go
programmers need to understand for the rest of time, all to fix 1 bad
loop out of 20,000. Changing the semantics to match what people
overwhelmingly already expect provides the same value at far less
cost. It also focuses effort on newly written code, which tends to be
buggier than old code (because the old code has been at least
partially debugged already).
### Disallowing loop variable capture would cause unnecessary churn
Some people have suggested disallowing loop variable captures
entirely, which would certainly make it impossible to write a buggy
loop. Unfortunately, that would also invalidate essentially every Go
program in existence, the vast majority of which are correct. It would
also make loop variables less capable than ordinary variables, which
would be strange. Even if this were just a temporary state, with loop
captures allowed again after the semantic change, that's a huge amount
of churn to catch the 0.005% of for loops that are buggy.
### Experience from C# supports the change
Early versions of C# had per-loop variable scoping for their
equivalent of range loops. C# 5 changed the semantics to be
per-iteration, as in this proposal. (C# 5 did not change the 3-clause
for loop form, in contrast to this proposal.)
In a comment on the GitHub discussion, [@jaredpar reported](https://github.com/golang/go/discussions/56010#discussioncomment-3788526)
on experience with C#. Quoting that comment in full:
> I work on the C# team and can offer perspective here.
>
> The C# 5 rollout unconditionally changed the `foreach` loop variable
> to be per iteration. At the time of C# 5 there was no equivalent to
> Go's putting `go 1.30` in a go.mod file so the only choice was break
> unconditionally or live with the behavior. The loop variable lifetime
> became a bit of a sore point pretty much the moment the language
> introduced lambda expressions for all the reasons you describe. As the
> language grew to leverage lambdas more through features like LINQ,
> `async`, Task Parallel Libraries, etc ... the problem got worse. It
> got so prevalent that the C# team decided the unconditional break was
> justified. It would be much easier to explain the change to the,
> hopefully, small number of customers impacted vs. continuing to
> explain the tricksy behavior to new customers.
>
> This change was not taken lightly. It had been discussed internally
> for several years, [blogs were written about it](https://ericlippert.com/2009/11/12/closing-over-the-loop-variable-considered-harmful-part-one/),
> lots of analysis of customer code, upper management buy off, etc ...
> In end though the change was rather anticlimactic. Yes it did break a
> small number of customers but it was smaller than expected. For the
> customers impacted they responded positively to our justifications and
> accepted the proposed code fixes to move forward.
>
> I'm one of the main people who does customer feedback triage as well
> as someone who helps customers migrating to newer versions of the
> compiler that stumble onto unexpected behavior changes. That gives me
> a good sense of what _pain points_ exist for tooling migration. This
> was a small blip when it was first introduced but quickly faded. Even
> as recently as a few years ago I was helping large code bases upgrade
> from C# 4. While they do hit other breaking changes we've had, they
> rarely hit this one. I'm honestly struggling to remember the last time
> I worked with a customer hitting this.
>
> It's been ~10 years since this change was taken to the language and a
> lot has changed in that time. Projects have a property `<LangVersion>`
> that serves much the same purpose as the Go version in the go.mod
> file. These days when we introduce a significant breaking change we
> tie the behavior to `<LangVersion>` when possible. That helps
> separates the concept for customers of:
>
> 1. Acquiring a new toolset. This comes when you upgrade Visual Studio
> or the .NET SDK. We want these to be _friction free_ actions so
> customers get latest bug / security fixes. This never changes
> `<LangVersion>` so breaks don't happen here.
>
> 2. Moving to a new
> language version. This is an explicit action the customer takes to
> move to a new .NET / C# version. It is understood this has some cost
> associated with it to account for breaking changes.
>
> This separation has been very successful for us and allowed us to make
> changes that would not have been possible in the past. If we were
> doing this change today we'd almost certainly tie the break to a
> `<LangVersion>` update
## Implementation
The actual semantic changes are implemented in the compiler today in
an opt-in basis and form the basis of the experimental data. The
transition support tooling also exists today.
Anyone is welcome to try the change in their own trees to help inform
their understanding of the impact of the change. Specifically:
go install golang.org/dl/gotip@latest
gotip download
GOEXPERIMENT=loopvar gotip test etc
will compile all Go code with the new per-iteration loop variables
(that is, a global change, ignoring `go.mod` settings). To add
compiler diagnostics about loops that are compiling differently:
GOEXPERIMENT=loopvar gotip build -gcflags=all=-d=loopvar=2
Omit the `all=` to limit the diagnostics to the current package. To
debug a test failure that only happens with per-iteration loop
variables enabled, use:
go install golang.org/x/tools/cmd/bisect@latest
bisect -compile=loopvar gotip test the/failing/test
Other implementation work yet to be done includes documentation,
updating vet checks like loopclosure, and coordinating with the
authors of tools like staticcheck and golangci-lint. We should also
update `go fix` to remove redundant `x := x` lines from source files
that have opted in to the new semantics.
### Google testing
As noted above, we have already enabled the new loop semantics for all
code built by Google's internal Go toolchain, with only a small number
of affected tests. We will update this section to summarize any
production problems encountered.
As of May 9, it has been almost a week since the change was enabled,
and there have been no production problems, nor any bug reports of any kind.
### Timeline and Rollout
The general rule for proposals is to avoid speculating about specific
releases that will include a change. The proposal process does not
rush to meet arbitrary deadlines: we will take the time necessary to
make the right decision and, if accepted, to land the right changes
and support. That general rule is why this proposal has been referring
to Go 1.30, as a placeholder for the release that includes the new
loop semantics.
That said, the response to the [preliminary discussion of this idea](https://go.dev/issue/56010)
was enthusiastically positive, and we have no reason to expect a
different reaction to this formal proposal. Assuming that is the case,
it could be possible to ship the change in Go 1.22. Since the
GOEXPERIMENT support will ship in Go 1.21, once the proposal is
accepted and Go 1.21 is released, it would make sense to publish a
short web page explaining the change and encouraging users to try it
in their programs, like the instructions above. If the proposal is
accepted before Go 1.21 is released, that page could be published with
the release, including a link to the page in the Go 1.21 release
notes. Whenever the instructions are published, it would also make
sense to publish a blog post highlighting the upcoming change. It will
in general be good to advertise the change in as many ways as
possible.
## Open issues (if applicable)
### Bazel language version
Bazel's Go support (`rules_go`) does not support setting the language
version on a per-package basis. It would need to be updated to do
that, with gazelle maintaining that information in generated BUILD
files (derived from the `go.mod` files).
### Performance on other benchmark suites
As noted above, there is a potential for new allocations in programs
with the new semantics, which may cause changes in performance.
Although we observed no performance changes in the “bent” benchmark
suite, it would be good to hear reports from others with their own
benchmark suites.