Proposal: Less Error-Prone Loop Variable Scoping

David Chase
Russ Cox
May 2023

Discussion at https://go.dev/issue/60078.
Pre-proposal discussion at https://go.dev/issue/56010.

Abstract

Last fall, we had a GitHub discussion at #56010 about changing for loop variables declared with := from one-instance-per-loop to one-instance-per-iteration. Based on that discussion and further work on understanding the implications of the change, we propose that we make the change in an appropriate future Go version, perhaps Go 1.22 if the stars align, and otherwise a later version.

Background

This proposal is about changing for loop variable scoping semantics, so that loop variables are per-iteration instead of per-loop. This would eliminate accidental sharing of variables between different iterations, which happens far more than intentional sharing does. The proposal would fix #20733.

Briefly, the problem is that loops like this one don’t do what they look like they do:

var ids []*int
for i := 0; i < 10; i++ {
	ids = append(ids, &i)
}

That is, this code has a bug. After this loop executes, ids contains 10 identical pointers, each pointing at the value 10, instead of 10 distinct pointers to 0 through 9. This happens because the variable i is per-loop, not per-iteration: &i is the same on every iteration, and i is overwritten on each iteration. The usual fix is to write this instead:

var ids []*int
for i := 0; i < 10; i++ {
	i := i
	ids = append(ids, &i)
}

This bug also often happens in code with closures that capture the address of v implicitly, like:

var prints []func()
for _, v := range []int{1, 2, 3} {
	prints = append(prints, func() { fmt.Println(v) })
}
for _, print := range prints {
	print()
}

This code prints 3, 3, 3, because all the closures print the same v, and at the end of the loop, v is set to 3. Note that there is no explicit &v to signal a potential problem. Again the fix is the same: add v := v.
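
Applying that fix, the example becomes:

var prints []func()
for _, v := range []int{1, 2, 3} {
	v := v // give each closure its own copy of v
	prints = append(prints, func() { fmt.Println(v) })
}
for _, print := range prints {
	print()
}

This version prints 1, 2, 3.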

The same bug exists in this version of the program, with the same fix:

var prints []func()
for i := 1; i <= 3; i++ {
	prints = append(prints, func() { fmt.Println(i) })
}
for _, print := range prints {
	print()
}

Another common situation where this bug arises is in subtests using t.Parallel:

func TestAllEvenBuggy(t *testing.T) {
	testCases := []int{1, 2, 4, 6}
	for _, v := range testCases {
		t.Run("sub", func(t *testing.T) {
			t.Parallel()
			if v&1 != 0 {
				t.Fatal("odd v", v)
			}
		})
	}
}

This test passes, because all four subtests check that 6 (the final test case) is even.

Goroutines are also often involved in this kind of bug, although as these examples show, they need not be. See also the Go FAQ entry.

We have talked for a long time about redefining these semantics, to make loop variables per-iteration instead of per-loop. That is, the change would effectively be to add an implicit “x := x” at the start of every loop body for each iteration variable x, just like people do manually today. Making this change would remove the bugs from the programs above.

This proposal does exactly that. Using the go version lines in go.mod files, it only applies the new semantics to new programs, so that existing programs are guaranteed to continue to execute exactly as before.

Before writing this proposal, we collected feedback in a GitHub Discussion in October 2022, #56010. The vast majority of the feedback was positive, although a couple people did say they see no problem with the current semantics and discourage a change. Here are a few representative quotes from that discussion:

One thing to notice in this discussion is that even after having this problem explained multiple times by different people multiple developers were still having trouble understanding exactly what caused it and how to avoid it, and even the people that understood the problem in one context often failed to notice it would affect other types of for loops.

So we could argue that the current semantics make the learning curve for Go steeper.

PS: I have also had problems with this multiple times, once in production, thus, I am very in favor of this change even considering the breaking aspect of it.

@VinGarcia

This exactly matches my experience. It’s relatively easy to understand the first example (taking the same address each time), but somewhat trickier to understand in the closure/goroutine case. And even when you do understand it, one forgets (apparently even Russ forgets!). In addition, issues with this often don’t show up right away, and then when debugging an issue, I find it always takes a while to realize that it's “that old loop variable issue again”.

@benhoyt

Go’s unusual loop semantics are a consistent source of problems and bugs in the real world. I’ve been a professional go developer for roughly six years, and I still get bit by this bug, and it's a consistent stumbling block for newer Go programmers. I would strongly encourage this change.

@efronlicht

I really do not see this as a useful change. These changes always have the best intentions, but the reality is that the language works just fine now. This well intended change slowly creep in over time, until you wind up with the C++ language yet again. If someone can't understand a relatively simple design decision like this one, they are not going to understand how to properly use channels and other language features of Go.

Burying a change to the semantics of the language in go.mod is absolutely bonkers.

@hydrogen18

Overall, the discussion included 72 participants and 291 total comments and replies. As a rough measure of user sentiment, the discussion post received 671 thumbs up, 115 party, and 198 heart emoji reactions, and not a single thumbs down reaction.

Russ also presented the idea of making this change at GopherCon 2022, shortly after the discussion, and then again at Google Open Source Live's Go Day 2022. Feedback from both talks was entirely positive: not a single person suggested that we should not make this change.

Proposal

We propose to change for loop scoping in a future version of Go to be per-iteration instead of per-loop. For the purposes of this document, we are calling that version Go 1.30, but the change would land in whatever version of Go it is ready for. The earliest version of Go that could include the change would be Go 1.22.

This change includes four major parts: (1) the language specification, (2) module-based and file-based language version selection, (3) tooling to help users in the transition, and (4) updates to other parts of the Go ecosystem. The implementation of these parts spans the compiler, the go command, the go vet command, and other tools.

Language Specification

In https://go.dev/ref/spec#For_clause, the text currently reads:

The init statement may be a short variable declaration, but the post statement must not. Variables declared by the init statement are re-used in each iteration.

This would be replaced with:

The init statement may be a short variable declaration (:=), but the post statement must not. Each iteration has its own separate declared variable (or variables). The variable used by the first iteration is declared by the init statement. The variable used by each subsequent iteration is declared implicitly before executing the post statement and initialized to the value of the previous iteration's variable at that moment.

var prints []func()
for i := 0; i < 3; i++ {
    prints = append(prints, func() { println(i) })
}
for _, p := range prints {
    p()
}

// Output:
// 0
// 1
// 2

Prior to Go 1.30, iterations shared one set of variables instead of having their own separate variables.

(Remember that in this document, we are using Go 1.30 as the placeholder for the release that will ship the new semantics.)

For precision in this proposal, the spec example would compile to a form semantically equivalent to this Go program:

{
	i_outer := 0
	first := true
	for {
		i := i_outer
		if first {
			first = false
		} else {
			i++
		}
		if !(i < 3) {
			break
		}
		prints = append(prints, func() { println(i) })
		i_outer = i
	}
}

Of course, a compiler can write the code less awkwardly, since it need not limit the translation output to valid Go source code. In particular a compiler is likely to have the concept of the current memory location associated with i and be able to update it just before the post statement.

In https://go.dev/ref/spec#For_range, the text currently reads:

The iteration variables may be declared by the “range” clause using a form of short variable declaration (:=). In this case their types are set to the types of the respective iteration values and their scope is the block of the “for” statement; they are re-used in each iteration. If the iteration variables are declared outside the “for” statement, after execution their values will be those of the last iteration.

This would be replaced with:

The iteration variables may be declared by the “range” clause using a form of short variable declaration (:=). In this case their types are set to the types of the respective iteration values and their scope is the block of the “for” statement; each iteration has its own separate variables. If the iteration variables are declared outside the “for” statement, after execution their values will be those of the last iteration.

var prints []func()
for _, s := range []string{"a", "b", "c"} {
    prints = append(prints, func() { println(s) })
}
for _, p := range prints {
    p()
}

// Output:
// a
// b
// c

Prior to Go 1.30, iterations shared one set of variables instead of having their own separate variables.

For precision in this proposal, the spec example would compile to a form semantically equivalent to this Go program:

{
	var s_outer string
	for _, s_outer = range []string{"a", "b", "c"} {
		s := s_outer
		prints = append(prints, func() { println(s) })
	}
}

Note that in both 3-clause and range forms, this proposal is a complete no-op for loops with no := in the loop header and loops with no variable capture in the loop body. In particular, a loop like the following example, modifying the loop variable during the loop body, continues to execute as it always has:

for i := 0;; i++ {
	if i >= len(s) || s[i] == '"' {
		return s[:i]
	}
	if s[i] == '\\' { // skip escaped char, potentially a quote
		i++
	}
}

Language Version Selection

The change in language specification will fix far more programs than it breaks, but it may break a very small number of programs. To make the potential breakage completely user controlled, the rollout would decide whether to use the new semantics based on the go line in each package’s go.mod file. This is the same line already used for enabling language features; for example, to use generics in a package, the go.mod must say go 1.18 or later. As a special case, for this proposal, we would use the go line for changing semantics instead of for adding or removing a feature.

Modules that say go 1.30 or later would have for loops using per-iteration variables, while modules declaring earlier versions have for loops with per-loop variables.
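
For example, a module opting in to the new semantics would have a go.mod along these lines (the module path is illustrative):

module example.com/m

go 1.30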

This mechanism would allow the change to be deployed gradually in a given code base. Each module can update to the new semantics independently, avoiding a bifurcation of the ecosystem.

The forward compatibility work in #57001, which will land in Go 1.21, ensures that Go 1.21 and later will not attempt to compile code marked go 1.30. Even if this change lands in Go 1.22, the previous (and only other supported) Go release would be Go 1.21, which would understand not to compile go 1.22 code. So code opting in to the new loop semantics would never miscompile in older Go releases, because it would not compile at all. If the changes were to be slated for Go 1.22, it might make sense to issue a Go 1.20 point release making its go command understand not to compile go 1.22 code. Strictly speaking, that point release is unnecessary, because if Go 1.22 has been released, Go 1.20 is unsupported and we don't need to worry about its behavior. But in practice people do use older Go releases for longer than they are supported, and if they keep up with point releases we can help them avoid this potential problem.

The forward compatibility work also allows a per-file language version selection using //go:build directives. Specifically, if a file in a go 1.29 module says //go:build go1.30, it gets the Go 1.30 language semantics, and similarly if a file in a go 1.30 module says //go:build go1.29, it gets the Go 1.29 language semantics. This general rule would apply to loop semantics as well, so the files in a module could be converted one at a time in a per-file gradual code repair if necessary.
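
For example, during such a gradual migration, a single file in a go 1.30 module could keep the old semantics with a build constraint (the package name here is illustrative):

//go:build go1.29

// Loops in this file keep the Go 1.29 per-loop variables even though the
// surrounding module has moved to go 1.30.
package legacy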

Vendoring of other Go modules already records the Go version listed in each vendored module's go.mod file, to implement the general language version selection rule. That existing support would also ensure that old vendored modules keep their old loop semantics even in a newer overall module.

Transition Support Tooling

We expect that this change will fix far more programs than it breaks, but it will break some programs. The most common programs that break are buggy tests (see the “fixes buggy code” section below for details). Users who observe a difference in their programs need support to pinpoint the change. We plan to provide two kinds of support tooling, one static and one dynamic.

The static support tooling is a compiler flag that reports every loop that is compiling differently due to the new semantics. Our prototype implementation does a very good job of filtering out loops that are provably unaffected by the change in semantics, so in a typical program very few loops are reported. The new compiler flag, -d=loopvar=2, can be invoked by adding an option to the go build or go test command line: either -gcflags=-d=loopvar=2 for reports about the current package only, or -gcflags=all=-d=loopvar=2 for reports about all packages.

The dynamic support tooling is a new program called bisect that, with help from the compiler, runs a test repeatedly with different sets of loops opted in to the new semantics. By using a binary search-like algorithm, bisect can pinpoint the exact loop or loops that, when converted to the new semantics, cause a test failure. Once you have a test that fails with the new semantics but passes with the old semantics, you run:

bisect -compile=loopvar go test

We have used this dynamic tooling in a conversion of Google's internal monorepo to the new loop semantics. The rate of test failure caused by the change was about 1 in 8,000. Many of these tests took a long time to run and contained complex code that we were unfamiliar with. The bisect tool is especially important in this situation: it runs the search while you are at lunch or doing other work, and when you return it has printed the source file and line number of the loop that causes the test failure when compiled with the new semantics. At that point, it is trivial to rewrite the loop to pre-declare a per-loop variable and no longer use :=, preserving the old meaning even in the new semantics. We also found that code owners were far more likely to see the actual problem when we could point to the specific line of code. As noted in the “fixes buggy code” section below, all but one of the test failures turned out to be a buggy test.

Updates to the Go Ecosystem

Other parts of the Go ecosystem will need to be updated to understand the new loop semantics.

Vet and the golang.org/x/tools/go/analysis framework are being updated as part of #57001 to have access to the per-file language version information. Analyses like the vet loopclosure check will need to tailor their diagnostics based on the language version: in files using the new semantics, there won't be := loop variable problems anymore.

Other analyzers, like staticcheck and golangci-lint, may need updates as well. We will notify the authors of those tools and work with them to make sure they have the information they need.

Rationale and Compatibility

In most Go design documents, Rationale and Compatibility are two distinct sections. For this proposal, considerations of compatibility are so fundamental that it makes sense to address them as part of the rationale. To be completely clear: this is a breaking change to Go. However, the specifics of how we plan to roll out the change follow the spirit of the compatibility guidelines if not the “letter of the law.”

In the Go 2 transitions document we gave the general rule that language redefinitions like what we just described are not permitted, giving this very proposal as an example of something that violates the general rule. We still believe that that is the right general rule, but we have come to also believe that the for loop variable case is strong enough to motivate a one-time exception to that rule. Loop variables being per-loop instead of per-iteration is the only design decision we know of in Go that makes programs incorrect more often than it makes them correct. Since it is the only such design decision, we do not see any plausible candidates for additional exceptions.

The rest of this section presents the rationale and compatibility considerations.

A decade of experience shows the cost of the current semantics

Russ talked at Gophercon once about how we need agreement about the existence of a problem before we move on to solutions. When we examined this issue in the run up to Go 1, it did not seem like enough of a problem. The general consensus was that it was annoying but not worth changing.

Since then, we suspect every Go programmer in the world has made this mistake in one program or another. Russ certainly has done it repeatedly over the past decade, despite being the one who argued for the current semantics and then implemented them. (Apologies!)

The current cures for this problem are worse than the disease

We ran a program to process the git logs of the top 14k modules (drawn from about 12k git repos) and looked for commits with diff hunks that were entirely “x := x” lines being added. We found about 600 such commits. On close inspection, approximately half of the changes were unnecessary, done probably at the insistence of inaccurate static analysis, out of confusion about the semantics, or out of an abundance of caution. Perhaps the most striking was this pair of changes from different projects:

     for _, informer := range c.informerMap {
+        informer := informer
         go informer.Run(stopCh)
     }
     for _, a := range alarms {
+        a := a
         go a.Monitor(b)
     }

One of these two changes is unnecessary and the other is a real bug fix, but you can’t tell which is which without more context. (In one, the loop variable is an interface value, and copying it has no effect; in the other, the loop variable is a struct, and the method takes a pointer receiver, so copying it ensures that the receiver is a different pointer on each iteration.)
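
To make the difference concrete, here is a small self-contained sketch (the types and names are invented for illustration; it stores method values instead of starting goroutines so that the output is deterministic, but the receiver is evaluated up front in both forms):

package main

import "fmt"

type Informer interface{ Run() }

type logInformer struct{ name string }

func (l logInformer) Run() { fmt.Println("informer", l.name) }

type Alarm struct{ name string }

// Monitor has a pointer receiver, so the method value a.Monitor saves &a.
func (a *Alarm) Monitor() { fmt.Println("alarm", a.name) }

func main() {
	informers := []Informer{logInformer{"a"}, logInformer{"b"}}
	alarms := []Alarm{{"x"}, {"y"}}

	var calls []func()
	for _, informer := range informers {
		// The interface value is copied when the method value is created,
		// so adding "informer := informer" would change nothing here.
		calls = append(calls, informer.Run)
	}
	for _, a := range alarms {
		// &a is saved when the method value is created; with per-loop
		// variables every iteration saves the same pointer, so without
		// "a := a" both calls below print "alarm y".
		calls = append(calls, a.Monitor)
	}
	for _, c := range calls {
		c() // informer a, informer b, alarm y, alarm y (current semantics)
	}
}

With the proposed semantics, or with a := a added, the last two calls print alarm x and alarm y instead.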

And then there are changes like this one, which is unnecessary regardless of context (there is no opportunity for hidden address-taking):

     for _, scheme := range artifact.Schemes {
+        scheme := scheme
         Runtime.artifactByScheme[scheme.ID] = id
         Runtime.schemesByID[scheme.ID] = scheme
     }

This kind of confusion and ambiguity is the exact opposite of the readability we are aiming for in Go.

People are clearly having enough trouble with the current semantics that they choose overly conservative tools and add “x := x” lines by rote in situations not flagged by those tools, preferring that to debugging actual problems. This is an entirely rational choice, but it is also an indictment of the current semantics.

We’ve also seen production problems caused in part by these semantics, both inside Google and at other companies (for example, this problem at Let’s Encrypt). It seems likely that, world-wide, the current semantics have easily cost many millions of dollars in wasted developer time and production outages.

Old code is unaffected, compiling exactly as before

The go lines in go.mod give us a way to guarantee that all old code is unaffected, even in a build that also contains new code. Only when you change your go.mod line do the packages in that module get the new semantics, and you control that. In general this one reason is not sufficient, as laid out in the Go 2 transitions document. But it is a key property that contributes to the overall rationale, with all the other reasons added in.

Changing the semantics globally would disallow gradual code repair

As noted earlier, gradual code repair is an important technique for deploying any potentially breaking change: it allows focusing on one part of the code base at a time, instead of having to consider all of it together. The per-module go.mod go lines and the per-file //go:build directives enable gradual code repair.

Some people have suggested we simply make the change unconditionally when using Go 1.30, instead of allowing this fine-grained selection. Given the low impact we expect from the change, this “all at once” approach may be viable even for sizable code bases. However, it leaves no room for error and creates the possibility of a large problem that cannot be broken into smaller problems. A forced global change removes the safety net that the gradual approach provides. From an engineering and risk reduction point of view, that seems unwise. The safer, more gradual path is the better one.

Changing the semantics is usually a no-op, and when it’s not, it fixes buggy code far more often than it breaks correct code

As mentioned above, we have recently (as of May 2023) enabled the new loop semantics in Google’s internal Go toolchain. In order to do that, we ran all of our tests, found the specific loops that needed not to change behavior in order to pass (using bisect on each newly failing test), rewrote the specific loops not to use :=, and then changed the semantics globally. For Google’s internal code base, we did make a global change, even for open-source Go libraries written for older Go versions. One reason for the global change was pragmatic: there is of course no code marked as “Go 1.30” in the world now, so if not for the global change there would be no change at all. Another reason was that we wanted to find out how much total work it would require to change all code. The process was still gradual, in the sense that we tested the entirety of Google's Go code many times with a compiler flag enabling the change just for our own builds, and fixed all broken code, before we made the global change that affected all our users.

People who want to experiment with a global change in their code bases can build with GOEXPERIMENT=loopvar using the current development copy of Go. That experimental mode will also ship in the Go 1.21 release.

The vast majority of newly failing tests were table-driven tests using t.Parallel. The usual pattern is a test that reduces to something like TestAllEvenBuggy from the start of the document, repeated here:

func TestAllEvenBuggy(t *testing.T) {
	testCases := []int{1, 2, 4, 6}
	for _, v := range testCases {
		t.Run("sub", func(t *testing.T) {
			t.Parallel()
			if v&1 != 0 {
				t.Fatal("odd v", v)
			}
		})
	}
}

This test aims to check that all the test cases are even (they are not!), but it passes with current Go toolchains. The problem is that t.Parallel stops the closure and lets the loop continue, and then it runs all the closures in parallel when TestAllEvenBuggy returns. By the time the if statement in the closure executes, the loop is done, and v has its final iteration value, 6. All four subtests now continue executing in parallel, and they all check that 6 is even, instead of checking each of the test cases. There is no race in this code, because t.Parallel orders all the v&1 tests after the final update to v during the range loop, so the test passes even using go test -race. Of course, real-world examples are typically much more complex.

Another common form of this bug is preparing test case data by building slices of pointers. For example, this code, similar to an example at the start of the document, builds a []*int32 for use as a repeated int32 in a protocol buffer:

func GenerateTestIDs() {
	var ids []*int32
	for i := int32(0); i < 10; i++ {
		ids = append(ids, &i)
	}
}

This loop aims to create a slice of ten different pointers to the values 0 through 9, but instead it creates a slice of ten copies of the same pointer, all pointing at the value 10.

For any of these loops, there are two useful rewrites. The first is to remove the use of :=. For example:

func TestAllEvenBuggy(t *testing.T) {
	testCases := []int{1, 2, 4, 6}
	var v int // TODO: Likely loop scoping bug!
	for _, v = range testCases {
		t.Run("sub", func(t *testing.T) {
			t.Parallel()
			if v&1 != 0 {
				t.Fatal("odd v", v)
			}
		})
	}
}

or

func GenerateTestIDs() {
	var ids []*int32
	var i int32 // TODO: Likely loop scoping bug!
	for i = int32(0); i < 10; i++ {
		ids = append(ids, &i)
	}
}

This kind of rewrite keeps tests passing even if compiled using the proposed loop semantics. Of course, most of the time the tests are passing incorrectly; this just preserves the status quo.

The other useful rewrite is to add an explicit x := x assignment, as discussed in the Go FAQ. For example:

func TestAllEvenBuggy(t *testing.T) {
	testCases := []int{1, 2, 4, 6}
	for _, v := range testCases {
		v := v // TODO: This makes the test fail. Why?
		t.Run("sub", func(t *testing.T) {
			t.Parallel()
			if v&1 != 0 {
				t.Fatal("odd v", v)
			}
		})
	}
}

or

func GenerateTestIDs() {
	var ids []*int32
	for i := int32(0); i < 10; i++ {
		i := i // TODO: This makes the test fail. Why?
		ids = append(ids, &i)
	}
}

This kind of rewrite makes the test break using the current loop semantics, and it will stay broken if compiled with the proposed loop semantics. This rewrite is most useful for sending to the owners of the code for further debugging.

Out of all the failing tests, only one affected loop was not in test code. That code looked like:

var original *mapping
for _, saved := range savedMappings {
	if saved.start <= m.start && m.end <= saved.end {
		original = &saved
		break
	}
}
...

Unfortunately, this code was in a very low-level support program that is invoked when a program is crashing, and a test checks that the code contains no allocations or even runtime write barriers. In the old loop semantics, both original and saved were function-scoped variables, so the assignment original = &saved does not cause saved to escape to the heap. In the new loop semantics, saved is per-iteration, so original = &saved makes it escape the iteration and therefore require heap allocation. The test failed because the code is disallowed from allocating, yet it was now allocating. The fix was to do the first kind of rewrite, declaring saved before the loop and moving it back to function scope.
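
A sketch of that rewrite, with a hypothetical mapping type and wrapper function standing in for the real code:

type mapping struct {
	start, end uintptr
}

// Declaring saved once, before the loop, restores the old behavior: original
// points at a single function-scoped variable rather than at a per-iteration
// variable that would have to be heap allocated.
func matchesSavedMapping(savedMappings []mapping, m mapping) bool {
	var original *mapping
	var saved mapping
	for _, saved = range savedMappings {
		if saved.start <= m.start && m.end <= saved.end {
			original = &saved
			break
		}
	}
	return original != nil
}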

Similar code might change from allocating one variable per loop to allocating N variables per loop. In some cases, that extra allocation is inherent to fixing a latent bug. For example, GenerateTestIDs above is now allocating 10 int32s instead of one – the price of correctness. In a very frequently executed already-correct loop, the new allocations may be unnecessary and could potentially cause more garbage collector pressure and a measurable performance difference. If so, standard monitoring and allocation profiles (pprof --alloc_objects) should pinpoint the location easily, and the fix is trivial: declare the variable above the loop. Benchmarking of the public “bent” benchmark suite showed no statistically significant performance difference overall, so we expect most programs to be unaffected.

Not all the failing tests used code as obvious as the examples above. One failing test that didn't use t.Parallel reduced to:

var once sync.Once
for _, tt := range testCases {
    once.Do(func() {
        http.HandleFunc("/handler", func(w http.ResponseWriter, r *http.Request) {
            w.Write(tt.Body)
        })
    })

    result := get("/handler")
    if result != string(tt.Body) {
        ...
    }
}

This strange loop registers an HTTP handler on the first iteration and then makes a request served by the handler on every iteration. For the handler to serve the expected data, the tt captured by the handler closure must be the same as the tt for the current iteration. With a per-loop tt, that’s true. With a per-iteration tt, it’s not: the handler keeps using the first iteration's tt even in later iterations, causing the failure.

As difficult as that example may be to puzzle through, it is a simplified version of the original. The bisect tool pinpointing the exact loop was a huge help in finding the problem.

Our experience supports the claim that the new semantics fixes buggy code far more often than it breaks correct code. The new semantics only caused test failures in about 1 of every 8,000 test packages, but running the updated Go 1.20 loopclosure vet check over our entire code base flagged tests at a much higher rate: 1 in 400 (20 in 8,000). The loopclosure checker has no false positives: all the reports are buggy uses of t.Parallel in our source tree. That is, about 5% of the flagged tests were like TestAllEvenBuggy; the other 95% were like TestAllEven, a variant of TestAllEvenBuggy in which all the test cases really are even: not (yet) testing what it intends, but a correct test of correct code even with the loop variable bug fixed.

Of course, there is always the possibility that Google’s tests may not be representative of the overall ecosystem’s tests in various ways, and perhaps this is one of them. But there is no indication from this analysis of any common idiom at all where per-loop semantics are required. Also, Google's tests include tests of open source Go libraries that we use, and there were only two failures, both reported upstream. Finally, the git log analysis points in the same direction: parts of the ecosystem are adopting tools with very high false positive rates and doing what the tools say, with no apparent problems.

To be clear, it is possible to write artificial examples of code that is “correct” with per-loop variables and “incorrect” with per-iteration variables, but these are contrived.

One example, with credit to Tim King, is a convoluted way to sum the numbers in a slice:

func sum(list []int) int {
	m := make(map[*int]int)
	for _, x := range list {
		m[&x] += x
	}
	for _, sum := range m {
		return sum
	}
	return 0
}

In the current semantics there is only one &x and therefore only one map entry. With the new semantics there are many &x and many map entries, so the map does not accumulate a sum.

Another example, with credit to Matthew Dempsky, is a non-racy loop:

for i, x := 0, 0; i < 1; i++ {
	go func() { x++ }()
}

This loop only executes one iteration (starts just one goroutine), and x is not read or written in the loop condition or post-statement. Therefore the one created goroutine is the only code reading or writing x, making the program race-free. The rewritten semantics would have to make a new copy of x for the next iteration when it runs i++, and that copying of x would race with the x++ in the goroutine, making the rewritten program have a race. This example shows that it is possible for the new semantics to introduce a race where there was no race before. (Of course, there would be a race in the old semantics if the loop iterated more than once.)
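
Following the translation pattern from the specification section above (with i_outer and x_outer as explanatory names, as before), the loop would be roughly equivalent to:

{
	i_outer, x_outer := 0, 0
	first := true
	for {
		i, x := i_outer, x_outer
		if first {
			first = false
		} else {
			i++
		}
		if !(i < 1) {
			break
		}
		go func() { x++ }()
		i_outer, x_outer = i, x // this read of x races with the goroutine's x++
	}
}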

These examples show that it is technically possible for the per-iteration semantics to change the correctness of existing code, even if the examples are contrived. This is more evidence for the gradual code repair approach.

Changing 3-clause loops keeps all for loops consistent and fixes real-world bugs

Some people suggested only applying this change to range loops, not three-clause for loops like i := 0; i < n; i++.

Adjusting the 3-clause form may seem strange to C programmers, but the same capture problems that happen in range loops also happen in three-clause for loops. Changing both forms eliminates that bug from the entire language, not just one place, and it keeps the loops consistent in their variable semantics. That consistency means that if you change a loop from using range to using a 3-clause form or vice versa, you only have to think about whether the iteration visits the same items, not whether a subtle change in variable semantics will break your code. It is also worth noting that JavaScript is using per-iteration semantics for 3-clause for loops using let, with no problems.

In Google's own code base, at least a few of the newly failing tests were due to buggy 3-clause loops, like in the GenerateTestIDs example. These 3-clause bugs happen less often, but they still happen at a high enough rate to be worth fixing. The consistency arguments only add to the case.

Good tooling can help users identify exactly the loops that need the most scrutiny during the transition

As noted in the transition discussion, our experience analyzing the failures in Google’s Go tests shows that we can use compiler instrumentation to identify loops that may be compiling differently, because the compiler thinks the loop variables escape. Almost all the time, this identifies a very small number of loops, and one of those loops is right next to the failure. The automated bisect tool removes even that small amount of manual effort.

Static analysis is not a viable alternative

Whether a particular loop is “buggy” due to the current behavior depends on whether the address of an iteration value is taken and then that pointer is used after the next iteration begins. It is impossible in general for analyzers to see where the pointer lands and what will happen to it. In particular, analyzers cannot see clearly through interface method calls or indirect function calls. Different tools have made different approximations. Vet recognizes a few definitely bad patterns in its loopclosure checker, and we added a new pattern checking for mistakes using t.Parallel in Go 1.20. To avoid false positives, loopclosure also has many false negatives. Other checkers in the ecosystem err in the other direction. The commit log analysis showed some checkers were producing over 90% false positive rates in real code bases. (That is, when the checker was added to the code base, the “corrections” submitted at the same time were not fixing actual problems over 90% of the time in some commits.)

There is no perfect way to catch these bugs statically. Changing the semantics, on the other hand, eliminates all the bugs.

Mechanical rewriting to preserve old semantics is possible but mostly unnecessary churn

People have suggested writing a tool that rewrites every for loop flagged as changing by the compiler, preserving the old semantics by removing the use of :=. Then a person could revert the loop changes one at a time after careful code examination. A variant of this tool might simply add a comment to each loop along with a //go:build go1.29 directive at the top of the file, leaving less for the person to undo. This kind of tool is definitely possible to write, but our experience with real Go code suggests that it would cause far more churn than is necessary, since 95% of definitely buggy loops simply became correct loops with the new semantics. The approach also assumes that careful code examination will identify all buggy code, which in our experience is an overly optimistic assumption. Even after bisected test failures proved that specific loops were definitely buggy, identifying the exact bug was quite challenging. And with the rewriting tool, you don't even know for sure that the loop is buggy, just that the compiler would treat it differently.

All in all, we believe that the compiler’s report of changed loop positions, combined with the bisect tool to track down the source of recognized misbehaviors, is sufficient on the static analysis side. Of course, users who want a rewriting tool can easily use the compiler’s report to write one, especially if the rewrite only adds comments and //go:build directives.

Changing loop syntax entirely would cause unnecessary churn

We have talked in the past about introducing a different syntax for loops (for example, #24282), and then giving the new syntax the new semantics while deprecating the current syntax. Ultimately this would cause a very significant amount of churn disproportionate to the benefit: the vast majority of existing loops are correct and do not need any fixes. In Google's Go source tree, the rate of buggy loops was about 1 per 20,000. It would be a truly extreme response to force an edit of every for loop that exists today, invalidate all existing documentation, and then have two different for loops that Go programmers need to understand for the rest of time, all to fix 1 bad loop out of 20,000. Changing the semantics to match what people overwhelmingly already expect provides the same value at far less cost. It also focuses effort on newly written code, which tends to be buggier than old code (because the old code has been at least partially debugged already).

Disallowing loop variable capture would cause unnecessary churn

Some people have suggested disallowing loop variable captures entirely, which would certainly make it impossible to write a buggy loop. Unfortunately, that would also invalidate essentially every Go program in existence, the vast majority of which are correct. It would also make loop variables less capable than ordinary variables, which would be strange. Even if this were just a temporary state, with loop captures allowed again after the semantic change, that's a huge amount of churn to catch the 0.005% of for loops that are buggy.

Experience from C# supports the change

Early versions of C# had per-loop variable scoping for their equivalent of range loops. C# 5 changed the semantics to be per-iteration, as in this proposal. (C# 5 did not change the 3-clause for loop form, in contrast to this proposal.)

In a comment on the GitHub discussion, @jaredpar reported on experience with C#. Quoting that comment in full:

I work on the C# team and can offer perspective here.

The C# 5 rollout unconditionally changed the foreach loop variable to be per iteration. At the time of C# 5 there was no equivalent to Go's putting go 1.30 in a go.mod file so the only choice was break unconditionally or live with the behavior. The loop variable lifetime became a bit of a sore point pretty much the moment the language introduced lambda expressions for all the reasons you describe. As the language grew to leverage lambdas more through features like LINQ, async, Task Parallel Libraries, etc ... the problem got worse. It got so prevalent that the C# team decided the unconditional break was justified. It would be much easier to explain the change to the, hopefully, small number of customers impacted vs. continuing to explain the tricksy behavior to new customers.

This change was not taken lightly. It had been discussed internally for several years, blogs were written about it, lots of analysis of customer code, upper management buy off, etc ... In end though the change was rather anticlimactic. Yes it did break a small number of customers but it was smaller than expected. For the customers impacted they responded positively to our justifications and accepted the proposed code fixes to move forward.

I‘m one of the main people who does customer feedback triage as well as someone who helps customers migrating to newer versions of the compiler that stumble onto unexpected behavior changes. That gives me a good sense of what pain points exist for tooling migration. This was a small blip when it was first introduced but quickly faded. Even as recently as a few years ago I was helping large code bases upgrade from C# 4. While they do hit other breaking changes we’ve had, they rarely hit this one. I'm honestly struggling to remember the last time I worked with a customer hitting this.

It's been ~10 years since this change was taken to the language and a lot has changed in that time. Projects have a property <LangVersion> that serves much the same purpose as the Go version in the go.mod file. These days when we introduce a significant breaking change we tie the behavior to <LangVersion> when possible. That helps separates the concept for customers of:

  1. Acquiring a new toolset. This comes when you upgrade Visual Studio or the .NET SDK. We want these to be friction free actions so customers get latest bug / security fixes. This never changes <LangVersion> so breaks don't happen here.

  2. Moving to a new language version. This is an explicit action the customer takes to move to a new .NET / C# version. It is understood this has some cost associated with it to account for breaking changes.

This separation has been very successful for us and allowed us to make changes that would not have been possible in the past. If we were doing this change today we'd almost certainly tie the break to a <LangVersion> update

Implementation

The semantic changes are implemented in the compiler today on an opt-in basis and are the source of the experimental data reported above. The transition support tooling also exists today.

Anyone is welcome to try the change in their own trees to help inform their understanding of the impact of the change. Specifically:

go install golang.org/dl/gotip@latest
gotip download
GOEXPERIMENT=loopvar gotip test etc

will compile all Go code with the new per-iteration loop variables (that is, a global change, ignoring go.mod settings). To add compiler diagnostics about loops that are compiling differently:

GOEXPERIMENT=loopvar gotip build -gcflags=all=-d=loopvar=2

Omit the all= to limit the diagnostics to the current package. To debug a test failure that only happens with per-iteration loop variables enabled, use:

go install golang.org/x/tools/cmd/bisect@latest
bisect -compile=loopvar gotip test the/failing/test

Other implementation work yet to be done includes documentation, updating vet checks like loopclosure, and coordinating with the authors of tools like staticcheck and golangci-lint. We should also update go fix to remove redundant x := x lines from source files that have opted in to the new semantics.

Google testing

As noted above, we have already enabled the new loop semantics for all code built by Google's internal Go toolchain, with only a small number of affected tests. We will update this section to summarize any production problems encountered.

As of May 9, it has been almost a week since the change was enabled, and there have been no production problems, nor any bug reports of any kind.

Timeline and Rollout

The general rule for proposals is to avoid speculating about specific releases that will include a change. The proposal process does not rush to meet arbitrary deadlines: we will take the time necessary to make the right decision and, if accepted, to land the right changes and support. That general rule is why this proposal has been referring to Go 1.30, as a placeholder for the release that includes the new loop semantics.

That said, the response to the preliminary discussion of this idea was enthusiastically positive, and we have no reason to expect a different reaction to this formal proposal. Assuming that is the case, it could be possible to ship the change in Go 1.22. Since the GOEXPERIMENT support will ship in Go 1.21, once the proposal is accepted and Go 1.21 is released, it would make sense to publish a short web page explaining the change and encouraging users to try it in their programs, like the instructions above. If the proposal is accepted before Go 1.21 is released, that page could be published with the release, including a link to the page in the Go 1.21 release notes. Whenever the instructions are published, it would also make sense to publish a blog post highlighting the upcoming change. It will in general be good to advertise the change in as many ways as possible.

Open issues (if applicable)

Bazel language version

Bazel's Go support (rules_go) does not support setting the language version on a per-package basis. It would need to be updated to do that, with gazelle maintaining that information in generated BUILD files (derived from the go.mod files).

Performance on other benchmark suites

As noted above, there is a potential for new allocations in programs with the new semantics, which may cause changes in performance. Although we observed no performance changes in the “bent” benchmark suite, it would be good to hear reports from others with their own benchmark suites.