
 # Proposal: Go 2 transition

 Author: Ian Lance Taylor

 Last update: October 15, 2018

 ## Abstract

 A proposal for how to make incompatible changes from Go 1 to Go 2
 while breaking as little as possible.

 ## Background

 Currently the Go language and standard libraries are covered by the
 [Go 1 compatibility guarantee](https://golang.org/doc/go1compat).
 The goal of that document was to promise that new releases of Go would
 not break existing working programs.

 Among the goals for the Go 2 process is to consider changes to the
 language and standard libraries that will break the guarantee.
 Since Go is used in a distributed open source environment, we cannot
 rely on a [flag
 day](http://www.catb.org/jargon/html/F/flag-day.html).
 We must permit the interoperation of different packages written using
 different versions of Go.

 Every language goes through version transitions.
 As background, here are some notes on what other languages have done.
 Feel free to skip the rest of this section.

 ### C

 C language versions are driven by the ISO standardization process.
 C language development has paid close attention to backward
 compatibility.
 After the first ISO standard, C90, every subsequent standard has
 maintained strict backward compatibility.
 Where new keywords have been introduced, they are introduced in a
 namespace reserved by C90 (an underscore followed by an uppercase
 ASCII letter) and are made more accessible via a `#define` macro in a
 header file that did not previously exist (examples are `_Complex`,
 defined as `complex` in `<complex.h>`, and `_Bool`, defined as `bool`
 in `<stdbool.h>`).
 None of the basic language semantics defined in C90 have changed.

 In addition, most C compilers provide options to define precisely
 which version of the C standard the code should be compiled for (for
 example, `-std=c90`).
 Most standard library implementations support feature macros that may
 be `#define`’d before including the header files to specify exactly
 which version of the library should be provided (for example,
 `_ISOC99_SOURCE`).
 While these features have had bugs, they are fairly reliable and are
 widely used.

 A key feature of these options is that code compiled at different
 language/library versions can in general all be linked together and
 work as expected.

 The first standard, C90, did introduce breaking changes to the
 previous C language implementations, known informally as K&R C.
 New keywords were introduced, such as `const` and `volatile`.
 The precise implementation of integer promotion in integer expressions
 changed from unsigned-preserving to value-preserving.
 Fortunately it was easy to detect code using the new keywords due to
 compilation errors, and easy to adjust that code.
 The change in integer promotion actually made it less surprising to
 naive users, and experienced users mostly used explicit casts to
 ensure portability among systems with different integer sizes, so
 while there was no automatic detection of problems not much code broke
 in practice.

 There were also some irritating changes.
 C90 introduced trigraphs, which changed the behavior of some string
 constants.
 Compilers adapted with options like `-no-trigraphs` and `-Wtrigraphs`.

 More seriously, C90 introduced the notion of undefined behavior, and
 declared that programs that invoked undefined behavior might take
 any action.
 In K&R C, the cases that C90 described as undefined behavior were
 mostly treated as what C90 called implementation-defined behavior: the
 program would take some non-portable but predictable action.
 Compiler writers absorbed the notion of undefined behavior, and
 started writing optimizations that assumed that the behavior would not
 occur.
 This caused effects that surprised people not fluent in the C
 standard.
 I won’t go into the details here, but one example of this (from my
 blog) is [signed overflow](http://www.airs.com/blog/archives/120).

 C of course continues to be the preferred language for kernel
 development and the glue language of the computing industry.
 Though it has been partially replaced by newer languages, this is not
 because of any choices made by new versions of C.

 The lessons I see here are:

 * Backward compatibility matters.
 * Breaking compatibility in small ways is OK, as long as people can
   spot the breakages through compiler options or compiler errors.
 * Compiler options to select specific language/library versions are
   useful, provided code compiled using different options can be linked
   together.
 * Unlimited undefined behavior is confusing for users.

 ### C++

 C++ language versions are also now driven by the ISO standardization process.
 Like C, C++ pays close attention to backward compatibility.
 C++ has been historically more free with adding new keywords (there
 are 10 new keywords in C++11).
 This works out OK because the newer keywords tend to be relatively
 long (`constexpr`, `nullptr`, `static_assert`) and compilation errors
 make it easy to find code using the new keywords as identifiers.

 C++ uses the same sorts of options for specifying the standard version
 for language and libraries as are found in C.
 It suffers from the same sorts of problems as C with regard to
 undefined behavior.

 An example of a breaking change in C++ was the change in the scope of
 a variable declared in the initialization statement of a for loop.
 In the pre-standard versions of C++, the scope of the variable
 extended to the end of the enclosing block, as though it were declared
 immediately before the for loop.
 During the development of the first C++ standard, C++98, this was
 changed so that the scope was only within the for loop itself.
 Compilers adapted by introducing options like `-ffor-scope` so that
 users could control the expected scope of the variable (for a period
 of time, when compiling with neither `-ffor-scope` nor
 `-fno-for-scope`, the GCC compiler used the old scope but warned about
 any code that relied on it).

 Despite the relatively strong backward compatibility, code written in
 new versions of C++, like C++11, tends to have a very different feel
 than code written in older versions of C++.
 This is because styles have changed to use new language and library
 features.
 Raw pointers are less commonly used, range loops are used rather than
 standard iterator patterns, new concepts like rvalue references and
 move semantics are used widely, and so forth.
 People familiar with older versions of C++ can struggle to understand
 code written in new versions.

 C++ is of course an enormously popular language, and the ongoing
 language revision process has not harmed its popularity.

 Besides the lessons from C, I would add:

 * A new version may have a very different feel while remaining
   backward compatible.

 ### Java

 I know less about Java than about the other languages I discuss, so
 there may be more errors here and there are certainly more biases.

 Java is largely backward compatible at the byte-code level, meaning
 that Java version N+1 libraries can call code written in, and
 compiled by, Java version N (and N-1, N-2, and so forth).
 Java source code is also mostly backward compatible, although new
 keywords are added from time to time.

 The Java documentation is very detailed about potential compatibility
 issues when moving from one release to another.

 The Java standard library is enormous, and new packages are added at
 each new release.
 Packages are also deprecated from time to time.
 Using a deprecated package will cause a warning at compile time (the
 warning may be turned off), and after a few releases the deprecated
 package will be removed (at least in theory).

 Java does not seem to have many backward compatibility problems.
 The problems are centered on the JVM: an older JVM generally will not
 run newer releases, so you have to make sure that your JVM is at least
 as new as that required by the newest library you want to use.

 Java arguably has something of a forward compatibility problem in
 that JVM bytecodes present a higher level interface than that of a
 CPU, and that makes it harder to introduce new features that cannot
 be directly represented using the existing bytecodes.

 This forward compatibility problem is part of the reason that Java
 generics use type erasure.
 Changing the definition of existing bytecodes would have broken
 existing programs that had already been compiled into bytecode.
 Extending bytecodes to support generic types would have required a
 large number of additional bytecodes to be defined.

 This forward compatibility problem, to the extent that it is a
 problem, does not exist for Go.
 Since Go compiles to machine code, and implements all required run
 time checks by generating additional machine code, there is no similar
 forward compatibility issue.

 But, in general:

 * Be aware of how compatibility issues may restrict future changes.

 ### Python

 Python 3.0 (also known as Python 3000) started development in 2006 and
 was initially released in 2008.
 In 2018 the transition is still incomplete.
 Some people continue to use Python 2.7 (released in 2010).
 This is not a path we want to emulate for Go 2.

 The main reason for this slow transition appears to be lack of
 backward compatibility.
 Python 3.0 was intentionally incompatible with earlier versions of
 Python.
 Notably, `print` was changed from a statement to a function, and
 strings were changed to use Unicode.
 Python is often used in conjunction with C code, and the latter change
 meant that any code that passed strings from Python to C required
 tweaking the C code.

 Because Python is an interpreted language, and because there is no
 backward compatibility, it is impossible to mix Python 2 and Python
 3 code in the same program.
 This means that for a typical program that uses a range of libraries,
 each of those libraries must be converted to Python 3 before the
 program can be converted.
 Since programs are in various states of conversion, libraries must
 support Python 2 and 3 simultaneously.

 Python supports statements of the form `from __future__ import
 FEATURE`.
 A statement like this changes the interpretation of the rest of the
 file in some way.
 For example, `from __future__ import print_function` changes `print`
 from a statement (as in Python 2) to a function (as in Python 3).
 This can be used to take incremental steps toward new language
 versions, and to make it easier to share the same code among different
 language versions.

 So, we knew it already, but:

 * Backward compatibility is essential.
 * Compatibility of the interface to other languages is important.
 * Upgrading to a new version is limited by the version that your
   libraries support.

 ### Perl

 The Perl 6 development process began in 2000.
 The first stable version of the Perl 6 spec was announced in 2015.
 This is not a path we want to emulate for Go 2.

 There are many reasons for this slow path.
 Perl 6 was intentionally not backward compatible: it was meant to fix
 warts in the language.
 Perl 6 was intended to be represented by a spec rather than, as with
 previous versions of Perl, an implementation.
 Perl 6 started with a set of change proposals, but then continued to
 evolve over time, and then evolve some more.

 Perl supports `use feature` which is similar to Python's `from
 __future__ import`.
 It changes the interpretation of the rest of the file to use a
 specified new language feature.

 * Don’t be Perl 6.
 * Set and meet deadlines.
 * Don’t change everything at once.

 ## Proposal

 ### Language changes

 Pedantically speaking, we must have a way to speak about specific
 language versions.
 Each change to the Go language first appears in a Go release.
 We will use Go release numbers to define language versions.
 That is the only reasonable choice, but it can be confusing because
 standard library changes are also associated with Go release numbers.
 When thinking about compatibility, it will be necessary to
 conceptually separate the Go language version from the standard
 library version.

 As an example of a specific change, type aliases were first available
 in Go language version 1.9.
 Type aliases were an example of a backward compatible language change.
 All code written in Go language versions 1.0 through 1.8 continued to
 work the same way with Go language 1.9.
 Code using type aliases requires Go language 1.9 or later.

 #### Language additions

 Type aliases are an example of an addition to the language.
 Code using the type alias syntax `type A = B` did not compile with Go
 versions before 1.9.

 Type aliases, and other backward compatible changes since Go 1.0, show
 us that for additions to the language it is not necessary for packages
 to explicitly declare the minimum language version that they require.
 Some packages changed to use type aliases.
 When such a package was compiled with Go 1.8 tools, the package failed
 to compile.
 The package author can simply say: upgrade to Go 1.9, or downgrade to
 an earlier version of the package.
 None of the Go tools need to know about this requirement; it's implied
 by the failure to compile with older versions of the tools.

 It's true of course that programmers need to understand language
 additions, but the tooling does not.
 Neither the Go 1.8 tools nor the Go 1.9 tools need to explicitly know
 that type aliases were added in Go 1.9, other than in the limited
 sense that the Go 1.9 compiler will compile type aliases and the Go
 1.8 compiler will not.
 That said, the possibility of specifying a minimum language version to
 get better error messages for unsupported language features is
 discussed below.

 #### Language removals

 We must also consider language changes that simply remove features
 from the language.
 For example, [issue 3939](http://golang.org/issue/3939) proposes that
 we remove the conversion `string(i)` for an integer value `i`.
 If we make this change in, say, Go version 1.20, then packages that
 use this syntax will stop compiling in Go 1.20.
 (If you prefer to restrict backward incompatible changes to new major
 versions, then replace 1.20 by 2.0 in this discussion; the problem
 remains the same.)
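To make the conversion concrete, here is a minimal sketch of the two things a programmer might mean by it. The helper names `asCodePoint` and `asDecimal` are assumptions for this example. For an integer in the valid code point range, `string(i)` produces the same result as `string(rune(i))`, which is the spelling used here.

```go
package main

import (
	"fmt"
	"strconv"
)

// asCodePoint spells out what string(i) means for an integer i in the valid
// code point range: the UTF-8 encoding of that code point.
func asCodePoint(i int) string { return string(rune(i)) }

// asDecimal is what callers often intended instead: the decimal digits.
func asDecimal(i int) string { return strconv.Itoa(i) }

func main() {
	fmt.Println(asCodePoint(65)) // prints "A"
	fmt.Println(asDecimal(65))   // prints "65"
}
```

Code that relies on the first meaning can be rewritten mechanically to the explicit `rune` form, which is the kind of conversion tooling mentioned below.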

 In this case, packages using the old syntax have no simple recourse.
 While we can provide tooling to convert pre-1.20 code into working
 1.20 code, we can't force package authors to run those tools.
 Some packages may be unmaintained but still useful.
 Some organizations may want to upgrade to 1.20 without having to
 requalify the versions of packages that they rely on.
 Some package authors may want to use 1.20 even though their packages
 now break, but do not have time to modify their package.

 These scenarios suggest that we need a mechanism to specify the
 maximum version of the Go language with which a package can be built.

 Importantly, specifying the maximum version of the Go language should
 not be taken to imply the maximum version of the Go tools.
 The Go compiler released with Go version 1.20 must be able to build
 packages using Go language 1.19.
 This can be done by adding an option to cmd/compile (and, if
 necessary, cmd/asm and cmd/link) along the lines of the `-std` option
 supported by C compilers.
 When cmd/compile sees the option, perhaps `-lang=go1.19`, it will
 compile the code using the Go 1.19 syntax.

 This requires cmd/compile to support all previous versions, one way or
 another.
 If supporting old syntaxes proves to be troublesome, the `-lang`
 option could perhaps be implemented by passing the code through a
 convertor from the old version to the current.
 That would keep support of old versions out of cmd/compile proper, and
 the convertor could be useful for people who want to update their
 code.
 But it is unlikely that supporting old language versions will be a
 significant problem.

 Naturally, even though the package is built with the language version
 1.19 syntax, it must in other respects be a 1.20 package: it must link
 with 1.20 code, be able to call and be called by 1.20 code, and so
 forth.

 The go tool will need to know the maximum language version so that it
 knows how to invoke cmd/compile.
 Assuming we continue with the modules experiment, the logical place
 for this information is the go.mod file.
 The go.mod file for a module M can specify the maximum language
 version for the packages that it defines.
 This would be honored when M is downloaded as a dependency by some
 other module.

 The maximum language version is not a minimum language version.
 If a module requires features in language 1.19, but can be built with
 1.20, we can say that the maximum language version is 1.20.
 If we build with Go release 1.19, we will see that we are at less than
 the maximum, and simply build with language version 1.19.
 Maximum language versions greater than that supported by the current
 tools can simply be ignored.
 If we later build with Go release 1.21, we will build the module with
 `-lang=go1.20`.
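The rule in this paragraph amounts to taking the smaller of the module's maximum language version and the release doing the build. A minimal sketch, assuming versions are represented by their minor number and that the go tool passes the result as `-lang`; the function `langFlag` is hypothetical and not part of any existing tool:

```go
package main

import "fmt"

// langFlag returns the -lang value the go tool might pass to cmd/compile.
// moduleMax is the maximum language minor version recorded for the module;
// toolMinor is the minor version of the Go release running the build.
func langFlag(moduleMax, toolMinor int) string {
	lang := moduleMax
	if toolMinor < lang {
		// A maximum newer than the current tools is simply ignored.
		lang = toolMinor
	}
	return fmt.Sprintf("-lang=go1.%d", lang)
}

func main() {
	fmt.Println(langFlag(20, 19)) // prints "-lang=go1.19"
	fmt.Println(langFlag(20, 20)) // prints "-lang=go1.20"
	fmt.Println(langFlag(20, 21)) // prints "-lang=go1.20"
}
```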

 This means that the tools can set the maximum language version
 automatically.
 When we use Go release 1.30 to release a module, we can mark the
 module as having maximum language version 1.30.
 All users of the module will see this maximum version and do the right
 thing.

 This implies that we will have to support old versions of the language
 indefinitely.
 If we remove a language feature after version 1.25, version 1.26 and
 all later versions will still have to support that feature if invoked
 with the `-lang=go1.25` option (or `-lang=go1.24` or any other earlier
 version in which the feature is supported).
 Of course, if no `-lang` option is used, or if the option is
 `-lang=go1.26` or later, the feature will not be available.
 Since we do not expect wholesale removals of existing language
 features, this should be a manageable burden.
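The availability rule for a removed feature can be sketched as a single check inside the compiler. This is an illustration only: the function `featureAvailable`, the constant `lastSupported`, and the convention that `0` means "no `-lang` option" are all assumptions of this example.

```go
package main

import "fmt"

// lastSupported is the final language minor version that includes the
// hypothetical feature, which is removed after 1.25 as in the text above.
const lastSupported = 25

// featureAvailable reports whether the removed feature may be used by the
// compiler from release 1.<toolMinor>. lang is the minor version given with
// -lang, or 0 when no -lang option is passed.
func featureAvailable(toolMinor, lang int) bool {
	if lang == 0 || lang > toolMinor {
		lang = toolMinor // default to the language of the release itself
	}
	return lang <= lastSupported
}

func main() {
	fmt.Println(featureAvailable(26, 25)) // prints "true"
	fmt.Println(featureAvailable(26, 24)) // prints "true"
	fmt.Println(featureAvailable(26, 26)) // prints "false"
	fmt.Println(featureAvailable(26, 0))  // prints "false"
}
```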

 I believe that this approach suffices for language removals.

 #### Minimum language version

 For better error messages it may be useful to permit the module file
 to specify a minimum language version.
 This is not required: if a module uses features introduced in
 language version 1.N, then building it with 1.N-1 will fail at compile
 time.
 This may be confusing, but in practice it will likely be obvious what
 the problem is.

 That said, if modules can specify a minimum language version, the go
 tool could produce an immediate, clear error message when building
 with 1.N-1.

 The minimum language version could potentially be set by the compiler
 or some other tool.
 When compiling each file, see which features it uses, and use that to
 determine the minimum version.
 It need not be precisely accurate.
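One way such a tool could work is to keep a table of features and the version that introduced each one, then take the largest version among the features a file uses. The sketch below assumes the feature detection has already happened and only shows the final step. The feature labels and the function `minLanguageVersion` are invented for this example; the versions for type aliases (1.9) and three-index slices (1.2) are the real ones.

```go
package main

import "fmt"

// introducedIn maps a feature label to the language minor version that first
// accepted it. The labels themselves are made up for this sketch.
var introducedIn = map[string]int{
	"three-index-slice": 2, // a[i:j:k], added in Go 1.2
	"type-alias":        9, // type A = B, added in Go 1.9
}

// minLanguageVersion returns the smallest minor version N such that language
// 1.N supports every listed feature. Unknown labels are ignored, which is
// acceptable because the result need not be precisely accurate.
func minLanguageVersion(features []string) int {
	need := 0
	for _, f := range features {
		if v, ok := introducedIn[f]; ok && v > need {
			need = v
		}
	}
	return need
}

func main() {
	used := []string{"three-index-slice", "type-alias"}
	fmt.Printf("go1.%d\n", minLanguageVersion(used)) // prints "go1.9"
}
```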

 This is just a suggestion, not a requirement.
 It would likely provide a better user experience as the language
 changes.

 #### Language redefinitions

 The Go language can also change in ways that are not additions or
 removals, but are instead changes to the way a specific language
 construct works.
 For example, in Go 1.1 the size of the type `int` on 64-bit hosts
 changed from 32 bits to 64 bits.
 This change was relatively harmless, as the language does not specify
 the exact size of `int`.
 Potentially, though, some Go 1.0 programs continued to compile with Go
 1.1 but stopped working.
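As an illustration of how the same source could behave differently, the sketch below uses explicitly sized types to stand in for `int` before and after the change on a 64-bit host. The function names are invented for this example; a program that depended on the wraparound in the 32-bit case would still compile afterward but produce a different value.

```go
package main

import "fmt"

// next32 stands in for code using int when int was 32 bits wide.
func next32(x int32) int32 { return x + 1 }

// next64 stands in for the same code once int became 64 bits wide.
func next64(x int64) int64 { return x + 1 }

func main() {
	const maxInt32 = 2147483647
	fmt.Println(next32(maxInt32)) // prints "-2147483648" (wraps around)
	fmt.Println(next64(maxInt32)) // prints "2147483648"
}
```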

 A redefinition is a case where we have code that compiles successfully
 with both version 1.N and version 1.M, where M > N, and where the
 meaning of the code is different in the two versions.
 For example, [issue 20733](https://golang.org/issue/20733) proposes
 that variables in a range loop should be redefined in each iteration.
 Though in practice this change seems more likely to fix programs than
 to break them, in principle this change might break working programs.
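The two possible meanings can be shown side by side without depending on how any particular compiler treats loop variables. In this sketch the variable is declared explicitly, either once outside the loop or once per iteration, and each iteration captures it in a function literal. The function names are invented for this example.

```go
package main

import "fmt"

// sharedVariable models the meaning in which a single variable is reused by
// every iteration, so every captured function sees the final value.
func sharedVariable(items []int) []int {
	var funcs []func() int
	var v int // declared once, outside the loop
	for i := 0; i < len(items); i++ {
		v = items[i]
		funcs = append(funcs, func() int { return v })
	}
	return call(funcs)
}

// perIteration models the meaning in which each iteration gets a fresh
// variable, so every captured function sees its own value.
func perIteration(items []int) []int {
	var funcs []func() int
	for i := 0; i < len(items); i++ {
		v := items[i] // a new variable in each iteration
		funcs = append(funcs, func() int { return v })
	}
	return call(funcs)
}

// call runs each function and collects the results.
func call(funcs []func() int) []int {
	out := make([]int, 0, len(funcs))
	for _, f := range funcs {
		out = append(out, f())
	}
	return out
}

func main() {
	fmt.Println(sharedVariable([]int{1, 2, 3})) // prints "[3 3 3]"
	fmt.Println(perIteration([]int{1, 2, 3}))   // prints "[1 2 3]"
}
```

Both functions compile under either meaning, which is exactly why the compiler cannot flag the difference.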

 Note that a new keyword normally cannot cause a redefinition, though
 we must be careful to ensure that that is true before introducing
 one.
 For example, if we introduce the keyword `check` as suggested in [the
 error handling draft
 design](https://go.googlesource.com/proposal/+/master/design/go2draft-error-handling.md),
 and we permit code like `check(f())`, that might seem to be a
 redefinition if `check` is defined as a function in the same package.
 But after the keyword is introduced, any attempt to define such a
 function will fail.
 So it is not possible for code using `check`, under whichever meaning,
 to compile with both version 1.N and 1.M.
 The new keyword can be handled as a removal (of the non-keyword use of
 `check`) and an addition (of the keyword `check`).

 In order for the Go ecosystem to survive a transition to Go 2, we must
 minimize these sorts of redefinitions.
 As discussed earlier, successful languages have generally had
 essentially no redefinitions beyond a certain point.

 The complexity of a redefinition is, of course, that we can no longer
 rely on the compiler to detect the problem.
 When looking at a redefined language construct, the compiler cannot
 know which meaning is meant.
 In the presence of redefined language constructs, we cannot determine
 the maximum language version.
 We don't know if the construct is intended to be compiled with the old
 meaning or the new.

 The only possibility would be to let programmers set the language
 version.
 In this case it would be either a minimum or maximum language
 version, as appropriate.
 It would have to be set in such a way that it would not be
 automatically updated by any tools.
 Of course, setting such a version would be error prone.
 Over time, a maximum language version would lead to surprising
 results, as people tried to use new language features, and failed.

 I think the only feasible safe approach is to not permit language
 redefinitions.

 We are stuck with our current semantics.
 This doesn't mean we can't improve them.
 For example, for [issue 20733](https://golang.org/issue/20733), the
 range issue, we could change range loops so that taking the address of
 a range parameter, or referring to it from a function literal, is
 forbidden.
 This would not be a redefinition; it would be a removal.
 That approach might eliminate the bugs without the potential of
 breaking code unexpectedly.

 #### Build tags

 Build tags are an existing mechanism that can be used by programs to
 choose which files to compile based on the release.

 Build tags name release versions, which look just like language
 versions, but, speaking pedantically, are different.
 In the discussion above we've talked about using Go release 1.N to
 compile code with language version 1.N-1.
 That is not possible using build tags.

 Build tags can be used to set a maximum or a minimum release, or
 both, that will be used to compile a specific file.
 They can be a convenient way to take advantage of language changes
 that are only available after a certain version; that is, they can be
 used to set a minimum language version when compiling a file.

 As discussed above, though, what is most useful for language changes
 is the ability to set a maximum language version.
 Build tags don't provide that in a useful way.
 If you use a build tag to set your current release version as your
 maximum version, your package will not build with later releases.
 Setting a maximum language version is only possible when it is set to
 a version before the current release, and is coupled with an alternate
 implementation that is used for the later versions.
 That is, if you are building with 1.N, it's not helpful to use a build
 tag of `!1.N+1`.
 You could use a build tag of `!1.M` where `M < N`, but in almost all
 cases you will then need a separate file with a build tag of `1.M+1`.

 Build tags can be used to handle language redefinitions: if there is a
 language redefinition at language version `1.N`, programmers can write
 one file with a build tag of `!1.N` using the old semantics and a
 different file with a build tag of `1.N` using the new semantics.
 However, these duplicate implementations are a lot of work, it's hard
 to know in general when they are required, and it would be easy to make a
 mistake.
 The availability of build tags is not enough to overcome the earlier
 comments about not permitting any language redefinitions.
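The file selection described in this section can be explored with the standard `go/build/constraint` package. Real release tags are spelled `go1.N` rather than the shorthand `1.N` used above, and a release 1.N satisfies every tag from `go1.1` through `go1.N`. The helper `satisfiedBy` is an assumption of this sketch; it evaluates a `//go:build` line as a given release would.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// satisfiedBy reports whether a file carrying the given //go:build line
// would be selected when building with release 1.<minor>.
func satisfiedBy(line string, minor int) bool {
	expr, err := constraint.Parse(line)
	if err != nil {
		panic(err)
	}
	return expr.Eval(func(tag string) bool {
		// Release 1.<minor> sets the tags go1.1 through go1.<minor>.
		for m := 1; m <= minor; m++ {
			if tag == fmt.Sprintf("go1.%d", m) {
				return true
			}
		}
		return false
	})
}

func main() {
	for _, minor := range []int{8, 9, 10} {
		fmt.Printf("Go 1.%d: old file %v, new file %v\n", minor,
			satisfiedBy("//go:build !go1.9", minor),
			satisfiedBy("//go:build go1.9", minor))
	}
}
```

The loop shows the paired-file pattern: exactly one of the two files is selected for each release, so both implementations have to be written and kept in sync.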

 #### import "go2"

 It would be possible to add a mechanism to Go similar to Python's
 `from __future__ import` and Perl's `use feature`.
 For example, we could use a special import path, such as `import
 "go2/type-aliases"`.
 This would put the required language features in the file that uses
 them, rather than hidden away in the go.mod file.

 This would provide a way to describe the set of language additions
 required by the file.
 It's more complicated, because instead of relying on a language
 version, the language is broken up into separate features.
 There is no obvious way to ever remove any of these special imports,
 so they will tend to accumulate over time.
 Python and Perl avoid the accumulation problem by intentionally making
 a backward incompatible change.
 After moving to Python 3 or Perl 6, the accumulated feature requests
 can be discarded.
 Since Go is trying to avoid a large backward incompatible change,
 there would be no clear way to ever remove these imports.

 This mechanism does not address language removals.
 We could introduce a removal import, such as `import
 "go2/no-int-to-string"`, but it's not obvious why anyone would ever
 use it.
 In practice, there would be no way to ever remove language features,
 even ones that are confusing and error-prone.

 This kind of approach doesn't seem suitable for Go.

 ### Standard library changes

 One of the benefits of a Go 2 transition is the chance to release some
 of the standard library packages from the Go 1 compatibility
 guarantee.
 Another benefit is the chance to move many, perhaps most, of the
 packages out of the six month release cycle.
 If the modules experiment works out it may even be possible to start
 doing this sooner rather than later, with some packages on a faster
 cycle.

 I propose that the six month release cycle continue, but that it be
 treated as a compiler/runtime release cycle.
 We want Go releases to be useful out of the box, so releases will
 continue to include the current versions of roughly the same set of
 packages that they contain today.
 However, many of those packages will actually be run on their own
 release cycles.
 People using a given Go release will be able to explicitly choose to
 use newer versions of the standard library packages.
 In fact, in some cases they may be able to use older versions of the
 standard library packages where that seems useful.

 Different release cycles would require more resources on the part of
 the package maintainers.
 We can only do this if we have enough people to manage it and enough
 testing resources to test it.

 We could also continue using the six month release cycle for
 everything, but make the separable packages available separately for
 use with different, compatible, releases.

 #### Core standard library

 Still, some parts of the standard library must be treated as core
 libraries.
 These libraries are closely tied to the compiler and other tools, and
 must strictly follow the release cycle.
 Neither older nor newer versions of these libraries may be used.

 Ideally, these libraries will remain on the current version 1.
 If it seems necessary to change any of them to version 2, that will
 have to be discussed on a case by case basis.
 At this time I see no reason for it.

 The tentative list of core libraries is:

 * os/signal
 * plugin
 * reflect
 * runtime
 * runtime/cgo
 * runtime/debug
 * runtime/msan
 * runtime/pprof
 * runtime/race
 * runtime/tsan
 * sync
 * sync/atomic
 * testing
 * time
 * unsafe

 I am, perhaps optimistically, omitting the net, os, and syscall
 packages from this list.
 We'll see what we can manage.

 #### Penumbra standard library

 The penumbra standard library consists of those packages that are
 included with a release but are maintained independently.
 This will be most of the current standard library.
 These packages will follow the same discipline as today, with the
 option to move to a v2 where appropriate.
 It will be possible to use `go get` to upgrade or, possibly, downgrade
 these standard library packages.
 In particular, fixes can be made as minor releases separately from the
 six month core library release cycle.

 The go tool will have to be able to distinguish between the core
 library and the penumbra library.
 I don't know precisely how this will work, but it seems feasible.

 When moving a standard library package to v2, it will be essential to
 plan for programs that use both v1 and v2 of the package.
 Those programs will have to work as expected, or if that is impossible
 will have to fail cleanly and quickly.
 In some cases this will involve modifying the v1 version to use an
 internal package that is also shared by the v2 package.

 Standard library packages will have to compile with older versions of
 the language, at least the two previous release cycles that we
 currently support.

 #### Removing packages from the standard library

 The ability to support `go get` of standard library packages will
 permit us to remove packages from the releases.
 Those packages will continue to exist and be maintained, and people
 will be able to retrieve them if they need them.
 However, they will not be shipped by default with a Go release.

 This will include packages like

 * index/suffixarray
 * log/syslog
 * net/http/cgi
 * net/http/fcgi

 and perhaps other packages that do not seem to be widely useful.

 We should in due course plan a deprecation policy for old packages, to
 move these packages to a point where they are no longer maintained.
 The deprecation policy will also apply to the v1 versions of packages
 that move to v2.

 Or this may prove to be too problematic, and we should never deprecate
 any existing package, and never remove them from the standard
 releases.

 ## Go 2

 If the above process works as planned, then in an important sense
 there never will be a Go 2.
 Or, to put it a different way, we will slowly transition to new
 language and library features.
 We could at any point during the transition decide that now we are
 Go 2, which might be good marketing.
 Or we could just skip it (there has never been a C 2.0, why have a Go
 2.0?).

 Popular languages like C, C++, and Java have never had a version 2.
 In effect, they are always at version 1.N, although they use different
 names for that state.
 I believe that we should emulate them.
 In truth, a Go 2 in the full sense of the word, in the sense of an
 incompatible new version of the language or core libraries, would not
 be a good option for our users.
 A real Go 2 would, perhaps unsurprisingly, be harmful.