design: add go13compiler, go13linker, go15bootstrap These are ported from the original Google docs. I am going to update the short links to point here, and then perhaps I will stop getting requests for access to the Google docs from people whose companies block all access to Google docs outside their domain. Change-Id: Ic3dfb3566c7335e98865dfcc188230d295bf40d5 Reviewed-on: https://go-review.googlesource.com/c/proposal/+/517535 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org>

commit: d661ed19a203000b7c5441d2cd63a6e4b98ff2ba [log] [tgz]
author: Russ Cox <rsc@golang.org> Wed Aug 09 00:46:43 2023 -0400
committer: Russ Cox <rsc@golang.org> Wed Aug 09 04:48:10 2023 +0000
tree: 3768d6a4a7ce1369e0fb2fb8f7526a83c7523589
parent: f87e19c78f5b0633681c1fd669470b675c67653c [diff]
diff --git a/design/go13compiler.md b/design/go13compiler.md
new file mode 100644
index 0000000..d82e46e
--- /dev/null
+++ b/design/go13compiler.md

@@ -0,0 +1,81 @@
+# Go 1.3+ Compiler Overhaul
+
+
+Russ Cox \
+December 2013 \
+golang.org/s/go13compiler
+
+## Abstract
+
+The Go compiler today is written in C. It is time to move to Go.
+[**Update, 2014**: This work was completed and presented at GopherCon. See “[Go from C to Go](https://www.youtube.com/watch?v=QIE5nV5fDwA)”.]
+
+[**Update, 2023.** This plan was originally published as a Google document. For easier access, it was converted to Markdown in this repository in 2023. Later work has overhauled the compiler further in a number of ways, validating the conjectures about the benefits of converting to Go. This document has only minor historical value now.]
+
+Background
+
+The “gc” Go toolchain is derived from the Plan 9 compiler toolchain. The assemblers, C compilers, and linkers are adopted essentially unchanged, and the Go compilers (in cmd/gc, cmd/5g, cmd/6g, and cmd/8g) are new C programs that fit into the toolchain.
+
+Writing the compiler in C had some important advantages over using Go at the start of the project, most prominent among them the fact that, at first, Go did not exist and so could not be used to write a compiler, and the fact that, once Go did exist, it often changed in significant, backwards-incompatible ways. Using C instead of Go avoided both the initial and ongoing bootstrapping problems. Today, however, Go does exist, and its definition is stable as of Go 1, so the problems of bootstrapping are greatly reduced.
+
+As the bootstrapping problems have receded, other engineering concerns have arisen that make Go much more attractive than C for the compiler implementation. The concerns include:
+
+- It is easier to write correct Go code than to write correct C code.
+- It is easier to debug incorrect Go code than to debug incorrect C code.
+- Work on a Go compiler necessarily requires a good understanding of Go. Implementing the compiler in C adds an unnecessary second requirement.
+- Go makes parallel execution trivial compared to C.
+- Go has better standard support than C for modularity, for automated rewriting, for unit testing, and for profiling.
+- Go is much more fun to use than C.
+
+For all these reasons, we believe it is time to switch to Go compilers written in Go.
+
+## Proposed Plan
+
+We plan to translate the existing compilers from C to Go by writing and then applying an automatic translator. The conversion will proceed in phases, starting in Go 1.3 but continuing into future releases.
+
+_Phase 1_. Develop and debug the translator. This can be done in parallel with ordinary development. In particular, it is fine for people to continue making changes to the C version of the compiler during this phase. The translator is a fair amount of work, but we are confident that we can build one that works for the specific case of translating the compilers. There are many corners of C that have no direct translation into Go; macros, unions, and bit fields are probably highest on the list. Fortunately (but not coincidentally), those features are rarely used, if at all, in the code being translated. Pointer arithmetic and arrays are also some work to translate, but even those are rare in the compiler, which primarily operates on trees and linked lists. The translator will preserve the comments and structure of the original C code, so the translation should be as readable as the current compiler.
+
+_Phase 2_. Use the translator to convert the compilers from C to Go and delete the C copies. At this point we have transitioned to Go and still have a working compiler, but the compiler is still very much a C program. This may happen for Go 1.3, but that’s pretty aggressive. It is more likely to happen for Go 1.4.
+
+_Phase 3_. Use some tools, perhaps derived from gofix and the Go oracle to split the compiler into packages, cleaning up and documenting the code, and adding unit tests as appropriate. This phase turns the compiler into an idiomatic Go program. This is targeted for Go 1.4.
+
+_Phase 4a_. Apply standard profiling and measurement techniques to understand and optimize the memory and CPU usage of the compiler. This may include introducing parallelization; if so, the race detector is likely to be a significant help. This is targeted for Go 1.4, but parts may slip to Go 1.5. Some basic profiling and optimization may be done earlier, in Phase 3.
+
+_Phase 4b_. (Concurrent with Phase 4a.) With the compiler split into packages with clearly defined boundaries, it should be straightforward to introduce a new middle representation between the architecture-independent unordered tree (Node*s) and the architecture-dependent ordered list (Prog*s) used today. That representation, which should be architecture-independent but contain information about precise order of execution, can be used to introduce order-dependent but architecture-independent optimizations like elimination of redundant nil checks and bounds checks. It may be based on SSA and if so would certainly take advantage of the lessons learned from Alan Donovan’s go.tools/ssa package.
+
+_Phase 5_. Replace the front end with the latest (perhaps new) versions of go/parser and go/types. Robert Griesemer has discussed the possibility of designing new go/parser and go/types APIs at some point, based on experience with the current ones (and under new names, to preserve Go 1 compatibility). The work of connecting them to a compiler back end may help guide design of new APIs.
+
+## Bootstrapping
+
+With a Go compiler written in Go, there must be a plan for bootstrapping from scratch. The rule we plan to adopt is that the Go 1.3 compiler must compile using Go 1.2, Go 1.4 must compile using Go 1.3, and so on. Then there is a clear path to generating current binaries: build the Go 1.2 toolchain (written in C), use it to build the Go 1.3 toolchain, and so on. There will be a shell script to do this; it will take CPU time but not human time. The bootstrapping only needs to be done once per machine; the Go 1.x binaries can be kept in a known location and reused each time all.bash is run during the development of Go 1.(x+1).
+
+Obviously, this bootstrapping path scales poorly over time. Before too many releases have gone by, it may make sense to write a back end for the compiler that generates C code. The code need not be efficient or readable, just correct. That C version would be checked in, just as today we check in the y.tab.c file generated by yacc. The bootstrap sequence would invoke gcc on that C code to build a bootstrap compiler, and the bootstrap compiler would be used to build the real compiler. Like in the other scheme, the bootstrap compiler binary can be kept in a known location and reused (not rebuilt) each time all.bash is run.
+
+## Alternatives
+
+There are a few alternatives that would be obvious approaches to consider, and so it is worth explaining why we have decided against them.
+
+_Write new compilers from scratch_. The current compilers do have one very important property: they compile Go correctly (or at least correctly enough for nearly all current users). Despite Go’s simplicity, there are many subtle cases in the optimizations and other rewrites performed by the compilers, and it would be foolish to throw away the 10 or so man-years of effort that have gone into them.
+
+_Translate the compiler manually_. We have translated other, smaller C and C++ programs to Go manually. The process is tedious and therefore error-prone, and the mistakes can be very subtle and difficult to find. A mechanical translator will instead generate translations with consistent classes of errors, which should be easier to find, and it will not zone out during the tedious parts. The Go compilers are also significantly larger than anything we’ve converted: over 60,000 lines of C. Mechanical help will make the job much easier. As Dick Sites wrote in 1974, “I would rather write programs to help me write programs than write programs.” Translating the compiler mechanically also makes it easier for development on the C originals to proceed unhindered until we are ready for the switch.
+
+_Translate just the back ends and connect to go/parser and go/types immediately_. The data structures in the compiler that convey information from the front end to the back ends look nothing like the APIs presented by go/parser and go/types. Replacing the front end by those libraries would require writing code to convert from the go/parser and go/types data structures into the ones expected by the back ends, a very broad and error-prone undertaking. We do believe that it makes sense to use these packages, but it also makes sense to wait until the compiler is structured more like a Go program, into documented sub-packages of its own with defined boundaries and unit tests.
+
+_Discard the current compilers and use gccgo (or go/parser + go/types + LLVM, or …)_. The current compilers are a large part of Go’s flexibility. Tying development of Go to a comparatively larger code base like GCC or LLVM seems likely to hurt that flexibility. Also, GCC is a large C (now partly C++) program and LLVM a large C++ program. All the reasons listed above justifying a move away from the current compiler code apply as much or more to these code bases.
+
+## Long Term Use of C
+
+Carried to completion, this plan still leaves the rest of the Plan 9 toolchain written in C. In the long term it would be nice to eliminate all C from the tree. This section speculates on how that might happen. It is not guaranteed to happen in this way or at all.
+
+_Package runtime_. Most of the runtime is written in C, for many of the same reasons that the Go compiler is written in C. However, the runtime is much smaller than the compilers and it is already written in a mix of Go and C. It is plausible to convert the C to Go one piece at a time. The major pieces are the scheduler, the garbage collector, the hash map implementation, and the channel implementation. (The fine mixing of Go and C is possible here because the C is compiled with 6c, not gcc.)
+
+_C compilers_. The Plan 9 C compilers are themselves written in C. If we remove all the C from Go package implementations (in particular, package runtime), we can remove these compilers: “go tool 6c” and so on would be no more, and .c files in Go package directory sources would no longer be supported. We would need to announce these plans early, so that external packages written partly in C have time to remove their uses. (Cgo, which uses gcc instead of 6c, would remain as a way to write parts of a package in C.) The Go 1 compatibility document excludes changes to the toolchain; deleting the C compilers is permitted.
+
+_Assemblers_. The Plan 9 assemblers are also written in C. However, the assembler is little more than a simple parser coupled with a serialization of the parse tree. That could easily be translated to Go, either automatically or by hand.
+
+_Linkers_. The Plan 9 linkers are also written in C. Recent work has moved most of the linker in into the compilers, and there is already a plan to rewrite what is left as a new, much simpler Go program. The part of the linker that has moved into the Go compiler will now need to be translated along with the rest of the compiler.
+
+_Libmach-based tools: nm, pack, addr2line, and objdump_. Nm has already been rewritten in Go. Pack and addr2line can be rewritten any day. Objdump currently depends on libmach’s disassemblers, but those should be straightforward to convert to go, whether mechanically or manually, and at that point libmach itself can be deleted.
+
+
+

diff --git a/design/go13linker.md b/design/go13linker.md
new file mode 100644
index 0000000..cfbbe3e
--- /dev/null
+++ b/design/go13linker.md

@@ -0,0 +1,55 @@
+# Go 1.3 Linker Overhaul
+
+Russ Cox \
+November 2013 \
+golang.org/s/go13linker
+
+## Abstract
+
+The linker is one of the slowest parts of building and running a typical Go program. To address this, we plan to split the linker into two pieces. Perhaps one can be written in Go.
+
+[**Update, 2023.** This plan was originally published as a Google document. For easier access, it was converted to Markdown in this repository in 2023. Later work overhauled the linker a second time, greatly improving its structure, efficiency, and code quality. This document has only minor historical value now.]
+
+## Background
+
+The linker has always been the slowest part of the Plan 9 toolchain, and it is now the slowest part of the Go toolchain. Ken Thompson’s [overview of the toolchain](http://plan9.bell-labs.com/sys/doc/compiler.html) concludes:
+
+> The new compilers compile quickly, load slowly, and produce medium quality object code. The compilers are relatively portable, requiring but a couple of weeks’ work to produce a compiler for a different computer. For Plan 9, where we needed several compilers with specialized features and our own object formats, this project was indispensable. It is also necessary for us to be able to freely distribute our compilers with the Plan 9 distribution.
+>
+> Two problems have come up in retrospect. The first has to do with the division of labor between compiler and loader. Plan 9 runs on multi-processors and as such compilations are often done in parallel. Unfortunately, all compilations must be complete before loading can begin. The load is then single-threaded. With this model, any shift of work from compile to load results in a significant increase in real time. The same is true of libraries that are compiled infrequently and loaded often. In the future, we may try to put some of the loader work back into the compiler.
+
+That document was written in the early 1990s. The future is here.
+
+## Proposed Plan
+
+The current linker performs two separable tasks. First, it translates an input stream of pseudo-instructions into executable code and data blocks, along with a list of relocations. Second, it deletes dead code, merges what’s left into a single image, resolves relocations, and generates a few whole-program data structures such as the [runtime symbol table](http://golang.org/s/go12symtab).
+
+The first part can be factored out into a library - liblink - that can be linked into the assemblers and compilers. The object files written by 6a, 6c, or 6g and so on would be written by liblink and then contain executable code and data blocks and relocations, the result of the first half of the current linker.
+
+The second part can be handled by what’s left of the linker after extracting liblink. That remaining program which would read the new object files and complete the link. That linker is a small amount of code, the bulk of it architecture-independent. It is possible that it could be merged into a single architecture-independent program invoked as “go tool ld”. It is even possible that it could be rewritten in Go, making it easy to parallelize large links. (See the section below for how to bootstrap.)
+
+To start, we will focus on getting the new split working with C code. The exploration of using Go will happen only once the rest of the change is done.
+
+To avoid churn in the usage of the tools, the generated object files will keep the existing suffixes .5, .6, .8. Perhaps in Go 1.3 we will even include shim programs named 5l, 6l, and 8l that invoke the new linker. These shim programs would be retired in Go 1.4.
+
+## Object Files
+
+The new split requires a new object file format. The current objects contain pseudo-instruction streams, but the new objects will contain executable code and data blocks along with relocations.
+
+A natural question is whether we should adopt an existing object file format, such as ELF. At first, we will use a custom format. A Go-specific linker is required to build runtime data structures like the symbol table, so even if we used ELF object files we could not reuse a standard ELF linker. ELF files are also considerably more general and ELF semantics considerably more complex than the Go-specific linker needs. A custom, less general object file format should be simpler to generate and simpler to consume. On the other hand, ELF can be processed by standard tools like readelf, objdump, and so on. Once the dust has settled, though, and we know exactly what we need from the format, it is worth looking at whether the use of ELF makes sense.
+
+The details of the new object file are not yet worked out. The rest of this section lists some design considerations.
+
+ - Obviously the files should be as simple as possible. With few exceptions, anything that can be done in the library half of the linker should be. Possible surprises include the stack split code being done in the library half, which makes object files OS-specific, although they already are due to OS-specific Go code in packages, and the software floating point work being done in the library half, making ARM object files GOARM-specific (today nothing GOARM-specific is done until the linker runs).
+ - We should make sure that object files are usable via mmap. This would reduce copying during I/O. It may require changing the Go runtime to simply panic, not crash, on SIGSEGV on non-nil addresses.
+ - Pure Go packages consist of a single object file generated by invoking the Go compiler once on the complete set of Go source files. That object file is then wrapped in an archive. We should arrange that a single object file is also a valid archive file, so that in that common case there is no wrapping step needed.
+
+## Bootstrapping
+
+If the new Go linker is written in Go, there is a bootstrapping problem: how do you link the linker? There are two approaches.
+
+The first approach is to maintain a bootstrap list of CLs. The first CL in the sequence would have the current linker, written in C. Each subsequent step would be a CL containing a new linker that can be linked using the previous linker. The final binaries resulting from the sequence can be made available for download. The sequence need not be too long and could be made to coincide with milestones. For example, we could arrange that the Go 1.3 linker can be compiled as a Go 1.2 program, the Go 1.4 linker can be compiled as a Go 1.3 program, and so on. The recorded sequence makes it possible to re-bootstrap if needed but also provides a way to defend against the [Trusting Trust problem](http://cm.bell-labs.com/who/ken/trust.html). Another way to bootstrap would be to compile gccgo and use it to build the Go 1.3 linker.
+
+The second approach is to keep the C linker even after we have a better one written in Go, and to keep both mostly feature-equivalent. The version written in C only needs to keep enough features to link the one written in Go. It needs to pick up some object files, merge them, and write out an executable. There’s no need for cgo support, no need for external linking, no need for shared libraries, no need for performance. It should be a relatively modest amount of code (perhaps just a few thousand lines) and should not need to change very often. The C version would be built and used during make.bash but not installed. This approach is easier for other developers building Go from source.
+
+It doesn’t matter much which approach we take, just that there is at least one viable approach. We can decide once things are further along.

diff --git a/design/go15bootstrap.md b/design/go15bootstrap.md
new file mode 100644
index 0000000..24b79da
--- /dev/null
+++ b/design/go15bootstrap.md

@@ -0,0 +1,67 @@
+# Go 1.5 Bootstrap Plan
+
+Russ Cox \
+January 2015 \
+golang.org/s/go15bootstrap \
+([comments on golang-dev](https://groups.google.com/d/msg/golang-dev/3bTIOleL8Ik/D8gICLOiUJEJ))
+
+## Abstract
+
+Go 1.5 will use a toolchain written in Go (at least in part). \
+Question: how do you build Go if you need Go built already? \
+Answer: building Go 1.5 will require having Go 1.4 available.
+
+[**Update, 2023.** This plan was originally published as a Google document. For easier access, it was converted to Markdown in this repository in 2023. Later versions of Go require newer bootstrap toolchains. See [go.dev/issue/52465](https://go.dev/issue/52465) for those details.]
+
+## Background
+
+We have been planning for a year now to eliminate all C programs from the Go source tree. The C compilers (5c, 6c, 8c, 9c) have already been removed. The remaining C programs will be converted to Go: they are the Go compilers ([golang.org/s/go13compiler](https://go.dev/s/go13compiler)), the assemblers, the linkers ([golang.org/s/go13linker](https://go.dev/s/go13linker)), and cmd/dist. If these programs are written in Go, that introduces a bootstrapping problem when building completely from source code: you need a working Go toolchain in order to build a Go toolchain.
+
+## Proposal
+
+To build Go 1.x, for x ≥ 5, it will be necessary to have Go 1.4 (or newer) installed already, in $GOROOT_BOOTSTRAP. The default value of $GOROOT_BOOTSTRAP is $HOME/go1.4. In general we'll keep using Go 1.4 as the bootstrap base version for as long as possible. The toolchain proper (compiler, assemblers, linkers) will need to be buildable with Go 1.4, whether by restricting their feature use to what is in Go 1.4 or by using build tags.
+
+For comparison with what will follow, the old build process for Go 1.4 is:
+
+1. Build cmd/dist with gcc (or clang).
+2. Using dist, build compiler toolchain with gcc (or clang)
+3. NOP
+4. Using dist, build cmd/go (as go_bootstrap) with compiler toolchain.
+5. Using go_bootstrap, build the remaining standard library and commands.
+
+The new build process for Go 1.x (x ≥ 5) will be:
+
+1. Build cmd/dist with Go 1.4.
+2. Using dist, build Go 1.x compiler toolchain with Go 1.4.
+3. Using dist, rebuild Go 1.x compiler toolchain with itself.
+4. Using dist, build Go 1.x cmd/go (as go_bootstrap) with Go 1.x compiler toolchain.
+5. Using go_bootstrap, build the remaining Go 1.x standard library and commands.
+
+There are two changes.
+
+The first change is that we replace gcc (or clang) with Go 1.4.
+
+The second change is the introduction of step 3, which rebuilds the Go 1.x compiler toolchain with itself. The 6g built in Step 2 is a Go 1.x compiler built using Go 1.4 libraries and compilers. The 6g built in Step 3 is the same Go 1.x compiler, but built using Go 1.x libraries and compilers. If Go 1.x has changed the format of debug info or some other detail of the binaries, it may matter to tools whether 6g is a Go 1.4 binary or a Go 1.x binary. If Go 1.x has introduced any performance or stability improvements in the libraries, the compiler in Step 3 will be faster or more stable than the compiler in Step 2. Of course, if Go 1.x is buggier, the 6g built in Step 3 will also be buggier, so it will be possible to disable step 3 for debugging.
+
+Step 3 could make make.bash take longer. As an upper bound on the slowdown, the current build process steps 1-4 take 20 seconds on my MacBook Pro, out of the total 40 seconds required for make.bash. In the new process, I can’t see step 3 adding more than 50% to the make.bash run time, and I expect it would be significantly less than that. On the other hand, the C compilations being replaced are very I/O heavy; two Go compilations might still be faster, especially on I/O-constrained ARM devices. In any event, if make.bash does slow down, I will speed up run.bash at least as much, so that all.bash time does not increase.
+
+## New Ports
+
+Bootstrapping makes new ports a little more complex. It was possible in the past to check out the Go tree on a new system and run all.bash to build the toolchain (and it would fail, and you’d make some edits, and try again). Now, it will not be possible to run all.bash until that system is fully supported by Go.
+
+For Go 1.x (x ≥ 5), new ports will have to be done by cross-compiling test binaries on a working system, copying the binaries over to the target, and running and debugging them there. This is already well-supported by all.bash via the go\_$GOOS\_$GOARCH\_exec scripts (see ‘go help run’). Once all.bash can be run in that mode, the resulting compilers and libraries can be copied to the target system and used directly.
+
+Once a port works well enough that the compilers and linkers can run on the target machine, the script bootstrap.bash (run on an old system) will prepare a GOROOT_BOOTSTRAP directory for use on the new system.
+
+## Deployment
+
+Today we are still using the Go 1.4 build process above.
+
+The first step in the transition will be to convert cmd/dist itself to Go and change make.bash to use Go 1.4 to build cmd/dist. That replaces “gcc (or clang)” with “Go 1.4” in step 1 of the build and changes nothing else. This will mainly exercise the integration of Go 1.4 into the build.
+
+After that first step, we can convert the remaining C programs in whatever order makes sense. Each conversion will require minor modifications to cmd/dist to build the Go version instead of the C version. I am not sure whether the new linker or the new assemblers will be converted first. I expect the Go compiler to be converted last.
+
+We will probably do the larger conversions on the dev.cc branch and merge into master at good checkpoints, so that multiple people can work on the conversion (coordinated via Git) but able to break certain builds for long amounts of time without affecting other developers. This is similar to what we did for dev.cc and dev.garbage in 2014.
+
+Go 1.5 will require Go 1.4 to build. The goal is to convert all the C programs—the Go compiler, the linker, the assemblers, and cmd/dist—for Go 1.5. We may not reach that goal, but certainly some of that list will be converted.
+
commit	d661ed19a203000b7c5441d2cd63a6e4b98ff2ba	[log] [tgz]
author	Russ Cox <rsc@golang.org>	Wed Aug 09 00:46:43 2023 -0400
committer	Russ Cox <rsc@golang.org>	Wed Aug 09 04:48:10 2023 +0000
tree	3768d6a4a7ce1369e0fb2fb8f7526a83c7523589
parent	f87e19c78f5b0633681c1fd669470b675c67653c [diff]