_content/talks/2015/gogo.slide - website - Git at Google

 Go in Go
 Gopherfest
 26 May 2015

 Rob Pike
 Google
 r@golang.org
 http://golang.org/

 * Go in Go

 As of the 1.5 release of Go, the entire system is now written in Go.
 (And a little assembler.)

 C is gone.

 Side note: `gccgo` is still going strong.
 This talk is about the original compiler, `gc`.

 * Why was it in C?

 Bootstrapping.

 (Also Go was not intended primarily as a compiler implementation language.)

 * Why move the compiler to Go?

 Not for validation; we have more pragmatic motives:

 - Go is easier to write (correctly) than C.
 - Go is easier to debug than C (even absent a debugger).
 - Go is the only language you'd need to know; encourages contributions.
 - Go has better modularity, tooling, testing, profiling, ...
 - Go makes parallel execution trivial.

 Already seeing benefits, and it's early yet.

 Design document: [[http://golang.org/s/go13compiler]]

 * Why move the runtime to Go?

 We had our own C compiler just to compile the runtime.
 We needed a compiler with the same ABI as Go, such as segmented stacks.

 Switching it to Go means we can get rid of the C compiler.
 That's more important than converting the compiler to Go.

 (All the reasons for moving the compiler apply to the runtime as well.)

 Now only one language in the runtime; easier integration, stack management, etc.


 As always, simplicity is the overriding consideration.

 * History

 Why do we have our own tool chain at all?
 Our own ABI?
 Our own file formats?

 History, familiarity, and ease of moving forward. And speed.

 Many of Go's big changes would be much harder with GCC or LLVM.

 .link https://news.ycombinator.com/item?id=8817990

 * Big changes

 All made easier by owning the tools and/or moving to Go:

 - linker rearchitecture
 - new garbage collector
 - stack maps
 - contiguous stacks
 - write barriers

 The last three are all but impossible in C:

 - C is not type safe; don't always know what's a pointer
 - aliasing of stack slots caused by optimization

 (`Gccgo` will have segmented stacks and imprecise (stack) collection for a while yet.)

 * Goroutine stacks

 - Until 1.2: Stacks were segmented.
 - 1.3: Stacks were contiguous unless executing C code (runtime).
 - 1.4: Stacks made contiguous by restricting C to system stack.
 - 1.5: Stacks made contiguous by eliminating C.

 These were each huge steps, made quickly (led by `khr@`).

 * Converting the runtime

 Mostly done by hand with machine assistance.

 Challenge to implement the runtime in a safe language.
 Some use of `unsafe` to deal with pointers as raw bits in the GC, for instance.
 But less than you might think.

 The translator (next sections) helped for some of the translation.

 * Converting the compiler

 Why translate it, not write it from scratch? Correctness, testing.

 Steps:

 - Write a custom translator from C to Go.
 - Run the translator, iterate until success.
 - Measure success by bit-identical output.
 - Clean up the code by hand and by machine.
 - Turn it from C-in-Go to idiomatic Go (still happening).

 * Translator

 First output was C line-by-line translated to (bad!) Go.
 Tool to do this written by `rsc@` (talked about at GopherCon 2014).
 Custom written for this job, not a general C-to-Go translator.

 Steps:

 - Parse C code using new simple C parser (`yacc`)
 - Remove or rewrite C-isms such as `*p++` as an expression
 - Walk the C parse tree, print the C code in Go syntax
 - Compile the output
 - Run, compare generated code
 - Repeat

 The `Yacc` grammar was translated by sam-powered hands.

 * Translator configuration

 Aided by hand-written rewrite rules, such as:

 - this field is a bool
 - this function returns a bool

 Also diff-like rewrites for things such as using the standard library:

 	diff {
 	-	g.Rpo = obj.Calloc(g.Num*sizeof(g.Rpo[0]), 1).([]*Flow)
 	-	idom = obj.Calloc(g.Num*sizeof(idom[0]), 1).([]int32)
 	-	if g.Rpo == nil || idom == nil {
 	-		Fatal("out of memory")
 	-	}
 	+	g.Rpo = make([]*Flow, g.Num)
 	+	idom = make([]int32, g.Num)
 	}

 * Another example

 This one due to semantic difference between the languages.

 	diff {
 	-	if nreg == 64 {
 	-		mask = ^0 // can't rely on C to shift by 64
 	-	} else {
 	-		mask = (1 << uint(nreg)) - 1
 	-	}
 	+	mask = (1 << uint(nreg)) - 1
 	}

 * Grind

 Once in Go, new tool `grind` deployed (by `rsc@`):

 - parses Go, type checks
 - records a list of edits to perform: "insert this text at this position"
 - at end, applies edits to source (hard to edit AST).

 Changes guided by profiling and other analysis:

 - removes dead code
 - removes gotos
 - removes unused labels, needless indirections, etc.
 - moves `var` declarations nearer to first use

 .link http://rsc.io/grind

 * Performance problems

 Output from translator was poor Go, and ran about 10X slower.
 Most of that slowdown has been recovered.

 Problems with C to Go:

 - C patterns can be poor Go; e.g.: complex `for` loops
 - C stack variables never escape; Go compiler isn't as sure
 - interfaces such as `fmt.Stringer` vs. C's `varargs`
 - no `unions` in Go, so use `structs` instead: bloat
 - variable declarations in wrong place

 C compiler didn't free much memory, but Go has a GC.
 Adds CPU and memory overhead.

 * Performance fixes

 Profile! (Never done before!)

 - move `vars` closer to first use
 - split `vars` into multiple
 - replace code in the compiler with code in the library: e.g. `math/big`
 - use interface or other tricks to combine `struct` fields
 - better escape analysis (`drchase@`).
 - hand tuning code and data layout

 Use tools like `grind`, `gofmt` `-r` and `eg` for much of this.

 Removing interface argument from a debugging print library got 15% overall!

 More remains to be done.

 * Technical benefits

 Other benefits of the conversion:

 Garbage collection means no more worry about introducing a dangling pointer.

 Chance to clean up the back ends.

 Unified `386` and `amd64` architectures throughout the tool chain.

 New architectures are easier to add.

 Unified the tools: now one compiler, one assembler, one linker.

 * Compiler

 `GOOS=YYY` `GOARCH=XXX` `go` `tool` `compile`

 One compiler; no more `6g`, `8g` etc.

 About 50K lines of portable code.
 Even the registerizer is portable now; architectures well characterized.
 Non-portable: Peepholing, details like registers bound to instructions.
 Typically around 10% of the portable LOC.

 * Assembler

 `GOOS=YYY` `GOARCH=XXX` `go` `tool` `asm`

 New assembler, all in Go, written from scratch by `r@`.
 Clean, idiomatic Go code.

 Less than 4000 lines, <10% machine-dependent.

 Almost completely compatible with previous `yacc` and C assemblers.

 How is this possible?

 - shared syntax originating in the Plan 9 assemblers
 - unified back-end logic (old `liblink`, now `internal/obj`)

 * Linker

 `GOOS=YYY` `GOARCH=XXX` `go` `tool` `link`

 Mostly hand- and machine- translated from C code.

 New library, `internal/obj`, part of original linker, captures details about machines, writes object files.

 27000 lines summed across 4 architectures, mostly tables (plus some ugliness).

 - `arm`: 4000
 - `arm64`: 6000
 - `ppc64`: 5000
 - `x86`: 7500 (`386` and `amd64`)

 Example benefit: one print routine to print any instruction for any architecture.

 * Bootstrap

 With no C compiler, bootstrapping requires a Go compiler.

 Therefore need to build or download a working Go installation to build 1.5 from source.

 We use Go 1.4+ as the base to build the 1.5+ tool chain. (Newer is OK too.)

 Details: [[http://golang.org/s/go15bootstrap]]

 * Future

 Much work still to do, but 1.5 is mostly set.

 Future work:

 Better escape analysis.
 New compiler back end using SSA (much easier in Go than C).
 Will allow much more optimization.

 Generate machine descriptions from PDFs (or maybe XML).
 Will have a purely machine-generated instruction definition:
 "Read in PDF, write out an assembler configuration".
 Already deployed for the disassemblers.

 * Conclusions

 Getting rid of C was a huge advance for the project.
 Code is cleaner, testable, profilable, easier to work on.

 New unified tool chain reduces code size, increases maintainability.

 Flexible tool chain, portability still paramount.
	Go in Go
	Gopherfest
	26 May 2015

	Rob Pike
	Google
	r@golang.org
	http://golang.org/

	* Go in Go

	As of the 1.5 release of Go, the entire system is now written in Go.
	(And a little assembler.)

	C is gone.

	Side note: `gccgo` is still going strong.
	This talk is about the original compiler, `gc`.

	* Why was it in C?

	Bootstrapping.

	(Also Go was not intended primarily as a compiler implementation language.)

	* Why move the compiler to Go?

	Not for validation; we have more pragmatic motives:

	- Go is easier to write (correctly) than C.
	- Go is easier to debug than C (even absent a debugger).
	- Go is the only language you'd need to know; encourages contributions.
	- Go has better modularity, tooling, testing, profiling, ...
	- Go makes parallel execution trivial.

	Already seeing benefits, and it's early yet.

	Design document: [[http://golang.org/s/go13compiler]]

	* Why move the runtime to Go?

	We had our own C compiler just to compile the runtime.
	We needed a compiler with the same ABI as Go, such as segmented stacks.

	Switching it to Go means we can get rid of the C compiler.
	That's more important than converting the compiler to Go.

	(All the reasons for moving the compiler apply to the runtime as well.)

	Now only one language in the runtime; easier integration, stack management, etc.


	As always, simplicity is the overriding consideration.

	* History

	Why do we have our own tool chain at all?
	Our own ABI?
	Our own file formats?

	History, familiarity, and ease of moving forward. And speed.

	Many of Go's big changes would be much harder with GCC or LLVM.

	.link https://news.ycombinator.com/item?id=8817990

	* Big changes

	All made easier by owning the tools and/or moving to Go:

	- linker rearchitecture
	- new garbage collector
	- stack maps
	- contiguous stacks
	- write barriers

	The last three are all but impossible in C:

	- C is not type safe; don't always know what's a pointer
	- aliasing of stack slots caused by optimization

	(`Gccgo` will have segmented stacks and imprecise (stack) collection for a while yet.)

	* Goroutine stacks

	- Until 1.2: Stacks were segmented.
	- 1.3: Stacks were contiguous unless executing C code (runtime).
	- 1.4: Stacks made contiguous by restricting C to system stack.
	- 1.5: Stacks made contiguous by eliminating C.

	These were each huge steps, made quickly (led by `khr@`).

	* Converting the runtime

	Mostly done by hand with machine assistance.

	Challenge to implement the runtime in a safe language.
	Some use of `unsafe` to deal with pointers as raw bits in the GC, for instance.
	But less than you might think.

	The translator (next sections) helped for some of the translation.

	* Converting the compiler

	Why translate it, not write it from scratch? Correctness, testing.

	Steps:

	- Write a custom translator from C to Go.
	- Run the translator, iterate until success.
	- Measure success by bit-identical output.
	- Clean up the code by hand and by machine.
	- Turn it from C-in-Go to idiomatic Go (still happening).

	* Translator

	First output was C line-by-line translated to (bad!) Go.
	Tool to do this written by `rsc@` (talked about at GopherCon 2014).
	Custom written for this job, not a general C-to-Go translator.

	Steps:

	- Parse C code using new simple C parser (`yacc`)
	- Remove or rewrite C-isms such as `*p++` as an expression
	- Walk the C parse tree, print the C code in Go syntax
	- Compile the output
	- Run, compare generated code
	- Repeat

	The `Yacc` grammar was translated by sam-powered hands.

	* Translator configuration

	Aided by hand-written rewrite rules, such as:

	- this field is a bool
	- this function returns a bool

	Also diff-like rewrites for things such as using the standard library:

	diff {
	- g.Rpo = obj.Calloc(g.Numsizeof(g.Rpo[0]), 1).([]Flow)
	- idom = obj.Calloc(g.Num*sizeof(idom[0]), 1).([]int32)
	- if g.Rpo == nil \|\| idom == nil {
	- Fatal("out of memory")
	- }
	+ g.Rpo = make([]*Flow, g.Num)
	+ idom = make([]int32, g.Num)
	}

	* Another example

	This one due to semantic difference between the languages.

	diff {
	- if nreg == 64 {
	- mask = ^0 // can't rely on C to shift by 64
	- } else {
	- mask = (1 << uint(nreg)) - 1
	- }
	+ mask = (1 << uint(nreg)) - 1
	}

	* Grind

	Once in Go, new tool `grind` deployed (by `rsc@`):

	- parses Go, type checks
	- records a list of edits to perform: "insert this text at this position"
	- at end, applies edits to source (hard to edit AST).

	Changes guided by profiling and other analysis:

	- removes dead code
	- removes gotos
	- removes unused labels, needless indirections, etc.
	- moves `var` declarations nearer to first use

	.link http://rsc.io/grind

	* Performance problems

	Output from translator was poor Go, and ran about 10X slower.
	Most of that slowdown has been recovered.

	Problems with C to Go:

	- C patterns can be poor Go; e.g.: complex `for` loops
	- C stack variables never escape; Go compiler isn't as sure
	- interfaces such as `fmt.Stringer` vs. C's `varargs`
	- no `unions` in Go, so use `structs` instead: bloat
	- variable declarations in wrong place

	C compiler didn't free much memory, but Go has a GC.
	Adds CPU and memory overhead.

	* Performance fixes

	Profile! (Never done before!)

	- move `vars` closer to first use
	- split `vars` into multiple
	- replace code in the compiler with code in the library: e.g. `math/big`
	- use interface or other tricks to combine `struct` fields
	- better escape analysis (`drchase@`).
	- hand tuning code and data layout

	Use tools like `grind`, `gofmt` `-r` and `eg` for much of this.

	Removing interface argument from a debugging print library got 15% overall!

	More remains to be done.

	* Technical benefits

	Other benefits of the conversion:

	Garbage collection means no more worry about introducing a dangling pointer.

	Chance to clean up the back ends.

	Unified `386` and `amd64` architectures throughout the tool chain.

	New architectures are easier to add.

	Unified the tools: now one compiler, one assembler, one linker.

	* Compiler

	`GOOS=YYY` `GOARCH=XXX` `go` `tool` `compile`

	One compiler; no more `6g`, `8g` etc.

	About 50K lines of portable code.
	Even the registerizer is portable now; architectures well characterized.
	Non-portable: Peepholing, details like registers bound to instructions.
	Typically around 10% of the portable LOC.

	* Assembler

	`GOOS=YYY` `GOARCH=XXX` `go` `tool` `asm`

	New assembler, all in Go, written from scratch by `r@`.
	Clean, idiomatic Go code.

	Less than 4000 lines, <10% machine-dependent.

	Almost completely compatible with previous `yacc` and C assemblers.

	How is this possible?

	- shared syntax originating in the Plan 9 assemblers
	- unified back-end logic (old `liblink`, now `internal/obj`)

	* Linker

	`GOOS=YYY` `GOARCH=XXX` `go` `tool` `link`

	Mostly hand- and machine- translated from C code.

	New library, `internal/obj`, part of original linker, captures details about machines, writes object files.

	27000 lines summed across 4 architectures, mostly tables (plus some ugliness).

	- `arm`: 4000
	- `arm64`: 6000
	- `ppc64`: 5000
	- `x86`: 7500 (`386` and `amd64`)

	Example benefit: one print routine to print any instruction for any architecture.

	* Bootstrap

	With no C compiler, bootstrapping requires a Go compiler.

	Therefore need to build or download a working Go installation to build 1.5 from source.

	We use Go 1.4+ as the base to build the 1.5+ tool chain. (Newer is OK too.)

	Details: [[http://golang.org/s/go15bootstrap]]

	* Future

	Much work still to do, but 1.5 is mostly set.

	Future work:

	Better escape analysis.
	New compiler back end using SSA (much easier in Go than C).
	Will allow much more optimization.

	Generate machine descriptions from PDFs (or maybe XML).
	Will have a purely machine-generated instruction definition:
	"Read in PDF, write out an assembler configuration".
	Already deployed for the disassemblers.

	* Conclusions

	Getting rid of C was a huge advance for the project.
	Code is cleaner, testable, profilable, easier to work on.

	New unified tool chain reduces code size, increases maintainability.

	Flexible tool chain, portability still paramount.