talks/2016/asm: add slides for assembler talk
From GopherCon, The Design of the Go Assembler
Change-Id: If585f57e4adc904138c9e381d3bec55b0e7e6c03
Reviewed-on: https://go-review.googlesource.com/27333
Reviewed-by: Andrew Gerrand <adg@golang.org>
diff --git a/2016/asm.slide b/2016/asm.slide
new file mode 100644
index 0000000..80c3bf3
--- /dev/null
+++ b/2016/asm.slide
@@ -0,0 +1,389 @@
+The Design of the Go Assembler
+
+Gophercon
+12 July 2016
+
+Rob Pike
+Google
+@rob_pike
+[[http://golang.org/s/plusrob][+RobPikeTheHuman]]
+http://golang.org/
+
+* Presentation on youtube.com
+
+Video is [[https://www.youtube.com/watch?v=KINIAgRpkDA][here]].
+
+
+* Motivation
+
+_Why_Learn_Assembler_Language?_
+
+_The_most_important_single_thing_to_realize_about_assembler_language_is_that_it_enables_the_programmer_to_use_all_System/360_machine_functions_as_if_he_were_coding_in_System/360_machine_language._
+
+— A Programmer's Introduction to IBM System/360 Assembler Language, 1970, page 4
+
+* We still need assembly language
+
+Once it was all you needed, then high-level languages like FORTRAN and COBOL came along.
+
+But still needed today:
+
+- environment bootstrap (operating system and program startup, runtime)
+- low-level library code such as stack management and context switching
+- performance (`math/big`)
+- access to features not exposed in language such as crypto instructions
+
+Also, perhaps most important: It's how we talk about the machine.
+
+Knowing assembly, even a little, means understanding computers better.
+
+* What does it look like?
+
+Some examples...
+
+* IBM System/360
+
+.code asm/360.s
+
+* Apollo 11 Guidance Computer
+
+.code asm/apollo.s
+
+* PDP-10
+
+.code asm/pdp10.s
+
+(From the MIT PDP-10 Info file)
+
+* PDP-11
+
+.code asm/pdp11.s
+
+(From Unix v6 `as/as13.s`)
+
+* Motorola 68000
+
+.code asm/68000.s
+
+(From Wikipedia)
+
+* CRAY-1
+
+.code asm/cray1.s
+
+(From Robert Griesemer's PhD thesis)
+
+
+* Common structure
+
+Columnar layout with function and variable declarations, labels, instructions.
+
+Instructions:
+
+ subroutine header
+ label:
+ instruction operand... ; comment
+ ...
+
+Operands:
+
+ register
+ literal constant
+ address
+ register indirection (register as address)
+ ...
+
+There are exceptions such as Cray (`A5` `A5+A14`) but they aren't conceptually different.
+
+CPUs are all pretty much the same.
+
+* Use that commonality
+
+We can use the common structure of all assemblers (CPUs, really) to construct a common grammar for all architectures.
+
+This realization took some time.
+
+The seeds were planted long ago.
+
+* Plan 9 assembly
+
+Around 1986, Ken Thompson wrote a C compiler for the National 32000 (Sequent SMP).
+Compiler generated pseudo-code, linker did instruction assignment.
+
+The "assembler" was just a way to write that pseudo-code textually.
+
+ MOVW $0, var
+
+might become (hypothetical example)
+
+ XORW R1, R1
+ STORE R1, var
+
+Note assembler emits the `MOVW`; the linker generates `XORW` and `STORE`.
+We call this _instruction_selection_.
+
+Or consider `RET`, which becomes `RET` or `JMP` `LR` or `JMP` `(R31)` or ...
+
+The assembler is just a way to hand-write the output the compiler produces.
+(Compiler does not feed assembler, unlike in many other systems.)
+
+* The pieces
+
+.image asm/arch1.png
+
+* The Plan 9 assemblers
+
+Assembler for each architecture was a separate C program with a Yacc grammar,
+adapted and partially rewritten for every architecture.
+
+`8a`, `6a`, `va` etc. corresponding to `8c`, `6c` `vc`, etc.
+(One-letter codes: `8` for 386, `6` for AMD64, `v` for MIPS, etc.)
+
+All very similar up front but different in detail.
+
+The earliest Go implementations used this design, adding Go compilers `8g`, `6g` but using the Plan 9 assemblers unchanged.
+
+The separation of (compiler/assembler)⇒linker allowed the Go linker to do more, including helping boot the runtime.
+
+* Go 1.3: Rearrange the pieces
+
+Goal: Move to a pure Go implementation.
+Preparation started in Go 1.3
+
+New library that (in part) does instruction selection: `"liblink"` (as of 1.5, `"obj"`).
+Call it from the compiler.
+
+Thus the first part of the old linker is now in the compiler.
+The compiler now emits (mostly) real instructions, not pseudo-instructions.
+
+Result: Slower compiler, but faster build.
+Instruction selection for library code done once, not every time you link a program.
+
+Assemblers also use `obj`.
+
+For both compiler and assembler, the _input_ is unchanged.
+In fact the whole _process_ is the same, just arranged differently.
+
+* The old pieces
+
+.image asm/arch1.png
+
+
+* The new pieces
+
+.image asm/arch2.png
+
+
+* Go 1.5: C must Go
+
+More prep in Go 1.4, then in Go 1.5, all tooling moved to Go.
+
+Compiler and linker machine-translated from C to Go.
+The old `liblink` became a new suite of libraries, `obj/...`:
+
+- `cmd/internal/obj` (portable part)
+- `cmd/internal/obj/x86` (architecture-specific part)
+- `cmd/internal/obj/arm` (architecture-specific part)
+- ...
+
+Previous presentations about this work:
+
+- Russ Cox at Gophercon 2014 (out of date): [[youtube.com/watch?v=QIE5nV5fDwA]]
+- Rob Pike at Gopherfest 2015: [[youtube.com/watch?v=cF1zJYkBW4A]]
+* Go 1.5: Compiler and linker as single programs
+
+The many compilers (`6g`, `8g` etc.) were replaced with a single tool: `compile`.
+`GOOS` and `GOARCH` (only!) specify the target operating system and architecture.
+
+ GOOS=darwin GOARCH=arm go tool compile prog.go
+
+Same for the linker: `6l`, `8l`, etc. become `go` `tool` `link`.
+
+How can a single binary handle all these architectures?
+
+Only one input language, only one output generator (the `obj` library).
+The target is configured when the tool starts.
+
+* Go 1.5 Assembler
+
+Unlike the old compilers, which shared much code, the old assemblers were all different programs.
+(Although they were very similar inside, they shared almost no code.)
+
+Proposal: Write a single `go` `tool` `asm` from scratch in Go, replacing all the old assemblers.
+
+`GOOS` and `GOARCH` tell you what the target is.
+
+But assembly language isn't Go. Every machine has a different assembly language.
+
+Well, not really! Not quite universal across machines, but ...
+
+* An example program
+
+Look at the generated assembly for this simple program:
+
+.code asm/add.go
+
+For each architecture, with some noise edited out:
+
+* 32-bit x86 (386)
+
+.code asm/386.s
+
+* 64-bit x86 (amd64)
+
+.code asm/amd64.s
+
+* 32-bit arm
+
+.code asm/arm.s
+
+* 64-bit arm (arm64)
+
+.code asm/arm64.s
+
+* S390 (s390x)
+
+.code asm/s390x.s
+
+* 64-bit MIPS (mips64)
+
+.code asm/mips64.s
+
+* 64-bit Power (ppc64le)
+
+.code asm/ppc64le.s
+
+* Common grammar
+
+They all look the same. (Partly by design, partly because they _are_ the same.)
+
+The only significant variation is the names of instructions and registers.
+Many details hidden, such as what `RET` is. (It's a pseudo-instruction.)
+
+(Offsets are determined by size of `int`, among other things.)
+
+The fortuitous syntax originated in Ken's National 32000 assembler.
+
+With common syntax and the `obj` library, can build a single assembler for all CPUs.
+
+* Aside: Downside
+
+Not the same assembly notation as the manufacturers'.
+Can be offputting to outsiders.
+
+On the other hand, this approach uses the same notation on all machines.
+New architectures can arrive without creating or learning new notation.
+
+A tradeoff worth making.
+
+* Design of the Go 1.5 assembler
+
+The apotheosis of assemblers.
+
+New program, entirely in Go.
+
+Common lexer and parser across all architectures.
+Each instruction parsed into an instruction description.
+That becomes a data structure passed to the new `obj` library.
+
+The core of the assembler has very little per-machine information.
+Instead, tables are constructed at run time, flavored by `$GOARCH`.
+
+An internal package, `cmd/asm/internal/arch`, creates these tables on the fly.
+Machine details are loaded from `obj`.
+
+* An example: initializing the 386
+
+.code asm/arch386._go /^import/,$
+
+Parser just does string matching to find the instruction.
+
+* An example: ADDW on 386
+
+Given an assembly run with `GOOS=386`, the instruction
+
+ ADDW AX, BX
+
+is parsed into in a data structure schematically like:
+
+ &obj.Prog{
+ As: arch.Instructions["ADDW"],
+ From: obj.Addr{Reg: arch.Register["AX"]},
+ To: obj.Addr{Reg: arch.Register["BX"]},
+ ...
+ }
+
+That gets passed to the `obj` library for encoding as a 386 instruction.
+
+This is a purely mechanical process devoid of semantics.
+
+* Validation
+
+Assembler does some validation:
+
+- lexical and syntactic correctness
+- operand syntax
+- (with some variation. e.g.: `[R2,R5,R8,g]` only legal on ARM)
+
+But all semantic checking is done by the `obj` library.
+
+If it can be turned into real instructions, it's legal!
+
+* Testing
+
+New assembler was tested against the old (C-written) ones.
+
+A/B testing at the bit level: Same input must give same output.
+Also reworked some parts of `obj` packages for better diagnostics and debugging.
+
+Did `386` first, then `amd64`, `arm`, and `ppc`. Each was easier than the last.
+
+No hardware manuals were opened during this process.
+
+* Result
+
+One Go program replaces many C/Yacc programs, so it's easier to maintain.
+As a Go program it can have proper tests.
+
+Dependent on `obj`, so correctness and completeness are relatively simple to guarantee.
+
+New assembler almost 100% compatible with previous ones.
+Incompatibilities were mostly inconsistencies.
+
+Portability is easy now.
+
+A new instruction set just needs connecting it up with the `obj` library,
+plus a minor amount of architecture-specific tuning and validation.
+
+Several architectures have been added since the assembler was created,
+most by the open source community.
+
+* Tables
+
+To a large extent, the assembler is now table-driven.
+Can we generate those tables?
+
+The disassemblers (used by `go` `tool` `pprof`) are created by machine processing of PDFs.
+The architecture definition is machine-readable, so use it!
+
+Plan to go the other way:
+
+Read in a PDF, write out `obj` library definitions and bind to assembler.
+Why write by hand when you can automate?
+
+Hope to have this working soon; basics are already in place.
+
+Result: a largely machine-generated assembler.
+
+* Conclusion
+
+Assembly language is essentially the same everywhere.
+
+Use that to build a *true* common assembly language.
+
+Customize it on the fly using dynamically loaded tables.
+
+And one day: create those tables automatically.
+
+
+A portable solution to a especially non-portable problem.
diff --git a/2016/asm/360.s b/2016/asm/360.s
new file mode 100644
index 0000000..5c9d60c
--- /dev/null
+++ b/2016/asm/360.s
@@ -0,0 +1,13 @@
+1 PRINT NOGEN
+2 STOCK1 START 0
+3 BEGIN BALR 11,0
+4 USING *,11
+5 MVC NEWOH,OLDOH
+6 AP NEWOH,RECPT
+7 AP NEWOH,ISSUE
+8 EOJ
+11 OLDOH DC PL4'9'
+12 RECPT DC PL4'4'
+13 ISSUE DC PL4'6'
+14 NEWOH DS PL4
+15 END BEGIN
diff --git a/2016/asm/386.s b/2016/asm/386.s
new file mode 100644
index 0000000..10a78ff
--- /dev/null
+++ b/2016/asm/386.s
@@ -0,0 +1,5 @@
+TEXT add(SB), $0-12
+ MOVL a+4(FP), BX
+ ADDL b+8(FP), BX
+ MOVL BX, 12(FP)
+ RET
diff --git a/2016/asm/68000.s b/2016/asm/68000.s
new file mode 100644
index 0000000..ba69d36
--- /dev/null
+++ b/2016/asm/68000.s
@@ -0,0 +1,15 @@
+strtolower public
+ link a6,#0 ;Set up stack frame
+ movea 8(a6),a0 ;A0 = src, from stack
+ movea 12(a6),a1 ;A1 = dst, from stack
+loop move.b (a0)+,d0 ;Load D0 from (src)
+ cmpi #'A',d0 ;If D0 < 'A',
+ blo copy ;skip
+ cmpi #'Z',d0 ;If D0 > 'Z',
+ bhi copy ;skip
+ addi #'a'-'A',d0 ;D0 = lowercase(D0)
+copy move.b d0,(a1)+ ;Store D0 to (dst)
+ bne loop ;Repeat while D0 <> NUL
+ unlk a6 ;Restore stack frame
+ rts ;Return
+ end
diff --git a/2016/asm/add.go b/2016/asm/add.go
new file mode 100644
index 0000000..4409aa1
--- /dev/null
+++ b/2016/asm/add.go
@@ -0,0 +1,5 @@
+package add
+
+func add(a, b int) int {
+ return a + b
+}
diff --git a/2016/asm/amd64.s b/2016/asm/amd64.s
new file mode 100644
index 0000000..117ede8
--- /dev/null
+++ b/2016/asm/amd64.s
@@ -0,0 +1,6 @@
+TEXT add(SB), $0-24
+ MOVQ b+16(FP), AX
+ MOVQ a+8(FP), CX
+ ADDQ CX, AX
+ MOVQ AX, 24(FP)
+ RET
diff --git a/2016/asm/apollo.s b/2016/asm/apollo.s
new file mode 100644
index 0000000..20af5bb
--- /dev/null
+++ b/2016/asm/apollo.s
@@ -0,0 +1,16 @@
+# TO ENTER A JOB REQUEST REQUIRING NO VAC AREA:
+
+ COUNT 02/EXEC
+
+NOVAC INHINT
+ AD FAKEPRET # LOC(MPAC +6) - LOC(QPRET)
+ TS NEWPRIO # PRIORITY OF NEW JOB + NOVAC C(FIXLOC)
+
+ EXTEND
+ INDEX Q # Q WILL BE UNDISTURBED THROUGHOUT.
+ DCA 0 # 2CADR OF JOB ENTERED.
+ DXCH NEWLOC
+ CAF EXECBANK
+ XCH FBANK
+ TS EXECTEM1
+ TCF NOVAC2 # ENTER EXECUTIVE BANK.
diff --git a/2016/asm/arch1.png b/2016/asm/arch1.png
new file mode 100644
index 0000000..f2d6e6e
--- /dev/null
+++ b/2016/asm/arch1.png
Binary files differ
diff --git a/2016/asm/arch2.png b/2016/asm/arch2.png
new file mode 100644
index 0000000..920709d
--- /dev/null
+++ b/2016/asm/arch2.png
Binary files differ
diff --git a/2016/asm/arch386._go b/2016/asm/arch386._go
new file mode 100644
index 0000000..9cbdafa
--- /dev/null
+++ b/2016/asm/arch386._go
@@ -0,0 +1,21 @@
+import (
+ "cmd/internal/obj"
+ "cmd/internal/obj/x86"
+)
+
+func archX86(linkArch *obj.LinkArch) *Arch {
+ register := make(map[string]int16)
+ // Create maps for easy lookup of instruction names etc.
+ for i, s := range x86.Register {
+ register[s] = int16(i + x86.REG_AL)
+ }
+ instructions := make(map[string]obj.As)
+ for i, s := range obj.Anames {
+ instructions[s] = x86.As(i)
+ }
+ return &Arch{
+ Instructions: instructions,
+ Register: register,
+ ...
+ }
+}
diff --git a/2016/asm/arm.s b/2016/asm/arm.s
new file mode 100644
index 0000000..41ffc2b
--- /dev/null
+++ b/2016/asm/arm.s
@@ -0,0 +1,6 @@
+TEXT add(SB), $-4-12
+ MOVW a(FP), R0
+ MOVW b+4(FP), R1
+ ADD R1, R0
+ MOVW R0, 8(FP)
+ RET
diff --git a/2016/asm/arm64.s b/2016/asm/arm64.s
new file mode 100644
index 0000000..0881803
--- /dev/null
+++ b/2016/asm/arm64.s
@@ -0,0 +1,6 @@
+TEXT add(SB), $-8-24
+ MOVD a(FP), R0
+ MOVD b+8(FP), R1
+ ADD R1, R0
+ MOVD R0, 16(FP)
+ RET
diff --git a/2016/asm/cray1.s b/2016/asm/cray1.s
new file mode 100644
index 0000000..90c5670
--- /dev/null
+++ b/2016/asm/cray1.s
@@ -0,0 +1,18 @@
+ident slice
+ V6 0 ; initialize S
+ A4 S0 ; initialize *x
+ A5 S1 ; initialize *y
+ A3 S2 ; initialize i
+loop S0 A3
+ JSZ exit ; if S0 == 0 goto exit
+ VL A3 ; set vector length
+ V11 ,A4,1 ; load slice of x[i], stride 1
+ V12 ,A5,1 ; load slice of y[i], stride 1
+ V13 V11 *F V12 ; slice of x[i] * y[i]
+ V6 V6 +F V13 ; partial sum
+ A14 VL ; get vector length of this iteration
+ A4 A4 + A14 ; *x = *x + VL
+ A5 A5 + A14 ; *y = *y + VL
+ A3 A3 - A14 ; i = i - VL
+ J loop
+ exit
diff --git a/2016/asm/mips64.s b/2016/asm/mips64.s
new file mode 100644
index 0000000..1c2ebea
--- /dev/null
+++ b/2016/asm/mips64.s
@@ -0,0 +1,6 @@
+TEXT add(SB), $-8-24
+ MOVV a(FP), R1
+ MOVV b+8(FP), R2
+ ADDVU R2, R1
+ MOVV R1, 16(FP)
+ RET
diff --git a/2016/asm/pdp10.s b/2016/asm/pdp10.s
new file mode 100644
index 0000000..3dec910
--- /dev/null
+++ b/2016/asm/pdp10.s
@@ -0,0 +1,14 @@
+TITLE COUNT
+
+A=1 ;Define a name for an accumulator.
+
+START: MOVSI A,-100 ;initialize loop counter.
+ ;A contains -100,,0
+LOOP: HRRZM A,TABLE(A) ;Use right half of A to index.
+ AOBJN A,LOOP ;Add 1 to both halves (-77,,1 -76,,2 etc.)
+ ;Jump if still negative.
+ .VALUE ;Halt program.
+
+TABLE: BLOCK 100 ;Assemble space to fill up.
+
+END START ;End the assembly.
diff --git a/2016/asm/pdp11.s b/2016/asm/pdp11.s
new file mode 100644
index 0000000..1992055
--- /dev/null
+++ b/2016/asm/pdp11.s
@@ -0,0 +1,19 @@
+/ a3 -- pdp-11 assembler pass 1
+
+assem:
+ jsr pc,readop
+ jsr pc,checkeos
+ br ealoop
+ tst ifflg
+ beq 3f
+ cmp r4,$200
+ blos assem
+ cmpb (r4),$21 /if
+ bne 2f
+ inc ifflg
+2:
+ cmpb (r4),$22 /endif
+ bne assem
+ dec ifflg
+ br assem
+
diff --git a/2016/asm/ppc64le.s b/2016/asm/ppc64le.s
new file mode 100644
index 0000000..2463e78
--- /dev/null
+++ b/2016/asm/ppc64le.s
@@ -0,0 +1,6 @@
+TEXT add(SB), $0-24
+ MOVD a(FP), R2
+ MOVD b+8(FP), R3
+ ADD R3, R2
+ MOVD R2, 16(FP)
+ RET
diff --git a/2016/asm/s390x.s b/2016/asm/s390x.s
new file mode 100644
index 0000000..8065154
--- /dev/null
+++ b/2016/asm/s390x.s
@@ -0,0 +1,6 @@
+TEXT add(SB), $0-24
+ MOVD a(FP), R1
+ MOVD b+8(FP), R2
+ ADD R2, R1, R1
+ MOVD R1, 16(FP)
+ RET