talks/2016/asm: add slides for assembler talk

From GopherCon, The Design of the Go Assembler

Change-Id: If585f57e4adc904138c9e381d3bec55b0e7e6c03
Reviewed-on: https://go-review.googlesource.com/27333
Reviewed-by: Andrew Gerrand <adg@golang.org>
diff --git a/2016/asm.slide b/2016/asm.slide
new file mode 100644
index 0000000..80c3bf3
--- /dev/null
+++ b/2016/asm.slide
@@ -0,0 +1,389 @@
+The Design of the Go Assembler
+
+GopherCon
+12 July 2016
+
+Rob Pike
+Google
+@rob_pike
+[[http://golang.org/s/plusrob][+RobPikeTheHuman]]
+http://golang.org/
+
+* Presentation on youtube.com
+
+Video is [[https://www.youtube.com/watch?v=KINIAgRpkDA][here]].
+
+
+* Motivation
+
+_Why_Learn_Assembler_Language?_
+
+_The_most_important_single_thing_to_realize_about_assembler_language_is_that_it_enables_the_programmer_to_use_all_System/360_machine_functions_as_if_he_were_coding_in_System/360_machine_language._
+
+— A Programmer's Introduction to IBM System/360 Assembler Language, 1970, page 4
+
+* We still need assembly language
+
+Once it was all you needed, then high-level languages like FORTRAN and COBOL came along.
+
+But still needed today:
+
+- environment bootstrap (operating system and program startup, runtime)
+- low-level library code such as stack management and context switching
+- performance (`math/big`)
+- access to features not exposed in language such as crypto instructions
+
+Also, perhaps most important: It's how we talk about the machine.
+
+Knowing assembly, even a little, means understanding computers better.
+
+* What does it look like?
+
+Some examples...
+
+* IBM System/360
+
+.code asm/360.s
+
+* Apollo 11 Guidance Computer
+
+.code asm/apollo.s
+
+* PDP-10
+
+.code asm/pdp10.s
+
+(From the MIT PDP-10 Info file)
+
+* PDP-11
+
+.code asm/pdp11.s
+
+(From Unix v6 `as/as13.s`)
+
+* Motorola 68000
+
+.code asm/68000.s
+
+(From Wikipedia)
+
+* CRAY-1
+
+.code asm/cray1.s
+
+(From Robert Griesemer's PhD thesis)
+
+
+* Common structure
+
+Columnar layout with function and variable declarations, labels, instructions.
+
+Instructions:
+
+	subroutine header
+	label:
+		instruction operand...	; comment
+		...
+
+Operands:
+
+	register
+	literal constant
+	address
+	register indirection (register as address)
+	...
+
+There are exceptions such as Cray (`A5` `A5+A14`) but they aren't conceptually different.
+
+CPUs are all pretty much the same.
+
+* Use that commonality
+
+We can use the common structure of all assemblers (CPUs, really) to construct a common grammar for all architectures.
+
+This realization took some time.
+
+The seeds were planted long ago.
+
+* Plan 9 assembly
+
+Around 1986, Ken Thompson wrote a C compiler for the National 32000 (Sequent SMP).
+Compiler generated pseudo-code, linker did instruction assignment.
+
+The "assembler" was just a way to write that pseudo-code textually.
+
+	MOVW    $0, var
+
+might become (hypothetical example)
+
+	XORW    R1, R1
+	STORE   R1, var
+
+Note assembler emits the `MOVW`; the linker generates `XORW` and `STORE`.
+We call this _instruction_selection_.
+
+Or consider `RET`, which becomes `RET` or `JMP` `LR` or `JMP` `(R31)` or ...
+
+The assembler is just a way to hand-write the output the compiler produces.
+(Compiler does not feed assembler, unlike in many other systems.)
+
+* The pieces
+
+.image asm/arch1.png
+
+* The Plan 9 assemblers
+
+Assembler for each architecture was a separate C program with a Yacc grammar,
+adapted and partially rewritten for every architecture.
+
+`8a`, `6a`, `va`, etc. corresponding to `8c`, `6c`, `vc`, etc.
+(One-letter codes: `8` for 386, `6` for AMD64, `v` for MIPS, etc.)
+
+All very similar up front but different in detail.
+
+The earliest Go implementations used this design, adding Go compilers `8g`, `6g` but using the Plan 9 assemblers unchanged.
+
+The separation of (compiler/assembler)⇒linker allowed the Go linker to do more, including helping boot the runtime.
+
+* Go 1.3: Rearrange the pieces
+
+Goal: Move to a pure Go implementation.
+Preparation started in Go 1.3.
+
+New library that (in part) does instruction selection: `"liblink"` (as of 1.5, `"obj"`).
+Call it from the compiler.
+
+Thus the first part of the old linker is now in the compiler.
+The compiler now emits (mostly) real instructions, not pseudo-instructions.
+
+Result: Slower compiler, but faster build.
+Instruction selection for library code done once, not every time you link a program.
+
+Assemblers also use `obj`.
+
+For both compiler and assembler, the _input_ is unchanged.
+In fact the whole _process_ is the same, just arranged differently.
+
+* The old pieces
+
+.image asm/arch1.png
+
+
+* The new pieces
+
+.image asm/arch2.png
+
+
+* Go 1.5: C must Go
+
+More prep in Go 1.4, then in Go 1.5, all tooling moved to Go.
+
+Compiler and linker machine-translated from C to Go.
+The old `liblink` became a new suite of libraries, `obj/...`:
+
+- `cmd/internal/obj`  (portable part)
+- `cmd/internal/obj/x86`    (architecture-specific part)
+- `cmd/internal/obj/arm`    (architecture-specific part)
+- ...
+
+Previous presentations about this work:
+
+- Russ Cox at GopherCon 2014 (out of date): [[https://www.youtube.com/watch?v=QIE5nV5fDwA]]
+- Rob Pike at GopherFest 2015: [[https://www.youtube.com/watch?v=cF1zJYkBW4A]]
+* Go 1.5: Compiler and linker as single programs
+
+The many compilers (`6g`, `8g` etc.) were replaced with a single tool: `compile`.
+`GOOS` and `GOARCH` (only!) specify the target operating system and architecture.
+
+	GOOS=darwin GOARCH=arm go tool compile prog.go
+
+Same for the linker: `6l`, `8l`, etc. become `go` `tool` `link`.
+
+How can a single binary handle all these architectures?
+
+Only one input language, only one output generator (the `obj` library).
+The target is configured when the tool starts.
+
+* Go 1.5 Assembler
+
+Unlike the old compilers, which shared much code, the old assemblers were all different programs.
+(Although they were very similar inside, they shared almost no code.)
+
+Proposal: Write a single `go` `tool` `asm` from scratch in Go, replacing all the old assemblers.
+
+`GOOS` and `GOARCH` tell you what the target is.
+
+But assembly language isn't Go. Every machine has a different assembly language.
+
+Well, not really! Not quite universal across machines, but ...
+
+* An example program
+
+Look at the generated assembly for this simple program:
+
+.code asm/add.go
+
+For each architecture, with some noise edited out:
+
+* 32-bit x86 (386)
+
+.code asm/386.s
+
+* 64-bit x86 (amd64)
+
+.code asm/amd64.s
+
+* 32-bit arm
+
+.code asm/arm.s
+
+* 64-bit arm (arm64)
+
+.code asm/arm64.s
+
+* S390 (s390x)
+
+.code asm/s390x.s
+
+* 64-bit MIPS (mips64)
+
+.code asm/mips64.s
+
+* 64-bit Power (ppc64le)
+
+.code asm/ppc64le.s
+
+* Common grammar
+
+They all look the same. (Partly by design, partly because they _are_ the same.)
+
+The only significant variation is the names of instructions and registers.
+Many details hidden, such as what `RET` is. (It's a pseudo-instruction.)
+
+(Offsets are determined by size of `int`, among other things.)
+
+The fortuitous syntax originated in Ken's National 32000 assembler.
+
+With common syntax and the `obj` library, can build a single assembler for all CPUs.
+
+* Aside: Downside
+
+Not the same assembly notation as the manufacturers'.
+Can be off-putting to outsiders.
+
+On the other hand, this approach uses the same notation on all machines.
+New architectures can arrive without creating or learning new notation.
+
+A tradeoff worth making.
+
+* Design of the Go 1.5 assembler
+
+The apotheosis of assemblers.
+
+New program, entirely in Go.
+
+Common lexer and parser across all architectures.
+Each instruction parsed into an instruction description.
+That becomes a data structure passed to the new `obj` library.
+
+The core of the assembler has very little per-machine information.
+Instead, tables are constructed at run time, flavored by `$GOARCH`.
+
+An internal package, `cmd/asm/internal/arch`, creates these tables on the fly.
+Machine details are loaded from `obj`.
+
+* An example: initializing the 386
+
+.code asm/arch386._go /^import/,$
+
+Parser just does string matching to find the instruction.
+
+* An example: ADDW on 386
+
+Given an assembly run with `GOARCH=386`, the instruction
+
+	ADDW AX, BX
+
+is parsed into a data structure schematically like:
+
+	&obj.Prog{
+		As: arch.Instructions["ADDW"],
+		From: obj.Addr{Reg: arch.Register["AX"]},
+		To: obj.Addr{Reg: arch.Register["BX"]},
+		...
+	}
+
+That gets passed to the `obj` library for encoding as a 386 instruction.
+
+This is a purely mechanical process devoid of semantics.
+
+* Validation
+
+Assembler does some validation:
+
+- lexical and syntactic correctness
+- operand syntax
+- (with some variation, e.g. `[R2,R5,R8,g]` is legal only on ARM)
+
+But all semantic checking is done by the `obj` library.
+
+If it can be turned into real instructions, it's legal!
+
+* Testing
+
+New assembler was tested against the old (C-written) ones.
+
+A/B testing at the bit level: Same input must give same output.
+Also reworked some parts of `obj` packages for better diagnostics and debugging.
+
+Did `386` first, then `amd64`, `arm`, and `ppc`. Each was easier than the last.
+
+No hardware manuals were opened during this process.
+
+* Result
+
+One Go program replaces many C/Yacc programs, so it's easier to maintain.
+As a Go program it can have proper tests.
+
+Dependent on `obj`, so correctness and completeness are relatively simple to guarantee.
+
+New assembler almost 100% compatible with previous ones.
+Incompatibilities were mostly inconsistencies.
+
+Portability is easy now.
+
+Supporting a new instruction set just means connecting it to the `obj` library,
+plus a minor amount of architecture-specific tuning and validation.
+
+Several architectures have been added since the assembler was created,
+most by the open source community.
+
+* Tables
+
+To a large extent, the assembler is now table-driven.
+Can we generate those tables?
+
+The disassemblers (used by `go` `tool` `pprof`) are created by machine processing of PDFs.
+The architecture definition is machine-readable, so use it!
+
+Plan to go the other way:
+
+Read in a PDF, write out `obj` library definitions and bind to assembler.
+Why write by hand when you can automate?
+
+Hope to have this working soon; basics are already in place.
+
+Result: a largely machine-generated assembler.
+
+* Conclusion
+
+Assembly language is essentially the same everywhere.
+
+Use that to build a *true* common assembly language.
+
+Customize it on the fly using dynamically loaded tables.
+
+And one day: create those tables automatically.
+
+
+A portable solution to an especially non-portable problem.
diff --git a/2016/asm/360.s b/2016/asm/360.s
new file mode 100644
index 0000000..5c9d60c
--- /dev/null
+++ b/2016/asm/360.s
@@ -0,0 +1,13 @@
+1        PRINT NOGEN
+2 STOCK1 START 0
+3 BEGIN  BALR  11,0
+4        USING *,11
+5        MVC   NEWOH,OLDOH
+6        AP    NEWOH,RECPT
+7        AP    NEWOH,ISSUE
+8        EOJ
+11 OLDOH DC    PL4'9'
+12 RECPT DC    PL4'4'
+13 ISSUE DC    PL4'6'
+14 NEWOH DS    PL4
+15       END   BEGIN
diff --git a/2016/asm/386.s b/2016/asm/386.s
new file mode 100644
index 0000000..10a78ff
--- /dev/null
+++ b/2016/asm/386.s
@@ -0,0 +1,5 @@
+TEXT add(SB), $0-12
+    MOVL    a+4(FP), BX
+    ADDL    b+8(FP), BX
+    MOVL    BX, 12(FP)
+    RET
diff --git a/2016/asm/68000.s b/2016/asm/68000.s
new file mode 100644
index 0000000..ba69d36
--- /dev/null
+++ b/2016/asm/68000.s
@@ -0,0 +1,15 @@
+strtolower      public
+                link    a6,#0           ;Set up stack frame
+                movea   8(a6),a0        ;A0 = src, from stack
+                movea   12(a6),a1       ;A1 = dst, from stack
+loop            move.b  (a0)+,d0        ;Load D0 from (src)
+                cmpi    #'A',d0         ;If D0 < 'A',
+                blo     copy            ;skip
+                cmpi    #'Z',d0         ;If D0 > 'Z',
+                bhi     copy            ;skip
+                addi    #'a'-'A',d0     ;D0 = lowercase(D0)
+copy            move.b  d0,(a1)+        ;Store D0 to (dst)
+                bne     loop            ;Repeat while D0 <> NUL
+                unlk    a6              ;Restore stack frame
+                rts                     ;Return
+                end
diff --git a/2016/asm/add.go b/2016/asm/add.go
new file mode 100644
index 0000000..4409aa1
--- /dev/null
+++ b/2016/asm/add.go
@@ -0,0 +1,5 @@
+package add
+
+func add(a, b int) int {
+	return a + b
+}
diff --git a/2016/asm/amd64.s b/2016/asm/amd64.s
new file mode 100644
index 0000000..117ede8
--- /dev/null
+++ b/2016/asm/amd64.s
@@ -0,0 +1,6 @@
+TEXT add(SB), $0-24
+    MOVQ    b+16(FP), AX
+    MOVQ    a+8(FP), CX
+    ADDQ    CX, AX
+    MOVQ    AX, 24(FP)
+    RET
diff --git a/2016/asm/apollo.s b/2016/asm/apollo.s
new file mode 100644
index 0000000..20af5bb
--- /dev/null
+++ b/2016/asm/apollo.s
@@ -0,0 +1,16 @@
+# TO ENTER A JOB REQUEST REQUIRING NO VAC AREA:
+
+          COUNT     02/EXEC
+                
+NOVAC     INHINT
+          AD        FAKEPRET     # LOC(MPAC +6) - LOC(QPRET)
+          TS        NEWPRIO      # PRIORITY OF NEW JOB + NOVAC C(FIXLOC)
+
+          EXTEND
+          INDEX     Q            # Q WILL BE UNDISTURBED THROUGHOUT.
+          DCA       0            # 2CADR OF JOB ENTERED.
+          DXCH      NEWLOC
+          CAF       EXECBANK
+          XCH       FBANK
+          TS        EXECTEM1
+          TCF       NOVAC2       # ENTER EXECUTIVE BANK.
diff --git a/2016/asm/arch1.png b/2016/asm/arch1.png
new file mode 100644
index 0000000..f2d6e6e
--- /dev/null
+++ b/2016/asm/arch1.png
Binary files differ
diff --git a/2016/asm/arch2.png b/2016/asm/arch2.png
new file mode 100644
index 0000000..920709d
--- /dev/null
+++ b/2016/asm/arch2.png
Binary files differ
diff --git a/2016/asm/arch386._go b/2016/asm/arch386._go
new file mode 100644
index 0000000..9cbdafa
--- /dev/null
+++ b/2016/asm/arch386._go
@@ -0,0 +1,21 @@
+import (
+	"cmd/internal/obj"
+	"cmd/internal/obj/x86"
+)
+
+func archX86(linkArch *obj.LinkArch) *Arch {
+	register := make(map[string]int16)
+	// Create maps for easy lookup of instruction names etc.
+	for i, s := range x86.Register {
+		register[s] = int16(i + x86.REG_AL)
+	}
+	instructions := make(map[string]obj.As)
+	for i, s := range obj.Anames {
+		instructions[s] = obj.As(i)
+	}
+	return &Arch{
+		Instructions:   instructions,
+		Register:       register,
+		...
+	}
+}
diff --git a/2016/asm/arm.s b/2016/asm/arm.s
new file mode 100644
index 0000000..41ffc2b
--- /dev/null
+++ b/2016/asm/arm.s
@@ -0,0 +1,6 @@
+TEXT add(SB), $-4-12
+    MOVW    a(FP), R0
+    MOVW    b+4(FP), R1
+    ADD     R1, R0
+    MOVW    R0, 8(FP)
+    RET
diff --git a/2016/asm/arm64.s b/2016/asm/arm64.s
new file mode 100644
index 0000000..0881803
--- /dev/null
+++ b/2016/asm/arm64.s
@@ -0,0 +1,6 @@
+TEXT add(SB), $-8-24
+    MOVD    a(FP), R0
+    MOVD    b+8(FP), R1
+    ADD     R1, R0
+    MOVD    R0, 16(FP)
+    RET
diff --git a/2016/asm/cray1.s b/2016/asm/cray1.s
new file mode 100644
index 0000000..90c5670
--- /dev/null
+++ b/2016/asm/cray1.s
@@ -0,0 +1,18 @@
+ident slice
+         V6        0               ; initialize S
+         A4        S0              ; initialize *x
+         A5        S1              ; initialize *y
+         A3        S2              ; initialize i
+loop     S0        A3
+         JSZ       exit            ; if S0 == 0 goto exit
+         VL        A3              ; set vector length
+         V11       ,A4,1           ; load slice of x[i], stride 1
+         V12       ,A5,1           ; load slice of y[i], stride 1
+         V13       V11 *F V12      ; slice of x[i] * y[i]
+         V6        V6 +F V13       ; partial sum
+         A14       VL              ; get vector length of this iteration
+         A4        A4 + A14        ; *x = *x + VL
+         A5        A5 + A14        ; *y = *y + VL
+         A3        A3 - A14        ; i = i - VL
+         J         loop
+exit
diff --git a/2016/asm/mips64.s b/2016/asm/mips64.s
new file mode 100644
index 0000000..1c2ebea
--- /dev/null
+++ b/2016/asm/mips64.s
@@ -0,0 +1,6 @@
+TEXT add(SB), $-8-24
+    MOVV    a(FP), R1
+    MOVV    b+8(FP), R2
+    ADDVU   R2, R1
+    MOVV    R1, 16(FP)
+    RET
diff --git a/2016/asm/pdp10.s b/2016/asm/pdp10.s
new file mode 100644
index 0000000..3dec910
--- /dev/null
+++ b/2016/asm/pdp10.s
@@ -0,0 +1,14 @@
+TITLE   COUNT
+ 
+A=1                             ;Define a name for an accumulator.
+
+START:  MOVSI A,-100            ;initialize loop counter.
+                                ;A contains -100,,0
+LOOP:   HRRZM A,TABLE(A)        ;Use right half of A to index.
+        AOBJN A,LOOP            ;Add 1 to both halves (-77,,1 -76,,2 etc.)
+                                ;Jump if still negative.
+        .VALUE                  ;Halt program.
+
+TABLE:  BLOCK 100               ;Assemble space to fill up.
+
+END START                       ;End the assembly.
diff --git a/2016/asm/pdp11.s b/2016/asm/pdp11.s
new file mode 100644
index 0000000..1992055
--- /dev/null
+++ b/2016/asm/pdp11.s
@@ -0,0 +1,19 @@
+/ a3 -- pdp-11 assembler pass 1
+
+assem:
+        jsr     pc,readop
+        jsr     pc,checkeos
+        br      ealoop
+        tst     ifflg
+        beq     3f
+        cmp     r4,$200
+        blos    assem
+        cmpb    (r4),$21   /if
+        bne     2f
+        inc     ifflg
+2:
+        cmpb    (r4),$22   /endif
+        bne     assem
+        dec     ifflg
+        br      assem
+
diff --git a/2016/asm/ppc64le.s b/2016/asm/ppc64le.s
new file mode 100644
index 0000000..2463e78
--- /dev/null
+++ b/2016/asm/ppc64le.s
@@ -0,0 +1,6 @@
+TEXT add(SB), $0-24
+    MOVD    a(FP), R2
+    MOVD    b+8(FP), R3
+    ADD     R3, R2
+    MOVD    R2, 16(FP)
+    RET
diff --git a/2016/asm/s390x.s b/2016/asm/s390x.s
new file mode 100644
index 0000000..8065154
--- /dev/null
+++ b/2016/asm/s390x.s
@@ -0,0 +1,6 @@
+TEXT add(SB), $0-24
+    MOVD    a(FP), R1
+    MOVD    b+8(FP), R2
+    ADD     R2, R1, R1
+    MOVD    R1, 16(FP)
+    RET