| <!--{ |
| "Title": "A Quick Guide to Go's Assembler", |
| "Path": "/doc/asm" |
| }--> |
| |
| <h2 id="introduction">A Quick Guide to Go's Assembler</h2> |
| |
| <p> |
| This document is a quick outline of the unusual form of assembly language used by the <code>gc</code> |
| suite of Go compilers (<code>6g</code>, <code>8g</code>, etc.). |
| The document is not comprehensive. |
| </p> |
| |
| <p> |
| The assembler is based on the input to the Plan 9 assemblers, which is documented in detail |
| <a href="http://plan9.bell-labs.com/sys/doc/asm.html">on the Plan 9 site</a>. |
| If you plan to write assembly language, you should read that document although much of it is Plan 9-specific. |
| This document provides a summary of the syntax and |
| describes the peculiarities that apply when writing assembly code to interact with Go. |
| </p> |
| |
| <p> |
| The most important thing to know about Go's assembler is that it is not a direct representation of the underlying machine. |
| Some of the details map precisely to the machine, but some do not. |
| This is because the compiler suite (see |
| <a href="http://plan9.bell-labs.com/sys/doc/compiler.html">this description</a>) |
| needs no assembler pass in the usual pipeline. |
| Instead, the compiler emits a kind of incompletely defined instruction set, in binary form, which the linker |
| then completes. |
| In particular, the linker does instruction selection, so when you see an instruction like <code>MOV</code> |
| what the linker actually generates for that operation might not be a move instruction at all, perhaps a clear or load. |
| Or it might correspond exactly to the machine instruction with that name. |
| In general, machine-specific operations tend to appear as themselves, while more general concepts like |
| memory move and subroutine call and return are more abstract. |
| The details vary with architecture, and we apologize for the imprecision; the situation is not well-defined. |
| </p> |
| |
| <p> |
| The assembler program is a way to generate that intermediate, incompletely defined instruction sequence |
| as input for the linker. |
| If you want to see what the instructions look like in assembly for a given architecture, say amd64, there |
| are many examples in the sources of the standard library, in packages such as |
| <a href="/pkg/runtime/"><code>runtime</code></a> and |
| <a href="/pkg/math/big/"><code>math/big</code></a>. |
| You can also examine what the compiler emits as assembly code: |
| </p> |
| |
| <pre> |
| $ cat x.go |
| package main |
| |
| func main() { |
| println(3) |
| } |
| $ go tool 6g -S x.go # or: go build -gcflags -S x.go |
| |
| --- prog list "main" --- |
| 0000 (x.go:3) TEXT main+0(SB),$8-0 |
| 0001 (x.go:3) FUNCDATA $0,gcargs·0+0(SB) |
| 0002 (x.go:3) FUNCDATA $1,gclocals·0+0(SB) |
| 0003 (x.go:4) MOVQ $3,(SP) |
| 0004 (x.go:4) PCDATA $0,$8 |
| 0005 (x.go:4) CALL ,runtime.printint+0(SB) |
| 0006 (x.go:4) PCDATA $0,$-1 |
| 0007 (x.go:4) PCDATA $0,$0 |
| 0008 (x.go:4) CALL ,runtime.printnl+0(SB) |
| 0009 (x.go:4) PCDATA $0,$-1 |
| 0010 (x.go:5) RET , |
| ... |
| </pre> |
| |
| <p> |
| The <code>FUNCDATA</code> and <code>PCDATA</code> directives contain information |
| for use by the garbage collector; they are introduced by the compiler. |
| </p> |
| |
| <!-- Commenting out because the feature is gone but it's popular and may come back. |
| |
| <p> |
| To see what gets put in the binary after linking, add the <code>-a</code> flag to the linker: |
| </p> |
| |
| <pre> |
| $ go tool 6l -a x.6 # or: go build -ldflags -a x.go |
| codeblk [0x2000,0x1d059) at offset 0x1000 |
| 002000 main.main | (3) TEXT main.main+0(SB),$8 |
| 002000 65488b0c25a0080000 | (3) MOVQ 2208(GS),CX |
| 002009 483b21 | (3) CMPQ SP,(CX) |
| 00200c 7707 | (3) JHI ,2015 |
| 00200e e83da20100 | (3) CALL ,1c250+runtime.morestack00 |
| 002013 ebeb | (3) JMP ,2000 |
| 002015 4883ec08 | (3) SUBQ $8,SP |
| 002019 | (3) FUNCDATA $0,main.gcargs·0+0(SB) |
| 002019 | (3) FUNCDATA $1,main.gclocals·0+0(SB) |
| 002019 48c7042403000000 | (4) MOVQ $3,(SP) |
| 002021 | (4) PCDATA $0,$8 |
| 002021 e8aad20000 | (4) CALL ,f2d0+runtime.printint |
| 002026 | (4) PCDATA $0,$-1 |
| 002026 | (4) PCDATA $0,$0 |
| 002026 e865d40000 | (4) CALL ,f490+runtime.printnl |
| 00202b | (4) PCDATA $0,$-1 |
| 00202b 4883c408 | (5) ADDQ $8,SP |
| 00202f c3 | (5) RET , |
| ... |
| </pre> |
| |
| --> |
| |
| <h3 id="symbols">Symbols</h3> |
| |
| <p> |
| Some symbols, such as <code>PC</code>, <code>R0</code> and <code>SP</code>, are predeclared and refer to registers. |
| There are two other predeclared symbols, <code>SB</code> (static base) and <code>FP</code> (frame pointer). |
| All user-defined symbols other than jump labels are written as offsets to these pseudo-registers. |
| </p> |
| |
| <p> |
| The <code>SB</code> pseudo-register can be thought of as the origin of memory, so the symbol <code>foo(SB)</code> |
| is the name <code>foo</code> as an address in memory. |
| This form is used to name global functions and data. |
| Adding <code><></code> to the name, as in <code>foo<>(SB)</code>, makes the name |
| visible only in the current source file, like a top-level <code>static</code> declaration in a C file. |
| </p> |
| |
| <p> |
| The <code>FP</code> pseudo-register is a virtual frame pointer |
| used to refer to function arguments. |
| The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register. |
| Thus <code>0(FP)</code> is the first argument to the function, |
| <code>8(FP)</code> is the second (on a 64-bit machine), and so on. |
| When referring to a function argument this way, it is conventional to place the name |
| at the beginning, as in <code>first_arg+0(FP)</code> and <code>second_arg+8(FP)</code>. |
| Some of the assemblers enforce this convention, rejecting plain <code>0(FP)</code> and <code>8(FP)</code>. |
| For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the argument names |
| and offsets match. |
| On 32-bit systems, the low and high 32 bits of a 64-bit value are distinguished by adding |
| a <code>_lo</code> or <code>_hi</code> suffix to the name, as in <code>arg_lo+0(FP)</code> or <code>arg_hi+4(FP)</code>. |
| If a Go prototype does not name its result, the expected assembly name is <code>ret</code>. |
| </p> |
| |
| <p> |
| The <code>SP</code> pseudo-register is a virtual stack pointer |
| used to refer to frame-local variables and the arguments being |
| prepared for function calls. |
| It points to the top of the local stack frame, so references should use negative offsets |
| in the range [−framesize, 0): |
| <code>x-8(SP)</code>, <code>y-4(SP)</code>, and so on. |
| On architectures with a real register named <code>SP</code>, the name prefix distinguishes |
| references to the virtual stack pointer from references to the architectural <code>SP</code> register. |
| That is, <code>x-8(SP)</code> and <code>-8(SP)</code> are different memory locations: |
| the first refers to the virtual stack pointer pseudo-register, while the second refers to the |
| hardware's <code>SP</code> register. |
| </p> |
| |
| <p> |
| Instructions, registers, and assembler directives are always in UPPER CASE to remind you |
| that assembly programming is a fraught endeavor. |
| (Exception: the <code>g</code> register renaming on ARM.) |
| </p> |
| |
| <p> |
| In Go object files and binaries, the full name of a symbol is the |
| package path followed by a period and the symbol name: |
| <code>fmt.Printf</code> or <code>math/rand.Int</code>. |
| Because the assembler's parser treats period and slash as punctuation, |
| those strings cannot be used directly as identifier names. |
| Instead, the assembler allows the middle dot character U+00B7 |
| and the division slash U+2215 in identifiers and rewrites them to |
| plain period and slash. |
| Within an assembler source file, the symbols above are written as |
| <code>fmt·Printf</code> and <code>math∕rand·Int</code>. |
| The assembly listings generated by the compilers when using the <code>-S</code> flag |
| show the period and slash directly instead of the Unicode replacements |
| required by the assemblers. |
| </p> |
| |
| <p> |
| Most hand-written assembly files do not include the full package path |
| in symbol names, because the linker inserts the package path of the current |
| object file at the beginning of any name starting with a period: |
| in an assembly source file within the math/rand package implementation, |
| the package's Int function can be referred to as <code>·Int</code>. |
| This convention avoids the need to hard-code a package's import path in its |
| own source code, making it easier to move the code from one location to another. |
| </p> |
| |
| <h3 id="directives">Directives</h3> |
| |
| <p> |
| The assembler uses various directives to bind text and data to symbol names. |
| For example, here is a simple complete function definition. The <code>TEXT</code> |
| directive declares the symbol <code>runtime·profileloop</code> and the instructions |
| that follow form the body of the function. |
| The last instruction in a <code>TEXT</code> block must be some sort of jump, usually a <code>RET</code> (pseudo-)instruction. |
| (If it's not, the linker will append a jump-to-itself instruction; there is no fallthrough in <code>TEXTs</code>.) |
| After the symbol, the arguments are flags (see below) |
| and the frame size, a constant (but see below): |
| </p> |
| |
| <pre> |
| TEXT runtime·profileloop(SB),NOSPLIT,$8 |
| MOVQ $runtime·profileloop1(SB), CX |
| MOVQ CX, 0(SP) |
| CALL runtime·externalthreadhandler(SB) |
| RET |
| </pre> |
| |
| <p> |
| In the general case, the frame size is followed by an argument size, separated by a minus sign. |
| (It's not a subtraction, just idiosyncratic syntax.) |
| The frame size <code>$24-8</code> states that the function has a 24-byte frame |
| and is called with 8 bytes of argument, which live on the caller's frame. |
| If <code>NOSPLIT</code> is not specified for the <code>TEXT</code>, |
| the argument size must be provided. |
| For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the |
| argument size is correct. |
| </p> |
| |
| <p> |
| Note that the symbol name uses a middle dot to separate the components and is specified as an offset from the |
| static base pseudo-register <code>SB</code>. |
| This function would be called from Go source for package <code>runtime</code> using the |
| simple name <code>profileloop</code>. |
| </p> |
| |
| <p> |
| Global data symbols are defined by a sequence of initializing |
| <code>DATA</code> directives followed by a <code>GLOBL</code> directive. |
| Each <code>DATA</code> directive initializes a section of the |
| corresponding memory. |
| The memory not explicitly initialized is zeroed. |
| The general form of the <code>DATA</code> directive is |
| |
| <pre> |
| DATA symbol+offset(SB)/width, value |
| </pre> |
| |
| <p> |
| which initializes the symbol memory at the given offset and width with the given value. |
| The <code>DATA</code> directives for a given symbol must be written with increasing offsets. |
| </p> |
| |
| <p> |
| The <code>GLOBL</code> directive declares a symbol to be global. |
| The arguments are optional flags and the size of the data being declared as a global, |
| which will have initial value all zeros unless a <code>DATA</code> directive |
| has initialized it. |
| The <code>GLOBL</code> directive must follow any corresponding <code>DATA</code> directives. |
| </p> |
| |
| <p> |
| For example, |
| </p> |
| |
| <pre> |
| DATA divtab<>+0x00(SB)/4, $0xf4f8fcff |
| DATA divtab<>+0x04(SB)/4, $0xe6eaedf0 |
| ... |
| DATA divtab<>+0x3c(SB)/4, $0x81828384 |
| GLOBL divtab<>(SB), RODATA, $64 |
| |
| GLOBL runtime·tlsoffset(SB), NOPTR, $4 |
| </pre> |
| |
| <p> |
| declares and initializes <code>divtab<></code>, a read-only 64-byte table of 4-byte integer values, |
| and declares <code>runtime·tlsoffset</code>, a 4-byte, implicitly zeroed variable that |
| contains no pointers. |
| </p> |
| |
| <p> |
| There may be one or two arguments to the directives. |
| If there are two, the first is a bit mask of flags, |
| which can be written as numeric expressions, added or or-ed together, |
| or can be set symbolically for easier absorption by a human. |
| Their values, defined in the standard <code>#include</code> file <code>textflag.h</code>, are: |
| </p> |
| |
| <ul> |
| <li> |
| <code>NOPROF</code> = 1 |
| <br> |
| (For <code>TEXT</code> items.) |
| Don't profile the marked function. This flag is deprecated. |
| </li> |
| <li> |
| <code>DUPOK</code> = 2 |
| <br> |
| It is legal to have multiple instances of this symbol in a single binary. |
| The linker will choose one of the duplicates to use. |
| </li> |
| <li> |
| <code>NOSPLIT</code> = 4 |
| <br> |
| (For <code>TEXT</code> items.) |
| Don't insert the preamble to check if the stack must be split. |
| The frame for the routine, plus anything it calls, must fit in the |
| spare space at the top of the stack segment. |
| Used to protect routines such as the stack splitting code itself. |
| </li> |
| <li> |
| <code>RODATA</code> = 8 |
| <br> |
| (For <code>DATA</code> and <code>GLOBL</code> items.) |
| Put this data in a read-only section. |
| </li> |
| <li> |
| <code>NOPTR</code> = 16 |
| <br> |
| (For <code>DATA</code> and <code>GLOBL</code> items.) |
| This data contains no pointers and therefore does not need to be |
| scanned by the garbage collector. |
| </li> |
| <li> |
| <code>WRAPPER</code> = 32 |
| <br> |
| (For <code>TEXT</code> items.) |
| This is a wrapper function and should not count as disabling <code>recover</code>. |
| </li> |
| </ul> |
| |
| <h3 id="runtime">Runtime Coordination</h3> |
| |
| <p> |
| For garbage collection to run correctly, the runtime must know the |
| location of pointers in all global data and in most stack frames. |
| The Go compiler emits this information when compiling Go source files, |
| but assembly programs must define it explicitly. |
| </p> |
| |
| <p> |
| A data symbol marked with the <code>NOPTR</code> flag (see above) |
| is treated as containing no pointers to runtime-allocated data. |
| A data symbol with the <code>RODATA</code> flag |
| is allocated in read-only memory and is therefore treated |
| as implicitly marked <code>NOPTR</code>. |
| A data symbol with a total size smaller than a pointer |
| is also treated as implicitly marked <code>NOPTR</code>. |
| It is not possible to define a symbol containing pointers in an assembly source file; |
| such a symbol must be defined in a Go source file instead. |
| Assembly source can still refer to the symbol by name |
| even without <code>DATA</code> and <code>GLOBL</code> directives. |
| A good general rule of thumb is to define all non-<code>RODATA</code> |
| symbols in Go instead of in assembly. |
| </p> |
| |
| <p> |
| Each function also needs annotations giving the location of |
| live pointers in its arguments, results, and local stack frame. |
| For an assembly function with no pointer results and |
| either no local stack frame or no function calls, |
| the only requirement is to define a Go prototype for the function |
| in a Go source file in the same package. The name of the assembly |
| function must not contain the package name component (for example, |
| function <code>Syscall</code> in package <code>syscall</code> should |
| use the name <code>·Syscall</code> instead of the equivalent name |
| <code>syscall·Syscall</code> in its <code>TEXT</code> directive). |
| For more complex situations, explicit annotation is needed. |
| These annotations use pseudo-instructions defined in the standard |
| <code>#include</code> file <code>funcdata.h</code>. |
| </p> |
| |
| <p> |
| If a function has no arguments and no results, |
| the pointer information can be omitted. |
| This is indicated by an argument size annotation of <code>$<i>n</i>-0</code> |
| on the <code>TEXT</code> instruction. |
| Otherwise, pointer information must be provided by |
| a Go prototype for the function in a Go source file, |
| even for assembly functions not called directly from Go. |
| (The prototype will also let <code>go</code> <code>vet</code> check the argument references.) |
| At the start of the function, the arguments are assumed |
| to be initialized but the results are assumed uninitialized. |
| If the results will hold live pointers during a call instruction, |
| the function should start by zeroing the results and then |
| executing the pseudo-instruction <code>GO_RESULTS_INITIALIZED</code>. |
| This instruction records that the results are now initialized |
| and should be scanned during stack movement and garbage collection. |
| It is typically easier to arrange that assembly functions do not |
| return pointers or do not contain call instructions; |
| no assembly functions in the standard library use |
| <code>GO_RESULTS_INITIALIZED</code>. |
| </p> |
| |
| <p> |
| If a function has no local stack frame, |
| the pointer information can be omitted. |
| This is indicated by a local frame size annotation of <code>$0-<i>n</i></code> |
| on the <code>TEXT</code> instruction. |
| The pointer information can also be omitted if the |
| function contains no call instructions. |
| Otherwise, the local stack frame must not contain pointers, |
| and the assembly must confirm this fact by executing the |
| pseudo-instruction <code>NO_LOCAL_POINTERS</code>. |
| Because stack resizing is implemented by moving the stack, |
| the stack pointer may change during any function call: |
| even pointers to stack data must not be kept in local variables. |
| </p> |
| |
| <h2 id="architectures">Architecture-specific details</h2> |
| |
| <p> |
| It is impractical to list all the instructions and other details for each machine. |
| To see what instructions are defined for a given machine, say 32-bit Intel x86, |
| look in the top-level header file for the corresponding linker, in this case <code>8l</code>. |
| That is, the file <code>$GOROOT/src/cmd/8l/8.out.h</code> contains a C enumeration, called <code>as</code>, |
| of the instructions and their spellings as known to the assembler and linker for that architecture. |
| In that file you'll find a declaration that begins |
| </p> |
| |
| <pre> |
| enum as |
| { |
| AXXX, |
| AAAA, |
| AAAD, |
| AAAM, |
| AAAS, |
| AADCB, |
| ... |
| </pre> |
| |
| <p> |
| Each instruction begins with a initial capital <code>A</code> in this list, so <code>AADCB</code> |
| represents the <code>ADCB</code> (add carry byte) instruction. |
| The enumeration is in alphabetical order, plus some late additions (<code>AXXX</code> occupies |
| the zero slot as an invalid instruction). |
| The sequence has nothing to do with the actual encoding of the machine instructions. |
| Again, the linker takes care of that detail. |
| </p> |
| |
| <p> |
| One detail evident in the examples from the previous sections is that data in the instructions flows from left to right: |
| <code>MOVQ</code> <code>$0,</code> <code>CX</code> clears <code>CX</code>. |
| This convention applies even on architectures where the usual mode is the opposite direction. |
| </p> |
| |
| <p> |
| Here follows some descriptions of key Go-specific details for the supported architectures. |
| </p> |
| |
| <h3 id="x86">32-bit Intel 386</h3> |
| |
| <p> |
| The runtime pointer to the <code>g</code> structure is maintained |
| through the value of an otherwise unused (as far as Go is concerned) register in the MMU. |
| A OS-dependent macro <code>get_tls</code> is defined for the assembler if the source includes |
| an architecture-dependent header file, like this: |
| </p> |
| |
| <pre> |
| #include "zasm_GOOS_GOARCH.h" |
| </pre> |
| |
| <p> |
| Within the runtime, the <code>get_tls</code> macro loads its argument register |
| with a pointer to the <code>g</code> pointer, and the <code>g</code> struct |
| contains the <code>m</code> pointer. |
| The sequence to load <code>g</code> and <code>m</code> using <code>CX</code> looks like this: |
| </p> |
| |
| <pre> |
| get_tls(CX) |
| MOVL g(CX), AX // Move g into AX. |
| MOVL g_m(AX), BX // Move g->m into BX. |
| </pre> |
| |
| <h3 id="amd64">64-bit Intel 386 (a.k.a. amd64)</h3> |
| |
| <p> |
| The assembly code to access the <code>m</code> and <code>g</code> |
| pointers is the same as on the 386, except it uses <code>MOVQ</code> rather than |
| <code>MOVL</code>: |
| </p> |
| |
| <pre> |
| get_tls(CX) |
| MOVQ g(CX), AX // Move g into AX. |
| MOVQ g_m(AX), BX // Move g->m into BX. |
| </pre> |
| |
| <h3 id="arm">ARM</h3> |
| |
| <p> |
| The registers <code>R10</code> and <code>R11</code> |
| are reserved by the compiler and linker. |
| </p> |
| |
| <p> |
| <code>R10</code> points to the <code>g</code> (goroutine) structure. |
| Within assembler source code, this pointer must be referred to as <code>g</code>; |
| the name <code>R10</code> is not recognized. |
| </p> |
| |
| <p> |
| To make it easier for people and compilers to write assembly, the ARM linker |
| allows general addressing forms and pseudo-operations like <code>DIV</code> or <code>MOD</code> |
| that may not be expressible using a single hardware instruction. |
| It implements these forms as multiple instructions, often using the <code>R11</code> register |
| to hold temporary values. |
| Hand-written assembly can use <code>R11</code>, but doing so requires |
| being sure that the linker is not also using it to implement any of the other |
| instructions in the function. |
| </p> |
| |
| <p> |
| When defining a <code>TEXT</code>, specifying frame size <code>$-4</code> |
| tells the linker that this is a leaf function that does not need to save <code>LR</code> on entry. |
| </p> |
| |
| <p> |
| The name <code>SP</code> always refers to the virtual stack pointer described earlier. |
| For the hardware register, use <code>R13</code>. |
| </p> |
| |
| <h3 id="unsupported_opcodes">Unsupported opcodes</h3> |
| |
| <p> |
| The assemblers are designed to support the compiler so not all hardware instructions |
| are defined for all architectures: if the compiler doesn't generate it, it might not be there. |
| If you need to use a missing instruction, there are two ways to proceed. |
| One is to update the assembler to support that instruction, which is straightforward |
| but only worthwhile if it's likely the instruction will be used again. |
| Instead, for simple one-off cases, it's possible to use the <code>BYTE</code> |
| and <code>WORD</code> directives |
| to lay down explicit data into the instruction stream within a <code>TEXT</code>. |
| Here's how the 386 runtime defines the 64-bit atomic load function. |
| </p> |
| |
| <pre> |
| // uint64 atomicload64(uint64 volatile* addr); |
| // so actually |
| // void atomicload64(uint64 *res, uint64 volatile *addr); |
| TEXT runtime·atomicload64(SB), NOSPLIT, $0-8 |
| MOVL ptr+0(FP), AX |
| LEAL ret_lo+4(FP), BX |
| BYTE $0x0f; BYTE $0x6f; BYTE $0x00 // MOVQ (%EAX), %MM0 |
| BYTE $0x0f; BYTE $0x7f; BYTE $0x03 // MOVQ %MM0, 0(%EBX) |
| BYTE $0x0F; BYTE $0x77 // EMMS |
| RET |
| </pre> |