<!--{
	"Title": "A Quick Guide to Go's Assembler",
	"Path":  "/doc/asm"
}-->

<h2 id="introduction">A Quick Guide to Go's Assembler</h2>

<p>
This document is a quick outline of the unusual form of assembly language used by the <code>gc</code> Go compiler.
The document is not comprehensive.
</p>

<p>
The assembler is based on the input style of the Plan 9 assemblers, which is documented in detail
<a href="http://plan9.bell-labs.com/sys/doc/asm.html">elsewhere</a>.
If you plan to write assembly language, you should read that document, although much of it is Plan 9-specific.
The current document provides a summary of the syntax and of the differences from
what is explained in that document, and
describes the peculiarities that apply when writing assembly code to interact with Go.
</p>

<p>
The most important thing to know about Go's assembler is that it is not a direct representation of the underlying machine.
Some of the details map precisely to the machine, but some do not.
This is because the compiler suite (see
<a href="http://plan9.bell-labs.com/sys/doc/compiler.html">this description</a>)
needs no assembler pass in the usual pipeline.
Instead, the compiler operates on a kind of semi-abstract instruction set,
and instruction selection occurs partly after code generation.
The assembler works on the semi-abstract form, so
when you see an instruction like <code>MOV</code>
what the tool chain actually generates for that operation might
not be a move instruction at all, but perhaps a clear or a load.
Or it might correspond exactly to the machine instruction with that name.
In general, machine-specific operations tend to appear as themselves, while more general concepts like
memory move and subroutine call and return are more abstract.
The details vary with architecture, and we apologize for the imprecision; the situation is not well-defined.
</p>

<p>
The assembler program is a way to parse a description of that
semi-abstract instruction set and turn it into instructions to be
input to the linker.
If you want to see what the instructions look like in assembly for a given architecture, say amd64, there
are many examples in the sources of the standard library, in packages such as
<a href="/pkg/runtime/"><code>runtime</code></a> and
<a href="/pkg/math/big/"><code>math/big</code></a>.
You can also examine what the compiler emits as assembly code
(the actual output may differ from what you see here):
</p>

<pre>
$ cat x.go
package main

func main() {
	println(3)
}
$ GOOS=linux GOARCH=amd64 go tool compile -S x.go # or: go build -gcflags -S x.go

--- prog list "main" ---
0000 (x.go:3) TEXT main+0(SB),$8-0
0001 (x.go:3) FUNCDATA $0,gcargs·0+0(SB)
0002 (x.go:3) FUNCDATA $1,gclocals·0+0(SB)
0003 (x.go:4) MOVQ $3,(SP)
0004 (x.go:4) PCDATA $0,$8
0005 (x.go:4) CALL ,runtime.printint+0(SB)
0006 (x.go:4) PCDATA $0,$-1
0007 (x.go:4) PCDATA $0,$0
0008 (x.go:4) CALL ,runtime.printnl+0(SB)
0009 (x.go:4) PCDATA $0,$-1
0010 (x.go:5) RET ,
...
</pre>

<p>
The <code>FUNCDATA</code> and <code>PCDATA</code> directives contain information
for use by the garbage collector; they are introduced by the compiler.
</p>

<!-- Commenting out because the feature is gone but it's popular and may come back.

<p>
To see what gets put in the binary after linking, add the <code>-a</code> flag to the linker:
</p>

<pre>
$ go tool 6l -a x.6 # or: go build -ldflags -a x.go
codeblk [0x2000,0x1d059) at offset 0x1000
002000 main.main | (3) TEXT main.main+0(SB),$8
002000 65488b0c25a0080000 | (3) MOVQ 2208(GS),CX
002009 483b21 | (3) CMPQ SP,(CX)
00200c 7707 | (3) JHI ,2015
00200e e83da20100 | (3) CALL ,1c250+runtime.morestack00
002013 ebeb | (3) JMP ,2000
002015 4883ec08 | (3) SUBQ $8,SP
002019 | (3) FUNCDATA $0,main.gcargs·0+0(SB)
002019 | (3) FUNCDATA $1,main.gclocals·0+0(SB)
002019 48c7042403000000 | (4) MOVQ $3,(SP)
002021 | (4) PCDATA $0,$8
002021 e8aad20000 | (4) CALL ,f2d0+runtime.printint
002026 | (4) PCDATA $0,$-1
002026 | (4) PCDATA $0,$0
002026 e865d40000 | (4) CALL ,f490+runtime.printnl
00202b | (4) PCDATA $0,$-1
00202b 4883c408 | (5) ADDQ $8,SP
00202f c3 | (5) RET ,
...
</pre>

-->

<h3 id="constants">Constants</h3>

<p>
Although the assembler takes its guidance from the Plan 9 assemblers,
it is a distinct program, so there are some differences.
One is in constant evaluation.
Constant expressions in the assembler are parsed using Go's operator
precedence, not the C-like precedence of the original.
Thus <code>3&amp;1&lt;&lt;2</code> is 4, not 0: it parses as <code>(3&amp;1)&lt;&lt;2</code>,
not <code>3&amp;(1&lt;&lt;2)</code>.
Also, constants are always evaluated as 64-bit unsigned integers.
Thus <code>-2</code> is not the integer value minus two,
but the unsigned 64-bit integer with the same bit pattern.
The distinction rarely matters, but
to avoid ambiguity, division or right shift where the right operand's
high bit is set is rejected.
</p>
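
<p>
As a rough illustration (this is ordinary Go, not assembler input), both rules can be reproduced with Go's own operators, since the assembler borrows Go's precedence.
The function names are made up for this sketch.
</p>

```go
package main

import "fmt"

// goPrecedence evaluates 3&1<<2 the way Go, and therefore the Go
// assembler, parses it: & and << share a precedence level and
// associate left to right, giving (3&1)<<2.
func goPrecedence() uint64 {
	return 3 & 1 << 2
}

// cPrecedence spells out the grouping a C-like parser would choose,
// where << binds tighter than &.
func cPrecedence() uint64 {
	return 3 & (1 << 2)
}

// asUnsigned returns the 64-bit unsigned integer that has the same
// bit pattern as v, which is how the assembler treats a constant
// such as -2.
func asUnsigned(v int64) uint64 {
	return uint64(v)
}

func main() {
	fmt.Println(goPrecedence())         // 4
	fmt.Println(cPrecedence())          // 0
	fmt.Printf("%#x\n", asUnsigned(-2)) // 0xfffffffffffffffe
}
```

<p>
When a constant expression mixes <code>&amp;</code> with shifts, explicit parentheses avoid any doubt about which grouping is meant.
</p>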

<h3 id="symbols">Symbols</h3>

<p>
Some symbols, such as <code>R1</code> or <code>LR</code>,
are predefined and refer to registers.
The exact set depends on the architecture.
</p>

<p>
There are four predeclared symbols that refer to pseudo-registers.
These are not real registers, but rather virtual registers maintained by
the tool chain, such as a frame pointer.
The set of pseudo-registers is the same for all architectures:
</p>

<ul>

<li>
<code>FP</code>: Frame pointer: arguments and locals.
</li>

<li>
<code>PC</code>: Program counter:
jumps and branches.
</li>

<li>
<code>SB</code>: Static base pointer: global symbols.
</li>

<li>
<code>SP</code>: Stack pointer: top of stack.
</li>

</ul>

<p>
All user-defined symbols are written as offsets to the pseudo-registers
<code>FP</code> (arguments and locals) and <code>SB</code> (globals).
</p>

<p>
The <code>SB</code> pseudo-register can be thought of as the origin of memory, so the symbol <code>foo(SB)</code>
is the name <code>foo</code> as an address in memory.
This form is used to name global functions and data.
Adding <code>&lt;&gt;</code> to the name, as in <span style="white-space: nowrap"><code>foo&lt;&gt;(SB)</code></span>, makes the name
visible only in the current source file, like a top-level <code>static</code> declaration in a C file.
Adding an offset to the name refers to that offset from the symbol's address, so
<code>foo+4(SB)</code> is four bytes past the start of <code>foo</code>.
</p>

<p>
The <code>FP</code> pseudo-register is a virtual frame pointer
used to refer to function arguments.
The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register.
Thus <code>0(FP)</code> is the first argument to the function,
<code>8(FP)</code> is the second (on a 64-bit machine), and so on.
However, when referring to a function argument this way, it is necessary to place a name
at the beginning, as in <code>first_arg+0(FP)</code> and <code>second_arg+8(FP)</code>.
(The meaning of the offset, an offset from the frame pointer, is distinct
from its use with <code>SB</code>, where it is an offset from the symbol.)
The assembler enforces this convention, rejecting plain <code>0(FP)</code> and <code>8(FP)</code>.
The actual name is semantically irrelevant but should be used to document
the argument's name.
It is worth stressing that <code>FP</code> is always a
pseudo-register, not a hardware
register, even on architectures with a hardware frame pointer.
</p>

<p>
For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the argument names
and offsets match.
On 32-bit systems, the low and high 32 bits of a 64-bit value are distinguished by adding
a <code>_lo</code> or <code>_hi</code> suffix to the name, as in <code>arg_lo+0(FP)</code> or <code>arg_hi+4(FP)</code>.
If a Go prototype does not name its result, the expected assembly name is <code>ret</code>.
</p>
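
<p>
To make the naming and offset rules concrete, here is a small Go sketch that prints the operand spellings one would expect for a hypothetical prototype <code>func Add(x, y int64) int64</code>.
It assumes that the arguments and the result are laid out in order, like the fields of a struct, as in the stack-based convention this document describes; the type and function names are invented for the example.
</p>

```go
package main

import (
	"fmt"
	"unsafe"
)

// addArgs is a hypothetical stand-in for the argument block of
//
//	func Add(x, y int64) int64
//
// Assumption: arguments and the result sit in order, laid out like
// the fields of a struct. The unnamed result takes the name ret.
type addArgs struct {
	x   int64
	y   int64
	ret int64
}

// fpNames returns the name+offset(FP) spellings an assembly
// implementation of Add would use under that assumption.
func fpNames() []string {
	var a addArgs
	return []string{
		fmt.Sprintf("x+%d(FP)", unsafe.Offsetof(a.x)),
		fmt.Sprintf("y+%d(FP)", unsafe.Offsetof(a.y)),
		fmt.Sprintf("ret+%d(FP)", unsafe.Offsetof(a.ret)),
	}
}

func main() {
	for _, name := range fpNames() {
		fmt.Println(name)
	}
}
```

<p>
For real code, <code>go</code> <code>vet</code> remains the authority on the expected names and offsets; the sketch only shows where the numbers come from.
</p>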

<p>
The <code>SP</code> pseudo-register is a virtual stack pointer
used to refer to frame-local variables and the arguments being
prepared for function calls.
It points to the top of the local stack frame, so references should use negative offsets
in the range [−framesize, 0):
<code>x-8(SP)</code>, <code>y-4(SP)</code>, and so on.
</p>

<p>
On architectures with a hardware register named <code>SP</code>,
the name prefix distinguishes
references to the virtual stack pointer from references to the architectural
<code>SP</code> register.
That is, <code>x-8(SP)</code> and <code>-8(SP)</code>
are different memory locations:
the first refers to the virtual stack pointer pseudo-register,
while the second refers to the
hardware's <code>SP</code> register.
</p>

<p>
On machines where <code>SP</code> and <code>PC</code> are
traditionally aliases for a physical, numbered register,
in the Go assembler the names <code>SP</code> and <code>PC</code>
are still treated specially;
for instance, references to <code>SP</code> require a symbol,
much like <code>FP</code>.
To access the actual hardware register, use the true <code>R</code> name.
For example, on the ARM architecture the hardware
<code>SP</code> and <code>PC</code> are accessible as
<code>R13</code> and <code>R15</code>.
</p>

<p>
Branches and direct jumps are always written as offsets to the PC, or as
jumps to labels:
</p>

<pre>
label:
	MOVW $0, R1
	JMP label
</pre>

<p>
Each label is visible only within the function in which it is defined.
It is therefore permitted for multiple functions in a file to define
and use the same label names.
Direct jumps and call instructions can target text symbols,
such as <code>name(SB)</code>, but not offsets from symbols,
such as <code>name+4(SB)</code>.
</p>

<p>
Instructions, registers, and assembler directives are always in UPPER CASE to remind you
that assembly programming is a fraught endeavor.
(Exception: the <code>g</code> register renaming on ARM.)
</p>

<p>
In Go object files and binaries, the full name of a symbol is the
package path followed by a period and the symbol name:
<code>fmt.Printf</code> or <code>math/rand.Int</code>.
Because the assembler's parser treats period and slash as punctuation,
those strings cannot be used directly as identifier names.
Instead, the assembler allows the middle dot character U+00B7
and the division slash U+2215 in identifiers and rewrites them to
plain period and slash.
Within an assembler source file, the symbols above are written as
<code>fmt·Printf</code> and <code>math∕rand·Int</code>.
The assembly listings generated by the compilers when using the <code>-S</code> flag
show the period and slash directly instead of the Unicode replacements
required by the assemblers.
</p>
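
<p>
The character rewriting described above can be mimicked in a few lines of Go.
This is only a sketch of the mapping, not the assembler's actual code, and <code>objectName</code> is a name invented for the example.
</p>

```go
package main

import (
	"fmt"
	"strings"
)

// asmToObject maps the two Unicode stand-ins accepted in assembler
// identifiers to the plain characters used in object files:
// middle dot U+00B7 becomes '.', division slash U+2215 becomes '/'.
var asmToObject = strings.NewReplacer("\u00b7", ".", "\u2215", "/")

// objectName converts a symbol as spelled in assembly source into
// the spelling that appears in object files and -S listings.
func objectName(asmName string) string {
	return asmToObject.Replace(asmName)
}

func main() {
	fmt.Println(objectName("fmt\u00b7Printf"))         // fmt.Printf
	fmt.Println(objectName("math\u2215rand\u00b7Int")) // math/rand.Int
}
```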

<p>
Most hand-written assembly files do not include the full package path
in symbol names, because the linker inserts the package path of the current
object file at the beginning of any name starting with a period:
in an assembly source file within the <code>math/rand</code> package implementation,
the package's <code>Int</code> function can be referred to as <code>·Int</code>.
This convention avoids the need to hard-code a package's import path in its
own source code, making it easier to move the code from one location to another.
</p>

<h3 id="directives">Directives</h3>

<p>
The assembler uses various directives to bind text and data to symbol names.
For example, here is a simple complete function definition. The <code>TEXT</code>
directive declares the symbol <code>runtime·profileloop</code> and the instructions
that follow form the body of the function.
The last instruction in a <code>TEXT</code> block must be some sort of jump, usually a <code>RET</code> (pseudo-)instruction.
(If it's not, the linker will append a jump-to-itself instruction; there is no fallthrough in <code>TEXTs</code>.)
After the symbol, the arguments are flags (see below)
and the frame size, a constant (but see below):
</p>

<pre>
TEXT runtime·profileloop(SB),NOSPLIT,$8
	MOVQ $runtime·profileloop1(SB), CX
	MOVQ CX, 0(SP)
	CALL runtime·externalthreadhandler(SB)
	RET
</pre>

<p>
In the general case, the frame size is followed by an argument size, separated by a minus sign.
(It's not a subtraction, just idiosyncratic syntax.)
The frame size <code>$24-8</code> states that the function has a 24-byte frame
and is called with 8 bytes of arguments, which live on the caller's frame.
If <code>NOSPLIT</code> is not specified for the <code>TEXT</code>,
the argument size must be provided.
For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the
argument size is correct.
</p>
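
<p>
Because the minus sign is a separator rather than an operator, it may help to see the two numbers pulled apart.
The following Go sketch is not how the assembler parses the operand; <code>parseFrame</code> is a hypothetical helper that only illustrates the notation.
</p>

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFrame splits a TEXT size operand such as "$24-8" into the
// frame size and the argument size. hasArgs reports whether an
// argument size was present at all. A leading minus sign, as in
// "$-4", belongs to the frame size and is not a separator.
func parseFrame(s string) (frame, args int, hasArgs bool, err error) {
	s = strings.TrimPrefix(s, "$")
	if i := strings.LastIndex(s, "-"); i > 0 {
		hasArgs = true
		if args, err = strconv.Atoi(s[i+1:]); err != nil {
			return 0, 0, false, err
		}
		s = s[:i]
	}
	frame, err = strconv.Atoi(s)
	return frame, args, hasArgs, err
}

func main() {
	for _, operand := range []string{"$24-8", "$8", "$0-12"} {
		frame, args, hasArgs, err := parseFrame(operand)
		fmt.Println(operand, frame, args, hasArgs, err)
	}
}
```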

<p>
Note that the symbol name uses a middle dot to separate the components and is specified as an offset from the
static base pseudo-register <code>SB</code>.
This function would be called from Go source for package <code>runtime</code> using the
simple name <code>profileloop</code>.
</p>

<p>
Global data symbols are defined by a sequence of initializing
<code>DATA</code> directives followed by a <code>GLOBL</code> directive.
Each <code>DATA</code> directive initializes a section of the
corresponding memory.
The memory not explicitly initialized is zeroed.
The general form of the <code>DATA</code> directive is
</p>

<pre>
DATA	symbol+offset(SB)/width, value
</pre>

<p>
which initializes the symbol memory at the given offset and width with the given value.
The <code>DATA</code> directives for a given symbol must be written with increasing offsets.
</p>

<p>
The <code>GLOBL</code> directive declares a symbol to be global.
The arguments are optional flags and the size of the data being declared as a global,
which will have initial value all zeros unless a <code>DATA</code> directive
has initialized it.
The <code>GLOBL</code> directive must follow any corresponding <code>DATA</code> directives.
</p>

<p>
For example,
</p>

<pre>
DATA divtab&lt;&gt;+0x00(SB)/4, $0xf4f8fcff
DATA divtab&lt;&gt;+0x04(SB)/4, $0xe6eaedf0
...
DATA divtab&lt;&gt;+0x3c(SB)/4, $0x81828384
GLOBL divtab&lt;&gt;(SB), RODATA, $64

GLOBL runtime·tlsoffset(SB), NOPTR, $4
</pre>

<p>
declares and initializes <code>divtab&lt;&gt;</code>, a read-only 64-byte table of 4-byte integer values,
and declares <code>runtime·tlsoffset</code>, a 4-byte, implicitly zeroed variable that
contains no pointers.
</p>
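
<p>
One way to picture the effect of these directives is as writes into a zeroed buffer whose length is the <code>GLOBL</code> size.
The Go sketch below models that picture under two assumptions that are not stated in this document: a little-endian target, and widths limited to 1, 2, 4, and 8 bytes.
The type and function names are invented for the example.
</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// dataDirective is a hypothetical model of one line of the form
//
//	DATA sym+offset(SB)/width, $value
type dataDirective struct {
	offset int
	width  int
	value  uint64
}

// buildSymbol models a symbol of the given GLOBL size initialized by
// a sequence of DATA directives. Memory that no directive touches
// stays zero. Assumption: little-endian byte order.
func buildSymbol(size int, data []dataDirective) ([]byte, error) {
	mem := make([]byte, size)
	prev := -1
	for _, d := range data {
		if d.offset <= prev {
			return nil, fmt.Errorf("DATA offset %#x is not increasing", d.offset)
		}
		if d.offset < 0 || d.offset+d.width > size {
			return nil, fmt.Errorf("DATA at %#x/%d does not fit in %d bytes", d.offset, d.width, size)
		}
		switch d.width {
		case 1:
			mem[d.offset] = byte(d.value)
		case 2:
			binary.LittleEndian.PutUint16(mem[d.offset:], uint16(d.value))
		case 4:
			binary.LittleEndian.PutUint32(mem[d.offset:], uint32(d.value))
		case 8:
			binary.LittleEndian.PutUint64(mem[d.offset:], d.value)
		default:
			return nil, fmt.Errorf("unsupported width %d", d.width)
		}
		prev = d.offset
	}
	return mem, nil
}

func main() {
	mem, err := buildSymbol(64, []dataDirective{
		{0x00, 4, 0xf4f8fcff},
		{0x04, 4, 0xe6eaedf0},
		{0x3c, 4, 0x81828384},
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("% x\n", mem[:8])
}
```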

<p>
There may be one or two arguments to the directives.
If there are two, the first is a bit mask of flags,
which can be written as numeric expressions, added or or-ed together,
or can be set symbolically for easier absorption by a human.
Their values, defined in the standard <code>#include</code> file <code>textflag.h</code>, are:
</p>

<ul>
<li>
<code>NOPROF</code> = 1
<br>
(For <code>TEXT</code> items.)
Don't profile the marked function. This flag is deprecated.
</li>
<li>
<code>DUPOK</code> = 2
<br>
It is legal to have multiple instances of this symbol in a single binary.
The linker will choose one of the duplicates to use.
</li>
<li>
<code>NOSPLIT</code> = 4
<br>
(For <code>TEXT</code> items.)
Don't insert the preamble to check if the stack must be split.
The frame for the routine, plus anything it calls, must fit in the
spare space at the top of the stack segment.
Used to protect routines such as the stack splitting code itself.
</li>
<li>
<code>RODATA</code> = 8
<br>
(For <code>DATA</code> and <code>GLOBL</code> items.)
Put this data in a read-only section.
</li>
<li>
<code>NOPTR</code> = 16
<br>
(For <code>DATA</code> and <code>GLOBL</code> items.)
This data contains no pointers and therefore does not need to be
scanned by the garbage collector.
</li>
<li>
<code>WRAPPER</code> = 32
<br>
(For <code>TEXT</code> items.)
This is a wrapper function and should not count as disabling <code>recover</code>.
</li>
<li>
<code>NEEDCTXT</code> = 64
<br>
(For <code>TEXT</code> items.)
This function is a closure so it uses its incoming context register.
</li>
</ul>
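
<p>
Since each flag occupies its own bit, adding and or-ing give the same mask as long as no flag is repeated.
The Go sketch below mirrors the values listed above (in assembly they come from <code>textflag.h</code>); the <code>describe</code> helper is invented for the example.
</p>

```go
package main

import (
	"fmt"
	"strings"
)

// Flag values copied from the list above.
const (
	NOPROF   = 1
	DUPOK    = 2
	NOSPLIT  = 4
	RODATA   = 8
	NOPTR    = 16
	WRAPPER  = 32
	NEEDCTXT = 64
)

// flagTable pairs each bit with its symbolic name, lowest bit first.
var flagTable = []struct {
	bit  int
	name string
}{
	{NOPROF, "NOPROF"},
	{DUPOK, "DUPOK"},
	{NOSPLIT, "NOSPLIT"},
	{RODATA, "RODATA"},
	{NOPTR, "NOPTR"},
	{WRAPPER, "WRAPPER"},
	{NEEDCTXT, "NEEDCTXT"},
}

// describe turns a numeric mask back into its symbolic spelling.
func describe(mask int) string {
	var names []string
	for _, f := range flagTable {
		if mask&f.bit != 0 {
			names = append(names, f.name)
		}
	}
	return strings.Join(names, "|")
}

func main() {
	fmt.Println(NOSPLIT|WRAPPER, NOSPLIT+WRAPPER) // 36 36
	fmt.Println(describe(24))                     // RODATA|NOPTR
}
```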

<h3 id="runtime">Runtime Coordination</h3>

<p>
For garbage collection to run correctly, the runtime must know the
location of pointers in all global data and in most stack frames.
The Go compiler emits this information when compiling Go source files,
but assembly programs must define it explicitly.
</p>

<p>
A data symbol marked with the <code>NOPTR</code> flag (see above)
is treated as containing no pointers to runtime-allocated data.
A data symbol with the <code>RODATA</code> flag
is allocated in read-only memory and is therefore treated
as implicitly marked <code>NOPTR</code>.
A data symbol with a total size smaller than a pointer
is also treated as implicitly marked <code>NOPTR</code>.
It is not possible to define a symbol containing pointers in an assembly source file;
such a symbol must be defined in a Go source file instead.
Assembly source can still refer to the symbol by name
even without <code>DATA</code> and <code>GLOBL</code> directives.
A good general rule of thumb is to define all non-<code>RODATA</code>
symbols in Go instead of in assembly.
</p>

<p>
Each function also needs annotations giving the location of
live pointers in its arguments, results, and local stack frame.
For an assembly function with no pointer results and
either no local stack frame or no function calls,
the only requirement is to define a Go prototype for the function
in a Go source file in the same package. The name of the assembly
function must not contain the package name component (for example,
function <code>Syscall</code> in package <code>syscall</code> should
use the name <code>·Syscall</code> instead of the equivalent name
<code>syscall·Syscall</code> in its <code>TEXT</code> directive).
For more complex situations, explicit annotation is needed.
These annotations use pseudo-instructions defined in the standard
<code>#include</code> file <code>funcdata.h</code>.
</p>

<p>
If a function has no arguments and no results,
the pointer information can be omitted.
This is indicated by an argument size annotation of <code>$<i>n</i>-0</code>
on the <code>TEXT</code> instruction.
Otherwise, pointer information must be provided by
a Go prototype for the function in a Go source file,
even for assembly functions not called directly from Go.
(The prototype will also let <code>go</code> <code>vet</code> check the argument references.)
At the start of the function, the arguments are assumed
to be initialized but the results are assumed uninitialized.
If the results will hold live pointers during a call instruction,
the function should start by zeroing the results and then
executing the pseudo-instruction <code>GO_RESULTS_INITIALIZED</code>.
This instruction records that the results are now initialized
and should be scanned during stack movement and garbage collection.
It is typically easier to arrange that assembly functions do not
return pointers or do not contain call instructions;
no assembly functions in the standard library use
<code>GO_RESULTS_INITIALIZED</code>.
</p>

<p>
If a function has no local stack frame,
the pointer information can be omitted.
This is indicated by a local frame size annotation of <code>$0-<i>n</i></code>
on the <code>TEXT</code> instruction.
The pointer information can also be omitted if the
function contains no call instructions.
Otherwise, the local stack frame must not contain pointers,
and the assembly must confirm this fact by executing the
pseudo-instruction <code>NO_LOCAL_POINTERS</code>.
Because stack resizing is implemented by moving the stack,
the stack pointer may change during any function call:
even pointers to stack data must not be kept in local variables.
</p>

<h2 id="architectures">Architecture-specific details</h2>

<p>
It is impractical to list all the instructions and other details for each machine.
To see what instructions are defined for a given machine, say ARM,
look in the source for the <code>obj</code> support library for
that architecture, located in the directory <code>src/cmd/internal/obj/arm</code>.
In that directory is a file <code>a.out.go</code>; it contains
a long list of constants starting with <code>A</code>, like this:
</p>

<pre>
const (
	AAND = obj.ABaseARM + obj.A_ARCHSPECIFIC + iota
	AEOR
	ASUB
	ARSB
	AADD
	...
</pre>

<p>
This is the list of instructions and their spellings as known to the assembler and linker for that architecture.
Each instruction begins with an initial capital <code>A</code> in this list, so <code>AAND</code>
represents the bitwise and instruction,
<code>AND</code> (without the leading <code>A</code>),
and is written in assembly source as <code>AND</code>.
The enumeration is mostly in alphabetical order.
(The architecture-independent <code>AXXX</code>, defined in the
<code>cmd/internal/obj</code> package,
represents an invalid instruction.)
The sequence of the <code>A</code> names has nothing to do with the actual
encoding of the machine instructions.
The <code>cmd/internal/obj</code> package takes care of that detail.
</p>
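
<p>
The naming rule is simple enough to state in one line of Go.
The helper below is hypothetical and exists only to show that exactly one leading <code>A</code> is dropped, so <code>AADD</code> becomes <code>ADD</code> rather than <code>DD</code>.
</p>

```go
package main

import (
	"fmt"
	"strings"
)

// mnemonic converts a constant name from a.out.go into the spelling
// used in assembly source by removing the single leading "A".
// strings.TrimPrefix removes at most one occurrence, which matters
// for names such as AADD and AAND.
func mnemonic(constName string) string {
	return strings.TrimPrefix(constName, "A")
}

func main() {
	for _, name := range []string{"AAND", "AEOR", "ASUB", "ARSB", "AADD"} {
		fmt.Println(name, "->", mnemonic(name))
	}
}
```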

<p>
The instructions for both the 386 and AMD64 architectures are listed in
<code>cmd/internal/obj/x86/a.out.go</code>.
</p>

<p>
The architectures share syntax for common addressing modes such as
<code>(R1)</code> (register indirect),
<code>4(R1)</code> (register indirect with offset), and
<code>$foo(SB)</code> (absolute address).
The assembler also supports some (not necessarily all) addressing modes
specific to each architecture.
The sections below list these.
</p>

<p>
One detail evident in the examples from the previous sections is that data in the instructions flows from left to right:
<code>MOVQ</code> <code>$0,</code> <code>CX</code> clears <code>CX</code>.
This rule applies even on architectures where the conventional notation uses the opposite direction.
</p>

<p>
Here follow some descriptions of key Go-specific details for the supported architectures.
</p>

<h3 id="x86">32-bit Intel 386</h3>

<p>
The runtime pointer to the <code>g</code> structure is maintained
through the value of an otherwise unused (as far as Go is concerned) register in the MMU.
An OS-dependent macro <code>get_tls</code> is defined for the assembler if the source includes
a special header, <code>go_asm.h</code>:
</p>

<pre>
#include "go_asm.h"
</pre>

<p>
Within the runtime, the <code>get_tls</code> macro loads its argument register
with a pointer to the <code>g</code> pointer, and the <code>g</code> struct
contains the <code>m</code> pointer.
The sequence to load <code>g</code> and <code>m</code> using <code>CX</code> looks like this:
</p>

<pre>
get_tls(CX)
MOVL g(CX), AX // Move g into AX.
MOVL g_m(AX), BX // Move g.m into BX.
</pre>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code>.
</li>

<li>
<code>64(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code> plus 64.
These modes accept only 1, 2, 4, and 8 as scale factors.
</li>

</ul>
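
<p>
The arithmetic behind these modes can be sketched in Go.
The function below is a hypothetical model of the address computation for an operand written as <code>disp(base)(index*scale)</code>; it does not reflect how the assembler encodes the instruction.
</p>

```go
package main

import (
	"errors"
	"fmt"
)

// errScale is returned for scale factors the addressing mode rejects.
var errScale = errors.New("scale must be 1, 2, 4, or 8")

// effectiveAddress models the location named by disp(base)(index*scale)
// using 32-bit wraparound arithmetic, as on the 386.
func effectiveAddress(disp int32, base, index, scale uint32) (uint32, error) {
	switch scale {
	case 1, 2, 4, 8:
		return base + index*scale + uint32(disp), nil
	}
	return 0, errScale
}

func main() {
	// Models 64(DI)(BX*2) with DI = 0x1000 and BX = 3.
	addr, err := effectiveAddress(64, 0x1000, 3, 2)
	fmt.Printf("%#x %v\n", addr, err) // 0x1046 <nil>
}
```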

<h3 id="amd64">64-bit Intel 386 (a.k.a. amd64)</h3>

<p>
The two architectures behave largely the same at the assembler level.
Assembly code to access the <code>m</code> and <code>g</code>
pointers on the 64-bit version is the same as on the 32-bit 386,
except it uses <code>MOVQ</code> rather than <code>MOVL</code>:
</p>

<pre>
get_tls(CX)
MOVQ g(CX), AX // Move g into AX.
MOVQ g_m(AX), BX // Move g.m into BX.
</pre>

<h3 id="arm">ARM</h3>

<p>
The registers <code>R10</code> and <code>R11</code>
are reserved by the compiler and linker.
</p>

<p>
<code>R10</code> points to the <code>g</code> (goroutine) structure.
Within assembler source code, this pointer must be referred to as <code>g</code>;
the name <code>R10</code> is not recognized.
</p>

<p>
To make it easier for people and compilers to write assembly, the ARM linker
allows general addressing forms and pseudo-operations like <code>DIV</code> or <code>MOD</code>
that may not be expressible using a single hardware instruction.
It implements these forms as multiple instructions, often using the <code>R11</code> register
to hold temporary values.
Hand-written assembly can use <code>R11</code>, but doing so requires
being sure that the linker is not also using it to implement any of the other
instructions in the function.
</p>

<p>
When defining a <code>TEXT</code>, specifying frame size <code>$-4</code>
tells the linker that this is a leaf function that does not need to save <code>LR</code> on entry.
</p>

<p>
The name <code>SP</code> always refers to the virtual stack pointer described earlier.
For the hardware register, use <code>R13</code>.
</p>

<p>
Condition code syntax is to append a period and the one- or two-letter code to the instruction,
as in <code>MOVW.EQ</code>.
Multiple codes may be appended: <code>MOVM.IA.W</code>.
The order of the code modifiers is irrelevant.
</p>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>R0-&gt;16</code>
<br>
<code>R0&gt;&gt;16</code>
<br>
<code>R0&lt;&lt;16</code>
<br>
<code>R0@&gt;16</code>:
For <code>&lt;&lt;</code>, left shift <code>R0</code> by 16 bits.
The other codes are <code>-&gt;</code> (arithmetic right shift),
<code>&gt;&gt;</code> (logical right shift), and
<code>@&gt;</code> (rotate right).
</li>

<li>
<code>R0-&gt;R1</code>
<br>
<code>R0&gt;&gt;R1</code>
<br>
<code>R0&lt;&lt;R1</code>
<br>
<code>R0@&gt;R1</code>:
For <code>&lt;&lt;</code>, left shift <code>R0</code> by the count in <code>R1</code>.
The other codes are <code>-&gt;</code> (arithmetic right shift),
<code>&gt;&gt;</code> (logical right shift), and
<code>@&gt;</code> (rotate right).
</li>

<li>
<code>[R0,g,R12-R15]</code>: For multi-register instructions, the set comprising
<code>R0</code>, <code>g</code>, and <code>R12</code> through <code>R15</code> inclusive.
</li>

<li>
<code>(R5, R6)</code>: Destination register pair.
</li>

</ul>
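
<p>
For readers unfamiliar with the four shift spellings, the following Go sketch computes what each one means for a 32-bit register value.
It is a model of the arithmetic only; <code>shift</code> is a hypothetical helper, and counts are assumed to lie between 0 and 31.
</p>

```go
package main

import (
	"fmt"
	"math/bits"
)

// shift applies one of the ARM operand shift codes to a 32-bit value.
// Assumption: n is in the range 0 to 31.
func shift(op string, x uint32, n uint) (uint32, error) {
	switch op {
	case "<<": // left shift
		return x << n, nil
	case ">>": // logical right shift: zeros enter from the left
		return x >> n, nil
	case "->": // arithmetic right shift: the sign bit is replicated
		return uint32(int32(x) >> n), nil
	case "@>": // rotate right: bits leaving the bottom return at the top
		return bits.RotateLeft32(x, -int(n)), nil
	}
	return 0, fmt.Errorf("unknown shift code %q", op)
}

func main() {
	const r0 = 0x80000001
	for _, op := range []string{"->", ">>", "<<", "@>"} {
		v, _ := shift(op, r0, 16)
		fmt.Printf("R0%s16 = %#08x\n", op, v)
	}
}
```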

<h3 id="arm64">ARM64</h3>

<p>
The ARM64 port is in an experimental state.
</p>

<p>
Instruction modifiers are appended to the instruction following a period.
The only modifiers are <code>P</code> (postincrement) and <code>W</code>
(preincrement):
<code>MOVW.P</code>, <code>MOVW.W</code>.
</p>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>(R5, R6)</code>: Register pair for <code>LDP</code>/<code>STP</code>.
</li>

</ul>

<h3 id="ppc64">Power64, a.k.a. ppc64</h3>

<p>
The Power64 port is in an experimental state.
</p>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>(R5)(R6*1)</code>: The location at <code>R5</code> plus <code>R6</code>. It is a scaled
mode as on the x86, but the only scale allowed is <code>1</code>.
</li>

<li>
<code>(R5+R6)</code>: Alias for <code>(R5)(R6*1)</code>.
</li>

</ul>

<h3 id="unsupported_opcodes">Unsupported opcodes</h3>

<p>
The assemblers are designed to support the compiler so not all hardware instructions
are defined for all architectures: if the compiler doesn't generate it, it might not be there.
If you need to use a missing instruction, there are two ways to proceed.
One is to update the assembler to support that instruction, which is straightforward
but only worthwhile if it's likely the instruction will be used again.
Instead, for simple one-off cases, it's possible to use the <code>BYTE</code>
and <code>WORD</code> directives
to lay down explicit data into the instruction stream within a <code>TEXT</code>.
Here's how the 386 runtime defines the 64-bit atomic load function.
</p>

<pre>
// uint64 atomicload64(uint64 volatile* addr);
// so actually
// void atomicload64(uint64 *res, uint64 volatile *addr);
TEXT runtime·atomicload64(SB), NOSPLIT, $0-12
	MOVL ptr+0(FP), AX
	TESTL $7, AX
	JZ 2(PC)
	MOVL 0, AX // crash with nil ptr deref
	LEAL ret_lo+4(FP), BX
	// MOVQ (%EAX), %MM0
	BYTE $0x0f; BYTE $0x6f; BYTE $0x00
	// MOVQ %MM0, 0(%EBX)
	BYTE $0x0f; BYTE $0x7f; BYTE $0x03
	// EMMS
	BYTE $0x0F; BYTE $0x77
	RET
</pre>