Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 1 | <!--{ |
| 2 | "Title": "A Quick Guide to Go's Assembler", |
Russ Cox | a664b49 | 2013-11-13 21:29:34 -0500 | [diff] [blame] | 3 | "Path": "/doc/asm" |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 4 | }--> |
| 5 | |
| 6 | <h2 id="introduction">A Quick Guide to Go's Assembler</h2> |
| 7 | |
| 8 | <p> |
Rob Pike | 012917a | 2015-07-08 15:53:47 +1000 | [diff] [blame] | 9 | This document is a quick outline of the unusual form of assembly language used by the <code>gc</code> Go compiler. |
Rob Pike | edebe10 | 2014-04-15 16:27:48 -0700 | [diff] [blame] | 10 | The document is not comprehensive. |
| 11 | </p> |
| 12 | |
| 13 | <p> |
Rob Pike | 012917a | 2015-07-08 15:53:47 +1000 | [diff] [blame] | 14 | The assembler is based on the input style of the Plan 9 assemblers, which is documented in detail |
| 15 | <a href="http://plan9.bell-labs.com/sys/doc/asm.html">elsewhere</a>. |
If you plan to write assembly language, you should read that document, although much of it is Plan 9-specific.
This document summarizes the syntax, notes where it differs from what is explained there, and
describes the peculiarities that apply when writing assembly code to interact with Go.
| 20 | </p> |
| 21 | |
| 22 | <p> |
| 23 | The most important thing to know about Go's assembler is that it is not a direct representation of the underlying machine. |
| 24 | Some of the details map precisely to the machine, but some do not. |
| 25 | This is because the compiler suite (see |
| 26 | <a href="http://plan9.bell-labs.com/sys/doc/compiler.html">this description</a>) |
| 27 | needs no assembler pass in the usual pipeline. |
Rob Pike | 012917a | 2015-07-08 15:53:47 +1000 | [diff] [blame] | 28 | Instead, the compiler operates on a kind of semi-abstract instruction set, |
| 29 | and instruction selection occurs partly after code generation. |
| 30 | The assembler works on the semi-abstract form, so |
| 31 | when you see an instruction like <code>MOV</code> |
| 32 | what the tool chain actually generates for that operation might |
| 33 | not be a move instruction at all, perhaps a clear or load. |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 34 | Or it might correspond exactly to the machine instruction with that name. |
| 35 | In general, machine-specific operations tend to appear as themselves, while more general concepts like |
| 36 | memory move and subroutine call and return are more abstract. |
| 37 | The details vary with architecture, and we apologize for the imprecision; the situation is not well-defined. |
| 38 | </p> |
| 39 | |
| 40 | <p> |
Rob Pike | 012917a | 2015-07-08 15:53:47 +1000 | [diff] [blame] | 41 | The assembler program is a way to parse a description of that |
| 42 | semi-abstract instruction set and turn it into instructions to be |
| 43 | input to the linker. |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 44 | If you want to see what the instructions look like in assembly for a given architecture, say amd64, there |
| 45 | are many examples in the sources of the standard library, in packages such as |
| 46 | <a href="/pkg/runtime/"><code>runtime</code></a> and |
| 47 | <a href="/pkg/math/big/"><code>math/big</code></a>. |
Rob Pike | 012917a | 2015-07-08 15:53:47 +1000 | [diff] [blame] | 48 | You can also examine what the compiler emits as assembly code |
| 49 | (the actual output may differ from what you see here): |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 50 | </p> |
| 51 | |
| 52 | <pre> |
| 53 | $ cat x.go |
| 54 | package main |
| 55 | |
| 56 | func main() { |
| 57 | println(3) |
| 58 | } |
Rob Pike | 012917a | 2015-07-08 15:53:47 +1000 | [diff] [blame] | 59 | $ GOOS=linux GOARCH=amd64 go tool compile -S x.go # or: go build -gcflags -S x.go |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 60 | |
| 61 | --- prog list "main" --- |
| 62 | 0000 (x.go:3) TEXT main+0(SB),$8-0 |
| 63 | 0001 (x.go:3) FUNCDATA $0,gcargs·0+0(SB) |
| 64 | 0002 (x.go:3) FUNCDATA $1,gclocals·0+0(SB) |
| 65 | 0003 (x.go:4) MOVQ $3,(SP) |
| 66 | 0004 (x.go:4) PCDATA $0,$8 |
| 67 | 0005 (x.go:4) CALL ,runtime.printint+0(SB) |
| 68 | 0006 (x.go:4) PCDATA $0,$-1 |
| 69 | 0007 (x.go:4) PCDATA $0,$0 |
| 70 | 0008 (x.go:4) CALL ,runtime.printnl+0(SB) |
| 71 | 0009 (x.go:4) PCDATA $0,$-1 |
| 72 | 0010 (x.go:5) RET , |
| 73 | ... |
| 74 | </pre> |
| 75 | |
| 76 | <p> |
| 77 | The <code>FUNCDATA</code> and <code>PCDATA</code> directives contain information |
| 78 | for use by the garbage collector; they are introduced by the compiler. |
| 79 | </p> |
| 80 | |
Rob Pike | edebe10 | 2014-04-15 16:27:48 -0700 | [diff] [blame] | 81 | <!-- Commenting out because the feature is gone but it's popular and may come back. |
| 82 | |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 83 | <p> |
| 84 | To see what gets put in the binary after linking, add the <code>-a</code> flag to the linker: |
| 85 | </p> |
| 86 | |
| 87 | <pre> |
| 88 | $ go tool 6l -a x.6 # or: go build -ldflags -a x.go |
| 89 | codeblk [0x2000,0x1d059) at offset 0x1000 |
| 90 | 002000 main.main | (3) TEXT main.main+0(SB),$8 |
| 91 | 002000 65488b0c25a0080000 | (3) MOVQ 2208(GS),CX |
| 92 | 002009 483b21 | (3) CMPQ SP,(CX) |
| 93 | 00200c 7707 | (3) JHI ,2015 |
| 94 | 00200e e83da20100 | (3) CALL ,1c250+runtime.morestack00 |
| 95 | 002013 ebeb | (3) JMP ,2000 |
| 96 | 002015 4883ec08 | (3) SUBQ $8,SP |
| 97 | 002019 | (3) FUNCDATA $0,main.gcargs·0+0(SB) |
| 98 | 002019 | (3) FUNCDATA $1,main.gclocals·0+0(SB) |
| 99 | 002019 48c7042403000000 | (4) MOVQ $3,(SP) |
| 100 | 002021 | (4) PCDATA $0,$8 |
| 101 | 002021 e8aad20000 | (4) CALL ,f2d0+runtime.printint |
| 102 | 002026 | (4) PCDATA $0,$-1 |
| 103 | 002026 | (4) PCDATA $0,$0 |
| 104 | 002026 e865d40000 | (4) CALL ,f490+runtime.printnl |
| 105 | 00202b | (4) PCDATA $0,$-1 |
| 106 | 00202b 4883c408 | (5) ADDQ $8,SP |
| 107 | 00202f c3 | (5) RET , |
| 108 | ... |
| 109 | </pre> |
| 110 | |
Rob Pike | edebe10 | 2014-04-15 16:27:48 -0700 | [diff] [blame] | 111 | --> |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 112 | |
Rob Pike | 012917a | 2015-07-08 15:53:47 +1000 | [diff] [blame] | 113 | <h3 id="constants">Constants</h3> |
| 114 | |
| 115 | <p> |
| 116 | Although the assembler takes its guidance from the Plan 9 assemblers, |
| 117 | it is a distinct program, so there are some differences. |
| 118 | One is in constant evaluation. |
| 119 | Constant expressions in the assembler are parsed using Go's operator |
| 120 | precedence, not the C-like precedence of the original. |
Thus <code>3&amp;1&lt;&lt;2</code> is 4, not 0: it parses as <code>(3&amp;1)&lt;&lt;2</code>,
not <code>3&amp;(1&lt;&lt;2)</code>.
| 123 | Also, constants are always evaluated as 64-bit unsigned integers. |
| 124 | Thus <code>-2</code> is not the integer value minus two, |
| 125 | but the unsigned 64-bit integer with the same bit pattern. |
| 126 | The distinction rarely matters but |
| 127 | to avoid ambiguity, division or right shift where the right operand's |
| 128 | high bit is set is rejected. |
| 129 | </p> |
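
<p>
Because the assembler follows Go's precedence rules, the example above can be checked in ordinary Go.
The following is a minimal sketch, not assembler code; the function names are illustrative only.
It shows the two groupings side by side, and the bit pattern that <code>-2</code> has as an unsigned 64-bit integer.
</p>

```go
package main

import "fmt"

// goPrecedence evaluates 3&1<<2 the way Go parses it. In Go, & and <<
// share one precedence level and associate left to right.
func goPrecedence() uint64 {
	return 3 & 1 << 2 // parsed as (3&1)<<2
}

// cPrecedence spells out the grouping a C-like parser would choose,
// where << binds more tightly than &.
func cPrecedence() uint64 {
	return 3 & (1 << 2)
}

// asUnsigned64 returns the bit pattern of a signed value viewed as an
// unsigned 64-bit integer.
func asUnsigned64(v int64) uint64 {
	return uint64(v)
}

func main() {
	fmt.Println(goPrecedence())           // 4
	fmt.Println(cPrecedence())            // 0
	fmt.Printf("%#x\n", asUnsigned64(-2)) // 0xfffffffffffffffe
}
```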
| 130 | |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 131 | <h3 id="symbols">Symbols</h3> |
| 132 | |
| 133 | <p> |
Rob Pike | 012917a | 2015-07-08 15:53:47 +1000 | [diff] [blame] | 134 | Some symbols, such as <code>R1</code> or <code>LR</code>, |
| 135 | are predefined and refer to registers. |
| 136 | The exact set depends on the architecture. |
| 137 | </p> |
| 138 | |
| 139 | <p> |
| 140 | There are four predeclared symbols that refer to pseudo-registers. |
| 141 | These are not real registers, but rather virtual registers maintained by |
| 142 | the tool chain, such as a frame pointer. |
| 143 | The set of pseudo-registers is the same for all architectures: |
| 144 | </p> |
| 145 | |
| 146 | <ul> |
| 147 | |
| 148 | <li> |
| 149 | <code>FP</code>: Frame pointer: arguments and locals. |
| 150 | </li> |
| 151 | |
| 152 | <li> |
| 153 | <code>PC</code>: Program counter: |
| 154 | jumps and branches. |
| 155 | </li> |
| 156 | |
| 157 | <li> |
| 158 | <code>SB</code>: Static base pointer: global symbols. |
| 159 | </li> |
| 160 | |
| 161 | <li> |
| 162 | <code>SP</code>: Stack pointer: top of stack. |
| 163 | </li> |
| 164 | |
| 165 | </ul> |
| 166 | |
| 167 | <p> |
| 168 | All user-defined symbols are written as offsets to the pseudo-registers |
| 169 | <code>FP</code> (arguments and locals) and <code>SB</code> (globals). |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 170 | </p> |
| 171 | |
| 172 | <p> |
| 173 | The <code>SB</code> pseudo-register can be thought of as the origin of memory, so the symbol <code>foo(SB)</code> |
| 174 | is the name <code>foo</code> as an address in memory. |
Russ Cox | 202bf8d | 2014-10-28 15:51:06 -0400 | [diff] [blame] | 175 | This form is used to name global functions and data. |
Adding <code>&lt;&gt;</code> to the name, as in <span style="white-space: nowrap"><code>foo&lt;&gt;(SB)</code></span>, makes the name
visible only in the current source file, like a top-level <code>static</code> declaration in a C file.
Adding an offset to the name refers to that offset from the symbol's address, so
<code>foo+4(SB)</code> is four bytes past the start of <code>foo</code>.
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 180 | </p> |
| 181 | |
| 182 | <p> |
Russ Cox | a664b49 | 2013-11-13 21:29:34 -0500 | [diff] [blame] | 183 | The <code>FP</code> pseudo-register is a virtual frame pointer |
| 184 | used to refer to function arguments. |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 185 | The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register. |
| 186 | Thus <code>0(FP)</code> is the first argument to the function, |
| 187 | <code>8(FP)</code> is the second (on a 64-bit machine), and so on. |
Rob Pike | 012917a | 2015-07-08 15:53:47 +1000 | [diff] [blame] | 188 | However, when referring to a function argument this way, it is necessary to place a name |
Russ Cox | a664b49 | 2013-11-13 21:29:34 -0500 | [diff] [blame] | 189 | at the beginning, as in <code>first_arg+0(FP)</code> and <code>second_arg+8(FP)</code>. |
(The meaning of the offset here, an offset from the frame pointer, is distinct
from its use with <code>SB</code>, where it is an offset from the symbol.)
| 192 | The assembler enforces this convention, rejecting plain <code>0(FP)</code> and <code>8(FP)</code>. |
| 193 | The actual name is semantically irrelevant but should be used to document |
| 194 | the argument's name. |
| 195 | It is worth stressing that <code>FP</code> is always a |
| 196 | pseudo-register, not a hardware |
| 197 | register, even on architectures with a hardware frame pointer. |
| 198 | </p> |
| 199 | |
| 200 | <p> |
Russ Cox | 202bf8d | 2014-10-28 15:51:06 -0400 | [diff] [blame] | 201 | For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the argument names |
Russ Cox | a664b49 | 2013-11-13 21:29:34 -0500 | [diff] [blame] | 202 | and offsets match. |
Russ Cox | 202bf8d | 2014-10-28 15:51:06 -0400 | [diff] [blame] | 203 | On 32-bit systems, the low and high 32 bits of a 64-bit value are distinguished by adding |
| 204 | a <code>_lo</code> or <code>_hi</code> suffix to the name, as in <code>arg_lo+0(FP)</code> or <code>arg_hi+4(FP)</code>. |
| 205 | If a Go prototype does not name its result, the expected assembly name is <code>ret</code>. |
Russ Cox | a664b49 | 2013-11-13 21:29:34 -0500 | [diff] [blame] | 206 | </p> |
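
<p>
The naming rules for <code>FP</code> references can be sketched in Go as a pair of string formatters.
This is only an illustration: the helper names are hypothetical, and it assumes every argument occupies
exactly one machine word, which is the simple case described above.
</p>

```go
package main

import "fmt"

// fpOperand formats a reference to the word-sized argument at position
// index (counting from zero), assuming every argument occupies exactly
// one machine word of wordSize bytes.
func fpOperand(name string, index, wordSize int) string {
	return fmt.Sprintf("%s+%d(FP)", name, index*wordSize)
}

// splitOperands formats the two 32-bit halves of a 64-bit argument that
// starts at the given offset on a 32-bit system, with the low half first
// as in the example in the text.
func splitOperands(name string, offset int) (lo, hi string) {
	lo = fmt.Sprintf("%s_lo+%d(FP)", name, offset)
	hi = fmt.Sprintf("%s_hi+%d(FP)", name, offset+4)
	return lo, hi
}

func main() {
	fmt.Println(fpOperand("first_arg", 0, 8))
	fmt.Println(fpOperand("second_arg", 1, 8))
	lo, hi := splitOperands("arg", 0)
	fmt.Println(lo, hi)
}
```

<p>
Arguments of mixed sizes follow Go's layout and alignment rules, which this sketch does not model.
</p>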
| 207 | |
| 208 | <p> |
| 209 | The <code>SP</code> pseudo-register is a virtual stack pointer |
| 210 | used to refer to frame-local variables and the arguments being |
| 211 | prepared for function calls. |
| 212 | It points to the top of the local stack frame, so references should use negative offsets |
| 213 | in the range [−framesize, 0): |
| 214 | <code>x-8(SP)</code>, <code>y-4(SP)</code>, and so on. |
Rob Pike | 012917a | 2015-07-08 15:53:47 +1000 | [diff] [blame] | 215 | </p> |
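
<p>
The range rule for virtual <code>SP</code> offsets is easy to state as a check.
The sketch below uses a hypothetical helper, <code>spOperand</code>, that formats a reference
and rejects offsets outside [−framesize, 0); it models the rule in the text, not the assembler's own validation.
</p>

```go
package main

import "fmt"

// spOperand formats a reference to a frame-local variable through the
// virtual stack pointer. It rejects offsets outside [-frameSize, 0).
func spOperand(name string, offset, frameSize int) (string, error) {
	if offset < -frameSize || offset >= 0 {
		return "", fmt.Errorf("offset %d is outside [-%d, 0)", offset, frameSize)
	}
	// offset is negative here, so %d already prints the minus sign.
	return fmt.Sprintf("%s%d(SP)", name, offset), nil
}

func main() {
	for _, off := range []int{-8, -4, 0} {
		ref, err := spOperand("x", off, 8)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println(ref)
	}
}
```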
| 216 | |
| 217 | <p> |
| 218 | On architectures with a hardware register named <code>SP</code>, |
| 219 | the name prefix distinguishes |
| 220 | references to the virtual stack pointer from references to the architectural |
| 221 | <code>SP</code> register. |
| 222 | That is, <code>x-8(SP)</code> and <code>-8(SP)</code> |
| 223 | are different memory locations: |
| 224 | the first refers to the virtual stack pointer pseudo-register, |
| 225 | while the second refers to the |
Russ Cox | a664b49 | 2013-11-13 21:29:34 -0500 | [diff] [blame] | 226 | hardware's <code>SP</code> register. |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 227 | </p> |
| 228 | |
| 229 | <p> |
Rob Pike | 012917a | 2015-07-08 15:53:47 +1000 | [diff] [blame] | 230 | On machines where <code>SP</code> and <code>PC</code> are |
| 231 | traditionally aliases for a physical, numbered register, |
| 232 | in the Go assembler the names <code>SP</code> and <code>PC</code> |
| 233 | are still treated specially; |
| 234 | for instance, references to <code>SP</code> require a symbol, |
| 235 | much like <code>FP</code>. |
| 236 | To access the actual hardware register use the true <code>R</code> name. |
| 237 | For example, on the ARM architecture the hardware |
| 238 | <code>SP</code> and <code>PC</code> are accessible as |
| 239 | <code>R13</code> and <code>R15</code>. |
| 240 | </p> |
| 241 | |
| 242 | <p> |
| 243 | Branches and direct jumps are always written as offsets to the PC, or as |
| 244 | jumps to labels: |
| 245 | </p> |
| 246 | |
| 247 | <pre> |
| 248 | label: |
| 249 | MOVW $0, R1 |
| 250 | JMP label |
| 251 | </pre> |
| 252 | |
| 253 | <p> |
| 254 | Each label is visible only within the function in which it is defined. |
| 255 | It is therefore permitted for multiple functions in a file to define |
| 256 | and use the same label names. |
| 257 | Direct jumps and call instructions can target text symbols, |
| 258 | such as <code>name(SB)</code>, but not offsets from symbols, |
| 259 | such as <code>name+4(SB)</code>. |
| 260 | </p> |
| 261 | |
| 262 | <p> |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 263 | Instructions, registers, and assembler directives are always in UPPER CASE to remind you |
| 264 | that assembly programming is a fraught endeavor. |
Russ Cox | 89f185f | 2014-06-26 11:54:39 -0400 | [diff] [blame] | 265 | (Exception: the <code>g</code> register renaming on ARM.) |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 266 | </p> |
| 267 | |
| 268 | <p> |
| 269 | In Go object files and binaries, the full name of a symbol is the |
| 270 | package path followed by a period and the symbol name: |
| 271 | <code>fmt.Printf</code> or <code>math/rand.Int</code>. |
| 272 | Because the assembler's parser treats period and slash as punctuation, |
| 273 | those strings cannot be used directly as identifier names. |
| 274 | Instead, the assembler allows the middle dot character U+00B7 |
| 275 | and the division slash U+2215 in identifiers and rewrites them to |
| 276 | plain period and slash. |
| 277 | Within an assembler source file, the symbols above are written as |
| 278 | <code>fmt·Printf</code> and <code>math∕rand·Int</code>. |
| 279 | The assembly listings generated by the compilers when using the <code>-S</code> flag |
| 280 | show the period and slash directly instead of the Unicode replacements |
| 281 | required by the assemblers. |
| 282 | </p> |
| 283 | |
| 284 | <p> |
| 285 | Most hand-written assembly files do not include the full package path |
| 286 | in symbol names, because the linker inserts the package path of the current |
| 287 | object file at the beginning of any name starting with a period: |
| 288 | in an assembly source file within the math/rand package implementation, |
| 289 | the package's Int function can be referred to as <code>·Int</code>. |
| 290 | This convention avoids the need to hard-code a package's import path in its |
| 291 | own source code, making it easier to move the code from one location to another. |
| 292 | </p> |
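
<p>
The two rewriting steps described above can be modeled in a few lines of Go.
This is a sketch of the naming convention only; <code>objectName</code> is a hypothetical helper,
not a function from the tool chain, and the real work is split between the assembler and the linker.
</p>

```go
package main

import (
	"fmt"
	"strings"
)

// objectName models how an identifier written in assembly source maps to
// the full symbol name in an object file. pkgPath is the import path of
// the package being assembled.
func objectName(ident, pkgPath string) string {
	// Middle dot (U+00B7) becomes a period; division slash (U+2215)
	// becomes a plain slash.
	r := strings.NewReplacer("\u00b7", ".", "\u2215", "/")
	name := r.Replace(ident)
	// A name starting with a period gets the current package path.
	if strings.HasPrefix(name, ".") {
		name = pkgPath + name
	}
	return name
}

func main() {
	fmt.Println(objectName("fmt\u00b7Printf", "main"))
	fmt.Println(objectName("math\u2215rand\u00b7Int", "main"))
	fmt.Println(objectName("\u00b7Int", "math/rand"))
}
```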
| 293 | |
| 294 | <h3 id="directives">Directives</h3> |
| 295 | |
| 296 | <p> |
| 297 | The assembler uses various directives to bind text and data to symbol names. |
| 298 | For example, here is a simple complete function definition. The <code>TEXT</code> |
| 299 | directive declares the symbol <code>runtime·profileloop</code> and the instructions |
| 300 | that follow form the body of the function. |
| 301 | The last instruction in a <code>TEXT</code> block must be some sort of jump, usually a <code>RET</code> (pseudo-)instruction. |
| 302 | (If it's not, the linker will append a jump-to-itself instruction; there is no fallthrough in <code>TEXTs</code>.) |
| 303 | After the symbol, the arguments are flags (see below) |
| 304 | and the frame size, a constant (but see below): |
| 305 | </p> |
| 306 | |
| 307 | <pre> |
| 308 | TEXT runtime·profileloop(SB),NOSPLIT,$8 |
| 309 | MOVQ $runtime·profileloop1(SB), CX |
| 310 | MOVQ CX, 0(SP) |
| 311 | CALL runtime·externalthreadhandler(SB) |
| 312 | RET |
| 313 | </pre> |
| 314 | |
| 315 | <p> |
| 316 | In the general case, the frame size is followed by an argument size, separated by a minus sign. |
Brad Fitzpatrick | 6607534 | 2014-04-27 07:40:48 -0700 | [diff] [blame] | 317 | (It's not a subtraction, just idiosyncratic syntax.) |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 318 | The frame size <code>$24-8</code> states that the function has a 24-byte frame |
| 319 | and is called with 8 bytes of argument, which live on the caller's frame. |
| 320 | If <code>NOSPLIT</code> is not specified for the <code>TEXT</code>, |
| 321 | the argument size must be provided. |
Russ Cox | 202bf8d | 2014-10-28 15:51:06 -0400 | [diff] [blame] | 322 | For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the |
| 323 | argument size is correct. |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 324 | </p> |
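
<p>
To make the point that the minus sign is a separator rather than a subtraction, here is a small Go sketch
that splits a size operand into its two parts.
The function <code>parseFrame</code> is hypothetical and handles only the non-negative forms shown in this section.
</p>

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFrame splits a TEXT size operand such as "$24-8" into the frame
// size and the argument size. hasArgs is false when the argument size
// is omitted, as in "$8".
func parseFrame(operand string) (frame, args int, hasArgs bool, err error) {
	if !strings.HasPrefix(operand, "$") {
		return 0, 0, false, fmt.Errorf("size operand %q must start with $", operand)
	}
	frameText, argText, found := strings.Cut(operand[1:], "-")
	frame, err = strconv.Atoi(frameText)
	if err != nil {
		return 0, 0, false, err
	}
	if found {
		args, err = strconv.Atoi(argText)
		if err != nil {
			return 0, 0, false, err
		}
	}
	return frame, args, found, nil
}

func main() {
	frame, args, hasArgs, err := parseFrame("$24-8")
	fmt.Println(frame, args, hasArgs, err)
}
```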
| 325 | |
| 326 | <p> |
| 327 | Note that the symbol name uses a middle dot to separate the components and is specified as an offset from the |
| 328 | static base pseudo-register <code>SB</code>. |
| 329 | This function would be called from Go source for package <code>runtime</code> using the |
| 330 | simple name <code>profileloop</code>. |
| 331 | </p> |
| 332 | |
| 333 | <p> |
Russ Cox | 202bf8d | 2014-10-28 15:51:06 -0400 | [diff] [blame] | 334 | Global data symbols are defined by a sequence of initializing |
| 335 | <code>DATA</code> directives followed by a <code>GLOBL</code> directive. |
| 336 | Each <code>DATA</code> directive initializes a section of the |
| 337 | corresponding memory. |
| 338 | The memory not explicitly initialized is zeroed. |
| 339 | The general form of the <code>DATA</code> directive is |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 340 | |
| 341 | <pre> |
Russ Cox | 202bf8d | 2014-10-28 15:51:06 -0400 | [diff] [blame] | 342 | DATA symbol+offset(SB)/width, value |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 343 | </pre> |
| 344 | |
| 345 | <p> |
Russ Cox | 202bf8d | 2014-10-28 15:51:06 -0400 | [diff] [blame] | 346 | which initializes the symbol memory at the given offset and width with the given value. |
| 347 | The <code>DATA</code> directives for a given symbol must be written with increasing offsets. |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 348 | </p> |
| 349 | |
| 350 | <p> |
| 351 | The <code>GLOBL</code> directive declares a symbol to be global. |
| 352 | The arguments are optional flags and the size of the data being declared as a global, |
| 353 | which will have initial value all zeros unless a <code>DATA</code> directive |
| 354 | has initialized it. |
| 355 | The <code>GLOBL</code> directive must follow any corresponding <code>DATA</code> directives. |
Russ Cox | 202bf8d | 2014-10-28 15:51:06 -0400 | [diff] [blame] | 356 | </p> |
| 357 | |
| 358 | <p> |
| 359 | For example, |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 360 | </p> |
| 361 | |
| 362 | <pre> |
DATA divtab&lt;&gt;+0x00(SB)/4, $0xf4f8fcff
DATA divtab&lt;&gt;+0x04(SB)/4, $0xe6eaedf0
...
DATA divtab&lt;&gt;+0x3c(SB)/4, $0x81828384
GLOBL divtab&lt;&gt;(SB), RODATA, $64

GLOBL runtime·tlsoffset(SB), NOPTR, $4
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 370 | </pre> |
| 371 | |
| 372 | <p> |
declares and initializes <code>divtab&lt;&gt;</code>, a read-only 64-byte table of 4-byte integer values,
and declares <code>runtime·tlsoffset</code>, a 4-byte, implicitly zeroed variable that
contains no pointers.
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 376 | </p> |
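
<p>
One way to picture the result is to model the directives as writes into a zeroed buffer.
The Go sketch below is an illustration under stated assumptions: the type and function names are hypothetical,
the byte order is assumed to be little-endian, and only the three <code>DATA</code> lines shown above are applied
(the elided ones are left out, so those bytes stay zero here).
</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// dataDirective models one "DATA symbol+offset(SB)/width, $value" line.
type dataDirective struct {
	offset int
	width  int
	value  uint64
}

// layout applies the directives to zeroed memory of the size that the
// GLOBL directive declares. It assumes a little-endian target.
func layout(size int, directives []dataDirective) ([]byte, error) {
	mem := make([]byte, size) // memory not explicitly initialized stays zero
	next := 0                 // lowest offset the next directive may use
	for _, d := range directives {
		if d.offset < next {
			return nil, fmt.Errorf("offset %#x is not increasing", d.offset)
		}
		if d.offset+d.width > size {
			return nil, fmt.Errorf("offset %#x width %d exceeds size %d", d.offset, d.width, size)
		}
		switch d.width {
		case 1:
			mem[d.offset] = byte(d.value)
		case 2:
			binary.LittleEndian.PutUint16(mem[d.offset:], uint16(d.value))
		case 4:
			binary.LittleEndian.PutUint32(mem[d.offset:], uint32(d.value))
		case 8:
			binary.LittleEndian.PutUint64(mem[d.offset:], d.value)
		default:
			return nil, fmt.Errorf("unsupported width %d", d.width)
		}
		next = d.offset + d.width
	}
	return mem, nil
}

func main() {
	mem, err := layout(64, []dataDirective{
		{0x00, 4, 0xf4f8fcff},
		{0x04, 4, 0xe6eaedf0},
		{0x3c, 4, 0x81828384},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("% x\n", mem[:8])
	fmt.Printf("% x\n", mem[0x3c:])
}
```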
| 377 | |
| 378 | <p> |
| 379 | There may be one or two arguments to the directives. |
| 380 | If there are two, the first is a bit mask of flags, |
| 381 | which can be written as numeric expressions, added or or-ed together, |
| 382 | or can be set symbolically for easier absorption by a human. |
Rob Pike | 8bca148 | 2014-08-12 17:04:45 -0700 | [diff] [blame] | 383 | Their values, defined in the standard <code>#include</code> file <code>textflag.h</code>, are: |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 384 | </p> |
| 385 | |
| 386 | <ul> |
| 387 | <li> |
| 388 | <code>NOPROF</code> = 1 |
| 389 | <br> |
| 390 | (For <code>TEXT</code> items.) |
| 391 | Don't profile the marked function. This flag is deprecated. |
| 392 | </li> |
| 393 | <li> |
| 394 | <code>DUPOK</code> = 2 |
| 395 | <br> |
| 396 | It is legal to have multiple instances of this symbol in a single binary. |
| 397 | The linker will choose one of the duplicates to use. |
| 398 | </li> |
| 399 | <li> |
| 400 | <code>NOSPLIT</code> = 4 |
| 401 | <br> |
| 402 | (For <code>TEXT</code> items.) |
| 403 | Don't insert the preamble to check if the stack must be split. |
| 404 | The frame for the routine, plus anything it calls, must fit in the |
| 405 | spare space at the top of the stack segment. |
| 406 | Used to protect routines such as the stack splitting code itself. |
| 407 | </li> |
| 408 | <li> |
| 409 | <code>RODATA</code> = 8 |
| 410 | <br> |
| 411 | (For <code>DATA</code> and <code>GLOBL</code> items.) |
| 412 | Put this data in a read-only section. |
| 413 | </li> |
| 414 | <li> |
| 415 | <code>NOPTR</code> = 16 |
| 416 | <br> |
| 417 | (For <code>DATA</code> and <code>GLOBL</code> items.) |
| 418 | This data contains no pointers and therefore does not need to be |
| 419 | scanned by the garbage collector. |
| 420 | </li> |
| 421 | <li> |
Rob Pike | 012917a | 2015-07-08 15:53:47 +1000 | [diff] [blame] | 422 | <code>WRAPPER</code> = 32 |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 423 | <br> |
| 424 | (For <code>TEXT</code> items.) |
| 425 | This is a wrapper function and should not count as disabling <code>recover</code>. |
| 426 | </li> |
Rob Pike | 012917a | 2015-07-08 15:53:47 +1000 | [diff] [blame] | 427 | <li> |
| 428 | <code>NEEDCTXT</code> = 64 |
| 429 | <br> |
| 430 | (For <code>TEXT</code> items.) |
| 431 | This function is a closure so it uses its incoming context register. |
| 432 | </li> |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 433 | </ul> |
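
<p>
Since the flags are single bits, combining them numerically or symbolically gives the same mask.
The Go sketch below copies the values from the list above into constants (it does not read <code>textflag.h</code>)
and uses a hypothetical helper, <code>describe</code>, to turn a mask back into names.
</p>

```go
package main

import "fmt"

// Flag values as listed in the text above.
const (
	NOPROF   = 1
	DUPOK    = 2
	NOSPLIT  = 4
	RODATA   = 8
	NOPTR    = 16
	WRAPPER  = 32
	NEEDCTXT = 64
)

var flagNames = []struct {
	bit  int
	name string
}{
	{NOPROF, "NOPROF"},
	{DUPOK, "DUPOK"},
	{NOSPLIT, "NOSPLIT"},
	{RODATA, "RODATA"},
	{NOPTR, "NOPTR"},
	{WRAPPER, "WRAPPER"},
	{NEEDCTXT, "NEEDCTXT"},
}

// describe lists the names of the flags set in mask, lowest bit first.
func describe(mask int) []string {
	var names []string
	for _, f := range flagNames {
		if mask&f.bit != 0 {
			names = append(names, f.name)
		}
	}
	return names
}

func main() {
	mask := NOSPLIT | WRAPPER
	fmt.Println(mask, mask == NOSPLIT+WRAPPER) // 36 true
	fmt.Println(describe(mask))
}
```

<p>
Adding and or-ing agree only while each flag appears once; or-ing is the safer habit.
</p>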
| 434 | |
Russ Cox | 202bf8d | 2014-10-28 15:51:06 -0400 | [diff] [blame] | 435 | <h3 id="runtime">Runtime Coordination</h3> |
| 436 | |
| 437 | <p> |
| 438 | For garbage collection to run correctly, the runtime must know the |
| 439 | location of pointers in all global data and in most stack frames. |
| 440 | The Go compiler emits this information when compiling Go source files, |
| 441 | but assembly programs must define it explicitly. |
| 442 | </p> |
| 443 | |
| 444 | <p> |
| 445 | A data symbol marked with the <code>NOPTR</code> flag (see above) |
| 446 | is treated as containing no pointers to runtime-allocated data. |
| 447 | A data symbol with the <code>RODATA</code> flag |
| 448 | is allocated in read-only memory and is therefore treated |
| 449 | as implicitly marked <code>NOPTR</code>. |
| 450 | A data symbol with a total size smaller than a pointer |
| 451 | is also treated as implicitly marked <code>NOPTR</code>. |
| 452 | It is not possible to define a symbol containing pointers in an assembly source file; |
| 453 | such a symbol must be defined in a Go source file instead. |
| 454 | Assembly source can still refer to the symbol by name |
| 455 | even without <code>DATA</code> and <code>GLOBL</code> directives. |
| 456 | A good general rule of thumb is to define all non-<code>RODATA</code> |
| 457 | symbols in Go instead of in assembly. |
| 458 | </p> |
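
<p>
The three cases in which a data symbol is treated as pointer-free can be written as a single predicate.
This Go sketch restates the rules above; the function name and its parameters are hypothetical,
and the flag values are the ones listed in the Directives section.
</p>

```go
package main

import "fmt"

const (
	RODATA = 8
	NOPTR  = 16
)

// treatedAsNoPointers reports whether a data symbol with the given flags
// and total size is treated as containing no pointers, where ptrSize is
// the size of a pointer on the target in bytes.
func treatedAsNoPointers(flags, size, ptrSize int) bool {
	explicit := flags&NOPTR != 0
	readOnly := flags&RODATA != 0
	tooSmall := size < ptrSize
	return explicit || readOnly || tooSmall
}

func main() {
	fmt.Println(treatedAsNoPointers(NOPTR, 4, 8))   // explicit flag
	fmt.Println(treatedAsNoPointers(RODATA, 64, 8)) // read-only data
	fmt.Println(treatedAsNoPointers(0, 4, 8))       // smaller than a pointer
	fmt.Println(treatedAsNoPointers(0, 16, 8))      // none of the cases apply
}
```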
| 459 | |
| 460 | <p> |
| 461 | Each function also needs annotations giving the location of |
| 462 | live pointers in its arguments, results, and local stack frame. |
| 463 | For an assembly function with no pointer results and |
| 464 | either no local stack frame or no function calls, |
| 465 | the only requirement is to define a Go prototype for the function |
Shenghou Ma | 7aa6875 | 2015-01-08 21:43:47 -0500 | [diff] [blame] | 466 | in a Go source file in the same package. The name of the assembly |
| 467 | function must not contain the package name component (for example, |
| 468 | function <code>Syscall</code> in package <code>syscall</code> should |
| 469 | use the name <code>·Syscall</code> instead of the equivalent name |
| 470 | <code>syscall·Syscall</code> in its <code>TEXT</code> directive). |
Russ Cox | 202bf8d | 2014-10-28 15:51:06 -0400 | [diff] [blame] | 471 | For more complex situations, explicit annotation is needed. |
| 472 | These annotations use pseudo-instructions defined in the standard |
| 473 | <code>#include</code> file <code>funcdata.h</code>. |
| 474 | </p> |
| 475 | |
| 476 | <p> |
| 477 | If a function has no arguments and no results, |
| 478 | the pointer information can be omitted. |
| 479 | This is indicated by an argument size annotation of <code>$<i>n</i>-0</code> |
| 480 | on the <code>TEXT</code> instruction. |
| 481 | Otherwise, pointer information must be provided by |
| 482 | a Go prototype for the function in a Go source file, |
| 483 | even for assembly functions not called directly from Go. |
| 484 | (The prototype will also let <code>go</code> <code>vet</code> check the argument references.) |
| 485 | At the start of the function, the arguments are assumed |
| 486 | to be initialized but the results are assumed uninitialized. |
| 487 | If the results will hold live pointers during a call instruction, |
| 488 | the function should start by zeroing the results and then |
| 489 | executing the pseudo-instruction <code>GO_RESULTS_INITIALIZED</code>. |
| 490 | This instruction records that the results are now initialized |
| 491 | and should be scanned during stack movement and garbage collection. |
| 492 | It is typically easier to arrange that assembly functions do not |
| 493 | return pointers or do not contain call instructions; |
| 494 | no assembly functions in the standard library use |
| 495 | <code>GO_RESULTS_INITIALIZED</code>. |
| 496 | </p> |
| 497 | |
| 498 | <p> |
| 499 | If a function has no local stack frame, |
| 500 | the pointer information can be omitted. |
| 501 | This is indicated by a local frame size annotation of <code>$0-<i>n</i></code> |
| 502 | on the <code>TEXT</code> instruction. |
| 503 | The pointer information can also be omitted if the |
| 504 | function contains no call instructions. |
| 505 | Otherwise, the local stack frame must not contain pointers, |
| 506 | and the assembly must confirm this fact by executing the |
| 507 | pseudo-instruction <code>NO_LOCAL_POINTERS</code>. |
| 508 | Because stack resizing is implemented by moving the stack, |
| 509 | the stack pointer may change during any function call: |
| 510 | even pointers to stack data must not be kept in local variables. |
| 511 | </p> |
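
<p>
The rules in the last two paragraphs amount to a short checklist, sketched here in Go.
The type and function are hypothetical and cover only the argument and local-frame rules;
the <code>GO_RESULTS_INITIALIZED</code> case is not modeled.
</p>

```go
package main

import "fmt"

// textInfo summarizes the properties of an assembly function that the
// rules above depend on.
type textInfo struct {
	frameSize int  // bytes of local frame, the n in $n-m
	argSize   int  // bytes of arguments and results, the m in $n-m
	hasCalls  bool // whether the body contains call instructions
}

// requirements lists what the function must provide so the runtime knows
// where its pointers are.
func requirements(t textInfo) []string {
	var reqs []string
	if t.argSize != 0 {
		// Arguments or results exist, so a Go prototype is needed.
		reqs = append(reqs, "Go prototype")
	}
	if t.frameSize != 0 && t.hasCalls {
		// A local frame that is live across calls must be declared
		// pointer-free.
		reqs = append(reqs, "NO_LOCAL_POINTERS")
	}
	return reqs
}

func main() {
	fmt.Println(requirements(textInfo{frameSize: 0, argSize: 24}))
	fmt.Println(requirements(textInfo{frameSize: 16, argSize: 8, hasCalls: true}))
	fmt.Println(requirements(textInfo{frameSize: 8, argSize: 0}))
}
```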
| 512 | |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 513 | <h2 id="architectures">Architecture-specific details</h2> |
| 514 | |
| 515 | <p> |
| 516 | It is impractical to list all the instructions and other details for each machine. |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 517 | To see what instructions are defined for a given machine, say ARM, |
| 518 | look in the source for the <code>obj</code> support library for |
| 519 | that architecture, located in the directory <code>src/cmd/internal/obj/arm</code>. |
| 520 | In that directory is a file <code>a.out.go</code>; it contains |
| 521 | a long list of constants starting with <code>A</code>, like this: |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 522 | </p> |
| 523 | |
| 524 | <pre> |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 525 | const ( |
| 526 | AAND = obj.ABaseARM + obj.A_ARCHSPECIFIC + iota |
| 527 | AEOR |
| 528 | ASUB |
| 529 | ARSB |
| 530 | AADD |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 531 | ... |
| 532 | </pre> |
| 533 | |
| 534 | <p> |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 535 | This is the list of instructions and their spellings as known to the assembler and linker for that architecture. |
| 536 | Each instruction begins with an initial capital <code>A</code> in this list, so <code>AAND</code> |
| 537 | represents the bitwise and instruction, |
| 538 | <code>AND</code> (without the leading <code>A</code>), |
| 539 | and is written in assembly source as <code>AND</code>. |
| 540 | The enumeration is mostly in alphabetical order. |
| 541 | (The architecture-independent <code>AXXX</code>, defined in the |
| 542 | <code>cmd/internal/obj</code> package, |
| 543 | represents an invalid instruction). |
| 544 | The sequence of the <code>A</code> names has nothing to do with the actual |
| 545 | encoding of the machine instructions. |
| 546 | The <code>cmd/internal/obj</code> package takes care of that detail. |
| 547 | </p> |
| 548 | |
| 549 | <p> |
| 550 | The instructions for both the 386 and AMD64 architectures are listed in |
| 551 | <code>cmd/internal/obj/x86/a.out.go</code>. |
| 552 | </p> |
| 553 | |
| 554 | <p> |
| 555 | The architectures share syntax for common addressing modes such as |
| 556 | <code>(R1)</code> (register indirect), |
| 557 | <code>4(R1)</code> (register indirect with offset), and |
| 558 | <code>$foo(SB)</code> (absolute address). |
| 559 | The assembler also supports some (not necessarily all) addressing modes |
| 560 | specific to each architecture. |
| 561 | The sections below list these. |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 562 | </p> |
| 563 | |
| 564 | <p> |
| 565 | One detail evident in the examples from the previous sections is that data in the instructions flows from left to right: |
| 566 | <code>MOVQ</code> <code>$0,</code> <code>CX</code> clears <code>CX</code>. |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 567 | This rule applies even on architectures where the conventional notation uses the opposite direction. |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 568 | </p> |
| 569 | |
| 570 | <p> |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 571 | Here follow some descriptions of key Go-specific details for the supported architectures. |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 572 | </p> |
| 573 | |
| 574 | <h3 id="x86">32-bit Intel 386</h3> |
| 575 | |
| 576 | <p> |
Russ Cox | 89f185f | 2014-06-26 11:54:39 -0400 | [diff] [blame] | 577 | The runtime pointer to the <code>g</code> structure is maintained |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 578 | through the value of an otherwise unused (as far as Go is concerned) register in the MMU. |
| 579 | An OS-dependent macro <code>get_tls</code> is defined for the assembler if the source includes |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 580 | a special header, <code>go_asm.h</code>: |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 581 | </p> |
| 582 | |
| 583 | <pre> |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 584 | #include "go_asm.h" |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 585 | </pre> |
| 586 | |
| 587 | <p> |
| 588 | Within the runtime, the <code>get_tls</code> macro loads its argument register |
Russ Cox | 89f185f | 2014-06-26 11:54:39 -0400 | [diff] [blame] | 589 | with a pointer to the <code>g</code> pointer, and the <code>g</code> struct |
| 590 | contains the <code>m</code> pointer. |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 591 | The sequence to load <code>g</code> and <code>m</code> using <code>CX</code> looks like this: |
| 592 | </p> |
| 593 | |
| 594 | <pre> |
| 595 | get_tls(CX) |
Russ Cox | 89f185f | 2014-06-26 11:54:39 -0400 | [diff] [blame] | 596 | MOVL g(CX), AX // Move g into AX. |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 597 | MOVL g_m(AX), BX // Move g.m into BX. |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 598 | </pre> |
| 599 | |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 600 | <p> |
| 601 | Addressing modes: |
| 602 | </p> |
| 603 | |
| 604 | <ul> |
| 605 | |
| 606 | <li> |
| 607 | <code>(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code>. |
| 608 | </li> |
| 609 | |
| 610 | <li> |
| 611 | <code>64(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code> plus 64. |
| 612 | These modes accept only 1, 2, 4, and 8 as scale factors. |
| 613 | </li> |
| 614 | |
| 615 | </ul> |
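<p>
A minimal sketch of these modes in use; the registers, offsets, and scale factors are chosen
only for illustration:
</p>

<pre>
MOVL (DI)(BX*2), AX // AX = 32-bit value at address DI + BX*2
MOVL 64(DI)(BX*4), AX // AX = 32-bit value at address DI + BX*4 + 64
LEAL 64(DI)(BX*8), AX // AX = DI + BX*8 + 64 (address computation only, no memory access)
</pre>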
| 616 | |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 617 | <h3 id="amd64">64-bit Intel 386 (a.k.a. amd64)</h3> |
| 618 | |
| 619 | <p> |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 620 | The 386 and AMD64 architectures behave largely the same at the assembler level. |
| 621 | Assembly code to access the <code>m</code> and <code>g</code> |
| 622 | pointers on the 64-bit version is the same as on the 32-bit 386, |
| 623 | except it uses <code>MOVQ</code> rather than <code>MOVL</code>: |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 624 | </p> |
| 625 | |
| 626 | <pre> |
| 627 | get_tls(CX) |
Russ Cox | 89f185f | 2014-06-26 11:54:39 -0400 | [diff] [blame] | 628 | MOVQ g(CX), AX // Move g into AX. |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 629 | MOVQ g_m(AX), BX // Move g.m into BX. |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 630 | </pre> |
| 631 | |
| 632 | <h3 id="arm">ARM</h3> |
| 633 | |
| 634 | <p> |
Russ Cox | 89f185f | 2014-06-26 11:54:39 -0400 | [diff] [blame] | 635 | The registers <code>R10</code> and <code>R11</code> |
Russ Cox | a664b49 | 2013-11-13 21:29:34 -0500 | [diff] [blame] | 636 | are reserved by the compiler and linker. |
| 637 | </p> |
| 638 | |
| 639 | <p> |
Russ Cox | 89f185f | 2014-06-26 11:54:39 -0400 | [diff] [blame] | 640 | <code>R10</code> points to the <code>g</code> (goroutine) structure. |
| 641 | Within assembler source code, this pointer must be referred to as <code>g</code>; |
| 642 | the name <code>R10</code> is not recognized. |
Russ Cox | a664b49 | 2013-11-13 21:29:34 -0500 | [diff] [blame] | 643 | </p> |
| 644 | |
| 645 | <p> |
| 646 | To make it easier for people and compilers to write assembly, the ARM linker |
| 647 | allows general addressing forms and pseudo-operations like <code>DIV</code> or <code>MOD</code> |
| 648 | that may not be expressible using a single hardware instruction. |
| 649 | It implements these forms as multiple instructions, often using the <code>R11</code> register |
| 650 | to hold temporary values. |
| 651 | Hand-written assembly can use <code>R11</code>, but doing so requires |
| 652 | being sure that the linker is not also using it to implement any of the other |
| 653 | instructions in the function. |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 654 | </p> |
| 655 | |
| 656 | <p> |
| 657 | When defining a <code>TEXT</code>, specifying frame size <code>$-4</code> |
| 658 | tells the linker that this is a leaf function that does not need to save <code>LR</code> on entry. |
| 659 | </p> |
| 660 | |
Russ Cox | a664b49 | 2013-11-13 21:29:34 -0500 | [diff] [blame] | 661 | <p> |
| 662 | The name <code>SP</code> always refers to the virtual stack pointer described earlier. |
| 663 | For the hardware register, use <code>R13</code>. |
| 664 | </p> |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 665 | |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 666 | <p> |
Rob Pike | df9423f | 2015-07-14 10:24:40 +1000 | [diff] [blame] | 667 | A condition code is written by appending a period and the one- or two-letter code to the instruction, |
| 668 | as in <code>MOVW.EQ</code>. |
| 669 | Multiple codes may be appended: <code>MOVM.IA.W</code>. |
| 670 | The order of the code modifiers is irrelevant. |
| 671 | </p> |
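<p>
For illustration only, here is a sketch of conditional execution and of combined modifiers.
The registers are arbitrary; <code>R10</code> and <code>R11</code> are avoided because they are reserved.
</p>

<pre>
CMP $0, R1 // compare R1 with 0 and set the condition flags
MOVW.EQ $1, R0 // executed only if R1 == 0
MOVW.NE $2, R0 // executed only if R1 != 0
MOVM.IA.W (R2), [R4-R7] // load R4 through R7 from the address in R2, writing the advanced address back to R2
MOVM.W.IA (R2), [R4-R7] // the same instruction; the order of the modifiers does not matter
</pre>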
| 672 | |
| 673 | <p> |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 674 | Addressing modes: |
| 675 | </p> |
| 676 | |
| 677 | <ul> |
| 678 | |
| 679 | <li> |
| 680 | <code>R0->16</code> |
| 681 | <br> |
| 682 | <code>R0>>16</code> |
| 683 | <br> |
| 684 | <code>R0<<16</code> |
| 685 | <br> |
| 686 | <code>R0@>16</code>: |
| 687 | For <code><<</code>, left shift <code>R0</code> by 16 bits. |
| 688 | The other codes are <code>-></code> (arithmetic right shift), |
| 689 | <code>>></code> (logical right shift), and |
| 690 | <code>@></code> (rotate right). |
| 691 | </li> |
| 692 | |
| 693 | <li> |
| 694 | <code>R0->R1</code> |
| 695 | <br> |
| 696 | <code>R0>>R1</code> |
| 697 | <br> |
| 698 | <code>R0<<R1</code> |
| 699 | <br> |
| 700 | <code>R0@>R1</code>: |
| 701 | For <code><<</code>, left shift <code>R0</code> by the count in <code>R1</code>. |
| 702 | The other codes are <code>-></code> (arithmetic right shift), |
| 703 | <code>>></code> (logical right shift), and |
| 704 | <code>@></code> (rotate right). |
| 705 | |
| 706 | </li> |
| 707 | |
| 708 | <li> |
| 709 | <code>[R0,g,R12-R15]</code>: For multi-register instructions, the set comprising |
| 710 | <code>R0</code>, <code>g</code>, and <code>R12</code> through <code>R15</code> inclusive. |
| 711 | </li> |
| 712 | |
Rob Pike | df9423f | 2015-07-14 10:24:40 +1000 | [diff] [blame] | 713 | <li> |
| 714 | <code>(R5, R6)</code>: Destination register pair. |
| 715 | </li> |
| 716 | |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 717 | </ul> |
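<p>
A hedged sketch of the shifted-register operands used as instruction sources.
The registers and shift counts are arbitrary, and the exact set of instructions that accept
a shifted operand depends on the assembler version.
</p>

<pre>
MOVW R0>>16, R2 // R2 = R0 shifted right logically by 16 bits
MOVW R0->16, R2 // R2 = R0 shifted right arithmetically by 16 bits (sign bit preserved)
MOVW R0@>R1, R2 // R2 = R0 rotated right by the count in R1
ADD R0>>16, R3, R4 // R4 = R3 + (R0 logically shifted right by 16)
</pre>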
| 718 | |
| 719 | <h3 id="arm64">ARM64</h3> |
| 720 | |
| 721 | <p> |
Rob Pike | df9423f | 2015-07-14 10:24:40 +1000 | [diff] [blame] | 722 | The ARM64 port is in an experimental state. |
| 723 | </p> |
| 724 | |
| 725 | <p> |
| 726 | Instruction modifiers are appended to the instruction following a period. |
| 727 | The only modifiers are <code>P</code> (postincrement) and <code>W</code> |
| 728 | (preincrement): |
| 729 | <code>MOVW.P</code>, <code>MOVW.W</code>. |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 730 | </p> |
| 731 | |
| 732 | <p> |
| 733 | Addressing modes: |
| 734 | </p> |
| 735 | |
| 736 | <ul> |
| 737 | |
| 738 | <li> |
Rob Pike | df9423f | 2015-07-14 10:24:40 +1000 | [diff] [blame] | 739 | <code>(R5, R6)</code>: Register pair for <code>LDP</code>/<code>STP</code>. |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 740 | </li> |
| 741 | |
| 742 | </ul> |
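<p>
As a sketch under the description above (registers and offsets are arbitrary, and support for
particular instructions may vary while the port is experimental), the modifiers and the
register pair might be used like this:
</p>

<pre>
MOVW.P 4(R0), R1 // load the 32-bit word at address R0 into R1, then add 4 to R0
MOVW.W 4(R0), R1 // add 4 to R0, then load the 32-bit word at the new address into R1
LDP (R0), (R5, R6) // load two consecutive 64-bit values starting at address R0 into R5 and R6
STP (R5, R6), (R0) // store R5 and R6 to two consecutive 64-bit slots starting at address R0
</pre>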
| 743 | |
| 744 | <h3 id="ppc64">Power64, a.k.a. ppc64</h3> |
| 745 | |
| 746 | <p> |
Rob Pike | df9423f | 2015-07-14 10:24:40 +1000 | [diff] [blame] | 747 | The Power64 port is in an experimental state. |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 748 | </p> |
| 749 | |
| 750 | <p> |
| 751 | Addressing modes: |
| 752 | </p> |
| 753 | |
| 754 | <ul> |
| 755 | |
| 756 | <li> |
| 757 | <code>(R5)(R6*1)</code>: The location at <code>R5</code> plus <code>R6</code>. It is a scaled |
Rob Pike | df9423f | 2015-07-14 10:24:40 +1000 | [diff] [blame] | 758 | mode as on the x86, but the only scale allowed is <code>1</code>. |
| 759 | </li> |
| 760 | |
| 761 | <li> |
| 762 | <code>(R5+R6)</code>: Alias for <code>(R5)(R6*1)</code>. |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 763 | </li> |
| 764 | |
| 765 | </ul> |
| 766 | |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 767 | <h3 id="unsupported_opcodes">Unsupported opcodes</h3> |
| 768 | |
| 769 | <p> |
| 770 | The assemblers are designed to support the compiler, so not all hardware instructions |
| 771 | are defined for all architectures: if the compiler doesn't generate an instruction, it might not be there. |
| 772 | If you need to use a missing instruction, there are two ways to proceed. |
| 773 | One is to update the assembler to support that instruction, which is straightforward |
| 774 | but only worthwhile if it's likely the instruction will be used again. |
| 775 | The other, for simple one-off cases, is to use the <code>BYTE</code> |
| 776 | and <code>WORD</code> directives |
| 777 | to lay down explicit data into the instruction stream within a <code>TEXT</code>. |
| 778 | Here's how the 386 runtime defines the 64-bit atomic load function. |
| 779 | </p> |
| 780 | |
| 781 | <pre> |
| 782 | // uint64 atomicload64(uint64 volatile* addr); |
| 783 | // so actually |
| 784 | // void atomicload64(uint64 *res, uint64 volatile *addr); |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 785 | TEXT runtime·atomicload64(SB), NOSPLIT, $0-12 |
Russ Cox | 202bf8d | 2014-10-28 15:51:06 -0400 | [diff] [blame] | 786 | MOVL ptr+0(FP), AX // AX = addr |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 787 | TESTL $7, AX // check that addr is 8-byte aligned |
| 788 | JZ 2(PC) // aligned: skip the next instruction |
| 789 | MOVL 0, AX // crash with nil ptr deref |
Russ Cox | 202bf8d | 2014-10-28 15:51:06 -0400 | [diff] [blame] | 790 | LEAL ret_lo+4(FP), BX // BX = address of the result |
Rob Pike | 3c5eb96 | 2015-07-13 15:22:35 +1000 | [diff] [blame] | 791 | // MOVQ (%EAX), %MM0 |
| 792 | BYTE $0x0f; BYTE $0x6f; BYTE $0x00 |
| 793 | // MOVQ %MM0, 0(%EBX) |
| 794 | BYTE $0x0f; BYTE $0x7f; BYTE $0x03 |
| 795 | // EMMS |
| 796 | BYTE $0x0F; BYTE $0x77 |
Rob Pike | 2fbcb08 | 2013-11-12 20:04:22 -0800 | [diff] [blame] | 797 | RET |
| 798 | </pre> |