| // Copyright 2019 The Go Authors. All rights reserved. |
| // Use of this source code is governed by a BSD-style |
| // license that can be found in the LICENSE file. |
| |
| /* |
| Package ppc64 implements a PPC64 assembler that assembles Go asm into |
| the corresponding PPC64 instructions as defined by the Power ISA 3.0B. |
| |
| This document provides information on how to write code in Go assembler |
| for PPC64, focusing on the differences between Go and PPC64 assembly language. |
| It assumes some knowledge of PPC64 assembler. The original implementation of |
| PPC64 in Go defined many opcodes that differ from the PPC64 opcodes, but later |
| updates to the Go assembly language use mnemonics that are similar, if not |
| identical, to the PPC64 mnemonics, for example for the VMX and VSX instructions. |
| Not every detail is covered here; refer to the Power ISA document for more. |
| |
| Starting with Go 1.15, the Go objdump supports the -gnu option, which provides a |
| side-by-side view of the Go assembler and the PPC64 assembler output. This is |
| extremely helpful in determining what final PPC64 assembly is generated from the |
| corresponding Go assembly. |
| |
| In the examples below, the Go assembly is on the left, PPC64 assembly on the right. |
| |
| 1. Operand ordering |
| |
| In Go asm, the last operand (right) is the target operand, but with PPC64 asm, |
| the first operand (left) is the target. The order of the remaining operands is |
| not consistent: in general opcodes with 3 operands that perform math or logical |
| operations have their operands in reverse order. Opcodes for vector instructions |
| and those with more than 3 operands usually have operands in the same order except |
| for the target operand, which is first in PPC64 asm and last in Go asm. |
| |
| Example: |
| ADD R3, R4, R5 <=> add r5, r4, r3 |
| |
| 2. Constant operands |
| |
| In Go asm, an operand that starts with '$' indicates a constant value. If the |
| instruction using the constant has an immediate version of the opcode, then an |
| immediate value is used with the opcode if possible. |
| |
| Example: |
| ADD $1, R3, R4 <=> addi r4, r3, 1 |
| |
| 3. Opcodes setting condition codes |
| |
| In PPC64 asm, some instructions other than compares have variations that can set |
| the condition code where meaningful. This is indicated by adding '.' to the end |
| of the PPC64 instruction. In Go asm, these instructions have 'CC' at the end of |
| the opcode. The possible settings of the condition code depend on the instruction. |
| CR0 is the default for fixed-point instructions; CR1 for floating point; CR6 for |
| vector instructions. |
| |
| Example: |
| ANDCC R3, R4, R5 <=> and. r5, r3, r4 (set CR0) |
| |
| 4. Loads and stores from memory |
| |
| In Go asm, opcodes starting with 'MOV' indicate a load or store. When the target |
| is a memory reference, then it is a store; when the target is a register and the |
| source is a memory reference, then it is a load. |
| |
| MOV{B,H,W,D} variations identify the size as byte, halfword, word, doubleword. |
| |
| Adding 'Z' to the opcode for a load indicates zero extend; if omitted it is sign extend. |
| Adding 'U' to a load or store indicates an update of the base register with the offset. |
| Adding 'BR' to an opcode indicates byte-reversed load or store, or the order opposite |
| of the expected endian order. If 'BR' is used then zero extend is assumed. |
| |
| Memory references n(Ra) indicate the address in Ra + n. When used with an update form |
| of an opcode, the value in Ra is incremented by n. |
| |
| Memory references (Ra+Rb) or (Ra)(Rb) indicate the address Ra + Rb, used by indexed |
| loads or stores. Both forms are accepted. When used with an update form of an opcode, |
| the base register is updated by the value in the index register. |
| |
| Examples: |
| MOVD (R3), R4 <=> ld r4,0(r3) |
| MOVW (R3), R4 <=> lwa r4,0(r3) |
| MOVWZU 4(R3), R4 <=> lwzu r4,4(r3) |
| MOVWZ (R3+R5), R4 <=> lwzx r4,r3,r5 |
| MOVHZ (R3), R4 <=> lhz r4,0(r3) |
| MOVHU 2(R3), R4 <=> lhau r4,2(r3) |
| MOVBZ (R3), R4 <=> lbz r4,0(r3) |
| |
| MOVD R4,(R3) <=> std r4,0(r3) |
| MOVW R4,(R3) <=> stw r4,0(r3) |
| MOVW R4,(R3+R5) <=> stwx r4,r3,r5 |
| MOVWU R4,4(R3) <=> stwu r4,4(r3) |
| MOVH R4,2(R3) <=> sth r4,2(r3) |
| MOVBU R4,(R3)(R5) <=> stbux r4,r3,r5 |
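| |
| As a rough illustration of the 'Z' rule above, the following Go sketch models |
| the 64-bit register value left by a word load with and without zero extension. |
| The helper names are hypothetical and are not part of the assembler. |
| |
```go
package main

import "fmt"

// zeroExtendWord models the register value after a 'Z' load such as MOVWZ:
// the upper 32 bits are cleared. (Hypothetical helper, for illustration only.)
func zeroExtendWord(w uint32) uint64 {
	return uint64(w)
}

// signExtendWord models the register value after a load without 'Z', such
// as MOVW: the sign bit of the word is copied into the upper 32 bits.
// (Hypothetical helper, for illustration only.)
func signExtendWord(w uint32) uint64 {
	return uint64(int64(int32(w)))
}

func main() {
	w := uint32(0x80000000) // a word with its sign bit set
	fmt.Printf("MOVWZ: %#x\n", zeroExtendWord(w))
	fmt.Printf("MOVW:  %#x\n", signExtendWord(w))
}
```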
| |
| 5. Compares |
| |
| When an instruction does a compare or other operation that might |
| result in a condition code, then the resulting condition is set |
| in a field of the condition register. The condition register consists |
| of eight 4-bit fields named CR0 - CR7. When a compare instruction |
| identifies a CR then the resulting condition is set in that field |
| to be read by a later branch or isel instruction. Within these fields, |
| bits are set to indicate less than, greater than, or equal conditions. |
| |
| Once an instruction sets a condition, then a subsequent branch, isel or |
| other instruction can read the condition field and operate based on the |
| bit settings. |
| |
| Examples: |
| CMP R3, R4 <=> cmp r3, r4 (CR0 assumed) |
| CMP R3, R4, CR1 <=> cmp cr1, r3, r4 |
| |
| Note that the condition register is the target operand of compare opcodes, so |
| the remaining operands are in the same order for Go asm and PPC64 asm. |
| When CR0 is used then it is implicit and does not need to be specified. |
| |
| 6. Branches |
| |
| Many branches are represented as a form of the BC instruction. There are |
| other extended opcodes to make it easier to see what type of branch is being |
| used. |
| |
| The following is a brief description of the BC instruction and its commonly |
| used operands. |
| |
| BC op1, op2, op3 |
| |
| op1: type of branch |
| 16 -> bctr (branch on ctr) |
| 12 -> bcr (branch if cr bit is set) |
| 8 -> bcr+bctr (branch on ctr and cr values) |
| 4 -> bcr != 0 (branch if specified cr bit is not set) |
| |
| There are more combinations but these are the most common. |
| |
| op2: condition register field and condition bit |
| |
| This contains an immediate value indicating which condition field |
| to read and what bits to test. Each field is 4 bits long with CR0 |
| at bit 0, CR1 at bit 4, etc. The value is computed as 4*CR+condition |
| with these condition values: |
| |
| 0 -> LT |
| 1 -> GT |
| 2 -> EQ |
| 3 -> OVG |
| |
| Thus 0 means test CR0 for LT, 5 means CR1 for GT, 30 means CR7 for EQ. |
| |
| op3: branch target |
| |
| Examples: |
| |
| BC 12, 0, target <=> blt cr0, target |
| BC 12, 2, target <=> beq cr0, target |
| BC 12, 5, target <=> bgt cr1, target |
| BC 12, 30, target <=> beq cr7, target |
| BC 4, 6, target <=> bne cr1, target |
| BC 4, 5, target <=> ble cr1, target |
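| |
| The op2 computation described above (4*CR+condition) can be sketched in Go as |
| follows. The helper bcOperand and the constant names are assumptions made for |
| illustration; they are not provided by the assembler. |
| |
```go
package main

import "fmt"

// Condition values within a 4-bit CR field, as listed in the text.
const (
	LT  = 0
	GT  = 1
	EQ  = 2
	OVG = 3
)

// bcOperand is a hypothetical helper that computes the op2 value of a BC
// instruction from a condition register field number (0-7) and a condition.
func bcOperand(crField, cond int) int {
	return 4*crField + cond
}

func main() {
	fmt.Println(bcOperand(0, LT)) // CR0, LT
	fmt.Println(bcOperand(1, GT)) // CR1, GT
	fmt.Println(bcOperand(7, EQ)) // CR7, EQ
}
```
| |
| These three calls yield 0, 5 and 30, matching the values given in the text. |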
| |
| The following extended opcodes are available for ease of use and readability: |
| |
| BNE CR2, target <=> bne cr2, target |
| BEQ CR4, target <=> beq cr4, target |
| BLT target <=> blt target (cr0 default) |
| BGE CR7, target <=> bge cr7, target |
| |
| Refer to the ISA for more information on additional values for the BC instruction, |
| how to handle OVG information, and much more. |
| |
| 7. Align directive |
| |
| Starting with Go 1.12, Go asm supports the PCALIGN directive, which indicates |
| that the next instruction should be aligned to the specified value. Currently |
| 8 and 16 are the only supported values, and a maximum of 2 NOPs will be added |
| to align the code. That means in the case where the code is aligned to 4 but |
| PCALIGN $16 is at that location, the code will only be aligned to 8 to avoid |
| adding 3 NOPs. |
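| |
| The padding rule above can be modeled with a short Go sketch. It assumes the |
| current address is a multiple of 4 and the requested alignment is 8 or 16; the |
| function nopsForAlign is hypothetical and only mirrors the rule as described |
| here, not the assembler's actual code. |
| |
```go
package main

import "fmt"

// nopsForAlign is a hypothetical model of the PCALIGN rule described in the
// text: pad to the requested alignment, but never with more than 2 NOPs.
// pc is assumed to be a multiple of 4 and align is assumed to be 8 or 16.
func nopsForAlign(pc, align int) int {
	const insnSize = 4 // each NOP is one 4-byte instruction
	const maxNops = 2

	pad := (align - pc%align) % align
	if pad/insnSize > maxNops {
		// Reaching the requested alignment would need 3 NOPs,
		// so settle for 8-byte alignment instead.
		pad = (8 - pc%8) % 8
	}
	return pad / insnSize
}

func main() {
	for _, pc := range []int{0, 4, 8, 12} {
		fmt.Printf("pc%%16=%d: PCALIGN $16 adds %d NOP(s)\n", pc, nopsForAlign(pc, 16))
	}
}
```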
| |
| The purpose of this directive is to improve performance for cases like loops |
| where better alignment (8 or 16 instead of 4) might be helpful. This directive |
| exists in PPC64 assembler and is frequently used by PPC64 assembler writers. |
| |
| PCALIGN $16 |
| PCALIGN $8 |
| |
| Functions in Go are aligned to 16 bytes, as is the case in all other compilers |
| for PPC64. |
| |
| 8. Shift instructions |
| |
| The simple scalar shifts on PPC64 expect a shift count that fits in 5 bits for |
| 32-bit values or 6 bits for 64-bit values. If the shift count is a constant value |
| greater than the max then the assembler sets it to the max for that size (31 for |
| 32-bit values, 63 for 64-bit values). If the shift count is in a register, then |
| only the low 5 or 6 bits of the register will be used as the shift count. The |
| Go compiler will add appropriate code to compare the shift value to achieve the |
| correct result, and the assembler does not add extra checking. |
| |
| Examples: |
| |
| SRAD $8,R3,R4 => sradi r4,r3,8 |
| SRD $8,R3,R4 => rldicl r4,r3,56,8 |
| SLD $8,R3,R4 => rldicr r4,r3,8,55 |
| SRAW $16,R4,R5 => srawi r5,r4,16 |
| SRW $40,R4,R5 => rlwinm r5,r4,0,0,31 |
| SLW $12,R4,R5 => rlwinm r5,r4,12,0,19 |
| |
| Some non-simple shifts have operands in the Go assembly which don't map directly |
| onto operands in the PPC64 assembly. When an operand in a shift instruction in the |
| Go assembly is a bit mask, that mask is represented as a start and end bit in the |
| PPC64 assembly instead of a mask. See the ISA for more detail on these types of shifts. |
| Here are a few examples: |
| |
| RLWMI $7,R3,$65535,R6 => rlwimi r6,r3,7,16,31 |
| RLDMI $0,R4,$7,R6 => rldimi r6,r4,0,61 |
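| |
| To show how a mask relates to the start and end bits in the two examples above, |
| here is a minimal Go sketch. It assumes a non-zero, contiguous mask that does not |
| wrap around, and uses PPC64 bit numbering (bit 0 is the most significant bit). |
| The helper names are hypothetical; this is not the assembler's implementation. |
| |
```go
package main

import (
	"fmt"
	"math/bits"
)

// maskBounds32 returns the start and end bit of a contiguous, non-zero,
// non-wrapping 32-bit mask, with bit 0 as the most significant bit.
// (Hypothetical helper; it does not validate the mask.)
func maskBounds32(mask uint32) (start, end int) {
	return bits.LeadingZeros32(mask), 31 - bits.TrailingZeros32(mask)
}

// maskStart64 returns the start bit of a non-zero 64-bit mask, with bit 0
// as the most significant bit. (Hypothetical helper.)
func maskStart64(mask uint64) int {
	return bits.LeadingZeros64(mask)
}

func main() {
	start, end := maskBounds32(65535)
	fmt.Println(start, end)     // mask from the RLWMI example
	fmt.Println(maskStart64(7)) // mask from the RLDMI example
}
```
| |
| For the masks used above this gives 16 and 31 for $65535, and 61 for $7. |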
| |
| More recently, Go opcodes were added which map directly onto the PPC64 opcodes. It is |
| recommended to use the newer opcodes to avoid confusion. |
| |
| RLDICL $0,R4,$15,R6 => rldicl r6,r4,0,15 |
| RLDICR $0,R4,$15,R6 => rldicr r6,r4,0,15 |
| |
| Register naming |
| |
| 1. Special register usage in Go asm |
| |
| The following registers should not be modified by user Go assembler code. |
| |
| R0: Go code expects this register to contain the value 0. |
| R1: Stack pointer |
| R2: TOC pointer when compiled with -shared or -dynlink (a.k.a. position-independent code) |
| R13: TLS pointer |
| R30: g (goroutine) |
| |
| Register names: |
| |
| Rn is used for general purpose registers. (0-31) |
| Fn is used for floating point registers. (0-31) |
| Vn is used for vector registers. Slot 0 of Vn overlaps with Fn. (0-31) |
| VSn is used for vector-scalar registers. V0-V31 overlap with VS32-VS63. (0-63) |
| CTR represents the count register. |
| LR represents the link register. |
| |
| */ |
| package ppc64 |