[dev.ssa] cmd/compile, etc.: more ARM64 optimizations, and enable SSA by default
Add more ARM64 optimizations:
- use the hardware zero register where possible (sketch below).
- use shifted ops.
The assembler supports shifted ops, but they were undocumented and it
did not know how to print them; this CL adds both (sketch below).
- enable fast division.
This was disabled because it made the old backend generate slower
code, but with SSA it generates faster code (worked example below).
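
A minimal sketch of the zero-register change (the function and the
before/after assembly are illustrative, not the compiler's exact
output): storing a constant zero no longer has to materialize 0 in a
general-purpose register, because ARM64 provides ZR.

    // clear is a hypothetical example; with the zero register in use,
    // the store can read ZR directly.
    func clear(p *int64) {
        *p = 0
    }

    // before:  MOVD $0, R1     // load constant 0 into a register
    //          MOVD R1, (R0)   // store it
    // after:   MOVD ZR, (R0)   // store straight from the zero register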
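Shifted ops fold a shift into the arithmetic instruction instead of
issuing it separately. In Go's ARM64 assembler syntax that looks
roughly like the following (register numbers are arbitrary):

    // shifted-register operand: R1 logically shifted left by 3,
    // then added to R2, result in R3 -- one instruction.
    ADD R1<<3, R2, R3

    // without the shifted form, the same computation takes two:
    LSL $3, R1, R4
    ADD R4, R2, R3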
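The fast-division path replaces integer division by a constant with a
multiply by a precomputed "magic" reciprocal and a shift. A runnable
sketch of the general technique for an unsigned divide by 3, where
0xAAAAAAAB is ceil(2^33/3) (this illustrates the idea, not the
compiler's exact output):

    package main

    import "fmt"

    // div3 computes x/3 without a divide instruction: multiply by the
    // magic constant ceil(2^33/3) in 64-bit arithmetic, then shift.
    func div3(x uint32) uint32 {
        return uint32(uint64(x) * 0xAAAAAAAB >> 33)
    }

    func main() {
        for _, x := range []uint32{0, 1, 5, 100, 1<<32 - 1} {
            fmt.Println(x, div3(x), x/3) // div3(x) always matches x/3
        }
    }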
Turn on SSA by default; also adjust tests.
Change-Id: I7794479954c83bb65008dcb457bc1e21d7496da6
Reviewed-on: https://go-review.googlesource.com/26950
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
diff --git a/src/cmd/compile/internal/gc/walk.go b/src/cmd/compile/internal/gc/walk.go
index c6aeddb..1e7d80d 100644
--- a/src/cmd/compile/internal/gc/walk.go
+++ b/src/cmd/compile/internal/gc/walk.go
@@ -3336,6 +3336,7 @@
// The result of walkrotate MUST be assigned back to n, e.g.
// n.Left = walkrotate(n.Left)
func walkrotate(n *Node) *Node {
+ //TODO: enable LROT on ARM64 once the old backend is gone
if Thearch.LinkArch.InFamily(sys.MIPS64, sys.ARM64, sys.PPC64) {
return n
}
@@ -3529,16 +3530,6 @@
goto ret
}
- // TODO(zhongwei) Test shows that TUINT8, TINT8, TUINT16 and TINT16's "quick division" method
- // on current arm64 backend is slower than hardware div instruction on ARM64 due to unnecessary
- // data movement between registers. It could be enabled when generated code is good enough.
- if Thearch.LinkArch.Family == sys.ARM64 {
- switch Simtype[nl.Type.Etype] {
- case TUINT8, TINT8, TUINT16, TINT16:
- return n
- }
- }
-
switch Simtype[nl.Type.Etype] {
default:
return n