cmd/compile: emit fused multiply-{add,subtract} instructions on s390x
Explcitly block fused multiply-add pattern matching when a cast is used
after the multiplication, for example:
- (a * b) + c // can emit fused multiply-add
- float64(a * b) + c // cannot emit fused multiply-add
float{32,64} and complex{64,128} casts of matching types are now kept
as OCONV operations rather than being replaced with OCONVNOP operations
because they now imply a rounding operation (and therefore aren't a
no-op anymore).
Operations (for example, multiplication) on complex types may utilize
fused multiply-add and -subtract instructions internally. There is no
way to disable this behavior at the moment.
Improves the performance of the floating point implementation of
poly1305:
name old speed new speed delta
64 246MB/s ± 0% 275MB/s ± 0% +11.48% (p=0.000 n=10+8)
1K 312MB/s ± 0% 357MB/s ± 0% +14.41% (p=0.000 n=10+10)
64Unaligned 246MB/s ± 0% 274MB/s ± 0% +11.43% (p=0.000 n=10+10)
1KUnaligned 312MB/s ± 0% 357MB/s ± 0% +14.39% (p=0.000 n=10+8)
Updates #17895.
Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16
Reviewed-on: https://go-review.googlesource.com/36963
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
diff --git a/src/cmd/compile/internal/ssa/gen/genericOps.go b/src/cmd/compile/internal/ssa/gen/genericOps.go
index 3854a39..ab8b0ab 100644
--- a/src/cmd/compile/internal/ssa/gen/genericOps.go
+++ b/src/cmd/compile/internal/ssa/gen/genericOps.go
@@ -343,6 +343,10 @@
{name: "Cvt32Fto64F", argLength: 1},
{name: "Cvt64Fto32F", argLength: 1},
+ // Force rounding to precision of type.
+ {name: "Round32F", argLength: 1},
+ {name: "Round64F", argLength: 1},
+
// Automatically inserted safety checks
{name: "IsNonNil", argLength: 1, typ: "Bool"}, // arg0 != nil
{name: "IsInBounds", argLength: 2, typ: "Bool"}, // 0 <= arg0 < arg1. arg1 is guaranteed >= 0.