cmd/internal/gc: use hardware instruction for math.Sqrt (amd64/arm)

I first prototyped this change in Sept 2011, and I discarded it
because it made no difference in the obvious benchmark loop.
It still makes no difference in the obvious benchmark loop,
but in a less obvious one, doing some extra computation
around the calls to Sqrt, not making the call does have a
significant effect.

benchmark                 old ns/op     new ns/op     delta
BenchmarkSqrt             4.56          4.57          +0.22%
BenchmarkSqrtIndirect     4.56          4.56          +0.00%
BenchmarkSqrtGo           69.4          69.4          +0.00%
BenchmarkSqrtPrime        4417          3647          -17.43%

This is a warmup for using hardware expansions for some
calls to 1-line assembly routines in the runtime (for example getg).

Change-Id: Ie66be23f8c09d0f7dc4ddd7ca8a93cfce28f55a4
Reviewed-on: https://go-review.googlesource.com/8356
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
diff --git a/src/cmd/internal/gc/walk.go b/src/cmd/internal/gc/walk.go
index c10201a..a0a29d3 100644
--- a/src/cmd/internal/gc/walk.go
+++ b/src/cmd/internal/gc/walk.go
@@ -622,6 +622,16 @@
 		walkexpr(&n.Left, init)
 		walkexprlist(n.List, init)
 
+		if n.Left.Op == ONAME && n.Left.Sym.Name == "Sqrt" && n.Left.Sym.Pkg.Path == "math" {
+			switch Thearch.Thechar {
+			case '5', '6':
+				n.Op = OSQRT
+				n.Left = n.List.N
+				n.List = nil
+				goto ret
+			}
+		}
+
 		ll := ascompatte(int(n.Op), n, n.Isddd, getinarg(t), n.List, 0, init)
 		n.List = reorder1(ll)
 		goto ret