curve25519: improve cswap

Simplify the constant swap function.

On amd64: Replace the CMOVQEQ scheme with SSE2 code similar to the non-amd64 code.
On non-amd64: Avoid unnecessary loop iterations.

The result is less and slightly faster code.

name 			old time/op 	new time/op 	delta
ScalarBaseMult-4   	653µs ± 0%   	636µs ± 0%   	~     (p=0.100 n=3+3)

name 			old time/op 	new time/op 	delta
ConstantSwap-4  	10.4ns ± 1%   	6.2ns ± 0%  	-39.86%  (p=0.029 n=4+4)

On an i7-65000U

Change-Id: Ia5eea92e0b3eabb6c291d25229aa582b51278552
Reviewed-on: https://go-review.googlesource.com/39693
Reviewed-by: Adam Langley <agl@golang.org>
Run-TryBot: Adam Langley <agl@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
3 files changed