f671756e047d6bc429798536b39e1bbd761e5ce5 blake2b: fix AVX performance problems on amd64
On some amd64 CPUs (Xeon E5-2680v4 / E5-2620v3), mixing SSE and AVX instructions
leads to very low performance.
On an i7-6500U the mixed SSE/AVX code performs as follows:
AVX2:
name        time/op
Write128-4   165ns ± 0%
Write1K-4   1.20µs ± 0%
Sum128-4     189ns ± 1%
Sum1K-4     1.22µs ± 0%

name        speed
Write128-4  773MB/s ± 1%
Write1K-4   855MB/s ± 0%
Sum128-4    675MB/s ± 1%
Sum1K-4     838MB/s ± 0%
while the same code achieves values < 65MB/s on a Xeon E5-2620v3.
Replacing the `MOVQ` and `PINSRQ` instructions with their AVX counterparts
`VMOVQ` and `VPINSRQ` brings the AVX/AVX2 code up to the expected performance:
name         old time/op    new time/op     delta
Write128-12  2.20µs ±10%    0.22µs ± 9%       -90.00%  (p=0.029 n=4+4)
Write1K-12   16.2µs ± 0%     1.1µs ± 0%       -93.07%  (p=0.029 n=4+4)
Sum128-12    2.10µs ± 0%    0.22µs ± 0%       -89.47%  (p=0.029 n=4+4)
Sum1K-12     16.3µs ± 0%     1.2µs ± 0%       -92.65%  (p=0.029 n=4+4)

name         old speed      new speed       delta
Write128-12  58.5MB/s ±10%  582.8MB/s ±10%   +897.08%  (p=0.029 n=4+4)
Write1K-12   63.1MB/s ± 0%  909.8MB/s ± 0%  +1341.40%  (p=0.029 n=4+4)
Sum128-12    60.8MB/s ± 0%  576.3MB/s ± 0%   +847.84%  (p=0.029 n=4+4)
Sum1K-12     62.8MB/s ± 0%  855.2MB/s ± 0%  +1260.78%  (p=0.029 n=4+4)
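
The delta and speed columns follow from the time/op values. A minimal sketch of that arithmetic is below; the helper names are hypothetical and not part of blake2b or benchstat. Because the printed times are rounded, recomputed values only approximately match the table.

```go
package main

import "fmt"

// deltaPercent (hypothetical helper) returns the relative change from
// oldVal to newVal in percent; negative means the new value is smaller.
func deltaPercent(oldVal, newVal float64) float64 {
	return (newVal - oldVal) / oldVal * 100
}

// throughputMBps (hypothetical helper) returns MB/s (1 MB = 1e6 bytes)
// for n bytes processed in ns nanoseconds.
func throughputMBps(n int, ns float64) float64 {
	return float64(n) / ns * 1e3
}

func main() {
	// Write128 on the Xeon: 2.20µs before, 0.22µs after.
	fmt.Printf("Write128 delta: %.2f%%\n", deltaPercent(2.20, 0.22))
	// Write1K on the i7-6500U: 1024 bytes in 1.20µs (1200ns).
	fmt.Printf("Write1K speed: %.1f MB/s\n", throughputMBps(1024, 1200))
}
```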
The AVX/AVX2 code now uses only AVX (no SSE) instructions.
Fixes golang/go#18563.
Change-Id: I1961dd8fa02014642587523b7f099816a263c9f5
Reviewed-on: https://go-review.googlesource.com/34993
Reviewed-by: Adam Langley <agl@golang.org>
1 file changed