blake2b: add AVX assembly

Add an AVX implementation and improve the SSE4.1 assembly.

AVX vs SSE4.1

name          old time/op   new time/op   delta
Write128-8     249ns ± 0%    220ns ± 0%   -11.85%  (p=0.029 n=4+4)
Write1K-8     1.68µs ± 1%   1.56µs ± 1%    -6.71%  (p=0.029 n=4+4)
Write32K-8    52.6µs ± 0%   48.7µs ± 0%    -7.40%  (p=0.029 n=4+4)
Sum128-8       264ns ± 0%    241ns ± 1%    -8.52%  (p=0.029 n=4+4)
Sum1K-8       1.70µs ± 0%   1.57µs ± 0%    -7.79%  (p=0.029 n=4+4)
Sum32K-8      54.1µs ± 3%   49.5µs ± 1%    -8.36%  (p=0.029 n=4+4)

name          old speed      new speed      delta
Write128-8    513MB/s ± 0%   582MB/s ± 0%   +13.38%  (p=0.029 n=4+4)
Write1K-8     610MB/s ± 1%   654MB/s ± 1%    +7.22%  (p=0.029 n=4+4)
Write32K-8    622MB/s ± 0%   672MB/s ± 0%    +7.99%  (p=0.029 n=4+4)
Sum128-8      484MB/s ± 1%   529MB/s ± 0%    +9.21%  (p=0.029 n=4+4)
Sum1K-8       602MB/s ± 0%   653MB/s ± 0%    +8.42%  (p=0.029 n=4+4)
Sum32K-8      607MB/s ± 3%   662MB/s ± 1%    +9.03%  (p=0.029 n=4+4)

AVX2 vs AVX

name          old time/op   new time/op   delta
Write128-4     192ns ± 0%    166ns ± 0%   -14.03%  (p=0.029 n=4+4)
Write1K-4     1.37µs ± 0%   1.19µs ± 0%   -12.65%  (p=0.029 n=4+4)
Write32K-4    42.5µs ± 0%   37.3µs ± 0%   -12.33%  (p=0.029 n=4+4)
Sum128-4       213ns ± 0%    188ns ± 0%   -11.97%  (p=0.029 n=4+4)
Sum1K-4       1.40µs ± 0%   1.22µs ± 0%   -12.85%  (p=0.029 n=4+4)
Sum32K-4      42.8µs ± 0%   37.3µs ± 0%   -12.94%  (p=0.029 n=4+4)

name          old speed      new speed      delta
Write128-4    662MB/s ± 0%   771MB/s ± 0%   +16.47%  (p=0.029 n=4+4)
Write1K-4     748MB/s ± 0%   857MB/s ± 0%   +14.49%  (p=0.029 n=4+4)
Write32K-4    771MB/s ± 0%   879MB/s ± 0%   +14.07%  (p=0.029 n=4+4)
Sum128-4      600MB/s ± 0%   680MB/s ± 0%   +13.49%  (p=0.029 n=4+4)
Sum1K-4       733MB/s ± 0%   841MB/s ± 0%   +14.72%  (p=0.029 n=4+4)
Sum32K-4      765MB/s ± 0%   879MB/s ± 0%   +14.85%  (p=0.029 n=4+4)

Change-Id: Idf85742e952c07b76c0c7fb5404ed9b0caf0f6eb
Reviewed-on: https://go-review.googlesource.com/34319
Reviewed-by: Adam Langley <agl@golang.org>
5 files changed