go /
go /
1af960802ad7032b668140e846dbc4e902fffa9c crypto/rc4: faster amd64, 386 implementations
-- amd64 --
On a MacBookPro10,2 (Core i5):
benchmark old ns/op new ns/op delta
BenchmarkRC4_128 470 421 -10.43%
BenchmarkRC4_1K 3123 3275 +4.87%
BenchmarkRC4_8K 26351 25866 -1.84%
benchmark old MB/s new MB/s speedup
BenchmarkRC4_128 272.22 303.40 1.11x
BenchmarkRC4_1K 327.80 312.58 0.95x
BenchmarkRC4_8K 307.24 313.00 1.02x
For comparison, on the same machine, openssl 0.9.8r reports
its rc4 speed as somewhat under 350 MB/s for both 1K and 8K.
The Core i5 performance can be boosted another 20%, but only
by making the Xeon performance significantly slower.
On an Intel Xeon E5520:
benchmark old ns/op new ns/op delta
BenchmarkRC4_128 774 417 -46.12%
BenchmarkRC4_1K 6121 3200 -47.72%
BenchmarkRC4_8K 48394 25151 -48.03%
benchmark old MB/s new MB/s speedup
BenchmarkRC4_128 165.18 306.84 1.86x
BenchmarkRC4_1K 167.28 319.92 1.91x
BenchmarkRC4_8K 167.29 321.89 1.92x
For comparison, on the same machine, openssl 1.0.1
(which uses a different implementation than 0.9.8r)
reports its rc4 speed as 587 MB/s for 1K and 601 MB/s for 8K.
It is using SIMD instructions to do more in parallel.
So there's still some improvement to be had, but even so,
this is almost 2x faster than what it replaced.
-- 386 --
On a MacBookPro10,2 (Core i5):
benchmark old ns/op new ns/op delta
BenchmarkRC4_128 3491 421 -87.94%
BenchmarkRC4_1K 28063 3205 -88.58%
BenchmarkRC4_8K 220392 25228 -88.55%
benchmark old MB/s new MB/s speedup
BenchmarkRC4_128 36.66 303.81 8.29x
BenchmarkRC4_1K 36.49 319.42 8.75x
BenchmarkRC4_8K 36.73 320.90 8.74x
On an Intel Xeon E5520:
benchmark old ns/op new ns/op delta
BenchmarkRC4_128 2268 524 -76.90%
BenchmarkRC4_1K 18161 4137 -77.22%
BenchmarkRC4_8K 142396 32350 -77.28%
benchmark old MB/s new MB/s speedup
BenchmarkRC4_128 56.42 244.13 4.33x
BenchmarkRC4_1K 56.38 247.46 4.39x
BenchmarkRC4_8K 56.86 250.26 4.40x
R=agl
CC=golang-dev
https://golang.org/cl/7547050
5 files changed