crypto/md5: faster amd64, 386 implementations

-- amd64 --

On a MacBookPro10,2 (Core i5):

benchmark                       old ns/op    new ns/op    delta
BenchmarkHash8Bytes                   471          524  +11.25%
BenchmarkHash1K                      3018         2220  -26.44%
BenchmarkHash8K                     20634        14604  -29.22%
BenchmarkHash8BytesUnaligned          468          523  +11.75%
BenchmarkHash1KUnaligned             3006         2212  -26.41%
BenchmarkHash8KUnaligned            20820        14652  -29.63%

benchmark                        old MB/s     new MB/s  speedup
BenchmarkHash8Bytes                 16.98        15.26    0.90x
BenchmarkHash1K                    339.26       461.19    1.36x
BenchmarkHash8K                    397.00       560.92    1.41x
BenchmarkHash8BytesUnaligned        17.08        15.27    0.89x
BenchmarkHash1KUnaligned           340.65       462.75    1.36x
BenchmarkHash8KUnaligned           393.45       559.08    1.42x

For comparison, on the same machine, openssl 0.9.8r reports
its md5 speed as 350 MB/s for 1K and 410 MB/s for 8K.

On an Intel Xeon E5520:

benchmark                       old ns/op    new ns/op    delta
BenchmarkHash8Bytes                   565          607   +7.43%
BenchmarkHash1K                      3753         2475  -34.05%
BenchmarkHash8K                     25945        16250  -37.37%
BenchmarkHash8BytesUnaligned          559          594   +6.26%
BenchmarkHash1KUnaligned             3754         2474  -34.10%
BenchmarkHash8KUnaligned            26011        16359  -37.11%

benchmark                        old MB/s     new MB/s  speedup
BenchmarkHash8Bytes                 14.15        13.17    0.93x
BenchmarkHash1K                    272.83       413.58    1.52x
BenchmarkHash8K                    315.74       504.11    1.60x
BenchmarkHash8BytesUnaligned        14.31        13.46    0.94x
BenchmarkHash1KUnaligned           272.73       413.78    1.52x
BenchmarkHash8KUnaligned           314.93       500.73    1.59x

For comparison, on the same machine, openssl 1.0.1 reports
its md5 speed as 443 MB/s for 1K and 513 MB/s for 8K.

-- 386 --

On a MacBookPro10,2 (Core i5):

benchmark                       old ns/op    new ns/op    delta
BenchmarkHash8Bytes                   602          670  +11.30%
BenchmarkHash1K                      4038         2549  -36.87%
BenchmarkHash8K                     27879        16690  -40.13%
BenchmarkHash8BytesUnaligned          602          670  +11.30%
BenchmarkHash1KUnaligned             4025         2546  -36.75%
BenchmarkHash8KUnaligned            27844        16692  -40.05%

benchmark                        old MB/s     new MB/s  speedup
BenchmarkHash8Bytes                 13.28        11.93    0.90x
BenchmarkHash1K                    253.58       401.69    1.58x
BenchmarkHash8K                    293.83       490.81    1.67x
BenchmarkHash8BytesUnaligned        13.27        11.94    0.90x
BenchmarkHash1KUnaligned           254.40       402.05    1.58x
BenchmarkHash8KUnaligned           294.21       490.77    1.67x

On an Intel Xeon E5520:

benchmark                       old ns/op    new ns/op    delta
BenchmarkHash8Bytes                   752          716   -4.79%
BenchmarkHash1K                      5307         2799  -47.26%
BenchmarkHash8K                     36993        18042  -51.23%
BenchmarkHash8BytesUnaligned          748          730   -2.41%
BenchmarkHash1KUnaligned             5301         2795  -47.27%
BenchmarkHash8KUnaligned            36983        18085  -51.10%

benchmark                        old MB/s     new MB/s  speedup
BenchmarkHash8Bytes                 10.64        11.16    1.05x
BenchmarkHash1K                    192.93       365.80    1.90x
BenchmarkHash8K                    221.44       454.03    2.05x
BenchmarkHash8BytesUnaligned        10.69        10.95    1.02x
BenchmarkHash1KUnaligned           193.15       366.36    1.90x
BenchmarkHash8KUnaligned           221.51       452.96    2.04x

R=agl
CC=golang-dev
https://golang.org/cl/7621049
5 files changed