poly1305: modify s390x assembly to implement MAC interface

The vector (vx) implementation has been updated to read in the
state and update it, rather than acting as a single-shot function.
This allows the new MAC interface to be implemented.
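
As a rough illustration of the difference between a single-shot function
and one that reads in and updates a state, here is a toy sketch. It is not
Poly1305 and not the s390x assembly: the 31-bit modulus, the 4-byte blocks
and the names toyState, update and oneShot are all assumptions made for
this example only.

```go
package main

import "fmt"

// p is a toy modulus, chosen so that every product fits in a uint64.
// Real Poly1305 works modulo 2^130 - 5; this value is illustrative only.
const p = 1<<31 - 1

// toyState is a hypothetical running state that survives between calls.
type toyState struct {
	h uint64 // accumulator, always less than p
	r uint64 // toy key coefficient, must be less than p
}

// update reads the current state, absorbs whole 4-byte blocks from msg and
// leaves the new state in s. Trailing partial blocks are ignored here.
func (s *toyState) update(msg []byte) {
	for len(msg) >= 4 {
		b := uint64(msg[0]) | uint64(msg[1])<<8 | uint64(msg[2])<<16 | uint64(msg[3])<<24
		s.h = (((s.h + b%p) % p) * s.r) % p
		msg = msg[4:]
	}
}

// oneShot is the single-shot form: it always starts from a zero state and
// must be given the entire message at once.
func oneShot(r uint64, msg []byte) uint64 {
	s := toyState{r: r}
	s.update(msg)
	return s.h
}

func main() {
	msg := []byte("0123456789abcdef")
	const r = 123456789

	// Feed the message in two pieces, carrying the state across calls.
	s := toyState{r: r}
	s.update(msg[:8])
	s.update(msg[8:])

	fmt.Println(s.h == oneShot(r, msg)) // prints true
}
```

Because the state is carried across calls, a caller can feed the message
in pieces, which is the property a Write-style MAC interface relies on.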

For performance reasons s390x uses a larger buffer than the generic
implementation. There is a relatively high fixed cost to read the
state, calculate the key coefficients and serialize the state, so
it makes sense to buffer more blocks before calling it.
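
The effect of the buffer size on the number of calls can be sketched as
follows. This is a minimal model, not the code in this change: the type
bufferedMAC, the helper countCalls and the buffer sizes used in main are
hypothetical, and the expensive assembly routine is replaced by a counter.

```go
package main

import "fmt"

// blockSize is the Poly1305 block size in bytes.
const blockSize = 16

// bufferedMAC is a hypothetical stand-in that only counts how often the
// expensive update routine would be invoked.
type bufferedMAC struct {
	buf   []byte // pending input; length is a multiple of blockSize
	n     int    // number of valid bytes currently in buf
	calls int    // number of times the (simulated) update routine ran
}

// Write copies p into the buffer and flushes only when the buffer is full,
// so the fixed per-call cost is paid once per full buffer.
func (m *bufferedMAC) Write(p []byte) (int, error) {
	total := len(p)
	for len(p) > 0 {
		c := copy(m.buf[m.n:], p)
		m.n += c
		p = p[c:]
		if m.n == len(m.buf) {
			m.flush()
		}
	}
	return total, nil
}

// flush stands in for the routine that loads the state, computes the key
// coefficients, processes the buffered blocks and stores the state again.
func (m *bufferedMAC) flush() {
	if m.n == 0 {
		return
	}
	m.calls++
	m.n = 0
}

// countCalls reports how many update calls are needed for the given number
// of equally sized writes. bufBlocks must be positive.
func countCalls(bufBlocks, writes, writeSize int) int {
	m := &bufferedMAC{buf: make([]byte, bufBlocks*blockSize)}
	chunk := make([]byte, writeSize)
	for i := 0; i < writes; i++ {
		m.Write(chunk)
	}
	m.flush() // process whatever is left, as a final Sum would
	return m.calls
}

func main() {
	fmt.Println(countCalls(1, 64, 16))  // one-block buffer: prints 64
	fmt.Println(countCalls(16, 64, 16)) // sixteen-block buffer: prints 4
}
```

With a larger buffer the same input triggers far fewer calls, so the
fixed cost is spread over more blocks.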

For now I've had to remove the faster VMSL implementation. It is
too complex for me to update in time for Go 1.15. At some point
I'd like to revisit it but for now it looks like using the MAC
interface is more of a win than using VMSL.

The benchmarks show considerable improvements when using the MAC
interface. The Sum benchmarks show a slowdown, caused by both the
removal of the VMSL implementation and the added overhead of
splitting the summation function into multiple parts.

poly1305:

name              old speed      new speed      delta
64                1.33GB/s ± 0%  0.80GB/s ± 1%   -39.51%  (p=0.000 n=16+20)
1K                4.04GB/s ± 0%  2.97GB/s ± 0%   -26.46%  (p=0.000 n=19+19)
2M                5.32GB/s ± 1%  3.63GB/s ± 0%   -31.76%  (p=0.000 n=20+19)
64Unaligned       1.33GB/s ± 0%  0.80GB/s ± 0%   -39.80%  (p=0.000 n=19+18)
1KUnaligned       4.09GB/s ± 1%  2.94GB/s ± 0%   -28.23%  (p=0.000 n=19+18)
2MUnaligned       5.33GB/s ± 1%  3.52GB/s ± 0%   -34.04%  (p=0.000 n=20+19)
Write64           1.03GB/s ± 1%  1.49GB/s ± 1%   +44.34%  (p=0.000 n=20+20)
Write1K           1.21GB/s ± 0%  3.24GB/s ± 0%  +169.02%  (p=0.000 n=20+17)
Write2M           1.24GB/s ± 1%  3.63GB/s ± 0%  +192.36%  (p=0.000 n=20+19)
Write64Unaligned  1.04GB/s ± 1%  1.50GB/s ± 0%   +44.16%  (p=0.000 n=19+14)
Write1KUnaligned  1.21GB/s ± 0%  3.20GB/s ± 0%  +164.55%  (p=0.000 n=20+16)
Write2MUnaligned  1.24GB/s ± 1%  3.51GB/s ± 0%  +183.96%  (p=0.000 n=20+19)

chacha20poly1305 (this vs. using generic MAC interface - post CL 206977):

name         old speed      new speed      delta
Open-64       147MB/s ± 2%   156MB/s ± 1%   +6.15%  (p=0.000 n=20+19)
Seal-64       151MB/s ± 0%   164MB/s ± 1%   +8.86%  (p=0.000 n=19+16)
Open-64-X     104MB/s ± 2%   111MB/s ± 1%   +6.24%  (p=0.000 n=20+20)
Seal-64-X     109MB/s ± 2%   111MB/s ± 1%   +2.11%  (p=0.000 n=20+19)
Open-1350     555MB/s ± 0%   751MB/s ± 1%  +35.19%  (p=0.000 n=20+20)
Seal-1350     557MB/s ± 0%   759MB/s ± 0%  +36.23%  (p=0.000 n=20+20)
Open-1350-X   517MB/s ± 1%   683MB/s ± 1%  +31.97%  (p=0.000 n=20+20)
Seal-1350-X   511MB/s ± 0%   683MB/s ± 0%  +33.77%  (p=0.000 n=18+19)
Open-8192     672MB/s ± 0%  1013MB/s ± 0%  +50.65%  (p=0.000 n=19+19)
Seal-8192     674MB/s ± 0%  1018MB/s ± 0%  +50.98%  (p=0.000 n=18+20)
Open-8192-X   663MB/s ± 0%   979MB/s ± 0%  +47.57%  (p=0.000 n=20+20)
Seal-8192-X   658MB/s ± 0%   985MB/s ± 0%  +49.62%  (p=0.000 n=18+20)

name         old allocs/op  new allocs/op  delta
Open-64          0.00           0.00          ~     (all equal)
Seal-64          0.00           0.00          ~     (all equal)
Open-64-X        0.00           0.00          ~     (all equal)
Seal-64-X        0.00           0.00          ~     (all equal)
Open-1350        0.00           0.00          ~     (all equal)
Seal-1350        0.00           0.00          ~     (all equal)
Open-1350-X      0.00           0.00          ~     (all equal)
Seal-1350-X      0.00           0.00          ~     (all equal)
Open-8192        0.00           0.00          ~     (all equal)
Seal-8192        0.00           0.00          ~     (all equal)
Open-8192-X      0.00           0.00          ~     (all equal)
Seal-8192-X      0.00           0.00          ~     (all equal)

chacha20poly1305 (this vs. using asm Sum interface - pre CL 206977):

name         old speed      new speed      delta
Open-64       144MB/s ± 0%   156MB/s ± 1%    +8.16%  (p=0.000 n=20+19)
Seal-64       150MB/s ± 0%   164MB/s ± 1%    +9.35%  (p=0.000 n=20+16)
Open-64-X     104MB/s ± 1%   111MB/s ± 1%    +6.15%  (p=0.000 n=19+20)
Seal-64-X     109MB/s ± 1%   111MB/s ± 1%    +1.43%  (p=0.000 n=19+19)
Open-1350     702MB/s ± 1%   751MB/s ± 1%    +6.98%  (p=0.000 n=20+20)
Seal-1350     715MB/s ± 0%   759MB/s ± 0%    +6.09%  (p=0.000 n=19+20)
Open-1350-X   642MB/s ± 0%   683MB/s ± 1%    +6.37%  (p=0.000 n=19+20)
Seal-1350-X   639MB/s ± 0%   683MB/s ± 0%    +6.98%  (p=0.000 n=20+19)
Open-8192     994MB/s ± 0%  1013MB/s ± 0%    +1.85%  (p=0.000 n=20+19)
Seal-8192    1.00GB/s ± 0%  1.02GB/s ± 0%    +1.90%  (p=0.000 n=20+20)
Open-8192-X   965MB/s ± 0%   979MB/s ± 0%    +1.43%  (p=0.000 n=19+20)
Seal-8192-X   962MB/s ± 0%   985MB/s ± 0%    +2.39%  (p=0.000 n=20+20)

name         old allocs/op  new allocs/op  delta
Open-64          1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
Seal-64          1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
Open-64-X        1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
Seal-64-X        1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
Open-1350        1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
Seal-1350        1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
Open-1350-X      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
Seal-1350-X      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
Open-8192        1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
Seal-8192        1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
Open-8192-X      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
Seal-8192-X      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)

Updates golang/go#25219.

Change-Id: Ib491e3a47b6b3ec8bbbe1f41f7bf42ad82f5c249
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/219057
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
9 files changed
tree: fda1cab226f0a36537a006ca0b87df54d9c655dc
README.md

Go Cryptography

This repository holds supplementary Go cryptography libraries.

Download/Install

The easiest way to install is to run go get -u golang.org/x/crypto/.... You can also manually git clone the repository to $GOPATH/src/golang.org/x/crypto.

Report Issues / Send Patches

This repository uses Gerrit for code changes. To learn how to submit changes to this repository, see https://golang.org/doc/contribute.html.

The main issue tracker for the crypto repository is located at https://github.com/golang/go/issues. Prefix your issue with “x/crypto:” in the subject line, so it is easy to find.

Note that contributions to the cryptography package receive additional scrutiny due to their sensitive nature. Patches may take longer than normal to receive feedback.