internal/chacha20: implement the cipher.Stream interface and optimize

SIMD implementations of ChaCha20 (such as CL 35842) interleave block
computations in order to achieve high performance. This means that
they produce more than 64 bytes of output at a time. Unfortunately,
when encrypting small amounts of data (such as Poly1305 keys), the
current ChaCha20 interface maintains no state, so the additional
blocks of output must be discarded and recomputed later. This
overhead slows down the encryption of small amounts of data when
using such optimized code.

This CL makes the generic ChaCha20 implementation stateful, caching
key, nonce and counter values and buffering any unused key stream bytes.
ChaCha20 now also implements the high-level cipher.Stream interface,
which makes the API more consistent with other stream ciphers in the
standard library's crypto package. This will make it easier to add
high-performance SIMD implementations in the future.
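
The buffering described above can be sketched as follows. This is a
minimal illustration, not the code from this CL: toyStream,
newToyStream, nextBlock and encryptInChunks are hypothetical names,
and the block function is a placeholder with no cryptographic value
standing in for the real ChaCha20 core. It shows only how a stateful
cipher.Stream can keep the unused tail of a 64-byte block so that a
later call continues where the previous one stopped.

```go
package main

import (
	"crypto/cipher"
	"encoding/binary"
	"fmt"
)

const blockSize = 64

// toyStream is a hypothetical stateful stream cipher used for illustration.
// It is NOT ChaCha20 and must not be used to protect data.
type toyStream struct {
	key     [8]uint32
	nonce   [3]uint32
	counter uint32
	buf     [blockSize]byte // most recently generated key stream block
	unused  int             // bytes at the end of buf not yet consumed
}

// Compile-time check that toyStream satisfies cipher.Stream.
var _ cipher.Stream = (*toyStream)(nil)

func newToyStream(key [8]uint32, nonce [3]uint32) *toyStream {
	return &toyStream{key: key, nonce: nonce}
}

// nextBlock fills buf with the block for the current counter and then
// advances the counter. The mixing below is a placeholder, not ChaCha20.
func (s *toyStream) nextBlock() {
	for i := 0; i < blockSize/4; i++ {
		w := s.key[i%8] ^ s.nonce[i%3] ^ (s.counter*0x9e3779b9 + uint32(i))
		binary.LittleEndian.PutUint32(s.buf[i*4:], w)
	}
	s.counter++
	s.unused = blockSize
}

// XORKeyStream XORs src with the key stream into dst. Buffered bytes left
// over from an earlier call are consumed before a new block is generated.
func (s *toyStream) XORKeyStream(dst, src []byte) {
	if len(dst) < len(src) {
		panic("toyStream: output smaller than input")
	}
	for len(src) > 0 {
		if s.unused == 0 {
			s.nextBlock()
		}
		ks := s.buf[blockSize-s.unused:]
		n := len(src)
		if n > len(ks) {
			n = len(ks)
		}
		for i := 0; i < n; i++ {
			dst[i] = src[i] ^ ks[i]
		}
		s.unused -= n
		dst, src = dst[n:], src[n:]
	}
}

// encryptInChunks processes msg with XORKeyStream calls of at most chunk bytes.
func encryptInChunks(s cipher.Stream, msg []byte, chunk int) []byte {
	out := make([]byte, len(msg))
	for off := 0; off < len(msg); off += chunk {
		end := off + chunk
		if end > len(msg) {
			end = len(msg)
		}
		s.XORKeyStream(out[off:end], msg[off:end])
	}
	return out
}

func main() {
	key := [8]uint32{1, 2, 3, 4, 5, 6, 7, 8}
	nonce := [3]uint32{9, 10, 11}
	msg := make([]byte, 200)
	for i := range msg {
		msg[i] = byte(i)
	}
	whole := encryptInChunks(newToyStream(key, nonce), msg, len(msg))
	pieces := encryptInChunks(newToyStream(key, nonce), msg, 32)
	fmt.Println("chunked output matches:", string(whole) == string(pieces))
}
```

Because the unused bytes are kept between calls, processing a message
in 32-byte pieces gives the same output as processing it in one call,
and no block is computed twice.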

In addition to modifying the API, I have also added some optimizations
to improve the performance of the generic implementation. Note that
the performance will improve further on amd64 with Go 1.11 due to
CL 95475 (binary.LittleEndian.PutUint32 optimization). These benchmarks
are based on Go 1.10.1.

name            old speed      new speed      delta
ChaCha20/32      174MB/s ± 2%   174MB/s ± 1%     ~     (p=0.796 n=10+10)
ChaCha20/63      309MB/s ± 1%   337MB/s ± 2%   +9.32%  (p=0.000 n=10+9)
ChaCha20/64      299MB/s ± 2%   350MB/s ± 1%  +17.12%  (p=0.000 n=9+8)
ChaCha20/256     297MB/s ± 2%   390MB/s ± 1%  +31.40%  (p=0.000 n=10+10)
ChaCha20/1024    300MB/s ± 0%   400MB/s ± 3%  +33.38%  (p=0.000 n=7+10)
ChaCha20/1350    290MB/s ± 1%   386MB/s ± 2%  +33.10%  (p=0.000 n=9+10)
ChaCha20/65536   301MB/s ± 1%   416MB/s ± 2%  +38.25%  (p=0.000 n=9+10)

ChaCha20-Poly1305 (AEAD optimizations manually disabled):

name                       old speed      new speed      delta
Chacha20Poly1305Open_64     122MB/s ± 7%   131MB/s ± 2%   +7.23%  (p=0.000 n=18+18)
Chacha20Poly1305Seal_64     125MB/s ± 4%   137MB/s ± 2%   +9.88%  (p=0.000 n=20+19)
Chacha20Poly1305Open_1350   244MB/s ± 4%   305MB/s ± 3%  +25.04%  (p=0.000 n=20+19)
Chacha20Poly1305Seal_1350   242MB/s ± 3%   309MB/s ± 2%  +27.56%  (p=0.000 n=20+19)
Chacha20Poly1305Open_8K     260MB/s ± 7%   338MB/s ± 3%  +29.96%  (p=0.000 n=20+19)
Chacha20Poly1305Seal_8K     262MB/s ± 5%   335MB/s ± 4%  +27.80%  (p=0.000 n=20+19)

No change in allocations for either set of benchmarks.

Change-Id: I28ca7947904e9d79debe2d5aac6623526fe5e595
Reviewed-on: https://go-review.googlesource.com/104856
Run-TryBot: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
4 files changed
tree: 8a078f5ce5d56a079d402c328be96d37edb92bf0
  acme/
  argon2/
  bcrypt/
  blake2b/
  blake2s/
  blowfish/
  bn256/
  cast5/
  chacha20poly1305/
  cryptobyte/
  curve25519/
  ed25519/
  hkdf/
  internal/
  md4/
  nacl/
  ocsp/
  openpgp/
  otr/
  pbkdf2/
  pkcs12/
  poly1305/
  ripemd160/
  salsa20/
  scrypt/
  sha3/
  ssh/
  tea/
  twofish/
  xtea/
  xts/
  .gitattributes
  .gitignore
  AUTHORS
  codereview.cfg
  CONTRIBUTING.md
  CONTRIBUTORS
  LICENSE
  PATENTS
  README.md
README.md

Go Cryptography

This repository holds supplementary Go cryptography libraries.

Download/Install

The easiest way to install is to run go get -u golang.org/x/crypto/... (the three trailing dots are part of the package pattern). You can also manually git clone the repository to $GOPATH/src/golang.org/x/crypto.

Report Issues / Send Patches

This repository uses Gerrit for code changes. To learn how to submit changes to this repository, see https://golang.org/doc/contribute.html.

The main issue tracker for the crypto repository is located at https://github.com/golang/go/issues. Prefix your issue with “x/crypto:” in the subject line, so it is easy to find.

Note that contributions to the cryptography package receive additional scrutiny due to their sensitive nature. Patches may take longer than normal to receive feedback.