crypto/internal/poly1305: implement function update in assembly on loong64

The performance improvements on Loongson-3A5000 and Loongson-3A6000 are as follows:

goos: linux
goarch: loong64
pkg: golang.org/x/crypto/internal/poly1305
cpu: Loongson-3A5000 @ 2500.00MHz
                 |  bench.old   |              bench.new              |
                 |    sec/op    |   sec/op     vs base                |
64                  122.8n ± 0%   100.0n ± 0%  -18.57% (p=0.000 n=10)
1K                 1152.0n ± 0%   732.2n ± 0%  -36.44% (p=0.000 n=10)
2M                  2.356m ± 0%   1.443m ± 0%  -38.74% (p=0.000 n=10)
64Unaligned         122.7n ± 0%   101.5n ± 0%  -17.24% (p=0.000 n=10)
1KUnaligned        1152.0n ± 0%   745.4n ± 0%  -35.30% (p=0.000 n=10)
2MUnaligned         2.336m ± 0%   1.473m ± 0%  -36.94% (p=0.000 n=10)
Write64             77.92n ± 0%   54.88n ± 0%  -29.57% (p=0.000 n=10)
Write1K            1106.0n ± 0%   683.3n ± 0%  -38.22% (p=0.000 n=10)
Write2M             2.356m ± 0%   1.444m ± 0%  -38.72% (p=0.000 n=10)
Write64Unaligned    77.87n ± 0%   55.69n ± 0%  -28.49% (p=0.000 n=10)
Write1KUnaligned   1106.0n ± 0%   708.1n ± 0%  -35.97% (p=0.000 n=10)
Write2MUnaligned    2.335m ± 0%   1.471m ± 0%  -37.01% (p=0.000 n=10)
geomean             6.373µ        4.272µ       -32.96%

                 |  bench.old   |               bench.new               |
                 |     B/s      |      B/s       vs base                |
64                 497.1Mi ± 0%    610.3Mi ± 0%  +22.78% (p=0.000 n=10)
1K                 847.6Mi ± 0%   1333.7Mi ± 0%  +57.35% (p=0.000 n=10)
2M                 849.0Mi ± 0%   1385.9Mi ± 0%  +63.24% (p=0.000 n=10)
64Unaligned        497.4Mi ± 0%    600.9Mi ± 0%  +20.81% (p=0.000 n=10)
1KUnaligned        847.6Mi ± 0%   1310.1Mi ± 0%  +54.57% (p=0.000 n=10)
2MUnaligned        856.3Mi ± 0%   1357.9Mi ± 0%  +58.58% (p=0.000 n=10)
Write64            783.3Mi ± 0%   1112.2Mi ± 0%  +41.99% (p=0.000 n=10)
Write1K            882.8Mi ± 0%   1429.1Mi ± 0%  +61.88% (p=0.000 n=10)
Write2M            849.0Mi ± 0%   1385.4Mi ± 0%  +63.18% (p=0.000 n=10)
Write64Unaligned   783.8Mi ± 0%   1096.1Mi ± 0%  +39.85% (p=0.000 n=10)
Write1KUnaligned   882.8Mi ± 0%   1379.0Mi ± 0%  +56.20% (p=0.000 n=10)
Write2MUnaligned   856.5Mi ± 0%   1359.9Mi ± 0%  +58.76% (p=0.000 n=10)
geomean            772.2Mi         1.125Gi       +49.18%

goos: linux
goarch: loong64
pkg: golang.org/x/crypto/internal/poly1305
cpu: Loongson-3A6000-HV @ 2500.00MHz
                 |  bench.old  |              bench.new              |
                 |   sec/op    |   sec/op     vs base                |
64                 92.06n ± 0%   71.55n ± 0%  -22.28% (p=0.000 n=10)
1K                 998.4n ± 0%   607.7n ± 0%  -39.13% (p=0.000 n=10)
2M                 1.976m ± 0%   1.165m ± 0%  -41.07% (p=0.000 n=10)
64Unaligned        92.05n ± 0%   71.55n ± 0%  -22.27% (p=0.000 n=10)
1KUnaligned        998.3n ± 0%   607.6n ± 0%  -39.13% (p=0.000 n=10)
2MUnaligned        1.975m ± 0%   1.222m ± 0%  -38.11% (p=0.000 n=10)
Write64            65.24n ± 0%   45.23n ± 0%  -30.67% (p=0.000 n=10)
Write1K            970.8n ± 0%   577.6n ± 0%  -40.51% (p=0.000 n=10)
Write2M            1.965m ± 0%   1.163m ± 0%  -40.81% (p=0.000 n=10)
Write64Unaligned   65.24n ± 0%   45.24n ± 0%  -30.66% (p=0.000 n=10)
Write1KUnaligned   970.8n ± 0%   577.6n ± 0%  -40.50% (p=0.000 n=10)
Write2MUnaligned   1.965m ± 0%   1.222m ± 0%  -37.81% (p=0.000 n=10)
geomean            5.317µ        3.426µ       -35.58%

                 |   bench.old   |               bench.new               |
                 |      B/s      |      B/s       vs base                |
64                  663.0Mi ± 0%    853.1Mi ± 0%  +28.67% (p=0.000 n=10)
1K                  978.1Mi ± 0%   1606.9Mi ± 0%  +64.28% (p=0.000 n=10)
2M                 1012.0Mi ± 0%   1717.4Mi ± 0%  +69.70% (p=0.000 n=10)
64Unaligned         663.1Mi ± 0%    853.1Mi ± 0%  +28.65% (p=0.000 n=10)
1KUnaligned         978.2Mi ± 0%   1607.1Mi ± 0%  +64.29% (p=0.000 n=10)
2MUnaligned        1012.6Mi ± 0%   1636.2Mi ± 0%  +61.58% (p=0.000 n=10)
Write64             935.5Mi ± 0%   1349.3Mi ± 0%  +44.23% (p=0.000 n=10)
Write1K            1005.9Mi ± 0%   1690.9Mi ± 0%  +68.09% (p=0.000 n=10)
Write2M            1017.7Mi ± 0%   1719.5Mi ± 0%  +68.95% (p=0.000 n=10)
Write64Unaligned    935.6Mi ± 0%   1349.3Mi ± 0%  +44.22% (p=0.000 n=10)
Write1KUnaligned   1006.0Mi ± 0%   1690.9Mi ± 0%  +68.08% (p=0.000 n=10)
Write2MUnaligned   1017.7Mi ± 0%   1636.4Mi ± 0%  +60.80% (p=0.000 n=10)
geomean             925.6Mi         1.403Gi       +55.22%

Change-Id: If05a8bfc868b3e6f903ff169eed7a894af741f9b
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/638455
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
4 files changed
tree: 689cd5cf5359689bb4d1cb4fbf0f96d84c1982d4
README.md

Go Cryptography

This repository holds supplementary Go cryptography packages.

Report Issues / Send Patches

This repository uses Gerrit for code changes. To learn how to submit changes to this repository, see https://go.dev/doc/contribute.

The git repository is https://go.googlesource.com/crypto.

The main issue tracker for the crypto repository is located at https://go.dev/issues. Prefix your issue with “x/crypto:” in the subject line, so it is easy to find.

Note that contributions to the cryptography package receive additional scrutiny due to their sensitive nature. Patches may take longer than normal to receive feedback.