x/crypto/poly1305: fix memory alignment fault in ARM

  The current ARM implementation assumes that the input message
  is aligned in memory, so it can raise an alignment fault when
  unaligned access is not enabled. It may also produce incorrect
  output on ARMv5.

  This change fixes the issue by temporarily copying the input
  to a local aligned buffer. There may be a better way to handle
  unaligned access, but this approach is safe on all ARM
  versions.

  This change also adds a test and benchmarks with unaligned
  data. The benchmark results on a Raspberry Pi 2 are:

  Benchmark64             2000000     812 ns/op    78.81 MB/s
  Benchmark1K              200000    7809 ns/op   131.12 MB/s
  Benchmark64Unaligned    2000000     967 ns/op    66.13 MB/s
  Benchmark1KUnaligned     200000   10316 ns/op    99.26 MB/s

Change-Id: I189cc1b7bb6c67a04c9877271fb27326f2896e82
Reviewed-on: https://go-review.googlesource.com/12797
Reviewed-by: Adam Langley <agl@golang.org>
2 files changed