crypto: allocate less.

The code in hash functions themselves could write directly into the
output buffer for a savings of about 50ns. But it's a little ugly so I
wasted a copy.

R=bradfitz
CC=golang-dev
https://golang.org/cl/5440111
12 files changed