runtime/bytes: fast Compare for byte slices and strings.

Uses SSE instructions to process 16 bytes at a time.
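
For illustration only, a minimal pure-Go sketch of the block-wise idea: skip
identical 16-byte blocks, then find the first differing byte. It is not the
assembly in this change and uses no SSE; the name compareChunked and the use
of bytes.Equal for the block test are assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
)

// compareChunked is a hypothetical illustration, not the runtime routine.
// It returns -1, 0, or +1, like bytes.Compare.
func compareChunked(a, b []byte) int {
	// Only the common prefix can be compared byte for byte.
	n := len(a)
	if len(b) < n {
		n = len(b)
	}

	// Skip whole 16-byte blocks that are identical. An SSE version
	// would test each block with a single vector compare instead.
	i := 0
	for ; i+16 <= n; i += 16 {
		if !bytes.Equal(a[i:i+16], b[i:i+16]) {
			break
		}
	}

	// Scan the mismatching block (or the short tail) for the first
	// differing byte.
	for ; i < n; i++ {
		if a[i] != b[i] {
			if a[i] < b[i] {
				return -1
			}
			return 1
		}
	}

	// The common prefix is equal, so the shorter input sorts first.
	switch {
	case len(a) < len(b):
		return -1
	case len(a) > len(b):
		return 1
	}
	return 0
}

func main() {
	fmt.Println(compareChunked([]byte("abc"), []byte("abd"))) // prints -1
}
```

The result should match bytes.Compare for any pair of inputs; only the
traversal order within the common prefix differs.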

Fixes #5354.

R=bradfitz, google
CC=golang-dev
https://golang.org/cl/8853048
9 files changed