shiny/driver/internal/swizzle: new package.

On my amd64 desktop machine:
BenchmarkBGRA-8      	    3000	    469214 ns/op
BenchmarkPureGoBGRA-8	     500	   3267103 ns/op
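BenchmarkPureGoBGRA presumably times a portable, non-SIMD fallback. The following is a minimal sketch of such a fallback. It assumes the swizzle is RGBA to BGRA, i.e. swapping the R and B bytes of every 4-byte pixel in place, and that a length that is not a multiple of 4 is a programming error. The name pureGoBGRA is hypothetical, and this is not the package's actual code:

```go
package main

import (
	"fmt"
	"time"
)

// pureGoBGRA converts a pixel buffer between RGBA and BGRA byte order in
// place by swapping bytes 0 and 2 of each 4-byte pixel. The swap is its own
// inverse, so the same function converts in either direction.
// Hypothetical name: an illustrative sketch, not the swizzle package's code.
func pureGoBGRA(p []byte) {
	if len(p)%4 != 0 {
		panic("pureGoBGRA: input length is not a multiple of 4")
	}
	for i := 0; i < len(p); i += 4 {
		p[i], p[i+2] = p[i+2], p[i]
	}
}

func main() {
	// One 1920x1080 frame at 4 bytes per pixel.
	buf := make([]byte, 1920*1080*4)
	for i := range buf {
		buf[i] = byte(i)
	}

	const iters = 100
	start := time.Now()
	for n := 0; n < iters; n++ {
		pureGoBGRA(buf)
	}
	// Rough per-frame cost only; "go test -bench" gives proper numbers.
	fmt.Println("per frame:", time.Since(start)/iters)

	px := []byte{0x10, 0x20, 0x30, 0xff} // R, G, B, A
	pureGoBGRA(px)
	fmt.Println(px) // [48 32 16 255]
}
```

A SIMD version can do the same work 16 bytes (four pixels) at a time with a single byte-shuffle instruction. That is the kind of gap the two benchmark lines measure.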

When swizzling a 1920x1080 RGBA pixel buffer, there's a 7x difference
between 3.27ms and 0.47ms, and that roughly 2.8ms difference is a
noticeable fraction (about 17%) of the 16.67ms that a 60Hz refresh rate
gives you.
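The arithmetic behind those figures can be checked directly from the ns/op numbers above. This is a small sketch. The helper frameBudgetNs is a hypothetical name introduced only for illustration:

```go
package main

import "fmt"

// frameBudgetNs returns the time available per frame at the given refresh
// rate, in nanoseconds (about 16.67ms at 60Hz).
func frameBudgetNs(hz float64) float64 {
	return 1e9 / hz
}

func main() {
	// ns/op figures copied from the benchmark output above.
	simdNs := 469214.0    // BenchmarkBGRA-8
	pureGoNs := 3267103.0 // BenchmarkPureGoBGRA-8

	speedup := pureGoNs / simdNs
	savedNs := pureGoNs - simdNs
	share := savedNs / frameBudgetNs(60)

	fmt.Printf("speedup: %.2fx\n", speedup)              // speedup: 6.96x
	fmt.Printf("saved per frame: %.2fms\n", savedNs/1e6) // saved per frame: 2.80ms
	fmt.Printf("share of 60Hz budget: %.1f%%\n", 100*share) // share of 60Hz budget: 16.8%
}
```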

Thanks to Aaron Jacobs for his help with SIMD assembly.

Change-Id: I8c1a50cc3f038824e07442492f8f0f6b22c83728
Reviewed-on: https://go-review.googlesource.com/13003
Reviewed-by: David Crawshaw <crawshaw@golang.org>
6 files changed