image/jpeg: speed up decoding by inlining the clip function and
writing the idct result directly to the image buffer instead of
storing it in an intermediate d.blocks field.

Writing to d.blocks was necessary when decoding to an image.RGBA image,
but now that we decode to a ycbcr.YCbCr we can write each component
directly to the image buffer.

Crude "time ./6.out" scores to decode a specific 2592x1944 JPEG 20
times show a 16% speed-up:

BEFORE

user	0m10.410s
user	0m10.400s
user	0m10.480s
user	0m10.480s
user	0m10.460s

AFTER

user	0m9.050s
user	0m9.050s
user	0m9.050s
user	0m9.070s
user	0m9.020s

R=r
CC=golang-dev
https://golang.org/cl/4523052
2 files changed