draw: use a sync.Pool for kernel scaling's temporary buffers.

benchmark                      old ns/op     new ns/op     delta
BenchmarkScaleBLLargeDown      257715146     260286012     +1.00%
BenchmarkScaleCRLargeDown      426797448     430078734     +0.77%
BenchmarkScaleBLDown           4449939       4222542       -5.11%
BenchmarkScaleCRDown           8160446       8010056       -1.84%
BenchmarkScaleBLUp             22290312      21044122      -5.59%
BenchmarkScaleCRUp             33010722      32021468      -3.00%
BenchmarkScaleCRSrcGray        13307961      13020192      -2.16%
BenchmarkScaleCRSrcNRGBA       40567431      40801939      +0.58%
BenchmarkScaleCRSrcRGBA        39892971      40240558      +0.87%
BenchmarkScaleCRSrcYCbCr       59020222      59686699      +1.13%

benchmark                      old allocs     new allocs     delta
BenchmarkScaleBLLargeDown      1              1              +0.00%
BenchmarkScaleCRLargeDown      1              2              +100.00%
BenchmarkScaleBLDown           1              0              -100.00%
BenchmarkScaleCRDown           1              0              -100.00%
BenchmarkScaleBLUp             1              0              -100.00%
BenchmarkScaleCRUp             1              0              -100.00%
BenchmarkScaleCRSrcGray        1              0              -100.00%
BenchmarkScaleCRSrcNRGBA       1              0              -100.00%
BenchmarkScaleCRSrcRGBA        1              0              -100.00%
BenchmarkScaleCRSrcYCbCr       1              0              -100.00%

benchmark                      old bytes     new bytes     delta
BenchmarkScaleBLLargeDown      14745600      2949200       -80.00%
BenchmarkScaleCRLargeDown      14745600      4915333       -66.67%
BenchmarkScaleBLDown           1523712       5079          -99.67%
BenchmarkScaleCRDown           1523712       7619          -99.50%
BenchmarkScaleBLUp             10117120      101175        -99.00%
BenchmarkScaleCRUp             10117120      202350        -98.00%
BenchmarkScaleCRSrcGray        4915200       49156         -99.00%
BenchmarkScaleCRSrcNRGBA       4915200       163853        -96.67%
BenchmarkScaleCRSrcRGBA        4915200       163853        -96.67%
BenchmarkScaleCRSrcYCbCr       4915200       245780        -95.00%

The increase in BenchmarkScale??LargeDown number of allocs I think is an
accounting error due to the low number of iterations: a low denominator.
I suspect that there are one or two extra allocs up front for using the
sync.Pool, but one fewer alloc per iteration. The number of iterations
is only 5 for BL and 3 for CR, for the default timeout. If I increase
the -test.benchtime value to 5s, then the reported average (allocs/op)
drop from 2 to 0, so the delta should actually be -100% instead of +0 or
+100%.

Change-Id: I21d9bb0086bdb25517b6a430e8a21bdf3db026f6
Reviewed-on: https://go-review.googlesource.com/8150
Reviewed-by: Rob Pike <r@golang.org>
4 files changed