protobuf-go/internal/encoding/wire: SizeVarint optimisation

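Replace the division by 7 in SizeVarint(). The compiler optimised the previous division into a 64-bit multiplication.
This version uses 9/64 in place of 1/7. It needs only an unsigned 32-bit multiplication and a shift. The compiler can optimise the multiplication further using a scaled addressing mode (lea (ax,ax*8),ax). For every possible input (bits.Len64(v) from 0 to 64), the approximation gives exactly the same result as the division.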
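A quick way to convince yourself that 9/64 is exact here is to compare both formulas over every bit length. The sketch below is illustrative only. The helper names sizeVarintDiv, sizeVarintMul and verify are hypothetical, and the first two simply copy the old and new SizeVarint bodies from the diff. As an independent reference it uses the standard library's encoding/binary.PutUvarint, which returns the number of bytes it wrote.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// sizeVarintDiv (hypothetical name) copies the old SizeVarint body.
// For v == 0, bits.Len64 returns 0 and Go's truncating division gives
// (-1)/7 == 0, so the result is still 1.
func sizeVarintDiv(v uint64) int {
	return 1 + (bits.Len64(v)-1)/7
}

// sizeVarintMul (hypothetical name) copies the new SizeVarint body:
// 9/64 stands in for 1/7, so the constant division by 7 becomes a
// 32-bit multiply by 9 followed by a division by 64 (a power of two).
func sizeVarintMul(v uint64) int {
	return int(9*uint32(bits.Len64(v))+64) / 64
}

// verify (hypothetical name) checks every bit length 0..64. For each one it
// tests the smallest and the largest value with that length. Both formulas
// must agree with the byte count from encoding/binary.PutUvarint. The size
// depends only on bits.Len64(v), so these values cover every input.
func verify() error {
	var buf [binary.MaxVarintLen64]byte
	for n := 0; n <= 64; n++ {
		// Largest value with bit length n; for n == 64 the shift yields 0
		// and the subtraction wraps to the maximum uint64.
		hi := uint64(1)<<n - 1
		// Smallest value with bit length n (0 when n == 0).
		lo := uint64(0)
		if n > 0 {
			lo = uint64(1) << (n - 1)
		}
		for _, v := range []uint64{lo, hi} {
			want := binary.PutUvarint(buf[:], v)
			if d, m := sizeVarintDiv(v), sizeVarintMul(v); d != want || m != want {
				return fmt.Errorf("bit length %d, v=%d: div=%d mul=%d want=%d", n, v, d, m, want)
			}
		}
	}
	return nil
}

func main() {
	if err := verify(); err != nil {
		fmt.Println("mismatch:", err)
		return
	}
	fmt.Println("old and new formulas agree for all bit lengths 0..64")
}
```

The 32-bit arithmetic cannot overflow. bits.Len64 never exceeds 64, so 9*uint32(bits.Len64(v))+64 is at most 640.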

Results from the protobuf-go/internal/benchmarks/micro benchmarks:

name                            old time/op  new time/op  delta
EmptyMessage/Wire/Marshal-4     40.0ns ± 1%  39.9ns ± 5%    ~     (p=0.683 n=5+5)
EmptyMessage/Wire/Unmarshal-4   20.5ns ± 2%  20.3ns ± 2%    ~     (p=0.317 n=5+5)
EmptyMessage/Wire/Validate-4    21.5ns ± 0%  21.5ns ± 1%    ~     (p=0.825 n=4+5)
EmptyMessage/Clone-4             135ns ± 2%   136ns ± 1%    ~     (p=0.365 n=5+5)
RepeatedInt32/Wire/Marshal-4    4.06µs ± 1%  3.69µs ± 1%  -9.05%  (p=0.008 n=5+5)
RepeatedInt32/Wire/Unmarshal-4  4.72µs ± 0%  4.55µs ± 2%  -3.74%  (p=0.008 n=5+5)
RepeatedInt32/Wire/Validate-4   3.08µs ± 2%  2.94µs ± 0%  -4.69%  (p=0.008 n=5+5)
RepeatedInt32/Clone-4           1.09µs ± 1%  1.09µs ± 0%    ~     (p=0.810 n=5+5)
Required/Wire/Marshal-4          296ns ± 1%   293ns ± 0%  -0.95%  (p=0.000 n=5+4)
Required/Wire/Unmarshal-4        147ns ± 1%   135ns ± 1%  -8.17%  (p=0.008 n=5+5)
Required/Wire/Validate-4         127ns ± 2%   123ns ± 0%  -3.15%  (p=0.000 n=5+4)
Required/Clone-4                 393ns ± 1%   391ns ± 2%    ~     (p=0.238 n=5+5)

Change-Id: Idfe75a9cd80b2bddaf13a8e879403c0c94ebc419
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/221803
Reviewed-by: Damien Neil <dneil@google.com>
diff --git a/internal/encoding/wire/wire.go b/internal/encoding/wire/wire.go
index d7baa7f..e624ff8 100644
--- a/internal/encoding/wire/wire.go
+++ b/internal/encoding/wire/wire.go
@@ -362,7 +362,9 @@
 // SizeVarint returns the encoded size of a varint.
 // The size is guaranteed to be within 1 and 10, inclusive.
 func SizeVarint(v uint64) int {
-	return 1 + (bits.Len64(v)-1)/7
+	// This computes 1 + (bits.Len64(v)-1)/7.
+	// 9/64 approximates 1/7 closely enough to be exact for 0 <= bits.Len64(v) <= 64.
+	return int(9*uint32(bits.Len64(v))+64) / 64
 }
 
 // AppendFixed32 appends v to b as a little-endian uint32.