syncmap: remove blocking for all operations on existing keys

In the previous version, Loads of keys with unchanged values would
block on the Mutex if the map was dirty, leading to potentially long
stalls while the read-only map is being copied to the dirty map.

This change adds a double pointer indirection to each entry and an
extra atomic load on the fast path. In exchange, it removes the need
to lock the map for nearly all operations on existing keys.

Each entry is represented by an atomically-updated pointer to an
interface{} value. The same entries are stored in both the read-only
and dirty maps. Keys deleted before the dirty map was last copied are
marked as "expunged" and omitted from the dirty map until the next
Store to that key.

Newly-stored values exist only in the dirty map. The existence of new
keys in the dirty map is indicated by an "amended" bit attached to the
read-only map, allowing Load or Delete calls for nonexistent keys to
sometimes return immediately (without acquiring the Mutex).

This trades off a bit of steady-state throughput in the "mostly hits"
case in exchange for less contention (and lower Load tail latencies)
for mixed Load/Store/Delete usage.

Unfortunately, the latency impact is currently difficult to measure
(#19128).

updates golang/go#19128
updates golang/go#18177

https://perf.golang.org/search?q=upload:20170315.5

name                                               old time/op    new time/op    delta
LoadMostlyHits/*syncmap_test.DeepCopyMap             70.3ns ± 6%    70.2ns ± 4%      ~     (p=0.886 n=4+4)
LoadMostlyHits/*syncmap_test.DeepCopyMap-48          10.9ns ± 4%    13.6ns ±24%      ~     (p=0.314 n=4+4)
LoadMostlyHits/*syncmap_test.RWMutexMap              86.0ns ± 9%    86.0ns ± 4%      ~     (p=1.000 n=4+4)
LoadMostlyHits/*syncmap_test.RWMutexMap-48            140ns ± 3%     139ns ± 2%      ~     (p=0.686 n=4+4)
LoadMostlyHits/*syncmap.Map                          70.6ns ± 8%    70.3ns ± 1%      ~     (p=0.571 n=4+4)
LoadMostlyHits/*syncmap.Map-48                       10.3ns ± 7%    11.2ns ± 5%      ~     (p=0.114 n=4+4)
LoadMostlyMisses/*syncmap_test.DeepCopyMap           59.4ns ± 4%    59.4ns ± 4%      ~     (p=1.000 n=4+4)
LoadMostlyMisses/*syncmap_test.DeepCopyMap-48        10.4ns ±30%    12.6ns ±29%      ~     (p=0.486 n=4+4)
LoadMostlyMisses/*syncmap_test.RWMutexMap            64.8ns ± 6%    63.8ns ± 4%      ~     (p=0.686 n=4+4)
LoadMostlyMisses/*syncmap_test.RWMutexMap-48          138ns ± 3%     139ns ± 2%      ~     (p=0.800 n=4+4)
LoadMostlyMisses/*syncmap.Map                        51.5ns ± 7%    50.4ns ± 2%      ~     (p=0.371 n=4+4)
LoadMostlyMisses/*syncmap.Map-48                     9.37ns ± 3%    9.40ns ± 3%      ~     (p=0.886 n=4+4)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap          820ns ±17%     834ns ±14%      ~     (p=1.000 n=4+4)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap-48      1.38µs ± 4%    1.38µs ± 3%      ~     (p=0.886 n=4+4)
LoadOrStoreBalanced/*syncmap.Map                      709ns ±13%    1085ns ± 9%   +53.09%  (p=0.029 n=4+4)
LoadOrStoreBalanced/*syncmap.Map-48                  1.30µs ±13%    1.08µs ± 8%   -17.40%  (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap_test.RWMutexMap           1.26µs ±14%    1.16µs ±29%      ~     (p=0.343 n=4+4)
LoadOrStoreUnique/*syncmap_test.RWMutexMap-48        1.82µs ±15%    1.98µs ± 6%      ~     (p=0.143 n=4+4)
LoadOrStoreUnique/*syncmap.Map                       1.06µs ± 7%    1.86µs ±11%   +76.09%  (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap.Map-48                    2.00µs ± 4%    1.62µs ± 4%   -19.20%  (p=0.029 n=4+4)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap       32.9ns ± 8%    32.6ns ± 9%      ~     (p=0.686 n=4+4)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap-48   2.26ns ±136%    3.41ns ±62%      ~     (p=0.114 n=4+4)
LoadOrStoreCollision/*syncmap_test.RWMutexMap        58.0ns ±20%    54.6ns ±17%      ~     (p=0.343 n=4+4)
LoadOrStoreCollision/*syncmap_test.RWMutexMap-48      458ns ± 2%     445ns ± 6%      ~     (p=0.229 n=4+4)
LoadOrStoreCollision/*syncmap.Map                    35.8ns ± 5%    39.6ns ± 9%      ~     (p=0.114 n=4+4)
LoadOrStoreCollision/*syncmap.Map-48                 1.24ns ± 2%    1.58ns ± 4%   +27.16%  (p=0.029 n=4+4)
Range/*syncmap_test.DeepCopyMap                      19.7µs ± 4%    19.8µs ± 3%      ~     (p=0.686 n=4+4)
Range/*syncmap_test.DeepCopyMap-48                    763ns ± 1%     864ns ± 3%   +13.24%  (p=0.029 n=4+4)
Range/*syncmap_test.RWMutexMap                       20.9µs ± 3%    20.4µs ± 4%      ~     (p=0.343 n=4+4)
Range/*syncmap_test.RWMutexMap-48                     764ns ± 1%     870ns ± 1%   +13.77%  (p=0.029 n=4+4)
Range/*syncmap.Map                                   20.4µs ± 5%    22.7µs ± 5%   +10.98%  (p=0.029 n=4+4)
Range/*syncmap.Map-48                                 776ns ± 3%     954ns ± 4%   +22.95%  (p=0.029 n=4+4)
AdversarialAlloc/*syncmap_test.DeepCopyMap            206ns ± 5%     199ns ± 2%      ~     (p=0.200 n=4+4)
AdversarialAlloc/*syncmap_test.DeepCopyMap-48        8.94µs ± 5%    9.21µs ± 4%      ~     (p=0.200 n=4+4)
AdversarialAlloc/*syncmap_test.RWMutexMap            63.4ns ± 4%    63.8ns ± 3%      ~     (p=0.686 n=4+4)
AdversarialAlloc/*syncmap_test.RWMutexMap-48          184ns ±10%     198ns ±11%      ~     (p=0.343 n=4+4)
AdversarialAlloc/*syncmap.Map                         213ns ± 3%     264ns ± 3%   +23.97%  (p=0.029 n=4+4)
AdversarialAlloc/*syncmap.Map-48                      556ns ± 5%    1389ns ± 8%  +149.93%  (p=0.029 n=4+4)
AdversarialDelete/*syncmap_test.DeepCopyMap           300ns ± 6%     304ns ± 7%      ~     (p=0.886 n=4+4)
AdversarialDelete/*syncmap_test.DeepCopyMap-48        647ns ± 3%     646ns ± 3%      ~     (p=0.943 n=4+4)
AdversarialDelete/*syncmap_test.RWMutexMap           69.1ns ± 1%    69.2ns ± 6%      ~     (p=0.686 n=4+4)
AdversarialDelete/*syncmap_test.RWMutexMap-48         289ns ±15%     300ns ±17%      ~     (p=0.829 n=4+4)
AdversarialDelete/*syncmap.Map                        198ns ± 5%     264ns ± 2%   +33.17%  (p=0.029 n=4+4)
AdversarialDelete/*syncmap.Map-48                     291ns ± 9%     173ns ± 8%   -40.50%  (p=0.029 n=4+4)

name                                               old alloc/op   new alloc/op   delta
LoadMostlyHits/*syncmap_test.DeepCopyMap              7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyHits/*syncmap_test.DeepCopyMap-48           7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyHits/*syncmap_test.RWMutexMap               7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyHits/*syncmap_test.RWMutexMap-48            7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyHits/*syncmap.Map                           7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyHits/*syncmap.Map-48                        7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyMisses/*syncmap_test.DeepCopyMap            7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyMisses/*syncmap_test.DeepCopyMap-48         7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyMisses/*syncmap_test.RWMutexMap             7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyMisses/*syncmap_test.RWMutexMap-48          7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyMisses/*syncmap.Map                         7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyMisses/*syncmap.Map-48                      7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap          95.0B ± 0%     95.0B ± 0%      ~     (all equal)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap-48       95.0B ± 0%     95.0B ± 0%      ~     (all equal)
LoadOrStoreBalanced/*syncmap.Map                      95.0B ± 0%     88.0B ± 0%    -7.37%  (p=0.029 n=4+4)
LoadOrStoreBalanced/*syncmap.Map-48                   95.0B ± 0%     88.0B ± 0%    -7.37%  (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap_test.RWMutexMap             175B ± 0%      175B ± 0%      ~     (all equal)
LoadOrStoreUnique/*syncmap_test.RWMutexMap-48          175B ± 0%      175B ± 0%      ~     (all equal)
LoadOrStoreUnique/*syncmap.Map                         175B ± 0%      161B ± 0%    -8.00%  (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap.Map-48                      175B ± 0%      161B ± 0%    -8.00%  (p=0.029 n=4+4)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap        0.00B          0.00B           ~     (all equal)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap-48     0.00B          0.00B           ~     (all equal)
LoadOrStoreCollision/*syncmap_test.RWMutexMap         0.00B          0.00B           ~     (all equal)
LoadOrStoreCollision/*syncmap_test.RWMutexMap-48      0.00B          0.00B           ~     (all equal)
LoadOrStoreCollision/*syncmap.Map                     0.00B          0.00B           ~     (all equal)
LoadOrStoreCollision/*syncmap.Map-48                  0.00B          0.00B           ~     (all equal)
Range/*syncmap_test.DeepCopyMap                       0.00B          0.00B           ~     (all equal)
Range/*syncmap_test.DeepCopyMap-48                    0.00B          0.00B           ~     (all equal)
Range/*syncmap_test.RWMutexMap                        0.00B          0.00B           ~     (all equal)
Range/*syncmap_test.RWMutexMap-48                     0.00B          0.00B           ~     (all equal)
Range/*syncmap.Map                                    0.00B          0.00B           ~     (all equal)
Range/*syncmap.Map-48                                 0.00B          0.00B           ~     (all equal)
AdversarialAlloc/*syncmap_test.DeepCopyMap            74.0B ± 0%     74.0B ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap_test.DeepCopyMap-48        3.15kB ± 0%    3.15kB ± 0%      ~     (p=1.000 n=4+4)
AdversarialAlloc/*syncmap_test.RWMutexMap             8.00B ± 0%     8.00B ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap_test.RWMutexMap-48          8.00B ± 0%     8.00B ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap.Map                         74.0B ± 0%     55.0B ± 0%   -25.68%  (p=0.029 n=4+4)
AdversarialAlloc/*syncmap.Map-48                      8.00B ± 0%    56.25B ± 1%  +603.12%  (p=0.029 n=4+4)
AdversarialDelete/*syncmap_test.DeepCopyMap            155B ± 0%      155B ± 0%      ~     (all equal)
AdversarialDelete/*syncmap_test.DeepCopyMap-48         156B ± 0%      156B ± 0%      ~     (p=1.000 n=4+4)
AdversarialDelete/*syncmap_test.RWMutexMap            8.00B ± 0%     8.00B ± 0%      ~     (all equal)
AdversarialDelete/*syncmap_test.RWMutexMap-48         8.00B ± 0%     8.00B ± 0%      ~     (all equal)
AdversarialDelete/*syncmap.Map                        81.0B ± 0%     65.0B ± 0%   -19.75%  (p=0.029 n=4+4)
AdversarialDelete/*syncmap.Map-48                     23.0B ± 9%     15.2B ± 5%   -33.70%  (p=0.029 n=4+4)

name                                               old allocs/op  new allocs/op  delta
LoadMostlyHits/*syncmap_test.DeepCopyMap               0.00           0.00           ~     (all equal)
LoadMostlyHits/*syncmap_test.DeepCopyMap-48            0.00           0.00           ~     (all equal)
LoadMostlyHits/*syncmap_test.RWMutexMap                0.00           0.00           ~     (all equal)
LoadMostlyHits/*syncmap_test.RWMutexMap-48             0.00           0.00           ~     (all equal)
LoadMostlyHits/*syncmap.Map                            0.00           0.00           ~     (all equal)
LoadMostlyHits/*syncmap.Map-48                         0.00           0.00           ~     (all equal)
LoadMostlyMisses/*syncmap_test.DeepCopyMap             0.00           0.00           ~     (all equal)
LoadMostlyMisses/*syncmap_test.DeepCopyMap-48          0.00           0.00           ~     (all equal)
LoadMostlyMisses/*syncmap_test.RWMutexMap              0.00           0.00           ~     (all equal)
LoadMostlyMisses/*syncmap_test.RWMutexMap-48           0.00           0.00           ~     (all equal)
LoadMostlyMisses/*syncmap.Map                          0.00           0.00           ~     (all equal)
LoadMostlyMisses/*syncmap.Map-48                       0.00           0.00           ~     (all equal)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap           2.00 ± 0%      2.00 ± 0%      ~     (all equal)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap-48        2.00 ± 0%      2.00 ± 0%      ~     (all equal)
LoadOrStoreBalanced/*syncmap.Map                       2.00 ± 0%      3.00 ± 0%   +50.00%  (p=0.029 n=4+4)
LoadOrStoreBalanced/*syncmap.Map-48                    2.00 ± 0%      3.00 ± 0%   +50.00%  (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap_test.RWMutexMap             2.00 ± 0%      2.00 ± 0%      ~     (all equal)
LoadOrStoreUnique/*syncmap_test.RWMutexMap-48          2.00 ± 0%      2.00 ± 0%      ~     (all equal)
LoadOrStoreUnique/*syncmap.Map                         2.00 ± 0%      4.00 ± 0%  +100.00%  (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap.Map-48                      2.00 ± 0%      4.00 ± 0%  +100.00%  (p=0.029 n=4+4)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap         0.00           0.00           ~     (all equal)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap-48      0.00           0.00           ~     (all equal)
LoadOrStoreCollision/*syncmap_test.RWMutexMap          0.00           0.00           ~     (all equal)
LoadOrStoreCollision/*syncmap_test.RWMutexMap-48       0.00           0.00           ~     (all equal)
LoadOrStoreCollision/*syncmap.Map                      0.00           0.00           ~     (all equal)
LoadOrStoreCollision/*syncmap.Map-48                   0.00           0.00           ~     (all equal)
Range/*syncmap_test.DeepCopyMap                        0.00           0.00           ~     (all equal)
Range/*syncmap_test.DeepCopyMap-48                     0.00           0.00           ~     (all equal)
Range/*syncmap_test.RWMutexMap                         0.00           0.00           ~     (all equal)
Range/*syncmap_test.RWMutexMap-48                      0.00           0.00           ~     (all equal)
Range/*syncmap.Map                                     0.00           0.00           ~     (all equal)
Range/*syncmap.Map-48                                  0.00           0.00           ~     (all equal)
AdversarialAlloc/*syncmap_test.DeepCopyMap             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap_test.DeepCopyMap-48          1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap_test.RWMutexMap              1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap_test.RWMutexMap-48           1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap.Map                          1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap.Map-48                       1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialDelete/*syncmap_test.DeepCopyMap            1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialDelete/*syncmap_test.DeepCopyMap-48         1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialDelete/*syncmap_test.RWMutexMap             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialDelete/*syncmap_test.RWMutexMap-48          1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialDelete/*syncmap.Map                         1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialDelete/*syncmap.Map-48                      1.00 ± 0%      1.00 ± 0%      ~     (all equal)

Change-Id: I93c9458505d84238cc8e9016a7dfd285868af236
Reviewed-on: https://go-review.googlesource.com/37342
Reviewed-by: Russ Cox <rsc@golang.org>
1 file changed