GoKer

The following examples are obtained from the publication “GoBench: A Benchmark Suite of Real-World Go Concurrency Bugs” (doi:10.1109/CGO51591.2021.9370317).

Authors: Ting Yuan (yuanting@ict.ac.cn): State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China; Guangwei Li (liguangwei@ict.ac.cn): State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China; Jie Lu (lujie@ict.ac.cn): State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences; Chen Liu (liuchen17z@ict.ac.cn): State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China; Lian Li (lianli@ict.ac.cn): State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China; Jingling Xue (jingling@cse.unsw.edu.au): University of New South Wales, School of Computer Science and Engineering, Sydney, Australia

White paper: https://lujie.ac.cn/files/papers/GoBench.pdf

The examples have been modified in order to run the goroutine leak profiler. Buggy snippets are moved out of unit tests into separate applications. Each is then executed independently, possibly as multiple copies within the same application, to exercise more interleavings. Concurrently, the main program sets up a waiting period (typically 1 ms) followed by a goroutine leak profile request. Other modifications include injecting calls to runtime.Gosched() to more reliably exercise buggy interleavings, and shortening time.Sleep waiting periods to reduce overall testing time.
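
For illustration, a stripped-down harness might look like the sketch below. The names are made up: leakyWorkload stands in for an extracted buggy snippet, and the standard "goroutine" pprof profile is used as a stand-in for the goroutine leak profiler's own profile request.

    package main

    import (
        "os"
        "runtime/pprof"
        "time"
    )

    // leakyWorkload stands in for one buggy snippet extracted from a unit test.
    func leakyWorkload() {
        ch := make(chan int) // unbuffered, and no receiver will ever arrive
        go func() {
            ch <- 42 // this send blocks forever: the goroutine leaks
        }()
    }

    func main() {
        // Run several copies to exercise more interleavings.
        for i := 0; i < 10; i++ {
            leakyWorkload()
        }
        // Give the buggy goroutines time to reach their blocking points.
        time.Sleep(time.Millisecond)
        // Request a profile of the blocked goroutines. The real harness asks
        // the goroutine leak profiler; the standard profile is used here only
        // to keep the sketch self-contained.
        pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)
    }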

The resulting goroutine leak profile is analyzed to ensure that no unexpected leaks occurred, and that the expected leaks did occur. If a leak is flaky, the only purpose of the expected leak list is to protect against unexpected leaks.

The examples have also been modified to remove data races, since data races cause flaky test failures, while the focus here is solely on leaked goroutines.

The entries below document each of the corresponding leaks.

Cockroach/10214

Bug ID            Ref           Patch  Type           Sub-type
cockroach#10214   pull request  patch  Resource       AB-BA leak

Description

This goroutine leak is caused by acquiring coalescedMu.Lock() and raftMu.Lock() in different orders. The fix refactors sendQueuedHeartbeats() so that cockroachdb unlocks coalescedMu before locking raftMu.
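
Stripped of the CockroachDB specifics, this is the classic AB-BA pattern. The sketch below is illustrative (the function bodies are invented), not the actual code:

    package main

    import (
        "sync"
        "time"
    )

    var coalescedMu, raftMu sync.Mutex

    func sendQueuedHeartbeats() { // G1
        coalescedMu.Lock() // L1
        defer coalescedMu.Unlock()
        raftMu.Lock() // L2: blocks while G2 holds it
        defer raftMu.Unlock()
    }

    func tick() { // G2
        raftMu.Lock() // L2
        defer raftMu.Unlock()
        coalescedMu.Lock() // L1: blocks while G1 holds it
        defer coalescedMu.Unlock()
    }

    func main() {
        go sendQueuedHeartbeats()
        go tick()
        // With an unlucky interleaving, both goroutines block forever.
        time.Sleep(time.Millisecond)
    }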

Example execution

G1                                      G2
------------------------------------------------------------------------------------
s.sendQueuedHeartbeats()                .
s.coalescedMu.Lock() [L1]               .
s.sendQueuedHeartbeatsToNode()          .
s.mu.replicas[0].reportUnreachable()    .
s.mu.replicas[0].raftMu.Lock() [L2]     .
.                                       s.mu.replicas[0].tick()
.                                       s.mu.replicas[0].raftMu.Lock() [L2]
.                                       s.mu.replicas[0].tickRaftMuLocked()
.                                       s.mu.replicas[0].mu.Lock() [L3]
.                                       s.mu.replicas[0].maybeQuiesceLocked()
.                                       s.mu.replicas[0].maybeCoalesceHeartbeat()
.                                       s.coalescedMu.Lock() [L1]
--------------------------------G1,G2 leak------------------------------------------

Cockroach/1055

Bug ID            Ref           Patch  Type           Sub-type
cockroach#1055    pull request  patch  Mixed          Channel & WaitGroup

Description

  1. Stop() is called and blocks at s.stop.Wait() after acquiring the lock.
  2. StartTask() is called and attempts to acquire the lock. It then blocks.
  3. Stop() never finishes, since the task never calls SetStopped.

Example execution

G1                      G2.0                       G2.1                       G2.2                       G3
-------------------------------------------------------------------------------------------------------------------------------
s[0].stop.Add(1) [1]
go func() [G2.0]
s[1].stop.Add(1) [1]    .
go func() [G2.1]        .
s[2].stop.Add(1) [1]    .                          .
go func() [G2.2]        .                          .
go func() [G3]          .                          .                          .
<-done                  .                          .                          .                          .
.                       s[0].StartTask()           .                          .                          .
.                       s[0].draining == 0         .                          .                          .
.                       .                          s[1].StartTask()           .                          .
.                       .                          s[1].draining == 0         .                          .
.                       .                          .                          s[2].StartTask()           .
.                       .                          .                          s[2].draining == 0         .
.                       .                          .                          .                          s[0].Quiesce()
.                       .                          .                          .                          s[0].mu.Lock() [L1[0]]
.                       s[0].mu.Lock() [L1[0]]     .                          .                          .
.                       s[0].drain.Add(1) [1]      .                          .                          .
.                       s[0].mu.Unlock() [L1[0]]   .                          .                          .
.                       <-s[0].ShouldStop()        .                          .                          .
.                       .                          .                          .                          s[0].draining = 1
.                       .                          .                          .                          s[0].drain.Wait()
.                       .                          s[0].mu.Lock() [L1[1]]     .                          .
.                       .                          s[1].drain.Add(1) [1]      .                          .
.                       .                          s[1].mu.Unlock() [L1[1]]   .                          .
.                       .                          <-s[1].ShouldStop()        .                          .
.                       .                          .                          s[2].mu.Lock() [L1[2]]     .
.                       .                          .                          s[2].drain.Add() [1]       .
.                       .                          .                          s[2].mu.Unlock() [L1[2]]   .
.                       .                          .                          <-s[2].ShouldStop()        .
----------------------------------------------------G1, G2.[0..2], G3 leak-----------------------------------------------------

Cockroach/10790

Bug ID            Ref           Patch  Type           Sub-type
cockroach#10790   pull request  patch  Communication  Channel & Context

Description

A message from ctxDone can make beginCmds return without draining the channel ch, so the anonymous sender goroutines leak.
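
Reduced to a sketch (identifiers are illustrative), the pattern is a select on ctx.Done() racing against an unbuffered send:

    package main

    import (
        "context"
        "time"
    )

    // beginCmds returns either when ch delivers or when the context is
    // cancelled; in the latter case the sender is left without a receiver.
    func beginCmds(ctx context.Context) {
        ch := make(chan struct{})
        go func() {
            ch <- struct{}{} // leaks if beginCmds returned via ctx.Done()
        }()
        select {
        case <-ch:
        case <-ctx.Done(): // returns without draining ch
        }
    }

    func main() {
        ctx, cancel := context.WithCancel(context.Background())
        cancel()
        beginCmds(ctx)
        time.Sleep(time.Millisecond) // the sender may still be blocked here
    }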

Example execution

G1                  G2              helper goroutine
-----------------------------------------------------
.                   .               r.sendChans()
r.beginCmds()       .               .
.                   .               ch1 <- true
<- ch1              .               .
.                   .               ch2 <- true
...
.                   cancel()
<- ch1
------------------G1 leak----------------------------

Cockroach/13197

Bug ID            Ref           Patch  Type           Sub-type
cockroach#13197   pull request  patch  Communication  Channel & Context

Description

A goroutine executing (*Tx).awaitDone() blocks waiting for a context.Done() signal that never arrives.

Example execution

G1              G2
-------------------------------
begin()
.               awaitDone()
return          .
.               <-tx.ctx.Done()
-----------G2 leaks------------

Cockroach/13755

Bug ID            Ref           Patch  Type           Sub-type
cockroach#13755   pull request  patch  Communication  Channel & Context

Description

The buggy code does not close the DB query result (rows), so a goroutine running (*Rows).awaitDone blocks forever, waiting for a cancellation signal from the context.
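
A hedged sketch of the pattern (identifiers are illustrative): the watcher goroutine waits on a context that only Close() would cancel, so dropping the result set without closing it leaks the watcher.

    package main

    import (
        "context"
        "time"
    )

    // awaitDone mimics (*Rows).awaitDone: it waits for the context that
    // rows.Close() is responsible for cancelling.
    func awaitDone(ctx context.Context) {
        <-ctx.Done() // never fires if the rows are never closed
    }

    func main() {
        ctx, cancel := context.WithCancel(context.Background())
        _ = cancel // the buggy caller drops the query result without Close()
        go awaitDone(ctx)
        time.Sleep(time.Millisecond) // the awaitDone goroutine has leaked
    }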

Example execution

G1                      G2
---------------------------------------
initContextClose()
.                       awaitDone()
return                  .
.                       <-tx.ctx.Done()
---------------G2 leaks----------------

Cockroach/1462

Bug ID            Ref           Patch  Type           Sub-type
cockroach#1462    pull request  patch  Mixed          Channel & WaitGroup

Description

Executing <-stopper.ShouldStop() in processEventsUntil may cause goroutines created by lt.RunWorker in lt.start to be stuck sending a message over lt.Events. The main thread is then stuck at s.stop.Wait(), since the sender goroutines cannot call s.stop.Done().

Example execution

G1                                  G2                                G3
-------------------------------------------------------------------------------------------------------
NewLocalInterceptableTransport()
lt.start()
lt.stopper.RunWorker()
s.AddWorker()
s.stop.Add(1) [1]
go func() [G2]
stopper.RunWorker()                 .
s.AddWorker()                       .
s.stop.Add(1) [2]                   .
go func() [G3]                      .
s.Stop()                            .                                 .
s.Quiesce()                         .                                 .
.                                   select [default]                  .
.                                   lt.Events <- interceptMessage(0)  .
close(s.stopper)                    .                                 .
.                                   .                                 select [<-stopper.ShouldStop()]
.                                   .                                 <<<done>>>
s.stop.Wait()                       .
----------------------------------------------G1,G2 leak-----------------------------------------------

Cockroach/16167

Bug ID            Ref           Patch  Type           Sub-type
cockroach#16167   pull request  patch  Resource       Double Locking

Description

This is another example of goroutine leaks caused by recursively acquiring an RWMutex. There are two lock variables (systemConfigCond and systemConfigMu) which refer to the same underlying lock. The leak involves two goroutines. The first acquires systemConfigMu.Lock(), then tries to acquire systemConfigMu.RLock(). The second acquires systemConfigMu.Lock(). If the second goroutine interleaves between the first goroutine's two lock operations, both goroutines leak.

Example execution

G1                                G2
---------------------------------------------------------------
.                                 e.Start()
.                                 e.updateSystemConfig()
e.execParsed()                    .
e.systemConfigCond.L.Lock() [L1]  .
.                                 e.systemConfigMu.Lock() [L1]
e.systemConfigMu.RLock() [L1]     .
------------------------G1,G2 leak-----------------------------

Cockroach/18101

Bug ID            Ref           Patch  Type           Sub-type
cockroach#18101   pull request  patch  Resource       Double Locking

Description

The context.Done() signal short-circuits the reader goroutine, but not the senders, leading them to leak.

Example execution

G1                      G2                   helper goroutine
--------------------------------------------------------------
restore()
.                       splitAndScatter()
<-readyForImportCh      .
<-readyForImportCh <==> readyForImportCh<-
...
.                       .                    cancel()
<<done>>                .                    <<done>>
                        readyForImportCh<-
-----------------------G2 leaks--------------------------------

Cockroach/2448

Bug ID            Ref           Patch  Type           Sub-type
cockroach#2448    pull request  patch  Communication  Channel

Description

This bug is caused by two goroutines waiting for each other to unblock their channels:

  1. MultiRaft sends the commit event for the Membership change
  2. store.processRaft takes it and begins processing
  3. another command commits and triggers another sendEvent, but this blocks since store.processRaft isn't ready for another select. Consequently the main MultiRaft loop is waiting for that as well.
  4. the Membership change was applied to the range, and the store now tries to execute the callback
  5. the callback tries to write to callbackChan, but that is consumed by the MultiRaft loop, which is currently waiting for store.processRaft to consume from the events channel, which it will only do after the callback has completed.

Example execution

G1                          G2
--------------------------------------------------------------------------
s.processRaft()             st.start()
select                      .
.                           select [default]
.                           s.handleWriteResponse()
.                           s.sendEvent()
.                           select
<-s.multiraft.Events <----> m.Events <- event
.                           select [default]
.                           s.handleWriteResponse()
.                           s.sendEvent()
.                           select [m.Events<-, <-s.stopper.ShouldStop()]
callback()                  .
select [
   m.callbackChan<-,
   <-s.stopper.ShouldStop()
]                           .
------------------------------G1,G2 leak----------------------------------

Cockroach/24808

Bug ID            Ref           Patch  Type           Sub-type
cockroach#24808   pull request  patch  Communication  Channel

Description

By the time the Compactor is started, it may already have received Suggestions; a write to the already-full channel then blocks, and the writing goroutine leaks.
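
In miniature, with a capacity-1 channel standing in for the suggestion buffer (illustrative, not the actual Compactor code):

    package main

    import "time"

    type compactor struct {
        ch chan struct{} // buffered suggestion channel, capacity 1
    }

    // suggest enqueues work; once the buffer is full, further sends block
    // until a worker loop started by Start drains the channel.
    func (c *compactor) suggest() {
        c.ch <- struct{}{}
    }

    func main() {
        c := &compactor{ch: make(chan struct{}, 1)}
        c.suggest()    // fills the buffer before Start is running
        go c.suggest() // blocks: nothing drains c.ch, so this goroutine leaks
        time.Sleep(time.Millisecond)
    }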

Example execution

G1
------------------------------------------------
...
compactor.ch <-
compactor.Start()
compactor.ch <-
--------------------G1 leaks--------------------

Cockroach/25456

Bug ID            Ref           Patch  Type           Sub-type
cockroach#25456   pull request  patch  Communication  Channel

Description

When CheckConsistency (in the complete code) returns an error, the queue checks whether the store is draining to decide whether the error is worth logging. This check was incorrect and would block until the store actually started draining.

Example execution

G1
---------------------------------------
...
<-repl.store.Stopper().ShouldQuiesce()
---------------G1 leaks----------------

Cockroach/35073

Bug ID            Ref           Patch  Type           Sub-type
cockroach#35073   pull request  patch  Communication  Channel

Description

Previously, the outbox could fail during startup without closing its RowChannel. This could lead to goroutine leaks in rare cases due to channel communication mismatch.

Cockroach/35931

Bug ID            Ref           Patch  Type           Sub-type
cockroach#35931   pull request  patch  Communication  Channel

Description

Previously, if a processor that reads from multiple inputs was waiting on one input to provide more data while the other input was full, and both inputs were connected to inbound streams, goroutines could leak during flow cancellation when the cancellation metadata messages were propagated into the flow. The cancellation method wrote metadata messages to each inbound stream one at a time, so if the first stream was full, the canceller blocked and never sent a cancellation message to the second stream, which was the one actually being read from.

Cockroach/3710

Bug ID            Ref           Patch  Type           Sub-type
cockroach#3710    pull request  patch  Resource       RWR Deadlock

Description

The goroutine leak is caused by acquiring an RLock twice in one call chain: ForceRaftLogScanAndProcess() (acquires s.mu.RLock()) -> MaybeAdd() -> shouldQueue() -> getTruncatableIndexes() -> RaftStatus() (acquires s.mu.RLock()).
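
Because a blocked writer makes later RLock calls wait on Go's sync.RWMutex, a second RLock in the same call chain deadlocks once a writer slips in between. A minimal sketch (the names echo the trace, but the code is invented):

    package main

    import (
        "sync"
        "time"
    )

    var mu sync.RWMutex

    func raftStatus() {
        mu.RLock() // second read lock in the same call chain
        defer mu.RUnlock()
    }

    func forceRaftLogScanAndProcess() { // G1
        mu.RLock()
        defer mu.RUnlock()
        raftStatus() // blocks if a writer arrived in between
    }

    func processRaft() { // G2
        mu.Lock() // pending writer: later RLock calls must wait
        defer mu.Unlock()
    }

    func main() {
        go forceRaftLogScanAndProcess()
        go processRaft()
        time.Sleep(time.Millisecond)
    }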

Example execution

G1                                       G2
------------------------------------------------------------
store.ForceRaftLogScanAndProcess()
s.mu.RLock()
s.raftLogQueue.MaybeAdd()
bq.impl.shouldQueue()
getTruncatableIndexes()
r.store.RaftStatus()
.                                        store.processRaft()
.                                        s.mu.Lock()
s.mu.RLock()
----------------------G1,G2 leak-----------------------------

Cockroach/584

Bug ID            Ref           Patch  Type           Sub-type
cockroach#584     pull request  patch  Resource       Double Locking

Description

Missing call to mu.Unlock() before the break in the loop.
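
A minimal sketch of the pattern (types and names are illustrative):

    package main

    import (
        "sync"
        "time"
    )

    type gossip struct {
        mu     sync.Mutex
        closed bool
    }

    func (g *gossip) bootstrap() {
        for {
            g.mu.Lock()
            if g.closed {
                break // bug: leaves the loop with g.mu still held
            }
            g.mu.Unlock()
        }
    }

    func (g *gossip) manage() {
        g.mu.Lock() // blocks forever: bootstrap never unlocked
        defer g.mu.Unlock()
    }

    func main() {
        g := &gossip{closed: true}
        go func() {
            g.bootstrap()
            g.manage()
        }()
        time.Sleep(time.Millisecond)
    }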

Example execution

G1
---------------------------
g.bootstrap()
g.mu.Lock() [L1]
if g.closed { ==> break
g.manage()
g.mu.Lock() [L1]
----------G1 leaks---------

Cockroach/6181

Bug ID            Ref           Patch  Type           Sub-type
cockroach#6181    pull request  patch  Resource       RWR Deadlock

Description

The same RWMutex may be recursively acquired for both reading and writing.

Example execution

G1                                G2                               G3                     ...
-----------------------------------------------------------------------------------------------
testRangeCacheCoalescedRquests()
initTestDescriptorDB()
pauseLookupResumeAndAssert()
return
.                                 doLookupWithToken()
.                                 .                                doLookupWithToken()
.                                 rc.LookupRangeDescriptor()       .
.                                 .                                rc.LookupRangeDescriptor()
.                                 rdc.rangeCacheMu.RLock()         .
.                                 rdc.String()                     .
.                                 .                                rdc.rangeCacheMu.RLock()
.                                 .                                fmt.Printf()
.                                 .                                rdc.rangeCacheMu.RUnlock()
.                                 .                                rdc.rangeCacheMu.Lock()
.                                 rdc.rangeCacheMu.RLock()         .
-----------------------------------G2,G3,... leak----------------------------------------------

Cockroach/7504

Bug ID            Ref           Patch  Type           Sub-type
cockroach#7504    pull request  patch  Resource       AB-BA Deadlock

Description

The locks are acquired in the order leaseState then tableNameCache in Release(), but tableNameCache then leaseState in AcquireByName(), leading to an AB-BA deadlock.

Example execution

G1                        G2
-----------------------------------------------------
mgr.AcquireByName()       mgr.Release()
m.tableNames.get(id)      .
c.mu.Lock() [L2]          .
.                         t.release(lease)
.                         t.mu.Lock() [L3]
.                         s.mu.Lock() [L1]
lease.mu.Lock() [L1]      .
.                         t.removeLease(s)
.                         t.tableNameCache.remove()
.                         c.mu.Lock() [L2]
---------------------G1, G2 leak---------------------

Cockroach/9935

Bug ID            Ref           Patch  Type           Sub-type
cockroach#9935    pull request  patch  Resource       Double Locking

Description

This bug is caused by acquiring l.mu.Lock() twice.

Etcd/10492

Bug ID            Ref           Patch  Type           Sub-type
etcd#10492        pull request  patch  Resource       Double locking

Description

A simple double-locking case involving lines 19 and 31.

Etcd/5509

Bug ID            Ref           Patch  Type           Sub-type
etcd#5509         pull request  patch  Resource       Double locking

Description

r.acquire() returns holding r.client.mu.RLock() on a failure path (line 42). This causes any call to client.Close() to leak goroutines.

Etcd/6708

Bug ID            Ref           Patch  Type           Sub-type
etcd#6708         pull request  patch  Resource       Double locking

Description

Double locking at lines 54 and 49.

Etcd/6857

Bug ID            Ref           Patch  Type           Sub-type
etcd#6857         pull request  patch  Communication  Channel

Description

When the select in n.run() takes a different case (n.stop) and the goroutine returns, a later send over n.status has no receiver and leaks.

Example execution

G1              G2              G3
-------------------------------------------
n.run()         .               .
.               .               n.Stop()
.               .               n.stop<-
<-n.stop        .               .
.               .               <-n.done
close(n.done)   .               .
return          .               .
.               .               return
.               n.Status()      .
.               n.status<-      .
----------------G2 leaks-------------------

Etcd/6873

Bug ID            Ref           Patch  Type           Sub-type
etcd#6873         pull request  patch  Mixed          Channel & Lock

Description

This goroutine leak involves a goroutine that acquires a lock and then blocks on a channel operation with no partner, while another goroutine tries to acquire the same lock.
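
A reduced sketch of the pattern, loosely following the etcd names (illustrative only):

    package main

    import (
        "sync"
        "time"
    )

    type watchBroadcasts struct {
        mu      sync.Mutex
        updatec chan struct{}
    }

    func (w *watchBroadcasts) update() { // G2
        w.mu.Lock()
        defer w.mu.Unlock()
        w.updatec <- struct{}{} // no receiver: blocks while holding w.mu
    }

    func (w *watchBroadcasts) stop() { // G3
        w.mu.Lock() // blocks forever: the stuck sender holds the lock
        defer w.mu.Unlock()
        close(w.updatec)
    }

    func main() {
        w := &watchBroadcasts{updatec: make(chan struct{})}
        go w.update()
        go w.stop()
        time.Sleep(time.Millisecond) // both goroutines leak
    }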

Example execution

G1                      G2                  G3
--------------------------------------------------------------
newWatchBroadcasts()
wbs.update()
wbs.updatec <-
return
.                       <-wbs.updatec       .
.                       wbs.coalesce()      .
.                       .                   wbs.stop()
.                       .                   wbs.mu.Lock()
.                       .                   close(wbs.updatec)
.                       .                   <-wbs.donec
.                       wbs.mu.Lock()       .
---------------------G2,G3 leak--------------------------------

Etcd/7492

Bug ID            Ref           Patch  Type           Sub-type
etcd#7492         pull request  patch  Mixed          Channel & Lock

Description

This goroutine leak involves a goroutine that acquires a lock and then blocks on a channel operation with no partner, while another goroutine tries to acquire the same lock.

Example execution

G2                                    G1
---------------------------------------------------------------
.                                     stk.run()
ts.assignSimpleTokenToUser()          .
t.simpleTokensMu.Lock()               .
t.simpleTokenKeeper.addSimpleToken()  .
tm.addSimpleTokenCh <- true           .
.                                     <-tm.addSimpleTokenCh
t.simpleTokensMu.Unlock()             .
ts.assignSimpleTokenToUser()          .
...
t.simpleTokensMu.Lock()
.                                     <-tokenTicker.C
tm.addSimpleTokenCh <- true           .
.                                     tm.deleteTokenFunc()
.                                     t.simpleTokensMu.Lock()
---------------------------G1,G2 leak--------------------------

Etcd/7902

Bug ID            Ref           Patch  Type           Sub-type
etcd#7902         pull request  patch  Mixed          Channel & Lock

Description

If the follower goroutine acquires mu.Lock() first and calls rc.release(), it blocks sending over rcNextc. Only the leader can close(nextc) to unblock the follower. However, in order to invoke rc.release(), the leader needs to acquire mu.Lock(). The fix is to remove the lock and unlock around rc.release().

Example execution

G1                      G2 (leader)              G3 (follower)
---------------------------------------------------------------------
runElectionFunc()
doRounds()
wg.Wait()
.                       ...
.                       mu.Lock()
.                       rc.validate()
.                       rcNextc = nextc
.                       mu.Unlock()              ...
.                       .                        mu.Lock()
.                       .                        rc.validate()
.                       .                        mu.Unlock()
.                       .                        mu.Lock()
.                       .                        rc.release()
.                       .                        <-rcNextc
.                       mu.Lock()
-------------------------G1,G2,G3 leak--------------------------

Grpc/1275

Bug ID            Ref           Patch  Type           Sub-type
grpc#1275         pull request  patch  Communication  Channel

Description

Two goroutines are involved in this leak. The main goroutine is blocked at case <-donec, waiting for the second goroutine to close the channel. The second goroutine, created by the main goroutine, is blocked in stream.Read(), which invokes recvBufferRead.Read(): it waits at case i := r.recv.get() for a message on that channel. The client.CloseStream() method called by the main goroutine should send that message, but does not. The patch makes it send the message.

Example execution

G1                            G2
-----------------------------------------------------
testInflightStreamClosing()
.                             stream.Read()
.                             io.ReadFull()
.                             <-r.recv.get()
CloseStream()
<-donec
---------------------G1, G2 leak---------------------

Grpc/1424

Bug ID            Ref           Patch  Type           Sub-type
grpc#1424         pull request  patch  Communication  Channel

Description

The goroutine running cc.lbWatcher returns without draining the done channel.

Example execution

G1                      G2                          G3
-----------------------------------------------------------------
DialContext()           .                           .
.                       cc.dopts.balancer.Notify()  .
.                       .                           cc.lbWatcher()
.                       <-doneChan
close()
---------------------------G2 leaks-------------------------------

Grpc/1460

Bug ID            Ref           Patch  Type           Sub-type
grpc#1460         pull request  patch  Mixed          Channel & Lock

Description

When gRPC keepalives are enabled (which isn't the case by default at this time) and PermitWithoutStream is false (the default), the client can leak goroutines when transitioning between having no active stream and having one active stream. The keepalive() goroutine is stuck at <-t.awakenKeepalive, while the main goroutine is stuck in NewStream() on t.mu.Lock().

Example execution

G1                         G2
--------------------------------------------
client.keepalive()
.                       client.NewStream()
t.mu.Lock()
<-t.awakenKeepalive
.                       t.mu.Lock()
---------------G1,G2 leak-------------------

Grpc/3017

Bug ID            Ref           Patch  Type           Sub-type
grpc#3017         pull request  patch  Resource       Missing unlock

Description

Line 65 lies on an execution path with a missing unlock.

Example execution

G1                                  G2                                         G3
------------------------------------------------------------------------------------------------
NewSubConn([1])
ccc.mu.Lock() [L1]
sc = 1
ccc.subConnToAddr[1] = 1
go func() [G2]
<-done                              .
.                                   ccc.RemoveSubConn(1)
.                                   ccc.mu.Lock()
.                                   addr = 1
.                                   entry = &subConnCacheEntry_grpc3017{}
.                                   cc.subConnCache[1] = entry
.                                   timer = time.AfterFunc() [G3]
.                                   entry.cancel = func()
.                                   sc = ccc.NewSubConn([1])
.                                   ccc.mu.Lock() [L1]
.                                   entry.cancel()
.                                   !timer.Stop() [true]
.                                   entry.abortDeleting = true
.                                   .                                          ccc.mu.Lock()
.                                   .                                          <<<done>>>
.                                   ccc.RemoveSubConn(1)
.                                   ccc.mu.Lock() [L1]
-------------------------------------------G1, G2 leak-----------------------------------------

Grpc/660

Bug ID            Ref           Patch  Type           Sub-type
grpc#660          pull request  patch  Communication  Channel

Description

The parent function could return without draining the done channel.

Example execution

G1                         G2               helper goroutine
-------------------------------------------------------------
doCloseLoopUnary()
.                          .                bc.stop <- true
<-bc.stop
return
.                          done <-
----------------------G2 leak--------------------------------

Grpc/795

Bug ID            Ref           Patch  Type           Sub-type
grpc#795          pull request  patch  Resource       Double locking

Description

Line 20 lies on an execution path with a missing unlock.

Grpc/862

Bug ID            Ref           Patch  Type           Sub-type
grpc#862          pull request  patch  Communication  Channel & Context

Description

When the return value conn is nil, cc (the ClientConn) is not closed, and the goroutine executing resetAddrConn leaks. The patch closes the ClientConn in a defer func().

Example execution

G1                  G2
---------------------------------------
DialContext()
.                   cc.resetAddrConn()
.                   resetTransport()
.                   <-ac.ctx.Done()
--------------G2 leak------------------

Hugo/3251

Bug ID            Ref           Patch  Type           Sub-type
hugo#3251         pull request  patch  Resource       RWR deadlock

Description

A goroutine can hold Lock() at line 20 and then attempt RLock() at line 29. The RLock() at line 29 can never be acquired because the Lock() at line 20 is never released.

Example execution

G1                        G2                             G3
------------------------------------------------------------------------------------------
wg.Add(1) [W1: 1]
go func() [G2]
go func() [G3]
.                        resGetRemote()
.                        remoteURLLock.URLLock(url)
.                        l.Lock() [L1]
.                        l.m[url] = &sync.Mutex{} [L2]
.                        l.m[url].Lock() [L2]
.                        l.Unlock() [L1]
.                        .                               resGetRemote()
.                        .                               remoteURLLock.URLLock(url)
.                        .                               l.Lock() [L1]
.                        .                               l.m[url].Lock() [L2]
.                        remoteURLLock.URLUnlock(url)
.                        l.RLock() [L1]
...
wg.Wait() [W1]
----------------------------------------G1,G2,G3 leak--------------------------------------

Hugo/5379

Bug ID            Ref           Patch  Type           Sub-type
hugo#5379         pull request  patch  Resource       Double locking

Description

A goroutine first acquires contentInitMu at line 99 and then acquires the same Mutex at line 66.

Istio/16224

Bug ID            Ref           Patch  Type           Sub-type
istio#16224       pull request  patch  Mixed          Channel & Lock

Description

A goroutine holds a Mutex at line 91 and then blocks at line 93. Another goroutine attempts to acquire the same Mutex at line 101 in order to drain the same channel at line 103.

Istio/17860

Bug ID            Ref           Patch  Type           Sub-type
istio#17860       pull request  patch  Communication  Channel

Description

a.statusCh can't be drained at line 70.

Istio/18454

Bug ID            Ref           Patch  Type           Sub-type
istio#18454       pull request  patch  Communication  Channel & Context

Description

s.timer.Stop() at lines 56 and 61 can be called concurrently (i.e., from their entry points at line 104 and line 66). See Timer.

Kubernetes/10182

Bug ID            Ref           Patch  Type           Sub-type
kubernetes#10182  pull request  patch  Mixed          Channel & Lock

Description

Goroutine 1 is blocked on a lock held by goroutine 3, while goroutine 3 is blocked sending a message to a channel that goroutine 1 reads.

Example execution

G1                         G2                         G3
-------------------------------------------------------------------------------
s.Start()
s.syncBatch()
.                          s.SetPodStatus()
.                          s.podStatusesLock.Lock()
<-s.podStatusChannel <===> s.podStatusChannel <- true
.                          s.podStatusesLock.Unlock()
.                          return
s.DeletePodStatus()        .
.                          .                          s.podStatusesLock.Lock()
.                          .                          s.podStatusChannel <- true
s.podStatusesLock.Lock()
-----------------------------G1,G3 leak-----------------------------------------

Kubernetes/11298

Bug ID            Ref           Patch  Type           Sub-type
kubernetes#11298  pull request  patch  Communication  Channel & Condition Variable

Description

n.node uses n.lock as its underlying locker. The service loop initially locks it, and the Notify function tries to lock it before calling n.node.Signal(), leading to a goroutine leak. The n.cond.Signal() calls at lines 59 and 81 are not guaranteed to unblock the n.cond.Wait() at line 56.
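
The underlying hazard is that sync.Cond does not buffer signals: a Signal() delivered before the waiter reaches Wait() is simply lost, as this small sketch shows:

    package main

    import (
        "sync"
        "time"
    )

    func main() {
        var lock sync.Mutex
        cond := sync.NewCond(&lock)

        // Signal fires before anyone is waiting; sync.Cond does not
        // remember it, so this wake-up is lost.
        cond.Signal()

        go func() {
            lock.Lock()
            cond.Wait() // blocks forever: no further Signal is coming
            lock.Unlock()
        }()
        time.Sleep(time.Millisecond)
    }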

Kubernetes/13135

Bug ID            Ref           Patch  Type           Sub-type
kubernetes#13135  pull request  patch  Resource       AB-BA deadlock

Description

Example execution

G1                              G2                              G3
----------------------------------------------------------------------------------
NewCacher()
watchCache.SetOnReplace()
watchCache.SetOnEvent()
.                               cacher.startCaching()
.                               c.Lock()
.                               c.reflector.ListAndWatch()
.                               r.syncWith()
.                               r.store.Replace()
.                               w.Lock()
.                               w.onReplace()
.                               cacher.initOnce.Do()
.                               cacher.Unlock()
return cacher                   .
.                               .                               c.watchCache.Add()
.                               .                               w.processEvent()
.                               .                               w.Lock()
.                               cacher.startCaching()           .
.                               c.Lock()                        .
...
.                               .                               c.Lock()
.                               w.Lock()
--------------------------------G2,G3 leak-----------------------------------------

Kubernetes/1321

Bug ID            Ref           Patch  Type           Sub-type
kubernetes#1321   pull request  patch  Mixed          Channel & Lock

Description

This is a lock-channel bug. The first goroutine invokes distribute(), which holds m.lock.Lock() while blocking on a send to w.result. The second goroutine invokes stopWatching(), which could unblock the first goroutine by closing w.result; however, in order to close w.result, stopWatching() needs to acquire m.lock.Lock().

The fix is to introduce another channel and receive from it in the same select statement as the send to w.result. Closing the second channel then unblocks the first goroutine without needing to hold m.lock.Lock().
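
A sketch of the fixed pattern (names are illustrative, following the description above). Closing a channel is observed by every pending select, which is why the fix does not need to hold the lock:

    package main

    import (
        "sync"
        "time"
    )

    type watcher struct {
        result  chan struct{}
        stopped chan struct{} // the extra channel introduced by the fix
    }

    type mux struct {
        lock sync.Mutex
        w    *watcher
    }

    func (m *mux) distribute() {
        m.lock.Lock()
        defer m.lock.Unlock()
        select {
        case m.w.result <- struct{}{}: // normal delivery
        case <-m.w.stopped: // unblocked once the watcher is stopped
        }
    }

    func (m *mux) stopWatching() {
        close(m.w.stopped) // no need to hold m.lock here
    }

    func main() {
        m := &mux{w: &watcher{
            result:  make(chan struct{}),
            stopped: make(chan struct{}),
        }}
        go m.distribute()
        m.stopWatching() // rescues the blocked sender without the lock
        time.Sleep(time.Millisecond)
    }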

Example execution

G1                          G2
----------------------------------------------
testMuxWatcherClose()
NewMux()
.                           m.loop()
.                           m.distribute()
.                           m.lock.Lock()
.                           w.result <- true
w := m.Watch()
w.Stop()
mw.m.stopWatching()
m.lock.Lock()
---------------G1,G2 leak---------------------

Kubernetes/25331

Bug ID            Ref           Patch  Type           Sub-type
kubernetes#25331  pull request  patch  Communication  Channel & Context

Description

A goroutine leak occurs when an error happens: the watcher goroutine blocks sending on resultChan, which is never drained once Stop() cancels the context.

Example execution

G1                    G2
------------------------------------
wc.run()
.                   wc.Stop()
.                   wc.errChan <-
.                   wc.cancel()
<-wc.errChan
wc.cancel()
wc.resultChan <-
-------------G1 leak----------------

Kubernetes/26980

Bug ID            Ref           Patch  Type           Sub-type
kubernetes#26980  pull request  patch  Mixed          Channel & Lock

Description

A goroutine holds a Mutex at line 24 and blocks at line 35. Another goroutine blocks at line 58 trying to acquire the same Mutex.

Kubernetes/30872

Bug ID            Ref           Patch  Type           Sub-type
kubernetes#30872  pull request  patch  Resource       AB-BA deadlock

Description

The lock is acquired both at lines 92 and 157.

Kubernetes/38669

Bug ID            Ref           Patch  Type           Sub-type
kubernetes#38669  pull request  patch  Communication  Channel

Description

The receive at line 33 has no sender.

Kubernetes/5316

Bug ID            Ref           Patch  Type           Sub-type
kubernetes#5316   pull request  patch  Communication  Channel

Description

If the main goroutine selects a case that does not consume the channels, the anonymous goroutine blocks on its send forever.
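
A minimal sketch of the pattern (finishRequest here is a simplification, not the Kubernetes code). Making errCh a buffered channel of capacity 1 would let the late send complete immediately, so the goroutine could exit:

    package main

    import (
        "errors"
        "time"
    )

    func finishRequest(timeout time.Duration, fn func() error) error {
        errCh := make(chan error) // unbuffered: the sender needs a receiver
        go func() {
            errCh <- fn() // leaks if the timeout case won the select
        }()
        select {
        case err := <-errCh:
            return err
        case <-time.After(timeout):
            return errors.New("timed out") // returns without draining errCh
        }
    }

    func main() {
        // fn outlives the timeout, so the sender goroutine leaks.
        _ = finishRequest(time.Millisecond, func() error {
            time.Sleep(time.Second)
            return nil
        })
    }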

Example execution

G1                      G2
--------------------------------------
finishRequest()
.                       fn()
time.After()
.                       errCh<-/ch<-
--------------G2 leaks----------------

Kubernetes/58107

Bug ID            Ref           Patch  Type           Sub-type
kubernetes#58107  pull request  patch  Resource       RWR deadlock

Description

The rules for the read-write lock: concurrent read locks are allowed, and the write lock has higher priority than read locks.

There are two queues (queue 1 and queue 2) involved in this bug, and the two queues are protected by the same read-write lock (rq.workerLock.RLock()). Before getting an element from queue 1 or queue 2, rq.workerLock.RLock() is acquired. If the queue is empty, cond.Wait() will be invoked. There is another goroutine (goroutine D), which will periodically invoke rq.workerLock.Lock(). Under the following situation, deadlock will happen. Queue 1 is empty, so that some goroutines hold rq.workerLock.RLock(), and block at cond.Wait(). Goroutine D is blocked when acquiring rq.workerLock.Lock(). Some goroutines try to process jobs in queue 2, but they are blocked when acquiring rq.workerLock.RLock(), since write lock has a higher priority.

The fix is to not hold rq.workerLock.RLock() while pulling data from any queue. Then, when a goroutine is blocked at cond.Wait(), rq.workerLock.RLock() is not held.

Example execution

G3                      G4                      G5
--------------------------------------------------------------------
.                       .                       Sync()
rq.workerLock.RLock()   .                       .
q.cond.Wait()           .                       .
.                       .                       rq.workerLock.Lock()
.                       rq.workerLock.RLock()
.                       q.cond.L.Lock()
-----------------------------G3,G4,G5 leak-----------------------------

Kubernetes/62464

Bug ID            Ref           Patch  Type           Sub-type
kubernetes#62464  pull request  patch  Resource       RWR deadlock

Description

This is another example of a recursive read-lock bug. The Go developers have noted that RLock should not be used recursively in the same goroutine.

Example execution

G1                                  G2
--------------------------------------------------------
m.reconcileState()
m.state.GetCPUSetOrDefault()
s.RLock()
s.GetCPUSet()
.                                   p.RemoveContainer()
.                                   s.GetDefaultCPUSet()
.                                   s.SetDefaultCPUSet()
.                                   s.Lock()
s.RLock()
---------------------G1,G2 leak--------------------------

Kubernetes/6632

Bug ID            Ref           Patch  Type           Sub-type
kubernetes#6632   pull request  patch  Mixed          Channel & Lock

Description

When resetChan is full, WriteFrame holds the lock and blocks on the channel. Then monitor() fails to close the resetChan because the lock is already held by WriteFrame.

Example execution

G1                      G2                  helper goroutine
----------------------------------------------------------------
i.monitor()
<-i.conn.closeChan
.                       i.WriteFrame()
.                       i.writeLock.Lock()
.                       i.resetChan <-
.                       .                   i.conn.closeChan<-
i.writeLock.Lock()
----------------------G1,G2 leak--------------------------------

Kubernetes/70277

Bug ID            Ref           Patch  Type           Sub-type
kubernetes#70277  pull request  patch  Communication  Channel

Description

wait.poller() returns a function of type WaitFunc. That function creates a goroutine, and the goroutine only quits when either the after or done channel is closed.

The doneCh defined at line 70 is never closed.

Moby/17176

Bug ID            Ref           Patch  Type           Sub-type
moby#17176        pull request  patch  Resource       Double locking

Description

devices.nrDeletedDevices takes devices.Lock() but does not release it (line 36) if there are no deleted devices. This will block other goroutines trying to acquire devices.Lock().

Moby/21233

Bug ID            Ref           Patch  Type           Sub-type
moby#21233        pull request  patch  Communication  Channel

Description

This test was checking that it received every progress update that was produced. But delivery of these intermediate progress updates is not guaranteed. A new update can overwrite the previous one if the previous one hasn't been sent to the channel yet.

The call to t.Fatalf terminated the current goroutine which was consuming the channel, which caused a deadlock and eventual test timeout rather than a proper failure message.

Example execution

G1                      G2                  G3
----------------------------------------------------------
testTransfer()          .                   .
tm.Transfer()           .                   .
t.Watch()               .                   .
.                       WriteProgress()     .
.                       ProgressChan<-      .
.                       .                   <-progressChan
.                       ...                 ...
.                       return              .
.                                           <-progressChan
<-watcher.running
----------------------G1,G3 leak--------------------------

Moby/25384

Bug ID            Ref           Patch  Type           Sub-type
moby#25384        pull request  patch  Mixed          Misuse WaitGroup

Description

When n=1 (where n is len(pm.plugins)), the location of group.Wait() doesn’t matter. When n > 1, group.Wait() is invoked in each iteration. Whenever group.Wait() is invoked, it waits for group.Done() to be executed n times. However, group.Done() is only executed once in one iteration.

Misuse of sync.WaitGroup
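
A minimal sketch of the misuse (illustrative; running it trips Go's built-in deadlock detector, which corresponds to the hang the patch removes):

    package main

    import "sync"

    func main() {
        plugins := []string{"a", "b"} // n = 2
        var group sync.WaitGroup
        group.Add(len(plugins))
        for _, p := range plugins {
            go func(name string) {
                defer group.Done()
                _ = name // restore the plugin here
            }(p)
            // bug: Wait() inside the loop expects n Done() calls, but only
            // one goroutine has been started so far; with n > 1 this blocks.
            group.Wait()
        }
        // fix: call group.Wait() once, here, after the loop.
    }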

Moby/27782

Bug ID            Ref           Patch  Type           Sub-type
moby#27782        pull request  patch  Communication  Channel & Condition Variable

Description

Example execution

G1                      G2                         G3
-----------------------------------------------------------------------
InitializeStdio()
startLogging()
l.ReadLogs()
NewLogWatcher()
.                       l.readLogs()
container.Reset()       .
LogDriver.Close()       .
r.Close()               .
close(w.closeNotifier)  .
.                       followLogs(logWatcher)
.                       watchFile()
.                       New()
.                       NewEventWatcher()
.                       NewWatcher()
.                       .                          w.readEvents()
.                       .                          event.ignoreLinux()
.                       .                          return false
.                       <-logWatcher.WatchClose()  .
.                       fileWatcher.Remove()       .
.                       w.cv.Wait()                .
.                       .                          w.Events <- event
------------------------------G2,G3 leak-------------------------------

Moby/28462

Bug ID            Ref           Patch  Type           Sub-type
moby#28462        pull request  patch  Mixed          Channel & Lock

Description

One goroutine may acquire a lock and try to send a message over channel stop, while the other will try to acquire the same lock. With the wrong ordering, both goroutines will leak.

Example execution

G1                          G2
--------------------------------------------------------------
monitor()                   .
handleProbeResult()         .
.                           d.StateChanged()
.                           c.Lock()
.                           d.updateHealthMonitorElseBranch()
.                           h.CloseMonitorChannel()
.                           s.stop <- struct{}{}
c.Lock()                    .
----------------------G1,G2 leak------------------------------

Moby/30408

Bug ID            Ref           Patch  Type           Sub-type
moby#30408        pull request  patch  Communication  Condition Variable

Description

Wait() at line 22 has no corresponding Signal() or Broadcast().

Example execution

G1                 G2
------------------------------------------
testActive()
.                  p.waitActive()
.                  p.activateWait.L.Lock()
.                  p.activateWait.Wait()
<-done
-----------------G1,G2 leak---------------

Moby/33781

Bug ID            Ref           Patch  Type           Sub-type
moby#33781        pull request  patch  Communication  Channel & Context

Description

The goroutine created with an anonymous function blocks sending a message over an unbuffered channel. However, there is a path on which the parent function returns without draining the channel.

Example execution

G1              G2             G3
----------------------------------------
monitor()       .              .
<-time.After()  .              .
<-stop          stop<-         .
cancelProbe()   .              .
return          .              .
.               .              result<-
----------------G3 leak------------------

Moby/36114

Bug ID            Ref           Patch  Type           Sub-type
moby#36114        pull request  patch  Resource       Double locking

Description

The lock for the struct svm is already held when svm.hotRemoveVHDsAtStart() is called.

Moby/4951

Bug ID            Ref           Patch  Type           Sub-type
moby#4951         pull request  patch  Resource       AB-BA deadlock

Description

The root cause and the patch are clearly explained in the commit description. The global lock is devices.Lock(), and the device lock is baseInfo.lock.Lock(). It is very likely that this bug can be reproduced.

Moby/7559

Bug ID            Ref           Patch  Type           Sub-type
moby#7559         pull request  patch  Resource       Double locking

Description

Line 25 lies on an execution path that is missing a call to Unlock().

Example execution

G1
---------------------------
proxy.connTrackLock.Lock()
if err != nil { continue }
proxy.connTrackLock.Lock()
-----------G1 leaks--------

Serving/2137

Bug ID            Ref           Patch  Type           Sub-type
serving#2137      pull request  patch  Mixed          Channel & Lock

patch: https://github.com/knative/serving/pull/2137/files; pull request: https://github.com/knative/serving/pull/2137

Description

Example execution

G1                              G2                          G3
----------------------------------------------------------------------------------
b.concurrentRequests(2)         .                           .
b.concurrentRequest()           .                           .
r.lock.Lock()                   .                           .
.                               start.Done()                .
start.Wait()                    .                           .
b.concurrentRequest()           .                           .
r.lock.Lock()                   .                           .
.                               .                           start.Done()
start.Wait()                    .                           .
unlockAll(locks)                .                           .
unlock(lc)                      .                           .
req.lock.Unlock()               .                           .
ok := <-req.accepted            .                           .
.                               b.Maybe()                   .
.                               b.activeRequests <- t       .
.                               thunk()                     .
.                               r.lock.Lock()               .
.                               .                           b.Maybe()
.                               .                           b.activeRequests <- t
----------------------------G1,G2,G3 leak-----------------------------------------

Syncthing/4829

Bug ID            Ref           Patch  Type           Sub-type
syncthing#4829    pull request  patch  Resource       Double locking

Description

Double locking at line 17 and line 30.

Example execution

G1
---------------------------
mapping.clearAddresses()
m.mut.Lock() [L2]
m.notify(...)
m.mut.RLock() [L2]
----------G1 leaks---------

Syncthing/5795

Bug ID            Ref           Patch  Type           Sub-type
syncthing#5795    pull request  patch  Communication  Channel

Description

<-c.dispatcherLoopStopped at line 82 blocks forever because dispatcherLoop() is blocked at line 72.

Example execution

G1                            G2
--------------------------------------------------------------
c.Start()
go c.dispatcherLoop() [G2]
.                             select [<-c.inbox, <-c.closed]
c.inbox <- <================> [<-c.inbox]
<-c.dispatcherLoopStopped     .
.                             default
.                             c.ccFn()/c.Close()
.                             close(c.closed)
.                             <-c.dispatcherLoopStopped
---------------------G1,G2 leak-------------------------------