Author(s): Georgian-Vlad Saioc (vsaioc@uber.com), Milind Chabbi (milind@uber.com)
Last updated: 14 Aug 2025
Discussion at issue #74609.
This proposal outlines a dynamic technique for detecting goroutine leaks in Go programs. It leverages the existing marking phase of the Go garbage collector (GC) to find goroutines blocked on concurrency primitives that are not reachable in memory from goroutines that may still be runnable.
Due to its concurrency features (lightweight goroutines, message passing), Go is particularly susceptible to concurrency bugs known as goroutine leaks (also known as partial deadlocks in the literature [1]). Unlike global deadlocks (wherein all goroutines are blocked), which halt an entire application, goroutine leaks occur whenever a goroutine is blocked indefinitely, e.g., by reading from a channel that no other goroutine has access to, while other running goroutines keep the program operational. This issue can lead to (a) severe memory leaks, since a leaked goroutine's stack and everything reachable from it can never be freed, and (b) performance penalties, by burdening the GC with marking useless memory. Goroutine leaks are notoriously difficult to debug; in some cases even their presence is difficult to discern, despite otherwise thorough diagnostic information, e.g., memory and goroutine profiles. This makes tooling capable of detecting them valuable to the Go ecosystem.
The change involves several modifications to key points during phases of the GC cycle, as follows:
For an additional in-depth description of the theoretical underpinnings, refer here.
The proposal expands the developer toolset for identifying goroutine leaks, especially in long-running systems with complex non-deterministic behavior. The advantage of this approach over other goroutine leak detection techniques is that it can be used, at minimal performance cost, in regular Go systems, e.g., production services. It is also theoretically sound, i.e., it produces no false positives. Its primary limitation is that its effectiveness decreases the more heap resources are over-exposed in memory, i.e., pair-wise reachable: a blocked goroutine is only reported if the primitive it is blocked on is unreachable from every goroutine that may still run, so leaks over primitives that remain reachable from such goroutines go unreported.
The feature is backwards-compatible with any Go program. Changes are strictly internal, and any extensions are accessible only on an opt-in basis via additional APIs, in this case a new profile type.
A working prototype is available at go.dev/cl/688335.
In this section we discuss various aspects of the implementation.
Goroutine leak detection is triggered on demand via profiling. An additional profile type, "goroutineleak", is now available. Attempting to extract it will perform the following:

… if debug < 2; alternatively, get a full stack dump of all goroutines, if debug >= 2.

Otherwise, the GC preserves regular behavior, with a few exceptions described in the remainder of this section.
In order to avoid most performance penalties, the feature is currently only enabled via the experimental flag goleakprofiler.
It is essential for the approach that certain pointers are only conditionally traced by the GC. In the current implementation, this is achieved via maybe-traceable pointers, expressed as the type maybeTraceablePtr in the runtime.
A maybe-traceable pointer value is a pair of an unsafe.Pointer value and a uintptr value, stored in the fields .vp and .vu, respectively, of the maybeTraceablePtr type. A maybe-traceable pointer is in one of three states:
1. .vp and .vu are zero values. This is equivalent to nil.
2. .vp and .vu are both set and point to the same address.
3. .vu is set to the referenced address, but .vp is set to nil, such that the GC does not trace the referent when scanning the object that embeds the maybe-traceable pointer.

Maybe-traceable pointers are provided with a set of methods for setting and unsetting them that guarantee certain invariants at runtime, e.g., that if .vp and .vu are both set, they point to the same address.
The use of maybe-traceable pointers is only required for *sudog objects, specifically for the .elem and .hchan fields. This prevents the GC from inadvertently marking channels that have not yet been deemed reachable in memory via eventually runnable goroutines. Such marking may otherwise occur because *sudog objects are globally reachable: via the list of goroutine objects (*g) at allgs, and via the treap forest of semaphore-related *sudog objects at semtable.
All uses of these fields have been updated to use the methods provided by the maybeTraceablePtr type. When a goroutine leak detection GC cycle starts, it sets all maybe-traceable pointers in *sudog objects as untraceable. Once the cycle concludes, it resets all the pointers to being traceable.
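A toy model of why the *sudog fields must be hidden, and of the toggle performed at the start and end of a detection cycle. All types and functions here (hchan, sudog, mark, detectionMark) are hypothetical simplifications, with a boolean standing in for the state of the maybe-traceable pointer. Because every sudog is reachable from global roots, a regular mark also reaches the channel of a blocked goroutine; with the pointers hidden, only channels reachable from runnable goroutines are marked.

```go
package main

import "fmt"

// hchan is a hypothetical stand-in for a channel object.
type hchan struct{ name string }

// sudog records the channel a blocked goroutine waits on; traceable models
// whether its maybe-traceable channel pointer is visible to the GC.
type sudog struct {
	c         *hchan
	traceable bool
}

// mark returns the channels a toy GC marks. Roots are the channels on
// runnable goroutines' stacks plus every sudog, since sudogs are globally
// reachable (as via allgs and semtable). A sudog's channel is traced only
// while its pointer is traceable.
func mark(runnableRefs []*hchan, sudogs []*sudog) map[*hchan]bool {
	marked := make(map[*hchan]bool)
	for _, c := range runnableRefs {
		marked[c] = true
	}
	for _, s := range sudogs {
		if s.traceable && s.c != nil {
			marked[s.c] = true
		}
	}
	return marked
}

// detectionMark hides every sudog pointer when the detection cycle starts
// and restores them all when it concludes.
func detectionMark(runnableRefs []*hchan, sudogs []*sudog) map[*hchan]bool {
	for _, s := range sudogs {
		s.traceable = false
	}
	defer func() {
		for _, s := range sudogs {
			s.traceable = true
		}
	}()
	return mark(runnableRefs, sudogs)
}

func main() {
	live := &hchan{name: "live"}     // referenced by a runnable goroutine
	orphan := &hchan{name: "orphan"} // referenced only by a blocked goroutine's sudog
	sudogs := []*sudog{{c: orphan, traceable: true}}

	regular := mark([]*hchan{live}, sudogs)
	fmt.Println(regular[live], regular[orphan]) // true true
	leakCycle := detectionMark([]*hchan{live}, sudogs)
	fmt.Println(leakCycle[live], leakCycle[orphan]) // true false
	fmt.Println(sudogs[0].traceable)                // true
}
```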
In the current implementation of the GC, there is a check for whether the marking phase must be restarted due to go.dev/issue/27993. We extend that checkpoint with additional logic: (1) to find additional eventually-runnable goroutines, or (2) to mark goroutines as leaked, both of which provide another reason to restart the marking phase. Even if #27993 is resolved, the checkpoint must be preserved for goroutine leak detection.