blob: 3edc9adf206dec654740cffdf2aebdd0f534654f [file] [log] [blame] [view]
# Proposal: Soft memory limit
Author: Michael Knyszek
Date: 15 September 2021
## Summary
This document proposes a new option for tuning the behavior of the Go garbage
collector by setting a soft memory limit on the total amount of memory that Go
uses.
This option comes in two flavors: a new `runtime/debug` function called
`SetMemoryLimit` and a `GOMEMLIMIT` environment variable.
This new option gives applications better control over their resource economy.
It empowers users to:
* Better utilize the memory that they already have,
* Confidently decrease their memory limits, knowing Go will respect them,
* Avoid unsupported forms of garbage collection tuning like heap ballasts.
## Motivation
Go famously offers a single option for tuning the Go garbage collector: `GOGC`
(or `runtime/debug.SetGCPercent`).
This option offers a direct tradeoff between CPU and memory: it directly
dictates the amount of memory overhead available to the garbage collector.
Less memory overhead means more frequent garbage collection cycles, and more CPU
spent on GC.
More memory overhead means less frequent cycles and more CPU for the
application.
This option has carried the Go project well for a long time.
However, years of experience has produced a meaningful amount of evidence
suggesting that `GOGC` is not enough.
A direct tradeoff isn't always possible, because memory is not fungible relative
to CPU resources.
Consider an application that runs as a service or a daemon.
In these cases, out-of-memory errors often arise due to transient spikes in
applications' heap sizes.
Today, Go does not respect users' memory limits.
As a result, the Go community has developed various patterns for dealing with
out-of-memory errors.
In scenarios where out-of-memory errors are acceptable (to varying degrees), Go
users choose to live with these errors, restarting their service regularly,
instead of reducing `GOGC`.
Reducing `GOGC` directly impacts users' productivity metrics, metrics whose
behavior is largely governed by steady-state behavior, not transients, so this
course of action is undesirable.
In scenarios where out-of-memory errors are unacceptable, a similar situation
occurs.
Users are unwilling to increase GOGC to achieve productivity improvements, even
though in the steady-state they may actually have that memory available to them.
They pick an overly conservative value to reduce the chance of an out-of-memory
condition in transient states, but this choice leaves productivity on the table.
This out-of-memory avoidance led to the Go community developing its own
homegrown garbage collector tuning.
The first example of such tuning is the heap ballast.
In order to increase their productivity metrics while also avoiding
out-of-memory errors, users sometimes pick a low `GOGC` value, and fool the GC
into thinking that there's a fixed amount of memory live.
This solution elegantly scales with `GOGC`: as the real live heap increases, the
impact of that fixed set decreases, and `GOGC`'s contribution to the heap size
dominates.
In effect, `GOGC` is larger when the heap is smaller, and smaller (down to its
actual value) when the heap is larger.
Unfortunately, this solution is not portable across platforms, and is not
guaranteed to continue to work as the Go runtime changes.
Furthermore, users are forced to do complex calculations and estimate runtime
overheads in order to pick a heap ballast size that aligns with their memory
limits.
The second example of such tuning is calling `runtime/debug.FreeOSMemory` at
some regular interval, forcing a garbage collection to trigger sooner, usually
to respect some memory limit.
This case is much more dangerous, because calling it too frequently can lead a
process to entirely freeze up, spending all its time on garbage collection.
Working with it takes careful consideration and experimentation to be both
effective and avoid serious repercussions.
Both of these situations, dealing with out-of-memory errors and homegrown
garbage collection tuning, have a straightforward solution that other platforms
(like Java and TCMalloc) already provide its users: a configurable memory limit,
enforced by the Go runtime.
A memory limit would give the Go runtime the information it needs to both
respect users' memory limits, and allow them to optionally use that memory
always, to cut back the cost of garbage collection.
## Non-goals
1. Accounting for and react to memory outside the Go runtime, such as:
* Co-tenant processes in a container,
* C/C++ memory, and
* Kernel memory counted against the process.
Dealing with and reacting to memory used by other processes, and even to memory
within the process governed by the semantics of a completely different
programming language, is an incredibly hard problem and often requires a
coordinated effort.
It's outside of the scope of the Go runtime to solve this problem for everyone,
but I believe the Go runtime has an important role to play in supporting these
worthwhile efforts.
1. Eliminate out-of-memory errors in 100% of cases.
Whatever policy this API adheres to is going to fail for some use-case, and
that's OK.
The policy can be tweaked and improved upon as time goes on, but it's impossible
for us to create a solution that is all things to all people without a
tremendous amount of toil for our team (by e.g. exposing lots of tuning knobs).
On top of this, any such solution is likely to become difficult to use at all.
The best we can do is make life better for as many users as possible.
## Detailed design
The design of a memory soft limit consists of four parts: an API, mechanisms to
enforce the soft limit, and guidance through thorough documentation, and
telemetry for identifying issues in production.
### API
```go
package runtime/debug
// SetMemoryLimit provides the runtime with a soft memory limit.
//
// The runtime undertakes several processes to try to respect this
// memory limit, including adjustments to the frequency of garbage
// collections and returning memory to the underlying system more
// aggressively. This limit will be respected even if GOGC=off (or,
// if SetGCPercent(-1) is executed).
//
// The input limit is provided as bytes, and is intended to include
// all memory that the Go runtime has direct control over. In other
// words, runtime.MemStats.Sys - runtime.MemStats.HeapReleased.
//
// This limit does not account for memory external to Go, such as
// memory managed by the underlying system on behalf of the process,
// or memory managed by non-Go code inside the same process.
//
// A zero limit or a limit that's lower than the amount of memory
// used by the Go runtime may cause the garbage collector to run
// nearly continuously. However, the application may still make
// progress.
//
// See https://golang.org/doc/gc-ergonomics for a detailed guide
// explaining the soft memory limit as well as a variety of common
// use-cases and scenarios.
//
// SetMemoryLimit returns the previously set memory limit.
// By default, the limit is math.MaxInt64.
// A negative input does not adjust the limit, and allows for
// retrieval of the currently set memory limit.
func SetMemoryLimit(limit int64) int64
```
Note that the soft limit is expressed in terms of the total amount of memory
used by the Go runtime.
This choice means that enforcement of the soft memory limit by the GC must
account for additional memory use such as heap metadata and fragmentation.
It also means that the runtime is responsible for any idle heap memory above the
limit, i.e. any memory that is currently unused by the Go runtime, but has not
been returned to the operating system.
As a result, the Go runtime's memory scavenger must also participate in
enforcement.
This choice is a departure from similar APIs in other languages (including the
experimental `SetMaxHeap` [patch](https://golang.org/cl/46751)), whose limits
only include space occupied by heap objects themselves.
To reduce confusion and help facilitate understanding, each class of memory that
is accounted for will be precisely listed in the documentation.
In addition, the soft memory limit can be set directly via an environment
variable that all Go programs recognize: `GOMEMLIMIT`.
For ease-of-use, I propose `GOMEMLIMIT` accept either an integer value in bytes,
or a string such as "8GiB."
More specifically, an integer followed by one of several recognized unit
strings, without spaces.
I propose supporting "B," "KiB," "MiB," "GiB," and "TiB" indicating the
power-of-two versions of each.
Similarly, I propose supporting "KB," "MB," "GB," and "TB," which refer to their
power-of-ten counterparts.
### Enforcement
#### Garbage collection
In order to ensure the runtime maintains the soft memory limit, it needs to
trigger at a point such that the total heap memory used does not exceed the soft
limit.
Because the Go garbage collector's memory use is defined entirely in terms of
the heap goal, altering its definition is sufficient to ensure that a memory
limit is enforced.
However, the heap goal is defined in terms of object bytes, while the memory
limit includes a much broader variety of memory classes, necessitating a
conversion function between the two.
To compute the heap limit ![`\hat{L}`](48409/inl1.png) from the soft memory
limit ![`L`](48409/inl2.png), I propose the following calculation:
![Equation 1](48409/eqn1.png)
![`T`](48409/inl3.png) is the total amount of memory mapped by the Go runtime.
![`F`](48409/inl4.png) is the amount of free and unscavenged memory the Go
runtime is holding.
![`A`](48409/inl5.png) is the number of bytes in allocated heap objects at the
time ![`\hat{L}`](48409/inl1.png) is computed.
The second term, ![`(T - F - A)`](48409/inl6.png), represents the sum of
non-heap overheads.
Free and unscavenged memory is specifically excluded because this is memory that
the runtime might use in the near future, and the scavenger is specifically
instructed to leave the memory up to the heap goal unscavenged.
Failing to exclude free and unscavenged memory could lead to a very poor
accounting of non-heap overheads.
With ![`\hat{L}`](48409/inl1.png) fully defined, our heap goal for cycle
![`n`](48409/inl7.png) (![`N_n`](48409/inl8.png)) is a straightforward extension
of the existing one.
Where
* ![`M_n`](48409/inl9.png) is equal to bytes marked at the end of GC n's mark
phase
* ![`S_n`](48409/inl10.png) is equal to stack bytes at the beginning of GC n's
mark phase
* ![`G_n`](48409/inl11.png) is equal to bytes of globals at the beginning of GC
n's mark phase
* ![`\gamma`](48409/inl12.png) is equal to
![`1+\frac{GOGC}{100}`](48409/inl13.png)
then
![Equation 2](48409/eqn2.png)
Over the course of a GC cycle, non-heap overheads remain stable because the
mostly increase monotonically.
However, the GC needs to be responsive to any change in non-heap overheads.
Therefore, I propose a more heavy-weight recomputation of the heap goal every
time its needed, as opposed to computing it only once per cycle.
This also means the GC trigger point needs to be dynamically recomputable.
This check will create additional overheads, but they're likely to be low, as
the GC's internal statistics are updated only on slow paths.
The nice thing about this definition of ![`\hat{L}`](48409/inl1.png) is that
it's fairly robust to changes to the Go GC, since total mapped memory, free and
unscavenged memory, and bytes allocated in objects, are fairly fundamental
properties (especially to any tracing GC design).
#### Death spirals
As the live heap grows toward ![`\hat{L}`](48409/inl1.png), the Go garbage
collector is going to stray from the tradeoff defined by `GOGC`, and will
trigger more and more often to reach its goal.
Left unchecked, the Go garbage collector will eventually run continuously, and
increase its utilization as its runway disappears.
Eventually, the application will fail to make progress.
This process is referred to as a death spiral.
One way to deal with this situation is to place a limit on the amount of total
CPU utilization of the garbage collector.
If the garbage collector were to execute and exceed that limit at any point, it
will instead let the application proceed, even if that means missing its goal
and breaching the memory limit.
I propose we do exactly this, but rather than provide another knob for
determining the maximum fraction of CPU time, I believe that we should simply
pick a reasonable default based on `GOGC`.
I propose that we pick 50% as the default fraction.
This fraction is reasonable and conservative since most applications won't come
close to this threshold in normal execution.
To implement this policy, I propose a leaky bucket mechanism inspired by a tool
called `jvmquake` developed by [Netflix for killing Java service
instances](https://netflixtechblog.medium.com/introducing-jvmquake-ec944c60ba70)
that could fall into a death spiral.
To summarize, the mechanism consists of a conceptual bucket with a capacity, and
that bucket accumulates GC CPU time.
At the same time, the bucket is drained by mutator CPU time.
Should the ratio of GC CPU time to mutator CPU time exceed 1:1 for some time
(determined by the bucket's capacity) then the bucket's contents will tend
toward infinity.
At the point in which the bucket's contents exceed its capacity, `jvmquake`
would kill the target service instance.
In this case instead of killing the process, the garbage collector will
deliberately prevent user goroutines from assisting the garbage collector in
order to prevent the bucket from overflowing.
The purpose of the bucket is to allow brief spikes in GC CPU utilization.
Otherwise, anomalous situations could cause unnecessary missed assists that make
GC assist pacing less smooth.
A reasonable bucket capacity will have to be chosen empirically, as it should be
large enough to accommodate worst-case pause times but not too large such that a
100% GC CPU utilization spike could cause the program to become unresponsive for
more than about a second.
1 CPU-second per `GOMAXPROCS` seems like a reasonable place to start.
Unfortunately, 50% is not always a reasonable choice for small values of `GOGC`.
Consider an application running with `GOGC=10`: an overall 50% GC CPU
utilization limit for `GOGC=10` is likely going to be always active, leading to
significant overshoot.
This high utilization is due to the fact that the Go GC at `GOGC=10` will reach
the point at which it may no longer start a GC much sooner than, say `GOGC=100`.
At that point, the GC has no option but to increase utilization to meet its
goal.
Because it will then be capped at increasing utilization, the GC will have no
choice but to use more memory and overshoot.
As a result, this effectively creates a minimum `GOGC` value: below a certain
`GOGC`, the runtime will be effectively acting as if the `GOGC` value was
higher.
For now, I consider this acceptable.
#### Returning memory to the platform
In the context of maintaining a memory limit, it's critical that the Go runtime
return memory to the underlying platform as a part of that process.
Today, the Go runtime returns memory to the system with a background goroutine
called the scavenger, which paces itself to consume around 1% of 1 CPU.
This pacing is conservative, but necessarily so: the scavenger must synchronize
with any goroutine allocating pages of memory from the heap, so this pacing is
generally a slight underestimate as it fails to include synchronization
overheads from any concurrent allocating goroutines.
Currently, the scavenger's goal is to return free memory to the platform until
it reaches the heap goal, accounting for page-level fragmentation and a fixed
10% overhead to avoid paging costs.
In the context of a memory limit, I propose that the scavenger's goal becomes
that limit.
Then, the scavenger should pace itself more aggressively as the runtime's memory
use approaches the limit.
I propose it does so using a proportional-integral controller whose input is the
difference between the memory limit and the memory used by Go, and whose output
is the CPU utilization target of the background scavenger.
This will make the background scavenger more reliable.
However, the background scavenger likely won't return memory to the OS promptly
enough for the memory limit, so in addition, I propose having span allocations
eagerly return memory to the OS to stay under the limit.
The time a goroutine spends in this will also count toward the 50% GC CPU limit
described in the [Death spirals](#death-spirals) section.
#### Alternative approaches considered
##### Enforcement
The conversion function from the memory limit to the heap limit described in
this section is the result of an impedance mismatch between how the GC pacer
views memory and how the memory limit is defined (i.e. how the platform views
memory).
An alternative approach would be to resolve this impedance mismatch.
One way to do so would be to define the memory limit in terms of heap object
bytes.
As discussed in the [API section](#api) however, this makes for a poor user
experience.
Another way is to redefine the GC pacer's view of memory to include other memory
sources.
Let's focus on the most significant of these: fragmentation.
Suppose we redefined the heap goal and the garbage collector's pacing to be
based on the spans containing objects, rather than the objects themselves.
This definition is straightforward to implement: marked memory is defined as the
sum total of memory used spans containing marked objects, and the heap is
considered to grow each time a fresh span is allocated.
Unfortunately, this redefinition comes with two major caveats that make it very
risky.
The first is that the definition of the GC steady-state, upon which much of the
pacer's intuition is built, now also depends on the degree of fragmentation,
making it a less fragile state in practice.
The second is that most heaps will have an inflated size.
Consider a situation where we start with a very dense heap.
After some time, most of the objects die, but there's still at least one object
alive in each span that previously contained a live object.
With the redefinition, the overall heap will grow to the same size despite much
less memory being alive.
In contrast, the existing definition will cause the heap to grow only only to a
multiple of the actual live objects memory, and it's very unlikely that it will
go beyond the spans already in-use.
##### Returning memory to the platform
If returning memory to the OS eagerly becomes a significant performance issue, a
reasonable alternative could be to crank up the background scavenger's CPU usage
in response to growing memory pressure.
This needs more thought, but given that it would now be controlled by a
controller, its CPU usage will be more reliable, and this is an option we can
keep in mind.
One benefit of this option is that it may impact latency less prominently.
### Documentation
Alongside this new feature I plan to create a new document in the doc directory
of the Go repository entitled "Go GC Ergonomics."
The purpose of this document is four-fold:
* Provide Go users with a high-level, but accurate mental and visual model of
how the Go GC and scavenger behave with varying GOGC and GOMEMLIMIT settings.
* Address common use-cases and provide advice, to promote good practice for each
setting.
* Break down how Go accounts for memory in excruciating detail.
Often memory-related documentation is frustratingly imprecise, making every
user's job much more difficult.
* Describe how to identify and diagnose issues related to the GC through runtime
metrics.
While Hyrum's Law guarantees that the API will be used in unexpected ways, at
least a central and complete living document will exist to help users better
understand what it is that their code is doing.
### Telemetry
Identifying issues with the garbage collector becomes even more important with
new ways to interact with it.
While the Go runtime already exposes metrics that could aid in identifying
issues, these metrics are insufficient to create a complete diagnosis,
especially in light of the new API.
To further assist users in diagnosing issues related to the API (be that misuse
or bugs in the implementation) and the garbage collector in general, I propose
the addition of three new metrics to the [runtime/metrics
package](https://pkg.go.dev/runtime/metrics):
* `/gc/throttles:events`: a monotonically increasing count of leaky bucket
overflows.
* Direct indicator of the application entering a death spiral with the soft
memory limit enabled.
* `/gc/cycles-by-utilization:percent`: histogram of GC cycles by GC CPU
utilization.
* Replaces the very misleading runtime.MemStats.GCCPUFraction
* `/gc/scavenge/cycles-by-utilization:percent`: histogram of scavenger
utilization.
* Since the scavenging rate can now change, identifying possible issues
there will be critical.
## Prior art
### Java
Nearly every Java garbage collector operates with a heap limit by default.
As a result, the heap limit is not a special mode of operation for memory
limits, but rather the status quo.
The limit is typically configured by passing the `-Xmx` flag to the Java runtime
executable.
Note that this is a heap limit, not a memory limit, and so only counts heap
objects.
The OpenJDK runtime operates with a default value of ΒΌ of available memory or 1
GiB, whichever is lesser.
Generally speaking, Java runtimes often only return memory to the OS when it
decides to shrink the heap space used; more recent implementations (e.g. G1) do
so more rarely, except when [the application is
idle](https://openjdk.java.net/jeps/346).
Some JVMs are "container aware" and read the memory limits of their containers
to stay under the limit.
This behavior is closer to what is proposed in this document, but I do not
believe the memory limit is directly configurable, like the one proposed here.
### SetMaxHeap
For nearly 4 years, the Go project has been trialing an experimental API in the
`runtime/debug` package called `SetMaxHeap`.
The API is available as a patch in Gerrit and has been used for some time within
Google.
The API proposed in this document builds on top of the work done on
`SetMaxHeap`.
Some notable details about `SetMaxHeap`:
* Its limit is defined solely in terms of heap object bytes, like Java.
* It does not alter the scavenging policy of the Go runtime at all.
* It accepts an extra channel argument that provides GC pressure notifications.
* This channel is ignored in most uses I'm aware of.
* One exception is where it is used to log GC behavior for telemetry.
* It attempts to limit death spirals by placing a minimum on the runway the GC
has.
* This minimum was set to 10%.
Lessons learned from `SetMaxHeap`:
* Backpressure notifications are unnecessarily complex for many use-cases.
* Figuring out a good heap limit is tricky and leads to conservative limits.
* Without a memory return policy, its usefulness for OOM avoidance is limited.
### TCMalloc
TCMalloc provides a `SetMemoryLimit` function to set a [soft memory
limit](https://github.com/google/tcmalloc/blob/cb5aa92545ded39f75115f3b2cc2ffd66a17d55b/tcmalloc/malloc_extension.h#L306).
Because dynamic memory allocation is provided in C and C++ as a library,
TCMalloc's `SetMemoryLimit` can only be aware of its own overheads, but notably
it does include all sources of fragmentation and metadata in its calculation.
Furthermore, it maintains a policy of eagerly returning memory to the platform
if an allocation would cause TCMalloc's memory use to exceed the specified
limit.
### Go 1 compatibility
This change adds an API to the Go project and does not alter existing ones.
Therefore, the proposed changes are Go 1 backwards compatible.