design: add user-configurable memory target

For golang/go#44309.

Change-Id: Ibd2f9bed3a1a1da40b5a3d216ccb1f48c9b64c04
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/292789
Reviewed-by: Michael Pratt <mpratt@google.com>
diff --git a/design/44309-user-configurable-memory-target.md b/design/44309-user-configurable-memory-target.md
new file mode 100644
index 0000000..f1c6a44
--- /dev/null
+++ b/design/44309-user-configurable-memory-target.md
@@ -0,0 +1,489 @@
+# User-configurable memory target
+
+Author: Michael Knyszek
+
+Updated: 16 February 2021
+
+## Background
+
+Issue [#23044](https://golang.org/issue/23044) proposed the addition of some
+kind of API to provide a "minimum heap" size; that is, the minimum heap goal
+that the GC would ever set.
+The purpose of a minimum heap size, as explored in that proposal, is as a
+performance optimization: by preventing the heap from shrinking, each GC cycle
+will get longer as the live heap shrinks further beyond the minimum.
+
+While `GOGC` already provides a way for Go users to trade off GC CPU time and
+heap memory use, the argument against setting `GOGC` higher is that a live heap
+spike is potentially dangerous, since the Go GC will use proportionally more
+memory with a high proportional constant.
+Instead, users (including a [high-profile account by
+Twitch](https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap-26c2462549a2/)
+have resorted to using a heap ballast: a large memory allocation that the Go GC
+includes in its live heap size, but does not actually take up any resident
+pages, according to the OS.
+This technique thus effectively sets a minimum heap size in the runtime.
+The main disadvantage of this technique is portability.
+It relies on implementation-specific behavior, namely that the runtime will not
+touch that new allocation, thereby preventing the OS from backing that space
+with RAM on Unix-like systems.
+It also relies on the Go GC never scanning that allocation.
+This technique is also platform-specific, because on Windows such an allocation
+would always count as committed.
+
+Today, the Go GC already has a fixed minimum heap size of 4 MiB.
+The reasons around this minimum heap size stem largely from a failure to account
+for alternative GC work sources.
+See [the GC pacer problems meta-issue](https://golang.org/issue/42430) for more
+details.
+The problems are resolved by a [proposed GC pacer
+redesign](https://golang.org/issue/44167).
+
+## Design
+
+I propose the addition of the following API to the `runtime/debug` package:
+
+```go
+// SetMemoryTarget provides a hint to the runtime that it can use at least
+// amount bytes of memory. amount is the sum total of in-ue Go-related memory
+// that the Go runtime can measure.
+//
+// That explictly includes:
+// - Space and fragmentation for goroutine stacks.
+// - Space for runtime structures.
+// - The size of the heap, with fragmentation.
+// - Space for global variables (including dynamically-loaded plugins).
+//
+// And it explicitly excludes:
+// - Read-only parts of the Go program, such as machine instructions.
+// - Any non-Go memory present in the process, such as from C or another
+//   language runtime.
+// - Memory required to maintain OS kernel resources that this process has a
+//   handle to.
+// - Memory allocated via low-level functions in the syscall package, like Mmap.
+//
+// The intuition and goal with this definition is the ability to treat the Go
+// part of any system as a black box: runtime overheads and fragmentation that
+// are otherwise difficult to account for are explicitly included.
+// Anything that is difficult or impossible for the runtime to measure portably
+// is excluded. For these cases, the user is encouraged to monitor these
+// sources for their particular system and update the memory target as
+// necessary.
+//
+// The runtime is free to ignore the hint at any time.
+//
+// In practice, the runtime will use this hint to run the garbage collector
+// less frequently by using up any additional memory up-front. Any memory used
+// beyond that will obey the GOGC trade-off.
+//
+// If the GOGC mechanism is turned off, the hint is always ignored.
+//
+// Note that if the memory target is set higher than the amount of memory
+// available on the system, the Go runtime may attempt to use all that memory,
+// and trigger an out-of-memory condition.
+//
+// An amount of 0 will retract the hint. A negative amount will always be
+// ignored.
+//
+// Returns the old memory target, or -1 if none was previously set.
+func SetMemoryTarget(amount int) int
+```
+
+The design of this feature builds off of the [proposed GC pacer
+redesign](https://golang.org/issue/44167).
+
+I propose we move forward with almost exactly what issue
+[#23044](https://golang.org/issue/23044) proposed, namely exposing the heap
+minimum and making it configurable via a runtime API.
+The behavior of `SetMemoryTarget` is thus analogous to the common (but
+non-standard) Java runtime flag `-Xms` (with Adaptive Size Policy disabled).
+With the GC pacer redesign, smooth behavior here should be straightforward to
+ensure, as the troubles here basically boil down to the "high `GOGC`" issue
+mentioned in that design.
+
+There's one missing piece and that's how to turn the hint (which is memory use)
+into a heap goal.
+Because the heap goal includes both stacks and globals, I propose that we
+compute the heap goal as follows:
+
+```
+Heap goal = amount
+	// These are runtime overheads.
+	- MemStats.GCSys
+	- Memstats.MSpanSys
+	- MemStats.MCacheSys
+	- MemStats.BuckHashSys
+	- MemStats.OtherSys
+	- MemStats.StackSys
+	// Fragmentation.
+	- (MemStats.HeapSys-MemStats.HeapInuse)
+	- (MemStats.StackInuse-(unused portions of stack spans))
+```
+
+What this formula leaves us with is a value that should include:
+1. Stack space that is actually allocated for goroutine stacks,
+1. Global variables (so the part of the binary mapped read-write), and
+1. Heap space allocated for objects.
+These are the three factors that go into determining the `GOGC`-based heap goal
+according to the GC pacer redesign.
+
+Note that while at first it might appear that this definition of the heap goal
+will cause significant jitter in what the heap goal is actually set to, runtime
+overheads and fragmentation tend to be remarkably stable over the lifetime of a
+Go process.
+
+In an ideal world, that would be it, but as the API documentation points out,
+there are a number of sources of memory that are unaccounted for that deserve
+more explanation.
+
+Firstly, there's the read-only parts of the binary, like the instructions
+themselves, but these parts' impact on memory use are murkier since the
+operating system tends to de-duplicate this memory between processes.
+Furthermore, on platforms like Linux, this memory is always evictable, down to
+the last available page.
+As a result, I intentionally ignore that factor here.
+If the size of the binary is a factor, unfortunately it will be up to the user
+to subtract out that size from the amount they pass to `SetMemoryTarget`.
+
+The source of memory is anything non-Go, such as C code (or, say a Python VM)
+running in the same process.
+These sources also need to be accounted for by the user because this could be
+absolutely anything, and portably interacting with the large number of different
+possibilities is infeasible.
+Luckily, `SetMemoryTarget` is a run-time API that can be made to respond to
+changes in external memory sources that Go could not possibly be aware of, so
+API recommends updating the target on-line if need be.
+
+Another source of memory use is kernel memory.
+If the Go process holds onto kernel resources that use memory within the kernel
+itself, those are unaccounted for.
+Unfortunately, while this API tries to avoid situations where the user needs to
+make conservative estimates, this is one such case.
+As far as I know, most systems do not associate kernel memory with a process, so
+querying and reacting to this information is just impossible.
+
+The final source of memory is memory that's created by the Go program, but that
+the runtime isn't necessarily aware of, like explicitly `Mmap`'d memory.
+Theoretically the Go runtime could be aware of this specific case, but this is
+tricky to account for in general given the wide range of options that can be
+passed to `mmap`-like functionality on various platforms.
+Sometimes it's worth accounting for it, sometimes not.
+I believe it's best to leave that up to the user.
+
+To validate the design, I ran several [simulations](#simulations) of this
+implementation.
+In general, the runtime is resilient to a changing heap target (even one that
+changes wildly) but shrinking the heap target significantly has the potential to
+cause GC CPU utilization spikes.
+This is by design: the runtime suddenly has much less runway than it thought
+before the change, so it needs to make that up to reach its goal.
+
+The only issue I found with this formulation is the potential for consistent
+undershoot in the case where the heap size is very small, mostly because we
+place a limit on how late a GC cycle can start.
+I think this is OK, and I don't think we should alter our current setting.
+This choice means that in extreme cases, there may be some missed performance.
+But I don't think it's enough to justify the additional complexity.
+
+### Simulations
+
+These simulations were produced by the same tool as those for the [GC pacer
+redesign](https://github.com/golang/go/issues/44167).
+That is,
+[github.com/mknyszek/pacer-model](https://github.com/mknyszek/pacer-model).
+See the GC pacer design document for a list of caveats and assumptions, as well
+as a description of each subplot, though the plots are mostly straightforward.
+
+**Small heap target.**
+
+In this scenario, we set a fairly small target (around 64 MiB) as a baseline.
+This target is fairly close to what `GOGC` would have picked.
+Mid-way through the scenario, the live heap grows a bit.
+
+![](44309/low-heap-target.png)
+
+Notes:
+- There's a small amount of overshoot when the live heap size changes, which is
+  expected.
+- The pacer is otherwise resilient to changes in the live heap size.
+
+**Very small heap target.**
+
+In this scenario, we set a fairly small target (around 64 MiB) as a baseline.
+This target is much smaller than what `GOGC` would have picked, since the live
+heap grows to around 5 GiB.
+
+![](44309/very-low-heap-target.png)
+
+Notes:
+- `GOGC` takes over very quickly.
+
+**Large heap target.**
+
+In this scenario, we set a fairly large target (around 2 GiB).
+This target is fairly far from what `GOGC` would have picked.
+Mid-way through the scenario, the live heap grows a lot.
+
+![](44309/high-heap-target.png)
+
+Notes:
+- There's a medium amount of overshoot when the live heap size changes, which is
+  expected.
+- The pacer is otherwise resilient to changes in the live heap size.
+
+**Exceed heap target.**
+
+In this scenario, we set a fairly small target (around 64 MiB) as a baseline.
+This target is fairly close to what `GOGC` would have picked.
+Mid-way through the scenario, the live heap grows enough such that we exit the
+memory target regime and enter the `GOGC` regime.
+
+![](44309/exceed-heap-target.png)
+
+Notes:
+- There's a small amount of overshoot when the live heap size changes, which is
+  expected.
+- The pacer is otherwise resilient to changes in the live heap size.
+- The pacer smoothly transitions between regimes.
+
+**Exceed heap target with a high GOGC.**
+
+In this scenario, we set a fairly small target (around 64 MiB) as a baseline.
+This target is fairly close to what `GOGC` would have picked.
+Mid-way through the scenario, the live heap grows enough such that we exit the
+memory target regime and enter the `GOGC` regime.
+The `GOGC` value is set very high.
+
+![](44309/exceed-heap-target-high-GOGC.png)
+
+Notes:
+- There's a small amount of overshoot when the live heap size changes, which is
+  expected.
+- The pacer is otherwise resilient to changes in the live heap size.
+- The pacer smoothly transitions between regimes.
+
+**Change in heap target.**
+
+In this scenario, the heap target is set mid-way through execution, to around
+256 MiB.
+This target is fairly far from what `GOGC` would have picked.
+The live heap stays constant, meanwhile.
+
+![](44309/step-heap-target.png)
+
+Notes:
+- The pacer is otherwise resilient to changes in the heap target.
+- There's no overshoot.
+
+**Noisy heap target.**
+
+In this scenario, the heap target is set once per GC and is somewhat noisy.
+It swings at most 3% around 2 GiB.
+This target is fairly far from what `GOGC` would have picked.
+Mid-way through the live heap increases.
+
+![](44309/low-noise-heap-target.png)
+
+Notes:
+- The pacer is otherwise resilient to a noisy heap target.
+- There's expected overshoot when the live heap size changes.
+- GC CPU utilization bounces around slightly.
+
+**Very noisy heap target.**
+
+In this scenario, the heap target is set once per GC and is very noisy.
+It swings at most 50% around 2 GiB.
+This target is fairly far from what `GOGC` would have picked.
+Mid-way through the live heap increases.
+
+![](44309/high-noise-heap-target.png)
+
+Notes:
+- The pacer is otherwise resilient to a noisy heap target.
+- There's expected overshoot when the live heap size changes.
+- GC CPU utilization bounces around, but not much.
+
+**Large heap target with a change in allocation rate.**
+
+In this scenario, we set a fairly large target (around 2 GiB).
+This target is fairly far from what `GOGC` would have picked.
+Mid-way through the simulation, the application begins to suddenly allocate much
+more aggressively.
+
+![](44309/heavy-step-alloc-high-heap-target.png)
+
+Notes:
+- The pacer is otherwise resilient to changes in the live heap size.
+- There's no overshoot.
+- There's a spike in utilization that's consistent with other simulations of the
+  GC pacer.
+- The live heap grows due to floating garbage from the high allocation rate
+  causing each GC cycle to start earlier.
+
+### Interactions with other GC mechanisms
+
+Although listed already in the API documentation, there are a few additional
+details I want to consider.
+
+#### GOGC
+
+The design of the new pacer means that switching between the "memory target"
+regime and the `GOGC` regime (the regimes being defined as the mechanism that
+determines the heap goal) is very smooth.
+While the live heap times `1+GOGC/100` is less than the heap goal set by the
+memory target, we are in the memory target regime.
+Otherwise, we are in the `GOGC` regime.
+Notice that as `GOGC` rises to higher and higher values, the range of the memory
+target regime shrinks.
+At infinity, meaning `GOGC=off`, the memory target regime no longer exists.
+
+Therefore, it's very clear to me that the memory target should be completely
+ignored if `GOGC` is set to "off" or a negative value.
+
+#### Memory limit
+
+If we choose to also adopt an API for setting a memory limit in the runtime, it
+would necessarily always need to override a memory target, though both could
+plausibly be active simultaneously.
+If that memory limit interacts with `GOGC` being set to "off," then the rule of
+the memory target being ignored holds; the memory limit effectively acts like a
+target in that circumstance.
+If the two are set to an equal value, that behavior is virtually identical to
+`GOGC` being set to "off" and *only* a memory limit being set.
+Therefore, we need only check that these two cases behave identically.
+Note however that otherwise that the memory target and the memory limit define
+different regimes, so they're otherwise orthogonal.
+While there's a fairly large gap between the two (relative to `GOGC`), the two
+are easy to separate.
+Where it gets tricky is when they're relatively close, and this case would need
+to be tested extensively.
+
+## Risks
+
+The primary risk with this proposal is adding another "knob" to the garbage
+collector, with `GOGC` famously being the only one.
+Lots of language runtimes provide flags and options that alter the behavior of
+the garbage collector, but when the number of flags gets large, maintaining
+every possible configuration becomes a daunting, if not impossible task, because
+the space of possible configurations explodes with each knob.
+
+This risk is a strong reason to be judicious.
+The bar for new knobs is high.
+
+But there are a few good reasons why this might still be important.
+The truth is, this API already exists, but is considered unsupported and is
+otherwise unmaintained.
+The API exists in the form of heap ballasts, a concept we can thank Hyrum's Law
+for.
+It's already possible for an application to "fool" the garbage collector into
+thinking there's more live memory than there actually is.
+The downside is resizing the ballast is never going to be nearly as reactive as
+the garbage collector itself, because it is at the mercy of the of the runtime
+managing the user application.
+The simple fact is performance-sensitive Go users are going to write this code
+anyway.
+It is worth noting that unlike a memory maximum, for instance, a memory target
+is purely an optimization.
+On the whole, I suspect it's better for the Go ecosystem for there to be a
+single solution to this problem in the standard library, rather than solutions
+that *by construction* will never be as good.
+
+And I believe we can mitigate some of the risks with "knob explosion."
+The memory target, as defined above, has very carefully specified and limited
+interactions with other (potential) GC knobs.
+Going forward I believe a good criterion for the addition of new knobs should be
+that a knob should only be added if it is *only* fully orthogonal with `GOGC`,
+and nothing else.
+
+## Monitoring
+
+I propose adding a new metric to the `runtime/metrics` package to enable
+monitoring of the memory target, since that is a new value that could change at
+runtime.
+I propose the metric name `/memory/config/target:bytes` for this purpose.
+Otherwise, it could be useful for an operator to understand which regime the Go
+application is operating in at any given time.
+We currently expose the `/gc/heap/goal:bytes` metric which could theoretically
+be used to determine this, but because of the dynamic nature of the heap goal in
+this regime, it won't be clear which regime the application is in at-a-glance.
+
+Therefore, I propose adding another metric `/memory/goal:bytes`.
+This metric is analagous to `/gc/heap/goal:bytes` but is directly comparable
+with `/memory/config/target:bytes` (that is, it includes additional overheads
+beyond just what goes into the heap goal, it "converts back").
+When this metric "bottoms out" at a flat line, that should serve as a clear
+indicator that the pacer is in the "target" regime.
+This same metric could be reused for a memory limit in the future, where it will
+"top out" at the limit.
+
+## Documentation
+
+This API has an inherent complexity as it directly influences the behavior of
+the Go garbage collector.
+It also deals with memory accounting, a process that is infamously (and
+unfortunately) difficult to wrap one's head around and get right.
+Effective of use of this API will come down to having good documentation.
+
+The documentation will have two target audiences: software developers, and
+systems administrators (referred to as "developers" and "operators,"
+respectively).
+
+For both audiences, it's incredibly important to understand exactly what's
+included and excluded in the memory target.
+That is why it is explicitly broken down in the most visible possible place for
+a developer: the documentation for the API itself.
+For the operator, the `runtime/metrics` metric definition should either
+duplicate this documentation, or point to the API.
+This documentation is important for immediate use and understanding, but API
+documentation is never going to be expressive enough.
+I propose also introducing a new document to the `doc` directory in the Go
+source tree that explains common use-cases, extreme scenarios, and what to
+expect in monitoring in these various situations.
+This document should include a list of known bugs and how they might appear in
+monitoring.
+In other words, it should include a more formal and living version of the [GC
+pacer meta-issue](https://golang.org/issues/42430).
+The hard truth is that memory accounting and GC behavior are always going to
+fall short in some cases, and it's immensely useful to be honest and up-front
+about those cases where they're known, while always striving to do better.
+As every other document in this directory, it would be a living document that
+will grow as new scenarios are discovered, bugs are fixed, and new functionality
+is made available.
+
+## Alternatives considered
+
+Since this is a performance optimization, it's possible to do nothing.
+But as I mentioned in [the risks section](#risks), I think there's a solid
+justification for doing *something*.
+
+Another alternative I considered was to provide better hooks into the runtime to
+allow users to implement equivalent functionality themselves.
+Today, we provide `debug.SetGCPercent` as well as access to a number of runtime
+statistics.
+Thanks to work done for the `runtime/metrics` package, that information is now
+much more efficiently accessible.
+By exposing just the right metric, one could imagine a background goroutine that
+calls `debug.SetGCPercent` in response to polling for metrics.
+The reason why I ended up discarding this alternative, however, is this then
+forces the user writing the code that relies on the implementation details of
+garbage collector.
+For instance, a reasonable implementation of a memory target using the above
+mechanism would be to make an adjustment each time the heap goal changes.
+What if future GC implementations don't have a heap goal? Furthermore, the heap
+goal needs to be sampled; what if GCs are occurring rapidly? Should the runtime
+expose when a GC ends? What if the new GC design is fully incremental, and there
+is no well-defined notion of "GC end"? It suffices to say that in order to keep
+Go implementations open to new possibilities, we should avoid any behavior that
+exposes implementation details.
+
+## Go 1 backwards compatibility
+
+This change only adds to the Go standard library's API surface, and is therefore
+Go 1 backwards compatible.
+
+## Implementation
+
+Michael Knyszek will implement this.
+1. Implement in the runtime.
+1. Extend the pacer simulation test suite with this use-case in a variety of
+   configurations.
diff --git a/design/44309/exceed-heap-target-high-GOGC.png b/design/44309/exceed-heap-target-high-GOGC.png
new file mode 100644
index 0000000..601982c
--- /dev/null
+++ b/design/44309/exceed-heap-target-high-GOGC.png
Binary files differ
diff --git a/design/44309/exceed-heap-target.png b/design/44309/exceed-heap-target.png
new file mode 100644
index 0000000..2a695fa
--- /dev/null
+++ b/design/44309/exceed-heap-target.png
Binary files differ
diff --git a/design/44309/heavy-step-alloc-high-heap-target.png b/design/44309/heavy-step-alloc-high-heap-target.png
new file mode 100644
index 0000000..c57bbcd
--- /dev/null
+++ b/design/44309/heavy-step-alloc-high-heap-target.png
Binary files differ
diff --git a/design/44309/high-heap-target.png b/design/44309/high-heap-target.png
new file mode 100644
index 0000000..b69ae21
--- /dev/null
+++ b/design/44309/high-heap-target.png
Binary files differ
diff --git a/design/44309/high-noise-high-heap-target.png b/design/44309/high-noise-high-heap-target.png
new file mode 100644
index 0000000..9989dce
--- /dev/null
+++ b/design/44309/high-noise-high-heap-target.png
Binary files differ
diff --git a/design/44309/low-heap-target.png b/design/44309/low-heap-target.png
new file mode 100644
index 0000000..6b8739d
--- /dev/null
+++ b/design/44309/low-heap-target.png
Binary files differ
diff --git a/design/44309/low-noise-high-heap-target.png b/design/44309/low-noise-high-heap-target.png
new file mode 100644
index 0000000..66d67ab
--- /dev/null
+++ b/design/44309/low-noise-high-heap-target.png
Binary files differ
diff --git a/design/44309/step-heap-target.png b/design/44309/step-heap-target.png
new file mode 100644
index 0000000..af93ed2
--- /dev/null
+++ b/design/44309/step-heap-target.png
Binary files differ
diff --git a/design/44309/very-low-heap-target.png b/design/44309/very-low-heap-target.png
new file mode 100644
index 0000000..c6a3bdd
--- /dev/null
+++ b/design/44309/very-low-heap-target.png
Binary files differ