design: update gc-pacer-redesign and remove inaccuracies

This change updates the GC pacer redesign design document to remove a
few inaccuracies and update two points that became apparent after
experimentation.

Firstly, the inaccuracies were mostly around what was ignored. For
instance, goroutines already donate their debt or credit back to the
global pool on death.

Secondly, the definition of the heap goal included S_n and G_n twice
erroneously. It was written that way with _overall_ GC-work-related
memory used in mind, but the definition is _just_ for heap memory.

Lastly, it turns out that the current pacer does (in its own indirect
way) account for idle priority GC in some way, and not accounting for it
in the new pacer leads to a performance regression. This change adds a
section describing how to account for it.

For golang/go#44167.

Change-Id: I396bbcb87fc3acd84584b10769e31d7da699fdb9
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/350429
Reviewed-by: Michael Knyszek <mknyszek@google.com>
diff --git a/design/44167-gc-pacer-redesign.md b/design/44167-gc-pacer-redesign.md
index 23b4242..62909bf 100644
--- a/design/44167-gc-pacer-redesign.md
+++ b/design/44167-gc-pacer-redesign.md
@@ -409,6 +409,37 @@
 eliminating GC assists in the steady-state (that's not out-pacing the GC), and
 potentially improving application latency as a result.
 
+#### Idle-priority background GC workers
+
+Idle-priority GC workers are extra workers that run if the application isn't
+utilizing all `GOMAXPROCS` worth of parallelism.
+The scheduler schedules "low priority" background workers on any additional CPU
+resources, and this ultimately skews utilization measurements in the GC.
+In today's pacer, they're somewhat accounted for.
+In general, the idle workers skew toward undershoot: because the pacer is not
+explicitly aware of the idle workers, GC cycles will complete sooner than it
+might expect.
+However if a GC cycle completes early, then in theory the current pacer will
+simply adjust.
+From its perspective, scanning and marking objects was just faster than
+expected.
+
+The new proposed design must also account for them somehow.
+I propose including idle priority worker CPU utilization in both the measured
+utilization, ![`u_n`](44167/inl23.png), and the target utilization,
+![`u_t`](44167/inl9.png), in each GC cycle.
+In this way, the only difference between the two terms remains GC assist
+utilization, and while idle worker CPU utilization remains stable, the pacer may
+effectively account for it.
+
+Unfortunately this does mean idle-priority GC worker utilization becomes part of
+the definition of a GC steady-state, making it slightly more fragile.
+The good news is that that was already true.
+Due to other issues with idle-priority GC workers, it may be worth revisiting
+them as a concept, but they do appear to legitimately help certain applications,
+particularly those that spend a significant chunk of time blocked on I/O, yet
+spend some significant chunk of their time awake allocating memory.
+
 ### Smoothing out GC assists
 
 This discussion of GC assists brings us to the existing issues around pacing
@@ -503,8 +534,6 @@
 Notable exclusions are:
 1. Mark assists are front-loaded in a GC cycle.
 1. The hard heap goal isn't actually hard in some circumstances.
-1. Dealing with idle GC workers.
-1. Donating goroutine assist credit/debt on exit.
 1. Existing trigger limits to prevent unbounded memory growth.
 
 (1) is difficult to resolve without special cases and arbitrary heuristics, and
@@ -521,27 +550,7 @@
 I believe that the fundamental problem there lies with the fact that fractional
 GC workers don't provide that sort of consistent progress.
 
-For (3) I believe we should remove idle GC workers entirely, which is why this
-document ignores them.
-Idle GC workers are extra mark workers that run if the application isn't
-utilizing all GOMAXPROCS worth of parallelism.
-The scheduler schedules "low priority" background workers on any additional CPU
-resources, and this ultimately skews utilization measurements in the GC, because
-as of today they're unaccounted for.
-Unfortunately, it's likely that idle GC workers have accidentally become
-necessary for the GC to make progress, so just pulling them out won't be quite
-so easy.
-I believe that needs a separate proposal given other large potential changes
-coming to the compiler and runtime in the near future, because there's
-unfortunately a fairly significant risk of bugs with doing so, though I do think
-it's ultimately an improvement.
-See [the related issue for more
-details](https://github.com/golang/go/issues/44163).
-
-(4) is easy and I don't believe needs any deliberation.
-That is a bug we should simply fix.
-
-For (5), I propose we retain the limits, translated to the current design.
+For (3), I propose we retain the limits, translated to the current design.
 For reference, these limits are ![`0.95 (\gamma - 1)`](44167/inl25.png) as the
 upper-bound on the trigger ratio, and ![`0.6 (\gamma - 1)`](44167/inl26.png) as
 the lower-bound.
diff --git a/design/44167/eqn1.png b/design/44167/eqn1.png
index 7261d6e..a092617 100644
--- a/design/44167/eqn1.png
+++ b/design/44167/eqn1.png
Binary files differ
diff --git a/design/44167/eqn2.png b/design/44167/eqn2.png
index 6207419..209b6b2 100644
--- a/design/44167/eqn2.png
+++ b/design/44167/eqn2.png
Binary files differ
diff --git a/design/44167/eqn3.png b/design/44167/eqn3.png
index 1919c35..bad25d3 100644
--- a/design/44167/eqn3.png
+++ b/design/44167/eqn3.png
Binary files differ
diff --git a/design/44167/eqn4.png b/design/44167/eqn4.png
index 0ec898d..428cd18 100644
--- a/design/44167/eqn4.png
+++ b/design/44167/eqn4.png
Binary files differ
diff --git a/design/44167/eqn5.png b/design/44167/eqn5.png
index c120a67..b9edde2 100644
--- a/design/44167/eqn5.png
+++ b/design/44167/eqn5.png
Binary files differ
diff --git a/design/44167/eqn6.png b/design/44167/eqn6.png
index 6806914..76dcb0f 100644
--- a/design/44167/eqn6.png
+++ b/design/44167/eqn6.png
Binary files differ
diff --git a/design/44167/eqn7.png b/design/44167/eqn7.png
index 6c02c17..40a1913 100644
--- a/design/44167/eqn7.png
+++ b/design/44167/eqn7.png
Binary files differ
diff --git a/design/44167/eqn8.png b/design/44167/eqn8.png
index 302e9ed..2c8dbf3 100644
--- a/design/44167/eqn8.png
+++ b/design/44167/eqn8.png
Binary files differ
diff --git a/design/44167/gc-pacer-redesign.src.md b/design/44167/gc-pacer-redesign.src.md
index 1420b34..2662215 100644
--- a/design/44167/gc-pacer-redesign.src.md
+++ b/design/44167/gc-pacer-redesign.src.md
@@ -189,7 +189,7 @@
 Let `$N_n$` be the heap goal for GC `$n$` ("N" stands for "Next GC").
 
 ```render-latex
-N_n = \gamma(M_{n-1} + S_n + G_n)
+N_n = \gamma M_{n-1} + S_n + G_n
 ```
 
 The old definition makes the assumption that non-heap sources of GC work are
@@ -412,6 +412,36 @@
 eliminating GC assists in the steady-state (that's not out-pacing the GC), and
 potentially improving application latency as a result.
 
+#### Idle-priority background GC workers
+
+Idle-priority GC workers are extra workers that run if the application isn't
+utilizing all `GOMAXPROCS` worth of parallelism.
+The scheduler schedules "low priority" background workers on any additional CPU
+resources, and this ultimately skews utilization measurements in the GC.
+In today's pacer, they're somewhat accounted for.
+In general, the idle workers skew toward undershoot: because the pacer is not
+explicitly aware of the idle workers, GC cycles will complete sooner than it
+might expect.
+However if a GC cycle completes early, then in theory the current pacer will
+simply adjust.
+From its perspective, scanning and marking objects was just faster than
+expected.
+
+The new proposed design must also account for them somehow.
+I propose including idle priority worker CPU utilization in both the measured
+utilization, `$u_n$`, and the target utilization, `$u_t$`, in each GC cycle.
+In this way, the only difference between the two terms remains GC assist
+utilization, and while idle worker CPU utilization remains stable, the pacer may
+effectively account for it.
+
+Unfortunately this does mean idle-priority GC worker utilization becomes part of
+the definition of a GC steady-state, making it slightly more fragile.
+The good news is that that was already true.
+Due to other issues with idle-priority GC workers, it may be worth revisiting
+them as a concept, but they do appear to legitimately help certain applications,
+particularly those that spend a significant chunk of time blocked on I/O, yet
+spend some significant chunk of their time awake allocating memory.
+
 ### Smoothing out GC assists
 
 This discussion of GC assists brings us to the existing issues around pacing
@@ -512,8 +542,6 @@
 Notable exclusions are:
 1. Mark assists are front-loaded in a GC cycle.
 1. The hard heap goal isn't actually hard in some circumstances.
-1. Dealing with idle GC workers.
-1. Donating goroutine assist credit/debt on exit.
 1. Existing trigger limits to prevent unbounded memory growth.
 
 (1) is difficult to resolve without special cases and arbitrary heuristics, and
@@ -530,27 +558,7 @@
 I believe that the fundamental problem there lies with the fact that fractional
 GC workers don't provide that sort of consistent progress.
 
-For (3) I believe we should remove idle GC workers entirely, which is why this
-document ignores them.
-Idle GC workers are extra mark workers that run if the application isn't
-utilizing all GOMAXPROCS worth of parallelism.
-The scheduler schedules "low priority" background workers on any additional CPU
-resources, and this ultimately skews utilization measurements in the GC, because
-as of today they're unaccounted for.
-Unfortunately, it's likely that idle GC workers have accidentally become
-necessary for the GC to make progress, so just pulling them out won't be quite
-so easy.
-I believe that needs a separate proposal given other large potential changes
-coming to the compiler and runtime in the near future, because there's
-unfortunately a fairly significant risk of bugs with doing so, though I do think
-it's ultimately an improvement.
-See [the related issue for more
-details](https://github.com/golang/go/issues/44163).
-
-(4) is easy and I don't believe needs any deliberation.
-That is a bug we should simply fix.
-
-For (5), I propose we retain the limits, translated to the current design.
+For (3), I propose we retain the limits, translated to the current design.
 For reference, these limits are `$0.95 (\gamma - 1)$` as the upper-bound on the
 trigger ratio, and `$0.6 (\gamma - 1)$` as the lower-bound.
 
diff --git a/design/44167/inl1.png b/design/44167/inl1.png
index 65e95c4..5a8fabf 100644
--- a/design/44167/inl1.png
+++ b/design/44167/inl1.png
Binary files differ
diff --git a/design/44167/inl10.png b/design/44167/inl10.png
index d456b71..2ed019b 100644
--- a/design/44167/inl10.png
+++ b/design/44167/inl10.png
Binary files differ
diff --git a/design/44167/inl11.png b/design/44167/inl11.png
index b49102e..83919e3 100644
--- a/design/44167/inl11.png
+++ b/design/44167/inl11.png
Binary files differ
diff --git a/design/44167/inl12.png b/design/44167/inl12.png
index e01717e..48d195e 100644
--- a/design/44167/inl12.png
+++ b/design/44167/inl12.png
Binary files differ
diff --git a/design/44167/inl13.png b/design/44167/inl13.png
index 49e45f3..85c2de9 100644
--- a/design/44167/inl13.png
+++ b/design/44167/inl13.png
Binary files differ
diff --git a/design/44167/inl14.png b/design/44167/inl14.png
index 8fbcaba..c9bd278 100644
--- a/design/44167/inl14.png
+++ b/design/44167/inl14.png
Binary files differ
diff --git a/design/44167/inl15.png b/design/44167/inl15.png
index 0c92c3b..aacc6ca 100644
--- a/design/44167/inl15.png
+++ b/design/44167/inl15.png
Binary files differ
diff --git a/design/44167/inl16.png b/design/44167/inl16.png
index bb885e8..639824e 100644
--- a/design/44167/inl16.png
+++ b/design/44167/inl16.png
Binary files differ
diff --git a/design/44167/inl17.png b/design/44167/inl17.png
index de913d6..ce8dfac 100644
--- a/design/44167/inl17.png
+++ b/design/44167/inl17.png
Binary files differ
diff --git a/design/44167/inl18.png b/design/44167/inl18.png
index cc76b06..8dcceaa 100644
--- a/design/44167/inl18.png
+++ b/design/44167/inl18.png
Binary files differ
diff --git a/design/44167/inl19.png b/design/44167/inl19.png
index f258005..6d4c8eb 100644
--- a/design/44167/inl19.png
+++ b/design/44167/inl19.png
Binary files differ
diff --git a/design/44167/inl2.png b/design/44167/inl2.png
index b75dfe1..20029f9 100644
--- a/design/44167/inl2.png
+++ b/design/44167/inl2.png
Binary files differ
diff --git a/design/44167/inl20.png b/design/44167/inl20.png
index ebd9fee..b86b91a 100644
--- a/design/44167/inl20.png
+++ b/design/44167/inl20.png
Binary files differ
diff --git a/design/44167/inl21.png b/design/44167/inl21.png
index 65057ea..77ca2d4 100644
--- a/design/44167/inl21.png
+++ b/design/44167/inl21.png
Binary files differ
diff --git a/design/44167/inl22.png b/design/44167/inl22.png
index df1cd2f..1d560e4 100644
--- a/design/44167/inl22.png
+++ b/design/44167/inl22.png
Binary files differ
diff --git a/design/44167/inl23.png b/design/44167/inl23.png
index 58366dc..d66fade 100644
--- a/design/44167/inl23.png
+++ b/design/44167/inl23.png
Binary files differ
diff --git a/design/44167/inl24.png b/design/44167/inl24.png
index 254ac81..df164b1 100644
--- a/design/44167/inl24.png
+++ b/design/44167/inl24.png
Binary files differ
diff --git a/design/44167/inl25.png b/design/44167/inl25.png
index be1423f..71c19b2 100644
--- a/design/44167/inl25.png
+++ b/design/44167/inl25.png
Binary files differ
diff --git a/design/44167/inl26.png b/design/44167/inl26.png
index 5d08eab..86bfb70 100644
--- a/design/44167/inl26.png
+++ b/design/44167/inl26.png
Binary files differ
diff --git a/design/44167/inl27.png b/design/44167/inl27.png
index 1752ae2..b76bc3d 100644
--- a/design/44167/inl27.png
+++ b/design/44167/inl27.png
Binary files differ
diff --git a/design/44167/inl28.png b/design/44167/inl28.png
index f385442..f7f387d 100644
--- a/design/44167/inl28.png
+++ b/design/44167/inl28.png
Binary files differ
diff --git a/design/44167/inl29.png b/design/44167/inl29.png
index f447caa..5654f40 100644
--- a/design/44167/inl29.png
+++ b/design/44167/inl29.png
Binary files differ
diff --git a/design/44167/inl3.png b/design/44167/inl3.png
index 42bb9f1..3b607a2 100644
--- a/design/44167/inl3.png
+++ b/design/44167/inl3.png
Binary files differ
diff --git a/design/44167/inl4.png b/design/44167/inl4.png
index ea1729a..506375d 100644
--- a/design/44167/inl4.png
+++ b/design/44167/inl4.png
Binary files differ
diff --git a/design/44167/inl5.png b/design/44167/inl5.png
index 7090982..326b4a4 100644
--- a/design/44167/inl5.png
+++ b/design/44167/inl5.png
Binary files differ
diff --git a/design/44167/inl6.png b/design/44167/inl6.png
index 4c5d9a4..b04cbec 100644
--- a/design/44167/inl6.png
+++ b/design/44167/inl6.png
Binary files differ
diff --git a/design/44167/inl7.png b/design/44167/inl7.png
index 5516cd6..d7cb17e 100644
--- a/design/44167/inl7.png
+++ b/design/44167/inl7.png
Binary files differ
diff --git a/design/44167/inl8.png b/design/44167/inl8.png
index fa56487..298e846 100644
--- a/design/44167/inl8.png
+++ b/design/44167/inl8.png
Binary files differ
diff --git a/design/44167/inl9.png b/design/44167/inl9.png
index 05c1a1c..3e6551a 100644
--- a/design/44167/inl9.png
+++ b/design/44167/inl9.png
Binary files differ