design/24543: split out safe-points-everywhere

The non-cooperative preemption proposal document currently mixes
discussion of non-cooperative preemption, which is a very general
idea, with a specific implementation based on stack maps and register
maps at every instruction.

I'm about to introduce a proposal for an alternative implementation
approach to non-cooperative preemption. In order to prepare for that,
this CL splits the existing document into a general overview of
non-cooperative preemption, and a separate document detailing the
safe-points-everywhere implementation approach. This is similar to how
we organize the generics proposal (15292-generics.md).

Updates golang/go#24543.

Change-Id: Iacd1cff3f00ec9e153122d5eb21a9a1699630e34
Reviewed-on: https://go-review.googlesource.com/c/158859
Reviewed-by: Michael Knyszek <mknyszek@google.com>
diff --git a/design/24543-non-cooperative-preemption.md b/design/24543-non-cooperative-preemption.md
index a15350a..9bc4079 100644
--- a/design/24543-non-cooperative-preemption.md
+++ b/design/24543-non-cooperative-preemption.md
@@ -2,7 +2,7 @@
 
 Author(s): Austin Clements
 
-Last updated: 2018-03-26
+Last updated: 2019-01-18
 
 Discussion at https://golang.org/issue/24543.
 
@@ -27,12 +27,18 @@
 acceptable).
 
 I propose that the Go implementation switch to non-cooperative
-preemption using stack and register maps at (essentially) every
-instruction.
-This would allow goroutines to be preempted without explicit
-preemption checks.
-This approach will solve the problem of delayed preemption with zero
-run-time overhead.
+preemption, which would allow goroutines to be preempted at
+essentially any point without the need for explicit preemption checks.
+This approach will solve the problem of delayed preemption and do so
+with zero run-time overhead.
+
+Non-cooperative preemption is a general concept with a whole class of
+implementation techniques.
+This document describes and motivates the switch to non-cooperative
+preemption and discusses common concerns of any non-cooperative
+preemption approach in Go.
+Specific implementation approaches are detailed in sub-proposals
+linked from this document.
 
 
 ## Background
@@ -70,6 +76,10 @@
 and expose Go's users to implementation details they shouldn't have to
 worry about.
 
+<!-- TODO: Give background on non-cooperative preemption in general.
+Cover how general-purpose operating systems do this and special
+considerations in a garbage-collected language. -->
+
 ### Loop preemption
 
 @dr2chase put significant effort into trying to solve these problems
@@ -110,42 +120,28 @@
 
 ## Proposal
 
-I propose that we implement fully non-cooperative preemption by
-recording enough metadata to allow safe-points (almost) everywhere
-without explicit preemption checks in function bodies.
+I propose that Go implement non-cooperative goroutine preemption.
+There are many possible ways to do this, which are detailed in these
+sub-proposals:
 
-To do this, we would modify the compiler to produce register maps in
-addition to stack maps, and to emit these for as many program counters
-as possible.
-The runtime would use a signal (or `GetThreadContext` on Windows, or a
-note on Plan9) to retrieve each thread's register state, including the
-stack and register map at the interrupted PC.
-The garbage collector would then treat live pointers in registers just
-as it treats live pointers on the stack.
+* The [safe-points everywhere
+  proposal](24543/safe-points-everywhere.md) describes an
+  implementation where the compiler records stack and register maps
+  for nearly every instruction.
+  This allows the runtime to halt a goroutine anywhere and find its GC
+  roots.
 
-Certain instructions cannot be safe-points, so if a signal occurs at
-such a point, the runtime would simply resume the thread and try again
-later.
-The compiler just needs to make *most* instructions safe-points.
-
-To @minux's credit, he suggested this in [the very first
-reply](https://github.com/golang/go/issues/10958#issuecomment-105678822)
-to #10958.
-At the time we thought adding safe-points everywhere would be too
-difficult and that the overhead of explicit loop preemption would be
-lower than it turned out to be.
-
-Many other garbage-collected languages use explicit safe-points on
-back-edges, or they use forward-simulation to reach a safe-point.
-Partly, it's possible for Go to support safe-points everywhere because
-Go's GC already must have excellent support for interior pointers; in
-many languages, interior pointers never appear at a safe-point.
+<!-- TODO(austin): Write and link to conservative inner-frame
+proposal.
+Write a "trade-offs" section comparing the sub-proposals. -->
 
 
 ## Handling unsafe-points
 
-There are various points in generated code that must be GC-atomic and
-thus cannot have safe-points in them.
+Any non-cooperative preemption approach in Go must deal with code
+sequences that have to be atomic with respect to the garbage
+collector.
+We call these "unsafe-points", in contrast with GC safe-points.
 A few known situations are:
 
 1. Expressions involving `unsafe.Pointer` may temporarily represent
@@ -233,7 +229,7 @@
 
 ### Range loops
 
-Currently (as of 1.10), range loops are compiled roughly like:
+As of Go 1.10, range loops are compiled roughly like:
 
 ```go
 for i, x := range s { b }
@@ -255,17 +251,16 @@
 Currently this is safe because there's never a safe-point while this
 value of `_p` is live.
 
-We could keep this lowering and mark the increment and condition
-blocks as unsafe-points.
+This lowering requires that the compiler mark the increment and
+condition blocks as unsafe-points.
 However, if the body is short, this could result in infrequent
 safe-points.
-It would also require creating a separate block for the increment,
-which is currently usually appended to the end of the body.
+It also requires creating a separate block for the increment, which is
+currently usually appended to the end of the body.
 Separating these blocks would inhibit reordering opportunities.
 
-Alternatively, we could rewrite the loop to never create a
-past-the-end pointer.
-For example, we could lower it like:
+In preparation for non-cooperative preemption, Go 1.11 began compiling
+range loops as follows to avoid ever creating a past-the-end pointer:
 
 ```go
 i, _n, _p := 0, len(s), &s[0]
@@ -279,15 +274,15 @@
 end:
 ```
 
-This would allow safe-points everywhere in the loop.
-Compared to the current loop compilation, it generates slightly more
+This allows safe-points everywhere in the loop.
+Compared to the original loop compilation, it generates slightly more
 code, but executes the same number of conditional branch instructions
 (n+1) and results in the same number of SSA basic blocks (3).
 
 This lowering does complicate bounds-check elimination.
-Currently, bounds-check elimination knows that `i < _n` in the body
+In Go 1.10, bounds-check elimination knew that `i < _n` in the body
 because the body block is dominated by the cond block.
-However, in the new lowering, deriving this fact requires detecting
+However, in the new lowering, deriving this fact required detecting
 that `i < _n` on *both* paths into body and hence is true in body.
 
 ### Runtime safe-points
@@ -325,8 +320,8 @@
 
 ### Ensuring progress with unsafe-points
 
-Above we propose simply giving up and retrying later when a goroutine
-is interrupted at an unsafe-point.
+We propose simply giving up and retrying later when a goroutine is
+interrupted at an unsafe-point.
 One danger of this is that safe points may be rare in tight loops.
 However, in many cases, there are more sophisticated alternatives to
 this approach.
@@ -360,136 +355,17 @@
 writes in the sequence or more complex writes such as DUFFCOPY.
 
 
-## Encoding of stack and register maps
-
-In the implementation for Go 1.11, register maps are encoded using the
-exact same encoding as argument and locals maps.
-Unlike argument and locals maps, which are indexed together in a
-single PCDATA stream, the register maps are indexed by a separate
-PCDATA stream because changes to the register map tend not to
-correlate with changes to the arguments and locals maps.
-
-Curiously, the significant majority of the space overhead from this
-scheme is from the PCDATA stream that indexes into the register map.
-The actual register map FUNCDATA is relatively small, suggesting that
-functions have relatively few distinct register maps, but change
-between them frequently.
-
-### Alternates considered/attempted
-
-Biasing the register allocator to allocate pointers and scalars from
-different registers to reduce the number of unique maps and possibly
-reduce the number of map changes would seem like an easy improvement.
-However, it had very little effect.
-
-Similarly, adding "slop" to the register maps by allowing the liveness
-of a register to extend between its last use and next clobber slightly
-reduced the number of register map changes, but only slightly.
-
-The one successful alternate tried was to Huffman-code the delta
-stream, which roughly halved the size of the metadata.
-In this scheme, the register maps are encoded in a single bit stream
-per function that alternates between PC delta (as a positive offset
-from the previous PC), and register map delta (as an XOR from the
-previous register map).
-The two deltas are Huffman coded with separate Huffman tables, and the
-Huffman tables are shared across the entire binary.
-It may be even more effective to interleave the stack map changes into
-the same stream, since this would allow the PC deltas to be shared.
-This change was too invasive to implement for Go 1.11, but may be
-worth attempting for Go 1.12.
-
-
-## Other uses
-
-### Heap dump analysis
-
-Having safe-points everywhere fixes problems with heap dump analysis
-from core files, which currently has to use conservative heuristics to
-identify live pointers in active call frames.
-
-### Call injection
-
-Having safe-points everywhere also allows some function calls to be
-safely injected at runtime.
-This is useful in at least two situations:
-
-1. To handle synchronous signals, such as nil-pointer dereferences,
-   the runtime injects a call to `runtime.sigpanic` at the location of
-   the fault.
-   However, since there isn't usually a call at this location, the
-   stack map may be inaccurate, which leads to complicated
-   interactions between defers, escape analysis, and traceback
-   handling.
-   Having safe-points everywhere could simplify this.
-
-2. Debugger function calls (#21678).
-   Currently it's essentially impossible for a debugger to dynamically
-   invoke a Go function call because of poor interactions with stack
-   scanning and the garbage collector.
-   Having stack and register maps everywhere would make this
-   significantly more viable, since a function call could be injected
-   nearly anywhere without making the stack un-scannable.
-
-
-## Testing
-
-The primary danger of this approach is its potential for a long bug
-tail, since the coverage of safe-points in regular testing will
-decrease substantially.
-In addition to standard testing, I propose checking the generated
-liveness maps using static analysis of binaries.
-This tool would look for pointer dereferences or stores with a write
-barrier to indicate that a value is a pointer and would check the flow
-of that value through all possible paths.
-It would report anywhere a value transitioned from dead/scalar to live
-pointer and anywhere a value was used both like a pointer and like a
-scalar.
-
-In effect, this tool would simulate the program to answer the question
-"for every two points in time A < B, are there allocations reachable
-from the liveness map at time B that were not reachable at time A and
-were not allocated between A and B?"
-
-Most likely this static analysis tool could be written atop the
-existing [golang.org/x/arch](https://godoc.org/golang.org/x/arch)
-packages.
-These are the same packages used by, for example, `go tool objdump`,
-and handle most heavy-lifting of decoding the binary itself.
-
-
 ## Other considerations
 
-**Space overhead.** Traditionally, the main concern with having
-safe-points everywhere is the overhead of saving the stack/register
-maps.
-A very preliminary implementation of register maps and safe-points
-everywhere increased binary size by ~10%.
-However, this left several obvious optimizations on the table.
-Work by Stichmoth, et al. [1] further suggests that this overhead
-can be significantly curtailed with simple compression techniques.
+All of the proposed approaches to non-cooperative preemption involve
+stopping a running goroutine by sending its thread an OS signal.
+This section discusses general consequences of this.
 
 **Windows support.** Unlike fault-based loop preemption, signaled
 preemption is quite easy to support in Windows because it provides
 `SuspendThread` and `GetThreadContext`, which make it trivial to get a
 thread's register set.
 
-**Decoupling stack-move points from GC safe-points.** Because of the
-details of the current implementation, these are (essentially) the
-same.
-By decoupling these and only allowing stack growth and shrinking at
-function entry, stack copying would not need to adjust registers.
-This keeps stack copying simpler.
-It also enables better compiler optimizations and more safe-points
-since the compiler need not mark a register as a live pointer if it
-knows there's another live pointer to the object.
-Likewise, registers that are known to be derived from pointer
-arguments can be marked as scalar as long as those arguments are live.
-Such optimizations are possible because of Go's non-moving collector.
-This also prevents stack moving from observing (and crashing on)
-transient small-valued pointers that the compiler constructs when it
-knows an offset from a potentially-nil pointer will be small.
-
 **Choosing a signal.** We have to choose a signal that is unlikely to
 interfere with existing uses of signals or with debuggers.
 There are no perfect choices, but there are some heuristics.
@@ -552,34 +428,26 @@
 Such uses are a clear violation of the `unsafe.Pointer` rules, but
 they may happen (especially in, for example, cgo-using code).
 
-**Assembly.** By default, the runtime cannot safely preempt assembly
-code since it won't know what registers contain pointers. As a
-follow-on to the work on safe-points everywhere, we should audit
-assembly in the standard library for non-preemptible loops and
-annotate them with register maps. In most cases this should be trivial
-since most assembly never constructs a pointer that isn't shadowed by
-an argument, so it can simply claim there are no pointers in
-registers. We should also document in the Go assembly guide how to do
-this for user code.
-
 
 ## Alternatives
 
 ### Single-stepping
 
-Rather than significantly increasing the number of safe-points, we
-could use a signal to stop a thread and then use hardware
-single-stepping support to advance the thread to a safe-point (or a
-point where the compiler has provided a branch to reach a safe-point,
-like in the current loop preemption approach).
+Rather than making an effort to be able to stop at any instruction,
+the compiler could emit metadata for safe-points only at back-edges
+and the runtime could use hardware single-stepping support to advance
+the thread to a safe-point (or a point where the compiler has provided
+a branch to reach a safe-point, like in the current loop preemption
+approach).
 This works (somewhat surprisingly), but thoroughly bamboozles
 debuggers since both the debugger and the operating system assume the
 debugger owns single-stepping, not the process itself.
 This would also require the compiler to provide register flushing
 stubs for these safe-points, which increases code size (and hence
-instruction cache pressure) as well as stack size.
-Safe-points everywhere increase binary size, but not code size or
-stack size.
+instruction cache pressure) as well as stack size, much like explicit
+loop preemption.
+However, unlike explicit loop preemption, this approach would have no
+effect on mainline code size or performance.
 
 ### Jump rewriting
 
@@ -642,28 +510,3 @@
 It's also potentially problematic for scheduler preemptions, since
 that may keep leaked pointers live for longer than a GC pause or stack
 scan.
-
-
-## Compatibility
-
-This proposal introduces no new APIs, so it is Go 1 compatible.
-
-
-## Implementation
-
-Austin Clements (@austin) plans to implement register and stack maps
-everywhere for Go 1.11.
-This will enable some low-risk uses in the short term, such as
-debugger function calls.
-
-Debugging and testing of register and stack maps can continue into the
-Go 1.11 freeze, including building the static analysis tool.
-
-Then, for Go 1.12, Austin will implement safe-points everywhere atop
-the register and stacks maps.
-
-
-## References
-
-[1] James M. Stichnoth, Guei-Yuan Lueh, and Michał Cierniak. 1999. Support for garbage collection at every instruction in a Java compiler. In *Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation* (PLDI '99). ACM, New York, NY, USA, 118–127.
-
diff --git a/design/24543/safe-points-everywhere.md b/design/24543/safe-points-everywhere.md
new file mode 100644
index 0000000..85e7f80
--- /dev/null
+++ b/design/24543/safe-points-everywhere.md
@@ -0,0 +1,213 @@
+# Proposal: Safe-points everywhere for non-cooperative goroutine preemption
+
+Author(s): Austin Clements
+
+Last updated: 2018-03-26 (extracted from general proposal 2019-01-17)
+
+Discussion at https://golang.org/issue/24543.
+
+## Introduction
+
+Up to and including Go 1.10, Go has used cooperative preemption with
+safe-points only at function calls.
+We propose that the Go implementation switch to *non-cooperative*
+preemption.
+The background and rationale for this proposal are detailed in the
+[top-level proposal document](../20543-non-cooperative-preemption.md).
+
+This document details a specific approach to non-cooperative
+preemption based on constructing stack and register maps at
+(essentially) every instruction.
+
+
+## Proposal
+
+I propose that we implement fully non-cooperative preemption by
+recording enough metadata to allow safe-points (almost) everywhere.
+
+To do this, we would modify the compiler to produce register maps in
+addition to stack maps, and to emit these for as many program counters
+as possible.
+The runtime would use a signal (or `GetThreadContext` on Windows, or a
+note on Plan9) to retrieve each thread's register state, including the
+stack and register map at the interrupted PC.
+The garbage collector would then treat live pointers in registers just
+as it treats live pointers on the stack.
+
+Certain instructions cannot be safe-points, so if a signal occurs at
+such a point, the runtime would simply resume the thread and try again
+later.
+The compiler just needs to make *most* instructions safe-points.
+
+To @minux's credit, he suggested this in [the very first
+reply](https://github.com/golang/go/issues/10958#issuecomment-105678822)
+to #10958.
+At the time we thought adding safe-points everywhere would be too
+difficult and that the overhead of explicit loop preemption would be
+lower than it turned out to be.
+
+Many other garbage-collected languages use explicit safe-points on
+back-edges, or they use forward-simulation to reach a safe-point.
+Partly, it's possible for Go to support safe-points everywhere because
+Go's GC already must have excellent support for interior pointers; in
+many languages, interior pointers never appear at a safe-point.
+
+
+## Encoding of stack and register maps
+
+In the implementation for Go 1.11, register maps are encoded using the
+exact same encoding as argument and locals maps.
+Unlike argument and locals maps, which are indexed together in a
+single PCDATA stream, the register maps are indexed by a separate
+PCDATA stream because changes to the register map tend not to
+correlate with changes to the arguments and locals maps.
+
+Curiously, the significant majority of the space overhead from this
+scheme is from the PCDATA stream that indexes into the register map.
+The actual register map FUNCDATA is relatively small, suggesting that
+functions have relatively few distinct register maps, but change
+between them frequently.
+
+### Alternates considered/attempted
+
+Biasing the register allocator to allocate pointers and scalars from
+different registers to reduce the number of unique maps and possibly
+reduce the number of map changes would seem like an easy improvement.
+However, it had very little effect.
+
+Similarly, adding "slop" to the register maps by allowing the liveness
+of a register to extend between its last use and next clobber slightly
+reduced the number of register map changes, but only slightly.
+
+The one successful alternate tried was to Huffman-code the delta
+stream, which roughly halved the size of the metadata.
+In this scheme, the register maps are encoded in a single bit stream
+per function that alternates between PC delta (as a positive offset
+from the previous PC), and register map delta (as an XOR from the
+previous register map).
+The two deltas are Huffman coded with separate Huffman tables, and the
+Huffman tables are shared across the entire binary.
+It may be even more effective to interleave the stack map changes into
+the same stream, since this would allow the PC deltas to be shared.
+This change was too invasive to implement for Go 1.11, but may be
+worth attempting for Go 1.12.
+
+
+## Other uses of stack/register maps
+
+### Heap dump analysis
+
+Having safe-points everywhere fixes problems with heap dump analysis
+from core files, which currently has to use conservative heuristics to
+identify live pointers in active call frames.
+
+### Call injection
+
+Having safe-points everywhere also allows some function calls to be
+safely injected at runtime.
+This is useful in at least two situations:
+
+1. To handle synchronous signals, such as nil-pointer dereferences,
+   the runtime injects a call to `runtime.sigpanic` at the location of
+   the fault.
+   However, since there isn't usually a call at this location, the
+   stack map may be inaccurate, which leads to complicated
+   interactions between defers, escape analysis, and traceback
+   handling.
+   Having safe-points everywhere could simplify this.
+
+2. Debugger function calls (#21678).
+   Currently it's essentially impossible for a debugger to dynamically
+   invoke a Go function call because of poor interactions with stack
+   scanning and the garbage collector.
+   Having stack and register maps everywhere would make this
+   significantly more viable, since a function call could be injected
+   nearly anywhere without making the stack un-scannable.
+
+
+## Testing
+
+The primary danger of this approach is its potential for a long bug
+tail, since the coverage of safe-points in regular testing will
+decrease substantially.
+In addition to standard testing, I propose checking the generated
+liveness maps using static analysis of binaries.
+This tool would look for pointer dereferences or stores with a write
+barrier to indicate that a value is a pointer and would check the flow
+of that value through all possible paths.
+It would report anywhere a value transitioned from dead/scalar to live
+pointer and anywhere a value was used both like a pointer and like a
+scalar.
+
+In effect, this tool would simulate the program to answer the question
+"for every two points in time A < B, are there allocations reachable
+from the liveness map at time B that were not reachable at time A and
+were not allocated between A and B?"
+
+Most likely this static analysis tool could be written atop the
+existing [golang.org/x/arch](https://godoc.org/golang.org/x/arch)
+packages.
+These are the same packages used by, for example, `go tool objdump`,
+and handle most heavy-lifting of decoding the binary itself.
+
+
+## Other considerations
+
+**Space overhead.** Traditionally, the main concern with having
+safe-points everywhere is the overhead of saving the stack/register
+maps.
+A very preliminary implementation of register maps and safe-points
+everywhere increased binary size by ~10%.
+However, this left several obvious optimizations on the table.
+Work by Stichmoth, et al. [1] further suggests that this overhead
+can be significantly curtailed with simple compression techniques.
+
+**Decoupling stack-move points from GC safe-points.** Because of the
+details of the current implementation, these are (essentially) the
+same.
+By decoupling these and only allowing stack growth and shrinking at
+function entry, stack copying would not need to adjust registers.
+This keeps stack copying simpler.
+It also enables better compiler optimizations and more safe-points
+since the compiler need not mark a register as a live pointer if it
+knows there's another live pointer to the object.
+Likewise, registers that are known to be derived from pointer
+arguments can be marked as scalar as long as those arguments are live.
+Such optimizations are possible because of Go's non-moving collector.
+This also prevents stack moving from observing (and crashing on)
+transient small-valued pointers that the compiler constructs when it
+knows an offset from a potentially-nil pointer will be small.
+
+**Assembly.** By default, the runtime cannot safely preempt assembly
+code since it won't know what registers contain pointers. As a
+follow-on to the work on safe-points everywhere, we should audit
+assembly in the standard library for non-preemptible loops and
+annotate them with register maps. In most cases this should be trivial
+since most assembly never constructs a pointer that isn't shadowed by
+an argument, so it can simply claim there are no pointers in
+registers. We should also document in the Go assembly guide how to do
+this for user code.
+
+
+## Compatibility
+
+This proposal introduces no new APIs, so it is Go 1 compatible.
+
+
+## Implementation
+
+Austin Clements (@aclements) plans to implement register and stack
+maps everywhere for Go 1.11.
+This will enable some low-risk uses in the short term, such as
+debugger function calls.
+
+Debugging and testing of register and stack maps can continue into the
+Go 1.11 freeze, including building the static analysis tool.
+
+Then, for Go 1.12, Austin will implement safe-points everywhere atop
+the register and stacks maps.
+
+
+## References
+
+[1] James M. Stichnoth, Guei-Yuan Lueh, and Michał Cierniak. 1999. Support for garbage collection at every instruction in a Java compiler. In *Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation* (PLDI '99). ACM, New York, NY, USA, 118–127.