# Proposal: Non-cooperative goroutine preemption

Author(s): Austin Clements

Last updated: 2019-01-18

Discussion at https://golang.org/issue/24543.

## Abstract

Go currently uses compiler-inserted cooperative preemption points in
function prologues.
The majority of the time, this is good enough to allow Go developers
to ignore preemption and focus on writing clear parallel code, but it
has sharp edges that we've seen degrade the developer experience time
and time again.
When it goes wrong, it goes spectacularly wrong, leading to mysterious
system-wide latency issues and sometimes complete freezes.
And because this is a language implementation issue that exists
outside of Go's language semantics, these failures are surprising and
very difficult to debug.

@dr2chase has put significant effort into prototyping cooperative
preemption points in loops, which is one way to solve this problem.
However, even sophisticated approaches to this led to unacceptable
slow-downs in tight loops (where slow-downs are generally least
acceptable).

I propose that the Go implementation switch to non-cooperative
preemption, which would allow goroutines to be preempted at
essentially any point without the need for explicit preemption checks.
This approach will solve the problem of delayed preemption and do so
with zero run-time overhead.

Non-cooperative preemption is a general concept with a whole class of
implementation techniques.
This document describes and motivates the switch to non-cooperative
preemption and discusses common concerns of any non-cooperative
preemption approach in Go.
Specific implementation approaches are detailed in sub-proposals
linked from this document.


## Background

Up to and including Go 1.10, Go has used cooperative preemption with
safe-points only at function calls (and even then, not if the function
is small or gets inlined).
This means that Go can only switch between concurrently-executing
goroutines at specific points.
The main advantage of this is that the compiler can ensure useful
invariants at these safe-points.
In particular, the compiler ensures that all local garbage collection
roots are known at all safe-points, which is critical to precise
garbage collection.
It can also ensure that no registers are live at safe-points, which
means the Go runtime can switch goroutines without having to save and
restore a large register set.

However, this can result in infrequent safe-points, which leads to
many problems:

1. The most common in production code is that this can delay STW
   operations, such as starting and ending a GC cycle.
   This increases STW latency, and on large core counts can
   significantly impact throughput (if, for example, most threads are
   stopped while the runtime waits on a straggler for a long time).
   ([#17831](https://golang.org/issue/17831),
   [#19241](https://golang.org/issue/19241))

2. This can delay scheduling, preventing competing goroutines from
   executing in a timely manner.

3. This can delay stack scanning, which consumes CPU while the runtime
   waits for a preemption point and can ultimately delay GC
   termination, resulting in an effective STW where the system runs
   out of heap and no goroutines can allocate.

4. In really extreme cases, it can cause a program to halt, such as
   when a goroutine spinning on an atomic load starves out the
   goroutine responsible for setting that atomic.
   This often indicates bad or buggy code, but is surprising
   nonetheless and has clearly wasted a lot of developer time on
   debugging.
   ([#543](https://golang.org/issue/543),
   [#12553](https://golang.org/issue/12553),
   [#13546](https://golang.org/issue/13546),
   [#14561](https://golang.org/issue/14561),
   [#15442](https://golang.org/issue/15442),
   [#17174](https://golang.org/issue/17174),
   [#20793](https://golang.org/issue/20793),
   [#21053](https://golang.org/issue/21053))
These problems impede developer productivity and production efficiency
and expose Go's users to implementation details they shouldn't have to
worry about.

### Cooperative loop preemption

@dr2chase put significant effort into trying to solve these problems
using cooperative *loop preemption*
([#10958](https://golang.org/issue/10958)).
This is a standard approach for runtimes employing cooperative
preemption in which the compiler inserts preemption checks and
safe-points at back-edges in the flow graph.
This significantly improves the quality of preemption, since code
almost never executes without a back-edge for any non-trivial amount
of time.

Our most recent approach to loop preemption, which we call
*fault-based preemption*, adds a single instruction, no branches, and
no register pressure to loops on x86 and UNIX platforms ([CL
43050](https://golang.org/cl/43050)).
Despite this, the geomean slow-down on a [large suite of
benchmarks](https://perf.golang.org/search?q=upload%3A20171003.1+%7C+upload-part%3A20171003.1%2F3+vs+upload-part%3A20171003.1%2F1)
is 7.8%, with a handful of significantly worse outliers.
Even [compared to Go
1.9](https://perf.golang.org/search?q=upload%3A20171003.1+%7C+upload-part%3A20171003.1%2F0+vs+upload-part%3A20171003.1%2F1),
where the slow-down is only 1% thanks to other improvements, most
benchmarks see some slow-down and there are still significant
outliers.

Fault-based preemption also has several implementation downsides.
It can't target specific threads or goroutines, so it's a poor match
for stack scanning, ragged barriers, or regular scheduler preemption.
It's also "sticky", in that we can't resume any loops until we resume
*all* loops, so the safe-point can't simply resume if it occurs in an
unsafe state (such as when runtime locks are held).
It requires more instructions (and more overhead) on non-x86 and
non-UNIX platforms.
Finally, it interferes with debuggers, which assume bad memory
references are a good reason to stop a program.
It's not clear it can work at all under many debuggers on OS X due to
a [kernel bug](https://bugs.llvm.org/show_bug.cgi?id=22868).


## Non-cooperative preemption

*Non-cooperative preemption* switches between concurrent execution
contexts without explicit preemption checks or assistance from those
contexts.
This is used by all modern desktop and server operating systems to
switch between threads.
Without this, a single poorly-behaved application could wedge the
entire system, much like how a single poorly-behaved goroutine can
currently wedge a Go application.
It is also a convenient abstraction: it lets us program as if there
are an infinite number of CPUs available, hiding the fact that the OS
is time-multiplexing a finite number of CPUs.

Operating system schedulers use hardware interrupt support to switch a
running thread into the OS scheduler, which can save that thread's
state such as its CPU registers so that it can be resumed later.
In Go, we would use operating system support to do the same thing.
On UNIX-like operating systems, this can be done using signals.
However, because of the garbage collector, Go has requirements that an
operating system does not: Go must be able to find the live pointers
on a goroutine's stack wherever it stops it.
Most of the complexity of non-cooperative preemption in Go derives
from this requirement.


## Proposal

I propose that Go implement non-cooperative goroutine preemption by
sending a POSIX signal (or using an equivalent OS mechanism) to stop a
running goroutine and capture its CPU state.
If a goroutine is interrupted at a point that must be GC atomic, as
detailed in the ["Handling unsafe-points"](#handling-unsafe-points)
section, the runtime can simply resume the goroutine and try again
later.

The key difficulty of implementing non-cooperative preemption for Go
is finding live pointers in the stack of a preempted goroutine.
There are many possible ways to do this, which are detailed in these
sub-proposals:

* The [safe-points everywhere
  proposal](24543/safe-points-everywhere.md) describes an
  implementation where the compiler records stack and register maps
  for nearly every instruction.
  This allows the runtime to halt a goroutine anywhere and find its GC
  roots.

* The [conservative inner-frame scanning
  proposal](24543/conservative-inner-frame.md) describes an
  implementation that uses conservative GC techniques to find pointers
  in the inner-most stack frame of a preempted goroutine.
  This can be done without any extra safe-point metadata.


## Handling unsafe-points

Any non-cooperative preemption approach in Go must deal with code
sequences that have to be atomic with respect to the garbage
collector.
We call these "unsafe-points", in contrast with GC safe-points.
A few known situations are:

1. Expressions involving `unsafe.Pointer` may temporarily represent
   the only pointer to an object as a `uintptr`.
   Hence, there must be no safe-points while a `uintptr` derived from
   an `unsafe.Pointer` is live.
   Likewise, we must recognize `reflect.Value.Pointer`,
   `reflect.Value.UnsafeAddr`, and `reflect.Value.InterfaceData` as
   `unsafe.Pointer`-to-`uintptr` conversions.
   Alternatively, if the compiler can reliably detect such `uintptr`s,
   it could mark them as pointers, but there's a danger that an
   intermediate value may not represent a legal pointer value.

2. In the write barrier there must not be a safe-point between the
   write-barrier-enabled check and a direct write.
   For example, suppose the goroutine is writing a pointer to B into
   object A.
   If, after the check, the GC starts and scans A, and the goroutine
   then writes B into A and drops all references to B from its stack,
   the garbage collector could fail to mark B.

3. There are places where the compiler generates temporary pointers
   that can be past the end of allocations, such as in range loops
   over slices and arrays.
   These would either have to be avoided or safe-points would have to
   be disallowed while these are live.
All of these cases must already avoid significant reordering to avoid
being split across a call.
Internally, this is achieved via the "mem" pseudo-value, which must be
sequentially threaded through all SSA values that manipulate memory.
Mem is also threaded through values that must not be reordered, even
if they don't touch memory.
For example, conversion between `unsafe.Pointer` and `uintptr` is done
with a special "Convert" operation that takes a mem solely to
constrain reordering.

There are several possible solutions to these problems, some of which
can be combined:

1. We could mark basic blocks that shouldn't contain preemption
   points.
   For `unsafe.Pointer` conversions, we would opt-out the basic block
   containing the conversion.
   For code adhering to the `unsafe.Pointer` rules, this should be
   sufficient, but it may break code that is incorrect but happens to
   work today in ways that are very difficult to debug.
   For write barriers this is also sufficient.
   For loops, this is overly broad and would require splitting some
   basic blocks.

2. For `unsafe.Pointer` conversions, we could simply opt-out entire
   functions that convert from `unsafe.Pointer` to `uintptr`.
   This would be easy to implement, and would keep even broken unsafe
   code working as well as it does today, but may have broad impact,
   especially in the presence of inlining.

3. A simple combination of 1 and 2 would be to opt-out any basic block
   that is *reachable* from an `unsafe.Pointer` to `uintptr`
   conversion, up to a function call (which is a safe-point today).

4. For range loops, the compiler could compile them differently such
   that it never constructs an out-of-bounds pointer (see below).

5. A far more precise and general approach (thanks to @cherrymui)
   would be to create new SSA operations that "taint" and "untaint"
   memory.
   The taint operation would take a mem and return a new tainted mem.
   This taint would flow to any values that themselves took a tainted
   value.
   The untaint operation would take a value and a mem and return an
   untainted value and an untainted mem.
   During liveness analysis, safe-points would be disallowed wherever
   a tainted value was live.
   This is probably the most precise solution, and is likely to keep
   even incorrect uses of unsafe working, but requires a complex
   implementation.

More broadly, it's worth considering making the compiler check
`unsafe.Pointer`-using code and actively reject code that doesn't
follow the allowed patterns.
This could be implemented as a simple type system that distinguishes
pointer-ish `uintptr` from numeric `uintptr`.
But this is out of scope for this proposal.

### Range loops

As of Go 1.10, range loops are compiled roughly like:

```go
for i, x := range s { b }
⇓
for i, _n, _p := 0, len(s), &s[0]; i < _n; i, _p = i+1, _p + unsafe.Sizeof(s[0]) { b }
⇓
i, _n, _p := 0, len(s), &s[0]
goto cond
body:
{ b }
i, _p = i+1, _p + unsafe.Sizeof(s[0])
cond:
if i < _n { goto body } else { goto end }
end:
```

The problem with this lowering is that `_p` may temporarily point past
the end of the allocation the moment before the loop terminates.
Currently this is safe because there's never a safe-point while this
value of `_p` is live.

This lowering requires that the compiler mark the increment and
condition blocks as unsafe-points.
However, if the body is short, this could result in infrequent
safe-points.
It also requires creating a separate block for the increment, which is
currently usually appended to the end of the body.
Separating these blocks would inhibit reordering opportunities.

In preparation for non-cooperative preemption, Go 1.11 began compiling
range loops as follows to avoid ever creating a past-the-end pointer:

```go
i, _n, _p := 0, len(s), &s[0]
if i >= _n { goto end } else { goto body }
top:
_p += unsafe.Sizeof(s[0])
body:
{ b }
i++
if i >= _n { goto end } else { goto top }
end:
```

This allows safe-points everywhere in the loop.
Compared to the original loop compilation, it generates slightly more
code, but executes the same number of conditional branch instructions
(n+1) and results in the same number of SSA basic blocks (3).

This lowering does complicate bounds-check elimination.
In Go 1.10, bounds-check elimination knew that `i < _n` in the body
because the body block is dominated by the cond block.
However, in the new lowering, deriving this fact requires detecting
that `i < _n` on *both* paths into body and hence is true in body.

### Runtime safe-points

Beyond generated code, the runtime in general is not written to be
arbitrarily preemptible and there are many places that must not be
preempted.
Hence, we would likely disable safe-points by default in the runtime,
except at calls (where they occur now).

While this would have little downside for most of the runtime, there
are some parts of the runtime that could benefit substantially from
non-cooperative preemption, such as memory functions like `memmove`.
Non-cooperative preemption is an excellent way to make these
preemptible without slowing down the common case, since we would only
need to mark their register maps (which would often be empty for
functions like `memmove` since all pointers would already be protected
by arguments).

Over time we may opt-in more of the runtime.

### Unsafe standard library code

The Windows syscall package contains many `unsafe.Pointer` conversions
that don't follow the `unsafe.Pointer` rules.
It broadly makes shaky assumptions about safe-point behavior,
liveness, and when stack movement can happen.
It would likely need a thorough auditing, or would need to be opted
out like the runtime.

Perhaps more troubling is that some of the Windows syscall package
types have `uintptr` fields that are actually pointers, hence forcing
callers to perform unsafe pointer conversions.
For example, see issue [#21376](https://golang.org/issue/21376).

### Ensuring progress with unsafe-points

We propose simply giving up and retrying later when a goroutine is
interrupted at an unsafe-point.
One danger of this is that safe-points may be rare in tight loops.
However, in many cases, there are more sophisticated alternatives to
this approach.

For interruptions in the runtime or in functions without any
safe-points (such as assembly), the signal handler could unwind the
stack and insert a return trampoline at the next return to a function
with safe-point metadata.
The runtime could then let the goroutine continue running and the
trampoline would pause it as soon as possible.

For write barriers and `unsafe.Pointer` sequences, the compiler could
insert a cheap, explicit preemption check at the end of the sequence.
For example, the runtime could modify some register that would be
checked at the end of the sequence and let the thread continue
executing.
In the write barrier sequence, this could even be the register that
the write barrier flag was loaded into, and the compiler could insert
a simple register test and conditional branch at the end of the
sequence.
To even further shrink the sequence, the runtime could put the address
of the stop function in this register so the stop sequence would be
just a register call and a jump.

Alternatives to this check include forward and reverse simulation.
Forward simulation is tricky because the compiler must be careful to
only generate operations the runtime knows how to simulate.
Reverse simulation is easy *if* the compiler can always generate a
restartable sequence (simply move the PC back to the write barrier
flag check), but quickly becomes complicated if there are multiple
writes in the sequence or more complex writes such as DUFFCOPY.

## Other considerations

All of the proposed approaches to non-cooperative preemption involve
stopping a running goroutine by sending its thread an OS signal.
This section discusses general consequences of this.

**Windows support.** Unlike fault-based loop preemption, signaled
preemption is quite easy to support in Windows because it provides
`SuspendThread` and `GetThreadContext`, which make it trivial to get a
thread's register set.

**Choosing a signal.** We have to choose a signal that is unlikely to
interfere with existing uses of signals or with debuggers.
There are no perfect choices, but there are some heuristics:

1. It should be a signal that's passed-through by debuggers by
   default.
   On Linux, this is SIGALRM, SIGURG, SIGCHLD, SIGIO, SIGVTALRM,
   SIGPROF, and SIGWINCH, plus some glibc-internal signals.

2. It shouldn't be used internally by libc in mixed Go/C binaries
   because libc may assume it's the only thing that can handle these
   signals.
   For example, SIGCANCEL or SIGSETXID.

3. It should be a signal that can happen spuriously without
   consequences.
   For example, SIGALRM is a bad choice because the signal handler
   can't tell if it was caused by the real process alarm or not
   (arguably this means the signal is broken, but I digress).
   SIGUSR1 and SIGUSR2 are also bad because those are often used in
   meaningful ways by applications.

4. We need to deal with platforms without real-time signals (like
   macOS), so those are out.

We use SIGURG because it meets all of these criteria, is extremely
unlikely to be used by an application for its "real" meaning (both
because out-of-band data is basically unused and because SIGURG
doesn't report which socket has the condition, making it pretty
useless), and even if it is, the application has to be ready for
spurious SIGURG.
SIGIO wouldn't be a bad choice either, but is more likely to be used
for real.

**Scheduler preemption.** This mechanism is well-suited to temporary
preemptions where the same goroutine will resume after the preemption
because we don't need to save the full register state and can rely on
the existing signal return path to restore the full register state.
This applies to all GC-related preemptions, but it's not as well
suited to permanent preemption performed by the scheduler.
However, we could still build on this mechanism.
For example, since most of the time goroutines self-preempt, we only
need to save the full signal state in the uncommon case, so the `g`
could contain a pointer to its full saved state that's only used after
a forced preemption.
Restoring the full signal state could be done by either writing the
architecture-dependent code to restore the full register set (a
beefed-up `runtime.gogo`), or by self-signaling, swapping in the
desired context, and letting the OS restore the full register set.

**Targeting and resuming.** In contrast with fault-based loop
preemption, signaled preemption can be targeted at a specific thread
and can immediately resume.
Thread-targeting is a little different from cooperative preemption,
which is goroutine-targeted.
However, in many cases this is actually better, since targeting
goroutines for preemption is racy and hence requires retry loops that
can add significantly to STW time.
Taking advantage of this for stack scanning will require some
restructuring of how we track GC roots, but the result should
eliminate the blocking retry loop we currently use.

**Non-pointer pointers.** This has the potential to expose incorrect
uses of `unsafe.Pointer` for transiently storing non-pointers.
Such uses are a clear violation of the `unsafe.Pointer` rules, but
they may happen (especially in, for example, cgo-using code).

## Alternatives

### Single-stepping

Rather than making an effort to be able to stop at any instruction,
the compiler could emit metadata for safe-points only at back-edges
and the runtime could use hardware single-stepping support to advance
the thread to a safe-point (or a point where the compiler has provided
a branch to reach a safe-point, like in the current loop preemption
approach).
This works (somewhat surprisingly), but thoroughly bamboozles
debuggers since both the debugger and the operating system assume the
debugger owns single-stepping, not the process itself.
This would also require the compiler to provide register flushing
stubs for these safe-points, which increases code size (and hence
instruction cache pressure) as well as stack size, much like
cooperative loop preemption.
However, unlike cooperative loop preemption, this approach would have
no effect on mainline code size or performance.

### Jump rewriting

We can solve the problems of single-stepping by instead rewriting the
next safe-point jump instruction after the interruption point to jump
to a preemption path and resuming execution as usual.
To make this easy, the compiler could leave enough room (via padding
NOPs) so only the jump target needs to be modified.

This approach has the usual drawbacks of modifiable code.
It's a security risk, it breaks text page sharing, and simply isn't
allowed on iOS.
It also can't target an individual goroutine (since another goroutine
could be executing the same code) and may have odd interactions with
concurrent execution on other cores.

### Out-of-line execution

A further alternative in the same vein, but one that doesn't require
modifying existing text, is out-of-line execution.
In this approach, the signal handler relocates the instruction stream
from the interruption point to the next safe-point jump into a
temporary buffer, patches it to jump into the runtime at the end, and
resumes execution in this relocated sequence.

This solves most of the problems with single-stepping and jump
rewriting, but is quite complex to implement and requires substantial
implementation effort for each platform.
It also isn't allowed on iOS.

There is precedent for this sort of approach.
For example, when Linux uprobes injects an INT3, it relocates the
overwritten instructions into an "execute out-of-line" area to avoid
the usual problems with resuming from an INT3 instruction.
[The
implementation](https://github.com/torvalds/linux/blob/v4.18/arch/x86/kernel/uprobes.c)
is surprisingly simple given the complexity of the x86 instruction
encoding, but is still quite complex.