# Proposal: Non-cooperative goroutine preemption

Author(s): Austin Clements

Last updated: 2019-01-18

Discussion at https://golang.org/issue/24543.

## Abstract

Go currently uses compiler-inserted cooperative preemption points in
function prologues.
The majority of the time, this is good enough to allow Go developers
to ignore preemption and focus on writing clear parallel code, but it
has sharp edges that we've seen degrade the developer experience time
and time again.
When it goes wrong, it goes spectacularly wrong, leading to mysterious
system-wide latency issues and sometimes complete freezes.
And because this is a language implementation issue that exists
outside of Go's language semantics, these failures are surprising and
very difficult to debug.

@dr2chase has put significant effort into prototyping cooperative
preemption points in loops, which is one way to solve this problem.
However, even sophisticated approaches to this led to unacceptable
slow-downs in tight loops (where slow-downs are generally least
acceptable).

I propose that the Go implementation switch to non-cooperative
preemption, which would allow goroutines to be preempted at
essentially any point without the need for explicit preemption checks.
This approach will solve the problem of delayed preemption and do so
with zero run-time overhead.

Non-cooperative preemption is a general concept with a whole class of
implementation techniques.
This document describes and motivates the switch to non-cooperative
preemption and discusses common concerns of any non-cooperative
preemption approach in Go.
Specific implementation approaches are detailed in sub-proposals
linked from this document.


## Background

Up to and including Go 1.10, Go has used cooperative preemption with
safe-points only at function calls (and even then, not if the function
is small or gets inlined).
This means that Go can only switch between concurrently-executing
goroutines at specific points.
The main advantage of this is that the compiler can ensure useful
invariants at these safe-points.
In particular, the compiler ensures that all local garbage collection
roots are known at all safe-points, which is critical to precise
garbage collection.
It can also ensure that no registers are live at safe-points, which
means the Go runtime can switch goroutines without having to save and
restore a large register set.

However, this can result in infrequent safe-points, which leads to
many problems:

1. The most common problem in production code is that this can delay
   STW operations, such as starting and ending a GC cycle.
   This increases STW latency, and on large core counts can
   significantly impact throughput (if, for example, most threads are
   stopped while the runtime waits on a straggler for a long time).
   ([#17831](https://golang.org/issue/17831),
   [#19241](https://golang.org/issue/19241))

2. This can delay scheduling, preventing competing goroutines from
   executing in a timely manner.

3. This can delay stack scanning, which consumes CPU while the runtime
   waits for a preemption point and can ultimately delay GC
   termination, resulting in an effective STW where the system runs
   out of heap and no goroutines can allocate.

4. In really extreme cases, it can cause a program to halt, such as
   when a goroutine spinning on an atomic load starves out the
   goroutine responsible for setting that atomic.
   This often indicates bad or buggy code, but is surprising
   nonetheless and has clearly wasted a lot of developer time on
   debugging.
   A minimal example of this is sketched after this list.
   ([#543](https://golang.org/issue/543),
   [#12553](https://golang.org/issue/12553),
   [#13546](https://golang.org/issue/13546),
   [#14561](https://golang.org/issue/14561),
   [#15442](https://golang.org/issue/15442),
   [#17174](https://golang.org/issue/17174),
   [#20793](https://golang.org/issue/20793),
   [#21053](https://golang.org/issue/21053))

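As a minimal illustration of problem 4, the following program (an
illustrative sketch, not taken from any of the issues above) can spin
forever with `GOMAXPROCS=1` under cooperative preemption: the spin
loop contains no function calls and hence no safe-points, so the
goroutine that sets `done` may never be scheduled.

```go
package main

import "sync/atomic"

var done uint32

func main() {
	go func() {
		// Pretend to do some work, then signal completion.
		atomic.StoreUint32(&done, 1)
	}()
	// No calls in this loop body (sync/atomic operations are typically
	// intrinsified), so the compiler inserts no preemption points; with
	// GOMAXPROCS=1 the goroutine above may never get to run.
	for atomic.LoadUint32(&done) == 0 {
	}
}
```
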
These problems impede developer productivity and production efficiency
and expose Go's users to implementation details they shouldn't have to
worry about.

### Cooperative loop preemption

@dr2chase put significant effort into trying to solve these problems
using cooperative *loop preemption*
([#10958](https://golang.org/issue/10958)).
This is a standard approach for runtimes employing cooperative
preemption in which the compiler inserts preemption checks and
safe-points at back-edges in the flow graph.
This significantly improves the quality of preemption, since code
almost never executes without a back-edge for any non-trivial amount
of time.

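Conceptually (this is an illustrative sketch with hypothetical names,
not the prototype's actual code generation), a check-based lowering
inserts a load and branch at each loop back-edge:

```go
package p

// Hypothetical runtime hooks, declared only so the sketch reads as Go.
var preemptRequested bool // set by the runtime when it needs this goroutine to stop
func preemptCheck()     {} // would call into the scheduler

func sum(data []int) int {
	s := 0
	for _, x := range data {
		s += x
		// Inserted by the compiler at the loop back-edge:
		if preemptRequested {
			preemptCheck()
		}
	}
	return s
}
```

It is exactly this extra load and branch in tight loops that the
fault-based approach described next tries to eliminate.
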
Our most recent approach to loop preemption, which we call
*fault-based preemption*, adds a single instruction, no branches, and
no register pressure to loops on x86 and UNIX platforms ([CL
43050](https://golang.org/cl/43050)).
Despite this, the geomean slow-down on a [large suite of
benchmarks](https://perf.golang.org/search?q=upload%3A20171003.1+%7C+upload-part%3A20171003.1%2F3+vs+upload-part%3A20171003.1%2F1)
is 7.8%, with a handful of significantly worse outliers.
Even [compared to Go
1.9](https://perf.golang.org/search?q=upload%3A20171003.1+%7C+upload-part%3A20171003.1%2F0+vs+upload-part%3A20171003.1%2F1),
where the slow-down is only 1% thanks to other improvements, most
benchmarks see some slow-down and there are still significant
outliers.

Fault-based preemption also has several implementation downsides.
It can't target specific threads or goroutines, so it's a poor match
for stack scanning, ragged barriers, or regular scheduler preemption.
It's also "sticky", in that we can't resume any loops until we resume
*all* loops, so the safe-point can't simply resume if it occurs in an
unsafe state (such as when runtime locks are held).
It requires more instructions (and more overhead) on non-x86 and
non-UNIX platforms.
Finally, it interferes with debuggers, which assume bad memory
references are a good reason to stop a program.
It's not clear it can work at all under many debuggers on OS X due to
a [kernel bug](https://bugs.llvm.org/show_bug.cgi?id=22868).


## Non-cooperative preemption

*Non-cooperative preemption* switches between concurrent execution
contexts without explicit preemption checks or assistance from those
contexts.
This is used by all modern desktop and server operating systems to
switch between threads.
Without this, a single poorly-behaved application could wedge the
entire system, much like how a single poorly-behaved goroutine can
currently wedge a Go application.
It is also a convenient abstraction: it lets us program as if there
are an infinite number of CPUs available, hiding the fact that the OS
is time-multiplexing a finite number of CPUs.

Operating system schedulers use hardware interrupt support to switch a
running thread into the OS scheduler, which can save that thread's
state such as its CPU registers so that it can be resumed later.
In Go, we would use operating system support to do the same thing.
On UNIX-like operating systems, this can be done using signals.

However, because of the garbage collector, Go has requirements that an
operating system does not: Go must be able to find the live pointers
on a goroutine's stack wherever it stops it.
Most of the complexity of non-cooperative preemption in Go derives
from this requirement.
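
To make this concrete, the metadata the runtime needs at a stop point
has roughly the following shape (these type and field names are
hypothetical; the real encodings are the subject of the sub-proposals
below):

```go
// Hypothetical, simplified shape of per-stop-point GC metadata.
type stopPointInfo struct {
	pc            uintptr // instruction address this record describes
	stackPointers []bool  // which stack slots of the frame hold live pointers
	regPointers   []bool  // which registers hold live pointers
}
```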
162
163
Austin Clementsfc58df52018-03-26 15:41:40 -0400164## Proposal
165
Austin Clements29ff0072019-01-21 18:05:30 -0500166I propose that Go implement non-cooperative goroutine preemption by
167sending a POSIX signal (or using an equivalent OS mechanism) to stop a
168running goroutine and capture its CPU state.
169If a goroutine is interrupted at a point that must be GC atomic, as
170detailed in the ["Handling unsafe-points"](#handling-unsafe-points)
171section, the runtime can simply resume the goroutine and try again
172later.
173
174The key difficulty of implementing non-cooperative preemption for Go
175is finding live pointers in the stack of a preempted goroutine.
Austin Clements87b05a02019-01-18 14:28:40 -0500176There are many possible ways to do this, which are detailed in these
177sub-proposals:
Austin Clementsfc58df52018-03-26 15:41:40 -0400178
Austin Clements87b05a02019-01-18 14:28:40 -0500179* The [safe-points everywhere
180 proposal](24543/safe-points-everywhere.md) describes an
181 implementation where the compiler records stack and register maps
182 for nearly every instruction.
183 This allows the runtime to halt a goroutine anywhere and find its GC
184 roots.
Austin Clementsfc58df52018-03-26 15:41:40 -0400185
Austin Clements29ff0072019-01-21 18:05:30 -0500186* The [conservative inner-frame scanning
187 proposal](24543/conservative-inner-frame.md) describes an
188 implementation that uses conservative GC techniques to find pointers
189 in the inner-most stack frame of a preempted goroutine.
190 This can be done without any extra safe-point metadata.
Austin Clementsfc58df52018-03-26 15:41:40 -0400191
192
193## Handling unsafe-points
194
Austin Clements87b05a02019-01-18 14:28:40 -0500195Any non-cooperative preemption approach in Go must deal with code
196sequences that have to be atomic with respect to the garbage
197collector.
198We call these "unsafe-points", in contrast with GC safe-points.
Austin Clementsfc58df52018-03-26 15:41:40 -0400199A few known situations are:
200
1. Expressions involving `unsafe.Pointer` may temporarily represent
   the only pointer to an object as a `uintptr`.
   Hence, there must be no safe-points while a `uintptr` derived from
   an `unsafe.Pointer` is live (see the sketch after this list).
   Likewise, we must recognize `reflect.Value.Pointer`,
   `reflect.Value.UnsafeAddr`, and `reflect.Value.InterfaceData` as
   `unsafe.Pointer`-to-`uintptr` conversions.
   Alternatively, if the compiler can reliably detect such `uintptr`s,
   it could mark them as pointers, but there's a danger that an
   intermediate value may not represent a legal pointer value.

2. In the write barrier there must not be a safe-point between the
   write-barrier-enabled check and a direct write.
   For example, suppose the goroutine is writing a pointer to B into
   object A.
   If, after the check, the garbage collector starts and scans A, and
   the goroutine then writes B into A and drops all references to B
   from its stack, the garbage collector could fail to mark B.

3. There are places where the compiler generates temporary pointers
   that can be past the end of allocations, such as in range loops
   over slices and arrays.
   These would either have to be avoided or safe-points would have to
   be disallowed while these are live.

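As a sketch of case 1, the following (incorrect, but representative)
code keeps the only reference to an allocation in a plain `uintptr`
across several statements. Today it happens to work because there is
no safe-point inside the sequence; with safe-points at nearly every
instruction, the collector could run in the middle, fail to see the
object, and free it:

```go
package p

import "unsafe"

// Violates the unsafe.Pointer rules: the conversion to uintptr and back
// must happen in a single expression, precisely so that no safe-point can
// observe the object while it is held only by an integer.
func bad(idx uintptr) *int32 {
	p := uintptr(unsafe.Pointer(new([16]int32))) // object now referenced only by a uintptr
	p += idx * unsafe.Sizeof(int32(0))
	return (*int32)(unsafe.Pointer(p)) // the object may already have been collected here
}
```
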
All of these cases must already avoid significant reordering to avoid
being split across a call.
Internally, this is achieved via the "mem" pseudo-value, which must be
sequentially threaded through all SSA values that manipulate memory.
Mem is also threaded through values that must not be reordered, even
if they don't touch memory.
For example, conversion between `unsafe.Pointer` and `uintptr` is done
with a special "Convert" operation that takes a mem solely to
constrain reordering.

There are several possible solutions to these problems, some of which
can be combined:

1. We could mark basic blocks that shouldn't contain preemption
   points.
   For `unsafe.Pointer` conversions, we would opt-out the basic block
   containing the conversion.
   For code adhering to the `unsafe.Pointer` rules, this should be
   sufficient, but it may break code that is incorrect but happens to
   work today in ways that are very difficult to debug.
   For write barriers this is also sufficient.
   For loops, this is overly broad and would require splitting some
   basic blocks.

2. For `unsafe.Pointer` conversions, we could simply opt-out entire
   functions that convert from `unsafe.Pointer` to `uintptr`.
   This would be easy to implement, and would keep even broken unsafe
   code working as well as it does today, but may have broad impact,
   especially in the presence of inlining.

3. A simple combination of 1 and 2 would be to opt-out any basic block
   that is *reachable* from an `unsafe.Pointer` to `uintptr`
   conversion, up to a function call (which is a safe-point today).

4. For range loops, the compiler could compile them differently such
   that it never constructs an out-of-bounds pointer (see below).

5. A far more precise and general approach (thanks to @cherrymui)
   would be to create new SSA operations that "taint" and "untaint"
   memory.
   The taint operation would take a mem and return a new tainted mem.
   This taint would flow to any values that themselves took a tainted
   value.
   The untaint operation would take a value and a mem and return an
   untainted value and an untainted mem.
   During liveness analysis, safe-points would be disallowed wherever
   a tainted value was live.
   This is probably the most precise solution, and is likely to keep
   even incorrect uses of unsafe working, but requires a complex
   implementation.

More broadly, it's worth considering making the compiler check
`unsafe.Pointer`-using code and actively reject code that doesn't
follow the allowed patterns.
This could be implemented as a simple type system that distinguishes
pointer-ish `uintptr` from numeric `uintptr`.
But this is out of scope for this proposal.

### Range loops

As of Go 1.10, range loops are compiled roughly like:

```go
for i, x := range s { b }
 ⇓
for i, _n, _p := 0, len(s), &s[0]; i < _n; i, _p = i+1, _p + unsafe.Sizeof(s[0]) { b }
 ⇓
i, _n, _p := 0, len(s), &s[0]
goto cond
body:
{ b }
i, _p = i+1, _p + unsafe.Sizeof(s[0])
cond:
if i < _n { goto body } else { goto end }
end:
```

The problem with this lowering is that `_p` may temporarily point past
the end of the allocation the moment before the loop terminates.
Currently this is safe because there's never a safe-point while this
value of `_p` is live.

This lowering requires that the compiler mark the increment and
condition blocks as unsafe-points.
However, if the body is short, this could result in infrequent
safe-points.
It also requires creating a separate block for the increment, which is
currently usually appended to the end of the body.
Separating these blocks would inhibit reordering opportunities.

In preparation for non-cooperative preemption, Go 1.11 began compiling
range loops as follows to avoid ever creating a past-the-end pointer:

```go
i, _n, _p := 0, len(s), &s[0]
if i >= _n { goto end } else { goto body }
top:
_p += unsafe.Sizeof(s[0])
body:
{ b }
i++
if i >= _n { goto end } else { goto top }
end:
```

This allows safe-points everywhere in the loop.
Compared to the original loop compilation, it generates slightly more
code, but executes the same number of conditional branch instructions
(n+1) and results in the same number of SSA basic blocks (3).

This lowering does complicate bounds-check elimination.
In Go 1.10, bounds-check elimination knew that `i < _n` in the body
because the body block is dominated by the cond block.
However, in the new lowering, deriving this fact requires detecting
that `i < _n` holds on *both* paths into the body and hence is true in
the body.

### Runtime safe-points

Beyond generated code, the runtime in general is not written to be
arbitrarily preemptible and there are many places that must not be
preempted.
Hence, we would likely disable safe-points by default in the runtime,
except at calls (where they occur now).

While this would have little downside for most of the runtime, there
are some parts of the runtime that could benefit substantially from
non-cooperative preemption, such as memory functions like `memmove`.
Non-cooperative preemption is an excellent way to make these
preemptible without slowing down the common case, since we would only
need to mark their register maps (which would often be empty for
functions like `memmove` since all pointers would already be protected
by arguments).

Over time we may opt-in more of the runtime.

### Unsafe standard library code

The Windows syscall package contains many `unsafe.Pointer` conversions
that don't follow the `unsafe.Pointer` rules.
It broadly makes shaky assumptions about safe-point behavior,
liveness, and when stack movement can happen.
It would likely need a thorough auditing, or would need to be opted
out like the runtime.

Perhaps more troubling is that some of the Windows syscall package
types have `uintptr` fields that are actually pointers, hence forcing
callers to perform unsafe pointer conversions.
For example, see issue [#21376](https://golang.org/issue/21376).
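
The shape of the problem is roughly the following (the type and field
names here are hypothetical stand-ins, not the actual `syscall`
types): because the field is declared `uintptr`, callers must
manufacture its value from a real pointer, leaving a window in which
the pointed-to object is reachable only through an integer.

```go
package p

import "unsafe"

type attrList struct{ /* fields elided */ }

// Hypothetical stand-in for the problematic pattern in the Windows
// syscall types: a field that is semantically a pointer but typed uintptr.
type procAttrs struct {
	AttrList uintptr
}

func setAttrs(a *procAttrs, list *attrList) {
	// As far as the GC is concerned, *list is now referenced only through
	// an integer, which violates the unsafe.Pointer rules.
	a.AttrList = uintptr(unsafe.Pointer(list))
}
```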

### Ensuring progress with unsafe-points

We propose simply giving up and retrying later when a goroutine is
interrupted at an unsafe-point.
One danger of this is that safe-points may be rare in tight loops.
However, in many cases, there are more sophisticated alternatives to
this approach.

For interruptions in the runtime or in functions without any
safe-points (such as assembly), the signal handler could unwind the
stack and insert a return trampoline at the next return to a function
with safe-point metadata.
The runtime could then let the goroutine continue running and the
trampoline would pause it as soon as possible.

For write barriers and `unsafe.Pointer` sequences, the compiler could
insert a cheap, explicit preemption check at the end of the sequence.
For example, the runtime could modify some register that would be
checked at the end of the sequence and let the thread continue
executing.
In the write barrier sequence, this could even be the register that
the write barrier flag was loaded into, and the compiler could insert
a simple register test and conditional branch at the end of the
sequence.
To even further shrink the sequence, the runtime could put the address
of the stop function in this register so the stop sequence would be
just a register call and a jump.
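
In Go-like pseudocode, the register-reuse idea is roughly the
following (a conceptual sketch with hypothetical names, not the
compiler's actual output); the runtime would request preemption by
flipping the register holding `flag` in the thread's saved signal
context:

```go
// What the compiler could emit for the pointer store "obj.field = ptr".
flag := writeBarrierEnabled // loaded once into a register
if flag {
	gcWriteBarrier(&obj.field, ptr) // write barrier path
} else {
	obj.field = ptr // direct write
}
// Re-checking the same register gives the sequence a cheap preemption
// point: a nonzero value now also means "stop for preemption".
if flag {
	preemptCheck() // hypothetical; returns immediately if nothing is pending
}
```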

Alternatives to this check include forward and reverse simulation.
Forward simulation is tricky because the compiler must be careful to
only generate operations the runtime knows how to simulate.
Reverse simulation is easy *if* the compiler can always generate a
restartable sequence (simply move the PC back to the write barrier
flag check), but quickly becomes complicated if there are multiple
writes in the sequence or more complex writes such as DUFFCOPY.


## Other considerations

All of the proposed approaches to non-cooperative preemption involve
stopping a running goroutine by sending its thread an OS signal.
This section discusses general consequences of this.

**Windows support.** Unlike fault-based loop preemption, signaled
preemption is quite easy to support in Windows because it provides
`SuspendThread` and `GetThreadContext`, which make it trivial to get a
thread's register set.

**Choosing a signal.** We have to choose a signal that is unlikely to
interfere with existing uses of signals or with debuggers.
There are no perfect choices, but there are some heuristics:

1. It should be a signal that's passed-through by debuggers by
   default.
   On Linux, this is SIGALRM, SIGURG, SIGCHLD, SIGIO, SIGVTALRM,
   SIGPROF, and SIGWINCH, plus some glibc-internal signals.

2. It shouldn't be used internally by libc in mixed Go/C binaries
   because libc may assume it's the only thing that can handle these
   signals.
   For example, SIGCANCEL or SIGSETXID.

3. It should be a signal that can happen spuriously without
   consequences.
   For example, SIGALRM is a bad choice because the signal handler
   can't tell if it was caused by the real process alarm or not
   (arguably this means the signal is broken, but I digress).
   SIGUSR1 and SIGUSR2 are also bad because those are often used in
   meaningful ways by applications.

4. We need to deal with platforms without real-time signals (like
   macOS), so those are out.

We use SIGURG because it meets all of these criteria, is extremely
unlikely to be used by an application for its "real" meaning (both
because out-of-band data is basically unused and because SIGURG
doesn't report which socket has the condition, making it pretty
useless), and even if it is, the application has to be ready for
spurious SIGURG. SIGIO wouldn't be a bad choice either, but is more
likely to be used for real.

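As a Linux-only illustration (not runtime code) of directing a signal
at one specific thread, which is the kind of targeting the runtime
would use internally, and of why applications must already tolerate
spurious SIGURG:

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"syscall"
)

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGURG) // a SIGURG user must expect spurious deliveries

	tids := make(chan int, 1)
	go func() {
		runtime.LockOSThread() // pin this goroutine to one OS thread
		tids <- syscall.Gettid()
		select {} // keep the thread around
	}()

	// Deliver SIGURG to that one thread only, via tgkill(2).
	if err := syscall.Tgkill(syscall.Getpid(), <-tids, syscall.SIGURG); err != nil {
		panic(err)
	}
	fmt.Println("received:", <-sigs)
}
```
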
**Scheduler preemption.** This mechanism is well-suited to temporary
preemptions where the same goroutine will resume after the preemption
because we don't need to save the full register state and can rely on
the existing signal return path to restore the full register state.
This applies to all GC-related preemptions, but it's not as well
suited to permanent preemption performed by the scheduler.
However, we could still build on this mechanism.
For example, since most of the time goroutines self-preempt, we only
need to save the full signal state in the uncommon case, so the `g`
could contain a pointer to its full saved state that's only used after
a forced preemption.
Restoring the full signal state could be done by either writing the
architecture-dependent code to restore the full register set (a
beefed-up `runtime.gogo`), or by self-signaling, swapping in the
desired context, and letting the OS restore the full register set.

**Targeting and resuming.** In contrast with fault-based loop
preemption, signaled preemption can be targeted at a specific thread
and can immediately resume.
Thread-targeting is a little different from cooperative preemption,
which is goroutine-targeted.
However, in many cases this is actually better, since targeting
goroutines for preemption is racy and hence requires retry loops that
can add significantly to STW time.
Taking advantage of this for stack scanning will require some
restructuring of how we track GC roots, but the result should
eliminate the blocking retry loop we currently use.

**Non-pointer pointers.** This has the potential to expose incorrect
uses of `unsafe.Pointer` for transiently storing non-pointers.
Such uses are a clear violation of the `unsafe.Pointer` rules, but
they may happen (especially in, for example, cgo-using code).
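
For example (an illustration of the violation, not a pattern to copy),
code that smuggles an integer handle through a pointer type creates a
value that looks like a pointer to the collector but points into no Go
allocation; once the collector can observe the goroutine at almost any
instruction, such values are far more likely to be seen by it:

```go
package p

import "unsafe"

// Violates the unsafe.Pointer rules: a non-pointer value stored in a
// pointer-typed slot.
func smuggle(handle uintptr) unsafe.Pointer {
	return unsafe.Pointer(handle) // not a real pointer
}
```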
484
Austin Clementsfc58df52018-03-26 15:41:40 -0400485
486## Alternatives
487
Austin Clementsb7216a92018-07-09 17:16:48 -0400488### Single-stepping
489
Austin Clements87b05a02019-01-18 14:28:40 -0500490Rather than making an effort to be able to stop at any instruction,
491the compiler could emit metadata for safe-points only at back-edges
492and the runtime could use hardware single-stepping support to advance
493the thread to a safe-point (or a point where the compiler has provided
494a branch to reach a safe-point, like in the current loop preemption
495approach).
Austin Clementsfc58df52018-03-26 15:41:40 -0400496This works (somewhat surprisingly), but thoroughly bamboozles
497debuggers since both the debugger and the operating system assume the
498debugger owns single-stepping, not the process itself.
499This would also require the compiler to provide register flushing
500stubs for these safe-points, which increases code size (and hence
Austin Clements5c6560d2019-01-18 14:28:40 -0500501instruction cache pressure) as well as stack size, much like
502cooperative loop preemption.
503However, unlike cooperative loop preemption, this approach would have
504no effect on mainline code size or performance.
Austin Clementsfc58df52018-03-26 15:41:40 -0400505
### Jump rewriting

We can solve the problems of single-stepping by instead rewriting the
next safe-point jump instruction after the interruption point to jump
to a preemption path and then resuming execution as usual.
To make this easy, the compiler could leave enough room (via padding
NOPs) so only the jump target needs to be modified.

This approach has the usual drawbacks of modifiable code.
It's a security risk, it breaks text page sharing, and simply isn't
allowed on iOS.
It also can't target an individual goroutine (since another goroutine
could be executing the same code) and may have odd interactions with
concurrent execution on other cores.

### Out-of-line execution

A further alternative in the same vein, but one that doesn't require
modifying existing text, is out-of-line execution.
In this approach, the signal handler relocates the instruction stream
from the interruption point to the next safe-point jump into a
temporary buffer, patches it to jump into the runtime at the end, and
resumes execution in this relocated sequence.

This solves most of the problems with single-stepping and jump
rewriting, but is quite complex to implement and requires substantial
implementation effort for each platform.
It also isn't allowed on iOS.

There is precedent for this sort of approach.
For example, when Linux uprobes injects an INT3, it relocates the
overwritten instructions into an "execute out-of-line" area to avoid
the usual problems with resuming from an INT3 instruction.
[The
implementation](https://github.com/torvalds/linux/blob/v4.18/arch/x86/kernel/uprobes.c)
is surprisingly simple given the complexity of the x86 instruction
encoding, but is still quite complex.