blob: 28728171066d9a3505fc72d02712df47c45ea470 [file] [log] [blame] [view]
# Proposal: Low-cost defers through inline code, and extra funcdata to manage the panic case
Author(s): Dan Scales, Keith Randall, and Austin Clements
(with input from many others, including Russ Cox and Cherry Zhang)
Last updated: 2019-09-23
Discussion at https://golang.org/issue/34481
General defer performance discussion at https://golang.org/issue/14939.
## Abstract
As of Go 1.13, most `defer` operations take about 35ns (reduced from about 50ns
in Go 1.12).
In contrast, a direct call takes about 6ns.
This gap incentivizes engineers to eliminate `defer` operations from hot code
paths, which takes away time from more productive tasks, leads to less
maintainable code (e.g., if a `panic` is later introduced, the "optimization" is
no longer correct), and discourages people from using a language feature when it
would otherwise be an appropriate solution to a problem.
We propose a way to make most `defer` calls no more expensive than open-coding the
call, hence eliminating the incentives to shy away from using this language
feature to its fullest extent.
## Background
Go 1.13 implements the `defer` statement by calling into the runtime to push a
"defer object" onto the defer chain.
Then, in any function that contains `defer` statements, the compiler inserts a
call to `runtime.deferreturn` at every function exit point to unwind that
function's defers.
Both of these cause overhead: the defer object must be populated with function
call information (function pointer, arguments, closure information, etc.) when
it is added to the chain, and `deferreturn` must find the right defers to
unwind, copy out the call information, and invoke the deferred calls.
Furthermore, this inhibits compiler optimizations like inlining, since the defer
functions are called from the runtime using the defer information.
When a function panics, the runtime runs the deferred calls on the defer chain
until one of these calls `recover` or it exhausts the chain, resulting in a
fatal panic.
The stack itself is *not* unwound unless a deferred call recovers.
This has the important property that examining the stack from a deferred call
run during a panic will include the panicking frame, even if the defer was
pushed by an ancestor of the panicking frame.
In general, this defer chain is necessary since a function can defer an
unbounded or dynamic number of calls that must all run when it returns.
For example, a `defer` statement can appear in a loop or an `if` block.
This also means that, in general, the defer objects must be heap-allocated,
though the runtime uses an allocation pool to reduce the cost of the allocation.
This is notably different from exception handling in C++ or Java, where the
applicable set of `except` or `finally` blocks can be determined statically at
every program counter in a function.
In these languages, the non-exception case is typically inlined and exception
handling is driven by a side table giving the locations of the `except` and
`finally` blocks that apply to each PC.
However, while Go's `defer` mechanism permits unbounded calls, the vast majority
of functions that use `defer` invoke each `defer` statement at most once, and do
not invoke `defer` in a loop.
Go 1.13 adds an [optimization](https://golang.org/cl/171758) to stack-allocate
defer objects in this case, but they must still be pushed and popped from the
defer chain.
This applies to 363 out of the 370 static defer sites in the `cmd/go` binary and
speeds up this case by 30% relative to heap-allocated defer objects.
This proposal combines this insight with the insights used in C++ and Java to
make the non-panic case of most `defer` operations no more expensive than the
manually open-coded case, while retaining correct `panic` handling.
## Requirements
While the common case of defer handling is simple enough, it can interact in
often non-obvious ways with things like recursive panics, recover, and stack
traces.
Here we attempt to enumerate the requirements that any new defer implementation
should likely satisfy, in addition to those in the language specification for
[Defer statements](https://golang.org/ref/spec#Defer_statements) and [Handling
panics](https://golang.org/ref/spec#Handling_panics).
1. Executing a `defer` statement logically pushes a deferred function call onto
a per-goroutine stack.
Deferred calls are always executed starting from the top of this stack (hence
in the reverse order of the execution of `defer` statements).
Furthermore, each execution of a `defer` statement corresponds to exactly one
deferred call (except in the case of program termination, where a deferred
function may not be called at all).
2. Defer calls are executed in one of two ways.
Whenever a function call returns normally, the runtime starts popping and
executing all existing defer calls for that stack frame only (in reverse order
of original execution).
Separately, whenever a panic (or a call to Goexit) occurs, the runtime starts
popping and executing all existing defer calls for the entire defer stack.
The execution of any defer call may be interrupted by a panic within the
execution of the defer call.
3. A program may have multiple outstanding panics, since a recursive (second)
panic may occur during any of the defer calls being executed during the
processing of the first panic.
A previous panic is aborted if the processing of defers by the new panic
reaches the frame where the previous panic was processing defers when the new
panic happened.
When a defer call returns that did a successful `recover` that applies to a
panic, the stack is immediately unwound to the frame which contains the defer
that did the recover call, and any remaining defers in that frame are
executed.
Normal execution continues in the preceding frame (but note that normal
execution may actually be continuing a defer call for an outer panic).
Any panic that has not been recovered or aborted must still appear on the
caller stack.
Note that the first panic may never continue its defer processing, if the
second panic actually successfully runs all defer calls, but the original
panic must appear on the stack during all the processing by the second panic.
4. When a defer call is executed because a function is returning normally
(whether there are any outstanding panics or not), the call site of a
deferred call must appear to be the function that invoked `defer` to push
that function on the defer stack, at the line where that function is
returning.
A consequence of this is that, if the runtime is executing deferred calls in
panic mode and a deferred call recovers, it must unwind the stack immediately
after that deferred call returns and before executing another deferred call.
5. When a defer call is executed because of an explicit panic, the call stack of
a deferred function must include `runtime.gopanic` and the frame that
panicked (and its callers) immediately below the deferred function call.
As mentioned, the call stack must also include any outstanding previous
panics.
If a defer call is executed because of a run-time panic, the same condition
applies, except that `runtime.gopanic` does not necessarily need to be on the
stack.
(In the current gc-go implementation, `runtime.gopanic` does appear on
the stack even for run-time panics.)
## Proposal
We propose optimizing deferred calls in functions where every `defer` is
executed at most once (specifically, a `defer` may be on a conditional path, but
is never in a loop in the control-flow graph).
In this optimization, the compiler assigns a bit for every `defer` site to
indicate whether that defer had been reached or not.
The `defer` statement itself simply sets the corresponding bit and stores all
necessary arguments in specific stack slots.
Then, at every exit point of the function, the compiler open-codes each deferred
call, protected by (and clearing) each corresponding bit.
For example, the following:
```go
defer f1(a)
if cond {
defer f2(b)
}
body...
```
would compile to
```go
deferBits |= 1<<0
tmpF1 = f1
tmpA = a
if cond {
deferBits |= 1<<1
tmpF2 = f2
tmpB = b
}
body...
exit:
if deferBits & 1<<1 != 0 {
deferBits &^= 1<<1
tmpF2(tmpB)
}
if deferBits & 1<<0 != 0 {
deferBits &^= 1<<0
tmpF1(tmpA)
}
```
In order to ensure that the value of `deferBits` and all the tmp variables are
available in case of a panic, these variables must be allocated explicit stack
slots and the stores to deferBits and the tmp variables (`tmpF1`, `tmpA`, etc.)
must write the values into these stack slots.
In addition, the updates to `deferBits` in the defer exit code must explicitly
store the `deferBits` value to the corresponding stack slot.
This will ensure that panic processing can determine exactly which defers have
been executed so far.
However, the defer exit code can still be optimized significantly in many cases.
We can refer directly to the `deferBits` and tmpA values (in the SSA sense),
and these accesses can therefore be optimized in terms of using existing values
in registers, propagating constants, etc.
Also, if the defers were called unconditionally, then constant propagation may
in some cases to eliminate the checks on `deferBits` (because the value of
`deferBits` is known statically at the exit point).
If there are multiple exits (returns) from the function, we can either duplicate
the defer exit code at each exit, or we can have one copy of the defer exit code
that is shared among all (or most) of the exits.
Note that any sharing of defer-exit code code may lead to less specific line
numbers (which dont indicate the exact exit location) if the user happens to
look at the call stack while in a call made by the defer exit code.
For any function that contains a defer which could be executed more than once
(e.g. occurs in a loop), we will fall back to the current way of handling
defers.
That is, we will create a defer object at each defer statement and push it on to
the defer chain.
At function exit, we will call deferreturn to execute an active defer objects
for that function.
We may similarly revert to the defer chain implementation if there are too many
defers or too many function exits.
Our goal is to optimize the common cases where current defers overheads show up,
which is typically in small functions with only a small number of defers
statements.
## Panic processing
Because no actual defer records have been created, panic processing is quite
different and somewhat more complex in the open-coded approach.
When generating the code for a function, the compiler also emits an extra set of
`FUNCDATA` information that records information about each of the open-coded
defers.
For each open-coded defer, the compiler emits `FUNCDATA` that specifies the
exact stack locations that store the function pointer and each of the arguments.
It also emits the location of the stack slot containing `deferBits`.
Since stack frames can get arbitrarily large, the compiler uses a varint
encoding for the stack slot offsets.
In addition, for all functions with open-coded defers, the compiler adds a small
segment of code that does a call to `runtime.deferreturn` and then returns.
This code segment is not reachable by the main code of the function, but is used
to unwind the stack properly when a panic is successfully recovered.
To handle a `panic`, the runtime conceptually walks the defer chain in parallel
with the stack in order to interleave execution of pushed defers with defers in
open-coded frames.
When the runtime encounters an open-coded frame `F` executing function `f`, it
executes the following steps.
1. The runtime reads the funcdata for function `f` that contains the open-defer
information.
2. Using the information about the location in frame `F` of the stack slot for
`deferBits`, the runtime loads the current value of `deferBits` for this
frame.
The runtime processes each of the active defers, as specified by the value of
`deferBits`, in reverse order.
3. For each active defer, the runtime loads the function pointer for the defer
call from the appropriate stack slot.
It also builds up an argument frame by copying each of the defer arguments
from its specified stack slot to the appropriate location in the argument
frame.
It then updates `deferBits` in its stack slot after masking off the bit for
the current defer.
Then it uses the function pointer and argument frame to call the deferred
function.
4. If the defer call returns normally without doing a recovery, then the runtime
continues executing active defer calls for frame F until all active defer
calls have finished.
5. If any defer call returns normally but has done a successful recover, then
the runtime stops processing defers in the current frame.
There may or may not be any remaining defers to process.
The runtime then arranges to jump to the `deferreturn` code segment and
unwind the stack to frame `F`, by simultaneously setting the PC to the
address of the `deferreturn` segment and setting the SP to the appropriate
value for frame `F`.
The `deferreturn` code segment then calls back into the runtime.
The runtime can now process any remaining active defers from frame `F`.
But for these defers, the stack has been appropriately unwound and the defer
appears to be called directly from function `f`.
When all defers for the frame have finished, the deferreturn finishes and the
code segment returns from frame F to continue execution.
If a deferred call in step 3 itself panics, the runtime starts its normal panic
processing again.
For any frame with open-coded defers that has already run some defers, the
deferBits value at the specified stack slot will always accurately reflect the
remaining defers that need to be run.
The processing for `runtime.Goexit` is similar.
The main difference is that there is no panic happening, so there is no need to
check for or do special handling for recovery in `runtime.Goexit`.
A panic could happen while running defer functions for `runtime.Goexit`, but
that will be handled in `runtime.gopanic`.
## Rationale
One other approach that we extensively considered (and prototyped) also has
inlined defer code for the normal case, but actual executes the defer exit code
directly even in the panic case.
Executing the defer exit code in the panic case requires duplication of stack
frame F and some complex runtime code to start execution of the defer exit code
using this new duplicated frame and to regain control when the defer exit code
complete.
The required runtime code for this approach is much more architecture-dependent
and seems to be much more complex (and possibly fragile).
## Compatibility
This proposal does not change any user-facing APIs, and hence satisfies the [compatibility
guidelines](https://golang.org/doc/go1compat).
## Implementation
An implementation has been mostly done.
The change is [here](https://go-review.googlesource.com/c/go/+/190098/6).
Comments on the design or implementation are very welcome.
Some miscellaneous implementation details:
1. We need to restrict the number of defers in a function to the size of the
deferBits bitmask.
To minimize code size, we currently make deferBits to be 8 bits, and dont do
open-coded defers if there are more than 8 defers in a function.
If there are more than 8 defers in a function, we revert to the standard
defer chain implementation.
2. The deferBits variable and defer arguments variables (such as `tmpA`) must be
declared (via `OpVarDef`) in the entry block, since the unconditional defer
exit code at the bottom of the function will access them, so these variables
are live throughout the entire function.
(And, of course, they can be accessed by panic processing at any point within
the function that might cause a panic.)
For any defer argument stack slots that are pointers (or contain pointers),
we must initialize those stack slots to zero in the entry block.
The initialization is required for garbage collection, which doesnt know
which of these defer arguments are active (i.e. which of the defer sites have
been reached, but the corresponding defer call has not yet happened)
3. Because the `deferreturn` code segment is disconnected from the rest of the
function, it would not normally indicate that any stack slots are live.
However, we want the liveness information at the `deferreturn` call to
indicate that all of the stack slots associated with defers (which may
include pointers to variables accessed by closures) and all of the return
values are live.
We must explicitly set the liveness for the `deferreturn` call to be the same
as the liveness at the first defer call on the defer exit path.