blob: 6bb4cef055dd565c824a93ad5e16fbf0d074d086 [file] [log] [blame] [edit]
// Copyright 2023 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
/*
Package inline implements inlining of Go function calls.
The client provides information about the caller and callee,
including the source text, syntax tree, and type information, and
the inliner returns the modified source file for the caller, or an
error if the inlining operation is invalid (for example because the
function body refers to names that are inaccessible to the caller).
Although this interface demands more information from the client
than might seem necessary, it enables smoother integration with
existing batch and interactive tools that have their own ways of
managing the processes of reading, parsing, and type-checking
packages. In particular, this package does not assume that the
caller and callee belong to the same token.FileSet or
types.Importer realms.
There are many aspects to a function call. It is the only construct
that can simultaneously bind multiple variables of different
explicit types, with implicit assignment conversions. (Neither var
nor := declarations can do that.) It defines the scope of control
labels, of return statements, and of defer statements. Arguments
and results of function calls may be tuples even though tuples are
not first-class values in Go, and a tuple-valued call expression
may be "spread" across the argument list of a call or the operands
of a return statement. All these unique features mean that in the
general case, not everything that can be expressed by a function
call can be expressed without one.
So, in general, inlining consists of modifying a function or method
call expression f(a1, ..., an) so that the name of the function f
is replaced ("literalized") by a literal copy of the function
declaration, with free identifiers suitably modified to use the
locally appropriate identifiers or perhaps constant argument
values.
Inlining must not change the semantics of the call. Semantics
preservation is crucial for clients such as codebase maintenance
tools that automatically inline all calls to designated functions
on a large scale. Such tools must not introduce subtle behavior
changes. (Fully inlining a call is dynamically observable using
reflection over the call stack, but this exception to the rule is
explicitly allowed.)
In many cases it is possible to entirely replace ("reduce") the
call by a copy of the function's body in which parameters have been
replaced by arguments. The inliner supports a number of reduction
strategies, and we expect this set to grow. Nonetheless, sound
reduction is surprisingly tricky.
The inliner is in some ways like an optimizing compiler. A compiler
is considered correct if it doesn't change the meaning of the
program in translation from source language to target language. An
optimizing compiler exploits the particulars of the input to
generate better code, where "better" usually means more efficient.
When a case is found in which it emits suboptimal code, the
compiler is improved to recognize more cases, or more rules, and
more exceptions to rules; this process has no end. Inlining is
similar except that "better" code means tidier code. The baseline
translation (literalization) is correct, but there are endless
rules--and exceptions to rules--by which the output can be
improved.
The following section lists some of the challenges, and ways in
which they can be addressed.
- All effects of the call argument expressions must be preserved,
both in their number (they must not be eliminated or repeated),
and in their order (both with respect to other arguments, and any
effects in the callee function).
This must be the case even if the corresponding parameters are
never referenced, are referenced multiple times, referenced in
a different order from the arguments, or referenced within a
nested function that may be executed an arbitrary number of
times.
Currently, parameter replacement is not applied to arguments
with effects, but with further analysis of the sequence of
strict effects within the callee we could relax this constraint.
- When not all parameters can be substituted by their arguments
(e.g. due to possible effects), if the call appears in a
statement context, the inliner may introduce a var declaration
that declares the parameter variables (with the correct types)
and assigns them to their corresponding argument values.
The rest of the function body may then follow.
For example, the call
f(1, 2)
to the function
func f(x, y int32) { stmts }
may be reduced to
{ var x, y int32 = 1, 2; stmts }.
There are many reasons why this is not always possible. For
example, true parameters are statically resolved in the same
scope, and are dynamically assigned their arguments in
parallel; but each spec in a var declaration is statically
resolved in sequence and dynamically executed in sequence, so
earlier parameters may shadow references in later ones.
- Even an argument expression as simple as ptr.x may not be
referentially transparent, because another argument may have the
effect of changing the value of ptr.
This constraint could be relaxed by some kind of alias or
escape analysis that proves that ptr cannot be mutated during
the call.
- Although constants are referentially transparent, as a matter of
style we do not wish to duplicate literals that are referenced
multiple times in the body because this undoes proper factoring.
Also, string literals may be arbitrarily large.
- If the function body consists of statements other than just
"return expr", in some contexts it may be syntactically
impossible to reduce the call. Consider:
if x := f(); cond { ... }
Go has no equivalent to Lisp's progn or Rust's blocks,
nor ML's let expressions (let param = arg in body);
its closest equivalent is func(param){body}(arg).
Reduction strategies must therefore consider the syntactic
context of the call.
In such situations we could work harder to extract a statement
context for the call, by transforming it to:
{ x := f(); if cond { ... } }
- Similarly, without the equivalent of Rust-style blocks and
first-class tuples, there is no general way to reduce a call
to a function such as
func(params)(args)(results) { stmts; return expr }
to an expression such as
{ var params = args; stmts; expr }
or even a statement such as
results = { var params = args; stmts; expr }
Consequently the declaration and scope of the result variables,
and the assignment and control-flow implications of the return
statement, must be dealt with by cases.
- A standalone call statement that calls a function whose body is
"return expr" cannot be simply replaced by the body expression
if it is not itself a call or channel receive expression; it is
necessary to explicitly discard the result using "_ = expr".
Similarly, if the body is a call expression, only calls to some
built-in functions with no result (such as copy or panic) are
permitted as statements, whereas others (such as append) return
a result that must be used, even if just by discarding.
- If a parameter or result variable is updated by an assignment
within the function body, it cannot always be safely replaced
by a variable in the caller. For example, given
func f(a int) int { a++; return a }
The call y = f(x) cannot be replaced by { x++; y = x } because
this would change the value of the caller's variable x.
Only if the caller is finished with x is this safe.
A similar argument applies to parameter or result variables
that escape: by eliminating a variable, inlining would change
the identity of the variable that escapes.
- If the function body uses 'defer' and the inlined call is not a
tail-call, inlining may delay the deferred effects.
- Because the scope of a control label is the entire function, a
call cannot be reduced if the caller and callee have intersecting
sets of control labels. (It is possible to α-rename any
conflicting ones, but our colleagues building C++ refactoring
tools report that, when tools must choose new identifiers, they
generally do a poor job.)
- Given
func f() uint8 { return 0 }
var x any = f()
reducing the call to var x any = 0 is unsound because it
discards the implicit conversion to uint8. We may need to make
each argument-to-parameter conversion explicit if the types
differ. Assignments to variadic parameters may need to
explicitly construct a slice.
An analogous problem applies to the implicit assignments in
return statements:
func g() any { return f() }
Replacing the call f() with 0 would silently lose a
conversion to uint8 and change the behavior of the program.
- When inlining a call f(1, x, g()) where those parameters are
unreferenced, we should be able to avoid evaluating 1 and x
since they are pure and thus have no effect. But x may be the
last reference to a local variable in the caller, so removing
it would cause a compilation error. Parameter substitution must
avoid making the caller's local variables unreferenced (or must
be prepared to eliminate the declaration too---this is where an
iterative framework for simplification would really help).
- An expression such as s[i] may be valid if s and i are
variables but invalid if either or both of them are constants.
For example, a negative constant index s[-1] is always out of
bounds, and even a non-negative constant index may be out of
bounds depending on the particular string constant (e.g.
"abc"[4]).
So, if a parameter participates in any expression that is
subject to additional compile-time checks when its operands are
constant, it may be unsafe to substitute that parameter by a
constant argument value (#62664).
More complex callee functions are inlinable with more elaborate and
invasive changes to the statements surrounding the call expression.
TODO(adonovan): future work:
- Handle more of the above special cases by careful analysis,
thoughtful factoring of the large design space, and thorough
test coverage.
- Compute precisely (not conservatively) when parameter
substitution would remove the last reference to a caller local
variable, and blank out the local instead of retreating from
the substitution.
- Afford the client more control such as a limit on the total
increase in line count, or a refusal to inline using the
general approach (replacing name by function literal). This
could be achieved by returning metadata alongside the result
and having the client conditionally discard the change.
- Support inlining of generic functions, replacing type parameters
by their instantiations.
- Support inlining of calls to function literals ("closures").
But note that the existing algorithm makes widespread assumptions
that the callee is a package-level function or method.
- Eliminate explicit conversions of "untyped" literals inserted
conservatively when they are redundant. For example, the
conversion int32(1) is redundant when this value is used only as a
slice index; but it may be crucial if it is used in x := int32(1)
as it changes the type of x, which may have further implications.
The conversions may also be important to the falcon analysis.
- Allow non-'go' build systems such as Bazel/Blaze a chance to
decide whether an import is accessible using logic other than
"/internal/" path segments. This could be achieved by returning
the list of added import paths instead of a text diff.
- Inlining a function from another module may change the
effective version of the Go language spec that governs it. We
should probably make the client responsible for rejecting
attempts to inline from newer callees to older callers, since
there's no way for this package to access module versions.
- Use an alternative implementation of the import-organizing
operation that doesn't require operating on a complete file
(and reformatting). Then return the results in a higher-level
form as a set of import additions and deletions plus a single
diff that encloses the call expression. This interface could
perhaps be implemented atop imports.Process by post-processing
its result to obtain the abstract import changes and discarding
its formatted output.
*/
package inline