Draft: additions to go/ast and go/token to support generics

Authors: Rob Findley, Robert Griesemer

Last Updated: 2021-06-16

About this document

This CL holds a working-copy of the to-be-proposed changes to go/ast and go/token for generics.

Once the type set proposal has been resolved and we have sufficient experience with the APIs below, it will be filed as an official proposal. This document might just be embedded in the proposal issue, depending on how long it gets.

Early feedback is welcome, by commenting on the CL 328609

Abstract

This document proposes changes to go/ast to store the additional syntactic information necessary for the type parameters proposal (#43651), including the recent amendment for type sets (#45346). Specifically, it describes both the new fields and types that will hold type parameters and type arguments, and the invariants of this new data.

This document deals only with syntax. There must also be changes to go/types, but that is left for a separate proposal.

While not depending on the exact semantics of type sets, this document assumes that some form of #45346 is accepted for expressing type-restricted elements in interfaces, so that for example the syntax ~a|b|~c is a valid embedded element in an interface expression.

Background

Syntax Changes

See the type parameters proposal for a full discussion of the language changes to support generics, but to summarize the changes in syntax:

Type and function declarations get optional type parameters, as in type List[T any] ... or func f[T1, T2 any]() { ... }. Type parameters are a parameter list.
Generic types may be instantiated with one or more type arguments, to make them non-generic type expressions, as in l := &List[int]{} or type intList List[int]. Type arguments are an expression list.
Generic functions may be instantiated with one or more type arguments when they are called or used as function values, as in g := f[int] or x := f[int](). Function type arguments are an expression list.
Interface types can have new embedded elements that restrict the set of types that may implement them, for example interface { ~int64|~float64 }. Such elements are composite expressions consisting of type expressions, possibly prefixed with a new tilde token ~ and joined by the | token.

The changes to go/ast to support each instance of new syntax are discussed below. Function instantiation is listed separately from type instantiation above, as the options to represent it in the AST could be different; though, as will be seen, this proposal unifies them.

Considerations for API changes

Before detailing the proposed changes to go/ast, it is worth discussing what makes a change to go/ast break compatibility, what impact changes can have on users beyond pure compatibility, and what type of information is available to the parser at the time we choose a representation for syntax.

As described in the go1 compatibility promise, it is not enough for standard library packages to simply make no breaking API changes: valid programs must continue to both compile and run (or put differently: the API of a library is both the structure and runtime behavior of its exported objects).

This is important, because the definition of a ‘valid program’ using go/ast is arguably a gray area. In go/ast, there is no separation between the interface to AST nodes and the data they contain: the node set consists entirely of pointers to structs where every field is exported. Is it a valid use of go/ast to assume that every field is exported (e.g. walk nodes using reflection)? Is it valid to assume that the set of nodes is complete (e.g. by panicking in the default clause of a type switch)? Which fields may be assumed to be non-nil?

Writing down the answers to these questions is perhaps worthy of a separate proposal, but for the purpose of this document, I propose the following heuristic:

A breaking change to go/ast (or go/parser) is any change that modifies (1) the parsed representation of existing, valid Go code, or (2) the per-node invariants that are preserved in the representation of invalid Go code. We consider all documented invariants plus any additional invariants that are assumed in significant amounts of code.

Of these two clauses, (1) is straightforward and hopefully uncontroversial: code that is valid in go1.17 must parse to an equivalent AST in go1.18. (2) is more subtle: there is no guarantee that the parsing of invalid code will not change. After all, generic code is invalid in go1.17. Rather, the only guarantee is that if a property of existing fields holds for a node type N in all representations of code, invalid or invalid, it should continue to hold. For example, ast.Walk assumes that ast.IndexExpr.Index is never nil. This must be preserved if we use IndexExpr to represent type instantiation, even for invalid instantiation expressions such as var l List[].

The rationale for this heuristic is pragmatic: there is too much code in the wild that makes assumptions about nodes in AST representations; that code should not break.

A couple notable edge cases:

It makes sense to preserve the property that all fields on Nodes are exported. cmd/gofmt makes this assumption, and it is reasonable to assume that other users will have made this assumption as well (and this was the original intent).
There is code in the wild that assumes the completeness of node sets, i.e. panicking if an unknown node is encountered. For example, see issue vscode-go#1551 for x/tools. If we were to consider this a valid use of go/ast, that would mean that we could never introduce a new node type. In order to avoid introducing new nodes, we'd have to pack new syntactic constructs into existing nodes, resulting in cumbersome APIs and increased memory usage. Also, from another perspective, assuming the completeness of node types is not so different from assuming the completeness of fields in struct literals, which is explicitly not guaranteed by the compatibility promise. We should therefore consider adding a new node type a valid change (and do our best to publicize this change to our users).

Finally, when selecting our representation, keep in mind that the parser has access to only local syntactic information. Therefore, it cannot differentiate between, for example, the representation of f[T] in var f []func(); T := 0; f[T]() and func f[S any](){} ... f[T]().

Proposal

For type parameters in type and function declarations

To represent type parameters in type and function declarations, both ast.TypeSpec and ast.FuncType gain a new TParams *FieldList field, which will be nil in the case of non-generic types and functions.

Alternatively, we could add TParams to ast.FuncDecl rather than ast.FuncType. While this would be a more efficient representation with the current proposal, it is incompatible with potential future language extensions to allow generic function literals or parameterized methods in interfaces.

For type and function instantiation

To represent both type and function instantiation with type arguments, introduce a new node type ast.MultiIndexExpr, which is an expression similar to ast.IndexExpr, but with a slice of indexes rather than a single index:

type MultiIndexExpr struct {
	X Expr
	Lbrack token.Pos
	Indexes []Expr
	Rbrack token.Pos
}

Type instantiation will be parsed into a single IndexExpr if there is only one index, and a MultiIndexExpr if there is more than one index. Specifically, when encountering an expression f[] or f[expr1, ..., exprN] with N argument expressions, parse as follows:

If N == 1, as in normal index expressions f[expr], parse an IndexExpr.
If N > 1, parse a MultiIndexExpr with Indexes set to the parsed expressions expr1, …, exprN
If N == 0, as in the invalid expression f[], parse an IndexExpr with BadExpr for its Index (this matches the current error recovery behavior).

There were several alternatives considered for representing this syntax. At least two of these alternatives were implemented. They are worth discussing:

Add a new ListExpr node type that holds an expression list, to serve as the IndexExpr.Index when N >= 2. This is an elegant solution, but results in somewhat inefficient storage and, more importantly, a new node type that exists only to alter the meaning of an existing node. This is somewhat inconsistent with the design of other nodes in go/ast, where additional nodes are preferred to overloading existing nodes. Compare with RangeStmt and TypeSwitchStmt, which are distinct nodes in go/ast. Having distinct nodes is generally easier to work with, as each node has more uniform composition.
Overload ast.CallExpr to have a Brackets bool field, so f[T] would be analogous to f(T), but with CallExpr.Brackets set to true. This is roughly equivalent to the MultiIndexExpr node, and allows us to avoid adding a new type. However, it overloads the meaning of CallExpr, and also bloats CallExpr somewhat with the additional field. This was done initially, but proved cumbersome.
Add an IndexExpr.Tail []Expr field to hold additional type arguments. While this means that instantiation logic can deal with just IndexExpr, and avoids a new node type, it adds bloat to every IndexExpr and provides an API that could be error-prone.

For type sets

The new syntax for type restrictions in interfaces can be represented using existing node types.

We can represent the expression ~T1|T2 |~T3 in interface { ~T1|T2|~T3 } as a single embedded expression (i.e. an *ast.Field with empty Names), consisting of unary and binary expressions. Specifically, we can introduce a new token token.TILDE, and represent ~expr as an *ast.UnaryExpr where Op is token.TILDE. Represent expr1|expr2 as an *ast.BinaryExpr where Op is token.OR, as would be done for a value expression involving bitwise-or.