blob: d8e9787c220114cd46e447a95712754881e709e1 [file] [log] [blame] [view]
# Generalized Types
This is a proposal for adding generics to Go, written by Ian Lance
Taylor in March, 2011.
This proposal will not be adopted.
It is being presented as an example for what a complete generics
proposal must cover.
## Introduction
This document describes a possible implementation of generalized types
in Go.
We introduce a new keyword, `gen`, which declares one or more type
parameters: types that are not known at compile time.
These type parameters may then be used in other declarations,
producing generalized types and functions.
Some goals, borrowed from [Garcia et al](https://web.archive.org/web/20170812055356/http://www.crest.iu.edu/publications/prints/2003/comparing_generic_programming03.pdf):
* Do not require an explicit relationship between a definition of a generalized function and its use. The function should be callable with any type that fits the required form.
* Permit interfaces to express relationships between types of methods, as in a comparison function that takes two parameters of the same unknown type.
* Given a generalized type, make it possible to use related types, such as a slice of that type.
* Do not require explicit instantiation of generalized functions.
* Permit type aliasing of generalized types.
The type parameter introduced by a `gen` declaration is a concept that
exists at compile time.
Any actual value that exists at runtime has a specific concrete type:
an ordinary non-generalized type, or a generalized type that has been
instantiated as a concrete type.
Generalized functions will be compiled to handle values whose types
are supplied at runtime.
This is what changes in the language:
* There is a new syntax for declaring a type parameter (or parameters) for the scope of one or more declarations.
* There is a new syntax for specifying the concrete type(s) to use when using something declared with a type parameter.
* There is a new syntax for converting values of concrete type, and untyped constants, to generalized types. Also values of generalized type are permitted in type assertions.
* Within a function, we define the operations permitted on values with a generalized type.
## Syntax
Any package-scope type or function declaration may be preceded with
the new keyword `gen` followed by a list of type parameter names in
square brackets:
```
gen [T] type Vector []T
```
This defines `T` as a type parameter for the generalized type `Vector`.
The scope of `Vector` is the same as it would be if `gen` did not appear.
A use of a generalized type will normally provide specific types to
use for the type parameters.
This is done using square brackets following the generalized type.
```
type VectorInt Vector[int]
var v1 Vector[int]
var v2 Vector[float32]
gen [T1, T2] type Pair struct { first T1; second T2 }
var v3 Pair[int, string]
```
Type parameters may also be used with functions.
```
gen [T] func SetHead(v Vector[T], e T) T {
v[0] = e
return e
}
```
For convenience, we permit a modified version of the factoring syntax
used with `var`, `type`, and `const` to permit a series of
declarations to share the same type parameters.
```
gen [T1, T2] (
type Pair struct { first T1; second T2 }
func MakePair(first T1, second T2) Pair {
return &Pair{first, second}
}
)
```
References to other names declared within the same gen block do not
have to specify the type parameters.
When the type parameters are omitted, they are assumed to simply be
the parameters declared for the block.
In the above example, `Pair` when used as the result type of
`MakePair` is equivalent to `Pair[T1, T2]`.
As with generalized types, we must specify the types when we refer to
a generalized function (but see the section on type deduction, below).
```
var MakeIntPair = MakePair[int, int]
var IntPairZero = MakeIntPair(0, 0)
```
A generalized type can have methods.
```
gen [T] func (v *Vector[T]) SetHeadMethod(e T) T {
v[0] = e
return e
}
```
Of course a method of a generalized type may itself be a generalized function.
```
gen [T, T2] func (v *Vector[T]) Transform(f func(T) T2) Vector[T2]
```
The `gen` keyword may only be used with a type or function.
It may only appear in package scope, not within a function.
One `gen` keyword may appear within the scope of another.
In that case, any use of the generalized type or function must specify
all the type parameters, starting with the outermost ones.
A different way of writing the last example would be:
```
gen [T] (
type Vector []T
gen [T2] func (v *Vector[T]) Transform(f func(T) T2) Vector[T2]
)
var v Vector[int]
var v2 = v.Transform[int, string](f)
```
Type deduction, described below, would permit omitting the
`[int, string]` in the last line, based on the types of `v` and `f`.
### A note on syntax
While the use of the `gen` keyword fits reasonably well into the
existing Go language, the use of square brackets to denote the
specific types is new to Go.
We have considered a number of different approaches:
* Use angle brackets, as in `Pair<int, string>`. This has the advantage of being familiar to C++ and Java programmers. Unfortunately, it means that `f<T>(true)` can be parsed as either a call to function `f<T>` or a comparison of `f<T` (an expression that tests whether `f` is less than `T`) with `(true)`. While it may be possible to construct complex resolution rules, the Go syntax avoids that sort of ambiguity for good reason.
* Overload the dot operator again, as in `Vector.int` or `Pair.(int, string)`. This becomes confusing when we see `Vector.(int)`, which could be a type assertion.
* We considered using dot but putting the type first, as in `int.Vector` or `(int, string).Pair`. It might be possible to make that work without ambiguity, but putting the types first seems to make the code harder to read.
* An earlier version of this proposal used parentheses for names after types, as in `Vector(int)`. However, that proposal was flawed because there was no way to specify types for generalized functions, and extending the parentheses syntax led to `MakePair(int, string)(1, "")` which seems less than ideal.
* We considered various different characters, such as backslash, dollar sign, at-sign or sharp. The square brackets grouped the parameters nicely and provide an acceptable visual appearance.
## Type Deduction
When calling a function, as opposed to referring to it without calling
it, the type parameters may be omitted in some cases.
A function call may omit the type parameters when every type parameter
is used for a regular parameter, or, in other words, there are no type
parameters that are used only for results.
In that case the compiler will compare the actual type of the
argument (`A`) with the type of the generalized parameter (`P`), examining
the arguments from left to right.
`A` and `P` must be identical.
The first time we see a type parameter in `P`, it will be set to the
appropriate portion of `A`.
If the type parameter appears again, it must be identical to the
actual type at that point.
Note that at compile time the argument type may itself be a
generalized type.
The type deduction algorithm is the same.
A type parameter of `P` may match a type parameter of `A`.
Once this match is made, then every subsequent instance of the `P` type
parameter must match the same `A` type parameter.
When doing type deduction with an untyped numeric constant, the
constant is given the type `int`, `float64`, or `complex128` as usual.
Type deduction does not support passing an untyped `nil` constant;
`nil` may only be used with an explicit type conversion (or, of course,
the type parameters may be written explicitly).
For example, these two variables will have the same type and value.
```
var v1 = MakePair[int, int](0, 0)
var v2 = MakePair(0, 0) // [int, int] deduced.
```
## Constraints
The only things that a generalized function can do with a value of
generalized type are the operations inherent to the type&mdash;e.g., the
`Vector` type can be indexed or sliced.
But sometimes we want to be able to say something about the types that
are used as part of a larger type.
Specifically, we want to say that they must implement a particular
interface.
So when listing the identifiers following `gen` we permit an optional
interface type following the name.
```
gen [T Stringer] type PrintableVector []T
```
Now `PrintableVector` may only be used with types that implement the
`Stringer` interface.
The interface may itself be a generalized type.
The scope of each type parameter starts at the `[`, and so we permit
using the type identifier just named with the generalized interface
type.
```
// Types that may be compared with some other type.
gen [T] type Comparer interface {
Compare(T) int // <0, ==0, >0
}
// Vector of elements that may be compared with themselves.
gen [T Comparer[T]] type SortableVector []T
```
## Example
```
package hashmap
gen [Keytype, Valtype] (
type bucket struct {
next *bucket
key Keytype
val Valtype
}
type Hashfn func(Keytype) uint
type Eqfn func(Keytype, Keytype) bool
type Hashmap struct {
hashfn Hashfn
eqfn Eqfn
buckets []bucket
entries int
}
// This function must be called with explicit type parameters, as
// there is no way to deduce the value type. For example,
// h := hashmap.New[int, string](hashfn, eqfn)
func New(hashfn Hashfn, eqfn Eqfn) *Hashmap {
return &Hashmap{hashfn, eqfn, make([]buckets, 16), 0}
}
func (p *Hashmap) Lookup(key Keytype) (val Valtype, found bool) {
h := p.hashfn(key) % len(p.buckets)
for b := p.buckets[h]; b != nil; b = b.next {
if p.eqfn(key, b.key) {
return b.val, true
}
}
return
}
func (p *Hashmap) Insert(key Keytype, val Valtype) (inserted bool) {
// Implementation omitted.
}
) // Ends gen.
package sample
import (
“fmt”
“hashmap”
“os”
)
func hashint(i int) uint {
return uint(i)
}
func eqint(i, j int) bool {
return i == j
}
var v = hashmap.New[int, string](hashint, eqint)
func Add(id int, name string) {
if !v.Insert(id, name) {
fmt.Println(“duplicate id”, id)
os.Exit(1)
}
}
func Find(id int) string {
val, found = v.Lookup(id)
if !found {
fmt.Println(“missing id”, id)
os.Exit(1)
}
}
```
## Language spec changes
This is an outline of the changes required to the language spec.
### Types
A few paragraphs will be added to discuss generalized types.
### Struct types
While a struct may use a type parameter as an anonymous field, within
generalized code only the generalized definition is considered when
resolving field references.
That is, given
```
gen [T] type MyGenStruct struct { T }
type MyRealStruct { i int }
type MyInstStruct MyGenStruct[MyRealStruct]
gen [T] func GetI(p *MyGenStruct[T]) int {
return p.i // INVALID
}
func MyGetI(p *MyInstStruct) int {
return GetI(p)
}
```
the function `GetI` may not refer to the field `i` even though the
field exists when called from `MyGetI`.
(This restriction is fairly obvious if you think about it, but is
explicitly stated for clarity.)
### Type Identity
We define type identity for generalized types.
Two generalized types are identical if they have the same name and the
type parameters are identical.
### Assignability
We define assignability for generalized types.
A value `x` of generalized type `T1` is assignable to a variable of
type `T2` if `T1` and `T2` are identical.
A value `x` of concrete type is never assignable to a variable of
generalized type: a generalized type coercion is required (see below).
Similarly, a value `x` of generalized type is never assignable to a
variable of concrete type: a type assertion is required.
For example (more details given below):
```
gen [T] func Zero() (z T) {
z = 0 // INVALID: concrete to generalized.
z = int(0) // INVALID: concrete to generalized.
z = 0.[T] // Valid: generalized type coercion.
}
gen [T] func ToInt(v T) (r int) {
r = v // INVALID: generalized to concrete
r = int(v) // INVALID: no conversions for gen types
r, ok := v.(int) // Valid: generalized type assertion.
if !ok {
panic(“not int”)
}
}
```
### Declarations and scope
A new section Generalized declarations is added, consisting of a few
paragraphs that describe generalized declarations and the gen syntax.
### Indexes
The new syntax `x[T]` for a generalized type or function is defined,
where `T` is a type and `x` is the name of some type or function
declared within a `gen` scope.
### Type assertions
We define type assertions using generalized types.
Given `x.(T)` where `x` is a value with generalized type and `T` is a
concrete type, the type assertion succeeds if the concrete type of `x`
is identical to `T`, or, if `T` is an interface type, the concrete
type implements the interface `T`.
In other words, pretty much the same as doing a type assertion of a
value of interface type.
If `x` and `T` are both generalized types, we do the same test using
the concrete types of `x` and `T`.
In general these assertions must be checked at runtime.
### Generalized type coercions
We introduce a new syntax for coercing a value of concrete type to a
generalized type.
Where `x` is a value with concrete type and `T` is a generalized type,
the expression `x.[T]` coerces `x` to the generalized type `T`.
The generalized type coercion may succeed or fail, just as with a type
assertion.
However, it is not a pure type assertion, as we permit `x` to be an
untyped constant.
The generalized type coercion succeeds if the concrete type matches
the generalized type, where any parameters of the generalized type
match the appropriate portion of the concrete type.
If the same parameter appears more than once in the generalized type,
it must match identical types in the concrete type.
If the value is an untyped constant, the coercion succeeds if an
assignment of that constant to the concrete type would succeed at
compile time.
### Calls
This section is extended to describe the type deduction algorithm used
to avoid explicit type parameters when possible.
An implicit generalized type conversion is applied to convert the
arguments to the expected generalized type, even though normally
values of concrete type are not assignable to variables of generalized
type.
Type checking ensures that the arguments must be assignable to the
concrete type which is either specified or deduced, and so this
implicit generalized type conversion will always succeed.
When a result parameter has a generalized type, an implicit type
assertion is applied to convert back to the type that the caller
expects, which may be a concrete type.
The type expected by the caller is determined by the type parameters
passed to the function, whether determined via type deduction or not.
This implicit type assertion will always succeed.
For example, in
```
gen [T] func Identity(v T) T { return v }
func Call(i int) { j := Identity(i) }
```
the variable `j` gets the type `int`, and an implicit type assertion
converts the return value of `Identity[int]` to `int`.
### Conversions
Nothing needs to change in this section.
I just want to note explicitly that there are no type conversions for
generalized types other than the standard conversions that apply to
all types.
### Type switches
A type switch may be used on a value of generalized type.
Type switch cases may include generalized types.
The rules are the same as for type assertions.
### For statements
A range clause may be used with a value of generalized type, if the
generalized type is known to be a slice, array, map or channel.
## Implementation
Any actual value in Go will have a concrete type.
The implementation issue that arises is how to compile a function that
has parameters with generalized type.
### Representation
When calling a function that uses type parameters, the type parameters
are passed first, as pointers to a runtime type descriptor.
The type parameters are thus literally additional parameters to the
functions.
### Types
In some cases it will be necessary to create a new type at runtime,
which means creating a new runtime type descriptor.
It will be necessary to ensure that type descriptor comparisons
continue to work correctly.
For example, the hashmap example above will require creating a new
type for each call to `hashmap.New` for the concrete types that are used
in the call.
The reflect package already creates new runtime type descriptors in
the functions `PtrTo`, `ChanOf`, `FuncOf`, etc.
Type reflection on a generalized type will return the appropriate
runtime type descriptor, which may have been newly created.
Calling `Name()` on such a type descriptor will return a name with the
appropriate type parameters: e.g, `“Vector[int]”`.
### Variable declarations
A local variable in a function may be declared with a generalized
type.
In the general case, the size of the variable is unknown, and must be
retrieved from the type descriptor.
Declaring a local variable of unknown size will dynamically allocate
zeroed memory of the appropriate size.
As an optimization the memory may be allocated on the stack when there
is sufficient room.
### Composite literals
A generalized type that is defined to be a struct, array, slice, or
map type may be used to create a composite literal.
The expression has the same generalized type.
The elements of the composite literal must follow the assignability
rules.
### Selectors
When `x` is a value of generalized type that is a struct, `x.f` can
refer to a field of that struct.
Whether `f` is a field of `x` is known at compile time.
The exact offset of the field in the struct value may not be known.
When it is not known, the field offset is retrieved from the type
descriptor at runtime.
Similarly, `x.f` may refer to a method of the type.
In this case the method is always known at compile time.
As noted above under struct types, if a generalized struct type uses a
type parameter as an anonymous field, the compiler does not attempt to
look up a field name in the concrete type of the field at runtime.
### Indexes
A value of a generalized type that is an array, slice or map may be indexed.
Note that when indexing into a map type, the type of the value must be
assignable to the maps key type;
in practice this means that if the maps key type is generalized, the
value must itself have the same generalized type.
Indexing into a generalized array or slice may require multiplying by
the element size found in the type descriptor.
Indexing into a generalized map may require a new runtime function.
### Slices
A value of a generalized type that is an array or slice may itself be
sliced.
This operation is essentially the same as a slice of a value of
concrete type.
### Type Assertions
A type assertion generally requires a runtime check, and in the
general case requires comparing two concrete types at runtime, where
one of the types is known to instantiate some generalized type.
The complexity of the runtime check is linear in the number of tokens
in the generalized type, and requires storage space to store type
parameters during the check.
This check could be inlined into the code, or it could use a general
purpose runtime check that compares the concrete type descriptor to a
similar representation of the generalized type.
### Calls
Function calls can require converting normal values to generalized
values.
This operation depends on the representation chosen for the
generalized value.
In the worst case it will be similar to passing a normal value to a
function that takes an interface type.
When calling a function with type parameters, the type parameters will
be passed first, as a pointer to a runtime type descriptor.
Function calls can also require converting generalized return values
to normal values.
This is done via an implicitly inserted type assertion.
Depending on the representation, this may not require any actual code
to be generated.
### Communication operators
We have to implement sending and receiving generalized values for
channels of generalized type.
### Assignments
We have to implement assignment of generalized values.
This will be based on the runtime type descriptor.
### Type switches
We have to implement type switches using generalized types.
This will mostly likely devolve into a series of if statements using
type assertions.
### For statements
We have to implement for statements with range clauses over
generalized types.
This is similar to the indexing and communication operators.
### Select statements
We have to implement select on channels of generalized type.
### Return statements
We have to implement returning a value of generalized type.
### Specialization of functions
This proposal is intended to support compiling a generalized function
into code that operates on generalized values.
In fact, it requires that this work.
```
package p1
gen [T] func Call(f func (T) T, T) T {
return f(T)
}
package p2
func SliceIdentity(a []int) []int {
return a
}
package p3
var v = p1.Call(p2.SliceIdentity, make([]int, 10))
```
Here `Call` has to support calling a generalized function.
There is no straightforward specialization process that can implement
this case.
(It could be done if the full source code of p1 and p2 are available either when compiling p3 or at link time;
that is how C++ does it, but it is not an approach that fits well with Go.)
However, for many cases, this proposal can be implemented using
function specialization.
Whenever the compiler can use type deduction for a function call, and
the types are known concrete types, and the body of the function is
available, the compiler can generate a version of the function
specialized for those types.
This is, therefore, an optional optimization, in effect a form of
cross-package inlining, which costs compilation time but improves
runtime.
## Methods on builtin types
This is an optional addendum to the proposal described above.
The proposal does not provide a convenient way to write a function
that works on any numeric type.
For example, there is no convenient way to write this:
```
gen [T] func SliceAverage(a []T) T {
s := T(0)
for _, v = range a {
s += v
}
return s / len(a)
}
```
It would be nice if that function worked for any numeric function.
However, it is not permitted under the proposal described above,
because of the use of `+=` and `/`.
These operators are not available for every type and therefore are not
available for a generalized type.
This approach does work:
```
gen [T] type Number interface {
Plus(T) T
Divide(T) T
}
gen [T Number[T]] func SliceAverage(a []T) T {
s := 0.[T]
for _, v = range a {
s = s.Plus(v)
}
return s.Divide(len(a))
}
```
However, this requires writing explicit `Plus` and `Divide` methods for
each type you want to use.
These methods are themselves boilerplate:
```
func (i MyNum) Plus(v MyNum) MyNum { return i + v }
func (i MyNum) Divide(v MyNum) MyNum { return i / v }
```
This proposal does not help with this kind of boilerplate function,
because there is no way to use operators with generalized values.
There are a few ways to solve this.
One way that seems to fit well with Go as extended by this proposal is
to declare that for all types that support some language operator, the
type has a corresponding method.
That is, we say that if the type can be used with `+`, the language
defines a method `Plus` (or `Binary+` or whatever) for the type that
implements the operation.
This method can then be picked up by an interface such as the above,
and the standard library can define convenient aggregate interfaces,
such as an interface listing all the methods supported by an integer
type.
Note that it would not help for the standard library to define a
`Plus` method for every integer type, as those methods would not carry
over to user defined types.
## Operator methods
It is of course a smallish step from those language-defined methods to
having operator methods, which would permit writing generalized code
using operators rather than method calls. For the purposes of using
generalized types, however, this is less important than having
language defined methods for operators.
## Summary
This proposal will not be adopted.
It has significant flaws.
The factored `gen` syntax is convenient but looks awkward on the page.
You wind up with a trailing close parenthesis after a set of function
definitions.
Indenting all the function definitions looks silly.
This proposal doesn't let me write a trivial generalized `Max`
function, unless we include operator methods.
Even when we include operator methods, `Max` has to be written in
terms of a `Less` method.
The handling of untyped constants in generalized functions is
extremely awkward.
They must always use a generalized type coercion.
While this proposal is more or less palatable for data structures,
it is much weaker for functions.
You basically can't do anything with a generalized type,
except assign it and call a method on it.
Writing standardized algorithms will require developing a whole
vocabulary of quasi-standard methods.
The proposal doesn't help write functions that work on either `[]byte`
or `string`, unless those types get additional operator methods like
`Index` and `Len`.
Even operator methods don't help with using `range`.