Generalized Types In Go

This is a proposal for adding generics to Go, written by Ian Lance Taylor in October, 2013. This proposal will not be adopted. It is being presented as an example for what a complete generics proposal must cover.

Introduction

This document describes a possible implementation of generalized types in Go. We introduce a new keyword, gen, that declares one or more type parameters: types that are not known at compile time. These type parameters may then be used in other declarations, producing generalized types and functions.

Some goals, borrowed from Garcia et al:

  • Do not require an explicit relationship between a definition of a generalized function and its use. The function should be callable with any suitable type.
  • Permit interfaces to express relationships between types of methods, as in a comparison function that takes two parameters of the same unknown type.
  • Given a generalized type, make it possible to use related types, such as a slice of that type.
  • Do not require explicit instantiation of generalized functions.
  • Permit type aliasing of generalized types.

Background

My earlier proposal for generalized types had some flaws.

People expect functions that operate on generalized types to be fast. They do not want a reflection based interface in all cases. The question is how to support that without excessively slowing down the compiler.

People want to be able to write simple generalized functions like Sum(v []T) T, a function that sums the values in the slice v. They are prepared to assume that T is a numeric type. They don’t want to have to write a set of methods simply to implement Sum or the many other similar functions for every numeric type, including their own named numeric types.

People want to be able to write the same function to work on both []byte and string, without requiring a copy.

People want to write functions on generalized types that support simple operations like comparison. That is, they want to write a function that uses a generalized type and compares a value of that type to another value of the same type. That was awkward in my earlier proposal: it required using a form of the curiously recurring template pattern.

Go’s use of structural typing means that you can use any type to meet an interface without an explicit declaration. Generalized types should work the same way.

Proposal

We permit package-level type and func declarations to use generalized types. There are no restrictions on how these types may be used within their scope. At compile time each actual use of a generalized type or function is instantiated by replacing the generalized type with some concrete type. This may happen multiple times with different concrete types. A concrete type is only permitted if all the operations used with the generalized types are permitted for the concrete type. How to implement this efficiently is discussed below.

Syntax

Any package-scope type or func declaration may be preceded with the new keyword gen followed by a list of type parameter names in square brackets:

gen [T] type Vector []T

This defines T as a type parameter for the generalized type Vector. The scope of Vector is the same as it would be if gen did not appear.

A use of a generalized type must provide specific types to use for the type parameters. This is normally done using square brackets following the generalized type.

type VectorInt Vector[int]
var v1 Vector[int]
var v2 Vector[float32]
gen [T1, T2] type Pair struct { first T1; second T2 }
var v3 Pair[int, string]

Type parameters may also be used with functions.

gen [T] func SetHead(v Vector[T], e T) T {
	v[0] = e
	return e
}

We permit a modified version of the factoring syntax used with var, type, and const to permit a series of declarations to share the same type parameters.

gen [T1, T2] (
type Pair struct { first T1; second T2 }

func MakePair(first T1, second T2) Pair {
	return &Pair{first, second}
}
) // Ends gen.

References to other names declared within the same gen block do not have to specify the type parameters. When the type parameters are omitted, they are implied to simply be the parameters declared for the block. In the above example, Pair when used as the result type of MakePair is equivalent to Pair[T1, T2].

When this syntax is used we require that the entire contents of the block be valid for a given concrete type. The block is instantiated as a whole, not in individual pieces.

As with generalized types, we must specify the types when we refer to a generalized function (but see the section on type deduction, below).

var MakeIntPair = MakePair[int, int]
var IntPairZero = MakeIntPair(0, 0)

A generalized type can have methods.

gen [T] func (v *Vector[T]) SetHead(e T) T {
	v[0] = e
	return e
}

Of course a method of a generalized type may itself be a generalized function.

gen [T, T2] func (v *Vector[T]) Transform(f func(T) T2) Vector[T2]

The gen keyword may only be used with a type or function. It may only appear in package scope, not within a function. A non-generalized type may not have a generalized method.

A gen keyword may appear within the scope of another gen keyword. In that case, any use of the generalized type or function must specify all the type parameters, starting with the outermost ones. A different way of writing the last example would be:

gen [T] (
type Vector []T
gen [T2] func (v *Vector[T]) Transform(f func(T) T2) Vector[T2]
)
var v Vector[int]
var v2 = v.Transform[int, string](strconv.Itoa)

Type deduction, described below, would permit omitting the [int, string] in the last line, based on the types of v and strconv.Itoa. Inner type parameters shadow outer ones with the same name, as in other scopes in Go (although it’s hard to see a use for this shadowing).

A note on syntax

While the use of the gen keyword fits reasonably well into the existing Go language, the use of square brackets to denote the specific types is new to Go. We have considered a number of different approaches:

  • Use angle brackets, as in Pair<int, string>. This has the advantage of being familiar to C++ and Java programmers. Unfortunately, it means that f<T>(true) can be parsed as either a call to function f<T> or a comparison of f<T (an expression that tests whether f is less than T) with (true). While it may be possible to construct complex resolution rules, the Go syntax avoids that sort of ambiguity for good reason.
  • Overload the dot operator again, as in Vector.int or Pair.(int, string). This becomes confusing when we see Vector.(int), which could be a type assertion.
  • We considered using dot but putting the type first, as in int.Vector or (int, string).Pair. It might be possible to make that work without ambiguity, but putting the types first seems to make the code harder to read.
  • An earlier version of this proposal used parentheses for names after types, as in Vector(int). However, that proposal was flawed because there was no way to specify types for generalized functions, and extending the parentheses syntax led to MakePair(int, string)(1, "") which seems less than ideal.
  • We considered various different characters, such as backslash, dollar sign, at-sign or sharp. The square brackets grouped the parameters nicely and provide an acceptable visual appearance.

The use of square brackets to pick specific versions of generalized types and functions seems appropriate. However, the top-level declarations could work differently.

  • We could omit the square brackets in a gen declaration without ambiguity.
    • gen T type Vector []T
    • gen T1, T2 type Pair struct { f1 T1; f2 T2 }
  • We could keep the square brackets, but use the existing type keyword rather than introducing a new keyword.
    • type [T] type Vector []T I have a mild preference for the syntax described above but I’m OK with other choices as well.

Type Deduction

When calling a function, as opposed to referring to it without calling it, the type parameters may be omitted in some cases. A function call may omit the type parameters when every type parameter is used for a regular parameter, or, in other words, there are no type parameters that are used only for results. In that case the compiler will compare the actual type of the argument A with the type of the generalized parameter P, examining the arguments from left to right. A and P must be identical. The first time we see a type parameter in P, it will be set to the appropriate portion of A. If the type parameter appears again, it must be identical to the actual type at that point.

Note that at compile time the argument type may itself be a generalized type, when one generalized function calls another. The type deduction algorithm is the same. A type parameter of P may match a type parameter of A. Once this match is made, then every subsequent instance of the P type parameter must match the same A type parameter.

When doing type deduction with an untyped constant, the constant does not determine anything about the generalized type. The deduction proceeds with the remaining arguments. If at the end of the deduction the type has not been determined, the untyped constants are re-examined in sequence, and given the type int, rune, float64, or complex1281 as usual. Type deduction does not support passing an untypednilconstant;nil` may only be used with an explicit type conversion (or, of course, the type parameters may be written explicitly).

Type deduction also applies to composite literals, in which the type parameters for a generalized struct type are deduced from the types of the literals.

For example, these three variables will have the same type and value.

var v1 = MakePair[int, int](0, 0)
var v2 = MakePair(0, 0)			// [int, int] deduced.
var v3 = Pair{0, 0}			// [int, int] deduced.

To explain the untyped constant rules:

gen [T] func Max(a, b T) T {
	if a < b {
		return b
	}
	return a
}
var i int
var m1 = Max(i, 0)		// i causes T to be deduced as int, 0 is 
// passed as int.
var m2 = Max(0, i)		// 0 ignored on first pass, i causes
// T to be deduced as int, 0 is passed as 
// int.
var m3 = Max(1, 2.5)		// 1 and 2.5 ignored on first pass.
// On second pass 1 causes T to be deduced 
// as int.  Passing 2.5 is an error.
var m4 = Max(2.5, 1)		// 2.5 and 1 ignored on first pass.
// On second pass 2.5 causes T to be 
// deduced as float64.  1 converted to 
// float64.

Examples

A hash table.

package hashmap

gen [Keytype, Valtype] (

type bucket struct {
	next *bucket
	key Keytype
	val Valtype
}

type Hashfn func(Keytype) uint
type Eqfn func(Keytype, Keytype) bool

type Hashmap struct {
	hashfn Hashfn
	eqfn Eqfn
	buckets []bucket
	entries int
}

// This function must be called with explicit type parameters, as
// there is no way to deduce the value type.  For example,
// h := hashmap.New[int, string](hashfn, eqfn)
func New(hashfn Hashfn, eqfn Eqfn) *Hashmap {
	return &Hashmap{hashfn, eqfn, make([]buckets, 16), 0}
}

func (p *Hashmap) Lookup(key Keytype) (val Valtype, found bool) {
	h := p.hashfn(key) % len(p.buckets)
	for b := p.buckets[h]; b != nil; b = b.next {
		if p.eqfn(key, b.key) {
			return b.val, true
		}
	}
	return
}

func (p *Hashmap) Insert(key Keytype, val Valtype) (inserted bool) {
	// Implementation omitted.
}

) // Ends gen.

Using the hash table.

package sample

import (
	"fmt"
	"hashmap"
	"os"
)

func hashint(i int) uint {
	return uint(i)
}

func eqint(i, j int) bool {
	return i == j
}

var v = hashmap.New[int, string](hashint, eqint)

func Add(id int, name string) {
	if !v.Insert(id, name) {
		fmt.Println(“duplicate id”, id)
		os.Exit(1)
	}
}

func Find(id int) string {
	val, found := v.Lookup(id)
	if !found {
		fmt.Println(“missing id”, id)
		os.Exit(1)
	}
	return val
}

Sorting a slice given a comparison function.

gen [T] (

func SortSlice(s []T, less func(T, T) bool) {
	sort.Sort(&sorter{s, less})
}

type sorter struct {
	s []T
	less func(T, T) bool
}

func (s *sorter) Len() int { return len(s.s) }
func (s *sorter) Less(i, j int) bool { return s.less(s[i], s[j]) }
func (s *sorter) Swap(i, j int) { s.s[i], s.s[j] = s.s[j], s.s[i] }

) // End gen

Sorting a numeric slice (also works for string).

// This can be successfully instantiated for any type T that can be
// used with <.
gen [T] func SortNumericSlice(s []T) {
	SortSlice(s, func(a, b T) bool { return a < b })
}

Merging two channels into one.

gen [T] func Merge(a, b <-chan T) <-chan T {
	c := make(chan T)
	go func(a, b chan T) {
		for a != nil && b != nil {
			select {
			case v, ok := <-a:
				if ok {
					c <- v
				} else {
					a = nil
				}
			case v, ok := <-b:
				if ok {
					c <- v
				} else {
					b = nil
				}
			}
		}
		close(c)
	}(a, b)
	return c
}

Summing a slice.

// Works with any type that supports +.
gen [T] func Sum(a []T) T {
	var s T
	for _, v := range a {
		s += v
	}
	return s
}

A generic interface.

gen [T] type Equaler interface {
	Equal(T) bool
}

// Return the index in s of v1.
gen [T] func Find(s []T, v1 T) int {
	eq, eqok := v1.(Equaler[T])
	for i, v2 := range s {
		if eqok {
if eq.Equal(v2) {
				return i
			}
		} else if reflect.DeepEqual(v1, v2) {
			return i
		}
	}
	return -1
}

type S []int

// Slice equality that treats nil and S{} as equal.
func (s1 S) Equal(s2 S) bool {
	if len(s1) != len(s2) {
		return false
	}
	for i, v1 := range s1 {
		if v1 != s2[i] {
			return false
		}
	}
	return true
}

var i = Find([]S{S{1, 2}}, S{1, 2})

Joining sequences; works for any T that supports len, copy to []byte, and conversion from []byte; in other words, works for []byte and string.

gen [T] func Join(a []T, sep T) T {
	if len(a) == 0 {
		return T([]byte{})
	}
	if len(a) == 1 {
		return a[0]
	}
	n := len(sep) * (len(a) - 1)
	for _, v := range a {
		n += len(v)
	}
	b := make([]byte, n)
	bp := copy(b, a[0])
	for _, v := range a[1:] {
		bp += copy(b[bp:], sep)
		bp += copy(b[bp:], v)
	}
	return T(b)
}

Require generalized types to implement an interface.

// A vector of any type that supports the io.Reader interface.
gen [T] (

type ReaderVector []T

// This function is not used.
// It can only be compiled if T is an io.Reader, so instantiating
// this block with any other T will fail.
func _(t T) {
	var _ io.Reader = t
}

) // Ends gen.

Implementation

When the compiler sees a gen declaration, it must examine the associated types and functions and build a set of constraints on the generalized type. For example, if the type is used with binary + as in the Sum example, then the type must be a numeric or string type. If the type is used as a + 1 then it must be numeric. If a type method is called, then the type must support a method with that name that can accept the given argument types. If the type is passed to a function, then it must have a type acceptable to that function.

The compiler can do minimal type checking on the generalized types: if the set of constraints on the generalized type can not be satisfied, the program is in error. For example, if we see both a + 1 and a + "b", where a is a variable of the same generalized type, the compiler should issue an error at that point.

When the compiler needs to instantiate a generalized type or function, it compares the concrete type with the constraints. If some constraint fails, the instantiation is invalid. The compiler can give an appropriate error ("cannot use complex64 with SortNumericSlice because complex64 does not support <"). The ability to give good clear errors for such a case is important, to avoid the C++ cascading error problem.

These constraints are used to build a set of methods required for the generalized type. When compiling a generalized function, each local variable of the generalized type (or any type derived from the generalized type) is changed to an interface type. Each use of the generalized type is changed to call a method on the generalized type. Each constraint becomes a method. Calling a generalized function then means passing in an interface type whose methods are built from the constraints.

For example, start with the Sum function.

gen [T] func Sum(a []T) T {
	var s T
	for _, v := range a {
		s += v
	}
	return s
}

This get rewritten along the lines of

type T interface {
	plus(T) T
}

type S interface {
	len() int
	index(int) T
}

func GenSum(a S) T {
	var s T
	for i := 0; i < a.len(); i++ {
		v := a.index(i)
		s = s.plus(v)
	}
	return s
}

This code is compiled as usual. In addition, the compiler generates instantiation templates. These could perhaps be plain text that can be included in the export data.

type realT <instantiation of T>
func (p realT) plus(a T) T {
	return p + a.(realT) // return converts realT to T
}
type realS []realT
func (s realS) len() int {
	return len(s)
}
func (s realS) index(i int) T {
	return s[i] // return converts realT to T
}

When instantiating Sum for a new type, the compiler builds and compiles this code for the new type and calls the compiled version of Sum with the interface value for the generated interface. As shown above, the methods automatically use type assertions and interface conversion as needed. The actual call to Sum(s) will be rewritten as GenSum(s).(realT). The type assertions and interface conversions are checked at compile time and will always succeed.

Note that another way to say whether a concrete type may be used to instantiate a generalized function is to ask whether the instantiation templates may be instantiated and compiled without error.

More complex cases may of course involve multiple generalized types in a single expression such as a function call. The compiler can arbitrarily pick one value to carry the methods, and the method implementation will use type assertions to implement the call. This works because all the concrete types are known at instantiation time.

For cases like make where the compiler has no value on which to invoke a method, there are two cases. For a generalized function, the compiler can write the function as a closure. The actual instantiation will pass a special object in the closure value. See the use of make in the next example.

gen [T1, T2, T3] func Combine(a []T1, b []T2, f func(T1, T2) T3) []T3 {
	r := make([]T3, len(a))
	for i, v := range a {
		r[i] = f(v, b[i])
	}
	return r
}

This will be rewritten as

type S1 interface {
	len() int
	index(int) T1
}
type S2 interface {
	index(int) T2
}
type S3 interface {
	set(int, T3)
}
type F interface {
	call(T1, T2) T3
}
type T1 interface{}
type T2 interface{}
type T3 interface{}
type Maker interface {
	make(int) S3
}

func GenCombine(a S1, b S2, f F) S3 {
	// The maker var has type Maker and is accessed via the
	// function’s closure.
	r = maker.make(a.len())
	for i := 0; i < a.len(); i++ {
		v := a.index(i)
		r.set(i, f.call(v, b.index(i))
	}
	return r
}

The associated instantiation templates will be

type realT1 <instantiation of T1>
type realT2 <instantiation of T2>
type realT3 <instantiation of T3>
type realS1 []realT1
type realS2 []realT2
type realS3 []realT3
type realF func(realT1, realT2) realT3
type realMaker struct{}
func (s1 realS1) len() int {
	return len(s1)
}
func (s1 realS1) index(i int) T1 {
	return s1[i]
}
func (s2 realS2) index(i int) T2 {
	return s2[i]
}
func (s3 realS3) set(i int, t3 T3) {
	s3[i] = t3.(realT3)
}
func (f realF) call(t1 T1, t2 T2) T3 {
	return f(t1.(realT1), t2.(realT2))
}
func (m realMaker) make(l int) S3 {
	return make(realT3, l)
}

A reference to Combine will then be built into a closure value with GenCombine as the function and a value of the Maker interface in the closure. The dynamic type of the Maker value will be realMaker. (If a function like make is invoked in a method on a generalized type, we can’t use a closure, so we instead add an appropriate hidden method on the generalized type.)

With this implementation approach we are taking interface types in a different direction. The interface type in Go lets the programmer define methods and then implement them for different types. With generalized types the programmer describes how the interface is used, and the compiler uses that description to define the methods and their implementation.

Another example. When a generalized type has methods, those methods need to be instantiated as calls to the generalized methods with appropriate type assertions and conversions.

gen [T] (
type Vector []T
func (v Vector) Len() int { return len(v) }
func (v Vector) Index(i int) T { return v[i] }
) // Ends gen.

type Readers interface {
	Len() int
	Index(i int) io.Reader
}

type VectorReader Vector[io.Reader]
var _ = make(VectorReader, 0).(Readers)

The last statement asserts that VectorReader[io.Reader] supports the Readers interface, as of course it should. The Vector type implementation will look like this.

type T interface{}
type S interface {
	len() int
	index(i int) T
}
type V struct {
	S
}
func (v V) Len() int { return v.S.len() }
func (v V) Index(i int) T { return v.S.index(i) }

The instantiation templates will be

type realT <instantiation of T>
type realS []realT
func (s realS) len() { return len(s) }
func (s realS) index(i int) T { return s[i] }

When this is instantiated with io.Reader, the compiler will generate additional methods.

func (s realS) Len() int { return V{s}.Len() }
func (s realS) Index(i int) io.Reader {
return V{s}.Index(i).(io.Reader)
}

With an example this simple this seems like a pointless runaround. In general, though, the idea is that the bulk of the method implementation will be in the V methods, which are compiled once. The realS len and index methods support those V methods. The realS Len and Index methods simply call the V methods with appropriate type conversions. That ensures that the Index method returns io.Reader rather than T aka interface{}, so that realS satisfies the Readers interface in the original code.

Now an example with a variadic method.

gen [T] func F(v T) {
	v.M(1)
	v.M(“a”, “b”)
}

This looks odd, but it’s valid for a type with a method M(...interface{}). This is rewritten as

type T interface {
	m(...interface{})	// Not the same as T’s method M.
}
func GF(v T) {
	v.m(1)
	v.m(“a”, “b”)
}

The instantiation templates will be

type realT <instantiation of T>
func (t realT) m(a ...interface{}) {
	t.M(a...)
}

The basic rule is that if the same method is called with different numbers of arguments, it must be instantiated with a variadic method. If it is called with the same number of arguments with different types, it must be instantiated with interface{} arguments. In the general case the instantiation template may need to convert the argument types to the types that the real type’s method accepts.

Because generalized types are implemented by interface types, there is no way to write generalized code that detects whether it was instantiated with an interface type. If the code can assume that a generalized function was instantiated by a non-interface type, then it can detect that type using a type switch or type assertion. If it is important to be able to detect whether a generalized function was instantiated with an interface type, some new mechanism will be required.

In the above examples I’ve always described a rewritten implementation and instantiation templates. There is of course another implementation method that will be appropriate for simple generalized functions: inline the function. That would most likely be the implementation method of choice for something like a generalized Max function. I think this could be handled as a minor variant on traditional function inlinining.

In some cases the compiler can determine that only a specific number of concrete types are permitted. For example, the Sum function can only be used with types that support that binary + operator, which means a numeric or string type. In that case the compiler could choose to instantiate and compile the function for each possible type. Uses of the generalized function would then call the appropriate instantiation. This would be more work when compiling the generalized function, but not much more work. It would mean no extra work for uses of the generalized function.

Spec changes

I don’t think many spec changes are needed other than a new section on generalized types. The syntax of generalized types would have to be described. The implementation details do not need to be in the spec. A generalized function instantiated with concrete types is valid if rewriting the function with the concrete types would produce valid Go code.

There is a minor exception to that approach: we would want to permit type assertions and type switches for generalized types as well as for interface types, even if the concrete type is not an interface type.

Compatibility

This approach introduces a new keyword, gen. However, this keyword is only permitted in top-level declarations. That means that we could treat it as a new syntactic category, a top-level keyword that is only recognized as such when parsing a TopLevelDecl. That would mean that any code that currently compiles with Go 1 would continue to compile with this new proposal, as any existing use of gen at top-level is invalid.

We could also maintain Go 1 compatibility by using the existing type keyword instead of gen. The square brackets used around the generalized type names would make this unambiguous.

However, syntactic compatibility is only part of the story. If this proposal is adopted there will be a push toward rewriting certain parts of the standard library to use generalized types. For example, people will want to change the container and sort packages. A farther reaching change will be changing io.Writer to take a generalized type that is either []byte or string, and work to push that through the net and os packages down to the syscall package. I do not know whether this work could be done while maintaining Go 1 compatibility. I do not even know if this work should be done, although I’m sure that people will want it.

Comparison to other languages

C

Generalized types in C are implemented via preprocessor macros. The system described here can be seen as a macro system. However, unlike in C, each generalized function must be complete and compilable by itself. The result is in some ways less powerful than C preprocessor macros, but does not suffer from problems of namespace conflict and does not require a completely separate language (the preprocessor language) for implementation.

C++

The system described here can be seen as a subset of C++ templates. Go’s very simple name lookup rules mean that there is none of the confusion of dependent vs. non-dependent names. Go’s lack of function overloading removes any concern over just which instance of a name is being used. Together these permit the explicit determination of constraints when compiling a generalized function, whereas in C++ where it’s nearly impossible to determine whether a type may be used to instantiate a template without effectively compiling the instantiated template and looking for errors (or using concepts, proposed for later addition to the language).

C++ template metaprogramming uses template specialization, non-type template parameters, variadic templates, and SFINAE to implement a Turing complete language accessible at compile time. This is very powerful but at the same time has serious drawbacks: the template metaprogramming language has a baroque syntax, no variables or non-recursive loops, and is in general completely different from C++ itself. The system described here does not support anything similar to template metaprogramming for Go. I believe this is a feature. I think the right way to implement such features in Go would be to add support in the go tool for writing Go code to generate Go code, most likely using the go/ast package and friends, which is in turn compiled into the final program. This would mean that the metaprogramming language in Go is itself Go.

Java

I believe this system is slightly more powerful than Java generics, in that it permits direct operations on basic types without requiring explicit methods. This system also does not use type erasure. Although the implementation described above does insert type assertions in various places, all the types are fully checked at compile time and those type assertions will always succeed.

Summary

This proposal will not be adopted. It has significant flaws.

The factored gen syntax is convenient but looks awkward on the page. You wind up with a trailing close parenthesis after a set of function definitions. Indenting all the function definitions looks silly.

The description of constraints in the implementation section is imprecise. It's hard to know how well it would work in practice. Can the proposed implementation really handle the possible cases?

A type switch that uses cases with generalized types may wind up with identical types in multiple different cases. We need to clearly explain which case is chosen.