Proposal: Generic parameterization of array sizes

Author(s): Andrew Werner

Last updated: March 16th, 2021

Abstract

With the type parameters generics proposal has been accepted, though not yet fully specified or implemented, we can begin to talk about extension. That proposal lists the following omission:

No parameterization on non-type values such as constants. This arises most obviously for arrays, where it might sometimes be convenient to write type Matrix[n int] [n][n]float64. It might also sometimes be useful to specify significant values for a container type, such as a default value for elements.

This proposal seeks to resolve this limitation by (a) specifying when len can be used as a compile-time constant and (b) adding syntax to specify constraints for all arrays of a given type in type lists.

Background

An important property of the generics proposal is that it enables the creation of libraries of specialized container data structures. The existence of such libraries will help developers write more efficient code as these data structures will be able to allocate fewer object and provide greater access locality. This Google blog post about block-based C++ data drives home the point.

The justification is laid out in the omission of the type parameter proposal. The motivation that I've stumbled upon is in trying to implement a B-Tree and allowing the client to dictate the degree of the node.

One initial idea would be to allow the client to provide the actual array which will be backing the data inside the node as a type parameter. This might actually be okay in some data structure user cases but in a B-Tree it's bad because we still would like to instantiate an array for the pointers and that array needs to have a size that is a function of the data array.

The proposal here seeks to make it possible for clients to provide default values for array sizes of generic data structures in a way that is minimally invasive to the concepts which go already has. The shorthand comment stated in the Omission of the Type Parameter Proposal waves its hand at what feels like a number of new and complex concepts for the language.

Proposal

This proposals attempts to side-step questions of how one might provide a scalar value in a type constraint by not ever providing a scalar directly. This proposal recognizes that constants can be used to specify array lengths. It also notes that the value of len() can be computed as a compile-time constant in some cases. Lastly, it observes that type lists could be extended minimally to indicate a constraint that a type is an array of a given type without constraining the length of the array.

The vanilla generic B-Tree

Let's explore the example of a generic B-Tree with a fixed-size buffer. Find such an example here.

// These constants are the wart.
const (
	degree   = 16
	maxItems = 2*degree - 1
	minItems = degree - 1
)

func NewBTree[K, V any](cmp LessFn[K]) OrderedMap[K, V] {
	return &btree[K, V]{cmp: cmp}
}

type btree[K, V any] struct {
	cmp  LessFn[K]
	root *node[K, V]
}

// ...

type node[K, V any] struct {
	count    int16
	leaf     bool
	keys     [maxItems]K
	vals     [maxItems]V
	children [maxItems + 1]*node[K, V]
}

Parameterized nodes

Then we allow parameterization on the node type within the btree implementation so that different node concrete types with different memory layouts may be used. Find an example of this generalization here.

type nodeI[K, V, N any] interface {
	type *N
	find(K, LessFn[K]) (idx int, found bool)
	insert(K, V, LessFn[K]) (replaced bool)
	remove(K, LessFn[K]) (K, V, bool)
	len() int16
	at(idx int16) (K, V)
	child(idx int16) *N
	isLeaf() bool
}

func NewBTree[K, V any](cmp LessFn[K]) OrderedMap[K, V] {
	type N = node[K, V]
	return &btree[K, V, N, *N]{
		cmp: cmp,
		newNode: func(isLeaf bool) *N {
			return &N{leaf: isLeaf}
		},
	}
}

type btree[K, V, N any, NP nodeI[K, V, N]] struct {
	len     int
	cmp     LessFn[K]
	root    NP
	newNode func(isLeaf bool) NP
}

type node[K, V any] struct {
	count    int16
	leaf     bool
	keys     [maxItems]K
	vals     [maxItems]V
	children [maxItems + 1]*node[K, V]
}

This still ends up using constants and there‘s no really easy way around that. You might want to parameterize the arrays into the node like in this example. This still doesn’t tell a story about how to relate the children array to the items.

The proposal to parameterize the arrays

Instead, we‘d like to find a way to express the idea that there’s a size constant which can be used in the type definitions. The proposal would result in an implementation that looked like this.


// StructArr is a constraint that says that a type is an array of empty // structs of any length. type StructArr interface { type [...]struct{} } type btree[K, V, N any, NP nodeI[K, V, N]] struct { len int cmp LessFn[K] root NP newNode func(isLeaf bool) NP } // NewBTree constructs a generic BTree-backed map with degree 16. func NewBTree[K, V any](cmp LessFn[K]) OrderedMap[K, V] { const defaultDegree = 16 return NewBTreeWithDegree[K, V, [defaultDegree]struct{}](cmp) } // NewBTreeWithDegree constructs a generic BTree-backed map with degree equal // to the length of the array used as type parameter A. func NewBTreeWithDegree[K, V any, A StructArr](cmp LessFn[K]) OrderedMap[K, V] { type N = node[K, V, A] return &btree[K, V, N, *N]{ cmp: cmp, newNode: func(isLeaf bool) *N { return &N{leaf: isLeaf} }, } } type node[K, V any, A StructArr] struct { count int16 leaf bool keys [2*len(A) - 1]K values [2*len(A) - 1]V children [2 * len(A)]*node[K, V, A] }

The Matrix example

The example of the omission in type parameter proposal could be achieved in the following way:

type Dim interface {
    type [...]struct{}
}

type SquareFloatMatrix2D[D Dim] [len(D)][len(D)]float64

Summary

  1. Support type list constraints to express that a type is an array
// Array expresses a constraint that a type is an array of T of any
// size.
type Array[T any] interface {
    type [...]T
}
  1. Support a compile-time constant expression for len([...]T)

This handy syntax would permit parameterization of arrays relative to other array types. Note that the constant expression len function on array types could actually be implemented today using unsafe.Sizeof by a parameterization over an array whose members have non-zero size. For example len could be written as unsafe.Sizeof([...]T)/unsafe.Sizeof(T) so long as unsafe.Sizeof(T) > 0.

Rationale

This approach is simpler than generally providing a constant scalar expression parameterization of generic types. Of the two elements of the proposal, neither feels particularly out of line with the design of the language or its concepts. The [...]T syntax exists in the language to imply length inference for array literals and is not a hard to imagine concept. It is the deeper requirement to make this proposal work.

One potential downside of this proposal is that we‘re not really using the array for anything other than its size which may feel awkward. For that reason I’ve opted to use a constraint which forces the array to use struct{} values to indicate that the structure of the elements isn't relevant. This awkwardness feels justified to side-step introduces scalars to type parameters.

Compatibility

This proposal is fully backwards compatible with all of the language and also the now accepted type parameters proposal.

Implementation

Neither of the two features of this proposal feel particularly onerous to implement. My guess is that the [...]T type list constraint would be extremely straightforward given an implementation of type parameters. The len implementation is also likely to be straightforward given the existence of both compile-time evaluation of len expressions on array types which exist in the language and the constant nature of unsafe.Sizeof. Maybe there'd be some pain in deferring the expression evaluation until after type checking.