_content/blog/allocation-optimizations.md - website - Git at Google

 ---
 title: Allocating on the Stack
 date: 2026-02-27
 by:
 - Keith Randall
 summary: A description of some of the recent changes to do allocations on the stack instead of the heap.
 template: true
 ---

 We're always looking for ways to make Go programs faster. In the last
 2 releases, we have concentrated on mitigating a particular source of
 slowness, heap allocations. Each time a Go program allocates memory
 from the heap, there's a fairly large chunk of code that needs to run
 to satisfy that allocation. In addition, heap allocations present
 additional load on the garbage collector.  Even with recent
 enhancements like [Green Tea](/blog/greenteagc), the garbage collector
 still incurs substantial overhead.

 So we've been working on ways to do more allocations on the stack
 instead of the heap.  Stack allocations are considerably cheaper to
 perform (sometimes completely free).  Moreover, they present no load
 to the garbage collector, as stack allocations can be collected
 automatically together with the stack frame itself. Stack allocations
 also enable prompt reuse, which is very cache friendly.

 ## Stack allocation of constant-sized slices

 Consider the task of building a slice of tasks to process:
 {{raw `
 	func process(c chan task) {
 		var tasks []task
 		for t := range c {
 			tasks = append(tasks, t)
 		}
 		processAll(tasks)
 	}
 `}}

 Let's walk through what happens at runtime when pulling tasks from the
 channel `c` and adding them to the slice `tasks`.

 On the first loop iteration, there is no backing store for `tasks`, so
 `append` has to allocate one. Because it doesn't know how big the
 slice will eventually be, it can't be too aggressive. Currently, it
 allocates a backing store of size 1.

 On the second loop iteration, the backing store now exists, but it is
 full. `append` again has to allocate a new backing store, this time of
 size 2. The old backing store of size 1 is now garbage.

 On the third loop iteration, the backing store of size 2 is
 full. `append` *again* has to allocate a new backing store, this time
 of size 4. The old backing store of size 2 is now garbage.

 On the fourth loop iteration, the backing store of size 4 has only 3
 items in it. `append` can just place the item in the existing backing
 store and bump up the slice length. Yay! No call to the allocator for
 this iteration.

 On the fifth loop iteration, the backing store of size 4 is full, and
 `append` again has to allocate a new backing store, this time of size
 8.

 And so on. We generally double the size of the allocation each time it
 fills up, so we can eventually append most new tasks to the slice
 without allocation. But there is a fair amount of overhead in the
 "startup" phase when the slice is small. During this startup phase we
 spend a lot of time in the allocator, and produce a bunch of garbage,
 which seems pretty wasteful. And it may be that in your program, the
 slice never really gets large. This startup phase may be all you ever
 encounter.

 If this code was a really hot part of your program, you might be
 tempted to start the slice out at a larger size, to avoid all of these
 allocations.

 {{raw `
 	func process2(c chan task) {
 		tasks := make([]task, 0, 10) // probably at most 10 tasks
 		for t := range c {
 			tasks = append(tasks, t)
 		}
 		processAll(tasks)
 	}
 `}}

 This is a reasonable optimization to do. It is never incorrect; your
 program still runs correctly. If the guess is too small, you get
 allocations from `append` as before. If the guess is too large, you
 waste some memory.

 If your guess for the number of tasks was a good one, then there's
 only one allocation site in this program. The `make` call allocates a
 slice backing store of the correct size, and `append` never has to do
 any reallocation.

 The surprising thing is that if you benchmark this code with 10
 elements in the channel, you'll see that you didn't reduce the number
 of allocations to 1, you reduced the number of allocations to 0!

 The reason is that the compiler decided to allocate the backing store
 on the stack. Because it knows what size it needs to be (10 times the
 size of a task) it can allocate storage for it in the stack frame of
 `process2` instead of on the heap[<sup>1</sup>](#footnotes).  Note
 that this depends on the fact that the backing store does not [escape
 to the heap](/doc/gc-guide#Escape_analysis) inside of `processAll`.

 ## Stack allocation of variable-sized slices

 But of course, hard coding a size guess is a bit rigid.
 Maybe we can pass in an estimated length?

 {{raw `
 	func process3(c chan task, lengthGuess int) {
 		tasks := make([]task, 0, lengthGuess)
 		for t := range c {
 			tasks = append(tasks, t)
 		}
 		processAll(tasks)
 	}
 `}}

 This lets the caller pick a good size for the `tasks` slice, which may
 vary depending on where this code is being called from.

 Unfortunately, in Go 1.24 the non-constant size of the backing store
 means the compiler can no longer allocate the backing store on the
 stack.  It will end up on the heap, converting our 0-allocation code
 to 1-allocation code. Still better than having `append` do all the
 intermediate allocations, but unfortunate.

 But never fear, Go 1.25 is here!

 Imagine you decide to do the following, to get the stack allocation
 only in cases where the guess is small:

 {{raw `
 	func process4(c chan task, lengthGuess int) {
 		var tasks []task
 		if lengthGuess <= 10 {
 			tasks = make([]task, 0, 10)
 		} else {
 			tasks = make([]task, 0, lengthGuess)
 		}
 		for t := range c {
 			tasks = append(tasks, t)
 		}
 		processAll(tasks)
 	}
 `}}

 Kind of ugly, but it would work. When the guess is small, you use a
 constant size `make` and thus a stack-allocated backing store, and
 when the guess is larger you use a variable size `make` and allocate
 the backing store from the heap.

 But in Go 1.25, you don't need to head down this ugly road. The Go
 1.25 compiler does this transformation for you!  For certain slice
 allocation locations, the compiler automatically allocates a small
 (currently 32-byte) slice backing store, and uses that backing store
 for the result of the `make` if the size requested is small
 enough. Otherwise, it uses a heap allocation as normal.

 In Go 1.25, `process3` performs zero heap allocations, if
 `lengthGuess` is small enough that a slice of that length fits into 32
 bytes. (And of course that `lengthGuess` is a correct guess for how
 many items are in `c`.)

 We're always improving the performance of Go, so upgrade to the latest
 Go release and [be
 surprised](https://youtu.be/FUm0pfgWehI?si=QRTt_JYwr-cRHDNJ&t=960) by
 how much faster and memory efficient your program becomes!

 ## Stack allocation of append-allocated slices

 Ok, but you still don't want to have to change your API to add this
 weird length guess. Anything else you could do?

 Upgrade to Go 1.26!

 {{raw `
 	func process(c chan task) {
 		var tasks []task
 		for t := range c {
 			tasks = append(tasks, t)
 		}
 		processAll(tasks)
 	}
 `}}

 In Go 1.26, we allocate the same kind of small, speculative backing
 store on the stack, but now we can use it directly at the `append`
 site.

 On the first loop iteration, there is no backing store for `tasks`, so
 `append` uses a small, stack-allocated backing store as the first
 allocation. If, for instance, we can fit 4 `task`s in that backing store,
 the first `append` allocates a backing store of length 4 from the stack.

 The next 3 loop iterations append directly to the stack backing store,
 requiring no allocation.

 On the 4th iteration, the stack backing store is finally full and we
 have to go to the heap for more backing store. But we have avoided
 almost all of the startup overhead described earlier in this article.
 No heap allocations of size, 1, 2, and 4, and none of the garbage that
 they eventually become. If your slices are small, maybe you will never
 have a heap allocation.

 ## Stack allocation of append-allocated escaping slices

 Ok, this is all good when the `tasks` slice doesn't escape. But what if
 I'm returning the slice? Then it can't be allocated on the stack, right?

 Right! The backing store for the slice returned by `extract` below
 can't be allocated on the stack, because the stack frame for `extract`
 disappears when `extract` returns.

 {{raw `
 	func extract(c chan task) []task {
 		var tasks []task
 		for t := range c {
 			tasks = append(tasks, t)
 		}
 		return tasks
 	}
 `}}

 But you might think, the *returned* slice can't be allocated on the
 stack. But what about all those intermediate slices that just become
 garbage? Maybe we can allocate those on the stack?

 {{raw `
 	func extract2(c chan task) []task {
 		var tasks []task
 		for t := range c {
 			tasks = append(tasks, t)
 		}
 		tasks2 := make([]task, len(tasks))
 		copy(tasks2, tasks)
 		return tasks2
 	}
 `}}

 Then the `tasks` slice never escapes `extract2`. It can benefit from
 all of the optimizations described above. Then at the very end of
 `extract2`, when we know the final size of the slice, we do one heap
 allocation of the required size, copy our `task`s into it, and return
 the copy.

 But do you really want to write all that additional code? It seems
 error prone. Maybe the compiler can do this transformation for us?

 In Go 1.26, it can!

 For escaping slices, the compiler will transform the original `extract`
 code to something like this:

 {{raw `
 	func extract3(c chan task) []task {
 		var tasks []task
 		for t := range c {
 			tasks = append(tasks, t)
 		}
 		tasks = runtime.move2heap(tasks)
 		return tasks
 	}
 `}}

 `runtime.move2heap` is a special compiler+runtime function that is the
 identity function for slices that are already allocated in the heap.
 For slices that are on the stack, it allocates a new slice on the
 heap, copies the stack-allocated slice to the heap copy, and returns
 the heap copy.

 This ensures that for our original `extract` code, if the number of
 items fits in our small stack-allocated buffer, we perform exactly 1
 allocation of exactly the right size. If the number of items exceeds
 the capacity of our small stack-allocated buffer, we do our normal
 doubling-allocation once the stack-allocated buffer overflows.

 The optimization that Go 1.26 does is actually better than the
 hand-optimized code, because it does not require the extra
 allocation+copy that the hand-optimized code always does at the end.
 It requires the allocation+copy only in the case that we've exclusively
 operated on a stack-backed slice up to the return point.

 We do pay the cost for a copy, but that cost is almost completely
 offset by the copies in the startup phase that we no longer have to
 do. (In fact, the new scheme at worst has to copy one more element
 than the old scheme.)

 ## Wrapping up

 Hand optimization can still be beneficial, especially if you have a
 good estimate of the slice size ahead of time. But hopefully the
 compiler will now catch a lot of the simple cases for you and allow
 you to focus on the remaining ones that really matter.

 There are a lot of details that the compiler needs to ensure to get
 all these optimizations right. If you think that one of these
 optimizations is causing correctness or (negative) performance issues
 for you, you can turn them off with
 `-gcflags=all=-d=variablemakehash=n`. If turning these optimizations
 off helps, please [file an issue](/issue/new) so we can investigate.

 ## Footnotes

 <sup>1</sup> Go stacks do not have any `alloca`-style mechanism for
 dynamically-sized stack frames. All Go stack frames are constant
 sized.
	---
	title: Allocating on the Stack
	date: 2026-02-27
	by:
	- Keith Randall
	summary: A description of some of the recent changes to do allocations on the stack instead of the heap.
	template: true
	---

	We're always looking for ways to make Go programs faster. In the last
	2 releases, we have concentrated on mitigating a particular source of
	slowness, heap allocations. Each time a Go program allocates memory
	from the heap, there's a fairly large chunk of code that needs to run
	to satisfy that allocation. In addition, heap allocations present
	additional load on the garbage collector. Even with recent
	enhancements like [Green Tea](/blog/greenteagc), the garbage collector
	still incurs substantial overhead.

	So we've been working on ways to do more allocations on the stack
	instead of the heap. Stack allocations are considerably cheaper to
	perform (sometimes completely free). Moreover, they present no load
	to the garbage collector, as stack allocations can be collected
	automatically together with the stack frame itself. Stack allocations
	also enable prompt reuse, which is very cache friendly.

	## Stack allocation of constant-sized slices

	Consider the task of building a slice of tasks to process:
	{{raw `
	func process(c chan task) {
	var tasks []task
	for t := range c {
	tasks = append(tasks, t)
	}
	processAll(tasks)
	}
	`}}

	Let's walk through what happens at runtime when pulling tasks from the
	channel `c` and adding them to the slice `tasks`.

	On the first loop iteration, there is no backing store for `tasks`, so
	`append` has to allocate one. Because it doesn't know how big the
	slice will eventually be, it can't be too aggressive. Currently, it
	allocates a backing store of size 1.

	On the second loop iteration, the backing store now exists, but it is
	full. `append` again has to allocate a new backing store, this time of
	size 2. The old backing store of size 1 is now garbage.

	On the third loop iteration, the backing store of size 2 is
	full. `append` again has to allocate a new backing store, this time
	of size 4. The old backing store of size 2 is now garbage.

	On the fourth loop iteration, the backing store of size 4 has only 3
	items in it. `append` can just place the item in the existing backing
	store and bump up the slice length. Yay! No call to the allocator for
	this iteration.

	On the fifth loop iteration, the backing store of size 4 is full, and
	`append` again has to allocate a new backing store, this time of size
	8.

	And so on. We generally double the size of the allocation each time it
	fills up, so we can eventually append most new tasks to the slice
	without allocation. But there is a fair amount of overhead in the
	"startup" phase when the slice is small. During this startup phase we
	spend a lot of time in the allocator, and produce a bunch of garbage,
	which seems pretty wasteful. And it may be that in your program, the
	slice never really gets large. This startup phase may be all you ever
	encounter.

	If this code was a really hot part of your program, you might be
	tempted to start the slice out at a larger size, to avoid all of these
	allocations.

	{{raw `
	func process2(c chan task) {
	tasks := make([]task, 0, 10) // probably at most 10 tasks
	for t := range c {
	tasks = append(tasks, t)
	}
	processAll(tasks)
	}
	`}}

	This is a reasonable optimization to do. It is never incorrect; your
	program still runs correctly. If the guess is too small, you get
	allocations from `append` as before. If the guess is too large, you
	waste some memory.

	If your guess for the number of tasks was a good one, then there's
	only one allocation site in this program. The `make` call allocates a
	slice backing store of the correct size, and `append` never has to do
	any reallocation.

	The surprising thing is that if you benchmark this code with 10
	elements in the channel, you'll see that you didn't reduce the number
	of allocations to 1, you reduced the number of allocations to 0!

	The reason is that the compiler decided to allocate the backing store
	on the stack. Because it knows what size it needs to be (10 times the
	size of a task) it can allocate storage for it in the stack frame of
	`process2` instead of on the heap[<sup>1</sup>](#footnotes). Note
	that this depends on the fact that the backing store does not [escape
	to the heap](/doc/gc-guide#Escape_analysis) inside of `processAll`.

	## Stack allocation of variable-sized slices

	But of course, hard coding a size guess is a bit rigid.
	Maybe we can pass in an estimated length?

	{{raw `
	func process3(c chan task, lengthGuess int) {
	tasks := make([]task, 0, lengthGuess)
	for t := range c {
	tasks = append(tasks, t)
	}
	processAll(tasks)
	}
	`}}

	This lets the caller pick a good size for the `tasks` slice, which may
	vary depending on where this code is being called from.

	Unfortunately, in Go 1.24 the non-constant size of the backing store
	means the compiler can no longer allocate the backing store on the
	stack. It will end up on the heap, converting our 0-allocation code
	to 1-allocation code. Still better than having `append` do all the
	intermediate allocations, but unfortunate.

	But never fear, Go 1.25 is here!

	Imagine you decide to do the following, to get the stack allocation
	only in cases where the guess is small:

	{{raw `
	func process4(c chan task, lengthGuess int) {
	var tasks []task
	if lengthGuess <= 10 {
	tasks = make([]task, 0, 10)
	} else {
	tasks = make([]task, 0, lengthGuess)
	}
	for t := range c {
	tasks = append(tasks, t)
	}
	processAll(tasks)
	}
	`}}

	Kind of ugly, but it would work. When the guess is small, you use a
	constant size `make` and thus a stack-allocated backing store, and
	when the guess is larger you use a variable size `make` and allocate
	the backing store from the heap.

	But in Go 1.25, you don't need to head down this ugly road. The Go
	1.25 compiler does this transformation for you! For certain slice
	allocation locations, the compiler automatically allocates a small
	(currently 32-byte) slice backing store, and uses that backing store
	for the result of the `make` if the size requested is small
	enough. Otherwise, it uses a heap allocation as normal.

	In Go 1.25, `process3` performs zero heap allocations, if
	`lengthGuess` is small enough that a slice of that length fits into 32
	bytes. (And of course that `lengthGuess` is a correct guess for how
	many items are in `c`.)

	We're always improving the performance of Go, so upgrade to the latest
	Go release and [be
	surprised](https://youtu.be/FUm0pfgWehI?si=QRTt_JYwr-cRHDNJ&t=960) by
	how much faster and memory efficient your program becomes!

	## Stack allocation of append-allocated slices

	Ok, but you still don't want to have to change your API to add this
	weird length guess. Anything else you could do?

	Upgrade to Go 1.26!

	{{raw `
	func process(c chan task) {
	var tasks []task
	for t := range c {
	tasks = append(tasks, t)
	}
	processAll(tasks)
	}
	`}}

	In Go 1.26, we allocate the same kind of small, speculative backing
	store on the stack, but now we can use it directly at the `append`
	site.

	On the first loop iteration, there is no backing store for `tasks`, so
	`append` uses a small, stack-allocated backing store as the first
	allocation. If, for instance, we can fit 4 `task`s in that backing store,
	the first `append` allocates a backing store of length 4 from the stack.

	The next 3 loop iterations append directly to the stack backing store,
	requiring no allocation.

	On the 4th iteration, the stack backing store is finally full and we
	have to go to the heap for more backing store. But we have avoided
	almost all of the startup overhead described earlier in this article.
	No heap allocations of size, 1, 2, and 4, and none of the garbage that
	they eventually become. If your slices are small, maybe you will never
	have a heap allocation.

	## Stack allocation of append-allocated escaping slices

	Ok, this is all good when the `tasks` slice doesn't escape. But what if
	I'm returning the slice? Then it can't be allocated on the stack, right?

	Right! The backing store for the slice returned by `extract` below
	can't be allocated on the stack, because the stack frame for `extract`
	disappears when `extract` returns.

	{{raw `
	func extract(c chan task) []task {
	var tasks []task
	for t := range c {
	tasks = append(tasks, t)
	}
	return tasks
	}
	`}}

	But you might think, the returned slice can't be allocated on the
	stack. But what about all those intermediate slices that just become
	garbage? Maybe we can allocate those on the stack?

	{{raw `
	func extract2(c chan task) []task {
	var tasks []task
	for t := range c {
	tasks = append(tasks, t)
	}
	tasks2 := make([]task, len(tasks))
	copy(tasks2, tasks)
	return tasks2
	}
	`}}

	Then the `tasks` slice never escapes `extract2`. It can benefit from
	all of the optimizations described above. Then at the very end of
	`extract2`, when we know the final size of the slice, we do one heap
	allocation of the required size, copy our `task`s into it, and return
	the copy.

	But do you really want to write all that additional code? It seems
	error prone. Maybe the compiler can do this transformation for us?

	In Go 1.26, it can!

	For escaping slices, the compiler will transform the original `extract`
	code to something like this:

	{{raw `
	func extract3(c chan task) []task {
	var tasks []task
	for t := range c {
	tasks = append(tasks, t)
	}
	tasks = runtime.move2heap(tasks)
	return tasks
	}
	`}}

	`runtime.move2heap` is a special compiler+runtime function that is the
	identity function for slices that are already allocated in the heap.
	For slices that are on the stack, it allocates a new slice on the
	heap, copies the stack-allocated slice to the heap copy, and returns
	the heap copy.

	This ensures that for our original `extract` code, if the number of
	items fits in our small stack-allocated buffer, we perform exactly 1
	allocation of exactly the right size. If the number of items exceeds
	the capacity of our small stack-allocated buffer, we do our normal
	doubling-allocation once the stack-allocated buffer overflows.

	The optimization that Go 1.26 does is actually better than the
	hand-optimized code, because it does not require the extra
	allocation+copy that the hand-optimized code always does at the end.
	It requires the allocation+copy only in the case that we've exclusively
	operated on a stack-backed slice up to the return point.

	We do pay the cost for a copy, but that cost is almost completely
	offset by the copies in the startup phase that we no longer have to
	do. (In fact, the new scheme at worst has to copy one more element
	than the old scheme.)

	## Wrapping up

	Hand optimization can still be beneficial, especially if you have a
	good estimate of the slice size ahead of time. But hopefully the
	compiler will now catch a lot of the simple cases for you and allow
	you to focus on the remaining ones that really matter.

	There are a lot of details that the compiler needs to ensure to get
	all these optimizations right. If you think that one of these
	optimizations is causing correctness or (negative) performance issues
	for you, you can turn them off with
	`-gcflags=all=-d=variablemakehash=n`. If turning these optimizations
	off helps, please [file an issue](/issue/new) so we can investigate.

	## Footnotes

	<sup>1</sup> Go stacks do not have any `alloca`-style mechanism for
	dynamically-sized stack frames. All Go stack frames are constant
	sized.