runtime: don't free large spans until heapBitsSweepSpan returns

This fixes a race between 1) sweeping and freeing an unmarked large
span and 2) reusing that span and allocating from it. This race arises
because mSpan_Sweep returns spans for large objects to the heap
*before* heapBitsSweepSpan clears the mark bit on the object in the
span.

Specifically, the following sequence of events can lead to an
incorrectly zeroed bitmap byte, which causes the garbage collector to
not trace any pointers in that object (the pointer bits for the first
four words are cleared, and the scan bits are also cleared, so it
looks like a no-scan object).

1) P0 calls mSpan_Sweep on a large span S0 with an unmarked object on it.

2) mSpan_Sweep calls heapBitsSweepSpan, which invokes the callback for
   the one (unmarked) object on the span.

3) The callback calls mHeap_Free, which makes span S0 available for
   allocation, but this is too early.

4) P1 grabs this S0 from the heap to use for allocation.

5) P1 allocates an object on this span and writes that object's type
   bits to the bitmap.

6) P0 returns from the callback to heapBitsSweepSpan.
   heapBitsSweepSpan clears the byte containing the mark, even though
   this span is now owned by P1 and this byte contains important
   bitmap information.

This fixes this problem by simply delaying the mHeap_Free until after
the heapBitsSweepSpan. I think the overall logic of mSpan_Sweep could
be simplified now, but this seems like the minimal change.

Fixes #11617.

Change-Id: I6b1382c7e7cc35f81984467c0772fe9848b7522a
Reviewed-on: https://go-review.googlesource.com/12320
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Rob Pike <r@golang.org>
diff --git a/src/runtime/mgcsweep.go b/src/runtime/mgcsweep.go
index 910257a..acad55e 100644
--- a/src/runtime/mgcsweep.go
+++ b/src/runtime/mgcsweep.go
@@ -170,12 +170,12 @@
 	cl := s.sizeclass
 	size := s.elemsize
 	res := false
-	nfree := 0
+	nfree := 0 // Set to -1 for large span
 
 	var head, end gclinkptr
 
 	c := _g_.m.mcache
-	sweepgenset := false
+	freeToHeap := false
 
 	// Mark any free objects in this span so we don't collect them.
 	sstart := uintptr(s.start << _PageShift)
@@ -237,31 +237,10 @@
 
 			// important to set sweepgen before returning it to heap
 			atomicstore(&s.sweepgen, sweepgen)
-			sweepgenset = true
 
-			// NOTE(rsc,dvyukov): The original implementation of efence
-			// in CL 22060046 used SysFree instead of SysFault, so that
-			// the operating system would eventually give the memory
-			// back to us again, so that an efence program could run
-			// longer without running out of memory. Unfortunately,
-			// calling SysFree here without any kind of adjustment of the
-			// heap data structures means that when the memory does
-			// come back to us, we have the wrong metadata for it, either in
-			// the MSpan structures or in the garbage collection bitmap.
-			// Using SysFault here means that the program will run out of
-			// memory fairly quickly in efence mode, but at least it won't
-			// have mysterious crashes due to confused memory reuse.
-			// It should be possible to switch back to SysFree if we also
-			// implement and then call some kind of MHeap_DeleteSpan.
-			if debug.efence > 0 {
-				s.limit = 0 // prevent mlookup from finding this span
-				sysFault(unsafe.Pointer(p), size)
-			} else {
-				mHeap_Free(&mheap_, s, 1)
-			}
-			c.local_nlargefree++
-			c.local_largefree += size
-			res = true
+			// Free the span after heapBitsSweepSpan
+			// returns, since it's not done with the span.
+			freeToHeap = true
 		} else {
 			// Free small object.
 			if size > 2*ptrSize {
@@ -285,7 +264,10 @@
 	// But we need to set it before we make the span available for allocation
 	// (return it to heap or mcentral), because allocation code assumes that a
 	// span is already swept if available for allocation.
-	if !sweepgenset && nfree == 0 {
+	//
+	// TODO(austin): Clean this up by consolidating atomicstore in
+	// large span path above with this.
+	if !freeToHeap && nfree == 0 {
 		// The span must be in our exclusive ownership until we update sweepgen,
 		// check for potential races.
 		if s.state != mSpanInUse || s.sweepgen != sweepgen-1 {
@@ -298,6 +280,32 @@
 		c.local_nsmallfree[cl] += uintptr(nfree)
 		res = mCentral_FreeSpan(&mheap_.central[cl].mcentral, s, int32(nfree), head, end, preserve)
 		// MCentral_FreeSpan updates sweepgen
+	} else if freeToHeap {
+		// Free large span to heap
+
+		// NOTE(rsc,dvyukov): The original implementation of efence
+		// in CL 22060046 used SysFree instead of SysFault, so that
+		// the operating system would eventually give the memory
+		// back to us again, so that an efence program could run
+		// longer without running out of memory. Unfortunately,
+		// calling SysFree here without any kind of adjustment of the
+		// heap data structures means that when the memory does
+		// come back to us, we have the wrong metadata for it, either in
+		// the MSpan structures or in the garbage collection bitmap.
+		// Using SysFault here means that the program will run out of
+		// memory fairly quickly in efence mode, but at least it won't
+		// have mysterious crashes due to confused memory reuse.
+		// It should be possible to switch back to SysFree if we also
+		// implement and then call some kind of MHeap_DeleteSpan.
+		if debug.efence > 0 {
+			s.limit = 0 // prevent mlookup from finding this span
+			sysFault(unsafe.Pointer(uintptr(s.start<<_PageShift)), size)
+		} else {
+			mHeap_Free(&mheap_, s, 1)
+		}
+		c.local_nlargefree++
+		c.local_largefree += size
+		res = true
 	}
 	if trace.enabled {
 		traceGCSweepDone()