runtime: add a barrier after a new span is allocated

When copying a stack, we
1. allocate a new stack,
2. adjust pointers pointing to the old stack to pointing to the
   new stack.

If the GC is running on another thread concurrently, on a machine
with weak memory model, the GC could observe the adjusted pointer
(e.g. through gp._defer which could be a special heap-to-stack
pointer), but not observe the publish of the new stack span. In
this case, the GC will see the adjusted pointer pointing to an
unallocated span, and throw. Fixing this by adding a publication
barrier between the allocation of the span and adjusting pointers.

One testcase for this is TestDeferHeapAndStack in long mode. It
fails reliably on linux-mips64le-mengzhuo builder without the fix,
and passes reliably after the fix.

Fixes #35541.

Change-Id: I82b09b824fdf14be7336a9ee853f56dec1b13b90
Reviewed-on: https://go-review.googlesource.com/c/go/+/234478
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
diff --git a/src/runtime/mheap.go b/src/runtime/mheap.go
index 6f7dc6e..2c7bfd8 100644
--- a/src/runtime/mheap.go
+++ b/src/runtime/mheap.go
@@ -1283,9 +1283,10 @@
 	// Publish the span in various locations.
 
 	// This is safe to call without the lock held because the slots
-	// related to this span will only every be read or modified by
-	// this thread until pointers into the span are published or
-	// pageInUse is updated.
+	// related to this span will only ever be read or modified by
+	// this thread until pointers into the span are published (and
+	// we execute a publication barrier at the end of this function
+	// before that happens) or pageInUse is updated.
 	h.setSpans(s.base(), npages, s)
 
 	if !manual {
@@ -1315,6 +1316,11 @@
 			traceHeapAlloc()
 		}
 	}
+
+	// Make sure the newly allocated span will be observed
+	// by the GC before pointers into the span are published.
+	publicationBarrier()
+
 	return s
 }