runtime: make markrootSpans time proportional to in-use spans

Currently markrootSpans iterates over all spans ever allocated to find
the in-use spans. Since we now have a list of in-use spans, change it
to iterate over that instead.

This, combined with the previous change, fixes #9265. Before these two
changes, blowing up the heap to 8GB and then shrinking it to a 0MB
live set caused the small-heap portion of the test to run 60x slower
than without the initial blowup. With these two changes, the time is
indistinguishable.

No significant effect on other benchmarks.

Change-Id: I4a27e533efecfb5d18cba3a87c0181a81d0ddc1e
Reviewed-on: https://go-review.googlesource.com/30536
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
diff --git a/src/runtime/mgcmark.go b/src/runtime/mgcmark.go
index eb96858..e0f82d4 100644
--- a/src/runtime/mgcmark.go
+++ b/src/runtime/mgcmark.go
@@ -77,7 +77,13 @@
 		// above invariants for objects that get finalizers
 		// after concurrent mark. In STW GC, this will happen
 		// during mark termination.
-		work.nSpanRoots = (len(work.spans) + rootBlockSpans - 1) / rootBlockSpans
+		//
+		// We're only interested in scanning the in-use spans,
+		// which will all be swept at this point. More spans
+		// may be added to this list during concurrent GC, but
+		// we only care about spans that were allocated before
+		// this mark phase.
+		work.nSpanRoots = mheap_.sweepSpans[mheap_.sweepgen/2%2].numBlocks()
 
 		// On the first markroot, we need to scan all Gs. Gs
 		// may be created after this point, but it's okay that
@@ -332,18 +338,14 @@
 	}
 
 	sg := mheap_.sweepgen
-	startSpan := shard * rootBlockSpans
-	endSpan := (shard + 1) * rootBlockSpans
-	if endSpan > len(work.spans) {
-		endSpan = len(work.spans)
-	}
+	spans := mheap_.sweepSpans[mheap_.sweepgen/2%2].block(shard)
 	// Note that work.spans may not include spans that were
 	// allocated between entering the scan phase and now. This is
 	// okay because any objects with finalizers in those spans
 	// must have been allocated and given finalizers after we
 	// entered the scan phase, so addfinalizer will have ensured
 	// the above invariants for them.
-	for _, s := range work.spans[startSpan:endSpan] {
+	for _, s := range spans {
 		if s.state != mSpanInUse {
 			continue
 		}