[release-branch.go1.7] runtime: force workers out before checking mark roots

Fixes #18700 (backport)

Currently we check that all roots are marked as soon as gcMarkDone
decides to transition from mark 1 to mark 2. However, issue #16083
indicates that there may be a race where we try to complete mark 1
while a worker is still scanning a stack, causing the root mark check
to fail.

We don't yet understand this race, but as a simple mitigation, move
the root check to after gcMarkDone performs a ragged barrier, which
will force any remaining workers to finish their current job.

Updates #16083. This may "fix" it, but it would be better to
understand and fix the underlying race.

Change-Id: I1af9ce67bd87ade7bc2a067295d79c28cd11abd2
Reviewed-on: https://go-review.googlesource.com/35678
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
diff --git a/src/runtime/mgc.go b/src/runtime/mgc.go
index 3b238cb..67b4a21 100644
--- a/src/runtime/mgc.go
+++ b/src/runtime/mgc.go
@@ -1076,8 +1076,6 @@
 		// objects reachable from global roots since they don't have write
 		// barriers. Rescan some roots and flush work caches.
 
-		gcMarkRootCheck()
-
 		// Disallow caching workbufs and indicate that we're in mark 2.
 		gcBlackenPromptly = true
 
@@ -1101,6 +1099,16 @@
 			})
 		})
 
+		// Check that roots are marked. We should be able to
+		// do this before the forEachP, but based on issue
+		// #16083 there may be a (harmless) race where we can
+		// enter mark 2 while some workers are still scanning
+		// stacks. The forEachP ensures these scans are done.
+		//
+		// TODO(austin): Figure out the race and fix this
+		// properly.
+		gcMarkRootCheck()
+
 		// Now we can start up mark 2 workers.
 		atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 0xffffffff)
 		atomic.Xaddint64(&gcController.fractionalMarkWorkersNeeded, 0xffffffff)