runtime: steal timers from running P's

Previously we did not steal timers from running P's, because that P
should be responsible for running its own timers. However, if the P
is running a CPU-bound G, this can cause measurable delays in running
ready timers. Also, in CL 214185 we avoided taking the timer lock of a P
with no ready timers, which reduces the chances of timer lock contention.

So, if we can't find any ready timers on sleeping P's, try stealing
them from running P's.

Fixes #38860

Change-Id: I0bf1d5dc56258838bdacccbf89493524e23d7fed
Reviewed-on: https://go-review.googlesource.com/c/go/+/232199
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index 766784c..2399f0a 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -2231,11 +2231,14 @@
 			// Consider stealing timers from p2.
 			// This call to checkTimers is the only place where
 			// we hold a lock on a different P's timers.
-			// Lock contention can be a problem here, so avoid
-			// grabbing the lock if p2 is running and not marked
-			// for preemption. If p2 is running and not being
-			// preempted we assume it will handle its own timers.
-			if i > 2 && shouldStealTimers(p2) {
+			// Lock contention can be a problem here, so
+			// initially avoid grabbing the lock if p2 is running
+			// and is not marked for preemption. If p2 is running
+			// and not being preempted we assume it will handle its
+			// own timers.
+			// If we're still looking for work after checking all
+			// the P's, then go ahead and steal from an active P.
+			if i > 2 || (i > 1 && shouldStealTimers(p2)) {
 				tnow, w, ran := checkTimers(p2, now)
 				now = tnow
 				if w != 0 && (pollUntil == 0 || w < pollUntil) {