runtime: unify mutex code across OSes
The change introduces 2 generic mutex implementations
(futex- and semaphore-based). Each OS chooses a suitable mutex
implementation and implements few callbacks (e.g. futex wait/wake).
The CL reduces code duplication, extends some optimizations available
only on Linux/Windows to other OSes and provides ground
for futher optimizations. Chan finalizers are finally eliminated.

(Linux/amd64, 8 HT cores)
benchmark                      old      new
BenchmarkChanContended         83.6     77.8 ns/op
BenchmarkChanContended-2       341      328 ns/op
BenchmarkChanContended-4       382      383 ns/op
BenchmarkChanContended-8       390      374 ns/op
BenchmarkChanContended-16      313      291 ns/op

(Darwin/amd64, 2 cores)
benchmark                      old      new
BenchmarkChanContended         159      172 ns/op
BenchmarkChanContended-2       6735     263 ns/op
BenchmarkChanContended-4       10384    255 ns/op
BenchmarkChanCreation          1174     407 ns/op
BenchmarkChanCreation-2        4007     254 ns/op
BenchmarkChanCreation-4        4029     246 ns/op

R=rsc, jsing, hectorchu
CC=golang-dev
https://golang.org/cl/5140043
11 files changed