| <!--{ |
| "Title": "Data Race Detector", |
| "Template": true |
| }--> |
| |
| <h2 id="Introduction">Introduction</h2> |
| |
| <p> |
Data races are among the most common and hardest to debug types of bugs in concurrent systems. A data race occurs when two goroutines access the same variable concurrently and at least one of the accesses is a write. See <a href="/ref/mem/">The Go Memory Model</a> for details.
| </p> |
| |
| <p> |
| Here is an example of a data race that can lead to crashes and memory corruption: |
| </p> |
| |
| <pre> |
| func main() { |
| c := make(chan bool) |
| m := make(map[string]string) |
| go func() { |
| m["1"] = "a" // First conflicting access. |
| c <- true |
| }() |
| m["2"] = "b" // Second conflicting access. |
| <-c |
| for k, v := range m { |
| fmt.Println(k, v) |
| } |
| } |
| </pre> |
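
<p>
One way to eliminate this race, shown here as a sketch in the same fragment style as the example above (imports omitted), is to guard the map with a <code>sync.Mutex</code> so that the two writes are synchronized:
</p>

<pre>
func main() {
	c := make(chan bool)
	m := make(map[string]string)
	var mu sync.Mutex // Guards m.
	go func() {
		mu.Lock()
		m["1"] = "a" // Synchronized with the write below.
		mu.Unlock()
		c <- true
	}()
	mu.Lock()
	m["2"] = "b" // Synchronized with the write above.
	mu.Unlock()
	<-c
	for k, v := range m {
		fmt.Println(k, v)
	}
}
</pre>

<p>
The receive on <code>c</code> still orders the goroutine's write before the final loop; the mutex only resolves the conflict between the two writes.
</p>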
| |
| <h2 id="Usage">Usage</h2> |
| |
| <p> |
Fortunately, Go includes a built-in data race detector. To use it, add the <code>-race</code> flag to the <code>go</code> command:
| </p> |
| |
| <pre> |
$ go test -race mypkg    // to test the package
$ go run -race mysrc.go  // to run the source file
$ go build -race mycmd   // to build the command
$ go install -race mypkg // to install the package
| </pre> |
| |
| <h2 id="Report_Format">Report Format</h2> |
| |
| <p> |
| When the race detector finds a data race in the program, it prints a report. The report contains stack traces for conflicting accesses, as well as stacks where the involved goroutines were created. For example: |
| </p> |
| |
| <pre> |
| WARNING: DATA RACE |
| Read by goroutine 185: |
| net.(*pollServer).AddFD() |
| src/pkg/net/fd_unix.go:89 +0x398 |
| net.(*pollServer).WaitWrite() |
| src/pkg/net/fd_unix.go:247 +0x45 |
| net.(*netFD).Write() |
| src/pkg/net/fd_unix.go:540 +0x4d4 |
| net.(*conn).Write() |
| src/pkg/net/net.go:129 +0x101 |
| net.func·060() |
| src/pkg/net/timeout_test.go:603 +0xaf |
| |
| Previous write by goroutine 184: |
| net.setWriteDeadline() |
| src/pkg/net/sockopt_posix.go:135 +0xdf |
| net.setDeadline() |
| src/pkg/net/sockopt_posix.go:144 +0x9c |
| net.(*conn).SetDeadline() |
| src/pkg/net/net.go:161 +0xe3 |
| net.func·061() |
| src/pkg/net/timeout_test.go:616 +0x3ed |
| |
| Goroutine 185 (running) created at: |
| net.func·061() |
| src/pkg/net/timeout_test.go:609 +0x288 |
| |
| Goroutine 184 (running) created at: |
| net.TestProlongTimeout() |
| src/pkg/net/timeout_test.go:618 +0x298 |
| testing.tRunner() |
| src/pkg/testing/testing.go:301 +0xe8 |
| </pre> |
| |
| <h2 id="Options">Options</h2> |
| |
| <p> |
| The <code>GORACE</code> environment variable sets race detector options. The format is: |
| </p> |
| |
| <pre> |
| GORACE="option1=val1 option2=val2" |
| </pre> |
| |
| <p> |
| The options are: |
| </p> |
| |
| <ul> |
| <li> |
<code>log_path</code> (default <code>stderr</code>): The race detector writes
its report to a file named <code>log_path.<em>pid</em></code>. The special names <code>stdout</code>
and <code>stderr</code> cause reports to be written to standard output and
standard error, respectively.
| </li> |
| |
| <li> |
| <code>exitcode</code> (default <code>66</code>): The exit status to use when |
| exiting after a detected race. |
| </li> |
| |
| <li> |
| <code>strip_path_prefix</code> (default <code>""</code>): Strip this prefix |
| from all reported file paths, to make reports more concise. |
| </li> |
| |
| <li> |
<code>history_size</code> (default <code>1</code>): The per-goroutine memory
access history is <code>32K * 2**history_size</code> elements. Increasing this
value can avoid a "failed to restore the stack" error in reports, at the
cost of increased memory usage.
| </li> |
| </ul> |
| |
| <p> |
| Example: |
| </p> |
| |
| <pre> |
| $ GORACE="log_path=/tmp/race/report strip_path_prefix=/my/go/sources/" go test -race |
| </pre> |
| |
| <h2 id="Excluding_Tests">Excluding Tests</h2> |
| |
| <p> |
When you build with the <code>-race</code> flag, the <code>go</code> command defines an additional
<a href="/pkg/go/build/#Build_Constraints">build tag</a>, <code>race</code>.
You can use the tag to exclude some code and tests when running the race detector. For example:
| </p> |
| |
| <pre> |
| // +build !race |
| |
| package foo |
| |
| // The test contains a data race. See issue 123. |
| func TestFoo(t *testing.T) { |
| // ... |
| } |
| |
| // The test fails under the race detector due to timeouts. |
| func TestBar(t *testing.T) { |
| // ... |
| } |
| |
| // The test takes too long under the race detector. |
| func TestBaz(t *testing.T) { |
| // ... |
| } |
| </pre> |
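
<p>
On Go 1.17 and later, the same constraint can also be written in the <code>//go:build</code> form; a minimal sketch:
</p>

<pre>
//go:build !race

package foo
</pre>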
| |
| <h2 id="How_To_Use">How To Use</h2> |
| |
| <p> |
| To start, run your tests using the race detector (<code>go test -race</code>). |
| The race detector only finds races that happen at runtime, so it can't find |
| races in code paths that are not executed. If your tests have incomplete coverage, |
| you may find more races by running a binary built with <code>-race</code> under a realistic |
| workload. |
| </p> |
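
<p>
As a sketch (the package path and binary name here are hypothetical), that might look like:
</p>

<pre>
$ go build -race -o myserver ./cmd/myserver // build the binary with the race detector enabled
$ ./myserver                                // then exercise it with realistic traffic
</pre>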
| |
| <h2 id="Typical_Data_Races">Typical Data Races</h2> |
| |
| <p> |
| Here are some typical data races. All of them can be detected with the race detector. |
| </p> |
| |
| <h3 id="Race_on_loop_counter">Race on loop counter</h3> |
| |
| <pre> |
| func main() { |
| var wg sync.WaitGroup |
| wg.Add(5) |
| for i := 0; i < 5; i++ { |
| go func() { |
| fmt.Println(i) // Not the 'i' you are looking for. |
| wg.Done() |
| }() |
| } |
| wg.Wait() |
| } |
| </pre> |
| |
| <p> |
| The variable <code>i</code> in the function literal is the same variable used by the loop, so |
| the read in the goroutine races with the loop increment. (This program typically |
| prints 55555, not 01234.) The program can be fixed by making a copy of the |
| variable: |
| </p> |
| |
| <pre> |
| func main() { |
| var wg sync.WaitGroup |
| wg.Add(5) |
| for i := 0; i < 5; i++ { |
| go func(j int) { |
| fmt.Println(j) // Good. Read local copy of the loop counter. |
| wg.Done() |
| }(i) |
| } |
| wg.Wait() |
| } |
| </pre> |
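
<p>
An equivalent fix is to shadow the loop variable with a new variable declared inside the loop body, so each iteration's goroutine captures its own copy; a sketch of just the loop:
</p>

<pre>
for i := 0; i < 5; i++ {
	i := i // Declare a new 'i' scoped to this iteration.
	go func() {
		fmt.Println(i) // Reads the per-iteration copy.
		wg.Done()
	}()
}
</pre>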
| |
| <h3 id="Accidentally_shared_variable">Accidentally shared variable</h3> |
| |
| <pre> |
// ParallelWrite writes data to file1 and file2, and returns the errors.
| func ParallelWrite(data []byte) chan error { |
| res := make(chan error, 2) |
| f1, err := os.Create("file1") |
| if err != nil { |
| res <- err |
| } else { |
| go func() { |
| // This err is shared with the main goroutine, |
| // so the write races with the write below. |
| _, err = f1.Write(data) |
| res <- err |
| f1.Close() |
| }() |
| } |
| f2, err := os.Create("file2") // The second conflicting write to err. |
| if err != nil { |
| res <- err |
| } else { |
| go func() { |
| _, err = f2.Write(data) |
| res <- err |
| f2.Close() |
| }() |
| } |
| return res |
| } |
| </pre> |
| |
| <p> |
| The fix is to introduce new variables in the goroutines (note <code>:=</code>): |
| </p> |
| |
| <pre> |
| ... |
| _, err := f1.Write(data) |
| ... |
| _, err := f2.Write(data) |
| ... |
| </pre> |
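
<p>
In context, the first goroutine of the corrected function then reads as follows (a sketch of the fragment only):
</p>

<pre>
go func() {
	// This err is a new variable, local to the goroutine.
	_, err := f1.Write(data)
	res <- err
	f1.Close()
}()
</pre>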
| |
| <h3 id="Unprotected_global_variable">Unprotected global variable</h3> |
| |
| <p> |
If the following code is called from several goroutines, it leads to races on the <code>service</code> map.
Concurrent reads and writes of the same map are not safe:
| </p> |
| |
| <pre> |
var service = make(map[string]net.Addr)
| |
| func RegisterService(name string, addr net.Addr) { |
| service[name] = addr |
| } |
| |
| func LookupService(name string) net.Addr { |
| return service[name] |
| } |
| </pre> |
| |
| <p> |
| To make the code safe, protect the accesses with a mutex: |
| </p> |
| |
| <pre> |
var (
	service   = make(map[string]net.Addr)
	serviceMu sync.Mutex
)
| |
| func RegisterService(name string, addr net.Addr) { |
| serviceMu.Lock() |
| defer serviceMu.Unlock() |
| service[name] = addr |
| } |
| |
| func LookupService(name string) net.Addr { |
| serviceMu.Lock() |
| defer serviceMu.Unlock() |
| return service[name] |
| } |
| </pre> |
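
<p>
If lookups greatly outnumber registrations, a <code>sync.RWMutex</code> can be used instead so that concurrent readers do not block each other; this variant is a sketch, not required for correctness:
</p>

<pre>
var (
	service   = make(map[string]net.Addr)
	serviceMu sync.RWMutex
)

func RegisterService(name string, addr net.Addr) {
	serviceMu.Lock()
	defer serviceMu.Unlock()
	service[name] = addr
}

func LookupService(name string) net.Addr {
	serviceMu.RLock()
	defer serviceMu.RUnlock()
	return service[name]
}
</pre>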
| |
| <h3 id="Primitive_unprotected_variable">Primitive unprotected variable</h3> |
| |
| <p> |
| Data races can happen on variables of primitive types as well (<code>bool</code>, <code>int</code>, <code>int64</code>, etc.), like in the following example: |
| </p> |
| |
| <pre> |
| type Watchdog struct{ last int64 } |
| |
| func (w *Watchdog) KeepAlive() { |
| w.last = time.Now().UnixNano() // First conflicting access. |
| } |
| |
| func (w *Watchdog) Start() { |
| go func() { |
| for { |
| time.Sleep(time.Second) |
| // Second conflicting access. |
| if w.last < time.Now().Add(-10*time.Second).UnixNano() { |
| fmt.Println("No keepalives for 10 seconds. Dying.") |
| os.Exit(1) |
| } |
| } |
| }() |
| } |
| </pre> |
| |
| <p> |
Even such “innocent” data races can lead to hard-to-debug problems caused by (1) non-atomicity of the memory accesses, (2) interference with compiler optimizations, and (3) processor memory access reordering issues.
| </p> |
| |
| <p> |
| A typical fix for this race is to use a channel or a mutex. |
| To preserve the lock-free behavior, one can also use the <a href="/pkg/sync/atomic/"><code>sync/atomic</code></a> package. |
| </p> |
| |
| <pre> |
| type Watchdog struct{ last int64 } |
| |
| func (w *Watchdog) KeepAlive() { |
| atomic.StoreInt64(&w.last, time.Now().UnixNano()) |
| } |
| |
| func (w *Watchdog) Start() { |
| go func() { |
| for { |
| time.Sleep(time.Second) |
| if atomic.LoadInt64(&w.last) < time.Now().Add(-10*time.Second).UnixNano() { |
| fmt.Println("No keepalives for 10 seconds. Dying.") |
| os.Exit(1) |
| } |
| } |
| }() |
| } |
| </pre> |
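
<p>
A typical caller (hypothetical; not part of the original example) starts the watchdog once and then signals liveness periodically:
</p>

<pre>
func main() {
	var w Watchdog
	w.Start()
	for {
		// Do some work, then signal liveness.
		w.KeepAlive()
		time.Sleep(500 * time.Millisecond)
	}
}
</pre>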
| |
| <h2 id="Supported_Systems">Supported Systems</h2> |
| |
| <p> |
| The race detector runs on <code>darwin/amd64</code>, <code>linux/amd64</code>, and <code>windows/amd64</code>. |
| </p> |
| |
| <h2 id="Runtime_Overheads">Runtime Overhead</h2> |
| |
| <p> |
| The cost of race detection varies by program, but for a typical program, memory |
| usage may increase by 5-10x and execution time by 2-20x. |
| </p> |