| <!--{ |
| "Title": "Data Race Detector" |
| }--> |
| |
| <h2 id="Introduction">Introduction</h2> |
| |
| <p> |
| Data races are among the most common and hardest to debug types of bugs in concurrent systems. |
| A data race occurs when two goroutines access the same variable concurrently and at least one of the accesses is a write. |
| See the <a href="/ref/mem/">The Go Memory Model</a> for details. |
| </p> |
| |
| <p> |
| Here is an example of a data race that can lead to crashes and memory corruption: |
| </p> |
| |
| <pre> |
| func main() { |
| c := make(chan bool) |
| m := make(map[string]string) |
| go func() { |
| m["1"] = "a" // First conflicting access. |
| c <- true |
| }() |
| m["2"] = "b" // Second conflicting access. |
| <-c |
| for k, v := range m { |
| fmt.Println(k, v) |
| } |
| } |
| </pre> |
| |
| <h2 id="Usage">Usage</h2> |
| |
| <p> |
| To help diagnose such bugs, Go includes a built-in data race detector. |
| To use it, add the <code>-race</code> flag to the go command: |
| </p> |
| |
| <pre> |
| $ go test -race mypkg // to test the package |
| $ go run -race mysrc.go // to run the source file |
| $ go build -race mycmd // to build the command |
| $ go install -race mypkg // to install the package |
| </pre> |
| |
| <h2 id="Report_Format">Report Format</h2> |
| |
| <p> |
| When the race detector finds a data race in the program, it prints a report. |
| The report contains stack traces for conflicting accesses, as well as stacks where the involved goroutines were created. |
| Here is an example: |
| </p> |
| |
| <pre> |
| WARNING: DATA RACE |
| Read by goroutine 185: |
| net.(*pollServer).AddFD() |
| src/net/fd_unix.go:89 +0x398 |
| net.(*pollServer).WaitWrite() |
| src/net/fd_unix.go:247 +0x45 |
| net.(*netFD).Write() |
| src/net/fd_unix.go:540 +0x4d4 |
| net.(*conn).Write() |
| src/net/net.go:129 +0x101 |
| net.func·060() |
| src/net/timeout_test.go:603 +0xaf |
| |
| Previous write by goroutine 184: |
| net.setWriteDeadline() |
| src/net/sockopt_posix.go:135 +0xdf |
| net.setDeadline() |
| src/net/sockopt_posix.go:144 +0x9c |
| net.(*conn).SetDeadline() |
| src/net/net.go:161 +0xe3 |
| net.func·061() |
| src/net/timeout_test.go:616 +0x3ed |
| |
| Goroutine 185 (running) created at: |
| net.func·061() |
| src/net/timeout_test.go:609 +0x288 |
| |
| Goroutine 184 (running) created at: |
| net.TestProlongTimeout() |
| src/net/timeout_test.go:618 +0x298 |
| testing.tRunner() |
| src/testing/testing.go:301 +0xe8 |
| </pre> |
| |
| <h2 id="Options">Options</h2> |
| |
| <p> |
| The <code>GORACE</code> environment variable sets race detector options. |
| The format is: |
| </p> |
| |
| <pre> |
| GORACE="option1=val1 option2=val2" |
| </pre> |
| |
| <p> |
| The options are: |
| </p> |
| |
| <ul> |
| <li> |
| <code>log_path</code> (default <code>stderr</code>): The race detector writes |
| its report to a file named <code>log_path.<em>pid</em></code>. |
| The special names <code>stdout</code> |
| and <code>stderr</code> cause reports to be written to standard output and |
| standard error, respectively. |
| </li> |
| |
| <li> |
| <code>exitcode</code> (default <code>66</code>): The exit status to use when |
| exiting after a detected race. |
| </li> |
| |
| <li> |
| <code>strip_path_prefix</code> (default <code>""</code>): Strip this prefix |
| from all reported file paths, to make reports more concise. |
| </li> |
| |
| <li> |
| <code>history_size</code> (default <code>1</code>): The per-goroutine memory |
| access history is <code>32K * 2**history_size elements</code>. |
| Increasing this value can avoid a "failed to restore the stack" error in reports, at the |
| cost of increased memory usage. |
| </li> |
| |
| <li> |
| <code>halt_on_error</code> (default <code>0</code>): Controls whether the program |
| exits after reporting first data race. |
| </li> |
| |
| <li> |
| <code>atexit_sleep_ms</code> (default <code>1000</code>): Amount of milliseconds |
| to sleep in the main goroutine before exiting. |
| </li> |
| </ul> |
| |
| <p> |
| Example: |
| </p> |
| |
| <pre> |
| $ GORACE="log_path=/tmp/race/report strip_path_prefix=/my/go/sources/" go test -race |
| </pre> |
| |
| <h2 id="Excluding_Tests">Excluding Tests</h2> |
| |
| <p> |
| When you build with <code>-race</code> flag, the <code>go</code> command defines additional |
| <a href="/pkg/go/build/#hdr-Build_Constraints">build tag</a> <code>race</code>. |
| You can use the tag to exclude some code and tests when running the race detector. |
| Some examples: |
| </p> |
| |
| <pre> |
| // +build !race |
| |
| package foo |
| |
| // The test contains a data race. See issue 123. |
| func TestFoo(t *testing.T) { |
| // ... |
| } |
| |
| // The test fails under the race detector due to timeouts. |
| func TestBar(t *testing.T) { |
| // ... |
| } |
| |
| // The test takes too long under the race detector. |
| func TestBaz(t *testing.T) { |
| // ... |
| } |
| </pre> |
| |
| <h2 id="How_To_Use">How To Use</h2> |
| |
| <p> |
| To start, run your tests using the race detector (<code>go test -race</code>). |
| The race detector only finds races that happen at runtime, so it can't find |
| races in code paths that are not executed. |
| If your tests have incomplete coverage, |
| you may find more races by running a binary built with <code>-race</code> under a realistic |
| workload. |
| </p> |
| |
| <h2 id="Typical_Data_Races">Typical Data Races</h2> |
| |
| <p> |
| Here are some typical data races. All of them can be detected with the race detector. |
| </p> |
| |
| <h3 id="Race_on_loop_counter">Race on loop counter</h3> |
| |
| <pre> |
| func main() { |
| var wg sync.WaitGroup |
| wg.Add(5) |
| for i := 0; i < 5; i++ { |
| go func() { |
| fmt.Println(i) // Not the 'i' you are looking for. |
| wg.Done() |
| }() |
| } |
| wg.Wait() |
| } |
| </pre> |
| |
| <p> |
| The variable <code>i</code> in the function literal is the same variable used by the loop, so |
| the read in the goroutine races with the loop increment. |
| (This program typically prints 55555, not 01234.) |
| The program can be fixed by making a copy of the variable: |
| </p> |
| |
| <pre> |
| func main() { |
| var wg sync.WaitGroup |
| wg.Add(5) |
| for i := 0; i < 5; i++ { |
| go func(j int) { |
| fmt.Println(j) // Good. Read local copy of the loop counter. |
| wg.Done() |
| }(i) |
| } |
| wg.Wait() |
| } |
| </pre> |
| |
| <h3 id="Accidentally_shared_variable">Accidentally shared variable</h3> |
| |
| <pre> |
| // ParallelWrite writes data to file1 and file2, returns the errors. |
| func ParallelWrite(data []byte) chan error { |
| res := make(chan error, 2) |
| f1, err := os.Create("file1") |
| if err != nil { |
| res <- err |
| } else { |
| go func() { |
| // This err is shared with the main goroutine, |
| // so the write races with the write below. |
| _, err = f1.Write(data) |
| res <- err |
| f1.Close() |
| }() |
| } |
| f2, err := os.Create("file2") // The second conflicting write to err. |
| if err != nil { |
| res <- err |
| } else { |
| go func() { |
| _, err = f2.Write(data) |
| res <- err |
| f2.Close() |
| }() |
| } |
| return res |
| } |
| </pre> |
| |
| <p> |
| The fix is to introduce new variables in the goroutines (note the use of <code>:=</code>): |
| </p> |
| |
| <pre> |
| ... |
| _, err := f1.Write(data) |
| ... |
| _, err := f2.Write(data) |
| ... |
| </pre> |
| |
| <h3 id="Unprotected_global_variable">Unprotected global variable</h3> |
| |
| <p> |
| If the following code is called from several goroutines, it leads to races on the <code>service</code> map. |
| Concurrent reads and writes of the same map are not safe: |
| </p> |
| |
| <pre> |
| var service map[string]net.Addr |
| |
| func RegisterService(name string, addr net.Addr) { |
| service[name] = addr |
| } |
| |
| func LookupService(name string) net.Addr { |
| return service[name] |
| } |
| </pre> |
| |
| <p> |
| To make the code safe, protect the accesses with a mutex: |
| </p> |
| |
| <pre> |
| var ( |
| service map[string]net.Addr |
| serviceMu sync.Mutex |
| ) |
| |
| func RegisterService(name string, addr net.Addr) { |
| serviceMu.Lock() |
| defer serviceMu.Unlock() |
| service[name] = addr |
| } |
| |
| func LookupService(name string) net.Addr { |
| serviceMu.Lock() |
| defer serviceMu.Unlock() |
| return service[name] |
| } |
| </pre> |
| |
| <h3 id="Primitive_unprotected_variable">Primitive unprotected variable</h3> |
| |
| <p> |
| Data races can happen on variables of primitive types as well (<code>bool</code>, <code>int</code>, <code>int64</code>, etc.), |
| as in this example: |
| </p> |
| |
| <pre> |
| type Watchdog struct{ last int64 } |
| |
| func (w *Watchdog) KeepAlive() { |
| w.last = time.Now().UnixNano() // First conflicting access. |
| } |
| |
| func (w *Watchdog) Start() { |
| go func() { |
| for { |
| time.Sleep(time.Second) |
| // Second conflicting access. |
| if w.last < time.Now().Add(-10*time.Second).UnixNano() { |
| fmt.Println("No keepalives for 10 seconds. Dying.") |
| os.Exit(1) |
| } |
| } |
| }() |
| } |
| </pre> |
| |
| <p> |
| Even such "innocent" data races can lead to hard-to-debug problems caused by |
| non-atomicity of the memory accesses, |
| interference with compiler optimizations, |
| or reordering issues accessing processor memory . |
| </p> |
| |
| <p> |
| A typical fix for this race is to use a channel or a mutex. |
| To preserve the lock-free behavior, one can also use the |
| <a href="/pkg/sync/atomic/"><code>sync/atomic</code></a> package. |
| </p> |
| |
| <pre> |
| type Watchdog struct{ last int64 } |
| |
| func (w *Watchdog) KeepAlive() { |
| atomic.StoreInt64(&w.last, time.Now().UnixNano()) |
| } |
| |
| func (w *Watchdog) Start() { |
| go func() { |
| for { |
| time.Sleep(time.Second) |
| if atomic.LoadInt64(&w.last) < time.Now().Add(-10*time.Second).UnixNano() { |
| fmt.Println("No keepalives for 10 seconds. Dying.") |
| os.Exit(1) |
| } |
| } |
| }() |
| } |
| </pre> |
| |
| <h3 id="Unsynchronized_send_and_close_operations">Unsynchronized send and close operations</h3> |
| |
| <p> |
| As this example demonstrates, unsynchronized send and close operations |
| on the same channel can also be a race condition: |
| </p> |
| |
| <pre> |
| c := make(chan struct{}) // or buffered channel |
| |
| // The race detector cannot derive the happens before relation |
| // for the following send and close operations. These two operations |
| // are unsynchronized and happen concurrently. |
| go func() { c <- struct{}{} }() |
| close(c) |
| </pre> |
| |
| <p> |
| According to the Go memory model, a send on a channel happens before |
| the corresponding receive from that channel completes. To synchronize |
| send and close operations, use a receive operation that guarantees |
| the send is done before the close: |
| </p> |
| |
| <pre> |
| c := make(chan struct{}) // or buffered channel |
| |
| go func() { c <- struct{}{} }() |
| <-c |
| close(c) |
| </pre> |
| |
| <h2 id="Requirements">Requirements</h2> |
| |
| <p> |
| The race detector requires cgo to be enabled, and on non-Darwin |
| systems requires an installed C compiler. |
| The race detector supports |
| <code>linux/amd64</code>, <code>linux/ppc64le</code>, |
| <code>linux/arm64</code>, <code>freebsd/amd64</code>, |
| <code>netbsd/amd64</code>, <code>darwin/amd64</code>, |
| <code>darwin/arm64</code>, and <code>windows/amd64</code>. |
| </p> |
| |
| <p> |
| On Windows, the race detector runtime is sensitive to the |
| version of the C compiler installed; as of Go 1.21, building |
| a program with <code>-race</code> requires a C compiler that |
| incorporates version 8 or later of the <code>mingw-w64</code> |
| runtime libraries. You can test your C compiler by invoking it |
| with the arguments |
| <code>--print-file-name libsynchronization.a</code>. A newer |
| compliant C compiler will print a full path for this library, |
| whereas older C compilers will just echo the argument. |
| </p> |
| |
| <h2 id="Runtime_Overheads">Runtime Overhead</h2> |
| |
| <p> |
| The cost of race detection varies by program, but for a typical program, memory |
| usage may increase by 5-10x and execution time by 2-20x. |
| </p> |
| |
| <p> |
| The race detector currently allocates an extra 8 bytes per <code>defer</code> |
| and <code>recover</code> statement. Those extra allocations <a |
| href="/issue/26813">are not recovered until the goroutine |
| exits</a>. This means that if you have a long-running goroutine that is |
| periodically issuing <code>defer</code> and <code>recover</code> calls, |
| the program memory usage may grow without bound. These memory allocations |
| will not show up in the output of <code>runtime.ReadMemStats</code> or |
| <code>runtime/pprof</code>. |
| </p> |