| <!--{ |
| "Title": "Diagnostics", |
| "Breadcrumb": true |
| }--> |
| |
| <!-- |
| NOTE: In this document and others in this directory, the convention is to |
| set fixed-width phrases with non-fixed-width spaces, as in |
| <code>hello</code> <code>world</code>. |
| Do not send CLs removing the interior tags from such phrases. |
| --> |
| |
| <h2 id="introduction">Introduction</h2> |
| |
| <p> |
| The Go ecosystem provides a large suite of APIs and tools to |
| diagnose logic and performance problems in Go programs. This page |
| summarizes the available tools and helps Go users pick the right one |
| for their specific problem. |
| </p> |
| |
| <p> |
| Diagnostics solutions can be categorized into the following groups: |
| </p> |
| |
| <ul> |
<li><strong>Profiling</strong>: Profiling tools analyze the complexity and costs of a
Go program, such as its memory usage and frequently called
functions, to identify its expensive sections.</li>
| <li><strong>Tracing</strong>: Tracing is a way to instrument code to analyze latency |
| throughout the lifecycle of a call or user request. Traces provide an |
| overview of how much latency each component contributes to the overall |
| latency in a system. Traces can span multiple Go processes.</li> |
| <li><strong>Debugging</strong>: Debugging allows us to pause a Go program and examine |
| its execution. Program state and flow can be verified with debugging.</li> |
<li><strong>Runtime statistics and events</strong>: Collection and analysis of runtime stats and events
provides a high-level overview of the health of Go programs. Spikes and dips in metrics
help us identify changes in throughput, utilization, and performance.</li>
| </ul> |
| |
| <p> |
Note: Some diagnostics tools may interfere with each other. For example, precise
memory profiling skews CPU profiles, and goroutine blocking profiling affects the
scheduler trace. Use tools in isolation to get more precise information.
| </p> |
| |
| <h2 id="profiling">Profiling</h2> |
| |
| <p> |
| Profiling is useful for identifying expensive or frequently called sections |
| of code. The Go runtime provides <a href="/pkg/runtime/pprof/"> |
| profiling data</a> in the format expected by the |
| <a href="https://github.com/google/pprof/blob/master/doc/README.md">pprof visualization tool</a>. |
| The profiling data can be collected during testing |
| via <code>go</code> <code>test</code> or endpoints made available from the <a href="/pkg/net/http/pprof/"> |
| net/http/pprof</a> package. Users need to collect the profiling data and use pprof tools to filter |
| and visualize the top code paths. |
| </p> |
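
<p>
For example, a minimal sketch of collecting CPU and memory profiles while
running a package's benchmarks (the output file names are arbitrary):
</p>

<p>
<pre>
$ go test -cpuprofile cpu.prof -memprofile mem.prof -bench .
$ go tool pprof cpu.prof
</pre>
</p>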
| |
| <p>Predefined profiles provided by the <a href="/pkg/runtime/pprof">runtime/pprof</a> package:</p> |
| |
| <ul> |
| <li> |
| <strong>cpu</strong>: CPU profile determines where a program spends |
| its time while actively consuming CPU cycles (as opposed to while sleeping or waiting for I/O). |
| </li> |
| <li> |
| <strong>heap</strong>: Heap profile reports memory allocation samples; |
| used to monitor current and historical memory usage, and to check for memory leaks. |
| </li> |
| <li> |
<strong>threadcreate</strong>: Thread creation profile reports the sections
of the program that lead to the creation of new OS threads.
| </li> |
| <li> |
| <strong>goroutine</strong>: Goroutine profile reports the stack traces of all current goroutines. |
| </li> |
| <li> |
| <strong>block</strong>: Block profile shows where goroutines block waiting on synchronization |
| primitives (including timer channels). Block profile is not enabled by default; |
| use <code>runtime.SetBlockProfileRate</code> to enable it. |
| </li> |
| <li> |
<strong>mutex</strong>: Mutex profile reports lock contentions. When you think your
CPU is not fully utilized due to mutex contention, use this profile. The mutex profile
is not enabled by default; use <code>runtime.SetMutexProfileFraction</code> to enable it.
| </li> |
| </ul> |
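
<p>
As noted above, the block and mutex profiles are not enabled by default.
The following is a minimal sketch that enables both and writes the block
profile to a file; the sampling rates and the output file name are arbitrary
choices for illustration:
</p>

<p>
<pre>
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	// Report every blocking event (rate 1); higher rates sample less often.
	runtime.SetBlockProfileRate(1)
	// Sample roughly 1 in 100 mutex contention events.
	runtime.SetMutexProfileFraction(100)

	// ... run the workload ...

	f, err := os.Create("block.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pprof.Lookup("block").WriteTo(f, 0); err != nil {
		log.Fatal(err)
	}
}
</pre>
</p>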
| |
| |
| <p><strong>What other profilers can I use to profile Go programs?</strong></p> |
| |
| <p> |
| On Linux, <a href="https://perf.wiki.kernel.org/index.php/Tutorial">perf tools</a> |
| can be used for profiling Go programs. Perf can profile |
| and unwind cgo/SWIG code and kernel, so it can be useful to get insights into |
native/kernel performance bottlenecks. On macOS, the
<a href="https://developer.apple.com/library/content/documentation/DeveloperTools/Conceptual/InstrumentsUserGuide/">Instruments</a>
suite can be used to profile Go programs.
| </p> |
| |
| <p><strong>Can I profile my production services?</strong></p> |
| |
| <p>Yes. It is safe to profile programs in production, but enabling |
some profiles (e.g. the CPU profile) adds cost. You should expect to
see some performance degradation. The performance penalty can be estimated
| by measuring the overhead of the profiler before turning it on in |
| production. |
| </p> |
| |
| <p> |
| You may want to periodically profile your production services. |
| Especially in a system with many replicas of a single process, selecting |
| a random replica periodically is a safe option. |
Select a production process, profile it for
X seconds every Y seconds, and save the results for visualization and
analysis; then repeat periodically. Results may be manually and/or automatically
| reviewed to find problems. |
| Collection of profiles can interfere with each other, |
| so it is recommended to collect only a single profile at a time. |
| </p> |
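
<p>
A minimal sketch of such a periodic collector, assuming the target process
serves the <a href="/pkg/net/http/pprof/">net/http/pprof</a> handlers on
localhost:6060; the address, durations, and output file names are illustrative:
</p>

<p>
<pre>
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"time"
)

func main() {
	for i := 0; ; i++ {
		// Ask the pprof handler for a 30-second CPU profile.
		resp, err := http.Get("http://localhost:6060/debug/pprof/profile?seconds=30")
		if err != nil {
			log.Print(err)
		} else {
			f, err := os.Create(fmt.Sprintf("cpu-%d.prof", i))
			if err != nil {
				log.Fatal(err)
			}
			io.Copy(f, resp.Body)
			f.Close()
			resp.Body.Close()
		}
		// Wait before collecting the next profile.
		time.Sleep(10 * time.Minute)
	}
}
</pre>
</p>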
| |
| <p> |
| <strong>What are the best ways to visualize the profiling data?</strong> |
| </p> |
| |
| <p> |
| The Go tools provide text, graph, and <a href="http://valgrind.org/docs/manual/cl-manual.html">callgrind</a> |
| visualization of the profile data using |
| <code><a href="https://github.com/google/pprof/blob/master/doc/README.md">go tool pprof</a></code>. |
| Read <a href="/blog/profiling-go-programs">Profiling Go programs</a> |
| to see them in action. |
| </p> |
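
<p>
For example, assuming a CPU profile saved as <code>cpu.prof</code> or a
running server exposing the net/http/pprof handlers on localhost:6060, the
profile can be loaded into the interactive pprof tool with:
</p>

<p>
<pre>
$ go tool pprof cpu.prof
$ go tool pprof http://localhost:6060/debug/pprof/heap
</pre>
</p>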
| |
| <p> |
| <img width="800" src="../images/diagnostics/pprof-text.png"> |
| <br> |
| <small>Listing of the most expensive calls as text.</small> |
| </p> |
| |
| <p> |
| <img width="800" src="../images/diagnostics/pprof-dot.png"> |
| <br> |
| <small>Visualization of the most expensive calls as a graph.</small> |
| </p> |
| |
| <p>Weblist view displays the expensive parts of the source line by line in |
an HTML page. In the following example, 530ms is spent in
<code>runtime.concatstrings</code>, and the cost of each line is presented
in the listing.</p>
| |
| <p> |
| <img width="800" src="../images/diagnostics/pprof-weblist.png"> |
| <br> |
| <small>Visualization of the most expensive calls as weblist.</small> |
| </p> |
| |
| <p> |
| Another way to visualize profile data is a <a href="http://www.brendangregg.com/flamegraphs.html">flame graph</a>. |
| Flame graphs allow you to move in a specific ancestry path, so you can zoom |
| in/out of specific sections of code. |
| The <a href="https://github.com/google/pprof">upstream pprof</a> |
| has support for flame graphs. |
| </p> |
| |
| <p> |
| <img width="800" src="../images/diagnostics/flame.png"> |
| <br> |
<small>Flame graphs offer a visualization to spot the most expensive code paths.</small>
| </p> |
| |
| <p><strong>Am I restricted to the built-in profiles?</strong></p> |
| |
| <p> |
In addition to what is provided by the runtime, Go users can create
their custom profiles via <a href="/pkg/runtime/pprof/#Profile">pprof.Profile</a>
| and use the existing tools to examine them. |
| </p> |
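
<p>
For example, a minimal sketch of a custom profile that tracks connections
that have been opened but not yet closed; the profile name and the tracked
type are illustrative:
</p>

<p>
<pre>
package main

import (
	"os"
	"runtime/pprof"
)

// connProfile records the stack trace of each call that opened a
// connection that has not yet been closed.
var connProfile = pprof.NewProfile("example.com/conns")

type conn struct{ /* ... */ }

func open() *conn {
	c := &conn{}
	connProfile.Add(c, 1) // skip one frame so the caller of open is recorded
	return c
}

func (c *conn) close() {
	connProfile.Remove(c)
}

func main() {
	c := open()
	defer c.close()
	// Write the custom profile; it can be examined with go tool pprof.
	connProfile.WriteTo(os.Stdout, 1)
}
</pre>
</p>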
| |
| <p><strong>Can I serve the profiler handlers (/debug/pprof/...) on a different path and port?</strong></p> |
| |
| <p> |
| Yes. The <code>net/http/pprof</code> package registers its handlers to the default |
| mux by default, but you can also register them yourself by using the handlers |
| exported from the package. |
| </p> |
| |
| <p> |
For example, the following program serves the pprof.Profile
handler on :7777 at /custom_debug_path/profile:
| </p> |
| |
| <p> |
| <pre> |
| package main |
| |
| import ( |
| "log" |
| "net/http" |
| "net/http/pprof" |
| ) |
| |
| func main() { |
| mux := http.NewServeMux() |
| mux.HandleFunc("/custom_debug_path/profile", pprof.Profile) |
| log.Fatal(http.ListenAndServe(":7777", mux)) |
| } |
| </pre> |
| </p> |
| |
| <h2 id="tracing">Tracing</h2> |
| |
| <p> |
| Tracing is a way to instrument code to analyze latency throughout the |
lifecycle of a chain of calls. Go provides the
<a href="https://godoc.org/golang.org/x/net/trace">golang.org/x/net/trace</a>
package as a minimal tracing backend per Go node, along with a minimal
instrumentation library and a simple dashboard. Go also provides
| an execution tracer to trace the runtime events within an interval. |
| </p> |
| |
| <p>Tracing enables us to:</p> |
| |
| <ul> |
| <li>Instrument and analyze application latency in a Go process.</li> |
| <li>Measure the cost of specific calls in a long chain of calls.</li> |
| <li>Figure out the utilization and performance improvements. |
| Bottlenecks are not always obvious without tracing data.</li> |
| </ul> |
| |
| <p> |
| In monolithic systems, it's relatively easy to collect diagnostic data |
| from the building blocks of a program. All modules live within one |
| process and share common resources to report logs, errors, and other |
| diagnostic information. Once your system grows beyond a single process and |
| starts to become distributed, it becomes harder to follow a call starting |
| from the front-end web server to all of its back-ends until a response is |
returned back to the user. This is where distributed tracing plays a big
role in instrumenting and analyzing your production systems.
| </p> |
| |
| <p> |
| Distributed tracing is a way to instrument code to analyze latency throughout |
| the lifecycle of a user request. When a system is distributed and when |
| conventional profiling and debugging tools don’t scale, you might want |
| to use distributed tracing tools to analyze the performance of your user |
| requests and RPCs. |
| </p> |
| |
| <p>Distributed tracing enables us to:</p> |
| |
| <ul> |
| <li>Instrument and profile application latency in a large system.</li> |
| <li>Track all RPCs within the lifecycle of a user request and see integration issues |
| that are only visible in production.</li> |
| <li>Figure out performance improvements that can be applied to our systems. |
| Many bottlenecks are not obvious before the collection of tracing data.</li> |
| </ul> |
| |
| <p>The Go ecosystem provides various distributed tracing libraries per tracing system |
| and backend-agnostic ones.</p> |
| |
| |
| <p><strong>Is there a way to automatically intercept each function call and create traces?</strong></p> |
| |
| <p> |
| Go doesn’t provide a way to automatically intercept every function call and create |
| trace spans. You need to manually instrument your code to create, end, and annotate spans. |
| </p> |
| |
| <p><strong>How should I propagate trace headers in Go libraries?</strong></p> |
| |
| <p> |
| You can propagate trace identifiers and tags in the |
| <a href="/pkg/context#Context"><code>context.Context</code></a>. |
| There is no canonical trace key or common representation of trace headers |
| in the industry yet. Each tracing provider is responsible for providing propagation |
| utilities in their Go libraries. |
| </p> |
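
<p>
A minimal sketch of carrying a trace identifier in a
<code>context.Context</code>; the key type and helper functions below are
purely illustrative, and real tracing libraries provide their own equivalents:
</p>

<p>
<pre>
package main

import (
	"context"
	"fmt"
)

// traceIDKey is an unexported key type to avoid collisions with
// values stored in the context by other packages.
type traceIDKey struct{}

// WithTraceID returns a context carrying the given trace identifier.
func WithTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceIDKey{}, id)
}

// TraceID extracts the trace identifier, if any, from the context.
func TraceID(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(traceIDKey{}).(string)
	return id, ok
}

func main() {
	ctx := WithTraceID(context.Background(), "abc123")
	if id, ok := TraceID(ctx); ok {
		fmt.Println("trace id:", id)
	}
}
</pre>
</p>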
| |
| <p> |
| <strong>What other low-level events from the standard library or |
| runtime can be included in a trace?</strong> |
| </p> |
| |
| <p> |
The standard library and runtime are trying to expose several additional APIs
to provide notifications of low-level internal events. For example,
| <a href="/pkg/net/http/httptrace#ClientTrace"><code>httptrace.ClientTrace</code></a> |
| provides APIs to follow low-level events in the life cycle of an outgoing request. |
| There is an ongoing effort to retrieve low-level runtime events from |
| the runtime execution tracer and allow users to define and record their user events. |
| </p> |
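
<p>
For example, a minimal sketch using <code>httptrace.ClientTrace</code> to log
when a connection is obtained and when the first response byte arrives for an
outgoing request; the URL is a placeholder:
</p>

<p>
<pre>
package main

import (
	"log"
	"net/http"
	"net/http/httptrace"
)

func main() {
	req, err := http.NewRequest("GET", "https://example.com", nil)
	if err != nil {
		log.Fatal(err)
	}
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			log.Printf("got connection, reused: %v", info.Reused)
		},
		GotFirstResponseByte: func() {
			log.Print("first response byte received")
		},
	}
	// Attach the trace to the request's context so the transport reports events.
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
}
</pre>
</p>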
| |
| <h2 id="debugging">Debugging</h2> |
| |
| <p> |
| Debugging is the process of identifying why a program misbehaves. |
| Debuggers allow us to understand a program’s execution flow and current state. |
| There are several styles of debugging; this section will only focus on attaching |
| a debugger to a program and core dump debugging. |
| </p> |
| |
| <p>Go users mostly use the following debuggers:</p> |
| |
| <ul> |
| <li> |
| <a href="https://github.com/derekparker/delve">Delve</a>: |
| Delve is a debugger for the Go programming language. It has |
support for Go’s runtime concepts and built-in types. Delve aims to
be a fully featured, reliable debugger for Go programs; a short usage
sketch follows this list.
| </li> |
| <li> |
| <a href="/doc/gdb">GDB</a>: |
| Go provides GDB support via the standard Go compiler and Gccgo. |
| The stack management, threading, and runtime contain aspects that differ |
| enough from the execution model GDB expects that they can confuse the |
| debugger, even when the program is compiled with gccgo. Even though |
| GDB can be used to debug Go programs, it is not ideal and may |
| create confusion. |
| </li> |
| </ul> |
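
<p>
For example, a minimal Delve session can be started from a package directory;
the process id in the attach example is a placeholder:
</p>

<p>
<pre>
$ dlv debug          # compile and debug the package in the current directory
$ dlv attach 1234    # attach to a running process with the given pid
</pre>
</p>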
| |
| <p><strong>How well do debuggers work with Go programs?</strong></p> |
| |
| <p> |
| The <code>gc</code> compiler performs optimizations such as |
function inlining and variable registerization. These optimizations
sometimes make debugging harder. There is an ongoing
| effort to improve the quality of the DWARF information generated for |
| optimized binaries. Until those improvements are available, we recommend |
| disabling optimizations when building the code being debugged. The following |
| command builds a package with no compiler optimizations: |
| |
| <p> |
| <pre> |
| $ go build -gcflags=all="-N -l" |
| </pre> |
| </p> |
| |
| As part of the improvement effort, Go 1.10 introduced a new compiler |
flag <code>-dwarflocationlists</code>. The flag causes the compiler to
add location lists that help debuggers work with optimized binaries.
| The following command builds a package with optimizations but with |
| the DWARF location lists: |
| |
| <p> |
| <pre> |
| $ go build -gcflags="-dwarflocationlists=true" |
| </pre> |
| </p> |
| |
| <p><strong>What’s the recommended debugger user interface?</strong></p> |
| |
| <p> |
Even though both Delve and GDB provide CLIs, most editor integrations
and IDEs provide debugging-specific user interfaces.
| </p> |
| |
| <p><strong>Is it possible to do postmortem debugging with Go programs?</strong></p> |
| |
| <p> |
A core dump file is a file that contains the memory dump of a running
process and its process status. It is primarily used for post-mortem
debugging of a program, but it can also be used to understand a program's
state while it is still running. These two cases make core dump
debugging a good diagnostic aid for postmortem analysis of production
services. It is possible to obtain core files from Go programs and
use Delve or GDB to debug them; see the
<a href="/wiki/CoreDumpDebugging">core dump debugging</a>
page for a step-by-step guide.
| </p> |
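
<p>
A minimal sketch of obtaining and opening a core dump on Linux, assuming a
binary named <code>myserver</code>; the binary name is a placeholder, the
details vary by operating system, and the guide linked above covers them in
depth:
</p>

<p>
<pre>
$ ulimit -c unlimited           # allow the OS to write core files
$ GOTRACEBACK=crash ./myserver  # crash with a core dump on a fatal error
$ dlv core ./myserver core      # open the core file in Delve
</pre>
</p>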
| |
| <h2 id="runtime">Runtime statistics and events</h2> |
| |
| <p> |
| The runtime provides stats and reporting of internal events for |
| users to diagnose performance and utilization problems at the |
| runtime level. |
| </p> |
| |
| <p> |
| Users can monitor these stats to better understand the overall |
| health and performance of Go programs. |
| Some frequently monitored stats and states: |
| </p> |
| |
| <ul> |
| <li><code><a href="/pkg/runtime/#ReadMemStats">runtime.ReadMemStats</a></code> |
reports metrics related to heap
allocation and garbage collection. Memory stats are useful for
monitoring how much memory a process is consuming,
whether the process is utilizing memory well, and to catch
memory leaks; see the sketch after this list.</li>
| <li><code><a href="/pkg/runtime/debug/#ReadGCStats">debug.ReadGCStats</a></code> |
| reads statistics about garbage collection. |
| It is useful to see how much of the resources are spent on GC pauses. |
| It also reports a timeline of garbage collector pauses and pause time percentiles.</li> |
| <li><code><a href="/pkg/runtime/debug/#Stack">debug.Stack</a></code> |
| returns the current stack trace. Stack trace |
| is useful to see how many goroutines are currently running, |
| what they are doing, and whether they are blocked or not.</li> |
| <li><code><a href="/pkg/runtime/debug/#WriteHeapDump">debug.WriteHeapDump</a></code> |
| suspends the execution of all goroutines |
| and allows you to dump the heap to a file. A heap dump is a |
| snapshot of a Go process' memory at a given time. It contains all |
| allocated objects as well as goroutines, finalizers, and more.</li> |
| <li><code><a href="/pkg/runtime#NumGoroutine">runtime.NumGoroutine</a></code> |
| returns the number of current goroutines. |
| The value can be monitored to see whether enough goroutines are |
| utilized, or to detect goroutine leaks.</li> |
| </ul> |
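
<p>
A minimal sketch of periodically reading the memory stats and goroutine count
mentioned above; the interval and the logged fields are arbitrary choices:
</p>

<p>
<pre>
package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	go func() {
		var m runtime.MemStats
		for {
			runtime.ReadMemStats(&m)
			log.Printf("heap alloc = %d bytes, gc cycles = %d, goroutines = %d",
				m.HeapAlloc, m.NumGC, runtime.NumGoroutine())
			time.Sleep(30 * time.Second)
		}
	}()

	// ... run the rest of the program ...
	select {}
}
</pre>
</p>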
| |
| <h3 id="execution-tracer">Execution tracer</h3> |
| |
| <p>Go comes with a runtime execution tracer to capture a wide range |
of runtime events. Scheduling, syscall, garbage collection,
heap size, and other events are collected by the runtime and made available
for visualization by <code>go</code> <code>tool</code> <code>trace</code>. The execution tracer is a tool
| to detect latency and utilization problems. You can examine how well |
| the CPU is utilized, and when networking or syscalls are a cause of |
| preemption for the goroutines.</p> |
| |
| <p>Tracer is useful to:</p> |
| <ul> |
| <li>Understand how your goroutines execute.</li> |
| <li>Understand some of the core runtime events such as GC runs.</li> |
| <li>Identify poorly parallelized execution.</li> |
| </ul> |
| |
<p>However, it is not great for identifying hot spots, such as
analyzing the cause of excessive memory or CPU usage.
Use profiling tools first to address those problems.</p>
| |
| <p> |
| <img width="800" src="../images/diagnostics/tracer-lock.png"> |
| </p> |
| |
<p>Above, the go tool trace visualization shows that execution started
fine and then became serialized. This suggests that there might
be lock contention for a shared resource that creates a bottleneck.</p>
| |
| <p>See <a href="/cmd/trace/"><code>go</code> <code>tool</code> <code>trace</code></a> |
| to collect and analyze runtime traces. |
| </p> |
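
<p>
A trace can also be collected programmatically with the
<a href="/pkg/runtime/trace/">runtime/trace</a> package; a minimal sketch,
with an arbitrary output file name:
</p>

<p>
<pre>
package main

import (
	"log"
	"os"
	"runtime/trace"
)

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	// ... run the workload to be traced ...
}
</pre>
</p>

<p>
The resulting file can then be examined with
<code>go</code> <code>tool</code> <code>trace</code> <code>trace.out</code>.
</p>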
| |
| <h3 id="godebug">GODEBUG</h3> |
| |
<p>The runtime also emits events and information if the
<a href="/pkg/runtime/#hdr-Environment_Variables">GODEBUG</a>
environment variable is set accordingly.</p>
| |
| <ul> |
| <li>GODEBUG=gctrace=1 prints garbage collector events at |
| each collection, summarizing the amount of memory collected |
| and the length of the pause.</li> |
| <li>GODEBUG=inittrace=1 prints a summary of execution time and memory allocation |
| information for completed package initialization work.</li> |
| <li>GODEBUG=schedtrace=X prints scheduling events every X milliseconds.</li> |
| </ul> |
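
<p>
These variables are set in the environment of the program being diagnosed;
for example, for a hypothetical binary named <code>myserver</code>:
</p>

<p>
<pre>
$ GODEBUG=gctrace=1 ./myserver
$ GODEBUG=schedtrace=1000 ./myserver
</pre>
</p>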
| |
<p>The GODEBUG environment variable can also be used to disable use of
instruction set extensions in the standard library and runtime.</p>
| |
| <ul> |
| <li>GODEBUG=cpu.all=off disables the use of all optional |
| instruction set extensions.</li> |
| <li>GODEBUG=cpu.<em>extension</em>=off disables use of instructions from the |
| specified instruction set extension.<br> |
| <em>extension</em> is the lower case name for the instruction set extension |
| such as <em>sse41</em> or <em>avx</em>.</li> |
| </ul> |