_content/blog: add execution traces 2024 blog post

Change-Id: Id12e94913263048df46496cc77fb0376edd81a77
Reviewed-on: https://go-review.googlesource.com/c/website/+/570603
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Eli Bendersky <eliben@google.com>
Reviewed-by: Eli Bendersky <eliben@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
diff --git a/_content/blog/execution-traces-2024.md b/_content/blog/execution-traces-2024.md
new file mode 100644
index 0000000..0d005b3
--- /dev/null
+++ b/_content/blog/execution-traces-2024.md
@@ -0,0 +1,160 @@
+---
+title: More powerful Go execution traces
+date: 2024-03-14
+by:
+- Michael Knyszek
+tags:
+- debug
+- technical
+- tracing
+summary: "New features and improvements to execution traces from the last year."
+---
+
+The [runtime/trace](/pkg/runtime/trace) package contains a powerful tool for understanding and
+troubleshooting Go programs.
+The functionality within allows one to produce a trace of each goroutine's execution over some
+time period.
+With the [`go tool trace` command](/pkg/cmd/trace) (or the excellent open source
+[gotraceui tool](https://gotraceui.dev/)), one may then visualize and explore the data within these
+traces.
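+
+If you've never collected a trace before, a minimal sketch is below: it writes an execution trace
+for some work to a file, which you can then open with `go tool trace trace.out`.
+(`doSomeWork` here is just a placeholder for the code you want traced; for tests and benchmarks,
+the `-trace` flag of `go test` does the same job.)
+
+```
+package main
+
+import (
+	"log"
+	"os"
+	"runtime/trace"
+)
+
+func main() {
+	// Create a file to hold the execution trace.
+	f, err := os.Create("trace.out")
+	if err != nil {
+		log.Fatal(err)
+	}
+	defer f.Close()
+
+	// Start tracing; everything the program does until Stop is recorded.
+	if err := trace.Start(f); err != nil {
+		log.Fatal(err)
+	}
+	defer trace.Stop()
+
+	doSomeWork()
+}
+
+func doSomeWork() { /* the code you want to see in the trace */ }
+```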
+
+The magic of a trace is that it can easily reveal things about a program that are hard to see in
+other ways.
+For example, a concurrency bottleneck where lots of goroutines block on the same channel might be
+quite difficult to see in a CPU profile, because there's no execution to sample.
+But in an execution trace, the _lack_ of execution will show up with amazing clarity, and the stack
+traces of blocked goroutines will quickly point at the culprit.
+
+{{image "execution-traces-2024/gotooltrace.png"}}
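+
+If you'd like to see this effect for yourself, the small sketch below (the constants are arbitrary)
+reproduces it: most of its goroutines sit blocked sending on a single channel, so a CPU profile of
+it is nearly empty, while an execution trace makes all that waiting, and the stacks doing it,
+plainly visible.
+
+```
+package main
+
+import "sync"
+
+func main() {
+	// One unbuffered channel shared by many senders: a classic bottleneck.
+	ch := make(chan int)
+
+	// A single consumer drains the channel.
+	go func() {
+		for range ch {
+		}
+	}()
+
+	// Many producers contend to send, so most of them are blocked at any
+	// given moment.
+	var wg sync.WaitGroup
+	for i := 0; i < 64; i++ {
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			for j := 0; j < 1000; j++ {
+				ch <- j
+			}
+		}()
+	}
+	wg.Wait()
+}
+```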
+
+Go developers are even able to instrument their own programs with [tasks](/pkg/runtime/trace#Task),
+[regions](/pkg/runtime/trace#WithRegion), and [logs](/pkg/runtime/trace#Log) that
+they can use to correlate their higher-level concerns with lower-level execution details.
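+
+As a rough sketch of what that can look like, here is a small program loosely modeled on the
+example in the [`runtime/trace` documentation](/pkg/runtime/trace); the coffee-making helpers are
+placeholders, and the annotations are only recorded while tracing is enabled.
+
+```
+package main
+
+import (
+	"context"
+	"runtime/trace"
+)
+
+func main() {
+	// Group all the work for one operation under a task so it's easy to
+	// find in the trace, then mark its phases with regions and logs.
+	ctx, task := trace.NewTask(context.Background(), "makeCappuccino")
+	defer task.End()
+
+	trace.Log(ctx, "orderID", "1234")
+
+	trace.WithRegion(ctx, "steamMilk", steamMilk)
+	trace.WithRegion(ctx, "extractCoffee", extractCoffee)
+	trace.WithRegion(ctx, "mixMilkCoffee", mixMilkCoffee)
+}
+
+func steamMilk()     {}
+func extractCoffee() {}
+func mixMilkCoffee() {}
+```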
+
+## Issues
+
+Unfortunately, the wealth of information in execution traces can often be out of reach.
+Four big issues with traces have historically gotten in the way.
+
+- Traces had high overheads.
+- Traces didn't scale well, and could become too big to analyze.
+- It was often unclear when to start tracing to capture a specific bad behavior.
+- Only the most adventurous gophers could programmatically analyze traces, given the lack of a
+  public package for parsing and interpreting execution traces.
+
+If you've used traces in the last few years, you've likely been frustrated by one or more of these
+problems.
+But we're excited to share that over the last two Go releases we've made big progress in all four
+of these areas.
+
+## Low-overhead tracing
+
+Prior to Go 1.21, the run-time overhead of tracing was somewhere between 10–20% CPU for many
+applications, which limited tracing to situational use, rather than the continuous use that's
+common with CPU profiling.
+It turned out that much of the cost of tracing came down to tracebacks.
+Many events produced by the runtime have stack traces attached, which are invaluable for
+identifying what goroutines were doing at key moments in their execution.
+
+Thanks to work by Felix Geisendörfer and Nick Ripley on optimizing the efficiency of tracebacks,
+the run-time CPU overhead of execution traces has been cut dramatically, down to 1–2% for many
+applications.
+You can read more about the work done here in [Felix's great blog
+post](https://blog.felixge.de/reducing-gos-execution-tracer-overhead-with-frame-pointer-unwinding/)
+on the topic.
+
+## Scalable traces
+
+The trace format and its events were designed around relatively efficient emission, but required
+tooling to parse and keep in memory the state of the entire trace.
+A trace just a few hundred MiB in size could require several GiB of RAM to analyze!
+
+This issue is unfortunately fundamental to how traces are generated.
+To keep run-time overheads low, all events are written to the equivalent of thread-local buffers.
+But this means events appear out of their true order, and the burden is placed on the trace
+tooling to figure out what really happened.
+
+The key insight to making traces scale while keeping overheads low was to occasionally split the
+trace being generated.
+Each split point would behave a bit like simultaneously disabling and reenabling tracing in one
+go.
+All the trace data so far would represent a complete and self-contained trace, while the new trace
+data would seamlessly pick up from where it left off.
+
+As you might imagine, fixing this required [rethinking and rewriting a lot of the foundation of
+the trace implementation](/issue/60773) in the runtime.
+We're happy to say that the work landed in Go 1.22 and is now generally available.
+[A lot of nice improvements](/doc/go1.22#runtime/trace) came with the rewrite, including some
+improvements to the [`go tool trace` command](/doc/go1.22#trace) as well.
+The gritty details are all in the [design
+document](https://github.com/golang/proposal/blob/master/design/60773-execution-tracer-overhaul.md),
+if you're curious.
+
+(Note: `go tool trace` still loads the full trace into memory, but [removing this
+limitation](/issue/65315) for traces produced by Go 1.22+ programs is now feasible.)
+
+## Flight recording
+
+Suppose you work on a web service and an RPC takes a very long time.
+You can't start tracing at the point you notice the RPC is already taking a while, because the
+root cause of the slow request has already happened and wasn't recorded.
+
+There's a technique that can help with this called flight recording, which you may already be
+familiar with from other programming environments.
+The insight with flight recording is to have tracing on continuously and always keep the most
+recent trace data around, just in case.
+Then, once something interesting happens, the program can just write out whatever it has!
+
+Before traces could be split, this was pretty much a non-starter.
+But now that continuous tracing is viable thanks to low overheads, and the runtime can split traces
+any time it needs to, it turned out to be straightforward to implement flight recording.
+
+As a result, we're happy to announce a flight recorder experiment, available in the
+[golang.org/x/exp/trace package](/pkg/golang.org/x/exp/trace#FlightRecorder).
+
+Please try it out!
+To get you started, below is an example that sets up flight recording to capture a long HTTP request.
+
+{{code "execution-traces-2024/flightrecorder.go" `/START/` `/END/`}}
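+
+Once a snapshot has been written, the resulting `trace.out` file is an ordinary execution trace,
+so you can explore it with `go tool trace trace.out` just like any other.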
+
+If you have any feedback, positive or negative, please share it on the [proposal
+issue](/issue/63185)!
+
+## Trace reader API
+
+Along with the trace implementation rewrite came an effort to clean up the other trace internals,
+like `go tool trace`.
+This spawned an attempt to create a trace reader API that was good enough to share and that could
+make traces more accessible.
+
+Just like the flight recorder, we're happy to announce that we also have an experimental trace reader
+API that we'd like to share.
+It's available in the [same package as the flight recorder,
+golang.org/x/exp/trace](/pkg/golang.org/x/exp/trace#Reader).
+
+We think it's good enough to start building things on top of, so please try it out!
+Below is an example that measures the proportion of goroutine block events that blocked to wait on
+the network.
+
+{{code "execution-traces-2024/reader.go" `/START/` `/END/`}}
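+
+To try it, collect a trace from a Go 1.22+ program (for example, with `go test -trace=trace.out`)
+and pipe the trace into this program's standard input.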
+
+And just like the flight recorder, there's a [proposal issue](/issue/62627) that would
+be a great place to leave feedback!
+
+We'd like to quickly call out Dominik Honnef as someone who tried it out early, provided great
+feedback, and has contributed support for older trace versions to the API.
+
+## Thank you!
+
+This work was completed, in no small part, thanks to the help of those in the [diagnostics
+working group](/issue/57175), which was started over a year ago as a collaboration between
+stakeholders from across the Go community, and is open to the public.
+
+We'd like to take a moment to thank those community members who have attended the diagnostic
+meetings regularly over the last year: Felix Geisendörfer, Nick Ripley, Rhys Hiltner, Dominik
+Honnef, Bryan Boreham, and thepudds.
+
+The discussions, feedback, and work you all put in have been instrumental to getting us to where we
+are today.
+Thank you!
diff --git a/_content/blog/execution-traces-2024/flightrecorder.go b/_content/blog/execution-traces-2024/flightrecorder.go
new file mode 100644
index 0000000..8dd7ec7
--- /dev/null
+++ b/_content/blog/execution-traces-2024/flightrecorder.go
@@ -0,0 +1,54 @@
+//go:build OMIT
+
+package main
+
+import (
+	"bytes"
+	"log"
+	"net/http"
+	"os"
+	"sync"
+	"time"
+
+	"golang.org/x/exp/trace"
+)
+
+func main() {
+	// START OMIT
+	// Set up the flight recorder.
+	fr := trace.NewFlightRecorder()
+	fr.Start()
+
+	// Set up and run an HTTP server.
+	var once sync.Once
+	http.HandleFunc("/my-endpoint", func(w http.ResponseWriter, r *http.Request) {
+		start := time.Now()
+
+		// Do the work...
+		doWork(w, r)
+
+		// We saw a long request. Take a snapshot!
+		if time.Since(start) > 300*time.Millisecond {
+			// Do it only once for simplicity, but you can take more than one.
+			once.Do(func() {
+				// Grab the snapshot.
+				var b bytes.Buffer
+				_, err := fr.WriteTo(&b)
+				if err != nil {
+					log.Print(err)
+					return
+				}
+				// Write it to a file.
+				if err := os.WriteFile("trace.out", b.Bytes(), 0o755); err != nil {
+					log.Print(err)
+					return
+				}
+			})
+		}
+	})
+	log.Fatal(http.ListenAndServe(":8080", nil))
+	// END OMIT
+}
+
+func doWork(_ http.ResponseWriter, _ *http.Request) {
+}
diff --git a/_content/blog/execution-traces-2024/gotooltrace.png b/_content/blog/execution-traces-2024/gotooltrace.png
new file mode 100644
index 0000000..43d07b0
--- /dev/null
+++ b/_content/blog/execution-traces-2024/gotooltrace.png
Binary files differ
diff --git a/_content/blog/execution-traces-2024/reader.go b/_content/blog/execution-traces-2024/reader.go
new file mode 100644
index 0000000..7fdf7cb
--- /dev/null
+++ b/_content/blog/execution-traces-2024/reader.go
@@ -0,0 +1,55 @@
+//go:build OMIT
+
+package main
+
+import (
+	"fmt"
+	"io"
+	"log"
+	"os"
+	"strings"
+
+	"golang.org/x/exp/trace"
+)
+
+func main() {
+	// START OMIT
+	// Start reading from STDIN.
+	r, err := trace.NewReader(os.Stdin)
+	if err != nil {
+		log.Fatal(err)
+	}
+
+	var blocked int
+	var blockedOnNetwork int
+	for {
+		// Read the event.
+		ev, err := r.ReadEvent()
+		if err == io.EOF {
+			break
+		} else if err != nil {
+			log.Fatal(err)
+		}
+
+		// Process it.
+		if ev.Kind() == trace.EventStateTransition {
+			st := ev.StateTransition()
+			if st.Resource.Kind == trace.ResourceGoroutine {
+				// st.Resource.Goroutine() reports the goroutine's ID, if you need it.
+				from, to := st.GoroutineTransition()
+
+				// Look for goroutines blocking, and count them.
+				if from.Executing() && to == trace.GoWaiting {
+					blocked++
+					if strings.Contains(st.Reason, "network") {
+						blockedOnNetwork++
+					}
+				}
+			}
+		}
+	}
+	// Print what we found.
+	p := 100 * float64(blockedOnNetwork) / float64(blocked)
+	fmt.Printf("%2.3f%% of goroutine blocks were to wait on the network\n", p)
+	// END OMIT
+}