slog-handler-guide: speed section

Change-Id: If06f51747e6e4926d95e88840ab5fdce5f9a0189
Reviewed-on: https://go-review.googlesource.com/c/example/+/513457
Reviewed-by: Ian Cottrell <iancottrell@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
diff --git a/slog-handler-guide/README.md b/slog-handler-guide/README.md
index cadc501..3172f0e 100644
--- a/slog-handler-guide/README.md
+++ b/slog-handler-guide/README.md
@@ -894,6 +894,122 @@
 
 ## Speed
 
-TODO(jba): discuss
+Most programs don't need fast logging.
+Before making your handler fast, gather data&mdash;preferably production data,
+not benchmark comparisons&mdash;that demonstrates that it needs to be fast.
+Avoid premature optimization.
 
-TODO(jba): show how to pool a []byte.
+If you need a fast handler, start with pre-formatting. It may provide dramatic
+speed-ups in cases where a single call to `Logger.With` is followed by many
+calls to the resulting logger.
+
+If log output is the bottleneck, consider making your handler asynchronous.
+Do the minimal amount of processing in the handler, then send the record and
+other information over a channel. Another goroutine can collect the incoming log
+entries and write them in bulk and in the background.
+You might want to preserve the option to log synchronously
+so you can see all the log output to debug a crash.
+
+Allocation is often a major cause of a slow system.
+The `slog` package already works hard at minimizing allocations.
+If your handler does its own allocation, and profiling shows it to be
+a problem, then see if you can minimize it.
+
+One simple change you can make is to replace calls to `fmt.Sprintf` or `fmt.Appendf`
+with direct appends to the buffer. For example, our `IndentHandler` appends string
+attributes to the buffer like so:
+
+	buf = fmt.Appendf(buf, "%s: %q\n", a.Key, a.Value.String())
+
+As of Go 1.21, that results in two allocations, one for each argument passed to
+an `any` parameter. We can get that down to zero by using `append` directly:
+
+	buf = append(buf, a.Key...)
+	buf = append(buf, ": "...)
+	buf = strconv.AppendQuote(buf, a.Value.String())
+	buf = append(buf, '\n')
+
+Another worthwhile change is to use a `sync.Pool` to manage the one chunk of
+memory that most handlers need:
+the `[]byte` buffer holding the formatted output.
+
+Our example `Handle` method began with this line:
+
+	buf := make([]byte, 0, 1024)
+
+As we said above, providing a large initial capacity avoids repeated copying and
+re-allocation of the slice as it grows, reducing the number of allocations to
+one.
+But we can get it down to zero in the steady state by keeping a global pool of buffers.
+Initially, the pool will be empty and new buffers will be allocated.
+But eventually, assuming the number of concurrent log calls reaches a steady
+maximum, there will be enough buffers in the pool to share among all the
+ongoing `Handle` calls. As long as no log entry grows past a buffer's capacity,
+there will be no allocations from the garbage collector's point of view.
+
+We will hide our pool behind a pair of functions, `allocBuf` and `freeBuf`.
+The single line to get a buffer at the top of `Handle` becomes two lines:
+
+	bufp := allocBuf()
+	defer freeBuf(bufp)
+
+One of the subtleties involved in making a `sync.Pool` of slices
+is suggested by the variable name `bufp`: your pool must deal in
+_pointers_ to slices, not the slices themselves.
+Pooled values must always be pointers. If they aren't, then the `any` arguments
+and return values of the `sync.Pool` methods will themselves cause allocations,
+defeating the purpose of pooling.
+
+There are two ways to proceed with our slice pointer: we can replace `buf`
+with `*bufp` throughout our function, or we can dereference it and remember to
+re-assign it before freeing:
+
+	bufp := allocBuf()
+	buf := *bufp
+	defer func() {
+		*bufp = buf
+		freeBuf(bufp)
+	}()
+
+Here is our pool and its associated functions:
+
+```
+var bufPool = sync.Pool{
+	New: func() any {
+		b := make([]byte, 0, 1024)
+		return &b
+	},
+}
+
+func allocBuf() *[]byte {
+	return bufPool.Get().(*[]byte)
+}
+
+func freeBuf(b *[]byte) {
+	// To reduce peak allocation, return only smaller buffers to the pool.
+	const maxBufferSize = 16 << 10
+	if cap(*b) <= maxBufferSize {
+		*b = (*b)[:0]
+		bufPool.Put(b)
+	}
+}
+```
+
+The pool's `New` function does the same thing as the original code:
+create a byte slice with 0 length and plenty of capacity.
+The `allocBuf` function just type-asserts the result of the pool's
+`Get` method.
+
+The `freeBuf` function truncates the buffer before putting it back
+in the pool, so that `allocBuf` always returns zero-length slices.
+It also implements an important optimization: it doesn't return
+large buffers to the pool.
+To see why this is important, consider what would happen if there were a single,
+unusually large log entry&mdash;say one that was a megabyte when formatted.
+If that megabyte-sized buffer were put in the pool, it could remain
+there indefinitely, constantly being reused, but with most of its capacity
+wasted.
+The extra memory might never be used again by the handler, and since it was in
+the handler's pool, it might never be garbage-collected for reuse elsewhere.
+We can avoid that situation by keeping large buffers out of the pool.
diff --git a/slog-handler-guide/guide.md b/slog-handler-guide/guide.md
index 229589b..92bcd26 100644
--- a/slog-handler-guide/guide.md
+++ b/slog-handler-guide/guide.md
@@ -612,6 +612,102 @@
 
 ## Speed
 
-TODO(jba): discuss
+Most programs don't need fast logging.
+Before making your handler fast, gather data&mdash;preferably production data,
+not benchmark comparisons&mdash;that demonstrates that it needs to be fast.
+Avoid premature optimization.
 
-TODO(jba): show how to pool a []byte.
+If you need a fast handler, start with pre-formatting. It may provide dramatic
+speed-ups in cases where a single call to `Logger.With` is followed by many
+calls to the resulting logger.
+
+If log output is the bottleneck, consider making your handler asynchronous.
+Do the minimal amount of processing in the handler, then send the record and
+other information over a channel. Another goroutine can collect the incoming log
+entries and write them in bulk and in the background.
+You might want to preserve the option to log synchronously
+so you can see all the log output to debug a crash.
+
+Allocation is often a major cause of a slow system.
+The `slog` package already works hard at minimizing allocations.
+If your handler does its own allocation, and profiling shows it to be
+a problem, then see if you can minimize it.
+
+One simple change you can make is to replace calls to `fmt.Sprintf` or `fmt.Appendf`
+with direct appends to the buffer. For example, our `IndentHandler` appends string
+attributes to the buffer like so:
+
+	buf = fmt.Appendf(buf, "%s: %q\n", a.Key, a.Value.String())
+
+As of Go 1.21, that results in two allocations, one for each argument passed to
+an `any` parameter. We can get that down to zero by using `append` directly:
+
+	buf = append(buf, a.Key...)
+	buf = append(buf, ": "...)
+	buf = strconv.AppendQuote(buf, a.Value.String())
+	buf = append(buf, '\n')
+
+Another worthwhile change is to use a `sync.Pool` to manage the one chunk of
+memory that most handlers need:
+the `[]byte` buffer holding the formatted output.
+
+Our example `Handle` method began with this line:
+
+	buf := make([]byte, 0, 1024)
+
+As we said above, providing a large initial capacity avoids repeated copying and
+re-allocation of the slice as it grows, reducing the number of allocations to
+one.
+But we can get it down to zero in the steady state by keeping a global pool of buffers.
+Initially, the pool will be empty and new buffers will be allocated.
+But eventually, assuming the number of concurrent log calls reaches a steady
+maximum, there will be enough buffers in the pool to share among all the
+ongoing `Handle` calls. As long as no log entry grows past a buffer's capacity,
+there will be no allocations from the garbage collector's point of view.
+
+We will hide our pool behind a pair of functions, `allocBuf` and `freeBuf`.
+The single line to get a buffer at the top of `Handle` becomes two lines:
+
+	bufp := allocBuf()
+	defer freeBuf(bufp)
+
+One of the subtleties involved in making a `sync.Pool` of slices
+is suggested by the variable name `bufp`: your pool must deal in
+_pointers_ to slices, not the slices themselves.
+Pooled values must always be pointers. If they aren't, then the `any` arguments
+and return values of the `sync.Pool` methods will themselves cause allocations,
+defeating the purpose of pooling.
+
+There are two ways to proceed with our slice pointer: we can replace `buf`
+with `*bufp` throughout our function, or we can dereference it and remember to
+re-assign it before freeing:
+
+	bufp := allocBuf()
+	buf := *bufp
+	defer func() {
+		*bufp = buf
+		freeBuf(bufp)
+	}()
+
+Here is our pool and its associated functions:
+
+%include indenthandler4/indent_handler.go pool -
+
+The pool's `New` function does the same thing as the original code:
+create a byte slice with 0 length and plenty of capacity.
+The `allocBuf` function just type-asserts the result of the pool's
+`Get` method.
+
+The `freeBuf` function truncates the buffer before putting it back
+in the pool, so that `allocBuf` always returns zero-length slices.
+It also implements an important optimization: it doesn't return
+large buffers to the pool.
+To see why this is important, consider what would happen if there were a single,
+unusually large log entry&mdash;say one that was a megabyte when formatted.
+If that megabyte-sized buffer were put in the pool, it could remain
+there indefinitely, constantly being reused, but with most of its capacity
+wasted.
+The extra memory might never be used again by the handler, and since it was in
+the handler's pool, it might never be garbage-collected for reuse elsewhere.
+We can avoid that situation by keeping large buffers out of the pool.
diff --git a/slog-handler-guide/indenthandler4/indent_handler.go b/slog-handler-guide/indenthandler4/indent_handler.go
new file mode 100644
index 0000000..bf96c5d
--- /dev/null
+++ b/slog-handler-guide/indenthandler4/indent_handler.go
@@ -0,0 +1,208 @@
+//go:build go1.21
+
+package indenthandler
+
+import (
+	"context"
+	"fmt"
+	"io"
+	"log/slog"
+	"runtime"
+	"slices"
+	"strconv"
+	"sync"
+	"time"
+)
+
+// !+IndentHandler
+type IndentHandler struct {
+	opts           Options
+	preformatted   []byte   // data from WithGroup and WithAttrs
+	unopenedGroups []string // groups from WithGroup that haven't been opened
+	indentLevel    int      // same as number of opened groups so far
+	mu             *sync.Mutex
+	out            io.Writer
+}
+
+//!-IndentHandler
+
+type Options struct {
+	// Level reports the minimum level to log.
+	// Records with lower levels are discarded.
+	// If nil, the Handler uses [slog.LevelInfo].
+	Level slog.Leveler
+}
+
+func New(out io.Writer, opts *Options) *IndentHandler {
+	h := &IndentHandler{out: out, mu: &sync.Mutex{}}
+	if opts != nil {
+		h.opts = *opts
+	}
+	if h.opts.Level == nil {
+		h.opts.Level = slog.LevelInfo
+	}
+	return h
+}
+
+func (h *IndentHandler) Enabled(ctx context.Context, level slog.Level) bool {
+	return level >= h.opts.Level.Level()
+}
+
+// !+WithGroup
+func (h *IndentHandler) WithGroup(name string) slog.Handler {
+	if name == "" {
+		return h
+	}
+	h2 := *h
+	// Add an unopened group to h2 without modifying h.
+	h2.unopenedGroups = make([]string, len(h.unopenedGroups)+1)
+	copy(h2.unopenedGroups, h.unopenedGroups)
+	h2.unopenedGroups[len(h2.unopenedGroups)-1] = name
+	return &h2
+}
+
+//!-WithGroup
+
+// !+WithAttrs
+func (h *IndentHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
+	if len(attrs) == 0 {
+		return h
+	}
+	h2 := *h
+	// Force an append to copy the underlying array.
+	pre := slices.Clip(h.preformatted)
+	// Add all groups from WithGroup that haven't already been added.
+	h2.preformatted = h2.appendUnopenedGroups(pre, h2.indentLevel)
+	// Each of those groups increased the indent level by 1.
+	h2.indentLevel += len(h2.unopenedGroups)
+	// Now all groups have been opened.
+	h2.unopenedGroups = nil
+	// Pre-format the attributes.
+	for _, a := range attrs {
+		h2.preformatted = h2.appendAttr(h2.preformatted, a, h2.indentLevel)
+	}
+	return &h2
+}
+
+func (h *IndentHandler) appendUnopenedGroups(buf []byte, indentLevel int) []byte {
+	for _, g := range h.unopenedGroups {
+		buf = fmt.Appendf(buf, "%*s%s:\n", indentLevel*4, "", g)
+		indentLevel++
+	}
+	return buf
+}
+
+//!-WithAttrs
+
+// !+Handle
+func (h *IndentHandler) Handle(ctx context.Context, r slog.Record) error {
+	bufp := allocBuf()
+	buf := *bufp
+	defer func() {
+		*bufp = buf
+		freeBuf(bufp)
+	}()
+	if !r.Time.IsZero() {
+		buf = h.appendAttr(buf, slog.Time(slog.TimeKey, r.Time), 0)
+	}
+	buf = h.appendAttr(buf, slog.Any(slog.LevelKey, r.Level), 0)
+	if r.PC != 0 {
+		fs := runtime.CallersFrames([]uintptr{r.PC})
+		f, _ := fs.Next()
+		// Optimize to minimize allocation.
+		srcbufp := allocBuf()
+		defer freeBuf(srcbufp)
+		*srcbufp = append(*srcbufp, f.File...)
+		*srcbufp = append(*srcbufp, ':')
+		*srcbufp = strconv.AppendInt(*srcbufp, int64(f.Line), 10)
+		buf = h.appendAttr(buf, slog.String(slog.SourceKey, string(*srcbufp)), 0)
+	}
+
+	buf = h.appendAttr(buf, slog.String(slog.MessageKey, r.Message), 0)
+	// Insert preformatted attributes just after built-in ones.
+	buf = append(buf, h.preformatted...)
+	if r.NumAttrs() > 0 {
+		buf = h.appendUnopenedGroups(buf, h.indentLevel)
+		r.Attrs(func(a slog.Attr) bool {
+			buf = h.appendAttr(buf, a, h.indentLevel+len(h.unopenedGroups))
+			return true
+		})
+	}
+	buf = append(buf, "---\n"...)
+	h.mu.Lock()
+	defer h.mu.Unlock()
+	_, err := h.out.Write(buf)
+	return err
+}
+
+//!-Handle
+
+func (h *IndentHandler) appendAttr(buf []byte, a slog.Attr, indentLevel int) []byte {
+	// Resolve the Attr's value before doing anything else.
+	a.Value = a.Value.Resolve()
+	// Ignore empty Attrs.
+	if a.Equal(slog.Attr{}) {
+		return buf
+	}
+	// Indent 4 spaces per level.
+	buf = fmt.Appendf(buf, "%*s", indentLevel*4, "")
+	switch a.Value.Kind() {
+	case slog.KindString:
+		// Quote string values, to make them easy to parse.
+		buf = append(buf, a.Key...)
+		buf = append(buf, ": "...)
+		buf = strconv.AppendQuote(buf, a.Value.String())
+		buf = append(buf, '\n')
+	case slog.KindTime:
+		// Write times in a standard way, without the monotonic time.
+		buf = append(buf, a.Key...)
+		buf = append(buf, ": "...)
+		buf = a.Value.Time().AppendFormat(buf, time.RFC3339Nano)
+		buf = append(buf, '\n')
+	case slog.KindGroup:
+		attrs := a.Value.Group()
+		// Ignore empty groups.
+		if len(attrs) == 0 {
+			return buf
+		}
+		// If the key is non-empty, write it out and indent the rest of the attrs.
+		// Otherwise, inline the attrs.
+		if a.Key != "" {
+			buf = fmt.Appendf(buf, "%s:\n", a.Key)
+			indentLevel++
+		}
+		for _, ga := range attrs {
+			buf = h.appendAttr(buf, ga, indentLevel)
+		}
+
+	default:
+		buf = append(buf, a.Key...)
+		buf = append(buf, ": "...)
+		buf = append(buf, a.Value.String()...)
+		buf = append(buf, '\n')
+	}
+	return buf
+}
+
+// !+pool
+var bufPool = sync.Pool{
+	New: func() any {
+		b := make([]byte, 0, 1024)
+		return &b
+	},
+}
+
+func allocBuf() *[]byte {
+	return bufPool.Get().(*[]byte)
+}
+
+func freeBuf(b *[]byte) {
+	// To reduce peak allocation, return only smaller buffers to the pool.
+	const maxBufferSize = 16 << 10
+	if cap(*b) <= maxBufferSize {
+		*b = (*b)[:0]
+		bufPool.Put(b)
+	}
+}
+
+//!-pool
diff --git a/slog-handler-guide/indenthandler4/indent_handler_norace_test.go b/slog-handler-guide/indenthandler4/indent_handler_norace_test.go
new file mode 100644
index 0000000..bddfd4d
--- /dev/null
+++ b/slog-handler-guide/indenthandler4/indent_handler_norace_test.go
@@ -0,0 +1,46 @@
+//go:build go1.21 && !race
+
+package indenthandler
+
+import (
+	"fmt"
+	"io"
+	"log/slog"
+	"strconv"
+	"testing"
+)
+
+func TestAlloc(t *testing.T) {
+	a := slog.String("key", "value")
+	t.Run("Appendf", func(t *testing.T) {
+		buf := make([]byte, 0, 100)
+		g := testing.AllocsPerRun(2, func() {
+			buf = fmt.Appendf(buf, "%s: %q\n", a.Key, a.Value.String())
+		})
+		if g, w := int(g), 2; g != w {
+			t.Errorf("got %d, want %d", g, w)
+		}
+	})
+	t.Run("appends", func(t *testing.T) {
+		buf := make([]byte, 0, 100)
+		g := testing.AllocsPerRun(2, func() {
+			buf = append(buf, a.Key...)
+			buf = append(buf, ": "...)
+			buf = strconv.AppendQuote(buf, a.Value.String())
+			buf = append(buf, '\n')
+		})
+		if g, w := int(g), 0; g != w {
+			t.Errorf("got %d, want %d", g, w)
+		}
+	})
+
+	t.Run("Handle", func(t *testing.T) {
+		l := slog.New(New(io.Discard, nil))
+		got := testing.AllocsPerRun(10, func() {
+			l.LogAttrs(nil, slog.LevelInfo, "hello", slog.String("a", "1"))
+		})
+		if g, w := int(got), 6; g > w {
+			t.Errorf("got %d, want at most %d", g, w)
+		}
+	})
+}
diff --git a/slog-handler-guide/indenthandler4/indent_handler_test.go b/slog-handler-guide/indenthandler4/indent_handler_test.go
new file mode 100644
index 0000000..8f6e99c
--- /dev/null
+++ b/slog-handler-guide/indenthandler4/indent_handler_test.go
@@ -0,0 +1,107 @@
+//go:build go1.21
+
+package indenthandler
+
+import (
+	"bytes"
+	"log/slog"
+	"reflect"
+	"regexp"
+	"testing"
+	"testing/slogtest"
+
+	"gopkg.in/yaml.v3"
+)
+
+// !+TestSlogtest
+func TestSlogtest(t *testing.T) {
+	var buf bytes.Buffer
+	err := slogtest.TestHandler(New(&buf, nil), func() []map[string]any {
+		return parseLogEntries(t, buf.Bytes())
+	})
+	if err != nil {
+		t.Error(err)
+	}
+}
+
+// !-TestSlogtest
+
+func Test(t *testing.T) {
+	var buf bytes.Buffer
+	l := slog.New(New(&buf, nil))
+	l.Info("hello", "a", 1, "b", true, "c", 3.14, slog.Group("g", "h", 1, "i", 2), "d", "NO")
+	got := buf.String()
+	wantre := `time: [-0-9T:.]+Z?
+level: INFO
+source: ".*/indent_handler_test.go:\d+"
+msg: "hello"
+a: 1
+b: true
+c: 3.14
+g:
+    h: 1
+    i: 2
+d: "NO"
+`
+	re := regexp.MustCompile(wantre)
+	if !re.MatchString(got) {
+		t.Errorf("\ngot:\n%q\nwant:\n%q", got, wantre)
+	}
+
+	buf.Reset()
+	l.Debug("test")
+	if got := buf.Len(); got != 0 {
+		t.Errorf("got buf.Len() = %d, want 0", got)
+	}
+}
+
+// !+parseLogEntries
+func parseLogEntries(t *testing.T, data []byte) []map[string]any {
+	entries := bytes.Split(data, []byte("---\n"))
+	entries = entries[:len(entries)-1] // last one is empty
+	var ms []map[string]any
+	for _, e := range entries {
+		var m map[string]any
+		if err := yaml.Unmarshal([]byte(e), &m); err != nil {
+			t.Fatal(err)
+		}
+		ms = append(ms, m)
+	}
+	return ms
+}
+
+// !-parseLogEntries
+
+func TestParseLogEntries(t *testing.T) {
+	in := `
+a: 1
+b: 2
+c: 3
+g:
+    h: 4
+    i: five
+d: 6
+---
+e: 7
+---
+`
+	want := []map[string]any{
+		{
+			"a": 1,
+			"b": 2,
+			"c": 3,
+			"g": map[string]any{
+				"h": 4,
+				"i": "five",
+			},
+			"d": 6,
+		},
+		{
+			"e": 7,
+		},
+	}
+	got := parseLogEntries(t, []byte(in[1:]))
+	if !reflect.DeepEqual(got, want) {
+		t.Errorf("\ngot:\n%v\nwant:\n%v", got, want)
+	}
+}