design/37112-unstable-runtime-metrics.md: add proposal

This change adds a proposal and design document for the addition of a
runtime metrics package, which is designed to support adding and removing
runtime metrics without breaking the Go 1 compatibility promise.

Updates golang/go#37112.

Change-Id: I550740970ad74c5c71d712735e4984808bdbf463
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/218677
Reviewed-by: Austin Clements <austin@google.com>
diff --git a/design/37112-unstable-runtime-metrics.md b/design/37112-unstable-runtime-metrics.md
new file mode 100644
index 0000000..b92f39b
--- /dev/null
+++ b/design/37112-unstable-runtime-metrics.md
@@ -0,0 +1,585 @@
+# Proposal: API for unstable runtime metrics
+
+Author: Michael Knyszek
+
+## Background & Motivation
+
+The need for a new API for unstable metrics was already summarized quite well by
+@aclements, so I'll quote that here:
+
+> The runtime currently exposes heap-related metrics through
+> `runtime.ReadMemStats` (which can be used programmatically) and
+> `GODEBUG=gctrace=1` (which is difficult to read programmatically).
+> These metrics are critical to understanding runtime behavior, but have some
+> serious limitations:
+> 1. `MemStats` is hard to evolve because it must obey the Go 1 compatibility
+>    rules.
+>    The existing metrics are confusing, but we can't change them.
+>    Some of the metrics are now meaningless (like `EnableGC` and `DebugGC`),
+>    and several have aged poorly (like hard-coding the number of size classes
+>    at 61, or only having a single pause duration per GC cycle).
+>    Hence, we tend to shy away from adding anything to this because we'll have
+>    to maintain it for the rest of time.
+> 1. The `gctrace` format is unspecified, which means we can evolve it (and have
+>    completely changed it several times).
+>    But it's a pain to collect programmatically because it only comes out on
+>    stderr and, even if you can capture that, you have to parse a text format
+>    that changes.
+>    Hence, automated metric collection systems ignore gctrace.
+>    There have been requests to make this programmatically accessible (#28623).
+> There are many metrics I would love to expose from the runtime memory manager
+> and scheduler, but our current approach forces me to choose between two bad
+> options: programmatically expose metrics that are so fundamental they'll make
+> sense for the rest of time, or expose unstable metrics in a way that's
+> difficult to collect and process programmatically.
+
+`ReadMemStats` also has performance problems, most notably the need to
+stop the world.
+While many of the metrics in `MemStats` are difficult to collect without doing
+so, not all metrics require it, and it would be nice to be able to acquire some
+subset of metrics without a global application penalty.
+
+## Requirements
+
+Conversing with @aclements, we agree that:
+* The API should be easily extendable with new metrics.
+* The API should be easily retractable, to deprecate old metrics.
+    * Removing a metric should not break any Go applications as per the Go 1
+      compatibility promise.
+* The API should be discoverable, to obtain a list of currently relevant
+  metrics.
+* The API should be rich, allowing a variety of metrics (e.g. distributions).
+* The API implementation should minimize CPU/memory usage, such that it does not
+  appreciably affect any of the metrics being measured.
+* The API should include useful existing metrics already exposed by the runtime.
+
+## Goals
+
+Given the requirements, I suggest we prioritize the following concerns, in this
+order, when designing the API.
+
+1. Extensibility.
+    * Metrics are "unstable" and therefore it should always be compatible to add
+      or remove metrics.
+    * Since metrics will tend to be implementation-specific, this feature is
+      critical.
+1. Discoverability.
+    * Because these metrics are "unstable," there must be a way for the
+      application, and for the human writing the application, to discover the
+      set of usable metrics and be able to do something useful with that
+      information (e.g. log the metric).
+    * The API should enable collecting a subset of metrics programmatically.
+      For example, one might want to "collect all memory-related metrics" or
+      "collect all metrics which are efficient to collect".
+1. Performance.
+    * Must have a minimal effect on the metrics it returns in the
+      steady-state.
+    * Should scale up to 100s of metrics, an amount that a human might consider
+      "a lot."
+        * Note that picking the right types to expose can limit the number of
+          metrics we need to expose.
+          For example, a distribution type would significantly reduce the number
+          of metrics.
+1. Ergonomics.
+    * The API should be as easy to use as it can be, given the above.
+
+## Design
+
+I propose we add a new standard library package to support a new runtime metrics
+API to avoid polluting the namespace of existing packages.
+The proposed name of the package is `runtime/metrics`.
+
+I propose that this package expose a sampling-based API for acquiring runtime
+metrics, in the same vein as `runtime.ReadMemStats`, that meets this proposal's
+stated goals.
+The sampling approach is taken in opposition to a stream-based (or event-based)
+API.
+Many of the metrics currently exposed by the runtime are "continuous" in the
+sense that they're cheap to update and are updated frequently enough that
+emitting an event for every update would be quite expensive, and would require
+scaffolding to allow the user to control the emission rate.
+Unless noted otherwise, this document will assume a sampling-based API.
+
+With that said, I believe that in the future it will be worthwhile to expose an
+event-based API as well, taking a hybrid approach, much like Linux's `perf`
+tool.
+See "Time series metrics" for a discussion of such an extension.
+
+### Representation of metrics
+
+First, it probably makes the most sense to interact with a set of metrics,
+rather than one metric at a time.
+Many metrics require that the runtime reach some safe state to collect, so
+naturally it makes sense to collect all such metrics at this time for
+performance.
+For the rest of this document, we're going to consider "sets of metrics" as the
+unit of our API instead of individual metrics for this reason.
+
+Second, the extendability and retractability requirements imply a less rigid
+data structure to represent and interact with a set of metrics.
+Perhaps the least rigid data structure in Go is something like a byte slice, but
+this is decidedly too low-level to use from within a Go application because it
+would need to have an encoding.
+Simply defining a new encoding for this would be a non-trivial undertaking with
+its own complexities.
+
+The next least-rigid data structure is probably a Go map, which allows us to
+associate some key for a metric with a sampled metric value.
+The two most useful properties of maps here are that their set of keys is
+completely dynamic, and that they allow efficient random access.
+The inconvenience of a map though is its undefined iteration order.
+While this might not matter if we're just constructing an RPC message to hit an
+API, it does matter if one just wants to print statistics to STDERR every once
+in a while for debugging.
+
+A slightly more rigid data structure that would be useful for managing an
+unstable set of metrics is a slice of structs, with each struct containing a
+key (the metric name) and a value.
+This allows us to have a well-defined iteration order, and it's up to the user
+if they want efficient random access.
+For example, they could keep the slice sorted by metric keys, and do a binary
+search over them, or even have a map on the side.
+
+There are several variants of this slice approach (e.g. struct of keys slice and
+values slice), but I think the general idea of using slices of key-value pairs
+strikes the right balance between flexibility and usability.
+Going any further in terms of rigidity, we end up right where we don't want
+to be: with a `MemStats`-like struct.
+
+Third, I propose the metric key be something abstract but still useful for
+humans, such as a string.
+An alternative might be an integral ID, where we provide a function to obtain a
+metric's name from its ID.
+However, using an ID pollutes the API.
+Since we want to allow a user to ask for specific metrics, we would be required
+to provide named constants for each metric which would later be deprecated.
+It's also unclear that this would give any performance benefit at all.
+
+Finally, we want the metric value to be able to take on a variety of forms.
+Many metrics might work great as `uint64` values, but not all do.
+For example, we might want to collect a distribution of values (size classes are
+one such example).
+Distributions in particular can take on many different forms, for example if we
+wanted to have an HDR histogram of STW pause times.
+In the interest of being as extensible as possible, something like an empty
+interface value works well here.
+
+Putting this all together, I propose sampled metric values look like
+
+```go
+// Sample captures a single metric sample.
+type Sample struct {
+  Name string
+  Value interface{}
+}
+
+// Read populates a slice of samples.
+func Read(m []Sample)
+```
+
+### Efficiently populating a `[]struct{Name string; Value interface{}}`
+
+Returning a `[]struct{Name string; Value interface{}}` on each call to the API
+would cause potentially many allocations, which could mean a significant impact
+on the performance of metrics collection in the steady-state (and also a skew in
+the metrics themselves!).
+
+To remedy this, we can do what `ReadMemStats` does: take a pointer to allocated
+memory and populate it with values.
+In this case, the first call may need to populate each Value field in the struct
+with a new allocation.
+However, since each metric is stable for the lifetime of the application binary
+(because its stability is tied to the runtime's implementation), we can re-use
+the same slice and all its values on subsequent calls without allocation,
+provided that each Value field contains a pointer type.
+For example, the Value field would contain a `*int64` instead of `int64`.
+Using a non-pointer-typed value in the interface would require allocation on
+every call; whereas using a pointer-typed value requires an initial allocation,
+but that allocation can be reused on subsequent calls.
+
+### Discoverability
+
+To support discovering which metrics the system supports, we must provide a
+function that returns the set of supported metric keys.
+
+I propose that the discovery API return a slice of "metric descriptors" which
+contain a "Name" field referring to a metric key.
+Using a slice here mirrors the sampling API.
+
+#### Metric naming
+
+Choosing a naming scheme for each metric will significantly influence its usage,
+since these are the names that will eventually be surfaced to the user.
+There are two important properties we would like to have such that these metric
+names may be smoothly and correctly exposed to the user.
+
+The first, and perhaps most important, of these properties is that a metric's
+semantics be tied to its name.
+If the semantics (including the type of each sample value) of a metric changes,
+then the name should too.
+
+The second is that the name should be easily parsable and mechanically
+rewritable, since different metric collection systems have different naming
+conventions.
+
+Putting these two together, I propose that the metric name be built from two
+components: its English name, and its unit (e.g. bytes, seconds).
+I propose we separate the two components of "name" and "unit" by a colon (":")
+and provide a well-defined format for the unit.
+
+The use of an English name is in some ways not much of a deviation from
+`ReadMemStats`, which uses Go identifiers for naming.
+I propose that we mostly stick to the current convention and use UpperCamelCase
+consisting of only uppercase and lowercase characters from the Latin alphabet.
+The introduction of this new API is also a good time to rename some of the more
+vaguely named statistics, and perhaps to introduce a better namespacing
+convention.
+Austin suggested using common prefixes for namespacing such as "GC" or
+"Sched," which seems good enough to me.
+
+Including the unit in the name may be a bit surprising at first.
+First of all, why should the unit even be a string? One alternative way to
+represent the unit is to use some structured format, but this has the potential
+to lock us into some bad decisions or limit us to only a certain subset of
+units.
+Using a string gives us more flexibility to extend the units we support in the
+future.
+Thus, I propose that no matter what we do, we should definitely keep the unit as
+a string.
+
+In terms of a format for this string, I think we should keep the unit closely
+aligned with the Go benchmark output format to facilitate a nice user experience
+for measuring these metrics within the Go testing framework.
+This goal suggests the following very simple format: a series of all-lowercase
+common base unit names, singular or plural, without SI prefixes (such as
+"seconds" or "bytes", not "nanoseconds" or "MiB"), potentially containing
+hyphens (e.g. "cpu-seconds"), delimited by either `*` or `/` characters.
+A regular expression is sufficient to describe the format, and ignoring the
+restriction of common base unit names, would look like
+`^[a-z-]+(?:[*\/][a-z-]+)*$`.
+
+Why should the unit be a part of the name? Mainly to help maintain the first
+property mentioned above.
+If we decide to change a metric's unit, which represents a semantic change, then
+the name must also change.
+Also, in this situation, it's much more difficult for a user to forget to
+include the unit.
+If their metric collection system has no rules about names, then great, they can
+just use whatever Go gives them.
+If they do (and most seem to be fairly opinionated) it forces the user to
+account for the unit when dealing with the name and it lessens the chance that
+it would be forgotten.
+Furthermore, splitting a string is typically less computationally expensive than
+combining two strings.
+
+#### Metric Descriptors
+
+First, any metric descriptor must contain the name of the metric.
+No matter which way we choose to store a set of descriptions, it is both useful
+and necessary to carry this information around.
+Another useful field is the unit of the metric.
+As mentioned above in discussing metric naming, I propose that the unit be kept
+as part of the name.
+
+The metric descriptor should also indicate the performance sensitivity of the
+metric.
+Today `ReadMemStats` forces the user to endure a stop-the-world to collect all
+metrics.
+There are a number of pieces of information we could add, but one good one for
+now would be "does this metric require a stop-the-world event?".
+The intended use of such information would be to collect certain metrics less
+often, or to exclude them altogether from metrics collection.
+While this is fairly implementation-specific for metadata, the majority of
+tracing GC designs involve a stop-the-world event at one point or another.
+
+Another useful aspect of a metric descriptor would be to indicate whether the
+metric is a "gauge" or a "counter" (i.e. a value that increases monotonically).
+We have examples of both in the runtime and this information is often useful to
+bubble up to metrics collection systems to influence how they're displayed and
+what operations are valid on them (e.g. counters are often more usefully viewed
+as rates).
+By including whether a metric is a gauge or a counter in the descriptions,
+metrics collection systems don't have to try to guess, and users don't have to
+annotate exported metrics manually; they can do so programmatically.
+
+### Time series metrics
+
+The API as described so far has been a sampling-based API, but many metrics are
+updated at well-defined (and relatively infrequent) intervals, such as many of
+the metrics found in the `gctrace` output.
+These metrics, which I'll call "time series metrics," may be sampled, but the
+sampling operation is inherently lossy.
+In many cases it's very useful for performance debugging to have precise
+information of how a metric might change e.g. from GC cycle to GC cycle.
+
+Measuring such metrics thus fits better in an event-based, or stream-based API,
+which emits a stream of metric values (tagged with precise timestamps) which are
+then ingested by the application and logged someplace.
+
+While we stated earlier that considering such time series metrics is outside of
+the scope of this proposal, it's worth noting that buying into a sampling-based
+API today does not close any doors toward exposing precise time series metrics
+in the future.
+A straightforward way of extending the API would be to add the time series
+metrics to the total list of metrics, allowing the usual sampling-based approach
+if desired, while also tagging some metrics with a "time series" flag in their
+descriptions.
+The event-based API, in that form, could then just be a pure addition.
+
+A feasible alternative in this space is to only expose a sampling API, but to
+include a timestamp on event metrics to allow users to correlate metrics with
+specific events.
+For example, if metrics came from the previous GC, they would be tagged with the
+timestamp of that GC, and if the metric and timestamp hadn't changed, the user
+could identify that.
+
+One interesting consequence of having an event-based API which is prompt is that
+users could then react to Go runtime state on-the-fly, such as by detecting when
+the GC is running.
+On the one hand, this could provide value to some users of Go, who require
+fine-grained feedback from the runtime system.
+On the other hand, the supported metrics will still always be unstable, so
+relying on a metric for feedback in one release might no longer be possible in a
+future release.
+
+## Draft API Specification
+
+Given the discussion of the design above, I propose the following draft API
+specification.
+
+```go
+package metrics
+
+// Metric describes a runtime metric.
+type Metric struct {
+  // Name is the full name of the metric which includes the unit.
+  //
+  // The format of the metric may be described by the following regular expression.
+  // ^(?P<name>[^:]+):(?P<unit>[^:*\/]+(?:[*\/][^:*\/]+)*)$
+  //
+  // The format splits the name into two components, separated by a colon: a human-readable
+  // name and a computer-parseable unit. The name may only contain characters in the lowercase
+  // and uppercase Latin alphabet, and by convention will be UpperCamelCase.
+  //
+  // The unit is a series of lowercase English unit names (singular or plural) without
+  // prefixes (but potentially containing hyphens) delimited by '*' or '/'. For example
+  // "seconds", "bytes", "bytes/second", "cpu-seconds", "byte*cpu-seconds", and
+  // "bytes/second/second" are all valid. The value will never contain whitespace.
+  //
+  // A complete name might look like "GCPauses:seconds".
+  Name string
+
+  // Cumulative is whether or not the metric is cumulative. If a cumulative metric is just
+  // a single number, then it increases monotonically. If the metric is a distribution,
+  // then each bucket count increases monotonically.
+  //
+  // This flag thus indicates whether or not it's useful to compute a rate from this value.
+  Cumulative bool
+
+  // StopTheWorld is whether or not the metric requires a stop-the-world
+  // event in order to collect it.
+  StopTheWorld bool
+}
+
+// Histogram is an interface for a distribution of a runtime metric.
+type Histogram interface {
+  // Buckets returns a range of values represented by each bucket.
+  //
+  // The valid return types are one of `[]float64` or `[]time.Duration`.
+  // More valid return types may be added in the future, and the caller
+  // should be prepared to handle them.
+  //
+  // The slice contains the boundaries between buckets, in increasing order.
+  // There are len(slice)+1 total buckets: a bucket for all values less than
+  // the first boundary, a bucket covering each [slice[i], slice[i+1]) interval,
+  // and a bucket for all values greater than or equal to the last boundary.
+  Buckets() interface{}
+
+  // Counts populates the given slice with weights for each histogram
+  // bucket. The length of this slice should be the length of the slice
+  // returned by Buckets, plus one to account for the implicit minimum
+  // bucket. If the given slice is too small, this method will panic.
+  //
+  // Given N buckets (N-1 boundaries from Buckets), the relationship between
+  // Counts and Buckets is as follows.
+  // count[0] is the weight of the range (-inf, bucket[0])
+  // count[n] is the weight of the range [bucket[n-1], bucket[n]), for 0 < n < N-1
+  // count[N-1] is the weight of the range [bucket[N-2], inf)
+  Counts([]uint64)
+
+  // ValueSum returns the sum of all the values added to the distribution.
+  //
+  // Note that this sum is exact, so it cannot be computed from Buckets and
+  // Counts. This value is useful for computing an accurate mean.
+  //
+  // The valid return types are one of `float64` or `time.Duration`.
+  ValueSum() interface{}
+}
+
+// Descriptions returns a slice of metric descriptions for all metrics.
+func Descriptions() []Metric
+
+// Sample captures a single metric sample.
+type Sample struct {
+  // Name is the name of the metric sampled.
+  //
+  // It must correspond to a name in one of the metric descriptions
+  // returned by Descriptions.
+  Name string
+
+  // Value is the value of the metric sample.
+  //
+  // The valid set of types which this field may take on are *uint64,
+  // *int64, *float64, *time.Duration, and Histogram.
+  //
+  // This set of types may expand in the future, but will never shrink.
+  Value interface{}
+}
+
+// Read populates the given slice of metric samples.
+//
+// Desired metrics should be present in the slice with the appropriate name.
+//
+// The first time Read is called, it will populate each sample's
+// Value field with a properly sized allocation, which may then be
+// re-used by subsequent calls to Read. The user is therefore
+// encouraged to re-use the same slice between calls.
+//
+// Metric values with names not appearing in the value returned by Descriptions
+// will simply be left untouched.
+func Read(m []Sample)
+```
+
+The usage of the API we have in mind for collecting specific metrics is the
+following:
+
+```go
+var stats = []metrics.Sample{
+  {Name: "GCHeapGoal:bytes"},
+  {Name: "GCPauses:seconds"},
+}
+
+// Somewhere...
+...
+  go statsLoop(stats, 30*time.Second)
+...
+
+func statsLoop(stats []metrics.Sample, d time.Duration) {
+  // Read and print stats once every d.
+  ticker := time.NewTicker(d)
+  for {
+    metrics.Read(stats)
+    for _, sample := range stats {
+      split := strings.IndexByte(sample.Name, ':')
+      name, unit := sample.Name[:split], sample.Name[split+1:]
+      switch v := sample.Value.(type) {
+      case *int64:
+        log.Printf("%s: %d %s", name, *v, unit)
+      case *uint64:
+        log.Printf("%s: %d %s", name, *v, unit)
+      case *float64:
+        log.Printf("%s: %f %s", name, *v, unit)
+      case *time.Duration:
+        log.Printf("%s: %s", name, *v)
+      case metrics.Histogram:
+        log.Printf("%s: sum %v %s", name, v.ValueSum(), unit) // a mean would also need Counts
+      }
+    }
+    <-ticker.C
+  }
+}
+```
+
+I believe common usage will be to simply slurp up all metrics, which would look
+like this:
+
+```go
+...
+  // Generate a sample slice for all the metrics.
+  descs := metrics.Descriptions()
+  stats := make([]metrics.Sample, 0, len(descs))
+  for _, desc := range descs {
+    stats = append(stats, metrics.Sample{Name: desc.Name})
+  }
+  go statsLoop(stats, 30*time.Second)
+...
+```
+
+## Proposed initial list of metrics
+
+### Existing metrics
+
+```
+GCHeapFree:bytes        *uint64 // (== HeapIdle - HeapReleased)
+GCHeapUncommitted:bytes *uint64 // (== HeapReleased)
+GCHeapObject:bytes      *uint64 // (== HeapAlloc)
+GCHeapUnused:bytes      *uint64 // (== HeapInuse - HeapAlloc)
+StackInUse:bytes        *uint64 // (== StackInuse)
+StackOther:bytes        *uint64 // (== StackSys - StackInuse)
+
+GCHeapObjects:objects          *uint64 // (== HeapObjects)
+GCMSpanInUse:bytes             *uint64 // (== MSpanInuse)
+GCMSpanFree:bytes              *uint64 // (== MSpanSys - MSpanInuse)
+GCMCacheInUse:bytes            *uint64 // (== MCacheInuse)
+GCMCacheFree:bytes             *uint64 // (== MCacheSys - MCacheInuse)
+GCCount:completed-cycles       *uint64 // (== NumGC)
+GCForcedCount:completed-cycles *uint64 // (== NumForcedGC)
+ProfilingBucketMemory:bytes    *uint64 // (== BuckHashSys)
+GCMetadata:bytes               *uint64 // (== GCSys)
+RuntimeOtherMemory:bytes       *uint64 // (== OtherSys)
+
+// (== GCHeap.* + StackInUse + StackOther + GCMSpan.* + GCMCache.* +
+// ProfilingBucketMemory + GCMetadata + RuntimeOtherMemory)
+RuntimeVirtualMemory:bytes *uint64
+
+GCHeapGoal:bytes *uint64 // (== NextGC)
+```
+
+### New GC metrics
+
+```
+// Distribution of what fraction of CPU time was spent in each GC cycle.
+GCCPUPercent:cpu-percent Histogram
+
+// Distribution of pause times, replaces PauseNs and PauseTotalNs.
+GCPauses:seconds Histogram
+
+// Distribution of unsmoothed trigger ratio.
+GCTriggerRatios:ratio Histogram
+
+// Distribution of objects by size.
+// Buckets correspond directly to size classes up to 32 KiB;
+// beyond that, the distribution is approximated by an HDR histogram.
+// GCHeapAllocations replaces BySize, TotalAlloc, and Mallocs.
+// GCHeapFrees replaces BySize and Frees.
+GCHeapAllocations:bytes Histogram
+GCHeapFrees:bytes       Histogram
+
+// Distribution of allocations satisfied by the page cache.
+// Buckets are exact since there are only 16 options.
+GCPageCacheAllocations:bytes Histogram
+
+// Distribution of stack scanning latencies. HDR histogram.
+GCStackScans:seconds Histogram
+```
+
+### Scheduler metrics
+
+```
+SchedGoroutines:goroutines        *uint64
+SchedAsyncPreemptions:preemptions *uint64
+
+// Distribution of how long goroutines stay in runnable
+// before transitioning to running. HDR histogram.
+SchedTimesToRun:seconds Histogram
+```
+
+## Backwards Compatibility
+
+Note that although the set of metrics the runtime exposes will not be stable
+across Go versions, the API to discover and access those metrics will be.
+
+Thus, this proposal strictly increases the API surface of the Go standard
+library without changing any existing functionality, and is therefore Go 1
+compatible.
+