design/48815-custom-fuzz-input-types.md: new design proposal

For golang/go#48815.

Change-Id: I021e4517940ff073254d9d56fcca623f4e2ed460
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/493637
Reviewed-by: Ian Lance Taylor <iant@golang.org>
diff --git a/design/48815-custom-fuzz-input-types.md b/design/48815-custom-fuzz-input-types.md
new file mode 100644
index 0000000..25141db
--- /dev/null
+++ b/design/48815-custom-fuzz-input-types.md
@@ -0,0 +1,250 @@
+# Proposal: Custom Fuzz Input Types
+
+Author: Richard Hansen <rhansen@rhansen.org>
+
+Last updated: 2023-05-10
+
+Discussion at https://go.dev/issue/48815.
+
+## Abstract
+
+Extend [`testing.F.Fuzz`](https://pkg.go.dev/testing#F.Fuzz) to support custom
+types, with their own custom mutation logic, as input parameters. This enables
+developers to perform [structure-aware
+fuzzing](https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md).
+
+## Background
+
+As of Go 1.20, `testing.F.Fuzz` only accepts fuzz functions that have basic
+parameter types: `[]byte`, `string`, `int`, etc. Custom input types with custom
+mutation logic would make it easier to fuzz functions that take complex data
+structures as input.
+
+It is technically possible to fuzz such functions using the basic types, but the
+benefit is limited:
+
+  * A basic input type can be used as a pseudo-random number generator seed to
+    generate a valid structure at test time. Downsides:
+      * The seed, not the generated structure, is saved in
+        `testdata/fuzz/FuzzTestName/*`. This makes it difficult for developers
+        to examine the structure to figure out why it is interesting. It also
+        means that a minor change to the structure generation algorithm can
+        invalidate the entire seed corpus.
+      * A problematic or interesting structure discovered or created outside of
+        fuzzing cannot be added to the seed corpus.
+      * `F.Fuzz` cannot distinguish the structure generation code from the code
+        under test, so the structure generation code is instrumented and
+        included in `F.Fuzz`'s analysis. This causes unnecessary slowdowns and
+        false positives (uninteresting inputs treated as interesting due to
+        changed coverage).
+      * `F.Fuzz` has limited ability to explore or avoid "similar" inputs in its
+        pursuit of new execution paths. (Similar seeds produce pseudo-randomly
+        independent structures.)
+  * Multiple input values can be used to populate the fields of the complex
+    structure. This has many of the same downsides as using a single seed
+    input.
+  * Raw input values can be cast as (an encoding of) the complex structure. For
+    example, a `[]byte` input could be interpreted as a protobuf. Depending on
+    the specifics, the yield of this approach (the number of bugs it finds) is
+    likely to be low due to the low probability of generating a syntactically
+    and semantically valid structure. (Sometimes it is important to attempt
+    invalid structures to exercise error handling and discover security
+    vulnerabilities, but this does not apply to function call traces that are
+    replayed to test a stateful system.)
+
+See [Structure-Aware Fuzzing with
+libFuzzer](https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md)
+for additional background.
+
+## Proposal
+
+Extend `testing.F.Fuzz` to accept fuzz functions with parameter types that
+implement the following interface (not exported, just documented in
+`testing.F.Fuzz`):
+
+```go
+// A customMutator is a fuzz input value that is self-mutating. This interface
+// extends the encoding.BinaryMarshaler and encoding.BinaryUnmarshaler
+// interfaces.
+type customMutator interface {
+	// MarshalBinary encodes the customMutator's value in a platform-independent
+	// way (e.g., JSON or Protocol Buffers).
+	MarshalBinary() ([]byte, error)
+	// UnmarshalBinary restores the customMutator's value from encoded data
+	// previously returned from a call to MarshalBinary.
+	UnmarshalBinary([]byte) error
+	// Mutate pseudo-randomly transforms the customMutator's value. The mutation
+	// must be deterministic: every call to Mutate with the same starting value
+	// and seed must result in the same transformed value.
+	Mutate(seed int64) error
+}
+```
+
+Also extend the seed corpus file format to support custom values. A line for a
+custom value has the following form:
+
+```
+custom("type identifier here", []byte("marshal output here"))
+```
+
+The type identifier is a globally unique and stable identifier derived from the
+value's fully qualified type name, such as `"*example.com/mod/pkg.myType"`.
+
+### Example Usage
+
+```go
+package pkg_test
+
+import (
+	"encoding/json"
+	"testing"
+
+	"github.com/go-loremipsum/loremipsum"
+)
+
+type fuzzInput struct{ Word string }
+
+func (v *fuzzInput) MarshalBinary() ([]byte, error) { return json.Marshal(v) }
+func (v *fuzzInput) UnmarshalBinary(d []byte) error { return json.Unmarshal(d, v) }
+func (v *fuzzInput) Mutate(seed int64) error {
+	v.Word = loremipsum.NewWithSeed(seed).Word()
+	return nil
+}
+
+func FuzzInput(f *testing.F) {
+	f.Fuzz(func(t *testing.T, v *fuzzInput) {
+		if v.Word == "lorem" {
+			t.Fatal("boom!")
+		}
+	})
+}
+```
+
+The fuzzer eventually encounters an input value that causes the test function to
+fail, and produces a seed corpus file in `testdata/fuzz` like the following:
+
+```
+go test fuzz v1
+custom("*example.com/mod/pkg_test.fuzzInput", []byte("{\"Word\":\"lorem\"}"))
+```
+
+## Rationale
+
+### `MarshalBinary`, `UnmarshalBinary` methods
+
+`Marshal` and `Unmarshal` would be shorter to type than `MarshalBinary` and
+`UnmarshalBinary`, but the longer names make it easier to extend existing types
+that already implement the `encoding.BinaryMarshaler` and
+`encoding.BinaryUnmarshaler` interfaces.
+
+`MarshalText` and `UnmarshalText` were considered but rejected because the most
+natural representation of a custom type might be binary, not text.
+
+`UnmarshalBinary` is used both to load seed corpus files from disk and to
+transmit input values between the coordinator and its workers. Unmarshaling
+malformed data from disk is allowed to fail, but unmarshaling after
+transmission to another process is expected to always succeed.
+
+`MarshalBinary` is used both to save seed corpus files to disk and to transmit
+input values between the coordinator and its workers. Marshaling is expected to
+always succeed. Despite this, it returns an error for several reasons:
+
+  * to implement the `encoding.BinaryMarshaler` interface
+  * for symmetry with `UnmarshalBinary`
+  * to match the APIs provided by packages such as `encoding/json` and
+    `encoding/gob`
+  * to discourage the use of `panic`
+
+Panicking is especially problematic because:
+
+  * The coordinator process currently interprets a panic as a bug in the code
+    under test, even if it happens outside of the test function.
+  * Worker process stdout and stderr is currently suppressed, presumably to
+    [reduce the amount of output
+    noise](https://github.com/golang/go/blob/aa4d5e739f32397969fd5c33cbc95d316686039f/src/testing/fuzz.go#L380-L383),
+    so developers might not notice that a failure is caused by a panic in a
+    custom input type's method.
+
+### `Mutate` method
+
+The `seed` parameter is an `int64`, not an unsigned integer type as is common
+for holding random bits, because that is what
+[`math/rand.NewSource`](https://pkg.go.dev/math/rand#NewSource) takes.
+
+The `Mutate` method must be deterministic to avoid violating [an assumption in
+the coordinator–worker
+protocol](https://github.com/golang/go/blob/0a9875c5c809fa70ae6662b8a38f5f86f648badd/src/internal/fuzz/worker.go#L702-L705).
+This may be relaxed in the future by revising the protocol.
+
+`Mutate` is expected to always succeed. Always returning `nil` helps ensure
+repeatability, which is necessary for the coordinator–worker protocol assumption
+linked above. Despite this, `Mutate` returns an error for a couple of reasons:
+
+  * It discourages the use of `panic`. Panicking is problematic for the reasons
+    described in the `MarshalBinary` rationale above.
+
+  * It may enable advanced use cases when combined with a future removal of the
+    determinism requirement. A hypothetical example: The `Mutate` method could
+    call out to a service that coordinates multiple fuzzing tasks to avoid
+    duplicated effort or employ advanced techniques for exploring the input
+    space. Such queries could regularly fail; enabling the coordinator to
+    gracefully handle the errors improves UX.
+
+Alternatively, the error return value could be omitted for now, and `F.Fuzz`
+extended again in the future to accept another custom input type whose `Mutate`
+method does return an error. This was rejected because the end result is
+unnecessarily messy for little immediate benefit, and any existing custom input
+types that call `panic` would have to be updated to take advantage of the
+improved error handling.
+
+### Minimization
+
+To simplify the initial implementation, input types are not minimizable.
+Minimizability could be added in the future by accepting a type like the
+following and calling its `Minimize` method:
+
+```go
+// A customMinimizingMutator is a customMutator that supports attempts to reduce
+// the size of an interesting value.
+type customMinimizingMutator interface {
+	customMutator
+	// Minimize attempts to produce the smallest value (usually defined as
+	// easiest to process by machine and/or humans) that still provides the same
+	// coverage as the original value. It repeatedly generates candidates,
+	// checking each one for suitability with the given callback. It returns
+	// a suitable candidate if it is satisfied that the candidate is
+	// sufficiently small or nil if it has given up searching.
+	Minimize(seed int64, check func(candidate any) (bool, error)) (any, error)
+}
+```
+
+## Compatibility
+
+No changes in behavior are expected with existing code and seed corpus files.
+
+## Implementation
+
+See https://go.dev/cl/493304 for an initial attempt.
+
+For the initial implementation, a worker will simply panic if one of the custom
+type's methods returns an error. A future change can improve UX by plumbing the
+error.
+
+No particular Go release is targeted.
+
+## Open issues
+
+  * What is the best way to obtain a stable, globally unique, and marshalable
+    identifier from a `reflect.Type`? The `reflect.Type.String` method does not
+    guarantee global uniqueness. See https://go.dev/cl/493304 for an initial
+    attempt.
+
+  * Should `MarshalBinary` not return an error, forcing devs to call `panic` on
+    error? We can always add support for a returned error in the future if
+    desired.
+
+  * Should `Mutate` not return an error, forcing devs to call `panic` on error?
+    We can always add support for a returned error in the future if desired.
+
+  * Should `Mutate` take a `context.Context` in case it wants to be cancelable?
+    (Maybe it wants to send RPCs, or otherwise do something expensive.)