blob: c1a50510900e629e017f3a5e49d2b35d05f39fe5 [file] [log] [blame] [view]
# Proposal: Custom Fuzz Input Types
Author: Richard Hansen <rhansen@rhansen.org>
Last updated: 2023-06-08
Discussion at https://go.dev/issue/48815.
## Abstract
Extend [`testing.F.Fuzz`](https://pkg.go.dev/testing#F.Fuzz) to support custom
types, with their own custom mutation logic, as input parameters. This enables
developers to perform [structure-aware
fuzzing](https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md).
## Background
As of Go 1.20, `testing.F.Fuzz` only accepts fuzz functions that have basic
parameter types: `[]byte`, `string`, `int`, etc. Custom input types with custom
mutation logic would make it easier to fuzz functions that take complex data
structures as input.
It is technically possible to fuzz such functions using the basic types, but the
benefit is limited:
* A basic input type can be used as a pseudo-random number generator seed to
generate a valid structure at test time. Downsides:
* The seed, not the generated structure, is saved in
`testdata/fuzz/FuzzTestName/*`. This makes it difficult for developers
to examine the structure to figure out why it is interesting. It also
means that a minor change to the structure generation algorithm can
invalidate the entire seed corpus.
* A problematic or interesting structure discovered or created outside of
fuzzing cannot be added to the seed corpus.
* `F.Fuzz` cannot distinguish the structure generation code from the code
under test, so the structure generation code is instrumented and
included in `F.Fuzz`'s analysis. This causes unnecessary slowdowns and
false positives (uninteresting inputs treated as interesting due to
changed coverage).
* `F.Fuzz` has limited ability to explore or avoid "similar" inputs in its
pursuit of new execution paths. (Similar seeds produce pseudo-randomly
independent structures.)
* Multiple input values can be used to populate the fields of the complex
structure. This has many of the same downsides as using a single seed
input.
* Raw input values can be cast as (an encoding of) the complex structure. For
example, a `[]byte` input could be interpreted as a protobuf. Depending on
the specifics, the yield of this approach (the number of bugs it finds) is
likely to be low due to the low probability of generating a syntactically
and semantically valid structure. (Sometimes it is important to attempt
invalid structures to exercise error handling and discover security
vulnerabilities, but this does not apply to function call traces that are
replayed to test a stateful system.)
See [Structure-Aware Fuzzing with
libFuzzer](https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md)
for additional background.
## Proposal
Extend `testing.F.Fuzz` to accept fuzz functions with parameter types that
implement the following interface (not exported, just documented in
`testing.F.Fuzz`):
```go
// A customMutator is a fuzz input value that is self-mutating. This interface
// extends the encoding.BinaryMarshaler and encoding.BinaryUnmarshaler
// interfaces.
type customMutator interface {
// MarshalBinary encodes the customMutator's value in a platform-independent
// way (e.g., JSON or Protocol Buffers).
MarshalBinary() ([]byte, error)
// UnmarshalBinary restores the customMutator's value from encoded data
// previously returned from a call to MarshalBinary.
UnmarshalBinary([]byte) error
// Mutate pseudo-randomly transforms the customMutator's value. The mutation
// must be repeatable: every call to Mutate with the same starting value and
// seed must result in the same transformed value.
Mutate(ctx context.Context, seed int64) error
}
```
Also extend the seed corpus file format to support custom values. A line for a
custom value has the following form:
```
custom("type identifier here", []byte("marshal output here"))
```
The type identifier is a globally unique and stable identifier derived from the
value's fully qualified type name, such as `"*example.com/mod/pkg.myType"`.
### Example Usage
```go
package pkg_test
import (
"encoding/json"
"testing"
"github.com/go-loremipsum/loremipsum"
)
type fuzzInput struct{ Word string }
func (v *fuzzInput) MarshalBinary() ([]byte, error) { return json.Marshal(v) }
func (v *fuzzInput) UnmarshalBinary(d []byte) error { return json.Unmarshal(d, v) }
func (v *fuzzInput) Mutate(ctx context.Context, seed int64) error {
v.Word = loremipsum.NewWithSeed(seed).Word()
return nil
}
func FuzzInput(f *testing.F) {
f.Fuzz(func(t *testing.T, v *fuzzInput) {
if v.Word == "lorem" {
t.Fatal("boom!")
}
})
}
```
The fuzzer eventually encounters an input value that causes the test function to
fail, and produces a seed corpus file in `testdata/fuzz` like the following:
```
go test fuzz v1
custom("*example.com/mod/pkg_test.fuzzInput", []byte("{\"Word\":\"lorem\"}"))
```
## Rationale
### Private interface
The `customMutator` interface is not exported for a few reasons:
* Exporting is not strictly required because it does not appear anywhere
outside of internal logic.
* It can be easily exported in the future if needed. The opposite is not true:
un-exporting requires a major version change.
* [YAGNI](https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it): Users are
unlikely to want to declare anything with that type. One possible exception
is a compile-time type check such as the following:
```go
var _ testing.CustomMutator = (*myType)(nil)
```
Such a check is unlikely to have much value: the code is likely being
compiled because tests are about to run, and `testing.F.Fuzz`'s runtime
check will immediately catch the bug.
* Exporting now would add friction to extending `testing.F.Fuzz` again in the
future. Should the new interface be exported even if doing so doesn't add
much value beyond consistency?
### `MarshalBinary`, `UnmarshalBinary` methods
`Marshal` and `Unmarshal` would be shorter to type than `MarshalBinary` and
`UnmarshalBinary`, but the longer names make it easier to extend existing types
that already implement the `encoding.BinaryMarshaler` and
`encoding.BinaryUnmarshaler` interfaces.
`MarshalText` and `UnmarshalText` were considered but rejected because the most
natural representation of a custom type might be binary, not text.
`UnmarshalBinary` is used both to load seed corpus files from disk and to
transmit input values between the coordinator and its workers. Unmarshaling
malformed data from disk is allowed to fail, but unmarshaling after
transmission to another process is expected to always succeed.
`MarshalBinary` is used both to save seed corpus files to disk and to transmit
input values between the coordinator and its workers. Marshaling is expected to
always succeed. Despite this, it returns an error for several reasons:
* to implement the `encoding.BinaryMarshaler` interface
* for symmetry with `UnmarshalBinary`
* to match the APIs provided by packages such as `encoding/json` and
`encoding/gob`
* to discourage the use of `panic`
Panicking is especially problematic because:
* The coordinator process currently interprets a panic as a bug in the code
under test, even if it happens outside of the test function.
* Worker process stdout and stderr is currently suppressed, presumably to
[reduce the amount of output
noise](https://github.com/golang/go/blob/aa4d5e739f32397969fd5c33cbc95d316686039f/src/testing/fuzz.go#L380-L383),
so developers might not notice that a failure is caused by a panic in a
custom input type's method.
### `Mutate` method
The `seed` parameter is an `int64`, not an unsigned integer type as is common
for holding random bits, because that is what
[`math/rand.NewSource`](https://pkg.go.dev/math/rand#NewSource) takes.
The `Mutate` method must be repeatable to avoid violating [an assumption in the
coordinator–worker
protocol](https://github.com/golang/go/blob/0a9875c5c809fa70ae6662b8a38f5f86f648badd/src/internal/fuzz/worker.go#L702-L705).
This may be relaxed in the future by revising the protocol.
Some alternatives for the `Mutate` method were considered:
* `Mutate()`: Simplest, but the lack of a seed parameter makes it difficult
to satisfy the repeatability requirement.
* `Mutate(seed int64)`: Simple. Naturally hints to developers that the method
is expected to be fast, repeatable, and error-free, which increases the
effectiveness of fuzzing. Adding a context parameter or error return value
(or both) might be YAGNI, but their absence makes complex mutation
operations more difficult to implement. The lack of an error return value
encourages the use of `panic`, which is problematic for the reasons
discussed in the `MarshalBinary` rationale above.
* `Mutate(seed int64) error`: The error return value discourages the use of
`panic`, and enables better dev UX when debugging complex mutation
operations.
* `Mutate(ctx context.Context, seed int64) error`: The context makes this more
future-proof by enabling advanced techniques once the repeatability
requirement is removed. For example, `Mutate` could send an RPC to a service
that feeds automatic crash report data to fuzzing tasks to increase the
likelihood of encountering an interesting value. The context parameter and
error return value might be YAGNI, but the added implementation complexity
and developer cognitive load is believed to be minor enough to not worry
about it (they can be ignored in most use cases).
* Accept both `Mutate(seed int64)` and `Mutate(ctx context.Context, seed
int64) error`: The second of the two can be added later after accumulating
additional feedback from developers. Supporting both might result in
unnecessary complexity.
Because mutation operations on custom types are expected to be somewhat complex
(otherwise a basic type would probably suffice), the `Mutate(ctx
context.Context, seed int64) error` option is believed to be the best choice.
### Minimization
To simplify the initial implementation, input types are not minimizable.
Minimizability could be added in the future by accepting a type like the
following and calling its `Minimize` method:
```go
// A customMinimizingMutator is a customMutator that supports attempts to reduce
// the size of an interesting value.
type customMinimizingMutator interface {
customMutator
// Minimize attempts to produce the smallest value (usually defined as
// easiest to process by machine and/or humans) that still provides the same
// coverage as the original value. It repeatedly generates candidates,
// checking each one for suitability with the given callback. It returns
// a suitable candidate if it is satisfied that the candidate is
// sufficiently small or nil if it has given up searching.
Minimize(seed int64, check func(candidate any) (bool, error)) (any, error)
}
```
## Compatibility
No changes in behavior are expected with existing code and seed corpus files.
## Implementation
See https://go.dev/cl/493304 for an initial attempt.
For the initial implementation, a worker can simply panic if one of the custom
type's methods returns an error. A future change can improve UX by plumbing the
error.
No particular Go release is targeted.