design/draft-fuzzing.md: update fuzzing draft design This updates the draft design to more closely match the more recent decisions during implementation. Change-Id: I716f4e07431612bcf15fcbde8409e8db5b3b37a9 Reviewed-on: https://go-review.googlesource.com/c/proposal/+/289809 Trust: Katie Hockman <katie@golang.org> Run-TryBot: Katie Hockman <katie@golang.org> Reviewed-by: Jay Conrod <jayconrod@google.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>

commit: a3847975cfb389756685c9fec9db6cfa8732a56e [log] [tgz]
author: Katie Hockman <katie@golang.org> Thu Feb 04 16:35:00 2021 -0500
committer: Katie Hockman <katie@golang.org> Mon Feb 08 19:22:25 2021 +0000
tree: c34da05b00a38d83d61252d65723754368c0b5d2
parent: abbf42e235ddc8819c792f0053261ecec186699c [diff]
diff --git a/design/draft-fuzzing.md b/design/draft-fuzzing.md
index f58b574..3870587 100644
--- a/design/draft-fuzzing.md
+++ b/design/draft-fuzzing.md

@@ -138,7 +138,7 @@
 
 ## Implementation
 
-There are several components to this proposal which are described below.
+There are several components to this design draft which are described below.
 The big pieces to be supported in the MVP are: support for fuzzing built-in
 types, structs, and types which implement the BinaryMarshaler and
 BinaryUnmarshaler interfaces or the TextMarshaler and TextUnmarshaler
@@ -169,8 +169,8 @@
 [mutation engine](#fuzzing-engine-and-mutator) if fuzzing.
 The testing portion of the fuzz target is a function within an `f.Fuzz`
 invocation.
-This function runs like a standard unit test with `testing.T` for each input in
-the seed corpus.
+This function runs much like a standard unit test with `testing.T` for each
+input in the seed corpus.
 If the developer is fuzzing this target with the new `-fuzz` flag with `go
 test`, then a [generated corpus](#generated-corpus) will be managed by the
 fuzzing engine, and a mutator will generate new inputs to run against the
@@ -182,14 +182,12 @@
 ```
 func FuzzMarshalFoo(f *testing.F) {
 	// Seed the initial corpus
-	inputs := []string{"cat", "DoG", "!mouse!"}
-	for _, input := range inputs {
-		f.Add(input, big.NewInt(0))
-	}
+    f.Add("cat", big.NewInt(1341))
+    f.Add("!mouse", big.NewInt(0))
 
 	// Run the fuzz test
 	f.Fuzz(func(t *testing.T, a string, num *big.Int) {
-		t.Parallel()
+		t.Parallel() // seed corpus tests can run in parallel
 		if num.Sign() <= 0 {
 			t.Skip() // only test positive numbers
 		}
@@ -226,10 +224,11 @@
 // those in the Fuzz function.
 func (f *F) Add(args ...interface{})
 
-// Fuzz runs the fuzz function, ff, for fuzz testing. It runs ff in a separate
-// goroutine. Only one call to Fuzz is allowed per fuzz target, and any
-// subsequent calls will panic. If ff fails for a set of arguments, those
-// arguments will be added to the seed corpus.
+// Fuzz runs the fuzz function, ff, for fuzz testing. While fuzzing with -fuzz,
+// the fuzz target and ff may be run in multiple worker processes that don't
+// share global state within the process. Only one call to Fuzz is allowed per
+// fuzz target, and any subsequent calls will panic. If ff fails for a set of
+// arguments, those arguments will be added to the seed corpus.
 func (f *F) Fuzz(ff interface{})
 ```
 
@@ -238,20 +237,26 @@
 A fuzz target has two main components: 1) seeding the corpus and 2) the `f.Fuzz`
 function which is executed for items in the corpus.
 
-1.  The corpus generation is done first, and builds a seed corpus with
-    interesting input values for the fuzz test.  This should be fairly quick,
-    thus able to run before the fuzz testing begins, every time it’s run. These
-    inputs are run by default with `go test`.
-1.  The `f.Fuzz(...)` function is executed for each item in the corpus. If this
-    target is being fuzzed, then new inputs will be generated and continously
-    tested against the `f.Fuzz(...)` function.
+1.  Defining the seed corpus and any necessary setup work is done before the
+    `f.Fuzz` function, to prepare for fuzzing.
+    These inputs, as well as those in `testdata/corpus/FuzzTarget`, are run by
+    default with `go test`.
+1.  The `f.Fuzz(...)` function is executed for each item in the seed corpus.
+    If this target is being fuzzed, then new inputs will be generated and
+    continously tested using the `f.Fuzz(...)` function.
 
-The arguments to `f.Add(...)` and the function in `f.Fuzz(...)` must be the same
-type within the target, and there must be at least one argument specified.
+The arguments to `f.Add(...)` and the fuzzing arguments in the `f.Fuzz` function
+must be the same type within the target, and there must be at least one argument
+specified.
 This will be ensured by a vet check.
+
 Fuzzing of built-in types (e.g. simple types, maps, arrays) and types which
 implement the BinaryMarshaler and TextMarshaler interfaces are supported.
 
+In the future, structs that do not implement the BinaryMarshaler and
+TextMarshaler interfaces may be supported by building them based on their
+exported fields.
+
 Interfaces, functions, and channels are not appropriate types to fuzz, so will
 never be supported.
 
@@ -264,50 +269,77 @@
 identified by the fuzzing engine.
 This set of inputs is also used to “seed” the corpus used by the fuzzing engine
 when mutating inputs to discover new code coverage.
+A good seed corpus can save the mutation engine a lot of work (for example
+adding a new key type to a key parsing function).
 
-The seed corpus can be populated programmatically using `f.Add` within the
-fuzz target.
-Programmatic seed corpuses make it easy to add new entries when support for new
-things are added (for example adding a new key type to a key parsing function)
-saving the mutation engine a lot of work.
-These can also be more clear for the developer when they break the build when
-something changes.
+Each fuzz target will always look in the package’s `testdata/corpus/FuzzTarget`
+directory for an existing seed corpus to use, if one exists.
+New crashes will also be written to this directory.
 
-The fuzz target will always look in the package’s testdata/ directory for an
-existing seed corpus to use as well, if one exists.
-This seed corpus will be in a directory of the form `testdata/<target_name>`,
-with a file for each unit that can be unmarshaled for testing.
+The seed corpus can be populated programmatically using `f.Add` within the fuzz
+target.
 
 _Examples:_
 
-1: A fuzz target’s `f.Fuzz` function takes three arguments
-
-```
-f.Fuzz(func(t *testing.T, a string, b myStruct, num *big.Int) {...})
-
-type myStruct struct {
-    A, B string
-    num int
-}
-```
-
-In this example, string is a built-in type, so can be decoded directly.
-`*big.Int` implements `UnmarshalText`, so can also be unmarshaled directly.
-However, `myStruct` does not implement `UnmarshalBinary` or `UnmarshalText` so
-the struct is pieced together recursively from its exported types. That would
-mean two sets of bytes will be written for this type, one for each of A and B.
-In total, four files would be written, and four inputs can be mutated when
-fuzzing.
-
-2:  A fuzz target’s `f.Fuzz` function takes a single `[]byte`
+1:  A fuzz target’s `f.Fuzz` function takes a single `[]byte`.
 
 ```
 f.Fuzz(func(t *testing.T, b []byte) {...})
 ```
 
-This is the typical “non-structured fuzzing” approach.
-There is only one set of bytes to be provided by the mutator, so only one file
-will be written.
+This is the typical “non-structured fuzzing” approach, and only the single
+[]byte will be mutated while fuzzing.
+
+2: A fuzz target’s `f.Fuzz` function takes two arguments.
+
+```
+f.Fuzz(func(t *testing.T, a string, num *big.Int) {...})
+```
+
+This example uses string, which is a built-in type, and as such can be decoded directly.
+`*big.Int` implements `UnmarshalText`, so can also be unmarshaled using that
+method.
+The mutator will alter the bytes of both the string and the *big.Int while
+seeking new code coverage.
+
+### Corpus file encoding
+
+The `testdata/corpus` directory will hold corpus files which act as the seed
+corpus as well as a set of regression tests for identified crashers.
+Corpus files must be encoded to support multiple fuzzing arguments.
+
+The first line of the corpus file indicates the encoding "version" of this file,
+e.g. "version 1". This is to indicate how the file was encoded, which allows for
+new, improved encodings in the future.
+
+For version 1, each subsequent line represents the value of each type making up
+the corpus entry. Each line is copy-pastable directly into Go code. The only
+case where the line would require editing is for imported struct types, in which
+case the import path would be removed when used in code.
+
+For example:
+```
+encV1
+float(45.241)
+int(12345)
+[]byte("ABC\xa8\x8c\xb3G\xfc")
+example.com/foo.Bar.UnmarshalText("\xfe\x99Uh\xb4\xe29\xed")
+```
+
+A tool will be provided that can convert between binary files and corpus files
+(in both directions).
+This tool would serve two main purposes.
+It would allow binary files, such as images, or files from other fuzzers, to be
+ported over into seed corpus for Go fuzzing.
+It would also convert otherwise indecipherable hex bytes into a binary format
+which may be easier to read and edit.
+
+To make it easier to understand new crashes, each crash found by the fuzzing
+engine will be written to a binary file in $GOCACHE.
+This file should not be checked in, as the crash will have already been written
+to a corpus file in testdata within the module.
+Instead, this file is a way to quickly get an idea about the input which caused
+the crash, without requiring a tool to decode it.
 
 ### Fuzzing Engine and Mutator
 
@@ -345,56 +377,49 @@
 or
 <code>[UnmarshalText](https://pkg.go.dev/encoding?tab=doc#TextUnmarshaler)</code>
 if implemented on the type.
-If building a struct, it can also build exported fields recursively as needed.
+In the future, it may support fuzzing struct types which don't implement these
+marshalers by building it through its exported fields.
 
 #### Generated corpus
 
 A generated corpus will be managed by the fuzzing engine and will live outside
-the module.
-New items can be added to this corpus in several ways, e.g. as part of the seed
-corpus, or by the fuzzing engine (e.g. because of new code coverage).
+the module in a subdirectory of $GOCACHE.
+This generated corpus will grow as the fuzzing engine discovers new coverage.
 
 The details of how the corpus is built and processed should be unimportant to
 users.
 This should be a technical detail that developers don’t need to understand in
 order to seed a corpus or write a fuzz target.
-Any existing files that a developer wants to include in the fuzz test should be
-added to the seed corpus directory, `testdata/<target_name>`.
-
-
-#### Minification + Pruning
-
-Corpus entries will be minified to the smallest input that causes the failure
-where possible, and pruned wherever possible to remove corpus entries that don’t
-add additional coverage.
-If a developer manually adds input files to the corpus directory, the fuzzing
-engine may change the file names in order to help with this.
+Any existing files that a developer wants to include in the fuzz test may be
+added to the seed corpus.
 
 ### Crashers
 
-A **crasher** is a panic or failure in `f.Fuzz(...)`, or a race condition.
+A **crasher** is a panic or failure in `f.Fuzz(...)`, or a race condition, which
+was found while fuzzing.
 By default, the fuzz target will stop after the first crasher is found, and a
 crash report will be provided.
 Crash reports will include the inputs that caused the crash and the resulting
 error message or stack trace.
-The crasher inputs will be written to the package's testdata/ directory as a
-seed corpus entry.
+The crasher inputs will be written to the package's testdata/corpus directory as
+after being minified where possible.
 
-Since this crasher is added to testdata/, which will then be run by default as
-part of the seed corpus for the fuzz target, this can act as a test for the new
-failure.
+Since this crasher is added to testdata/corpus, which will then be run by
+default as part of the seed corpus for the fuzz target, this can act as a test
+for the new failure.
 A user experience may look something like this:
 
 1.  A user runs `go test -fuzz=FuzzFoo`, and a crasher is found while fuzzing.
-1.  The arguments that caused the crash are added to a testdata directory within
-    the package automatically.
-1.  A subsequent run of `go test` (even without `-fuzz=FuzzFoo`) will then hit
-    this newly discovering failing condition, and continue to fail until the bug
-    is fixed.
+1.  The arguments that caused the crash are added to the testdata/corpus
+    directory of that package.
+1.  A subsequent run of `go test` (without needing `-fuzz=FuzzFoo`) will then
+    reproduce this crash, and continue to fail until the bug is fixed.
+    A user could also run `go test -run=FuzzFoo/<filename>` to only run a
+    specific file in the testdata/corpus directory when debugging.
 
 ### Go command
 
-Fuzz testing will only be supported in module mode, and if run in GOPATH mode
+Fuzz testing will only be supported in module mode, and if run in GOPATH mode,
 the fuzz targets will be ignored.
 
 Fuzz targets will be in *_test.go files, and can be in the same file as Test and
@@ -403,12 +428,9 @@
 not need to be in any fuzz-specific directory or have a fuzz-specific file name
 or build tag.
 
-A new environment variable will be added, `$GOFUZZCACHE`, which will default to
-an appropriate cache directory on the developer's machine.
-This directory will hold the generated corpus.
-For example, the corpus for each fuzz target will be managed in a subdirectory
-called `<module_name>/<pkg>/@corpus/<target_name>` where `<module_name>` will
-follow module case-encoding and include the major version.
+The generated corpus will be in a new directory within `$GOCACHE`, in the form
+$GOCACHE/fuzz/$pkg/$test/$name, where $pkg is the package path containing the
+fuzz target, $test is the target name, and $name is the name of the file.
 
 The default behavior of `go test` will be to build and run the fuzz targets
 using the seed corpus only.
@@ -437,11 +459,21 @@
 -keepfuzzing
     Keep running the target if a crasher is found. (default false)
 -parallel
-    Allow parallel execution of f.Fuzz functions that call f.Parallel.
-    The value of this flag is the maximum number of f.Fuzz functions to run
-    simultaneously within the given fuzz target. (default GOMAXPROCS)
+    Allow parallel execution of f.Fuzz functions that call t.Parallel when
+    running the seed corpus.
+    While fuzzing with -fuzz, the value of this flag is the maximum number of
+    workers to run the fuzz function simultaneously; by default, it is set to
+    the value of GOMAXPROCS.
+    Note that -parallel only applies within a single test binary.
 -race
-    Enable data race detection while fuzzing. (default true)
+    Enable data race detection while fuzzing. (default false)
+-run
+    Run only those tests, examples, and fuzz targets matching the regular
+    expression.
+    For testing a single seed corpus entry for a target, the regular
+    expression can be in the form $target/$name, where $target is the name of
+    the fuzz target, and $name is the name of the file (ignoring file
+    extensions) to run.
 ```
 
 `go test` will not respect `-p` when running with `-fuzz`, as it doesn't make
@@ -449,33 +481,6 @@
 
 ## Open issues and future work
 
-### Naming scheme for corpus files
-
-There are several naming schemes for the corpus files which may be appropriate,
-and the final decision is still undecided.
-
-Take the following example:
-
-```
-f.Fuzz(func(t *testing.T, a string, b myStruct, num *big.Int) {...})
-
-type myStruct struct {
-    A, B string
-    num int
-}
-```
-
-For two corpus entries, this could be structured as follows:
-* 0000001.string
-* 0000001.myStruct.string
-* 0000001.myStruct.string
-* 0000001.big_int
-* 0000002.string
-* 0000002.myStruct.string
-* 0000002.myStruct.string
-* 0000002.big_int
-
-
 ### Options
 
 There are options that developers often need to fuzz effectively and safely.
commit	a3847975cfb389756685c9fec9db6cfa8732a56e	[log] [tgz]
author	Katie Hockman <katie@golang.org>	Thu Feb 04 16:35:00 2021 -0500
committer	Katie Hockman <katie@golang.org>	Mon Feb 08 19:22:25 2021 +0000
tree	c34da05b00a38d83d61252d65723754368c0b5d2
parent	abbf42e235ddc8819c792f0053261ecec186699c [diff]