design: add 25530-notary.md

See https://golang.org/design/25530-notary.

For golang/go#25530.

Change-Id: I1b4add8fe1c2f6911e925bafab99eb7418aa67b4
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/165018
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
diff --git a/design/25530-notary.md b/design/25530-notary.md
new file mode 100644
index 0000000..a066921
--- /dev/null
+++ b/design/25530-notary.md
@@ -0,0 +1,418 @@
+# Proposal: Secure the Public Go Module Ecosystem with the Go Notary
+
+Russ Cox\
+Filippo Valsorda
+
+Last updated: February 26, 2019.
+
+[golang.org/design/25530-notary](https://golang.org/design/25530-notary)
+
+Discussion at [golang.org/issue/25530](https://golang.org/issue/25530).
+
+## Abstract
+
+We propose to secure the public Go module ecosystem
+by introducing a new server, the Go notary,
+which serves what is in effect a `go.sum` file
+listing all publicly-available Go modules.
+The `go` command will use this service to fill in gaps
+in its own local `go.sum` files,
+such as during `go get -u`.
+This ensures that unexpected code changes cannot
+be introduced when first adding a dependency to a module
+or when upgrading a dependency.
+
+## Background
+
+When you run `go get rsc.io/quote@v1.5.2`, `go get` first fetches
+`https://rsc.io/quote?go-get=1` and looks for `<meta>` tags. It finds
+
+	<meta name="go-import"
+	      content="rsc.io/quote git https://github.com/rsc/quote">
+
+which tells it that the code is in a Git repository on `github.com`.
+Next it runs `git clone https://github.com/rsc/quote` to fetch
+the Git repository and then extracts the file tree from the `v1.5.2` tag,
+producing the actual module archive.
+
+Historically, `go get` has always simply assumed that it was downloading
+the right code.
+An attacker able to intercept the connection to `rsc.io` or `github.com`
+(or an attacker able to break into one of those systems, or a malicious module author)
+would be able to cause `go get` to download different code tomorrow,
+and `go get` would not notice.
+
+There are
+[many challenges in using software dependencies safely](https://research.swtch.com/deps),
+and much more vetting should typically be done before taking on a
+new dependency, but no amount of vetting is worth anything
+if the code you download and vet today
+differs from the code you or a collaborator downloads
+tomorrow for the “same” module version.
+We must be able to authenticate whether a particular
+download is correct.
+
+For our purposes, “correct” for a particular module version download
+is defined as the same code everyone else downloads.
+This definition ensures reproducibility of builds
+and makes vetting of specific module versions meaningful,
+without needing to attribute specific archives to
+specific authors,
+and without introducing new potential points of compromise
+like per-author keys.
+(Also, even the author of a module should not be able to change
+the bits associated with a specific version from one day to the next.)
+
+Being able to authenticate a particular module version download
+effectively moves code hosting servers like `rsc.io` and `github.com`
+out of the trusted computing base of the Go module ecosystem.
+With module authentication, those servers could cause availability problems
+by not serving a module version anymore,
+but they cannot substitute different code.
+The introduction of Go module proxies (see `go help goproxy`)
+introduces yet another way for an attacker to intercept module downloads;
+module authentication eliminates the need to trust those proxies as well,
+moving them outside the
+[trusted computing base](https://www.microsoft.com/en-us/research/publication/authentication-in-distributed-systems-theory-and-practice/).
+
+See the Go blog post “[Go Modules in 2019](https://blog.golang.org/modules2019)”
+for additional background.
+
+### Module Authentication with `go.sum`
+
+Go 1.11’s preview of Go modules introduced the `go.sum` file,
+which is maintained automatically by the `go` command
+in the root of a module tree
+and contains cryptographic checksums for the content of each
+dependency of that module.
+If a module’s source file tree is obtained unmodified,
+then the `go.sum` file allows authenticating all dependencies
+needed for a build of that module.
+It ensures that tomorrow’s builds will use the same exact
+code for dependencies that today’s builds did.
+Tomorrow’s downloads are authenticated by `go.sum`.
+
+On the other hand, today’s downloads—the ones that add or update
+dependencies in the first place—are not authenticated.
+When a dependency is first added to a module,
+or when a dependency is upgraded to a newer version,
+there is no entry for it in `go.sum`,
+and the `go` command today blindly trusts that it
+downloads the correct code.
+Then it records the hash of that code into `go.sum`
+to ensure that code doesn’t change tomorrow.
+But that doesn’t help the initial download.
+The model is similar to SSH’s
+“[trust on first use](https://en.wikipedia.org/wiki/Trust_on_first_use),”
+and while that approach is an improvement over “trust every time,”
+it’s still not ideal,
+especially since developers typically download new module versions
+far more often than they connect to new, unknown SSH servers.
+
+We are concerned primarily with authenticating downloads
+of publicly-available module versions.
+We assume that the private servers hosting
+private module source code are already within the
+trusted computing base of the developers using that code.
+In contrast, a developer who wants to use `rsc.io/quote`
+should not be required to trust that `rsc.io` is properly secured.
+This trust becomes particularly problematic when summed
+over all dependencies.
+
+What we need is an easily-accessed `go.sum` file listing every
+publicly-available module version.
+But we don’t want to blindly trust a downloaded `go.sum` file,
+since that would become the next attractive target for an attacker.
+
+### Transparent Logs
+
+The [Certificate Transparency](https://www.certificate-transparency.org/) project
+is based on a data structure called a _transparent log_.
+The transparent log is hosted on a server and made accessible to clients for random access,
+but clients are still able to verify that a particular log record really is in the log
+and also that the server never removes any log record from the log.
+Separately, third-party auditors can iterate over the log
+checking that the entries themselves are accurate.
+These two properties combined mean that
+a client can use records from the log,
+confident that those records will remain available in the log
+for auditors to double-check and report invalid or suspicious entries.
+Clients and auditors can also compare observations to ensure
+that the server is showing the same data to everyone involved.
+
+That is, the log server is not trusted to store the log properly,
+nor is it trusted to put the right records into the log.
+Instead, clients and auditors interact skeptically with the server,
+able to verify for themselves in each interaction
+that the server really is behaving correctly.
+
+For details about the data structure, see Russ Cox’s blog post,
+“[Transparent Logs for Skeptical Clients](https://research.swtch.com/tlog).”
+
+The use of a transparent log for module hashes aligns with
+a broader trend of using transparent logs to enable detection
+of misbehavior by partially trusted systems,
+what the Trillian team calls
+“[General Transparency](https://github.com/google/trillian/#trillian-general-transparency).”
+
+## Proposal
+
+We propose to publish the `go.sum` lines for all publicly-available Go modules
+in a transparent log,
+served by a new server called the Go notary.
+When a publicly-available module is not yet listed in
+the main module’s `go.sum` file,
+the `go` command will fetch the relevant `go.sum` lines
+from the notary instead of trusting the initial download
+to be correct.
+
+### Notary Server
+
+The Go notary will run at `https://notary.golang.org/` and serve the following endpoints:
+
+ - `/latest` will serve a signed tree size and hash for the latest log.
+   
+ - `/lookup/M@V` will serve the log record number for the entry about module M version V,
+   followed by the data for the record.
+   If the module version is not yet recorded in the log, the notary will try to fetch it before replying.
+   Note that the data should never be used without first
+   authenticating it against a signed tree hash.
+
+ - `/record/R` will serve the data for record number R.
+ 
+ - `/tile/H/L/K[.p/W]` will serve a [log tile](https://research.swtch.com/tlog#serving_tiles).
+   The optional `.p/W` suffix indicates a partial log tile with only `W` hashes.
+
+### Proxying a Notary
+
+A module proxy can also proxy requests to the notary.
+The general proxy URL form is `<proxyURL>/notary/<notaryURL>`.
+If `GOPROXY=https://proxy.site` then the latest signed tree would be fetched using
+`https://proxy.site/notary/notary.golang.org/latest`.
+Including the full notary URL allows a transition to a new notary log,
+such as `notary.golang.org/v2`.
+
+Before accessing any notary URL using a proxy,
+the proxy client should first fetch `<proxyURL>/notary/supported`.
+If that request returns a successful (HTTP 200) response,
+then the proxy supports proxying notary requests.
+In that case, the client should use the proxied notary only,
+never falling back to a direct connection to the notary.
+If the `/notary/supported` check fails with a “not found” (HTTP 404) response,
+the proxy is unwilling to proxy the notary,
+and the client should connect directly to the notary.
+Any other response is treated as the notary being unavailable.
+
+A corporate proxy may want to ensure that clients
+never make any direct notary connections
+(for example, for privacy; see the “Rationale” section below).
+The optional `/notary/supported` endpoint, along with
+proxying actual notary requests, lets such a proxy
+ensure that a `go` command using the proxy
+never makes a direct connection to notary.golang.org.
+But simpler proxies may wish to focus on serving
+only modules and not notary data—in particular,
+module-only proxies can be served from entirely static file systems,
+with no special infrastructure at all.
+Such proxies can respond with an HTTP 404 to
+the `/notary/supported` endpoint, so that clients
+will connect to the notary directly.
+
+### `go` command client
+
+The `go` command is the primary consumer of the notary’s published log.
+The `go` command will [verify the log](https://research.swtch.com/tlog#verifying_a_log)
+as it uses it,
+ensuring that every record it reads is actually in the log
+and that no observed log ever drops a record from an earlier observed log.
+
+The `go` command will store the notary’s public key in 
+`$GOROOT/lib/notary/notary.cfg`.
+That file will also contain the default starting signed tree size and tree hash,
+updated with each major release.
+
+The `go` command will then cache the latest signed tree size and tree hash
+in `$GOPATH/pkg/notary/notary.golang.org/latest`.
+It will cache tiles in `$GOPATH/pkg/mod/download/cache/notary/notary.golang.org/tile/H/L/K[.W]`.
+These two different locations let `go clean -modcache` delete any cached tiles as well,
+but no `go` command (only a manual `rm -rf $GOPATH/pkg`)
+will wipe out the memory of the latest observed tree size and hash.
+If the `go` command ever does observe a pair of inconsistent signed tree sizes and hashes,
+it will complain loudly on standard error and fail the build.
+
+The `go` command must be configured to know which modules are
+publicly available and therefore can be verified by the notary,
+versus those that are closed source and must not be verified,
+especially since that would transmit potentially private import paths
+over the network to the notary `/lookup` endpoint.
+A few new environment variables control this configuration.
+(See the [`go env -w` proposal](https://golang.org/design/30411-env)
+for a way to manage these variables more easily.)
+
+- `GOPROXY=https://proxy.site/path` sets the Go module proxy to use, as before.
+
+- `GONOPROXY=prefix1,prefix2,prefix3` sets a list of module path prefixes,
+  possibly containing globs, that should not be proxied.
+  For example:
+  
+      GONOPROXY=*.corp.google.com,rsc.io/private
+
+  will bypass the proxy for the modules foo.corp.google.com, foo.corp.google.com/bar, rsc.io/private, and rsc.io/private/bar,
+  though not rsc.io/privateer (the patterns are path prefixes, not string prefixes).
+
+- `GONOVERIFY=prefix1,prefix2,prefix3` sets a list of module path prefixes,
+   again possibly containing globs, that should not be verified using the notary.
+   
+   We expect that corporate environments may fetch all modules, public and private,
+   through an internal proxy;
+   `GONOVERIFY` allows them to disable notary-based verification of
+   internal modules while still verifying public modules.
+   Therefore, `GONOVERIFY` must not imply `GONOPROXY`.
+
+   We also expect that other users may prefer to connect directly to source origins
+   but still want verification of open source modules or proxying of the notary itself;
+   `GONOPROXY` allows them to arrange that and therefore must not imply `GONOVERIFY`.
+
+The notary not being able to report `go.sum` lines for a module version
+is a hard failure:
+any private modules must be explicitly listed in `$GONOVERIFY`.
+(Otherwise an attacker could block traffic to the notary
+and make all module versions appear to verify.)
+The notary can be disabled entirely with `GONOVERIFY=*`.
+The command `go get -insecure` will report but not stop after notary failures.
+
+## Rationale
+
+The motivation for authenticating module downloads is
+covered in the background section above.
+Note that we want to authenticate modules
+obtained both from direct connections to code-hosting servers
+and from module proxies.
+
+Two topics are worth further discussion:
+first, having a single notary service for the entire Go ecosystem,
+and second, the privacy implications of a notary.
+
+### One Notary
+
+The Go team at Google will run the Go notary as a service to the Go ecosystem,
+similar to running `godoc.org` and `golang.org`.
+There is no plan to allow use of alternate notaries,
+which would add complexity and potentially reduce the overall
+security of the system,
+allowing different users to be attacked by compromising different notaries.
+
+We originally considered having multiple notaries
+signing individual `go.sum` entries and
+requiring the `go` command to collect signatures
+from a quorum of notaries before accepting an entry.
+That design depended on the uptime of multiple services
+and could still be compromised undetectably by
+compromising enough notaries.
+That is, that design would blindly trust a quorum of notaries.
+
+The design presented here uses the transparent log
+to eliminate blind trust in a quorum of notaries
+and instead uses a “trust but verify” model with
+a single notary.
+In this design, the notary’s published `go.sum` lines
+are accepted by the `go` command client,
+but the published lines are also verifiably preserved
+for auditing by any interested third party.
+In fact, we hope that proxies run by various
+organizations in the Go community will serve as auditors
+and double-check Go notary log entries
+as part of their ordinary operation.
+Another useful
+service that could be enabled by
+the notary is a notification service to alert
+authors about new versions of their own modules.
+
+### Privacy
+
+Contacting the Go notary to authenticate a new dependency
+requires sending the module path and version to the notary.
+There are two potential privacy concerns.
+First, a misconfigured `go` command might send
+the names of private module paths 
+(for example, `rsc.io/private/secret-plan`)
+to the notary.
+The notary would try to fetch the module and fail,
+but the path would have been exposed in the network traffic.
+Second, even using only public modules,
+there might be a concern that contacting the notary
+at all would expose information about how popular particular modules
+are in a particular organization (or at least in a particular client IP block).
+
+The design addresses each of these two privacy concerns
+with a lightweight, partial solution of its own,
+and addresses both with a single heavier, complete solution.
+
+The lightweight, partial solution for a misconfigured `go` command
+that asks the notary about a non-public module
+is to make it fail as loudly as possible.
+If the `go` command is configured to ask the notary
+about a particular module, and the notary cannot return
+information about that module, the download fails
+and the `go` command stops.
+This ensures both that all public modules are in fact
+authenticated and also that any misconfiguration
+must be corrected (by setting `$GONOVERIFY` to avoid
+the notary for those private modules)
+in order to achieve a successful build.
+This way, the frequency of misconfiguration should be minimized.
+
+The lightweight, partial solution for exposing information about
+module usage is to only contact the notary when there is not
+already an entry in `go.sum`. If a module version is already listed
+in `go.sum`, it is assumed to be correct, with no notary interaction.
+This allows authentication of previously-downloaded private
+modules and also ensures that only the first use of a new module
+version is exposed to the notary.
+
+These lightweight solutions are meant to make the notary
+usable out of the box for most Go developers.
+If there are additional lightweight solutions that can be adopted
+to further reduce privacy concerns,
+we would be happy to consider them.
+
+The heavier, complete solution for notary privacy concerns
+is for developers to put their usage behind a proxy,
+such as a local Athens instance or JFrog’s GoCenter,
+assuming those proxies add support for proxying and
+caching the Go notary service endpoints.
+(Those endpoints are designed to be highly cacheable
+for exactly this reason, and a proxy with a full copy
+of the notary log doesn’t have to leak any information
+about what modules are in use, at the cost of maintaining
+its own index to answer lookup requests.)
+We anticipate that there will be many proxies available
+for use in the Go ecosystem.
+Part of the motivation for the Go notary is to allow
+the use of any available proxy to download modules,
+without any reduction in security.
+Developers can then use any proxy they are comfortable using,
+or run their own.
+
+## Compatibility
+
+The introduction of the notary does not have any compatibility
+concerns at the command or language level.
+However, proxies that serve modified copies of public modules
+will be incompatible with the notary and stop being usable.
+This is by design: such proxies are indistinguishable from man-in-the-middle attacks.
+
+## Implementation
+
+The Go team at Google is working on a production implementation
+of both a Go module proxy and the Go notary,
+as we described in the blog post “[Go Modules in 2019](https://blog.golang.org/modules2019).”
+
+We will publish a notary client as part of the `go` command,
+as well as an example notary implementation.
+We intend to ship support for the notary, enabled by default, in Go 1.13.
+
+Russ Cox will lead the `go` command integration
+and has posted a [stack of changes in golang.org/x/exp/notary](https://go-review.googlesource.com/q/f:notary).
+
+