| # Proposal: Secure the Public Go Module Ecosystem |
| |
| Russ Cox\ |
| Filippo Valsorda |
| |
| Last updated: April 24, 2019. |
| |
| [golang.org/design/25530-sumdb](https://golang.org/design/25530-sumdb) |
| |
| Discussion at [golang.org/issue/25530](https://golang.org/issue/25530). |
| |
| ## Abstract |
| |
| We propose to secure the public Go module ecosystem |
| by introducing a new server, the Go checksum database, |
| which serves what is in effect a `go.sum` file |
| listing all publicly-available Go modules. |
| The `go` command will use this service to fill in gaps |
| in its own local `go.sum` files, |
| such as during `go get -u`. |
| This ensures that unexpected code changes cannot |
| be introduced when first adding a dependency to a module |
| or when upgrading a dependency. |
| |
| The original name for the Go checksum database was “the Go notary,” |
| but we have stopped using that name to avoid confusion |
| with the CNCF Notary project, itself written in Go, |
| not to mention the Apple Notary. |
| |
| ## Background |
| |
| When you run `go` `get` `rsc.io/quote@v1.5.2`, `go` `get` first fetches |
| `https://rsc.io/quote?go-get=1` and looks for `<meta>` tags. It finds |
| |
| <meta name="go-import" |
| content="rsc.io/quote git https://github.com/rsc/quote"> |
| |
| which tells it the code is in a Git repository on `github.com`. |
| Next it runs `git clone https://github.com/rsc/quote` to fetch |
| the Git repository and then extracts the file tree from the `v1.5.2` tag, |
| producing the actual module archive. |
| |
| Historically, `go` `get` has always simply assumed that it was downloading |
| the right code. |
| An attacker able to intercept the connection to `rsc.io` or `github.com` |
| (or an attacker able to break into one of those systems, or a malicious module author) |
| would be able to cause `go` `get` to download different code tomorrow, |
| and `go` `get` would not notice. |
| |
| There are |
| [many challenges in using software dependencies safely](https://research.swtch.com/deps), |
| and much more vetting should typically be done before taking on a |
| new dependency, but no amount of vetting is worth anything |
| if the code you download and vet today |
| differs from the code you or a collaborator downloads |
| tomorrow for the “same” module version. |
| We must be able to authenticate whether a particular |
| download is correct. |
| |
| For our purposes, “correct” for a particular module version download |
| is defined as the same code everyone else downloads. |
| This definition ensures reproducibility of builds |
| and makes vetting of specific module versions meaningful, |
| without needing to attribute specific archives to |
| specific authors, |
| and without introducing new potential points of compromise |
| like per-author keys. |
| (Also, even the author of a module should not be able to change |
| the bits associated with a specific version from one day to the next.) |
| |
| Being able to authenticate a particular module version download |
| effectively moves code hosting servers like `rsc.io` and `github.com` |
| out of the trusted computing base of the Go module ecosystem. |
| With module authentication, those servers could cause availability problems |
| by not serving a module version anymore, |
| but they cannot substitute different code. |
| The introduction of Go module proxies (see `go help goproxy`) |
| introduces yet another way for an attacker to intercept module downloads; |
| module authentication eliminates the need to trust those proxies as well, |
| moving them outside the |
| [trusted computing base](https://www.microsoft.com/en-us/research/publication/authentication-in-distributed-systems-theory-and-practice/). |
| |
| See the Go blog post “[Go Modules in 2019](https://blog.golang.org/modules2019)” |
| for additional background. |
| |
| ### Module Authentication with `go.sum` |
| |
| Go 1.11’s preview of Go modules introduced the `go.sum` file, |
| which is maintained automatically by the `go` command |
| in the root of a module tree |
| and contains cryptographic checksums for the content of each |
| dependency of that module. |
| If a module’s source file tree is obtained unmodified, |
| then the `go.sum` file allows authenticating all dependencies |
| needed for a build of that module. |
| It ensures that tomorrow’s builds will use the same exact |
| code for dependencies that today’s builds did. |
| Tomorrow’s downloads are authenticated by `go.sum`. |
| |
| On the other hand, today’s downloads—the ones that add or update |
| dependencies in the first place—are not authenticated. |
| When a dependency is first added to a module, |
| or when a dependency is upgraded to a newer version, |
| there is no entry for it in `go.sum`, |
| and the `go` command today blindly trusts that it |
| downloads the correct code. |
| Then it records the hash of that code into `go.sum` |
| to ensure that code doesn’t change tomorrow. |
| But that doesn’t help the initial download. |
| The model is similar to SSH’s |
| “[trust on first use](https://en.wikipedia.org/wiki/Trust_on_first_use),” |
| and while that approach is an improvement over “trust every time,” |
| it’s still not ideal, |
| especially since developers typically download new module versions |
| far more often than they connect to new, unknown SSH servers. |
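
The gap described above can be made concrete with a minimal sketch. It assumes `go.sum` is modeled as a map from "module version" to hash. The names `checkDownload`, `result`, and the `h1:placeholder-…` hash strings are hypothetical and are not the `go` command's actual implementation.

```go
package main

import "fmt"

// result describes what a go.sum check could conclude about a download.
type result int

const (
	verified result = iota // the hash matches the recorded go.sum entry
	mismatch               // the hash differs from the recorded entry
	firstUse               // no entry yet: the download is trusted blindly
)

func (r result) String() string {
	return [...]string{"verified", "mismatch", "first use"}[r]
}

// checkDownload compares the hash of a downloaded module version with the
// go.sum entry, recording the hash when no entry exists (trust on first use).
func checkDownload(goSum map[string]string, modVersion, hash string) result {
	want, ok := goSum[modVersion]
	switch {
	case !ok:
		goSum[modVersion] = hash // nothing to compare against today
		return firstUse
	case want == hash:
		return verified
	default:
		return mismatch
	}
}

func main() {
	goSum := map[string]string{} // hypothetical in-memory go.sum
	key := "rsc.io/quote v1.5.2"
	fmt.Println(checkDownload(goSum, key, "h1:placeholder-A")) // today's download
	fmt.Println(checkDownload(goSum, key, "h1:placeholder-A")) // tomorrow, same code
	fmt.Println(checkDownload(goSum, key, "h1:placeholder-B")) // tomorrow, altered code
}
```

Only the first call goes unauthenticated; that is the case the checksum database is meant to cover.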
| |
| We are concerned primarily with authenticating downloads |
| of publicly-available module versions. |
| We assume that the private servers hosting |
| private module source code are already within the |
| trusted computing base of the developers using that code. |
| In contrast, a developer who wants to use `rsc.io/quote` |
| should not be required to trust that `rsc.io` is properly secured. |
| This trust becomes particularly problematic when summed |
| over all dependencies. |
| |
| What we need is an easily-accessed `go.sum` file listing every |
| publicly-available module version. |
| But we don’t want to blindly trust a downloaded `go.sum` file, |
| since that would become the next attractive target for an attacker. |
| |
| ### Transparent Logs |
| |
| The [Certificate Transparency](https://www.certificate-transparency.org/) project |
| is based on a data structure called a _transparent log_. |
| The transparent log is hosted on a server and made accessible to clients for random access, |
| but clients are still able to verify that a particular log record really is in the log |
| and also that the server never removes any log record from the log. |
| Separately, third-party auditors can iterate over the log |
| checking that the entries themselves are accurate. |
| These two properties combined mean that |
| a client can use records from the log, |
| confident that those records will remain available in the log |
| for auditors to double-check and report invalid or suspicious entries. |
| Clients and auditors can also compare observations to ensure |
| that the server is showing the same data to everyone involved. |
| |
| That is, the log server is not trusted to store the log properly, |
| nor is it trusted to put the right records into the log. |
| Instead, clients and auditors interact skeptically with the server, |
| able to verify for themselves in each interaction |
| that the server really is behaving correctly. |
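
As a rough illustration of how a client can check that a record "really is in the log", here is a minimal sketch of a Merkle tree with inclusion proofs. It assumes RFC 6962-style hashing (SHA-256 with a `0x00` prefix for leaves and `0x01` for interior nodes). The function names are hypothetical, and a real log serves proofs from stored hashes rather than recomputing them from all records.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type hash = [sha256.Size]byte

// leafHash hashes a record with the leaf prefix 0x00.
func leafHash(record []byte) hash {
	return sha256.Sum256(append([]byte{0x00}, record...))
}

// nodeHash hashes two child hashes with the interior prefix 0x01.
func nodeHash(left, right hash) hash {
	buf := append([]byte{0x01}, left[:]...)
	return sha256.Sum256(append(buf, right[:]...))
}

// split returns the largest power of two strictly less than n (n >= 2).
func split(n int) int {
	k := 1
	for k*2 < n {
		k *= 2
	}
	return k
}

// treeHash returns the root hash of a log holding records (len >= 1).
func treeHash(records [][]byte) hash {
	if len(records) == 1 {
		return leafHash(records[0])
	}
	k := split(len(records))
	return nodeHash(treeHash(records[:k]), treeHash(records[k:]))
}

// prove returns the sibling hashes on the path from record i up to the root.
func prove(records [][]byte, i int) []hash {
	if len(records) == 1 {
		return nil
	}
	k := split(len(records))
	if i < k {
		return append(prove(records[:k], i), treeHash(records[k:]))
	}
	return append(prove(records[k:], i-k), treeHash(records[:k]))
}

// rootFromProof recomputes the root implied by a record at index i in a log
// of n records; ok is false if the proof has the wrong shape.
func rootFromProof(record []byte, i, n int, proof []hash) (root hash, ok bool) {
	if i < 0 || i >= n {
		return hash{}, false
	}
	if n == 1 {
		return leafHash(record), len(proof) == 0
	}
	if len(proof) == 0 {
		return hash{}, false
	}
	k := split(n)
	sibling, rest := proof[len(proof)-1], proof[:len(proof)-1]
	if i < k {
		sub, ok := rootFromProof(record, i, k, rest)
		return nodeHash(sub, sibling), ok
	}
	sub, ok := rootFromProof(record, i-k, n-k, rest)
	return nodeHash(sibling, sub), ok
}

func main() {
	records := [][]byte{[]byte("a"), []byte("b"), []byte("c"), []byte("d"), []byte("e")}
	root := treeHash(records)
	proof := prove(records, 2)

	got, ok := rootFromProof(records[2], 2, len(records), proof)
	fmt.Println("genuine record accepted:", ok && got == root)

	got, ok = rootFromProof([]byte("evil"), 2, len(records), proof)
	fmt.Println("altered record accepted:", ok && got == root)
}
```

The client needs only the signed root hash, the record, and a logarithmic number of sibling hashes; it never has to trust the server's claim that the record is present.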
| |
| For details about the data structure, see Russ Cox’s blog post, |
| “[Transparent Logs for Skeptical Clients](https://research.swtch.com/tlog).” |
| For a high-level overview of Certificate Transparency |
| along with additional motivation and context, |
| see Ben Laurie's ACM Queue article, |
| “[Certificate Transparency: Public, verifiable, append-only logs](https://queue.acm.org/detail.cfm?id=2668154).” |
| |
| The use of a transparent log for module hashes aligns with |
| a broader trend of using transparent logs to enable detection |
| of misbehavior by partially trusted systems, |
| what the Trillian team calls |
| “[General Transparency](https://github.com/google/trillian/#trillian-general-transparency).” |
| |
| ## Proposal |
| |
| We propose to publish the `go.sum` lines for all publicly-available Go modules |
| in a transparent log, |
| served by a new server called the Go checksum database. |
| When a publicly-available module is not yet listed in |
| the main module’s `go.sum` file, |
| the `go` command will fetch the relevant `go.sum` lines |
| from the checksum database instead of trusting the initial download |
| to be correct. |
| |
| ### Checksum Database |
| |
| The Go checksum database will run at `https://sum.golang.org/` and serve the following endpoints: |
| |
| - `/latest` will serve a signed tree size and hash for the latest log. |
| |
| - `/lookup/M@V` will serve the log record number for the entry about module M version V, |
| followed by the data for the record (that is, the `go.sum` lines for module M version V) |
| and a signed tree hash for a tree that contains the record. |
| If the module version is not yet recorded in the log, the database will try to fetch it before replying. |
| Note that the data should never be used without first |
| authenticating it against the signed tree hash |
| and authenticating the signed tree hash against the client's |
| timeline of signed tree hashes. |
| |
| - `/tile/H/L/K[.p/W]` will serve a [log tile](https://research.swtch.com/tlog#serving_tiles). |
| The optional `.p/W` suffix indicates a partial log tile with only `W` hashes. |
| Clients must fall back to fetching the full tile if a partial tile is not found. |
| The record data for the leaf hashes in `/tile/H/0/K[.p/W]` are served as `/tile/H/data/K[.p/W]` |
| (with a literal `data` path element). |
| |
| Clients are expected to use `/lookup` and `/tile/H/L/...` during normal operations, |
| while auditors will want to use `/latest` and `/tile/H/data/...`. |
| A special `go` command may also fetch `/latest` to force incorporation |
| of that signed tree head into the local timeline. |
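
The endpoint layout above can be sketched as a few URL builders. This is only an illustration: the helper names are hypothetical, and the sketch substitutes `M`, `V`, `H`, `L`, `K`, and `W` literally, whereas a production client may need additional escaping or encoding of module paths and tile indexes.

```go
package main

import "fmt"

// base is the database address named in the proposal.
const base = "https://sum.golang.org"

// latestURL returns the endpoint for the latest signed tree size and hash.
func latestURL() string { return base + "/latest" }

// lookupURL returns the endpoint for the record about module m at version v.
func lookupURL(m, v string) string {
	return base + "/lookup/" + m + "@" + v
}

// partialSuffix returns ".p/W" for a partial tile with w hashes,
// or the empty string when w is 0 (a full tile).
func partialSuffix(w int) string {
	if w > 0 {
		return fmt.Sprintf(".p/%d", w)
	}
	return ""
}

// tileURL returns the endpoint for the hash tile of height h at level l
// and index k.
func tileURL(h, l, k, w int) string {
	return fmt.Sprintf("%s/tile/%d/%d/%d%s", base, h, l, k, partialSuffix(w))
}

// dataTileURL returns the endpoint for the record data behind the
// level-0 tile of height h and index k.
func dataTileURL(h, k, w int) string {
	return fmt.Sprintf("%s/tile/%d/data/%d%s", base, h, k, partialSuffix(w))
}

func main() {
	fmt.Println(latestURL())
	fmt.Println(lookupURL("rsc.io/quote", "v1.5.2"))
	fmt.Println(tileURL(8, 0, 5, 0))
	fmt.Println(tileURL(8, 0, 5, 3))
	fmt.Println(dataTileURL(8, 5, 3))
}
```

A client that requested a partial tile and got "not found" would retry with `w` set to 0, per the fallback rule above.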
| |
| ### Proxying a Checksum Database |
| |
| A module proxy can also proxy requests to the checksum database. |
| The general proxy URL form is `<proxyURL>/sumdb/<databaseURL>`. |
| If `GOPROXY=https://proxy.site` then the latest signed tree would be fetched using |
| `https://proxy.site/sumdb/sum.golang.org/latest`. |
| Including the full database URL allows a transition to a new database log, |
| such as `sum.golang.org/v2`. |
| |
| Before accessing any checksum database URL using a proxy, |
| the proxy client should first fetch `<proxyURL>/sumdb/<sumdb-name>/supported`. |
| If that request returns a successful (HTTP 200) response, |
| then the proxy supports proxying checksum database requests. |
| In that case, the client should use the proxied access method only, |
| never falling back to a direct connection to the database. |
| If the `/sumdb/<sumdb-name>/supported` check fails with a “not found” (HTTP 404) |
| or “gone” (HTTP 410) response, |
| the proxy is unwilling to proxy the checksum database, |
| and the client should connect directly to the database. |
| Any other response is treated as the database being unavailable. |
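
The decision rule in the preceding paragraph can be sketched as follows. The type and function names are hypothetical; only the URL form and the status-code handling come from the text above.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// accessMode says how a client should reach the checksum database.
type accessMode int

const (
	viaProxy    accessMode = iota // proxy answered 200: use the proxy only
	direct                        // proxy answered 404 or 410: connect directly
	unavailable                   // anything else: treat the database as unavailable
)

func (m accessMode) String() string {
	return [...]string{"via proxy", "direct", "unavailable"}[m]
}

// supportedURL builds <proxyURL>/sumdb/<sumdb-name>/supported.
func supportedURL(proxyURL, sumdbName string) string {
	return strings.TrimSuffix(proxyURL, "/") + "/sumdb/" + sumdbName + "/supported"
}

// modeFor maps the HTTP status of the "supported" probe to an access mode.
func modeFor(status int) accessMode {
	switch status {
	case http.StatusOK:
		return viaProxy
	case http.StatusNotFound, http.StatusGone:
		return direct
	default:
		return unavailable
	}
}

func main() {
	fmt.Println(supportedURL("https://proxy.site", "sum.golang.org"))
	for _, status := range []int{200, 404, 410, 500} {
		fmt.Println(status, modeFor(status))
	}
}
```

Treating a server error as "unavailable" rather than "connect directly" matters: otherwise a failing proxy would silently cause the direct connections that a privacy-minded deployment is trying to prevent.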
| |
| A corporate proxy may want to ensure that clients |
| never make any direct database connections |
| (for example, for privacy; see the “Rationale” section below). |
| The optional `/sumdb/<sumdb-name>/supported` endpoint, along with |
| proxying actual database requests, lets such a proxy |
| ensure that a `go` command using the proxy |
| never makes a direct connection to sum.golang.org. |
| But simpler proxies may wish to focus on serving |
| only modules and not checksum data—in particular, |
| module-only proxies can be served from entirely static file systems, |
| with no special infrastructure at all. |
| Such proxies can respond with an HTTP 404 or HTTP 410 to |
| the `/sumdb/<sumdb-name>/supported` endpoint, so that clients |
| will connect to the database directly. |
| |
| ### `go` command client |
| |
| The `go` command is the primary consumer of the database’s published log. |
| The `go` command will [verify the log](https://research.swtch.com/tlog#verifying_a_log) |
| as it uses it, |
| ensuring that every record it reads is actually in the log |
| and that no observed log ever drops a record from an earlier observed log. |
| |
| The `go` command will refer to `$GOSUMDB` to find the name and public key |
| of the Go checksum database. |
| That variable will default to the `sum.golang.org` server. |
| |
| The `go` command will cache the latest signed tree size and tree hash |
| in `$GOPATH/pkg/sumdb/<sumdb-name>/latest`. |
| It will cache lookup results and tiles in |
| `$GOPATH/pkg/mod/download/cache/sumdb/<sumdb-name>/lookup/path@version` |
| and `$GOPATH/pkg/mod/download/cache/sumdb/<sumdb-name>/tile/H/L/K[.p/W]`. |
| (More generally, `https://<sumdb-URL>` is cached |
| in `$GOPATH/pkg/mod/download/cache/sumdb/<sumdb-URL>`.) |
| This way, `go clean -modcache` deletes cached lookup results and tiles |
| but not the latest signed tree hash, which should be preserved for |
| detection of timeline inconsistency. |
| No `go` command (only a manual `rm -rf $GOPATH/pkg`) |
| will wipe out the memory of the latest observed tree size and hash. |
| If the `go` command ever does observe a pair of inconsistent signed tree sizes and hashes, |
| it will complain loudly on standard error and fail the build. |
| |
| The `go` command must be configured to know which modules are |
| publicly available and therefore can be looked up in the checksum database, |
| versus those that are closed source and must not be looked up, |
| especially since that would transmit potentially private import paths |
| over the network to the database `/lookup` endpoint. |
| A few new environment variables control this configuration. |
| (See the [`go env -w` proposal](https://golang.org/design/30411-env), |
| now available in the Go 1.13 development branch, |
| for a way to manage these variables more easily.) |
| |
| - `GOPROXY=https://proxy.site/path` sets the Go module proxy to use, as before. |
| |
| - `GONOPROXY=prefix1,prefix2,prefix3` sets a list of module path prefixes, |
| possibly containing globs, that should not be proxied. |
| For example: |
| |
| GONOPROXY=*.corp.google.com,rsc.io/private |
| |
| will bypass the proxy for the modules foo.corp.google.com, foo.corp.google.com/bar, rsc.io/private, and rsc.io/private/bar, |
| though not rsc.io/privateer (the patterns are path prefixes, not string prefixes). |
| |
| - `GOSUMDB=<sumdb-key>` sets the Go checksum database to use, |
| where `<sumdb-key>` is a verifier key as defined in |
| [package note](https://godoc.org/golang.org/x/mod/sumdb/note#hdr-Verifying_Notes). |
| |
| - `GONOSUMDB=prefix1,prefix2,prefix3` sets a list of module path prefixes, |
| again possibly containing globs, that should not be looked up using the database. |
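
The "path prefixes, not string prefixes" rule used by `GONOPROXY` and `GONOSUMDB` can be sketched with the standard library. The function name `matchPrefixPatterns` is hypothetical here, and the sketch assumes the globs follow `path.Match` syntax; it is not the `go` command's actual code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchPrefixPatterns reports whether any comma-separated glob in patterns
// matches a leading run of whole path elements of target.
func matchPrefixPatterns(patterns, target string) bool {
	elems := strings.Split(target, "/")
	for _, glob := range strings.Split(patterns, ",") {
		if glob == "" {
			continue
		}
		// Compare the glob against a prefix with the same number of
		// path elements, so rsc.io/private cannot match rsc.io/privateer.
		n := strings.Count(glob, "/") + 1
		if len(elems) < n {
			continue
		}
		prefix := strings.Join(elems[:n], "/")
		if ok, _ := path.Match(glob, prefix); ok {
			return true
		}
	}
	return false
}

func main() {
	const patterns = "*.corp.google.com,rsc.io/private"
	for _, m := range []string{
		"foo.corp.google.com",
		"foo.corp.google.com/bar",
		"rsc.io/private",
		"rsc.io/private/bar",
		"rsc.io/privateer",
	} {
		fmt.Println(m, matchPrefixPatterns(patterns, m))
	}
}
```

Only the last path, `rsc.io/privateer`, is reported as not matching.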
| |
| We expect that corporate environments may fetch all modules, public and private, |
| through an internal proxy; |
| `GONOSUMDB` allows them to disable checksum database lookups for |
| internal modules while still verifying public modules. |
| Therefore, `GONOSUMDB` must not imply `GONOPROXY`. |
| |
| We also expect that other users may prefer to connect directly to source origins |
| but still want verification of open source modules or proxying of the database itself; |
| `GONOPROXY` allows them to arrange that and therefore must not imply `GONOSUMDB`. |
| |
| The database not being able to report `go.sum` lines for a module version |
| is a hard failure: |
| any private modules must be explicitly listed in `$GONOSUMDB`. |
| (Otherwise an attacker could block traffic to the database |
| and make all module versions appear to be genuine.) |
| The database can be disabled entirely with `GONOSUMDB=*`. |
| The command `go get -insecure` will report but not stop after database lookup |
| failures or database mismatches. |
| |
| ## Rationale |
| |
| The motivation for authenticating module downloads is |
| covered in the background section above. |
| Note that we want to authenticate modules |
| obtained both from direct connections to code-hosting servers |
| and from module proxies. |
| |
| Two topics are worth further discussion: |
| first, having a single database server for the entire Go ecosystem, |
| and second, the privacy implications of a database server. |
| |
| ### Security |
| |
| The Go team at Google will run the Go checksum database as a service to the Go ecosystem, |
| similar to running `godoc.org` and `golang.org`. |
| It is important that the service be secure. |
| Our thinking about the security design of the database has evolved over time, |
| and it is useful to outline the evolution that led to the |
| current design. |
| |
| The simplest possible approach, which we never seriously considered, |
| is to have one trusted server that issues a signed certificate for each |
| module version. |
| The drawback of this approach is that a compromised server |
| can be used to sign a certificate for a compromised module version, |
| and then that compromised module version and certificate |
| can be served to a target victim without easy detection. |
| |
| One way to address this weakness is strength in numbers: |
| have, say, N=3 or N=5 organizations run independent servers, |
| gather certificates from all of them, and accept a module version |
| as valid when, say, (N+1)/2 certificates agree. |
| The two drawbacks of this approach are that it is significantly more expensive |
| and still provides no detection of actual attacks. |
| The payoff from targeted replacement of source code |
| could be high enough to justify silently compromising (N+1)/2 |
| servers and then making very selective use of the certificates. |
| So our focus turned to detection of compromise. |
| |
| Requiring a checksum database to log a `go.sum` entry in a |
| [transparent log](https://research.swtch.com/tlog) |
| before accepting it does raise the likelihood of detection. |
| If the compromised `go.sum` entry is stored in the |
| actual log, an auditor can find it. |
| And if the compromised `go.sum` entry is served in |
| a forked, victim-specific log, the server must always serve |
| that forked log to the victim, and only to the victim, |
| or else the `go` command's consistency checks will fail |
| loudly, and with enough information to cryptographically |
| prove the compromise of the server. |
| |
| An ecosystem with multiple proxies run by different organizations |
| makes a successful “forked log” attack even harder: |
| the attacker would have to not only compromise the database, |
| it would also have to compromise each possible proxy the |
| victim might use and arrange to identify the victim well enough |
| to always serve the forked log to the victim |
| and to never serve it to any non-victim. |
| |
| The serving of the transparent log in tile form helps |
| caching and proxying but also makes victim identification |
| that much harder. |
| When using Certificate Transparency's proof endpoints, |
| the proof requests might be arranged to carry enough |
| material to identify a victim, for example by only ever serving |
| even log sizes to the victim and odd log sizes to others |
| and then adjusting the log-size-specific proofs accordingly. |
| But complete tile fetches expose no information about the cached log size, |
| making it that much harder to serve modified tiles only to the victim. |
| |
| We hope that proxies run by various |
| organizations in the Go community will also serve as auditors |
| and double-check Go checksum database log entries |
| as part of their ordinary operation. |
| (Another useful |
| service that could be enabled by |
| the database is a notification service to alert |
| authors about new versions of their own modules.) |
| |
| As described earlier, |
| users who want to ensure their own compromise requires |
| compromising multiple organizations can use Google's checksum database |
| and a different organization's proxy to access it. |
| |
| Generalizing that approach, |
| the usual way to further improve detection of fork attacks is to add gossip, |
| so that different users can check whether they are seeing |
| different logs. |
| In effect, the proxy protocol already supports this, |
| so that any available proxy that proxies the database |
| can be a gossip source. |
| If we add a `go fetch-latest-checksum-log-from-goproxy` (obviously not the final name) |
| and |
| |
| GOPROXY=https://other.proxy/ go fetch-latest-checksum-log-from-goproxy |
| |
| succeeds, then the client and other.proxy are seeing the same log. |
| |
| Compared to the original scenario of a single checksum database with |
| no transparent log, the use of a single transparent log |
| and the ability to proxy the database and gossip improves |
| detection of attacks so much that there is little incremental |
| security benefit to adding the complexity of multiple databases. |
| At some point in the future, it might make sense for the |
| Go ecosystem to support using multiple databases, |
| but to begin with we have opted for the simpler |
| (but still reasonably secure) ecosystem design |
| of a single database. |
| |
| ### Privacy |
| |
| Contacting the Go checksum database to authenticate a new dependency |
| requires sending the module path and version to the database server. |
| |
| The database server will of course need to publish a privacy policy, |
| and it should be written as clearly as |
| the [Google Public DNS Privacy Policy](https://developers.google.com/speed/public-dns/privacy) |
| and be sure to include information about log retention windows. |
| That policy is still under development. |
| But the privacy policy only matters for data the database receives. |
| The design of the database protocol and usage is meant to minimize |
| what the `go` command even sends. |
| |
| There are two main privacy concerns: |
| exposing the text of private module paths to the database, |
| and exposing usage information for public modules to the database. |
| |
| #### Private Module Paths |
| |
| The first main privacy concern is that a misconfigured `go` command |
| could send the text of a private module path |
| (for example, `secret-machine.rsc.io/private/secret-plan`) to the database. |
| The database will try to resolve the module, triggering a DNS lookup |
| for `secret-machine.rsc.io` and, if that resolves, an HTTPS fetch |
| for the longer URL. |
| Even if the database then discards that path immediately upon failure, |
| it has still been sent over the network. |
| |
| Such misconfiguration must not go unnoticed. |
| For this reason (and also to avoid downgrade attacks), |
| if the database cannot return information about a module, |
| the download fails loudly and the `go` command stops. |
| This ensures both that all public modules are in fact |
| authenticated and also that any misconfiguration |
| must be corrected (by setting `$GONOSUMDB` to avoid |
| the database for those private modules) |
| in order to achieve a successful build. |
| This way, the frequency of misconfiguration-induced |
| database lookups should be minimized. |
| Misconfigurations fail; they will be noticed and fixed. |
| |
| One possibility to further reduce exposure of private module path text |
| is to provide additional ways to |
| set `$GONOSUMDB`, although it is not clear what those |
| should be. |
| A top-level module's source code repository is an attractive place to |
| want to store configuration such as `$GONOSUMDB` |
| and `$GOPROXY`, but then that configuration changes |
| depending on which version of the repo is checked out, |
| which would cause interesting behavior when testing old |
| versions, whether by hand or using tools like `git bisect`. |
| |
| (The nice thing about environment variables is that most |
| corporate computer management systems already provide |
| ways to preset environment variables.) |
| |
| #### Private Module SHA256s |
| |
| Another possibility to reduce exposure is to support and |
| use by default an alternate lookup `/lookup/SHA256(module)@version`, |
| which sends the SHA256 hash of the module path instead of the |
| module path itself. |
| If the database was already aware of that module path, |
| it would recognize the SHA256 and perform the lookup, |
| even potentially fetching a new version of the module. |
| If a misconfigured `go` command sends the SHA256 of |
| a private module path, that is far less information. |
| |
| The SHA256 scheme does require, however, that the first use of a |
| public module be accompanied by some operation that sends |
| its module path text to the database, so that the database |
| can update its inverse-SHA256 index. |
| That operation—for now, let's call it `go notify <modulepath>`—would |
| need to be run just once ever across the whole Go ecosystem |
| for each module path. |
| Most likely the author would do it, perhaps as part of the |
| still-hypothetical `go release` command, |
| or else the first user of the module would need to do it |
| (perhaps thinking carefully about being the first-ever user of the module!). |
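
To illustrate the idea, here is a minimal sketch of the hashed lookup and the database's inverse index. The hex encoding of the hash, the `index` type, and the method names are all assumptions made for illustration; the proposal does not specify them, and the scheme itself is not part of the proposal.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// pathHash returns the hex-encoded SHA256 of a module path
// (hex is an assumed encoding).
func pathHash(modPath string) string {
	sum := sha256.Sum256([]byte(modPath))
	return hex.EncodeToString(sum[:])
}

// hashedLookupPath builds /lookup/SHA256(module)@version.
func hashedLookupPath(modPath, version string) string {
	return "/lookup/" + pathHash(modPath) + "@" + version
}

// index is a hypothetical inverse-SHA256 index kept by the database.
type index map[string]string

// notify records a public module path, as the hypothetical `go notify` would.
func (ix index) notify(modPath string) { ix[pathHash(modPath)] = modPath }

// resolve maps a received hash back to a module path, if it is known.
func (ix index) resolve(h string) (string, bool) {
	p, ok := ix[h]
	return p, ok
}

func main() {
	ix := index{}
	ix.notify("rsc.io/quote") // done once, by the author or first user

	fmt.Println(hashedLookupPath("rsc.io/quote", "v1.5.2"))

	p, ok := ix.resolve(pathHash("rsc.io/quote"))
	fmt.Println(p, ok)

	// A private path that was never notified cannot be resolved:
	// the database sees only an opaque hash.
	_, ok = ix.resolve(pathHash("secret-machine.rsc.io/private/secret-plan"))
	fmt.Println(ok)
}
```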
| |
| A modification of the SHA256 scheme might be to send a truncated hash, |
| designed to produce [K-anonymity](https://en.wikipedia.org/wiki/K-anonymity), |
| but this would cause significant expense: |
| if the database identified K public modules with the truncated hash, |
| it would have to look up the given version tag for all K of them |
| before returning an answer. This seems needlessly expensive |
| and of little practical benefit. |
| (An attacker might even create a long list of module paths |
| that collide with a popular module, just to slow down requests.) |
| |
| The SHA256 + `go notify` scheme is not part of this proposal today, |
| but we are considering adding it, |
| with full hashes, not truncated ones. |
| |
| #### Public Module Usage Information |
| |
| The second main privacy concern is that even developers who use only |
| public modules would expose information about their module usage habits |
| by requesting new `go.sum` lines from the database. |
| |
| Remember that the `go` command only contacts the database |
| in order to find new lines to add to `go.sum`. |
| When `go.sum` is up-to-date, as it is during ordinary development, |
| the database is never contacted. |
| That is, the database is only involved at all when adding a new dependency |
| or changing the version of an existing one. |
| That significantly reduces the amount of usage information |
| being sent to the database in the first place. |
| |
| Note also that even `go get -u` does not request information |
| about every dependency from the database: |
| it only requests information about dependencies with |
| updates available. |
| |
| The `go` command will also cache database lookup results |
| (reauthenticating them against cached tiles at each use), |
| so that using a single computer to |
| upgrade the version of a particular dependency used by N different modules |
| will result in only one database lookup, not N. |
| That further reduces the strength of any usage signal. |
| |
| One possible way to even further reduce the usage signal |
| observable by the database might be to use a truncated hash |
| for K-anonymity, as described in the previous section, |
| but the efficiency problems described earlier still apply. |
| Also, even if any particular fetch downloaded information |
| for K different module paths, the likely-very-lopsided popularity |
| distribution might make it easy to guess which module |
| path a typical client was really looking for, |
| especially combined with version information. |
| Truncated hashes appear to cost more than the benefit |
| they would bring. |
| |
| The complete solution for not exposing either |
| private module path text or public module usage information |
| is to use a proxy or a bulk download. |
| |
| #### Privacy by Proxy |
| |
| A complete solution for database privacy concerns is for |
| developers to access the database only through a proxy, |
| such as a local Athens instance or JFrog Artifactory instance, |
| assuming those proxies add support for proxying and |
| caching the Go database service endpoints. |
| |
| The proxy can be configured with a list of private module patterns, |
| so that even requests from a misconfigured `go` command never |
| make it past the proxy. |
| The database endpoints are designed for cacheability, |
| so that a proxy can avoid making any request more than once. |
| Requests for new versions of modules would still need to be |
| relayed to the database. |
| |
| We anticipate that there will be many proxies available |
| for use in the Go ecosystem. |
| Part of the motivation for the Go checksum database is to allow |
| the use of any available proxy to download modules, |
| without any reduction in security. |
| Developers can then use any proxy they are comfortable using, |
| or run their own. |
| |
| #### Privacy by Bulk Download |
| |
| What little usage signal leaks from a proxy that aggressively caches |
| database queries can be removed entirely by instead downloading |
| the entire checksum database and answering requests using the |
| local copy. |
| We estimate that the Go ecosystem has around 3 million module versions. |
| At an estimated footprint of 200 bytes per module version, |
| a much larger, complete checksum database of even 100 million module versions would still only be 20 GB. |
| Bandwidth can be exchanged for complete anonymity |
| by downloading the full database once and thereafter updating it incrementally |
| (easy, since it is append-only). |
| Any queries can be answered using only the local copy, |
| ensuring that neither private module paths nor |
| public module usage is exposed. |
| The cost of this approach is the need for clients to download the entire database |
| despite only needing an ever-smaller fraction of it. |
| (Today, assuming only a 3-million-entry database, |
| a module with even 100 dependencies would be downloading |
| 30,000 times more database than it actually needs. |
| As the Go ecosystem grows, so too does the overhead factor.) |
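
The estimates in this section can be checked with a few lines of arithmetic. The figures (200 bytes per module version, 3 million and 100 million versions, 100 dependencies) are the ones quoted above; the helper names are hypothetical, and 1 GB is taken as 10^9 bytes.

```go
package main

import "fmt"

// bytesPerVersion is the estimated footprint of one module version.
const bytesPerVersion = 200

// dbBytes returns the estimated size of a database holding that many versions.
func dbBytes(versions int64) int64 { return versions * bytesPerVersion }

// overheadFactor returns how many times more records a full download
// contains than a client with the given number of dependencies needs.
func overheadFactor(totalVersions, neededVersions int64) int64 {
	return totalVersions / neededVersions
}

func main() {
	const gb = 1000000000
	fmt.Println(dbBytes(100000000)/gb, "GB for 100 million versions") // 20
	fmt.Println(dbBytes(3000000)/1000000, "MB for 3 million versions") // 600
	fmt.Println(overheadFactor(3000000, 100), "x overhead")            // 30000
}
```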
| |
| Downloading the entire database might be a good strategy |
| for a corporate proxy, however. |
| |
| #### Privacy in CI/CD Systems |
| |
| A question was raised about privacy of database operations especially |
| in CI/CD systems. |
| We expect that a CI/CD system would _never_ contact the database. |
| |
| First, in typical usage, you only push code to a CI/CD system after |
| first at least building (and hopefully also testing!) any changes locally. |
| Building any changes locally will update `go.mod` and `go.sum` |
| as needed, and then the `go.sum` pushed to the CI/CD system |
| will be up-to-date. The database is only involved when adding to `go.sum`. |
| |
| Second, module-aware CI/CD systems should already be using `-mod=readonly`, |
| to fail on out-of-date `go.mod` files instead of silently updating them. |
| We will ensure that `-mod=readonly` also fails on out-of-date `go.sum` files |
| if it does not already ([#30667](https://golang.org/issue/30667)). |
| |
| ## Compatibility |
| |
| The introduction of the checksum database does not have any compatibility |
| concerns at the command or language level. |
| However, proxies that serve modified copies of public modules |
| will be incompatible with the new checks and stop being usable. |
| This is by design: such proxies are indistinguishable from man-in-the-middle attacks. |
| |
| ## Implementation |
| |
| The Go team at Google is working on a production implementation |
| of both a Go module proxy and the Go checksum database, |
| as we described in the blog post “[Go Modules in 2019](https://blog.golang.org/modules2019).” |
| |
| We will publish a checksum database client as part of the `go` command, |
| as well as an example database implementation. |
| We intend to ship support for the checksum database, enabled by default, in Go 1.13. |
| |
| Russ Cox will lead the `go` command integration |
| and has posted a [stack of changes in golang.org/x/exp/notary](https://go-review.googlesource.com/q/f:notary). |
| |
| |