_content/blog: add rebuild

Change-Id: I424b79460e903392f73270ee68fe88fc0ba49811
Reviewed-on: https://go-review.googlesource.com/c/website/+/523635
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
diff --git a/_content/blog/rebuild.md b/_content/blog/rebuild.md
new file mode 100644
index 0000000..01ac9e4
--- /dev/null
+++ b/_content/blog/rebuild.md
@@ -0,0 +1,692 @@
+---
+title: Perfectly Reproducible, Verified Go Toolchains
+date: 2023-08-28
+by:
+- Russ Cox
+summary: Go 1.21 is the first perfectly reproducible Go toolchain.
+---
+
+One of the key benefits of open-source software is that anyone can read
+the source code and inspect what it does.
+And yet most software, even open-source software,
+is downloaded in the form of compiled binaries,
+which are much more difficult to inspect.
+If an attacker wanted to run a [supply chain attack](https://cloud.google.com/software-supply-chain-security/docs/attack-vectors)
+on an open-source project,
+the least visible way would be to replace the binaries being served while
+leaving the source code unmodified.
+
+The best way to address this kind of attack is to make open-source software
+builds _reproducible_,
+meaning that a build that starts with the same sources produces the same
+outputs every time it runs.
+That way, anyone can verify that posted binaries are free of hidden changes
+by building from authentic sources and checking that the rebuilt binaries
+are bit-for-bit identical to the posted binaries.
+That approach proves the binaries have no backdoors or other changes not
+present in the source code,
+without having to disassemble or look inside them at all.
+Since anyone can verify the binaries, independent groups can easily detect
+and report supply chain attacks.
+
+As supply chain security becomes more important,
+so do reproducible builds, because they provide a simple way to verify the
+posted binaries for open-source projects.
+
+Go 1.21.0 is the first Go toolchain with perfectly reproducible builds.
+Earlier toolchains were possible to reproduce,
+but only with significant effort, and probably no one did:
+they just trusted that the binaries posted on [go.dev/dl](/dl/) were the correct ones.
+Now it’s easy to “trust but verify.”
+
+This post explains what goes into making builds reproducible,
+examines the many changes we had to make to Go to make Go toolchains reproducible,
+and then demonstrates one of the benefits of reproducibility by verifying
+the Ubuntu package for Go 1.21.0.
+
+## Making a Build Reproducible {#how}
+
+Computers are generally deterministic, so you might think all builds would
+be equally reproducible.
+That’s only true from a certain point of view.
+Let’s call a piece of information a _relevant input_ when the output of
+a build can change depending on that input.
+A build is reproducible if it can be repeated with all the same relevant inputs.
+Unfortunately, lots of build tools turn out to incorporate inputs that we
+would usually not realize are relevant and that might be difficult to recreate
+or provide as input.
+Let’s call an input an _unintentional input_ when it turns out to be relevant
+but we didn’t mean it to be.
+
+The most common unintentional input in build systems is the current time.
+If a build writes an executable to disk, the file system records the current
+time as the executable’s modification time.
+If the build then packages that file using a tool like “tar” or “zip”,
+the modification time is written into the archive.
+We certainly didn’t want our build to change based on the current time, but it does.
+So the current time turns out to be an unintentional input to the build.
+Worse, most programs don’t let you provide the current time as an input,
+so there is no way to repeat this build.
+To fix this, we might set the time stamps on created files to Unix time
+0 or to a specific time read from one of the build’s source files.
+That way, the current time is no longer a relevant input to the build.
+
+Common relevant inputs to a build include:
+
+  - the specific version of the source code to build;
+  - the specific versions of dependencies that will be included in the build;
+  - the operating system running the build, which may affect path names in the resulting binaries;
+  - the architecture of the CPU on the build system,
+  - which may affect which optimizations the compiler uses or the layout of certain data structures;
+  - the compiler version being used, as well as compiler options passed to it, which affect how the code is compiled;
+  - the name of the directory containing the source code, which may appear in debug information;
+  - the user name, group name, uid, and gid of the account running the build, which may appear in file metadata in an archive;
+  - and many more.
+
+To have a reproducible build, every relevant input must be configurable in the build,
+and then the binaries must be posted alongside an explicit configuration
+listing every relevant input.
+If you’ve done that, you have a reproducible build. Congratulations!
+
+We’re not done, though. If the binaries can only be reproduced if you
+first find a computer with the right architecture,
+install a specific operating system version,
+compiler version, put the source code in the right directory,
+set your user identity correctly, and so on,
+that may be too much work in practice for anyone to bother.
+
+We want builds to be not just reproducible but _easy to reproduce_.
+To do that, we need to identify relevant inputs and then,
+instead of documenting them, eliminate them.
+The build obviously has to depend on the source code being built,
+but everything else can be eliminated.
+When a build’s only relevant input is its source code,
+let’s call that _perfectly reproducible_.
+
+## Perfectly Reproducible Builds for Go {#go}
+
+As of Go 1.21, the Go toolchain is perfectly reproducible:
+its only relevant input is the source code for that build.
+We can build a specific toolchain (say, Go for Linux/x86-64) on a Linux/x86-64 host,
+or a Windows/ARM64 host, or a FreeBSD/386 host,
+or any other host that supports Go, and we can use any Go bootstrap compiler,
+including bootstrapping all the way back to Go 1.4’s C implementation,
+and we can vary any other details.
+None of that changes the toolchains that are built.
+If we start with the same toolchain source code,
+we will get the exact same toolchain binaries out.
+
+This perfect reproducibility is the culmination of efforts dating back originally to Go 1.10,
+although most of the effort was concentrated in Go 1.20 and Go 1.21.
+This section highlights some of the most interesting relevant inputs that we eliminated.
+
+### Reproducibility in Go 1.10 {#go110}
+
+Go 1.10 introduced a content-aware build cache that decides whether targets
+are up-to-date based on a fingerprint of the build inputs instead of file modification times.
+Because the toolchain itself is one of those build inputs,
+and because Go is written in Go, the [bootstrap process](/s/go15bootstrap)
+would only converge if the toolchain build on a single machine was reproducible.
+The overall toolchain build looks like this:
+
+<div class="image">
+<img src="rebuild/bootstrap.png" srcset="rebuild/bootstrap.png 1x, rebuild/bootstrap@2x.png 2x" width="515" height="177">
+</div>
+
+We start by building the sources for the current Go toolchain using an earlier Go version,
+the bootstrap toolchain (Go 1.10 used Go 1.4, written in C;
+Go 1.21 uses Go 1.17).
+That produces “toolchain1”, which we use to build everything again,
+producing “toolchain2”, which we use to build everything again,
+producing “toolchain3”.
+
+Toolchain1 and toolchain2 have been built from the same sources but with
+different Go implementations (compilers and libraries),
+so their binaries are certain to be different.
+However, if both Go implementations are non-buggy,
+correct implementations, toolchain1 and toolchain2 should behave exactly the same.
+In particular, when presented with the Go 1.X sources,
+toolchain1’s output (toolchain2) and toolchain2’s output (toolchain3)
+should be identical,
+meaning toolchain2 and toolchain3 should be identical.
+
+At least, that’s the idea. Making that true in practice required removing a couple unintentional inputs:
+
+**Randomness.** Map iteration and running work in multiple goroutines serialized
+with locks both introduce randomness in the order that results may be generated.
+This randomness can make the toolchain produce one of several different
+possible outputs each time it runs.
+To make the build reproducible, we had to find each of these and sort the
+relevant list of items before using it to generate output.
+
+**Bootstrap Libraries.** Any library used by the compiler that can choose
+from multiple different correct outputs might change its output from one
+Go version to the next.
+If that library output change causes a compiler output change,
+then toolchain1 and toolchain2 will not be semantically identical,
+and toolchain2 and toolchain3 will not be bit-for-bit identical.
+
+The canonical example is the [`sort`](/pkg/sort/) package,
+which can place elements that compare equal in [any order it likes](/blog/compat#output).
+A register allocator might sort to prioritize commonly used variables,
+and the linker sorts symbols in the data section by size.
+To completely eliminate any effect from the sorting algorithm,
+the comparison function used must never report two distinct elements as equal.
+In practice, this invariant turned out to be too onerous to impose on every
+use of sort in the toolchain,
+so instead we arranged to copy the Go 1.X `sort` package into the source
+tree that is presented to the bootstrap compiler.
+That way, the compiler uses the same sort algorithm when using the bootstrap
+toolchain as it does when built with itself.
+
+Another package we had to copy was [`compress/zlib`](/pkg/compress/zlib/),
+because the linker writes compressed debug information,
+and optimizations to compression libraries can change the exact output.
+Over time, we’ve [added other packages to that list too](https://go.googlesource.com/go/+/go1.21.0/src/cmd/dist/buildtool.go#55).
+This approach has the added benefit of allowing the Go 1.X compiler to use
+new APIs added to those packages immediately,
+at the cost that those packages must be written to compile with older versions of Go.
+
+### Reproducibility in Go 1.20 {#go120}
+
+Work on Go 1.20 prepared for both easy reproducible builds and [toolchain management](toolchain)
+by removing two more relevant inputs from the toolchain build.
+
+**Host C toolchain.** Some Go packages, most notably `net`,
+default to [using `cgo`](cgo) on most operating systems.
+In some cases, such as macOS and Windows,
+invoking system DLLs using `cgo` is the only reliable way to resolve host names.
+When we use `cgo`, though, we invoke the host C toolchain (meaning a specific
+C compiler and C library),
+and different toolchains have different compilation algorithms and library code,
+producing different outputs.
+The build graph for a `cgo` package looks like:
+
+<div class="image">
+<img src="rebuild/cgo.png" srcset="rebuild/cgo.png 1x, rebuild/cgo@2x.png 2x" width="441" height="344">
+</div>
+
+The host C toolchain is therefore a relevant input to the pre-compiled `net.a`
+that ships with the toolchain.
+For Go 1.20, we decided to fix this by removing `net.a` from the toolchain.
+That is, Go 1.20 stopped shipping pre-compiled packages to seed the build cache with.
+Now, the first time a program uses package `net`,
+the Go toolchain compiles it using the local system’s C toolchain and caches that result.
+In addition to removing a relevant input from toolchain builds and making
+toolchain downloads smaller,
+not shipping pre-compiled packages also makes toolchain downloads more portable.
+If we build package `net` on one system with one C toolchain and then compile
+other parts of the program on a different system with a different C toolchain,
+in general there is no guarantee that the two parts can be linked together.
+
+One reason we shipped the pre-compiled `net` package in the first place
+was to allow building programs that used package net even on systems without
+a C toolchain installed.
+If there’s no pre-compiled package, what happens on those systems? The
+answer varies by operating system,
+but in all cases we arranged for the Go toolchain to continue to work well
+for building pure Go programs without a host C toolchain.
+
+  - On macOS, we rewrote package net using the underlying mechanisms that cgo would use,
+    without any actual C code.
+    This avoids invoking the host C toolchain but still emits a binary that
+    refers to the required system DLLs.
+    This approach is only possible because every Mac has the same dynamic libraries installed.
+    Making the non-cgo macOS package net use the system DLLs also meant that
+    cross-compiled macOS executables now use the system DLLs for network access,
+    resolving a long-standing feature request.
+
+  - On Windows, package net already made direct use of DLLs without C code, so nothing needed to be changed.
+
+  - On Unix systems, we cannot assume a specific DLL interface to network code,
+    but the pure Go version works fine for systems that use typical IP and DNS setups.
+    Also, it is much easier to install a C toolchain on Unix systems than it
+    is on macOS and especially Windows.
+    We changed the `go` command to enable or disable `cgo` automatically based
+    on whether the system has a C toolchain installed.
+    Unix systems without a C toolchain fall back to the pure Go version of package net,
+    and in the rare cases where that’s not good enough,
+    they can install a C toolchain.
+
+Having dropped the pre-compiled packages,
+the only part of the Go toolchain that still depended on the host C toolchain
+was binaries built using package net,
+specifically the `go` command.
+With the macOS improvements, it was now viable to build those commands with `cgo` disabled,
+completely removing the host C toolchain as an input,
+but we left that final step for Go 1.21.
+
+**Host dynamic linker.** When programs use `cgo` on a system using dynamically linked C libraries,
+the resulting binaries contain the path to the system’s dynamic linker,
+something like `/lib64/ld-linux-x86-64.so.2`.
+If the path is wrong, the binaries don’t run.
+Typically each operating system/architecture combination has a single correct
+answer for this path.
+Unfortunately, musl-based Linuxes like Alpine Linux use a different dynamic
+linker than glibc-based Linuxes like Ubuntu.
+To make Go run at all on Alpine Linux, in Go bootstrap process looked like this:
+
+<div class="image">
+<img src="rebuild/linker1.png" srcset="rebuild/linker1.png 1x, rebuild/linker1@2x.png 2x" width="480" height="209">
+</div>
+
+The bootstrap program cmd/dist inspected the local system’s dynamic linker
+and wrote that value into a new source file compiled along with the rest
+of the linker sources,
+effectively hard-coding that default into the linker itself.
+Then when the linker built a program from a set of compiled packages,
+it used that default.
+The result is that a Go toolchain built on Alpine is different from a toolchain built on Ubuntu:
+the host configuration is a relevant input to the toolchain build.
+This is a reproducibility problem but also a portability problem:
+a Go toolchain built on Alpine doesn’t build working binaries or even
+run on Ubuntu, and vice versa.
+
+For Go 1.20, we took a step toward fixing the reproducibility problem by
+changing the linker to consult the host configuration when it is running,
+instead of having a default hard-coded at toolchain build time:
+
+<div class="image">
+<img src="rebuild/linker2.png" srcset="rebuild/linker2.png 1x, rebuild/linker2@2x.png 2x" width="450" height="175">
+</div>
+
+This fixed the portability of the linker binary on Alpine Linux,
+although not the overall toolchain, since the `go` command still used package
+`net` and therefore `cgo` and therefore had a dynamic linker reference in its own binary.
+Just as in the previous section, compiling the `go` command without `cgo`
+enabled would fix this,
+but we left that change for Go 1.21.
+(We didn’t feel there was enough time left in the Go 1.20 cycle to test
+such that change properly.)
+
+### Reproducibility in Go 1.21 {#go121}
+
+For Go 1.21, the goal of perfect reproducibility was in sight,
+and we took care of the remaining, mostly small,
+relevant inputs that remained.
+
+**Host C toolchain and dynamic linker.** As discussed above,
+Go 1.20 took important steps toward removing the host C toolchain and dynamic
+linker as relevant inputs.
+Go 1.21 completed the removal of these relevant inputs by building the toolchain
+with `cgo` disabled.
+This improved portability of the toolchain too:
+Go 1.21 is the first Go release where the standard Go toolchain runs unmodified
+on Alpine Linux systems.
+
+Removing these relevant inputs made it possible to cross-compile a Go toolchain
+from a different system without any loss in functionality.
+That in turn improved the supply chain security of the Go toolchain:
+we can now build Go toolchains for all target systems using a trusted Linux/x86-64 system,
+instead of needing to arrange a separate trusted system for each target.
+As a result, Go 1.21 is the first release to include posted binaries for
+all systems at [go.dev/dl/](/dl/).
+
+**Source directory.** Go programs include full paths in the runtime and debugging metadata,
+so that when a program crashes or is run in a debugger,
+stack traces include the full path to the source file,
+not just the name of the file in an unspecified directory.
+Unfortunately, including the full path makes the directory where the source
+code is stored a relevant input to the build.
+To fix this, Go 1.21 changed the release toolchain builds to install commands
+like the compiler using `go install -trimpath`,
+which replaces the source directory with the module path of the code.
+If a released compiler crashes, the stack trace will print paths like `cmd/compile/main.go`
+instead of `/home/user/go/src/cmd/compile/main.go`.
+Since the full paths would refer to a directory on a different machine anyway,
+this rewrite is no loss.
+On the other hand, for non-release builds,
+we keep the full path, so that when developers working on the compiler itself cause it to crash,
+IDEs and other tools reading those crashes can easily find the correct source file.
+
+**Host operating system.** Paths on Windows systems are backslash-separated,
+like `cmd\compile\main.go`.
+Other systems use forward slashes, like `cmd/compile/main.go`.
+Although earlier versions of Go had normalized most of these paths to use forward slashes,
+one inconsistency had crept back in, causing slightly different toolchain builds on Windows.
+We found and fixed the bug.
+
+**Host architecture.** Go runs on a variety of ARM systems and can emit
+code using a software library for floating-point math (SWFP) or using hardware
+floating-point instructions (HWFP).
+Toolchains defaulting to one mode or the other will necessarily differ.
+Like we saw with the dynamic linker earlier,
+the Go bootstrap process inspected the build system to make sure that the
+resulting toolchain worked on that system.
+For historical reasons, the rule was “assume SWFP unless the build is
+running on an ARM system with floating-point hardware”,
+with cross-compiled toolchains assuming SWFP.
+The vast majority of ARM systems today do have floating-point hardware,
+so this introduced an unnecessary difference between natively compiled and
+cross-compiled toolchains,
+and as a further wrinkle, Windows ARM builds always assumed HWFP,
+making the decision operating system-dependent.
+We changed the rule to be “assume HWFP unless the build is running on
+an ARM system without floating-point hardware”.
+This way, cross-compilation and builds on modern ARM systems produce identical toolchains.
+
+**Packaging logic.** All the code to create the actual toolchain archives
+we post for download lived in a separate Git repository,
+golang.org/x/build, and the exact details of how archives get packaged does change over time.
+If you wanted to reproduce those archives,
+you needed to have the right version of that repository.
+We removed this relevant input by moving the code to package the archives
+into the main Go source tree, as `cmd/distpack`.
+As of Go 1.21, if you have the sources for a given version of Go,
+you also have the sources for packaging the archives.
+The golang.org/x/build repository is no longer a relevant input.
+
+**User IDs.** The tar archives we posted for download were built from a
+distribution written to the file system,
+and using [`tar.FileInfoHeader`](/pkg/archive/tar/#FileInfoHeader) copies
+the user and group IDs from the file system into the tar file,
+making the user running the build a relevant input.
+We changed the archiving code to clear these.
+
+**Current time.** Like with user IDs, the tar and zip archives we posted
+for download had been built by copying the file system modification times into the archives,
+making the current time a relevant input.
+We could have cleared the time, but we thought it would look surprising
+and possibly even break some tools to use the Unix or MS-DOS zero time.
+Instead, we changed the go/VERSION file stored in the repository to add
+the time associated with that version:
+
+	$ cat go1.21.0/VERSION
+	go1.21.0
+	time 2023-08-04T20:14:06Z
+	$
+
+The packagers now copy the time from the VERSION file when writing files to archives,
+instead of copying the local file’s modification times.
+
+**Cryptographic signing keys.** The Go toolchain for macOS won’t run on
+end-user systems unless we sign the binaries with an Apple-approved signing key.
+We use an internal system to get them signed with Google’s signing key,
+and obviously we cannot share that secret key in order to allow others to
+reproduce the signed binaries.
+Instead, we wrote a verifier that can check whether two binaries are identical
+except for their signatures.
+
+**OS-specific packagers.** We use the Xcode tools `pkgbuild` and `productbuild`
+to create the downloadable macOS PKG installer,
+and we use WiX to create the downloadable Windows MSI installer.
+We don’t want verifiers to need the same exact versions of those tools,
+so we took the same approach as for the cryptographic signing keys,
+writing a verifier that can look inside the packages and check that the
+toolchain files are exactly as expected.
+
+## Verifying the Go Toolchains {#verify}
+
+It’s not enough to make Go toolchains reproducible once.
+We want to make sure they stay reproducible,
+and we want to make sure others can reproduce them easily.
+
+To keep ourselves honest, we now build all Go distributions on both a trusted
+Linux/x86-64 system and a Windows/x86-64 system.
+Except for the architecture, the two systems have almost nothing in common.
+The two systems must produce bit-for-bit identical archives or else we do
+not proceed with the release.
+
+To allow others to verify that we’re honest,
+we’ve written and published a verifier,
+[`golang.org/x/build/cmd/gorebuild`](https://pkg.go.dev/golang.org/x/build/cmd/gorebuild).
+That program will start with the source code in our Git repository and rebuild the
+current Go versions, checking that they match the archives posted on [go.dev/dl](/dl/).
+Most archives are required to match bit-for-bit.
+As mentioned above, there are three exceptions where a more relaxed check is used:
+
+  - The macOS tar.gz file is expected to differ,
+    but then the verifier compares the contents inside.
+    The rebuilt and posted copies must contain the same files,
+    and all the files must match exactly, except for executable binaries.
+    Executable binaries must match exactly after stripping code signatures.
+
+  - The macOS PKG installer is not rebuilt. Instead,
+    the verifier reads the files inside the PKG installer and checks that they
+    match the macOS tar.gz exactly,
+    again after code signature stripping.
+    In the long term, the PKG creation is trivial enough that it could potentially
+    be added to cmd/distpack,
+    but the verifier would still have to parse the PKG file to run the signature-ignoring
+    code executable comparison.
+
+  - The Windows MSI installer is not rebuilt.
+    Instead, the verifier invokes the Linux program `msiextract` to extract
+    the files inside and check that they match the rebuilt Windows zip file exactly.
+    In the long term, perhaps the MSI creation could be added to cmd/distpack,
+    and then the verifier could use a bit-for-bit MSI comparison.
+
+We run `gorebuild` nightly, posting the results at [go.dev/rebuild](/rebuild),
+and of course anyone else can run it too.
+
+## Verifying Ubuntu’s Go Toolchain {#ubuntu}
+
+The Go toolchain’s easily reproducible builds should mean that the binaries
+in the toolchains posted on go.dev match the binaries included in other packaging systems,
+even when those packagers build from source.
+Even if the packagers have compiled with different configurations or other changes,
+the easily reproducible builds should still make it easy to reproduce their binaries.
+To demonstrate this, let’s reproduce the Ubuntu `golang-1.21` package
+version `1.21.0-1` for Linux/x86-64.
+
+To start, we need to download and extract the Ubuntu packages,
+which are [ar(1) archives](https://linux.die.net/man/1/ar) containing zstd-compressed tar archives:
+
+{{raw `
+	$ mkdir deb
+	$ cd deb
+	$ curl -LO http://mirrors.kernel.org/ubuntu/pool/main/g/golang-1.21/golang-1.21-src_1.21.0-1_all.deb
+	$ ar xv golang-1.21-src_1.21.0-1_all.deb
+	x - debian-binary
+	x - control.tar.zst
+	x - data.tar.zst
+	$ unzstd < data.tar.zst | tar xv
+	...
+	x ./usr/share/go-1.21/src/archive/tar/common.go
+	x ./usr/share/go-1.21/src/archive/tar/example_test.go
+	x ./usr/share/go-1.21/src/archive/tar/format.go
+	x ./usr/share/go-1.21/src/archive/tar/fuzz_test.go
+	...
+	$
+`}}
+
+That was the source archive. Now the amd64 binary archive:
+
+{{raw `
+	$ rm -f debian-binary *.zst
+	$ curl -LO http://mirrors.kernel.org/ubuntu/pool/main/g/golang-1.21/golang-1.21-go_1.21.0-1_amd64.deb
+	$ ar xv golang-1.21-src_1.21.0-1_all.deb
+	x - debian-binary
+	x - control.tar.zst
+	x - data.tar.zst
+	$ unzstd < data.tar.zst | tar xv | grep -v '/$'
+	...
+	x ./usr/lib/go-1.21/bin/go
+	x ./usr/lib/go-1.21/bin/gofmt
+	x ./usr/lib/go-1.21/go.env
+	x ./usr/lib/go-1.21/pkg/tool/linux_amd64/addr2line
+	x ./usr/lib/go-1.21/pkg/tool/linux_amd64/asm
+	x ./usr/lib/go-1.21/pkg/tool/linux_amd64/buildid
+	...
+	$
+`}}
+
+Ubuntu splits the normal Go tree into two halves,
+in /usr/share/go-1.21 and /usr/lib/go-1.21.
+Let’s put them back together:
+
+	$ mkdir go-ubuntu
+	$ cp -R usr/share/go-1.21/* usr/lib/go-1.21/* go-ubuntu
+	cp: cannot overwrite directory go-ubuntu/api with non-directory usr/lib/go-1.21/api
+	cp: cannot overwrite directory go-ubuntu/misc with non-directory usr/lib/go-1.21/misc
+	cp: cannot overwrite directory go-ubuntu/pkg/include with non-directory usr/lib/go-1.21/pkg/include
+	cp: cannot overwrite directory go-ubuntu/src with non-directory usr/lib/go-1.21/src
+	cp: cannot overwrite directory go-ubuntu/test with non-directory usr/lib/go-1.21/test
+	$
+
+The errors are complaining about copying symlinks, which we can ignore.
+
+Now we need to download and extract the upstream Go sources:
+
+	$ curl -LO https://go.googlesource.com/go/+archive/refs/tags/go1.21.0.tar.gz
+	$ mkdir go-clean
+	$ cd go-clean
+	$ curl -L https://go.googlesource.com/go/+archive/refs/tags/go1.21.0.tar.gz | tar xzv
+	...
+	x src/archive/tar/common.go
+	x src/archive/tar/example_test.go
+	x src/archive/tar/format.go
+	x src/archive/tar/fuzz_test.go
+	...
+	$
+
+To skip some trial and error, it turns out that Ubuntu builds Go with `GO386=softfloat`,
+which forces the use of software floating point when compiling for 32-bit x86,
+and strips (removes symbol tables from) the resulting ELF binaries.
+Let’s start with a `GO386=softfloat` build:
+
+	$ cd src
+	$ GOOS=linux GO386=softfloat ./make.bash -distpack
+	Building Go cmd/dist using /Users/rsc/sdk/go1.17.13. (go1.17.13 darwin/amd64)
+	Building Go toolchain1 using /Users/rsc/sdk/go1.17.13.
+	Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
+	Building Go toolchain2 using go_bootstrap and Go toolchain1.
+	Building Go toolchain3 using go_bootstrap and Go toolchain2.
+	Building commands for host, darwin/amd64.
+	Building packages and commands for target, linux/amd64.
+	Packaging archives for linux/amd64.
+	distpack: 818d46ede85682dd go1.21.0.src.tar.gz
+	distpack: 4fcd8651d084a03d go1.21.0.linux-amd64.tar.gz
+	distpack: eab8ed80024f444f v0.0.1-go1.21.0.linux-amd64.zip
+	distpack: 58528cce1848ddf4 v0.0.1-go1.21.0.linux-amd64.mod
+	distpack: d8da1f27296edea4 v0.0.1-go1.21.0.linux-amd64.info
+	---
+	Installed Go for linux/amd64 in /Users/rsc/deb/go-clean
+	Installed commands in /Users/rsc/deb/go-clean/bin
+	*** You need to add /Users/rsc/deb/go-clean/bin to your PATH.
+	$
+
+That left the standard package in `pkg/distpack/go1.21.0.linux-amd64.tar.gz`.
+Let’s unpack it and strip the binaries to match Ubuntu:
+
+	$ cd ../..
+	$ tar xzvf go-clean/pkg/distpack/go1.21.0.linux-amd64.tar.gz
+	x go/CONTRIBUTING.md
+	x go/LICENSE
+	x go/PATENTS
+	x go/README.md
+	x go/SECURITY.md
+	x go/VERSION
+	...
+	$ elfstrip go/bin/* go/pkg/tool/linux_amd64/*
+	$
+
+Now we can diff the Go toolchain we’ve created on our Mac with the Go toolchain that Ubuntu ships:
+
+{{raw `
+	$ diff -r go go-ubuntu
+	Only in go: CONTRIBUTING.md
+	Only in go: LICENSE
+	Only in go: PATENTS
+	Only in go: README.md
+	Only in go: SECURITY.md
+	Only in go: codereview.cfg
+	Only in go: doc
+	Only in go: lib
+	Binary files go/misc/chrome/gophertool/gopher.png and go-ubuntu/misc/chrome/gophertool/gopher.png differ
+	Only in go-ubuntu/pkg/tool/linux_amd64: dist
+	Only in go-ubuntu/pkg/tool/linux_amd64: distpack
+	Only in go/src: all.rc
+	Only in go/src: clean.rc
+	Only in go/src: make.rc
+	Only in go/src: run.rc
+	diff -r go/src/syscall/mksyscall.pl go-ubuntu/src/syscall/mksyscall.pl
+	1c1
+	< #!/usr/bin/env perl
+	---
+	> #! /usr/bin/perl
+	...
+	$
+`}}
+
+We’ve successfully reproduced the Ubuntu package’s executables and identified
+the complete set of changes that remain:
+
+  - Various metadata and supporting files have been deleted.
+  - The `gopher.png` file has been modified. On closer inspection the two are
+    identical except for an embedded timestamp that Ubuntu has updated.
+    Perhaps Ubuntu’s packaging scripts recompressed the png with a tool that
+    rewrites the timestamp even when it cannot improve on the existing compression.
+  - The binaries `dist` and `distpack`, which are built during bootstrap but
+    not included in standard archives,
+    have been included in the Ubuntu package.
+  - The Plan 9 build scripts (`*.rc`) have been deleted, although the Windows build scripts (`*.bat`) remain.
+  - `mksyscall.pl` and seven other Perl scripts not shown have had their headers changed.
+
+Note in particular that we’ve reconstructed the toolchain binaries bit-for-bit:
+they do not show up in the diff at all.
+That is, we proved that the Ubuntu Go binaries correspond exactly to the
+upstream Go sources.
+
+Even better, we proved this without using any Ubuntu software at all:
+these commands were run on a Mac, and [`unzstd`](https://github.com/rsc/tmp/blob/master/unzstd/)
+and [`elfstrip`](https://github.com/rsc/tmp/blob/master/elfstrip/) are short Go programs.
+A sophisticated attacker might insert malicious code into an Ubuntu package
+by changing the package-creation tools.
+If they did, reproducing the Go Ubuntu package from clean sources using
+those malicious tools would still produce bit-for-bit identical copies of
+the malicious packages.
+This attack would be invisible to that kind of rebuild,
+much like [Ken Thompson’s compiler attack](https://dl.acm.org/doi/10.1145/358198.358210).
+Verifying the Ubuntu packages using no Ubuntu software at all is a much
+stronger check.
+Go’s perfectly reproducible builds, which don’t depend on unindented
+details like the host operating system,
+host architecture, and host C toolchain, are what make this stronger check possible.
+
+(As an aside for the historical record, Ken Thompson told me once that his
+attack was in fact detected,
+because the compiler build stopped being reproducible.
+It had a bug: a string constant in the backdoor added to the compiler was
+imperfectly handled and grew by a single NUL byte each time the compiler compiled itself.
+Eventually someone noticed the non-reproducible build and tried to find the cause by compiling to assembly.
+The compiler’s backdoor did not reproduce itself into assembly output at all,
+so assembling that output removed the backdoor.)
+
+## Conclusion
+
+Reproducible builds are an important tool for strengthening the open-source supply chain.
+Frameworks like [SLSA](https://slsa.dev/) focus on provenance and a software
+chain of custody that can be used to inform decisions about trust.
+Reproducible builds complement that approach by providing a way to verify
+that the trust is well-placed.
+
+Perfect reproducibility (when the source files are the build’s only relevant
+input) is only possible for programs that build themselves,
+like compiler toolchains.
+It is a lofty but worthwhile goal precisely because self-hosting compiler
+toolchains are otherwise quite difficult to verify.
+Go’s perfect reproducibility means that,
+assuming packagers don’t modify the source code,
+every repackaging of Go 1.21.0 for Linux/x86-64 (substitute your favorite
+system) in any form should be distributing exactly the same binaries,
+even when they all build from source.
+We’ve seen that this is not quite true for Ubuntu Linux,
+but perfect reproducibility still lets us reproduce the Ubuntu packaging
+using a very different, non-Ubuntu system.
+
+Ideally all open source software distributed in binary form would have easy-to-reproduce builds.
+In practice, as we’ve seen in this post,
+it is very easy for unintended inputs to leak into builds.
+For Go programs that don’t need `cgo`, a reproducible build is as simple
+as compiling with `CGO_ENABLED=0 go build -trimpath`.
+Disabling `cgo` removes the host C toolchain as a relevant input,
+and `-trimpath` removes the current directory.
+If your program does need `cgo`, you need to arrange for a specific host
+C toolchain version before running `go build`,
+such as by running the build in a specific virtual machine or container image.
+
+Moving beyond Go, the [Reproducible Builds](https://reproducible-builds.org/)
+project aims to improve reproducibility of all open source and is a good
+starting point for more information about making your own software builds reproducible.
+
diff --git a/_content/blog/rebuild/bootstrap.graffle b/_content/blog/rebuild/bootstrap.graffle
new file mode 100644
index 0000000..a5279f9
--- /dev/null
+++ b/_content/blog/rebuild/bootstrap.graffle
Binary files differ
diff --git a/_content/blog/rebuild/bootstrap.png b/_content/blog/rebuild/bootstrap.png
new file mode 100644
index 0000000..91a11b5
--- /dev/null
+++ b/_content/blog/rebuild/bootstrap.png
Binary files differ
diff --git a/_content/blog/rebuild/bootstrap@2x.png b/_content/blog/rebuild/bootstrap@2x.png
new file mode 100644
index 0000000..e19c446
--- /dev/null
+++ b/_content/blog/rebuild/bootstrap@2x.png
Binary files differ
diff --git a/_content/blog/rebuild/cgo.graffle b/_content/blog/rebuild/cgo.graffle
new file mode 100644
index 0000000..df78606
--- /dev/null
+++ b/_content/blog/rebuild/cgo.graffle
Binary files differ
diff --git a/_content/blog/rebuild/cgo.png b/_content/blog/rebuild/cgo.png
new file mode 100644
index 0000000..fdd5724
--- /dev/null
+++ b/_content/blog/rebuild/cgo.png
Binary files differ
diff --git a/_content/blog/rebuild/cgo@2x.png b/_content/blog/rebuild/cgo@2x.png
new file mode 100644
index 0000000..bf7e0c5
--- /dev/null
+++ b/_content/blog/rebuild/cgo@2x.png
Binary files differ
diff --git a/_content/blog/rebuild/linker1.graffle b/_content/blog/rebuild/linker1.graffle
new file mode 100644
index 0000000..0da78a9
--- /dev/null
+++ b/_content/blog/rebuild/linker1.graffle
Binary files differ
diff --git a/_content/blog/rebuild/linker1.png b/_content/blog/rebuild/linker1.png
new file mode 100644
index 0000000..fceb272
--- /dev/null
+++ b/_content/blog/rebuild/linker1.png
Binary files differ
diff --git a/_content/blog/rebuild/linker1@2x.png b/_content/blog/rebuild/linker1@2x.png
new file mode 100644
index 0000000..84e0a0f
--- /dev/null
+++ b/_content/blog/rebuild/linker1@2x.png
Binary files differ
diff --git a/_content/blog/rebuild/linker2.graffle b/_content/blog/rebuild/linker2.graffle
new file mode 100644
index 0000000..799754e
--- /dev/null
+++ b/_content/blog/rebuild/linker2.graffle
Binary files differ
diff --git a/_content/blog/rebuild/linker2.png b/_content/blog/rebuild/linker2.png
new file mode 100644
index 0000000..5cb33b3
--- /dev/null
+++ b/_content/blog/rebuild/linker2.png
Binary files differ
diff --git a/_content/blog/rebuild/linker2@2x.png b/_content/blog/rebuild/linker2@2x.png
new file mode 100644
index 0000000..79e95ea
--- /dev/null
+++ b/_content/blog/rebuild/linker2@2x.png
Binary files differ