dl.google.com: Powered by Go
10:00 26 Jul 2013
Tags: download, oscon, port, c++, google, groupcache, caching

Brad Fitzpatrick
Gopher, Google
@bradfitz
bradfitz@golang.org
http://bradfitz.com/
https://go.dev/
https://github.com/golang/groupcache/

* Overview / tl;dw:

- dl.google.com serves Google downloads
- Was written in C++
- Now in Go
- Now much better
- Extensive, idiomatic use of Go's standard library
- ... which is all open source
- composition of interfaces is fun
- _groupcache_, now Open Source, handles group-aware caching and cache-filling

* too long...

* me

- Brad Fitzpatrick
- bradfitz.com
- @bradfitz
- past: LiveJournal, memcached, OpenID, Perl stuff...
- nowadays: Go, Go, Camlistore, Go, anything & everything written in Go ...

* I love Go

- this isn't a talk about Go, sorry.
- but check it out.
- simple, powerful, fast, liberating, refreshing
- great mix of low- and high-level
- light on the page
- static binaries, easy to deploy
- not perfect, but my favorite language yet

* dl.google.com

* dl.google.com

- HTTP download server
- serves Chrome, Android SDK, Earth, much more
- Some huge, some tiny (e.g. WebGL white/blacklist JSON)
- behind an edge cache; still high traffic
- lots of datacenters, lots of bandwidth

* Why port?

* reason 0

	$ apt-get update

.image oscon-dl/slow.png

- embarrassing
- Google can't serve a 1,238-byte file?
- Hanging?
- 207 B/s?!

* Yeah, embarrassing, for years...

.image oscon-dl/crbug.png

* ... which led to:

- complaining on corp G+. Me: "We suck. This sucks."
- primary SRE owning it: "Yup, it sucks. And is unmaintained."
- "I'll rewrite it for you!"
- "Hah."
- "No, serious. That's kinda our job. But I get to do it in Go."
- (Go team's loan-out-a-Gopher program...)

* How hard can this be?

* dl.google.com: few tricks

each "payload" (~URL) described by a protobuf:

- paths/patterns for its URL(s)
- go-live reveal date
- ACLs (geo, network, user, user type, ...)
- dynamic zip files
- custom HTTP headers
- custom caching

* dl.google.com: how it was

.image oscon-dl/before.png

* Aside: Why good code goes bad

* Why good code goes bad

- Premise: people don't suck
- Premise: code was once beautiful
- code tends towards complexity (gets worse)
- environment changes
- scale changes

* code complexity

- without regular love, code grows warts over time
- localized fixes and additions are easy & quick, but globally crappy
- features, hacks and workarounds added without docs or tests
- maintainers come & go,
- ... or just go.

* changing environment

- Google's infrastructure (hardware & software), like anybody's, is always changing
- properties of networks, storage
- design assumptions no longer make sense
- scale changes (design for 10x growth, rethink at 100x)
- new internal services (beta or non-existent then, dependable now)
- once-modern, home-grown reinvented wheels might now look archaic

* so why did it suck?

.image oscon-dl/slow.png

- stalling its single-threaded event loop, blocking when it shouldn't
- single-threaded meant a hard cap of one CPU, yet the stalls kept it from using even a fraction of that one

* but why?

- code was too complicated
- future maintainers slowly violated unwritten rules
- or knowingly violated them, assuming it couldn't be too bad?
- C++ single-threaded event-based callback spaghetti
- hard to know when/where code was running, or what "blocking" meant

* Old code

- served from local disk
- single-threaded event loop
- used sendfile(2) "for performance"
- sometimes tried to be clever and steal the fd from the "SelectServer" to call sendfile manually
- while also trying to do HTTP chunking,
- ... and HTTP range requests,
- ... and dynamic zip files,
- lots of duplicated copy/paste code paths
- many wrong/incomplete in different ways

* The mitigation?

- more complexity!
- ad hoc addition of more threads
- ... not really defined which threads did what,
- ... or what the ownership or locking rules were,
- no surprise: random crashes

* Summary of the 5-year-old code in 2012

- incomplete docs, tests
- stalling event loop
- ad-hoc threads...
- ... stalling event loops
- ... races
- ... crashes
- copy/paste code
- ... incomplete code
- two processes in the container
- ... different languages

* Environment changes

- Remember: on start, we had to copy all payloads to local disk
- in 2007, using local disk wasn't restricted
- in 2007, sum(payload size) was much smaller
- in 2012, containers get tiny % of local disk spindle time
- ... why aren't you using the cluster file systems like everybody else?
- ... cluster file systems own disk time on your machine, not you.
- in 2007, it started up quickly.
- in 2012, it took 12-24 hours to start (!!!)
- ... hope we don't crash! (oh, whoops)

* Copying N bytes from A to B in event loop environments (node.js, this C++, etc)

- Can *A* read?
- Read up to _n_ bytes from *A*.
- What'd we get? _rn_
- _n_ -= _rn_
- Store those.
- Note that we now want to write to *B*.
- Can *B* write?
- Try to write _rn_ bytes to *B*. Got _wn_.
- buffered -= _wn_
- while (blah blah blah) { ... blah blah blah ... }

* Thought that sucked? Try to mix in other state / logic, and then write it in C++.

*

.image oscon-dl/cpp-write.png

*

.image oscon-dl/cpp-writeerr.png

*

.image oscon-dl/cpp-toggle.png

* Or in JavaScript...

- [[https://github.com/nodejitsu/node-http-proxy/blob/master/lib/node-http-proxy/http-proxy.js]]
- Or Python gevent, Twisted, ...
- Or Perl AnyEvent, etc.
- Unreadable, discontiguous code.

* Copying N bytes from A to B in Go:

.code oscon-dl/copy.go /START OMIT/,/END OMIT/

- dst is an _io.Writer_ (an interface type)
- src is an _io.Reader_ (an interface type)
- synchronous (blocks)
- Go runtime deals with making blocking efficient
- goroutines, epoll, user-space scheduler, ...
- easier to reason about
- fewer, easier, compatible APIs
- concurrency is a _language_ (not _library_) feature
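
For reference, a minimal complete program doing the same synchronous copy with only the standard library (os.Stdin and os.Stdout stand in for src and dst here):

	package main

	import (
		"io"
		"log"
		"os"
	)

	func main() {
		// io.Copy blocks this goroutine until src is drained;
		// the runtime parks blocked goroutines on epoll, so
		// blocking is cheap and the code stays linear.
		if _, err := io.Copy(os.Stdout, os.Stdin); err != nil {
			log.Fatal(err)
		}
	}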

* Where to start?

- baby steps, not changing everything at once
- only port the `payload_server`, not the `payload_fetcher`
- read lots of old design docs
- read lots of C++ code
- port all command-line flags
- serve from local disk
- try to run integration tests
- while (fail) { debug, port, swear, ...}

* Notable stages

- pass integration tests
- run in a lightly-loaded datacenter
- audit mode
- ... mirror traffic to old & new servers; compare responses.
- drop all SWIG dependencies on C++ libraries
- ... use IP-to-geo lookup service, not static file + library

* Notable stages

- fetch blobs directly from blobstore, falling back to local disk on any error
- rely entirely on blobstore, with `payload_fetcher` still running as a safety net
- disable `payload_fetcher` entirely; fast start-up time.

* Using Go's Standard Library

* Using Go's Standard Library

- dl.google.com mostly just uses the standard library

* Go's Standard Library

- net/http
- io
- [[/pkg/net/http/#ServeContent][http.ServeContent]]

* Hello World

.play oscon-dl/server-hello.go

* File Server

.play oscon-dl/server-fs.go

* http.ServeContent

.image oscon-dl/servecontent.png
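
In text form, the signature in that screenshot (from package net/http):

	func ServeContent(w ResponseWriter, req *Request, name string,
		modtime time.Time, content io.ReadSeeker)

Give it a name, a modification time, and a ReadSeeker; it handles the rest.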

* io.Reader, io.Seeker

.image oscon-dl/readseeker.png
.image oscon-dl/reader.png
.image oscon-dl/seeker.png
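
In text form, the three interfaces shown above, as defined in package io:

	type Reader interface {
		Read(p []byte) (n int, err error)
	}

	type Seeker interface {
		Seek(offset int64, whence int) (int64, error)
	}

	type ReadSeeker interface {
		Reader
		Seeker
	}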

* http.ServeContent

	$ curl -H "Range: bytes=5-" http://localhost:8080

.play oscon-dl/server-content.go

* groupcache

* groupcache

- memcached alternative / replacement
- [[http://github.com/golang/groupcache]]
- _library_ that is both a client & server
- connects to its peers
- coordinated cache filling (no thundering herds on miss)
- replication of hot items

* Using groupcache

Declare who you are and who your peers are.

.code oscon-dl/groupcache.go /STARTINIT/,/ENDINIT/

This peer interface is pluggable. (e.g. inside Google it's automatic.)
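
As a stand-in for that snippet, initialization looks roughly like this (the addresses are hypothetical; NewHTTPPool and Set are the real groupcache calls):

	me := "http://10.0.0.1"
	peers := groupcache.NewHTTPPool(me)

	// Whenever the set of peers changes:
	peers.Set("http://10.0.0.1", "http://10.0.0.2", "http://10.0.0.3")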

* Using groupcache

Declare a group. (group of keys, shared between group of peers)

.code oscon-dl/groupcache.go /STARTGROUP/,/ENDGROUP/

- group name "thumbnail" must be globally unique
- 64 MB max per-node memory usage
- Sink is an interface with SetString, SetBytes, SetProto
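
A stand-in sketch of such a group declaration (generateThumbnail is a hypothetical helper; NewGroup, GetterFunc, and Sink are the real groupcache API):

	var thumbNails = groupcache.NewGroup("thumbnail", 64<<20, groupcache.GetterFunc(
		func(ctx groupcache.Context, key string, dest groupcache.Sink) error {
			fileName := key
			// Called only on a miss; once per key across
			// the whole group of peers.
			dest.SetBytes(generateThumbnail(fileName))
			return nil
		}))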

* Using groupcache

Request keys

.code oscon-dl/groupcache.go /STARTUSE/,/ENDUSE/

- might come from local memory cache
- might come from peer's memory cache
- might be computed locally
- might be computed remotely
- across all threads on all machines, only one thumbnail is computed; it's then fanned out in-process and across the network to all waiters
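
A stand-in sketch of such a lookup (ctx and the key are hypothetical; Get and AllocatingByteSliceSink are the real API):

	var data []byte
	err := thumbNails.Get(ctx, "big-file.jpg",
		groupcache.AllocatingByteSliceSink(&data))
	// On success, data holds the thumbnail bytes,
	// wherever they were cached or computed.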

* dl.google.com and groupcache

- Keys are "<blobref>-<chunk_offset>"
- Chunks are 2MB
- Chunks are cached in local memory (for self-owned and hot items),
- cached remotely on a peer, or
- fetched from Google storage systems
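
For example (hypothetical values): the third 2MB chunk of a blob "sha1-beef" starts at byte offset 4194304, so its key would be "sha1-beef-4194304".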

* dl.google.com interface composition

.code oscon-dl/sizereaderat.go /START_1/,/END_1/
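
The referenced snippet defines a small interface along these lines (a sketch; the exact code lives in sizereaderat.go):

	// SizeReaderAt is an io.ReaderAt that also reports
	// its total size.
	type SizeReaderAt interface {
		Size() int64
		io.ReaderAt
	}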

* io.SectionReader

.image oscon-dl/sectionreader.png
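
In text form, the constructor shown above (from package io):

	func NewSectionReader(r ReaderAt, off int64, n int64) *SectionReader

The returned SectionReader implements Read, Seek, and ReadAt over that section: exactly the ReadSeeker that http.ServeContent wants.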

* chunk-aligned ReaderAt

.code oscon-dl/chunkaligned.go /START_DOC/,/END_DOC/

- Caller can do ReadAt calls of any size and any offset
- `r` only sees ReadAt calls on 2MB offset boundaries, of size 2MB (unless final chunk)
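
A sketch of that wrapper's shape (the name and signature here are assumptions based on this slide; the real code lives in chunkaligned.go):

	// NewChunkAlignedReaderAt returns a SizeReaderAt backed by r
	// that buffers as needed so r itself only ever sees ReadAt
	// calls at chunkSize-aligned offsets, of length chunkSize
	// (except for the final chunk).
	func NewChunkAlignedReaderAt(r SizeReaderAt, chunkSize int) SizeReaderAt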

* Composing all this

- http.ServeContent wants a ReadSeeker
- io.SectionReader(ReaderAt + size) -> ReadSeeker
- Download server payloads are a type "content" with Size and ReadAt, implemented with calls to groupcache.
- Wrapped in a chunk-aligned ReaderAt
- ... and concatenate parts of payloads with MultiReaderAt

.play oscon-dl/server-compose.go /START/,/END/

* Things we get for free from net/http

- Last-Modified
- ETag
- Range requests (w/ all their paranoia)
- HTTP/1.1 chunking, etc.
- ... old server tried to do all this itself
- ... incorrectly
- ... incompletely
- ... in a dozen different copies

* Overall simplification

- deleted C++ payload_server & Python payload_fetcher
- 39 files (14,032 lines) deleted
- one binary now (just Go `payload_server`, no `payload_fetcher`)
- starts immediately, no huge start-up delay
- server is just "business logic" now, not HTTP logic

* From this...

.image oscon-dl/before.png

* ... to this.

.image oscon-dl/after.png

* And from pages and pages of this...

.image oscon-dl/cpp-writeerr.png

* ... to this

.image oscon-dl/after-code.png

* So how does it compare to C++?

- less than half the code
- more testable, more tests
- same CPU usage for the same bandwidth
- ... but can serve much more bandwidth
- ... and use more than one CPU
- less memory (!)
- no disk
- starts up instantly (not in 12-24 hours)
- doesn't crash
- handles hot download spikes

* Could we have just rewritten it in new C++?

- Sure.
- But why?

* Could I have just fixed the bugs in the C++ version?

- Sure, if I could find them.
- Then I'd have to own it ("You touched it last...")
- And I already maintain an HTTP server library; I don't want to maintain a bad one too.
- The Go version is much more maintainable. (and 3+ other people now help maintain it)

* How much of dl.google.com is closed-source?

- Very little.
- ... ACL policies
- ... RPCs to Google storage services.
- Most is open source:
- ... code.google.com/p/google-api-go-client/storage/v1beta1
- ... net/http and rest of Go standard library
- ... `groupcache`, now open source ([[https://github.com/golang/groupcache][github.com/golang/groupcache]])