dl.google.com: Powered by Go
10:00 26 Jul 2013
Tags: download, oscon, port, c++, google, groupcache, caching
Brad Fitzpatrick
Gopher, Google
* Overview / tl;dw:
- serves Google downloads
- Was written in C++
- Now in Go
- Now much better
- Extensive, idiomatic use of Go's standard library
- ... which is all open source
- composition of interfaces is fun
- _groupcache_, now Open Source, handles group-aware caching and cache-filling
* too long...
* me
- Brad Fitzpatrick
- @bradfitz
- past: LiveJournal, memcached, OpenID, Perl stuff...
- nowadays: Go, Go, Camlistore, Go, anything & everything written in Go ...
* I love Go
- this isn't a talk about Go, sorry.
- but check it out.
- simple, powerful, fast, liberating, refreshing
- great mix of low- and high- level
- light on the page
- static binaries, easy to deploy
- not perfect, but my favorite language yet
* dl.google.com
- HTTP download server
- serves Chrome, Android SDK, Earth, much more
- Some huge, some tiny (e.g. WebGL white/blacklist JSON)
- behind an edge cache; still high traffic
- lots of datacenters, lots of bandwidth
* Why port?
* reason 0
    $ apt-get update
.image oscon-dl/slow.png
- embarrassing
- Google can't serve a 1,238 byte file?
- Hanging?
- 207 B/s?!
* Yeah, embarrassing, for years...
.image oscon-dl/crbug.png
* ... which led to:
- complaining on corp G+. Me: "We suck. This sucks."
- primary SRE owning it: "Yup, it sucks. And is unmaintained."
- "I'll rewrite it for you!"
- "Hah."
- "No, serious. That's kinda our job. But I get to do it in Go."
- (Go team's loan-out-a-Gopher program...)
* How hard can this be?
* few tricks
each "payload" (~URL) described by a protobuf:
- paths/patterns for its URL(s)
- go-live reveal date
- ACLs (geo, network, user, user type, ...)
- dynamic zip files
- custom HTTP headers
- custom caching
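A hypothetical sketch of that descriptor as a Go struct; the real definition is an internal protobuf, and every field name here is invented for illustration:

    package payload

    import "time"

    // Payload is a hypothetical stand-in for the internal protobuf.
    type Payload struct {
        URLPatterns   []string          // paths/patterns for its URL(s)
        GoLive        time.Time         // go-live reveal date
        ACLs          []string          // geo, network, user, user type, ...
        DynamicZip    bool              // build zip files on the fly
        CustomHeaders map[string]string // extra HTTP response headers
        CacheControl  string            // custom caching policy
    }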
* how it was
.image oscon-dl/before.png
* Aside: Why good code goes bad
* Why good code goes bad
- Premise: people don't suck
- Premise: code was once beautiful
- code tends towards complexity (gets worse)
- environment changes
- scale changes
* code complexity
- without regular love, code grows warts over time
- localized fixes and additions are easy & quick, but globally crappy
- features, hacks and workarounds added without docs or tests
- maintainers come & go,
- ... or just go.
* changing environment
- Google's infrastructure (hardware & software), like anybody's, is always changing
- properties of networks, storage
- design assumptions no longer make sense
- scale changes (design for 10x growth, rethink at 100x)
- new internal services (beta or non-existent then, dependable now)
- once-modern home-grown invented wheels might now look archaic
* so why did it suck?
.image oscon-dl/slow.png
- stalling its single-threaded event loop, blocking when it shouldn't
- capped at one CPU by design, but stalled so often it used only a fraction of even that one CPU
* but why?
- code was too complicated
- future maintainers slowly violated unwritten rules
- or knowingly violated them, assuming it couldn't be too bad?
- C++ single-threaded event-based callback spaghetti
- hard to know when/where code was running, or what "blocking" meant
* Old code
- served from local disk
- single-threaded event loop
- used sendfile(2) "for performance"
- sometimes tried to be clever and steal the fd from the "SelectServer" to call sendfile manually
- while also trying to do HTTP chunking,
- ... and HTTP range requests,
- ... and dynamic zip files,
- lots of duplicated copy/paste code paths
- many wrong/incomplete in different ways
* Mitigation solution?
- more complexity!
- ad hoc addition of more threads
- ... not really defined which threads did what,
- ... or what the ownership or locking rules were,
- no surprise: random crashes
* Summary of the 5-year-old code in 2012
- incomplete docs, tests
- stalling event loop
- ad-hoc threads...
- ... stalling event loops
- ... races
- ... crashes
- copy/paste code
- ... incomplete code
- two processes in the container
- ... different languages
* Environment changes
- Remember: on start, we had to copy all payloads to local disk
- in 2007, using local disk wasn't restricted
- in 2007, sum(payload size) was much smaller
- in 2012, containers get tiny % of local disk spindle time
- ... why aren't you using the cluster file systems like everybody else?
- ... cluster file systems own disk time on your machine, not you.
- in 2007, it started up quickly.
- in 2012, it started in 12-24 hours (!!!)
- ... hope we don't crash! (oh, whoops)
* Copying N bytes from A to B in event loop environments (node.js, this C++, etc)
- Can *A* read?
- Read up to _n_ bytes from A.
- What'd we get? _rn_
- _n_ -= _rn_
- Store those.
- Note that we want to write to *B* now.
- Can *B* write?
- Try to write _rn_ bytes to *B*. Got _wn_.
- buffered -= _wn_
- while (blah blah blah) { ... blah blah blah ... }
* Thought that sucked? Try to mix in other state / logic, and then write it in C++.
.image oscon-dl/cpp-write.png
.image oscon-dl/cpp-writeerr.png
.image oscon-dl/cpp-toggle.png
* Or in JavaScript...
- Or Python gevent, Twisted, ...
- Or Perl AnyEvent, etc.
- Unreadable, discontiguous code.
* Copying N bytes from A to B in Go:
.code oscon-dl/copy.go /START OMIT/,/END OMIT/
- dst is an _io.Writer_ (an interface type)
- src is an _io.Reader_ (an interface type)
- synchronous (blocks)
- Go runtime deals with making blocking efficient
- goroutines, epoll, user-space scheduler, ...
- easier to reason about
- fewer, easier, compatible APIs
- concurrency is a _language_ (not _library_) feature
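The copy.go snippet above isn't inlined on this page; its core is just one blocking call, sketched here:

    package main

    import (
        "io"
        "log"
        "os"
        "strings"
    )

    func main() {
        src := strings.NewReader("some payload bytes") // any io.Reader
        dst := os.Stdout                               // any io.Writer
        // One blocking call; the runtime multiplexes goroutines onto epoll.
        if _, err := io.Copy(dst, src); err != nil {
            log.Fatal(err)
        }
    }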
* Where to start?
- baby steps, not changing everything at once
- only port the `payload_server`, not the `payload_fetcher`
- read lots of old design docs
- read lots of C++ code
- port all command-line flags
- serve from local disk
- try to run integration tests
- while (fail) { debug, port, swear, ...}
* Notable stages
- pass integration tests
- run in a lightly-loaded datacenter
- audit mode
- ... mirror traffic to old & new servers; compare responses.
- drop all SWIG dependencies on C++ libraries
- ... use IP-to-geo lookup service, not static file + library
* Notable stages
- fetch blobs directly from blobstore, falling back to local disk on any errors,
- relying entirely on blobstore, but `payload_fetcher` still running
- disable `payload_fetcher` entirely; fast start-up time.
* Using Go's Standard Library
* Using Go's Standard Library
- mostly just uses the standard library
* Go's Standard Library
- net/http
- io
- [[http://golang.org/pkg/net/http/#ServeContent][http.ServeContent]]
* Hello World
.play oscon-dl/server-hello.go
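A minimal stand-in for server-hello.go, which isn't inlined here (the greeting text is a guess):

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintf(w, "Hello, OSCON!\n")
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }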
* File Server
.play oscon-dl/server-fs.go
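Likewise, a stand-in for server-fs.go (the directory is a placeholder):

    package main

    import (
        "log"
        "net/http"
    )

    func main() {
        // Serve the files under /tmp at the URL root.
        log.Fatal(http.ListenAndServe(":8080", http.FileServer(http.Dir("/tmp"))))
    }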
* http.ServeContent
.image oscon-dl/servecontent.png
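The signature in that screenshot, for reference (the real net/http declaration, without its doc comment):

    func ServeContent(w ResponseWriter, req *Request, name string,
        modtime time.Time, content io.ReadSeeker)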
* io.Reader, io.Seeker
.image oscon-dl/readseeker.png
.image oscon-dl/reader.png
.image oscon-dl/seeker.png
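Those screenshots show the io package's own definitions, reproduced here:

    type Reader interface {
        Read(p []byte) (n int, err error)
    }

    type Seeker interface {
        Seek(offset int64, whence int) (int64, error)
    }

    type ReadSeeker interface {
        Reader
        Seeker
    }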
* http.ServeContent
    $ curl -H "Range: bytes=5-" http://localhost:8080
.play oscon-dl/server-content.go
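A minimal stand-in for server-content.go (file name and content are placeholders):

    package main

    import (
        "log"
        "net/http"
        "strings"
        "time"
    )

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            // strings.Reader implements io.ReadSeeker, so ServeContent
            // can answer Range requests like the curl above.
            http.ServeContent(w, r, "foo.txt", time.Now(),
                strings.NewReader("I am some content.\n"))
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }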
* groupcache
* groupcache
- memcached alternative / replacement
- [[https://github.com/golang/groupcache][github.com/golang/groupcache]]
- _library_ that is both a client & server
- connects to its peers
- coordinated cache filling (no thundering herds on miss)
- replication of hot items
* Using groupcache
Declare who you are and who your peers are.
.code oscon-dl/groupcache.go /STARTINIT/,/ENDINIT/
This peer interface is pluggable. (e.g. inside Google it's automatic.)
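A sketch using the open-source HTTP peer transport; the addresses are placeholders:

    package main

    import "github.com/golang/groupcache"

    func main() {
        // Each process declares its own URL and the full peer set.
        me := "http://10.0.0.1:8000"
        peers := groupcache.NewHTTPPool(me)
        peers.Set("http://10.0.0.1:8000", "http://10.0.0.2:8000", "http://10.0.0.3:8000")
    }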
* Using groupcache
Declare a group. (group of keys, shared between group of peers)
.code oscon-dl/groupcache.go /STARTGROUP/,/ENDGROUP/
- group name "thumbnail" must be globally unique
- 64 MB max per-node memory usage
- Sink is an interface with SetString, SetBytes, SetProto
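A sketch of such a group declaration, using the 2013-era open-source API (today's repo takes a context.Context instead of groupcache.Context; `generateThumbnail` is a hypothetical helper):

    package main

    import "github.com/golang/groupcache"

    // generateThumbnail is a hypothetical stand-in for real work.
    func generateThumbnail(fileName string) []byte { return []byte("<thumbnail bytes>") }

    // 64<<20 = 64 MB max per-node memory for this group.
    var thumbNails = groupcache.NewGroup("thumbnail", 64<<20, groupcache.GetterFunc(
        func(ctx groupcache.Context, key string, dest groupcache.Sink) error {
            // Called on cache miss, once per key across the peer group.
            return dest.SetBytes(generateThumbnail(key))
        }))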
* Using groupcache
Request keys
.code oscon-dl/groupcache.go /STARTUSE/,/ENDUSE/
- might come from local memory cache
- might come from peer's memory cache
- might be computed locally
- might be computed remotely
- of all threads on all machines, only one thumbnail is made, then fanned out in-process and across-network to all waiters
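The lookup side, sketched with the same era's API:

    package main

    import "github.com/golang/groupcache"

    func getThumbnail(g *groupcache.Group, key string) ([]byte, error) {
        var data []byte
        // Blocks; the value may be local, remote, or computed exactly once.
        err := g.Get(nil, key, groupcache.AllocatingByteSliceSink(&data))
        return data, err
    }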
* and groupcache
- Keys are "<blobref>-<chunk_offset>"
- Chunks are 2MB
- Chunks cached from local memory (for self-owned and hot items),
- Chunks cached remotely, or
- Chunks fetched from Google storage systems
* interface composition
.code oscon-dl/sizereaderat.go /START_1/,/END_1/
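The idea in sizereaderat.go, sketched: a ReaderAt that also knows its total size, which is exactly the pair io.NewSectionReader needs.

    package sizereaderat

    import "io"

    // SizeReaderAt is an io.ReaderAt that also knows its total size.
    type SizeReaderAt interface {
        Size() int64
        io.ReaderAt
    }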
* io.SectionReader
.image oscon-dl/sectionreader.png
* chunk-aligned ReaderAt
.code oscon-dl/chunkaligned.go /START_DOC/,/END_DOC/
- Caller can do ReadAt calls of any size and any offset
- `r` only sees ReadAt calls on 2MB offset boundaries, of size 2MB (unless final chunk)
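A sketch of such a wrapper; this is not the talk's actual chunkaligned.go, and the names and chunk size are illustrative:

    package chunkaligned

    import "io"

    // SizeReaderAt is an io.ReaderAt that knows its total size.
    type SizeReaderAt interface {
        Size() int64
        io.ReaderAt
    }

    // chunkAlignedReaderAt turns arbitrary ReadAt calls into whole,
    // chunk-aligned ReadAt calls on the underlying reader.
    type chunkAlignedReaderAt struct {
        r         SizeReaderAt
        chunkSize int64 // e.g. 2 << 20 for 2MB chunks
    }

    func (c *chunkAlignedReaderAt) Size() int64 { return c.r.Size() }

    func (c *chunkAlignedReaderAt) ReadAt(p []byte, off int64) (n int, err error) {
        for n < len(p) {
            if off >= c.Size() {
                return n, io.EOF
            }
            chunkOff := off - off%c.chunkSize // align down to a chunk boundary
            chunkLen := c.chunkSize
            if remain := c.Size() - chunkOff; remain < chunkLen {
                chunkLen = remain // the final chunk may be short
            }
            buf := make([]byte, chunkLen)
            if _, err := c.r.ReadAt(buf, chunkOff); err != nil && err != io.EOF {
                return n, err
            }
            n += copy(p[n:], buf[off-chunkOff:])
            off = chunkOff + chunkLen
        }
        return n, nil
    }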
* Composing all this
- http.ServeContent wants a ReadSeeker
- io.SectionReader(ReaderAt + size) -> ReadSeeker
- Download server payloads are a type "content" with Size and ReadAt, implemented with calls to groupcache.
- Wrapped in a chunk-aligned ReaderAt
- ... concatenate parts of payloads with MultiReaderAt
.play oscon-dl/server-compose.go /START/,/END/
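A minimal stand-in for server-compose.go; the in-memory reader stands in for the groupcache-backed content type:

    package main

    import (
        "io"
        "log"
        "net/http"
        "strings"
        "time"
    )

    func main() {
        // strings.Reader provides ReadAt + Size, like the groupcache-backed
        // "content" type; SectionReader turns that into the ReadSeeker
        // http.ServeContent wants.
        content := strings.NewReader("pretend these bytes come from groupcache chunks")
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            rs := io.NewSectionReader(content, 0, content.Size())
            http.ServeContent(w, r, "payload", time.Time{}, rs)
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }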
* Things we get for free from net/http
- Last-Modified
- ETag
- Range requests (with net/http's paranoid validation)
- HTTP/1.1 chunking, etc.
- ... old server tried to do all this itself
- ... incorrectly
- ... incompletely
- ... in a dozen different copies
* Overall simplification
- deleted C++ payload_server & Python payload_fetcher
- 39 files (14,032 lines) deleted
- one binary now (just Go `payload_server`, no `payload_fetcher`)
- starts immediately, no huge start-up delay
- server is just "business logic" now, not HTTP logic
* From this...
.image oscon-dl/before.png
* ... to this.
.image oscon-dl/after.png
* And from page and pages of this...
.image oscon-dl/cpp-writeerr.png
* ... to this
.image oscon-dl/after-code.png
* So how does it compare to C++?
- less than half the code
- more testable, tests
- same CPU usage for same bandwidth
- ... but can do much more bandwidth
- ... and more than one CPU
- less memory (!)
- no disk
- starts up instantly (not 24 hours)
- doesn't crash
- handles hot download spikes
* Could we have just rewritten it in new C++?
- Sure.
- But why?
* Could I have just fixed the bugs in the C++ version?
- Sure, if I could find them.
- Then I'd have to own it ("You touched it last...")
- And I already maintain an HTTP server library. Don't want to maintain a bad one too.
- The Go version is much more maintainable (and 3+ other people now help maintain it)
* How much of it is closed-source?
- Very little.
- ... ACL policies
- ... RPCs to Google storage services.
- Most is open source:
- ...
- ... net/http and rest of Go standard library
- ... `groupcache`, now open source ([[https://github.com/golang/groupcache][github.com/golang/groupcache]])