dl.google.com: Powered by Go
10:00 26 Jul 2013
Tags: download, oscon, port, c++, google, groupcache, caching

Brad Fitzpatrick
Gopher, Google
@bradfitz
bradfitz@golang.org
http://bradfitz.com/
http://golang.org/
https://github.com/golang/groupcache/

* Overview / tl;dw:

- dl.google.com serves Google downloads
- was written in C++
- now in Go
- now much better
- extensive, idiomatic use of Go's standard library
- ... which is all open source
- composition of interfaces is fun
- _groupcache_, now open source, handles group-aware caching and cache-filling

* too long...

* me

- Brad Fitzpatrick
- bradfitz.com
- @bradfitz
- past: LiveJournal, memcached, OpenID, Perl stuff...
- nowadays: Go, Go, Camlistore, Go, anything & everything written in Go ...

* I love Go

- this isn't a talk about Go, sorry.
- but check it out.
- simple, powerful, fast, liberating, refreshing
- great mix of low- and high-level
- light on the page
- static binaries, easy to deploy
- not perfect, but my favorite language yet

* dl.google.com

* dl.google.com

- HTTP download server
- serves Chrome, Android SDK, Earth, much more
- some huge, some tiny (e.g. WebGL white/blacklist JSON)
- behind an edge cache; still high traffic
- lots of datacenters, lots of bandwidth

* Why port?

* reason 0

    $ apt-get update

.image oscon-dl/slow.png

- embarrassing
- Google can't serve a 1,238-byte file?
- hanging?
- 207 B/s?!

* Yeah, embarrassing, for years...

.image oscon-dl/crbug.png

* ... which led to:

- complaining on corp G+. Me: "We suck. This sucks."
- primary SRE owning it: "Yup, it sucks. And is unmaintained."
- "I'll rewrite it for you!"
- "Hah."
- "No, serious. That's kinda our job. But I get to do it in Go."
- (Go team's loan-out-a-Gopher program...)

* How hard can this be?

* dl.google.com: a few tricks

each "payload" (~URL) is described by a protobuf:

- paths/patterns for its URL(s)
- go-live reveal date
- ACLs (geo, network, user, user type, ...)
- dynamic zip files
- custom HTTP headers
- custom caching

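As a purely hypothetical sketch (the real protobuf is internal; every name below is made up), that per-payload metadata might look something like this in Go:

    package payload

    import "time"

    // Payload is a hypothetical illustration only, not the real
    // dl.google.com configuration schema.
    type Payload struct {
        URLPatterns  []string          // paths/patterns for its URL(s)
        GoLive       time.Time         // go-live reveal date
        ACLs         []string          // geo, network, user, user type, ...
        ZipEntries   []string          // members of a dynamically built zip file
        ExtraHeaders map[string]string // custom HTTP headers
        CacheControl string            // custom caching policy
    }
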
* dl.google.com: how it was

.image oscon-dl/before.png

* Aside: Why good code goes bad

* Why good code goes bad

- Premise: people don't suck
- Premise: code was once beautiful
- code tends towards complexity (gets worse)
- environment changes
- scale changes

* code complexity

- without regular love, code grows warts over time
- localized fixes and additions are easy & quick, but globally crappy
- features, hacks and workarounds added without docs or tests
- maintainers come & go,
- ... or just go.

* changing environment

- Google's infrastructure (hardware & software), like anybody's, is always changing
- properties of networks, storage
- design assumptions no longer make sense
- scale changes (design for 10x growth, rethink at 100x)
- new internal services (beta or non-existent then, dependable now)
- once-modern, home-grown reinvented wheels might now look archaic

* so why did it suck?

.image oscon-dl/slow.png

- stalling its single-threaded event loop, blocking when it shouldn't
- capped at one CPU by design, yet the stalls left it using only a fraction of that one

* but why?

- code was too complicated
- future maintainers slowly violated unwritten rules
- or knowingly violated them, assuming it couldn't be too bad?
- C++ single-threaded event-based callback spaghetti
- hard to know when/where code was running, or what "blocking" meant

* Old code

- served from local disk
- single-threaded event loop
- used sendfile(2) "for performance"
- tried to be clever and sometimes steal the fd from the "SelectServer" to call sendfile manually,
- ... while also trying to do HTTP chunking,
- ... and HTTP range requests,
- ... and dynamic zip files
- lots of duplicated copy/paste code paths
- many wrong/incomplete in different ways

* The mitigation?

- more complexity!
- ad hoc addition of more threads
- ... with no real definition of which threads did what,
- ... or what the ownership or locking rules were
- no surprise: random crashes

* Summary of 5-year-old code in 2012

- incomplete docs, tests
- stalling event loop
- ad hoc threads...
- ... stalling event loops
- ... races
- ... crashes
- copy/paste code
- ... incomplete code
- two processes in the container
- ... in different languages

* Environment changes

- remember: on start, we had to copy all payloads to local disk
- in 2007, using local disk wasn't restricted
- in 2007, sum(payload size) was much smaller
- in 2012, containers get a tiny % of local disk spindle time
- ... why aren't you using the cluster file systems like everybody else?
- ... cluster file systems own the disk time on your machine, not you
- in 2007, it started up quickly
- in 2012, start-up took 12-24 hours (!!!)
- ... hope we don't crash! (oh, whoops)

* Copying N bytes from A to B in event loop environments (node.js, this C++, etc.)

- Can *A* read?
- Read up to _n_ bytes from A.
- What'd we get? _rn_
- _n_ -= _rn_
- Store those.
- Note we want to write to *B* now.
- Can *B* write?
- Try to write _rn_ bytes to *B*. Got _wn_.
- buffered -= _wn_
- while (blah blah blah) { ... blah blah blah ... }

* Thought that sucked? Try to mix in other state / logic, and then write it in C++.

*

.image oscon-dl/cpp-write.png

*

.image oscon-dl/cpp-writeerr.png

*

.image oscon-dl/cpp-toggle.png

* Or in JavaScript...

- [[https://github.com/nodejitsu/node-http-proxy/blob/master/lib/node-http-proxy/http-proxy.js]]
- Or Python gevent, Twisted, ...
- Or Perl AnyEvent, etc.
- Unreadable, discontiguous code.

* Copying N bytes from A to B in Go

.code oscon-dl/copy.go /START OMIT/,/END OMIT/

- dst is an _io.Writer_ (an interface type)
- src is an _io.Reader_ (an interface type)
- synchronous (blocks)
- Go runtime deals with making blocking efficient
- goroutines, epoll, user-space scheduler, ...
- easier to reason about
- fewer, easier, compatible APIs
- concurrency is a _language_ (not _library_) feature

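The snippet referenced above isn't reproduced in this text; its essence is a single io.Copy call. A minimal, runnable sketch:

    package main

    import (
        "io"
        "log"
        "os"
        "strings"
    )

    func main() {
        src := strings.NewReader("N bytes from A to B\n") // any io.Reader
        dst := os.Stdout                                  // any io.Writer
        n, err := io.Copy(dst, src) // blocks until src is drained or an error occurs
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("copied %d bytes", n)
    }
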
* Where to start?

- baby steps, not changing everything at once
- only port the `payload_server`, not the `payload_fetcher`
- read lots of old design docs
- read lots of C++ code
- port all command-line flags
- serve from local disk
- try to run integration tests
- while (fail) { debug, port, swear, ... }

* Notable stages

- pass integration tests
- run in a lightly-loaded datacenter
- audit mode
- ... mirror traffic to old & new servers; compare responses
- drop all SWIG dependencies on C++ libraries
- ... use an IP-to-geo lookup service, not a static file + library

* Notable stages

- fetch blobs directly from blobstore, falling back to local disk on any errors
- rely entirely on blobstore, with `payload_fetcher` still running
- disable `payload_fetcher` entirely; fast start-up time

* Using Go's Standard Library

* Using Go's Standard Library

- dl.google.com mostly just uses the standard library

* Go's Standard Library

- net/http
- io
- [[http://golang.org/pkg/net/http/#ServeContent][http.ServeContent]]

* Hello World

.play oscon-dl/server-hello.go

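The referenced .play file isn't included in this text; a minimal equivalent (the greeting text is illustrative):

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintf(w, "Hello, OSCON!\n")
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }
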
* File Server

.play oscon-dl/server-fs.go

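Again, the referenced file isn't shown; a minimal equivalent serving a local directory (the path is illustrative):

    package main

    import (
        "log"
        "net/http"
    )

    func main() {
        fs := http.FileServer(http.Dir("/tmp")) // any directory
        log.Fatal(http.ListenAndServe(":8080", fs))
    }
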
* http.ServeContent

.image oscon-dl/servecontent.png

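For reference, the signature shown above, from package net/http:

    func ServeContent(w ResponseWriter, req *Request, name string,
        modtime time.Time, content io.ReadSeeker)
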
* io.Reader, io.Seeker

.image oscon-dl/readseeker.png
.image oscon-dl/reader.png
.image oscon-dl/seeker.png

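And the interfaces it builds on, from package io:

    type ReadSeeker interface {
        Reader
        Seeker
    }

    type Reader interface {
        Read(p []byte) (n int, err error)
    }

    type Seeker interface {
        Seek(offset int64, whence int) (int64, error)
    }
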
* http.ServeContent

    $ curl -H "Range: bytes=5-" http://localhost:8080

.play oscon-dl/server-content.go

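The referenced file isn't shown here; a minimal equivalent that answers the Range request above (the file name and content are illustrative):

    package main

    import (
        "log"
        "net/http"
        "strings"
        "time"
    )

    func main() {
        modTime := time.Now()
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            content := strings.NewReader("Hello, OSCON!\n") // an io.ReadSeeker
            // ServeContent handles Range, If-Modified-Since, ETag, etc.
            http.ServeContent(w, r, "hello.txt", modTime, content)
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }
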
* groupcache

* groupcache

- memcached alternative / replacement
- [[http://github.com/golang/groupcache]]
- a _library_ that is both a client and a server
- connects to its peers
- coordinated cache filling (no thundering herd on miss)
- replication of hot items

* Using groupcache

Declare who you are and who your peers are.

.code oscon-dl/groupcache.go /STARTINIT/,/ENDINIT/

This peer interface is pluggable. (e.g. inside Google it's automatic.)

* Using groupcache

Declare a group (a group of keys, shared between a group of peers).

.code oscon-dl/groupcache.go /STARTGROUP/,/ENDGROUP/

- the group name "thumbnail" must be globally unique
- 64 MB max per-node memory usage
- Sink is an interface with SetString, SetBytes, SetProto

* Using groupcache

Request keys.

.code oscon-dl/groupcache.go /STARTUSE/,/ENDUSE/

- might come from the local memory cache
- might come from a peer's memory cache
- might be computed locally
- might be computed remotely
- across all threads on all machines, only one thumbnail is computed, then fanned out in-process and across the network to all waiters

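The groupcache.go snippets referenced above aren't reproduced in this text. A consolidated sketch of the three steps, using the groupcache API as it was at the time of this talk (generateThumbnail is a made-up helper):

    package main

    import (
        "log"

        "github.com/golang/groupcache"
    )

    // generateThumbnail is a hypothetical stand-in for real work.
    func generateThumbnail(fileName string) []byte {
        return []byte("thumbnail bytes for " + fileName)
    }

    func main() {
        // 1. Declare who you are and who your peers are.
        me := "http://10.0.0.1"
        peers := groupcache.NewHTTPPool(me)
        peers.Set("http://10.0.0.1", "http://10.0.0.2", "http://10.0.0.3")

        // 2. Declare a group: a 64 MB cache of thumbnail keys.
        thumbNails := groupcache.NewGroup("thumbnail", 64<<20, groupcache.GetterFunc(
            func(ctx groupcache.Context, key string, dest groupcache.Sink) error {
                // Runs on a miss; once per key across the whole group.
                return dest.SetBytes(generateThumbnail(key))
            }))

        // 3. Request keys. The answer may come from local memory, a
        // peer's memory, or the getter above (run locally or remotely).
        var data []byte
        if err := thumbNails.Get(nil, "big-file.jpg",
            groupcache.AllocatingByteSliceSink(&data)); err != nil {
            log.Fatal(err)
        }
        log.Printf("got %d bytes", len(data))
    }
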
* dl.google.com and groupcache

- Keys are "<blobref>-<chunk_offset>"
- Chunks are 2MB
- Chunks come from local memory (for self-owned and hot items),
- ... from a peer's memory cache, or
- ... from Google storage systems

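A hypothetical sketch of that layering, using the same period groupcache API as the previous sketch. None of these names are the real dl.google.com code, and the group's getter here fakes the real storage fetch:

    package main

    import (
        "fmt"
        "io"

        "github.com/golang/groupcache"
    )

    const chunkSize = 2 << 20 // 2MB

    // chunks' getter would fetch "<blobref>-<chunk_offset>" keys
    // from Google storage systems; faked here with zero bytes.
    var chunks = groupcache.NewGroup("dl-chunks", 64<<20, groupcache.GetterFunc(
        func(ctx groupcache.Context, key string, dest groupcache.Sink) error {
            return dest.SetBytes(make([]byte, chunkSize))
        }))

    // chunkReaderAt reads a payload's bytes out of groupcache.
    type chunkReaderAt struct {
        blobref string // names the payload's bytes
        size    int64
    }

    func (c *chunkReaderAt) Size() int64 { return c.size }

    // ReadAt assumes callers read on 2MB boundaries; the chunk-aligned
    // wrapper on a later slide guarantees exactly that.
    func (c *chunkReaderAt) ReadAt(p []byte, off int64) (n int, err error) {
        key := fmt.Sprintf("%s-%d", c.blobref, off)
        var chunk []byte
        if err := chunks.Get(nil, key, groupcache.AllocatingByteSliceSink(&chunk)); err != nil {
            return 0, err
        }
        n = copy(p, chunk)
        if n < len(p) {
            err = io.EOF // short final chunk
        }
        return
    }

    func main() {
        r := &chunkReaderAt{blobref: "sha1-deadbeef", size: 10 << 20}
        buf := make([]byte, 16)
        if _, err := r.ReadAt(buf, 2<<20); err != nil {
            panic(err)
        }
    }
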
* dl.google.com interface composition

.code oscon-dl/sizereaderat.go /START_1/,/END_1/

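The referenced file isn't reproduced here; the idea is essentially a ReaderAt that also knows its total size, plus an io.MultiReader-style concatenator for ReaderAts:

    // SizeReaderAt is an io.ReaderAt that also knows its total size.
    type SizeReaderAt interface {
        Size() int64
        io.ReaderAt
    }

    // NewMultiReaderAt is like io.MultiReader, but for ReaderAts:
    //
    //     func NewMultiReaderAt(parts ...SizeReaderAt) SizeReaderAt
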
* io.SectionReader

.image oscon-dl/sectionreader.png

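io.NewSectionReader(r io.ReaderAt, off, n int64) adapts a ReaderAt into a *io.SectionReader, which implements Read, Seek, and ReadAt. A tiny demonstration:

    package main

    import (
        "fmt"
        "io"
        "strings"
    )

    func main() {
        var ra io.ReaderAt = strings.NewReader("hello, gophers")
        // ReaderAt + size -> ReadSeeker:
        var rs io.ReadSeeker = io.NewSectionReader(ra, 0, 14)
        rs.Seek(7, io.SeekStart)
        b, _ := io.ReadAll(rs)
        fmt.Printf("%s\n", b) // gophers
    }
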
* chunk-aligned ReaderAt

.code oscon-dl/chunkaligned.go /START_DOC/,/END_DOC/

- Caller can do ReadAt calls of any size and any offset
- `r` only sees ReadAt calls on 2MB offset boundaries, of size 2MB (unless final chunk)

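Only the wrapper's doc comment is referenced above. One plausible sketch of such a wrapper (assuming the SizeReaderAt interface from the composition slide, and a toy 4-byte chunk size so the example is visible):

    package main

    import (
        "fmt"
        "io"
        "strings"
    )

    type SizeReaderAt interface {
        Size() int64
        io.ReaderAt
    }

    // chunkAligned converts arbitrary ReadAt calls into chunk-aligned
    // ReadAt calls on r. A sketch, not the talk's actual implementation.
    type chunkAligned struct {
        r         SizeReaderAt
        chunkSize int64 // 2MB on dl.google.com
    }

    func (c *chunkAligned) Size() int64 { return c.r.Size() }

    func (c *chunkAligned) ReadAt(p []byte, off int64) (n int, err error) {
        if off >= c.Size() {
            return 0, io.EOF
        }
        for len(p) > 0 {
            chunkStart := off - off%c.chunkSize
            size := c.chunkSize // possibly short, final chunk
            if rem := c.Size() - chunkStart; rem < size {
                size = rem
            }
            chunk := make([]byte, size) // the real code would cache this read
            if _, err := c.r.ReadAt(chunk, chunkStart); err != nil {
                return n, err
            }
            m := copy(p, chunk[off-chunkStart:])
            n, off, p = n+m, off+int64(m), p[m:]
            if off == c.Size() && len(p) > 0 {
                return n, io.EOF
            }
        }
        return n, nil
    }

    func main() {
        src := strings.NewReader("0123456789") // *strings.Reader has ReadAt and Size
        ca := &chunkAligned{r: src, chunkSize: 4}
        buf := make([]byte, 5)
        n, err := ca.ReadAt(buf, 3)               // spans the [0,4) and [4,8) chunks
        fmt.Printf("%q %d %v\n", buf[:n], n, err) // "34567" 5 <nil>
    }
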
* Composing all this

- http.ServeContent wants a ReadSeeker
- io.SectionReader (ReaderAt + size) -> ReadSeeker
- download server payloads are a `content` type with Size and ReadAt methods, implemented with calls to groupcache
- wrapped in a chunk-aligned ReaderAt,
- ... with parts concatenated by a MultiReaderAt

.play oscon-dl/server-compose.go /START/,/END/

* Things we get for free from net/http

- Last-Modified
- ETag
- Range requests (with all their paranoia)
- HTTP/1.1 chunking, etc.
- ... the old server tried to do all this itself
- ... incorrectly
- ... incompletely
- ... in a dozen different copies

* Overall simplification

- deleted the C++ `payload_server` & the Python `payload_fetcher`
- 39 files (14,032 lines) deleted
- one binary now (just the Go `payload_server`, no `payload_fetcher`)
- starts immediately, no huge start-up delay
- server is just "business logic" now, not HTTP logic

* From this...

.image oscon-dl/before.png

* ... to this.

.image oscon-dl/after.png

* And from pages and pages of this...

.image oscon-dl/cpp-writeerr.png

* ... to this.

.image oscon-dl/after-code.png

* So how does it compare to C++?

- less than half the code
- more testable, more tests
- same CPU usage for the same bandwidth
- ... but it can do much more bandwidth
- ... and can use more than one CPU
- less memory (!)
- no disk
- starts up instantly (not in 24 hours)
- doesn't crash
- handles hot download spikes

* Could we have just rewritten it in new C++?

- Sure.
- But why?

* Could I have just fixed the bugs in the C++ version?

- Sure, if I could find them.
- But then I'd have to own it. ("You touched it last...")
- And I already maintain an HTTP server library. I don't want to maintain a bad one too.
- The Go version is much more maintainable. (And 3+ other people now maintain it.)

* How much of dl.google.com is closed-source?

- Very little:
- ... ACL policies
- ... RPCs to Google storage services
- Most is open source:
- ... code.google.com/p/google-api-go-client/storage/v1beta1
- ... net/http and the rest of the Go standard library
- ... `groupcache`, now open source ([[https://github.com/golang/groupcache][github.com/golang/groupcache]])
