content: add "Go GC: Prioritizing low latency and simplicity" post

Change-Id: Ib4003d5c5edd803e5b508296ea96fa5549da1dd8
Reviewed-on: https://go-review.googlesource.com/14095
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
diff --git a/content/go15gc.article b/content/go15gc.article
new file mode 100644
index 0000000..56f7568
--- /dev/null
+++ b/content/go15gc.article
@@ -0,0 +1,114 @@
+Go GC: Prioritizing low latency and simplicity
+31 Aug 2015
+
+Richard Hudson
+rlh@golang.org
+
+* The Setup
+
+Go is building a garbage collector (GC) not only for 2015 but for 2025 and
+beyond: A GC that supports today’s software development and scales along with
+new software and hardware throughout the next decade. Such a future has no
+place for stop-the-world GC pauses, which have been an impediment to broader
+uses of safe and secure languages such as Go.
+
+Go 1.5, the first glimpse of this future, achieves GC latencies well below the
+10 millisecond goal we set a year ago. We presented some impressive numbers
+in [[https://talks.golang.org/2015/go-gc.pdf][a talk at Gophercon]].
+The latency improvements have generated a lot of attention;
+Robin Verlangen’s blog post
+[[https://medium.com/@robin.verlangen/billions-of-request-per-day-meet-go-1-5-362bfefa0911][_Billions_of_requests_per_day_meet_Go_1.5_]]
+validates our direction with end to end results.
+We also particularly enjoyed
+[[https://twitter.com/inconshreveable/status/620650786662555648][Alan Shreve’s production server graphs]]
+and his "Holy 85% reduction" comment.
+
+Today 16 gigabytes of RAM costs $100 and CPUs come with many cores, each with
+multiple hardware threads. In a decade this hardware will seem quaint but the
+software being built in Go today will need to scale to meet expanding needs and
+the next big thing. Given that hardware will provide the power to increase
+throughput, Go’s garbage collector is being designed to favor low latency and
+tuning via only a single knob. Go 1.5 is the first big step down this path and
+these first steps will forever influence Go and the applications it best
+supports. This blog post gives a high-level overview of what we have done for
+the Go 1.5 collector.
+
+* The Embellishment
+
+To create a garbage collector for the next decade, we turned to an algorithm
+from decades ago. Go's new garbage collector is a _concurrent_, _tri-color_,
+_mark-sweep_ collector, an idea first proposed by
+[[http://dl.acm.org/citation.cfm?id=359655][Dijkstra in 1978]].
+This is a deliberate divergence from most "enterprise" grade garbage collectors
+of today, and one that we believe is well suited to the properties of modern
+hardware and the latency requirements of modern software.
+
+In a tri-color collector, every object is either white, grey, or black and we
+view the heap as a graph of connected objects. At the start of a GC cycle all
+objects are white. The GC visits all _roots_, which are objects directly
+accessible by the application such as globals and things on the stack, and
+colors these grey. The GC then chooses a grey object, blackens it, and then
+scans it for pointers to other objects. When this scan finds a pointer to a
+white object, it turns that object grey. This process repeats until there are
+no more grey objects. At this point, white objects are known to be unreachable
+and can be reused.
+
+This all happens concurrently with the application, known as the _mutator_,
+changing pointers while the collector is running. Hence, the mutator must
+maintain the invariant that no black object points to a white object, lest the
+garbage collector lose track of an object installed in a part of the heap it
+has already visited. Maintaining this invariant is the job of the
+_write_barrier_, which is a small function run by the mutator whenever a
+pointer in the heap is modified. Go’s write barrier colors the now-reachable
+object grey if it is currently white, ensuring that the garbage collector will
+eventually scan it for pointers.
+
+Deciding when the job of finding all grey objects is done is subtle and can be
+expensive and complicated if we want to avoid blocking the mutators. To keep
+things simple Go 1.5 does as much work as it can concurrently and then briefly
+stops the world to inspect all potential sources of grey objects. Finding the
+sweet spot between the time needed for this final stop-the-world and the total
+amount of work that this GC does is a major deliverable for Go 1.6.
+
+Of course the devil is in the details. When do we start a GC cycle? What
+metrics do we use to make that decision? How should the GC interact with the Go
+scheduler? How do we pause a mutator thread long enough to scan its stack?
+ How do we represent white, grey, and black so we can efficiently find and scan
+grey objects? How do we know where the roots are? How do we know where in an
+object pointers are located? How do we minimize memory fragmentation? How do we
+deal with cache performance issues? How big should the heap be? And on and on,
+some related to allocation, some to finding reachable objects, some related to
+scheduling, but many related to performance. Low-level discussions of each of
+these areas are beyond the scope of this blog post.
+
+At a higher level, one approach to solving performance problems is to add GC
+knobs, one for each performance issue. The programmer can then turn the knobs
+in search of appropriate settings for their application. The downside is that
+after a decade with one or two new knobs each year you end up with the GC Knobs
+Turner Employment Act. Go is not going down that path. Instead we provide a
+single knob, called GOGC. This value controls the total size of the heap
+relative to the size of reachable objects. The default value of 100 means that
+total heap size is now 100% bigger than (i.e., twice) the size of the reachable
+objects after the last collection. 200 means total heap size is 200% bigger
+than (i.e., three times) the size of the reachable objects. If you want to
+lower the total time spent in GC, increase GOGC. If you want to trade more GC
+time for less memory, lower GOGC.
+
+More importantly as RAM doubles with the next generation of hardware, simply
+doubling GOGC will halve the number of GC cycles. On the other hand since GOGC
+is based on reachable object size, doubling the load by doubling the reachable
+objects requires no retuning. The application just scales.
+Furthermore, unencumbered by ongoing support for dozens of knobs, the runtime
+team can focus on improving the runtime based on feedback from real customer
+applications.
+
+* The Punchline
+
+Go 1.5’s GC ushers in a future where stop-the-world pauses are no longer a
+barrier to moving to a safe and secure language. It is a future where
+applications scale effortlessly along with hardware and as hardware becomes
+more powerful the GC will not be an impediment to better, more scalable
+software. It’s a good place to be for the next decade and beyond.
+For more details about the 1.5 GC and how we eliminated latency issues see the
+[[https://www.youtube.com/watch?v=aiv1JOfMjm0][Go GC: Latency Problem Solved presentation]]
+or [[https://talks.golang.org/2015/go-gc.pdf][the slides]].