blob: 8a4f69bfc463821b10134d9f6c39031fed24905d [file] [log] [blame]
Static analysis tools
for Go code comprehension and refactoring
golang nyc meetup
13 Nov 2014
Alan Donovan
Google, New York
* Video
This talk was presented at the GothamGo Kickoff Meetup in New York City, November 2014.
.link Watch the talk on Vimeo
* Introduction
Programmers say "writing code", but most of that time is spent _reading_
Actually: reading code, and making logical deductions
Machines are good at reading and logic
*Machines* *should* *be* *helping* *us* *read* *code.*
And write it! Refactoring can be tedious and error-prone.
* Outline
- Code comprehension tools: `oracle`, `godoc`
- Analysis technology
- Refactoring tools: `gorename`, `eg`
* Demo: the Go oracle
Supports interactive, editor-integrated queries:
- where is this thing defined?
- what are the methods of this type?
- what are the free variables of this selection?
- what might this expression point to?
- what dynamic types might this interface or `reflect.Value` contain?
* Demo: godoc analysis features
.link godoc -analysis=type,pointer
Package view
- method set and _implements_ relation for every type
- internal call graph of every package
Source code view
- kind, type, definition of every identifier
- method set and _implements_ relation for every type
- peers of every channel send/receive
- callers of every function
- callees of every call site (even dynamic)
* How it works
* Libraries and tools
.image static-analysis/tools.svg
Mostly in `` repo
* go/types: the Go type checker
Computes mappings from:
- identifiers to definitions
- constant expressions to values
- expressions to types
Many subtle corners:
- method set computation
- recursive interfaces
- shifts
Making it correct, fast, and clean was a substantial project
Author: Robert Griesemer
* go/ssa: intermediate representation (IR)
Typed abstract syntax trees (ASTs) are too varied and subtle to analyze directly
Programs are lowered into Static Single-Assignment form (SSA):
- simplifies dataflow analyses since _reaching_ _definitions_ are implicit
- invented 1991, now mainstream (gcc, llvm)
All Go programs can be expressed using only ~30 basic instructions
Simple, explicit, high-level, high source fidelity
The llgo project is using go/ssa as a front-end for LLVM
* Demo: ssadump
# Simple fibonacci function
# % ssadump -build=F fib.go basic
# % ssadump -build=FD fib.go debug info
# % ssadump -test -run unicode toy interpreter
* go/pointer: pointer analysis
Pointers complicate reasoning about program behaviour
- functions may be called dynamically
- a variable may be updated and accessed via different names ("aliases")
- runtime types are values too: `interface`, `reflect.Type`
We use *pointer* *analysis* to answer the question:
which variables might this pointer point to?
# comment on go's appropriateness for this analysis:
# (closed program---no dlopen, classloading, no generics, typesafe)
* Pointer analysis
Let `pts(p)` be the _points-to_ _set_ of pointer p
Its elements are abstract variables: globals, locals, unnamed (`new(T)`)
- analyze each SSA statement in the whole program
- generate appropriate constraints on the points-to set of each variable
Statement Constraint
y = &x pts(y) {x}
y = x pts(y) pts(x) "inclusion-based"
*y = x o pts(y). pts(o) pts(x)
y = *x o pts(x). pts(y) pts(o)
y = &x->f \
y = x.(T) } not shown
reflection /
- solve the constraint system
* Pointer analysis: constraint generation
All Go operations can be expressed in these constraints
Function, map, slice and channel ops all simplify to struct ops
Treatment of `unsafe.Pointer` conversion is unsound
But we don't care! This isn't a compiler
The constraint system is:
- *field-sensitive*: struct fields x.f and x.g have distinct solutions
- *flow-insensitive*: the order of statements is ignored
- *context-insensitive*: each function is analyzed only once
- *whole-program*: Go source is required for all dependencies
- *inclusion-based*: i.e. Andersen's analysis
Optimization: don't make constraints for "uninteresting" types
such as types not related to a one-shot Oracle query
* Pointer analysis: pre-solver optimizations
Constraint system for 124KLoC program (oracle) has:
254,000 variables
184,000 equations
Solving phase is in O(|v|³), so pre-solver optimisation is crucial
We can bring this down to:
88,000 variables
65,000 equations
* Pointer Equivalence by Hash-Value Numbering (Hardekopf & Lin '07)
p = &x a = &x if ... {
q = p b = a p, q = &x, &y
r = q c = b } else {
s = r if ... { a = c } p, q = &y, &x
.image static-analysis/hvn.svg
Hash-Value Numbering
* Pointer analysis: solving
It's transitive closure of a graph, essentially
Lots of optimizations, for example:
_sparse_bit_vectors_, a very compact representation for points-to sets
Solver log is >1GB. Debugging is fun.
#* Sparse bit vector
#.image sparsebitvector.svg
#Very compact representation of sparse `set<int>`
#Doubly-linked list of (offset int, data [256]bit) blocks.
* Call graph
The *call* *graph* can be read directly from the solution
The `` tool prints raw call graphs
Demo (time permitting)
* Refactoring tools
* gorename: precise, type-safe identifier renaming
A refactoring tool, usable from...
- the command line
% gorename -from bytes.Buffer.Len -to Size
Renamed 105 occurrences in 42 files in 33 packages.
- many editors ([[][Emacs]], [[][Vim]], [[][Sublime]], [[][Acme]])
All renamings are reversible
Sound: either a conflict is reported, or the refactoring preserves behaviour*
*except reflection
* Demo: gorename
* Example-based refactoring with eg
A Go port of the Java _Refaster_ tool
A template specifies the pattern and replacement as Go expressions:
package P
import ( "errors"; "fmt" )
func before(s string) error { return fmt.Errorf("%s", s) }
func after(s string) error { return errors.New(s) }
Parameters are _wildcards_
% eg -t template.go <package> ...
* Demo: eg
* Conclusion
From the outset, Go has been renowned for its tools: `go`, `gofmt`
We have many building blocks for sophisticated source analysis tools
What should we build next?