| Static analysis tools |
| for Go code comprehension and refactoring |
| |
| golang nyc meetup |
| 13 Nov 2014 |
| |
| Alan Donovan |
| Google, New York |
| adonovan@google.com |
| |
| * Video |
| |
| This talk was presented at the GothamGo Kickoff Meetup in New York City, November 2014. |
| |
| .link http://vimeo.com/114736889 Watch the talk on Vimeo |
| |
| * Introduction |
| |
| Programmers say "writing code", but most of that time is spent _reading_ |
| |
| Actually: reading code, and making logical deductions |
| |
| Machines are good at reading and logic |
| |
| *Machines* *should* *be* *helping* *us* *read* *code.* |
| |
| And write it! Refactoring can be tedious and error-prone. |
| |
| |
| * Outline |
| |
| - Code comprehension tools: `oracle`, `godoc` |
| - Analysis technology |
| - Refactoring tools: `gorename`, `eg` |
| |
| |
| * Demo: the Go oracle |
| |
| Supports interactive, editor-integrated queries: |
| - where is this thing defined? |
| - what are the methods of this type? |
| - what are the free variables of this selection? |
| - what might this expression point to? |
| - what dynamic types might this interface or `reflect.Value` contain? |
| |
| |
| * Demo: godoc analysis features |
| |
| .link /lib/godoc/analysis/help.html godoc -analysis=type,pointer |
| |
| Package view |
| - method set and _implements_ relation for every type |
| - internal call graph of every package |
| |
| Source code view |
| - kind, type, definition of every identifier |
| - method set and _implements_ relation for every type |
| - peers of every channel send/receive |
| - callers of every function |
| - callees of every call site (even dynamic) |
| |
| |
| * How it works |
| |
| |
| |
| * Libraries and tools |
| |
| .image static-analysis/tools.svg |
| |
| Mostly in `golang.org/x/tools` repo |
| |
| |
| * go/types: the Go type checker |
| |
| Computes mappings from: |
| - identifiers to definitions |
| - constant expressions to values |
| - expressions to types |
| |
| Many subtle corners: |
| - method set computation |
| - recursive interfaces |
| - shifts |
| |
| Making it correct, fast, and clean was a substantial project |
| |
| .link https://pkg.go.dev/golang.org/x/tools/go/types golang.org/x/tools/go/types |
| |
| Author: Robert Griesemer |
| |
| |
| |
| * go/ssa: intermediate representation (IR) |
| |
| Typed abstract syntax trees (ASTs) are too varied and subtle to analyze directly |
| |
| Programs are lowered into Static Single-Assignment form (SSA): |
| - simplifies dataflow analyses since _reaching_ _definitions_ are implicit |
| - invented 1991, now mainstream (gcc, llvm) |
| |
| All Go programs can be expressed using only ~30 basic instructions |
| |
| Simple, explicit, high-level, high source fidelity |
| |
| .link https://pkg.go.dev/golang.org/x/tools/go/ssa golang.org/x/tools/go/ssa |
| |
| The llgo project is using go/ssa as a front-end for LLVM |
| |
| |
| |
| * Demo: ssadump |
| |
| # Simple fibonacci function |
| # % ssadump -build=F fib.go basic |
| # % ssadump -build=FD fib.go debug info |
| # % ssadump -test -run unicode toy interpreter |
| |
| |
| |
| * go/pointer: pointer analysis |
| |
| Pointers complicate reasoning about program behaviour |
| - functions may be called dynamically |
| - a variable may be updated and accessed via different names ("aliases") |
| - runtime types are values too: `interface`, `reflect.Type` |
| |
| We use *pointer* *analysis* to answer the question: |
| which variables might this pointer point to? |
| |
| .link https://pkg.go.dev/golang.org/x/tools/go/pointer golang.org/x/tools/go/pointer |
| |
| # comment on go's appropriateness for this analysis: |
| # (closed program---no dlopen, classloading, no generics, typesafe) |
| |
| |
| * Pointer analysis |
| |
| Let `pts(p)` be the _points-to_ _set_ of pointer p |
| Its elements are abstract variables: globals, locals, unnamed (`new(T)`) |
| |
| Overview: |
| |
| - analyze each SSA statement in the whole program |
| |
| - generate appropriate constraints on the points-to set of each variable |
| |
| Statement Constraint |
| y = &x pts(y) ⊇ {x} |
| y = x pts(y) ⊇ pts(x) "inclusion-based" |
| *y = x ∀o ∈ pts(y). pts(o) ⊇ pts(x) |
| y = *x ∀o ∈ pts(x). pts(y) ⊇ pts(o) |
| y = &x->f \ |
| y = x.(T) } not shown |
| reflection / |
| |
| - solve the constraint system |
| |
| |
| * Pointer analysis: constraint generation |
| |
| All Go operations can be expressed in these constraints |
| Function, map, slice and channel ops all simplify to struct ops |
| |
| Treatment of `unsafe.Pointer` conversion is unsound |
| But we don't care! This isn't a compiler |
| |
| The constraint system is: |
| - *field-sensitive*: struct fields x.f and x.g have distinct solutions |
| - *flow-insensitive*: the order of statements is ignored |
| - *context-insensitive*: each function is analyzed only once |
| - *whole-program*: Go source is required for all dependencies |
| - *inclusion-based*: i.e. Andersen's analysis |
| |
| Optimization: don't make constraints for "uninteresting" types |
| such as types not related to a one-shot Oracle query |
| |
| * Pointer analysis: pre-solver optimizations |
| |
| Constraint system for 124KLoC program (oracle) has: |
| |
| 254,000 variables |
| 184,000 equations |
| |
| Solving phase is in O(|v|³), so pre-solver optimisation is crucial |
| |
| We can bring this down to: |
| |
| 88,000 variables |
| 65,000 equations |
| |
| * Pointer Equivalence by Hash-Value Numbering (Hardekopf & Lin '07) |
| |
| p = &x a = &x if ... { |
| q = p b = a p, q = &x, &y |
| r = q c = b } else { |
| s = r if ... { a = c } p, q = &y, &x |
| } |
| |
| .image static-analysis/hvn.svg |
| |
| Hash-Value Numbering |
| |
| * Pointer analysis: solving |
| |
| It's transitive closure of a graph, essentially |
| |
| Lots of optimizations, for example: |
| |
| _sparse_bit_vectors_, a very compact representation for points-to sets |
| |
| .link https://pkg.go.dev/golang.org/x/tools/container/ints golang.org/x/tools/container/ints |
| |
| Solver log is >1GB. Debugging is fun. |
| |
| |
| #* Sparse bit vector |
| #`golang.org/x/tools/container/ints` |
| # |
| #.image sparsebitvector.svg |
| # |
| #Very compact representation of sparse `set<int>` |
| #Doubly-linked list of (offset int, data [256]bit) blocks. |
| |
| |
| * Call graph |
| |
| The *call* *graph* can be read directly from the solution |
| |
| The `golang.org/x/tools/cmd/callgraph` tool prints raw call graphs |
| |
| Demo (time permitting) |
| |
| |
| * Refactoring tools |
| |
| * gorename: precise, type-safe identifier renaming |
| |
| A refactoring tool, usable from... |
| |
| - the command line |
| |
| % gorename -from bytes.Buffer.Len -to Size |
| Renamed 105 occurrences in 42 files in 33 packages. |
| |
| - many editors ([[https://code.google.com/p/go/source/browse/refactor/rename/rename.el?repo=tools&r=511801758bb9a0b83f9bf931342889fbedbc9b48][Emacs]], [[http://quick.as/dgof2p7][Vim]], [[https://github.com/smartystreets/sublime-gorename][Sublime]], [[https://github.com/davidrjenni/agorn][Acme]]) |
| |
| All renamings are reversible |
| |
| Sound: either a conflict is reported, or the refactoring preserves behaviour* |
| |
| *except reflection |
| |
| .link https://pkg.go.dev/golang.org/x/tools/cmd/gorename golang.org/x/tools/cmd/gorename |
| |
| * Demo: gorename |
| |
| |
| * Example-based refactoring with eg |
| |
| A Go port of the Java _Refaster_ tool |
| |
| A template specifies the pattern and replacement as Go expressions: |
| |
| package P |
| |
| import ( "errors"; "fmt" ) |
| |
| func before(s string) error { return fmt.Errorf("%s", s) } |
| func after(s string) error { return errors.New(s) } |
| |
| Parameters are _wildcards_ |
| |
| % eg -t template.go <package> ... |
| |
| .link https://pkg.go.dev/golang.org/x/tools/cmd/eg golang.org/x/tools/cmd/eg |
| |
| * Demo: eg |
| |
| * Conclusion |
| |
| From the outset, Go has been renowned for its tools: `go`, `gofmt` |
| |
| We have many building blocks for sophisticated source analysis tools |
| |
| What should we build next? |