| // Copyright 2013 The Go Authors. All rights reserved. |
| // Use of this source code is governed by a BSD-style |
| // license that can be found in the LICENSE file. |
| |
| /* |
| |
| Package pointer implements Andersen's analysis, an inclusion-based |
| pointer analysis algorithm first described in (Andersen, 1994). |
| |
| The implementation is similar to that described in (Pearce et al, |
| PASTE'04). Unlike many algorithms which interleave constraint |
| generation and solving, constructing the callgraph as they go, this |
| implementation has a strict phase ordering: generation before solving. |
| Only simple (copy) constraints may be generated during solving. This |
| improves the traction of presolver optimisations, but imposes certain |
| restrictions, e.g. potential context sensitivity is limited since all |
| variants must be created a priori. |
| |
| We intend to add various presolving optimisations such as Pointer and |
| Location Equivalence from (Hardekopf & Lin, SAS '07) and solver |
| optimisatisions such as Hybrid- and Lazy- Cycle Detection from |
| (Hardekopf & Lin, PLDI'07), |
| |
| |
| TERMINOLOGY |
| |
| We occasionally use C's x->f notation to distinguish the case where x |
| is a struct pointer from x.f where is a struct value. |
| |
| |
| NODES |
| |
| Nodes are the key datastructure of the analysis, and have a dual role: |
| they represent both constraint variables (equivalence classes of |
| pointers) and members of points-to sets (things that can be pointed |
| at, i.e. "labels"). |
| |
| Nodes are naturally numbered. The numbering enables compact |
| representations of sets of nodes such as bitvectors or BDDs; and the |
| ordering enables a very cheap way to group related nodes together. |
| (For example, passing n parameters consists of generating n parallel |
| constraints from caller+i to callee+i for 0<=i<n.) |
| |
| The zero nodeid means "not a pointer". Currently it's only used for |
| struct{} or (). We generate all flow constraints, even for non-pointer |
| types, with the expectations that (a) presolver optimisations will |
| quickly collapse all the non-pointer ones and (b) we may get more |
| precise results by treating uintptr as a possible pointer. |
| |
| Each node represents a scalar (word-sized) part of a value or object. |
| Aggregate types (structs, tuples, arrays) are recursively flattened |
| out into a sequential list of scalar component types, and all the |
| elements of an array are represented by a single node. (The |
| flattening of a basic type is a list containing a single node.) |
| |
| Nodes are connected into a graph with various kinds of labelled edges: |
| simple edges (or copy constraints) represent value flow. Complex |
| edges (load, store, etc) trigger the creation of new simple edges |
| during the solving phase. |
| |
| |
| OBJECTS |
| |
| An "object" is a contiguous sequence of nodes denoting an addressable |
| location: something that a pointer can point to. The first node of an |
| object has the ntObject flag, and its size indicates the extent of the |
| object. |
| |
| Objects include: |
| - functions and globals; |
| - variable allocations in the stack frame or heap; |
| - maps, channels and slices created by calls to make(); |
| - allocations to construct an interface; |
| - allocations caused by literals and conversions, |
| e.g. []byte("foo"), []byte(str). |
| - arrays allocated by calls to append(); |
| |
| Many objects have no Go types. For example, the func, map and chan |
| type kinds in Go are all varieties of pointers, but the respective |
| objects are actual functions, maps, and channels. Given the way we |
| model interfaces, they too are pointers to interface objects with no |
| Go type. And an *ssa.Global denotes the address of a global variable, |
| but the object for a Global is the actual data. So, types of objects |
| are usually "off by one indirection". |
| |
| The individual nodes of an object are sometimes referred to as |
| "labels". |
| |
| Objects containing no nodes (i.e. just empty structs; tuples may be |
| values but never objects in Go) are padded with an invalid-type node |
| to have at least one node so that there is something to point to. |
| (TODO(adonovan): I think this is unnecessary now that we have identity |
| nodes; check.) |
| |
| |
| ANALYSIS ABSTRACTION OF EACH TYPE |
| |
| Variables of the following "scalar" types may be represented by a |
| single node: basic types, pointers, channels, maps, slices, 'func' |
| pointers, interfaces. |
| |
| Pointers |
| Nothing to say here. |
| |
| Basic types (bool, string, numbers, unsafe.Pointer) |
| Currently all fields in the flattening of a type, including |
| non-pointer basic types such as int, are represented in objects and |
| values. Though non-pointer nodes within values are uninteresting, |
| non-pointer nodes in objects may be useful (if address-taken) |
| because they permit the analysis to deduce, in this example, |
| |
| var s struct{ ...; x int; ... } |
| p := &s.x |
| |
| that p points to s.x. If we ignored such object fields, we could only |
| say that p points somewhere within s. |
| |
| All other basic types are ignored. Expressions of these types have |
| zero nodeid, and fields of these types within aggregate other types |
| are omitted. |
| |
| unsafe.Pointer conversions are not yet modelled as pointer |
| conversions. Consequently uintptr is always a number and uintptr |
| nodes do not point to any object. |
| |
| Channels |
| An expression of type 'chan T' is a kind of pointer that points |
| exclusively to channel objects, i.e. objects created by MakeChan (or |
| reflection). |
| |
| 'chan T' is treated like *T. |
| *ssa.MakeChan is treated as equivalent to new(T). |
| *ssa.Send and receive (*ssa.UnOp(ARROW)) and are equivalent to store |
| and load. |
| |
| Maps |
| An expression of type 'map[K]V' is a kind of pointer that points |
| exclusively to map objects, i.e. objects created by MakeMap (or |
| reflection). |
| |
| map K[V] is treated like *M where M = struct{k K; v V}. |
| *ssa.MakeMap is equivalent to new(M). |
| *ssa.MapUpdate is equivalent to *y=x where *y and x have type M. |
| *ssa.Lookup is equivalent to y=x.v where x has type *M. |
| |
| Slices |
| A slice []T, which dynamically resembles a struct{array *T, len, cap int}, |
| is treated as if it were just a *T pointer; the len and cap fields are |
| ignored. |
| |
| *ssa.MakeSlice is treated like new([1]T): an allocation of a |
| singleton array. |
| *ssa.Index on a slice is equivalent to a load. |
| *ssa.IndexAddr on a slice returns the address of the sole element of the |
| slice, i.e. the same address. |
| *ssa.Slice is treated as a simple copy. |
| |
| Functions |
| An expression of type 'func...' is a kind of pointer that points |
| exclusively to function objects. |
| |
| A function object has the following layout: |
| |
| identity -- typ:*types.Signature; flags ⊇ {ntFunction}; data:*cgnode |
| params_0 -- (the receiver, if a method) |
| ... |
| params_n-1 |
| results_0 |
| ... |
| results_m-1 |
| |
| There may be multiple function objects for the same *ssa.Function |
| due to context-sensitive treatment of some functions. |
| |
| The first node is the function's identity node; its .data is the |
| callgraph node (*cgnode) this object represents. |
| Associated with every callsite is a special "targets" variable, |
| whose pts(·) contains the identity node of each function to which |
| the call may dispatch. Identity words are not otherwise used. |
| |
| The following block of nodes represent the flattened-out types of |
| the parameters and results of the function object, and are |
| collectively known as its "P/R block". |
| |
| The treatment of free variables of closures (*ssa.Capture) is like |
| that of global variables; it is not context-sensitive. |
| *ssa.MakeClosure instructions create copy edges to Captures. |
| |
| A Go value of type 'func' (i.e. a pointer to one or more functions) |
| is a pointer whose pts() contains function objects. The valueNode() |
| for an *ssa.Function returns a singleton for that function. |
| |
| Interfaces |
| An expression of type 'interface{...}' is a kind of pointer that |
| points exclusively to interface objects. |
| |
| An interface object has the following layout: |
| |
| conctype -- flags ⊇ {ntInterface}; data:*ssa.MakeInterface? |
| value |
| ... |
| |
| The conctype node's typ field is the concrete type of the interface |
| value which follows, flattened out. It has the ntInterface flag. |
| Its associated data is the originating MakeInterface instruction, if |
| any. |
| |
| Constructing an interface value causes generation of constraints for |
| all of the concrete type's methods; we can't tell a priori which ones |
| may be called. |
| |
| TypeAssert y = x.(T) is implemented by a dynamic filter triggered by |
| each interface object E added to pts(x). If T is an interface that |
| E.conctype implements, pts(y) gets E. If T is a concrete type then |
| edge pts(y) <- E.value is added. |
| |
| ChangeInterface is a simple copy because the representation of |
| interface objects is independent of the interface type (in contrast |
| to the "method tables" approach used by the gc runtime). |
| |
| y := Invoke x.m(...) is implemented by allocating a contiguous P/R |
| block for the callsite and adding a dynamic rule triggered by each |
| interface object E added to pts(x). The rule adds param/results copy |
| edges to/from each discovered concrete method. |
| |
| (Q. Why do we model an interface as a pointer to a pair of type and |
| value, rather than as a pair of a pointer to type and a pointer to |
| value? |
| A. Control-flow joins would merge interfaces ({T1}, {V1}) and ({T2}, |
| {V2}) to make ({T1,T2}, {V1,V2}), leading to the infeasible and |
| type-unsafe combination (T1,V2). Treating the value and its concrete |
| type as inseparable makes the analysis type-safe.) |
| |
| |
| Aggregate types: |
| |
| Aggregate types are treated as if all directly contained |
| aggregates are recursively flattened out. |
| |
| Structs |
| *ssa.Field y = x.f creates a simple edge to y from x's node at f's offset. |
| |
| *ssa.FieldAddr y = &x->f requires a dynamic closure rule to create |
| simple edges for each struct discovered in pts(x). |
| |
| The nodes of a struct consist of a special 'identity' node (whose |
| type is that of the struct itself), followed by the nodes for all |
| the struct's fields, recursively flattened out. A pointer to the |
| struct is a pointer to its identity node. That node allows us to |
| distinguish a pointer to a struct from a pointer to its first field. |
| |
| Field offsets are currently the logical field offsets (plus one for |
| the identity node), so the sizes of the fields can be ignored by the |
| analysis. |
| |
| Sound treatment of unsafe.Pointer conversions (not yet implemented) |
| would require us to model memory layout using physical field offsets |
| to ascertain which object field(s) might be aliased by a given |
| FieldAddr of a different base pointer type. It would also require |
| us to dispense with the identity node. |
| |
| *ssa.Field y = x.f creates a simple edge to y from x's node at f's offset. |
| |
| *ssa.FieldAddr y = &x->f requires a dynamic closure rule to create |
| simple edges for each struct discovered in pts(x). |
| |
| Arrays |
| We model an array by an identity node (whose type is that of the |
| array itself) followed by a node representing all the elements of |
| the array; the analysis does not distinguish elements with different |
| indices. Effectively, an array is treated like struct{elem T}, a |
| load y=x[i] like y=x.elem, and a store x[i]=y like x.elem=y; the |
| index i is ignored. |
| |
| A pointer to an array is pointer to its identity node. (A slice is |
| also a pointer to an array's identity node.) The identity node |
| allows us to distinguish a pointer to an array from a pointer to one |
| of its elements, but it is rather costly because it introduces more |
| offset constraints into the system. Furthermore, sound treatment of |
| unsafe.Pointer would require us to dispense with this node. |
| |
| Arrays may be allocated by Alloc, by make([]T), by calls to append, |
| and via reflection. |
| |
| Tuples (T, ...) |
| Tuples are treated like structs with naturally numbered fields. |
| *ssa.Extract is analogous to *ssa.Field. |
| |
| However, tuples have no identity field since by construction, they |
| cannot be address-taken. |
| |
| |
| FUNCTION CALLS |
| |
| There are three kinds of function call: |
| (1) static "call"-mode calls of functions. |
| (2) dynamic "call"-mode calls of functions. |
| (3) dynamic "invoke"-mode calls of interface methods. |
| Cases 1 and 2 apply equally to methods and standalone functions. |
| |
| Static calls. |
| A static call consists three steps: |
| - finding the function object of the callee; |
| - creating copy edges from the actual parameter value nodes to the |
| params block in the function object (this includes the receiver |
| if the callee is a method); |
| - creating copy edges from the results block in the function |
| object to the value nodes for the result of the call. |
| |
| Context sensitivity |
| |
| Static calls (alone) may be treated context sensitively, |
| i.e. each callsite may cause a distinct re-analysis of the |
| callee, improving precision. Our current context-sensitivity |
| policy treats all intrinsics and getter/setter methods in this |
| manner since such functions are small and seem like an obvious |
| source of spurious confluences, though this has not yet been |
| evaluated. |
| |
| Dynamic function calls |
| |
| Dynamic calls work in a similar manner except that the creation of |
| copy edges occurs dynamically, in a similar fashion to a pair of |
| struct copies: |
| |
| *fn->params = callargs |
| callresult = *fn->results |
| |
| (Recall that the function object's params and results blocks are |
| contiguous.) |
| |
| Interface method invocation |
| |
| For invoke-mode calls, we create a params/results block for the |
| callsite and attach a dynamic closure rule to the interface. For |
| each new interface object that flows to the interface, we look up |
| the concrete method, find its function object, and connect its P/R |
| block to the callsite's P/R block. |
| |
| Recording call targets |
| |
| The analysis notifies its clients of each callsite it encounters, |
| passing a CallSite interface. Among other things, the CallSite |
| contains a synthetic constraint variable ("targets") whose |
| points-to solution includes the set of all function objects to |
| which the call may dispatch. |
| |
| It is via this mechanism that the callgraph is made available. |
| Clients may also elect to be notified of callgraph edges directly; |
| internally this just iterates all "targets" variables' pts(·)s. |
| |
| |
| SOLVER |
| |
| The solver is currently a very naive Andersen-style implementation, |
| although it does use difference propagation (Pearce et al, SQC'04). |
| There is much to do. |
| |
| |
| FURTHER READING. |
| |
| Andersen, L. O. 1994. Program analysis and specialization for the C |
| programming language. Ph.D. dissertation. DIKU, University of |
| Copenhagen. |
| |
| David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Efficient |
| field-sensitive pointer analysis for C. In Proceedings of the 5th ACM |
| SIGPLAN-SIGSOFT workshop on Program analysis for software tools and |
| engineering (PASTE '04). ACM, New York, NY, USA, 37-42. |
| http://doi.acm.org/10.1145/996821.996835 |
| |
| David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Online |
| Cycle Detection and Difference Propagation: Applications to Pointer |
| Analysis. Software Quality Control 12, 4 (December 2004), 311-337. |
| http://dx.doi.org/10.1023/B:SQJO.0000039791.93071.a2 |
| |
| David Grove and Craig Chambers. 2001. A framework for call graph |
| construction algorithms. ACM Trans. Program. Lang. Syst. 23, 6 |
| (November 2001), 685-746. |
| http://doi.acm.org/10.1145/506315.506316 |
| |
| Ben Hardekopf and Calvin Lin. 2007. The ant and the grasshopper: fast |
| and accurate pointer analysis for millions of lines of code. In |
| Proceedings of the 2007 ACM SIGPLAN conference on Programming language |
| design and implementation (PLDI '07). ACM, New York, NY, USA, 290-299. |
| http://doi.acm.org/10.1145/1250734.1250767 |
| |
| Ben Hardekopf and Calvin Lin. 2007. Exploiting pointer and location |
| equivalence to optimize pointer analysis. In Proceedings of the 14th |
| international conference on Static Analysis (SAS'07), Hanne Riis |
| Nielson and Gilberto Filé (Eds.). Springer-Verlag, Berlin, Heidelberg, |
| 265-280. |
| |
| */ |
| package pointer |