| Go, from C to Go |
| |
| GopherCon |
| 25 Apr 2014 |
| |
| Russ Cox |
| Google |
| |
| https://go.dev/ |
| |
| |
| * Video |
| |
| A video of this talk was recorded at GopherCon in Denver. |
| |
| .link https://www.youtube.com/watch?v=QIE5nV5fDwA Watch the talk on YouTube |
| |
| |
| * Go Compiler |
| |
| * Go Compiler |
| |
| 80,000+ lines of C. |
| |
| * Problem |
| |
| Programming in Go is fun. |
| |
| Programming in C is not. |
| |
| * Problem |
| |
| Writing a Go compiler requires Go expertise. |
| |
| Writing any program in C requires C expertise. |
| |
| Writing a Go compiler in C requires Go and C expertise. |
| |
| * Solution |
| |
| Write the Go compiler in Go. |
| |
| * Past |
| |
| Why not write the Go compiler in Go on day one? |
| |
| - Go did not exist. |
| |
| - Go was unstable. |
| |
| - Go is not targeting compiler writers. |
| |
| * Present |
| |
| Why do it today? |
| |
| - Go does exist. |
| |
| - Go is stable. |
| |
| - Go is a great general purpose language. |
| |
| * How? |
| |
| Crazy idea: mechanical conversion. |
| |
| “One big gofix.” |
| |
| * C |
| |
| * C |
| |
| - First creative burst in 1972 at Bell Labs |
| |
| - Ritchie, [[http://cm.bell-labs.com/who/dmr/chist.html][The Development of the C Language]], HOPL 1993 |
| |
| - “C is quirky, flawed, and an enormous success...” |
| |
| * C Data Model |
| |
| - Original target: PDP-11 with 24 kB of memory. |
| |
| - Programmer is in charge of memory. |
| |
| - “Off-stack, dynamically-allocated storage is provided only by a library routine and the burden of managing it is placed on the programmer: C is hostile to automatic garbage collection.” |
| |
| - Types are there to help but not enforced. |
| |
| * C Control Flow |
| |
| - `do...while`, `for`, `switch`, `while` |
| |
| - the much maligned `goto` |
| |
| * C Program Model |
| |
| - Per-file compilation. |
| |
| - Headers vs code. |
| |
| - `#define`, `#include` |
| |
| * Conversion |
| |
| * Challenges for Converting C to Go |
| |
| - minor: unions, #define, comments |
| |
| - goto |
| |
| - type mapping |
| |
| * Goal |
| |
| Automated conversion of our C code to Go. |
| |
| Target: _our_ C code, not _all_ C code. |
| |
| - Want generated code to be maintainable. |
| - Want automatic translation for 99%+ of the code. |
| - No need to solve general problem for tiny number of cases. |
| - Special cases in converter are okay. |
| - Annotations in source code are okay. |
| |
| * Warmups |
| |
| * Unions |
| |
| go/src/cmd/gc/go.h |
| |
| struct Val |
| { |
| short ctype; |
| union |
| { |
| short reg; // OREGISTER |
| short bval; // bool value CTBOOL |
| Mpint* xval; // int CTINT, rune CTRUNE |
| Mpflt* fval; // float CTFLT |
| Mpcplx* cval; // float CTCPLX |
| Strlit* sval; // string CTSTR |
| } u; |
| }; |
| |
| * Unions |
| |
| go/include/link.h |
| |
| struct Addr |
| { |
| short type; |
| union |
| { |
| char sval[8]; |
| float64 dval; |
| Prog* branch; // for 5g, 6g, 8g |
| } u; |
| |
| ... |
| }; |
| |
| * Unions |
| |
| `#define` `struct` `union` `/*` `Great` `space` `saver` `*/` |
| |
| * Unions |
| |
| `#define` `union` `struct` `/*` `legal` `in` `C!` `*/` |
| |
| And anyway, there are only two. |
| |
| * #define |
| |
| Can't just expand during parsing. |
| |
| * #define |
| |
| Not many. |
| |
| /* |
| * defined macros |
| * you need super-gopher-guru privilege |
| * to add this list. |
| */ |
| #define nelem(x) (sizeof(x)/sizeof((x)[0])) |
| #define nil ((void*)0) |
| ... |
| |
| Extend parser to recognize special cases. |
| |
| * #define |
| |
| Annotate some. |
| |
| #define BOM 0xFEFF |
| /*c2go enum { BOM = 0xFEFF }; */ |
| |
| Rewrite others. |
| |
| enum { |
| BOM = 0xFEFF, |
| }; |
| |
| * Comments |
| |
| Can't just discard during parsing. |
| |
| /* |
| * If the new process paused because it was |
| * swapped out, set the stack level to the last call |
| * to savu(u_ssav). This means that the return |
| * which is executed immediately after the call to aretu |
| * actually returns from the last routine which did |
| * the savu. |
| * |
| * You are not expected to understand this. |
| */ |
| if(rp->p_flag&SSWAP) { |
| rp->p_flag =& ~SSWAP; |
| aretu(u.u_ssav); |
| } |
| |
| * Comments |
| |
| Record precise source locations. |
| |
| case OMAPLIT: |
| n->esc = EscNone; // until proven otherwise |
| e->noesc = list(e->noesc, n); |
| n->escloopdepth = e->loopdepth; |
| |
| // Keys and values make it to memory, lose track. |
| for(ll=n->list; ll; ll=ll->next) { |
| escassign(e, &e->theSink, ll->n->left); |
| escassign(e, &e->theSink, ll->n->right); |
| } |
| break; |
| |
| Whole-line comments attach to syntax immediately following (or EOF). |
| |
| Suffix comments attach to syntax immediately before. |
| |
| Syntax carries comments if it moves. |
| |
| * Goto |
| |
| * C Goto |
| |
| “27. Horrors! goto’s and labels |
| |
| C has a goto statement and labels, so you can branch about the way you used to. But most of the time goto’s aren’t needed... The code can almost always be more clearly expressed by for/while, if/else, and compound statements. |
| |
| * C Goto |
| |
| One use of goto’s with some legitimacy is in a program which contains a long loop, where a while(1) would be too extended. Then you might write |
| |
| mainloop: |
| ... |
| goto mainloop; |
| |
| Another use is to implement a break out of more than one level of for or while. goto’s can only branch to labels within the same function.” |
| |
| — Kernighan, [[http://cm.bell-labs.com/who/dmr/ctut.pdf][Programming in C – A Tutorial]] |
| |
| * Go Goto Restrictions |
| |
| - Cannot jump over a variable declaration in target scope. |
| |
| . if x { |
| goto Done |
| } |
| |
| y := f() |
| print(y) |
| |
| Done: |
| close(c) |
| return |
| |
| * Go Goto Restrictions |
| |
| - Cannot jump over a variable declaration in target scope. |
| |
| . var y int |
| |
| if x { |
| goto Done |
| } |
| |
| y = f() |
| print(y) |
| |
| Done: |
| close(c) |
| return |
| |
| * Go Goto Restrictions |
| |
| - Cannot jump into a new scope (into a { } block). |
| |
| if bad { |
| Bad: |
| printError() |
| return err |
| } |
| |
| ... |
| |
| if other bad thing { |
| goto Bad |
| } |
| |
| * Go Goto Restrictions |
| |
| - Cannot jump into a new scope (into a { } block or switch case). |
| |
| switch x { |
| case 1: |
| F() |
| goto Common; |
| case 2: |
| G() |
| goto Common |
| case 3: |
| Common: |
| H() |
| } |
| |
| * Goto in Go compiler |
| |
| 1032 goto statements |
| 241 labels |
| |
| * Goto in Go compiler |
| |
| 35 indented labels |
| |
| 18 switch case |
| 6 multilevel break/continue |
| 5 ‘else’ statement |
| 4 cleanup/error labels |
| 1 loop |
| 1 difficult to explain |
| |
| * Refactor switch case goto |
| |
| switch(r->op) { |
| case OINDEXMAP: |
| n->op = OAS2MAPR; |
| goto common; |
| case ORECV: |
| n->op = OAS2RECV; |
| goto common; |
| case ODOTTYPE: |
| n->op = OAS2DOTTYPE; |
| r->op = ODOTTYPE2; |
| common: |
| ... |
| } |
| |
| * Refactor switch case goto |
| |
| switch r.op { |
| case OINDEXMAP, ORECV, ODOTTYPE: |
| switch r.op { |
| case OINDEXMAP: |
| n.op = OAS2MAPR |
| case ORECV: |
| n.op = OAS2RECV |
| case ODOTTYPE: |
| n.op = OAS2DOTTYPE |
| r.op = ODOTTYPE2 |
| } |
| ... |
| } |
| |
| * General solution |
| |
| Baker, [[http://dl.acm.org/citation.cfm?id=321999][An Algorithm for Structuring Flowgraphs]], JACM 1977 |
| |
| But we don't need it. |
| |
| Handle trivial rewrites in converter. |
| Rewrite problematic gotos by hand. |
| |
| * Type Mapping |
| |
| * Type Mapping |
| |
| General question: what type to use in the Go translation? |
| |
| - C allows implicit conversion between int, long, char and so on. Go must use one consistently. |
| |
| - C uses pointers for what Go calls pointers _and_ slices. |
| |
| * Type Mapping |
| |
| Build graph of “assigned” value flow and extract clusters. |
| |
| x = y; |
| |
| int f(void) { |
| return x; |
| } |
| w = f(); |
| |
| void g(int z); |
| g(x); |
| g(y); |
| |
| Apply to entire compiler (all files). Exclude some functions. |
| |
| * Type Mapping |
| |
| int |
| islvalue(Node *n) |
| { |
| switch(n->op) { |
| case OINDEX: |
| if(isfixedarray(n->left->type)) |
| return islvalue(n->left); |
| if(n->left->type != T && n->left->type->etype == TSTRING) |
| return 0; |
| // fall through |
| case OIND: |
| case ODOTPTR: |
| case OCLOSUREVAR: |
| return 1; |
| case ODOT: |
| return islvalue(n->left); |
| case ONAME: |
| if(n->class == PFUNC) |
| return 0; |
| return 1; |
| } |
| return 0; |
| } |
| |
| * Type Mapping |
| |
| int |
| islvalue(Node *n) |
| { |
| ... |
| return islvalue(n->left); |
| ... |
| return 0; |
| ... |
| return 1; |
| ... |
| return islvalue(n->left); |
| ... |
| return 0; |
| ... |
| return 1; |
| ... |
| return 0; |
| } |
| |
| * Type Mapping |
| |
| cluster |
| types: int |
| values: |
| return from islvalue |
| 0 |
| 1 |
| islvalue(n) |
| islvalue(n->left) |
| islvalue(n->right) |
| contexts: |
| bool condition |
| /* if(islvalue(n)), if(!islvalue(n)), ... */ |
| |
| Translation: bool. |
| |
| * Type Mapping |
| |
| cluster |
| types: int |
| values: |
| return from checksliceconst |
| 0 |
| -1 |
| contexts: |
| checksliceconst(lo, hi) < 0 |
| checksliceconst(lo, mid) < 0 |
| checksliceconst(mid, hi) < 0 |
| |
| Translation: bool or error. |
| |
| * Type Mapping |
| |
| cluster |
| types: Val* |
| values: |
| var Val *v |
| va_arg(fp->args, Val*) |
| contexts: |
| v->ctype |
| v->u |
| |
| Translation: pointer. |
| |
| * Type Mapping |
| |
| cluster |
| types: long* |
| values: |
| var long* a1 |
| &a->a[0] |
| contexts: |
| *a1 |
| a1++ |
| |
| Translation: slice. |
| |
| * Type Mapping |
| |
| Cluster statistics |
| |
| - 1,703 clusters in Go compiler |
| - median cluster size 4 values |
| - max cluster size 16,592 values |
| |
| Clustering does not rely on C type information at all. |
| |
| * Conversion status |
| |
| - Still prototyping, but looks good. |
| - Aiming at Go 1.4, but no promises. |
| |
| By the way, please try the Go 1.3 beta! |
| |
| * Go from C to Go! |
| |
| - Practical |
| - Applicable to other code bases? |
| - Applicable to other languages? |
| - Applicable to program understanding tools? |
| |