blob: 9f38b9a7d4e39c657975c25bfbaf10efb191ee43 [file] [log] [blame]
Go, from C to Go
GopherCon
25 Apr 2014
Russ Cox
Google
http://golang.org/
* Video
A video of this talk was recorded at GopherCon in Denver.
.link https://www.youtube.com/watch?v=QIE5nV5fDwA Watch the talk on YouTube
* Go Compiler
* Go Compiler
80,000+ lines of C.
* Problem
Programming in Go is fun.
Programming in C is not.
* Problem
Writing a Go compiler requires Go expertise.
Writing any program in C requires C expertise.
Writing a Go compiler in C requires Go and C expertise.
* Solution
Write the Go compiler in Go.
* Past
Why not write the Go compiler in Go on day one?
- Go did not exist.
- Go was unstable.
- Go is not targeting compiler writers.
* Present
Why do it today?
- Go does exist.
- Go is stable.
- Go is a great general purpose language.
* How?
Crazy idea: mechanical conversion.
“One big gofix.”
* C
* C
- First creative burst in 1972 at Bell Labs
- Ritchie, [[http://cm.bell-labs.com/who/dmr/chist.html][The Development of the C Language]], HOPL 1993
- “C is quirky, flawed, and an enormous success...”
* C Data Model
- Original target: PDP-11 with 24 kB of memory.
- Programmer is in charge of memory.
- “Off-stack, dynamically-allocated storage is provided only by a library routine and the burden of managing it is placed on the programmer: C is hostile to automatic garbage collection.”
- Types are there to help but not enforced.
* C Control Flow
- `do...while`, `for`, `switch`, `while`
- the much maligned `goto`
* C Program Model
- Per-file compilation.
- Headers vs code.
- `#define`, `#include`
* Conversion
* Challenges for Converting C to Go
- minor: unions, #define, comments
- goto
- type mapping
* Goal
Automated conversion of our C code to Go.
Target: _our_ C code, not _all_ C code.
- Want generated code to be maintainable.
- Want automatic translation for 99%+ of the code.
- No need to solve general problem for tiny number of cases.
- Special cases in converter are okay.
- Annotations in source code are okay.
* Warmups
* Unions
go/src/cmd/gc/go.h
struct Val
{
short ctype;
union
{
short reg; // OREGISTER
short bval; // bool value CTBOOL
Mpint* xval; // int CTINT, rune CTRUNE
Mpflt* fval; // float CTFLT
Mpcplx* cval; // float CTCPLX
Strlit* sval; // string CTSTR
} u;
};
* Unions
go/include/link.h
struct Addr
{
short type;
union
{
char sval[8];
float64 dval;
Prog* branch; // for 5g, 6g, 8g
} u;
...
};
* Unions
`#define` `struct` `union` `/*` `Great` `space` `saver` `*/`
* Unions
`#define` `union` `struct` `/*` `legal` `in` `C!` `*/`
And anyway, there are only two.
* #define
Can't just expand during parsing.
* #define
Not many.
/*
* defined macros
* you need super-gopher-guru privilege
* to add this list.
*/
#define nelem(x) (sizeof(x)/sizeof((x)[0]))
#define nil ((void*)0)
...
Extend parser to recognize special cases.
* #define
Annotate some.
#define BOM 0xFEFF
/*c2go enum { BOM = 0xFEFF }; */
Rewrite others.
enum {
BOM = 0xFEFF,
};
* Comments
Can't just discard during parsing.
/*
* If the new process paused because it was
* swapped out, set the stack level to the last call
* to savu(u_ssav). This means that the return
* which is executed immediately after the call to aretu
* actually returns from the last routine which did
* the savu.
*
* You are not expected to understand this.
*/
if(rp->p_flag&SSWAP) {
rp->p_flag =& ~SSWAP;
aretu(u.u_ssav);
}
* Comments
Record precise source locations.
case OMAPLIT:
n->esc = EscNone; // until proven otherwise
e->noesc = list(e->noesc, n);
n->escloopdepth = e->loopdepth;
// Keys and values make it to memory, lose track.
for(ll=n->list; ll; ll=ll->next) {
escassign(e, &e->theSink, ll->n->left);
escassign(e, &e->theSink, ll->n->right);
}
break;
Whole-line comments attach to syntax immediately following (or EOF).
Suffix comments attach to syntax immediately before.
Syntax carries comments if it moves.
* Goto
* C Goto
“27. Horrors! goto’s and labels
C has a goto statement and labels, so you can branch about the way you used to. But most of the time goto’s aren’t needed... The code can almost always be more clearly expressed by for/while, if/else, and compound statements.
* C Goto
One use of goto’s with some legitimacy is in a program which contains a long loop, where a while(1) would be too extended. Then you might write
mainloop:
...
goto mainloop;
Another use is to implement a break out of more than one level of for or while. goto’s can only branch to labels within the same function.”
— Kernighan, [[http://cm.bell-labs.com/who/dmr/ctut.pdf][Programming in C – A Tutorial]]
* Go Goto Restrictions
- Cannot jump over a variable declaration in target scope.
. if x {
goto Done
}
y := f()
print(y)
Done:
close(c)
return
* Go Goto Restrictions
- Cannot jump over a variable declaration in target scope.
. var y int
if x {
goto Done
}
y = f()
print(y)
Done:
close(c)
return
* Go Goto Restrictions
- Cannot jump into a new scope (into a { } block).
if bad {
Bad:
printError()
return err
}
...
if other bad thing {
goto Bad
}
* Go Goto Restrictions
- Cannot jump into a new scope (into a { } block or switch case).
switch x {
case 1:
F()
goto Common;
case 2:
G()
goto Common
case 3:
Common:
H()
}
* Goto in Go compiler
1032 goto statements
241 labels
* Goto in Go compiler
35 indented labels
18 switch case
6 multilevel break/continue
5 ‘else’ statement
4 cleanup/error labels
1 loop
1 difficult to explain
* Refactor switch case goto
switch(r->op) {
case OINDEXMAP:
n->op = OAS2MAPR;
goto common;
case ORECV:
n->op = OAS2RECV;
goto common;
case ODOTTYPE:
n->op = OAS2DOTTYPE;
r->op = ODOTTYPE2;
common:
...
}
* Refactor switch case goto
switch r.op {
case OINDEXMAP, ORECV, ODOTTYPE:
switch r.op {
case OINDEXMAP:
n.op = OAS2MAPR
case ORECV:
n.op = OAS2RECV
case ODOTTYPE:
n.op = OAS2DOTTYPE
r.op = ODOTTYPE2
}
...
}
* General solution
Baker, [[http://dl.acm.org/citation.cfm?id=321999][An Algorithm for Structuring Flowgraphs]], JACM 1977
But we don't need it.
Handle trivial rewrites in converter.
Rewrite problematic gotos by hand.
* Type Mapping
* Type Mapping
General question: what type to use in the Go translation?
- C allows implicit conversion between int, long, char and so on. Go must use one consistently.
- C uses pointers for what Go calls pointers _and_ slices.
* Type Mapping
Build graph of “assigned” value flow and extract clusters.
x = y;
int f(void) {
return x;
}
w = f();
void g(int z);
g(x);
g(y);
Apply to entire compiler (all files). Exclude some functions.
* Type Mapping
int
islvalue(Node *n)
{
switch(n->op) {
case OINDEX:
if(isfixedarray(n->left->type))
return islvalue(n->left);
if(n->left->type != T && n->left->type->etype == TSTRING)
return 0;
// fall through
case OIND:
case ODOTPTR:
case OCLOSUREVAR:
return 1;
case ODOT:
return islvalue(n->left);
case ONAME:
if(n->class == PFUNC)
return 0;
return 1;
}
return 0;
}
* Type Mapping
int
islvalue(Node *n)
{
...
return islvalue(n->left);
...
return 0;
...
return 1;
...
return islvalue(n->left);
...
return 0;
...
return 1;
...
return 0;
}
* Type Mapping
cluster
types: int
values:
return from islvalue
0
1
islvalue(n)
islvalue(n->left)
islvalue(n->right)
contexts:
bool condition
/* if(islvalue(n)), if(!islvalue(n)), ... */
Translation: bool.
* Type Mapping
cluster
types: int
values:
return from checksliceconst
0
-1
contexts:
checksliceconst(lo, hi) < 0
checksliceconst(lo, mid) < 0
checksliceconst(mid, hi) < 0
Translation: bool or error.
* Type Mapping
cluster
types: Val*
values:
var Val *v
va_arg(fp->args, Val*)
contexts:
v->ctype
v->u
Translation: pointer.
* Type Mapping
cluster
types: long*
values:
var long* a1
&a->a[0]
contexts:
*a1
a1++
Translation: slice.
* Type Mapping
Cluster statistics
- 1,703 clusters in Go compiler
- median cluster size 4 values
- max cluster size 16,592 values
Clustering does not rely on C type information at all.
* Conversion status
- Still prototyping, but looks good.
- Aiming at Go 1.4, but no promises.
By the way, please try the Go 1.3 beta!
* Go from C to Go!
- Practical
- Applicable to other code bases?
- Applicable to other languages?
- Applicable to program understanding tools?