| Compiling and Linking |
| ---- |
| |
| Assume we have: |
| |
| - one or more source files, *.go, perhaps in different directories |
| - a compiler, C, which takes one .go file and generates a .o file. |
| - a linker, L, which takes one or more .o files and generates a go.out (!) file. |
| |
| There is a question around naming of the files. Let's avoid that |
| problem for now and state that if the input is X.go, the output of |
| the compiler is X.o, ignoring the package declaration in the file. |
| This is not current behavior and probably not correct behavior, but |
| it keeps the exposition simpler. |
| |
| Let's also assume that the linker knows about the runtime and we |
| don't have to specify bootstrap and runtime linkage explicitly. |
| |
| |
| Basics |
| ---- |
| |
| Given a single file, main.go, with no dependencies, we do: |
| |
| C main.go # compile |
| L main.o # link |
| go.out # run |
| |
| Now let's say that main.go contains |
| |
| import "fmt" |
| |
| and that fmt.go contains |
| |
| import "sys" |
| |
| Then to build, we must compile in dependency order: |
| |
| C sys.go |
| C fmt.go |
| C main.go |
| |
| and then link |
| |
| L main.o fmt.o sys.o |
| |
| To the linker itself, the order of arguments is unimportant. |
| |
| When we compile fmt.go, we need to know the details of the functions |
| (etc.) exported by sys.go and used by fmt.go. When we run |
| |
| C fmt.go |
| |
| it discovers the import of sys, and must then read sys.o to discover |
| the details. We must therefore compile the exporting source file before we |
| can compile the importing source. Moreover, if there is a mismatch |
| between export and import, we can discover it during compilation |
| of the importing source. |
| |
| To be explicit, then, what we say is, in effect |
| |
| C sys.go |
| C fmt.go sys.o |
| C main.go fmt.o sys.o |
| L main.o fmt.o sys.o |
| |
| |
| The contents of .o files (I) |
| ---- |
| |
| It's necessary to include in fmt.o the information for linking |
| against the functions etc. in sys.o. It's also possible to identify |
| sys.o explicitly inside fmt.o, so we need to say only |
| |
| L main.o fmt.o |
| |
| with sys.o discovered automatically. Iterating again, it's easy |
| to reduce the link step to |
| |
| L main.o |
| |
| with L discovering automatically the .o files it needs to process |
| to create the final go.out. |
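| |
| As a rough sketch of that discovery step, assume each .o file carries |
| a list of the .o files it depends on. The Go program below models |
| those records as an in-memory map (the names objectDeps and linkSet |
| are invented for this illustration, not part of any real linker) and |
| walks outward from main.o to collect everything L would have to read. |
| |
```go
package main

import (
	"fmt"
	"sort"
)

// objectDeps is a hypothetical stand-in for the dependency record stored
// inside each .o file: it maps an object file to the object files it names.
type objectDeps map[string][]string

// linkSet returns every object file reachable from root, sorted by name.
func linkSet(deps objectDeps, root string) []string {
	seen := map[string]bool{}
	var visit func(obj string)
	visit = func(obj string) {
		if seen[obj] {
			return // already collected
		}
		seen[obj] = true
		for _, d := range deps[obj] {
			visit(d)
		}
	}
	visit(root)

	objs := make([]string, 0, len(seen))
	for obj := range seen {
		objs = append(objs, obj)
	}
	sort.Strings(objs)
	return objs
}

func main() {
	deps := objectDeps{
		"main.o": {"fmt.o"},
		"fmt.o":  {"sys.o"},
		"sys.o":  nil,
	}
	fmt.Println(linkSet(deps, "main.o")) // [fmt.o main.o sys.o]
}
```
| |
| The result is sorted only to make the output reproducible; as noted |
| earlier, the order of arguments is unimportant to the linker. |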
| |
| |
| Automation of dependencies (I) |
| ---- |
| |
| It should be possible to automate discovery of the dependencies of |
| main.go and therefore the order necessary to compile. Since the |
| source files contain explicit import statements, it is possible, |
| given a source file, to discover the dependency tree automatically. |
| (This will require rules and/or conventions about where to find |
| things; for now assume everything is in the same directory.) |
| |
| The program that does this might be a variant of the |
| compiler, since it must parse import statements at least, but for |
| clarity let's call it D for dependency. It can be a little like |
| make, but let's not call it make because that brings along properties |
| we don't want. In particular, it reads the sources to discover the |
| dependencies; it doesn't need a separate description such as a |
| Makefile. |
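| |
| To illustrate how little of a source file D has to read, here is a |
| small sketch that extracts only the import statements. It assumes the |
| go/parser package from the present-day Go standard library as a |
| stand-in for whatever parsing D would actually do; the helper name |
| importsOf is made up for this example. |
| |
```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// importsOf returns the import paths declared in src, in source order.
// parser.ImportsOnly makes the parser stop after the import declarations.
func importsOf(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, spec := range f.Imports {
		// spec.Path.Value is the quoted literal, e.g. "\"fmt\"".
		p, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		paths = append(paths, p)
	}
	return paths, nil
}

func main() {
	src := "package main\n\nimport \"fmt\"\n\nfunc main() { fmt.Println(\"hi\") }\n"
	paths, err := importsOf("main.go", src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(paths) // [fmt]
}
```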
| |
| In a directory with the source files above, including main.go, but |
| with no .o files, we say: |
| |
| D main.go |
| |
| D reads main.go, finds the import for fmt, and in effect descends, |
| automatically running |
| |
| D fmt.go |
| |
| which in turn invokes |
| |
| D sys.go |
| |
| The file sys.go has no dependencies, so it can be compiled; D |
| therefore says in effect |
| |
| "compile sys.go" |
| |
| and returns; then we have what we need for fmt.go since the exports |
| in sys.go are known (or at least the recipe to discover them is |
| known). So the next level says |
| |
| "compile fmt.go" |
| |
| and pops up, whereupon the top D says |
| |
| "compile main.go" |
| |
| The output of D could therefore be described as a script to run to |
| compile the source. |
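| |
| The descent just described can be sketched as a post-order walk of |
| the import graph. The program below is only an illustration: the |
| import table is supplied by hand rather than read from source files, |
| the names importTable and script are hypothetical, and the import |
| graph is assumed to be free of cycles. |
| |
```go
package main

import "fmt"

// importTable is a hypothetical record of each package's direct imports,
// as D would learn them by reading the source files.
type importTable map[string][]string

// script returns the compile commands for root and everything it
// depends on, with each dependency listed before its importer.
func script(imports importTable, root string) []string {
	var cmds []string
	done := map[string]bool{}
	var descend func(pkg string)
	descend = func(pkg string) {
		if done[pkg] {
			return // already scheduled by an earlier importer
		}
		done[pkg] = true
		for _, dep := range imports[pkg] {
			descend(dep) // "D dep.go"
		}
		// All dependencies are scheduled, so this file can be compiled.
		cmds = append(cmds, "C "+pkg+".go")
	}
	descend(root)
	return cmds
}

func main() {
	imports := importTable{
		"main": {"fmt"},
		"fmt":  {"sys"},
		"sys":  nil,
	}
	for _, cmd := range script(imports, "main") {
		fmt.Println(cmd)
	}
	// Prints:
	// C sys.go
	// C fmt.go
	// C main.go
}
```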
| |
| We could imagine that instead, D actually runs the compiler. |
| (Conversely, we could imagine that C uses D to make sure the |
| dependencies are built, but that has the danger of causing unnecessary |
| dependency checking and compilation; more on that later.) |
| |
| To build, therefore, all we need to say is: |
| |
| D -c main.go # -c means 'run the compiler' |
| L main.o |
| |
| Obviously, D at this stage could just run L. Therefore, we can |
| simplify further by having it do so, whereupon |
| |
| D -c main.go |
| |
| can automate the complete compilation and linking process. |
| |
| |
| Automation of dependencies (II) |
| ---- |
| |
| Let's say we now edit main.go without changing its imports. To |
| recompile, we have two options. First, we could be explicit: |
| |
| C main.go |
| |
| Or we could use D to automate running the compiler, as described |
| in the previous section: |
| |
| D -c main.go |
| |
| The D command will discover the import of fmt, but can see that fmt.o |
| already exists. Assuming its existence implies its currency, it need |
| go no further; it can invoke C to compile main.go and link as usual. |
| Whether it should make this assumption might be controlled by a flag. |
| For the purpose of discussion, let's say it makes the assumption if |
| the -c flag is set. |
| |
| There are two implications to this scheme. First, running D when D |
| is going to turn around and run C anyway implies we could just run |
| C directly and save one command invocation. (We could decide |
| independently whether C should automatically invoke the linker.) |
| |
| The other implication is more interesting. If we stop traversing |
| the dependency hierarchy as soon as we discover a .o file, then we |
| may not realize that fmt.o is out of date and link against a stale |
| binary. To fix this problem, we need to stat() or checksum the .o |
| and .go files to see if they need recompilation. Doing this every |
| time is expensive and gets us back into the make-like approach. |
| |
| The great majority of compilations do not require this full check, |
| however; this is especially true when in the compile-debug-edit |
| cycle. We therefore propose splitting the model into two scenarios. |
| |
| Scenario 1: General |
| |
| In this scenario, we ask D to update the full dependency tree by |
| stat()-ing or checksumming files to check currency. The generated |
| go.out will always be up to date but incremental compilation will |
| be slower. Typically, this will be necessary only after a major |
| operation like syncing or checking out code, or if there are known |
| changes being made to the dependencies. |
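| |
| A minimal sketch of the full check in Scenario 1 follows. To keep it |
| self-contained it works on a hand-built table of modification times |
| instead of calling stat(); the pkgInfo type, the rebuild function and |
| the convention that an object time of 0 means "no .o file" are all |
| assumptions made for this example. A package is treated as stale if |
| its .o is missing, older than its source, older than a dependency's |
| .o, or if any dependency is itself stale. |
| |
```go
package main

import "fmt"

// pkgInfo is a hypothetical record of what D would learn by stat()-ing
// X.go and X.o and by reading the imports of X.go.
type pkgInfo struct {
	srcTime int64    // modification time of X.go
	objTime int64    // modification time of X.o; 0 means X.o is missing
	imports []string // direct imports of X.go
}

// rebuild returns the packages that need recompiling, dependencies first.
func rebuild(pkgs map[string]pkgInfo, root string) []string {
	var order []string
	stale := map[string]bool{}
	visited := map[string]bool{}
	var check func(name string) bool
	check = func(name string) bool {
		if visited[name] {
			return stale[name]
		}
		visited[name] = true
		p := pkgs[name]
		out := p.objTime == 0 || p.objTime < p.srcTime
		for _, dep := range p.imports {
			// check(dep) is evaluated first, so every dependency is
			// always examined, even once this package is known stale.
			if check(dep) || pkgs[dep].objTime > p.objTime {
				out = true
			}
		}
		stale[name] = out
		if out {
			order = append(order, name)
		}
		return out
	}
	check(root)
	return order
}

func main() {
	pkgs := map[string]pkgInfo{
		"sys":  {srcTime: 10, objTime: 20},
		"fmt":  {srcTime: 30, objTime: 25, imports: []string{"sys"}}, // edited after compiling
		"main": {srcTime: 5, objTime: 40, imports: []string{"fmt"}},
	}
	fmt.Println(rebuild(pkgs, "main")) // [fmt main]
}
```
| |
| Note that main is rebuilt even though main.go is untouched: this is |
| the stale-binary case that stopping at the first .o file would miss. |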
| |
| Scenario 2: Fast |
| |
| In this scenario, we explicitly tell D -c what has changed and have |
| it compile only what is required. Typically, this will mean compiling |
| only the single active file or maybe a few files. If an IDE is |
| present or there is some watcher tool, it's easy to avoid the common |
| mistake of forgetting to compile a changed file. |
| |
| If an edit has caused skew between export and import, the compiler |
| will catch it, so the build should at least be type-safe. If D is |
| running the compilation, it might be possible to arrange that C tells |
| it there is a dependency problem and have D then try to resolve it |
| by reevaluation. |
| |
| |
| The contents of .o files (II) |
| ---- |
| |
| For scenario 2, we can make things even faster if the .o files |
| identify not just the files that must be imported to satisfy the |
| imports, but also details about the imports themselves. Let's say main.go |
| uses only one function from fmt.go, called F. If the compiled main.o |
| says, in effect |
| |
| from package fmt get F |
| |
| then the linker will not need to read all of fmt.o to link main.o; |
| instead it can extract only the necessary function. |
| |
| Even better, if fmt is a package made of many files, it may be |
| possible to store in main.o specific information about the exact |
| files needed: |
| |
| from file fmtF.o get F |
| |
| The linker then need not even open the other .o files that form |
| package fmt. |
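| |
| As a small illustration, suppose main.o stores one record per needed |
| symbol in the "from file ... get ..." form above. The sketch below |
| (the need type, filesToOpen, and the file names fmtG.o and fmtH.o are |
| invented for the example) shows how the linker could derive the set |
| of files to open from those records alone. |
| |
```go
package main

import (
	"fmt"
	"sort"
)

// need is a hypothetical record the compiler might store in main.o:
// "from file File get Symbol".
type need struct {
	File   string
	Symbol string
}

// filesToOpen returns the distinct object files named by the records,
// sorted by name.
func filesToOpen(needs []need) []string {
	seen := map[string]bool{}
	var files []string
	for _, n := range needs {
		if !seen[n.File] {
			seen[n.File] = true
			files = append(files, n.File)
		}
	}
	sort.Strings(files)
	return files
}

func main() {
	// Hypothetical object files that together form package fmt.
	pkgFiles := []string{"fmtF.o", "fmtG.o", "fmtH.o"}
	needs := []need{{File: "fmtF.o", Symbol: "F"}}

	open := filesToOpen(needs)
	fmt.Printf("open %v, skip %d of %d files\n",
		open, len(pkgFiles)-len(open), len(pkgFiles))
	// Prints: open [fmtF.o], skip 2 of 3 files
}
```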
| |
| The compiler should therefore be explicit and detailed within the .o |
| files it generates about what elements of a package are needed by |
| the program being compiled. |
| |
| Earlier, we said that when we run |
| |
| C fmt.go |
| |
| it discovers the import of sys, and must then read sys.o to discover |
| the details. Note that if we record the information as specified here, |
| when we then do |
| |
| C main.go |
| |
| and it reads fmt.o, it does not in turn need to read sys.o; the necessary |
| information has already been pulled up into fmt.o by D. |
| |
| Thus, once the dependency information is properly constructed, to |
| compile a program X.go we must read X.go plus N .o files, where N |
| is the number of packages explicitly imported by X.go. The transitive |
| closure need not be evaluated to compile a file, only the explicit |
| imports. By this result, we hope to dramatically reduce the amount |
| of I/O necessary to compile a Go source file. |
| |
| To put this another way, if a package P imports packages Xi, the |
| existence of Xi.o files is all that is needed to compile P because the |
| Xi.o files contain the export information. This is what breaks the |
| transitive dependency closure. |
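| |
| The following sketch models this property under simplifying |
| assumptions: a .o file is reduced to a map of exported names, the |
| store of .o files is held in memory, and "compiling" only checks that |
| each name used is exported by a direct import. The types, the compile |
| function and the signatures shown are hypothetical. The point is just |
| that compiling main.go opens fmt.o and never sys.o. |
| |
```go
package main

import "fmt"

// object is a hypothetical model of a .o file. Its export information
// is self-contained: anything needed from deeper packages was pulled
// in when the object was produced.
type object struct {
	exports map[string]string // exported name -> description
}

// store models a directory of .o files and records which are opened.
type store struct {
	objects map[string]object
	opened  []string
}

func (s *store) open(pkg string) (object, bool) {
	s.opened = append(s.opened, pkg+".o")
	o, ok := s.objects[pkg]
	return o, ok
}

// compile reads one .o file per explicit import and reports any skew
// between what the source uses and what the import exports.
func compile(s *store, imports []string, uses map[string][]string) error {
	for _, pkg := range imports {
		obj, ok := s.open(pkg)
		if !ok {
			return fmt.Errorf("%s.o not found; compile %s.go first", pkg, pkg)
		}
		for _, name := range uses[pkg] {
			if _, ok := obj.exports[name]; !ok {
				return fmt.Errorf("%s does not export %s", pkg, name)
			}
		}
	}
	return nil
}

func main() {
	s := &store{objects: map[string]object{
		"sys": {exports: map[string]string{"write": "hypothetical"}},
		"fmt": {exports: map[string]string{"F": "func(s string)"}},
	}}
	// main.go imports only fmt and uses fmt.F.
	err := compile(s, []string{"fmt"}, map[string][]string{"fmt": {"F"}})
	fmt.Println(err, s.opened) // <nil> [fmt.o]
}
```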