Compiling and Linking
----
Assume we have:
- one or more source files, *.go, perhaps in different directories
- a compiler, C, which takes one .go file and generates a .o file.
- a linker, L, which takes one or more .o files and generates a go.out (!) file.
There is a question around naming of the files. Let's avoid that
problem for now and state that if the input is X.go, the output of
the compiler is X.o, ignoring the package declaration in the file.
This is not current behavior and probably not correct behavior, but
it keeps the exposition simpler.
Let's also assume that the linker knows about the run time and we
don't have to specify bootstrap and runtime linkage explicitly.
Basics
----
Given a single file, main.go, with no dependencies, we do:
    C main.go    # compile
    L main.o     # link
    go.out       # run
Now let's say that main.go contains
    import "fmt"
and that fmt.go contains
    import "sys"
Then to build, we must compile in dependency order:
    C sys.go
    C fmt.go
    C main.go
and then link
    L main.o fmt.o sys.o
To the linker itself, the order of arguments is unimportant.
When we compile fmt.go, we need to know the details of the functions
(etc.) exported by sys.go and used by fmt.go. When we run
    C fmt.go
it discovers the import of sys, and must then read sys.o to discover
the details. We must therefore compile the exporting source file before we
can compile the importing source. Moreover, if there is a mismatch
between export and import, we can discover it during compilation
of the importing source.
To be explicit, then, what we say is, in effect
    C sys.go
    C fmt.go sys.o
    C main.go fmt.o sys.o
    L main.o fmt.o sys.o
The contents of .o files (I)
----
It's necessary to include in fmt.o the information for linking
against the functions etc. in sys.o. It's also possible to identify
sys.o explicitly inside fmt.o, so we need to say only
    L main.o fmt.o
with sys.o discovered automatically. Iterating again, it's easy
to reduce the link step to
    L main.o
with L discovering automatically the .o files it needs to process
to create the final go.out.
Automation of dependencies (I)
----
It should be possible to automate discovery of the dependencies of
main.go and therefore the order necessary to compile. Since the
source files contain explicit import statements, it is possible,
given a source file, to discover the dependency tree automatically.
(This will require rules and/or conventions about where to find
things; for now assume everything is in the same directory.)
The program that does this might be a variant of the
compiler, since it must parse import statements at least, but for
clarity let's call it D for dependency. It can be a little like
make, but let's not call it make because that brings along properties
we don't want. In particular, it reads the sources to discover the
dependencies; it doesn't need a separate description such as a
Makefile.
In a directory with the source files above, including main.go, but
with no .o files, we say:
    D main.go
D reads main.go, finds the import for fmt, and in effect descends,
automatically running
    D fmt.go
which in turn invokes
    D sys.go
The file sys.go has no dependencies, so it can be compiled; D
therefore says in effect
    "compile sys.go"
and returns; then we have what we need for fmt.go since the exports
in sys.go are known (or at least the recipe to discover them is
known). So the next level says
    "compile fmt.go"
and pops up, whereupon the top D says
    "compile main.go"
The output of D could therefore be described as a script to run to
compile the source.
We could imagine that instead, D actually runs the compiler.
(Conversely, we could imagine that C uses D to make sure the
dependencies are built, but that has the danger of causing unnecessary
dependency checking and compilation; more on that later.)
To build, therefore, all we need to say is:
    D -c main.go    # -c means 'run the compiler'
    L main.o
Obviously, D at this stage could just run L. Therefore, we can
simplify further by having it do so, whereupon
    D -c main.go
can automate the complete compilation and linking process.
Automation of dependencies (II)
----
Let's say we now edit main.go without changing its imports. To
recompile, we have two options. First, we could be explicit:
    C main.go
Or we could use D to automate running the compiler, as described
in the previous section:
    D -c main.go
The D command will discover the import of fmt, but can see that fmt.o
already exists. Assuming its existence implies its currency, it need
go no further; it can invoke C to compile main.go and link as usual.
Whether it should make this assumption might be controlled by a flag.
For the purpose of discussion, let's say it makes the assumption if
the -c flag is set.
There are two implications to this scheme. First, running D when D
is going to turn around and run C anyway implies we could just run
C directly and save one command invocation. (We could decide
independently whether C should automatically invoke the linker.)
The other implication is more interesting. If we stop traversing
the dependency hierarchy as soon as we discover a .o file, then we
may not realize that fmt.o is out of date and link against a stale
binary. To fix this problem, we need to stat() or checksum the .o
and .go files to see if they need recompilation. Doing this every
time is expensive and gets us back into the make-like approach.
The great majority of compilations do not require this full check,
however; this is especially true during the edit-compile-debug
cycle. We therefore propose splitting the model into two scenarios.
Scenario 1: General
In this scenario, we ask D to update the full dependency tree by
stat()-ing or checksumming files to check currency. The generated
go.out will always be up to date but incremental compilation will
be slower. Typically, this will be necessary only after a major
operation like syncing or checking out code, or if there are known
changes being made to the dependencies.
Scenario 2: Fast
In this scenario, we explicitly tell D -c what has changed and have
it compile only what is required. Typically, this will mean compiling
only the single active file or maybe a few files. If an IDE is
present or there is some watcher tool, it's easy to avoid the common
mistake of forgetting to compile a changed file.
If an edit has caused skew between export and import, this will be
caught by the compiler, so it should be type-safe at least. If D is
running the compilation, it might be possible to arrange that C tells
it there is a dependency problem and have D then try to resolve it
by reevaluation.
The contents of .o files (II)
----
For scenario 2, we can make things even faster if the .o files
identify not just the files that must be imported to satisfy the
imports, but details about the imports themselves. Let's say main.go
uses only one function from fmt.go, called F. If the compiled main.o
says, in effect
    from package fmt get F
then the linker will not need to read all of fmt.o to link main.o;
instead it can extract only the necessary function.
Even better, if fmt is a package made of many files, it may be
possible to store in main.o specific information about the exact
files needed:
    from file fmtF.o get F
The linker then need not even open the other .o files that
form package fmt.
The compiler should therefore be explicit and detailed within the .o
files it generates about what elements of a package are needed by
the program being compiled.
Earlier, we said that when we run
    C fmt.go
it discovers the import of sys, and must then read sys.o to discover
the details. Note that if we record the information as specified here,
when we then do
    C main.go
and it reads fmt.o, it does not in turn need to read sys.o; the necessary
information was already pulled up into fmt.o when fmt.go was compiled.
Thus, once the dependency information is properly constructed, to
compile a program X.go we must read X.go plus N .o files, where N
is the number of packages explicitly imported by X.go. The transitive
closure need not be evaluated to compile a file, only the explicit
imports. By this result, we hope to dramatically reduce the amount
of I/O necessary to compile a Go source file.
To put this another way, if a package P imports packages Xi, the
existence of Xi.o files is all that is needed to compile P because the
Xi.o files contain the export information. This is what breaks the
transitive dependency closure.