 Compiling and Linking
 ----

 Assume we have:

 	- one or more source files, *.go, perhaps in different directories
 	- a compiler, C, which takes one .go file and generates a .o file.
 	- a linker, L, which takes one or more .o files and generates a go.out (!) file.

 There is a question around naming of the files.  Let's avoid that
 problem for now and state that if the input is X.go, the output of
 the compiler is X.o, ignoring the package declaration in the file.
 This is not current behavior and probably not correct behavior, but
 it keeps the exposition simpler.

 Let's also assume that the linker knows about the run time and we
 don't have to specify bootstrap and runtime linkage explicitly.


 Basics
 ----

 Given a single file, main.go, with no dependencies, we do:

 	C main.go  # compile
 	L main.o  # link
 	go.out  # run

 Now let's say that main.go contains

 	import "fmt"

 and that fmt.go contains

 	import "sys"

 Then to build, we must compile in dependency order:

 	C sys.go
 	C fmt.go
 	C main.go

 and then link

 	L main.o fmt.o sys.o

 To the linker itself, the order of arguments is unimportant.

 When we compile fmt.go, we need to know the details of the functions
 (etc.) exported by sys.go and used by fmt.go.  When we run

 	C fmt.go

 it discovers the import of sys, and must then read sys.o to discover
 the details.  We must therefore compile the exporting source file before we
 can compile the importing source.  Moreover, if there is a mismatch
 between export and import, we can discover it during compilation
 of the importing source.

 To be explicit, then, what we say is, in effect

 	C sys.go
 	C fmt.go sys.o
 	C main.go fmt.o sys.o
 	L main.o fmt.o sys.o


 The contents of .o files (I)
 ----

 It's necessary to include in fmt.o the information for linking
 against the functions etc. in sys.o.  It's also possible to identify
 sys.o explicitly inside fmt.o, so we need to say only

 	L main.o fmt.o

 with sys.o discovered automatically.   Iterating again, it's easy
 to reduce the link step to

 	L main.o

 with L discovering automatically the .o files it needs to process
 to create the final go.out.
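
 The discovery step can be sketched as a walk over the import records
 stored in the .o files.  This is only an illustration: the imports
 map below is a hypothetical stand-in for whatever each .o file
 actually records, and objectsToLink is an invented name.

```go
package main

import (
	"fmt"
	"sort"
)

// objectsToLink returns every .o file reachable from root by following
// the import records stored in each .o file. The imports map is a
// stand-in for those records (an assumption made for illustration).
func objectsToLink(root string, imports map[string][]string) []string {
	seen := map[string]bool{root: true}
	queue := []string{root}
	for len(queue) > 0 {
		obj := queue[0]
		queue = queue[1:]
		for _, dep := range imports[obj] {
			if !seen[dep] {
				seen[dep] = true
				queue = append(queue, dep)
			}
		}
	}
	objs := make([]string, 0, len(seen))
	for obj := range seen {
		objs = append(objs, obj)
	}
	sort.Strings(objs) // argument order does not matter to L; sort for stable output
	return objs
}

func main() {
	imports := map[string][]string{
		"main.o": {"fmt.o"},
		"fmt.o":  {"sys.o"},
	}
	fmt.Println("L", objectsToLink("main.o", imports))
}
```

 Given only main.o, the walk finds fmt.o and then sys.o, which is the
 set L would otherwise have been given explicitly.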


 Automation of dependencies (I)
 ----

 It should be possible to automate discovery of the dependencies of
 main.go and therefore the order necessary to compile.  Since the
 source files contain explicit import statements, it is possible,
 given a source file, to discover the dependency tree automatically.
 (This will require rules and/or conventions about where to find
 things; for now assume everything is in the same directory.)

 The program that does this might be a variant of the
 compiler, since it must parse import statements at least, but for
 clarity let's call it D for dependency.  It can be a little like
 make, but let's not call it make because that brings along properties
 we don't want. In particular, it reads the sources to discover the
 dependencies; it doesn't need a separate description such as a
 Makefile.
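
 As a rough sketch of the one thing D must do to a source file, the
 following reads only the import statements.  It uses the go/parser
 package from today's standard library, which this document does not
 mention; importsOf is an invented name, and the sketch says nothing
 about how D itself would be built.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// importsOf returns the import paths named in the given Go source text.
// parser.ImportsOnly stops parsing after the import declarations.
func importsOf(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, spec := range f.Imports {
		// The path is stored as a quoted string literal; strip the quotes.
		path, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		paths = append(paths, path)
	}
	return paths, nil
}

func main() {
	src := "package main\n\nimport \"fmt\"\n\nfunc main() { fmt.Println(\"hi\") }\n"
	paths, err := importsOf("main.go", src)
	if err != nil {
		panic(err)
	}
	fmt.Println(paths)
}
```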

 In a directory with the source files above, including main.go, but
 with no .o files, we say:

 	D main.go

 D reads main.go, finds the import for fmt, and in effect descends,
 automatically running

 	D fmt.go

 which in turn invokes

 	D sys.go

 The file sys.go has no dependencies, so it can be compiled; D
 therefore says in effect

 	"compile sys.go"

 and returns; then we have what we need for fmt.go since the exports
 in sys.go are known (or at least the recipe to discover them is
 known).  So the next level says

 	"compile fmt.go"

 and pops up, whereupon the top D says

 	"compile main.go"

 The output of D could therefore be described as a script to run to
 compile the source.
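
 The descent just described is a post-order walk of the import tree.
 A minimal sketch follows, assuming the imports of each file are
 already known; the imports map and the name script are hypothetical
 and stand in for D reading the sources.

```go
package main

import "fmt"

// script returns the compile commands for file and its dependencies in
// the order D would emit them: dependencies first, then the file itself.
// The imports map (file -> imported files) is a hypothetical stand-in
// for reading import statements from the sources.
func script(file string, imports map[string][]string) []string {
	var cmds []string
	done := map[string]bool{}
	var visit func(f string)
	visit = func(f string) {
		if done[f] {
			return // already scheduled by an earlier import
		}
		done[f] = true
		for _, dep := range imports[f] {
			visit(dep) // "D dep": descend before compiling f
		}
		cmds = append(cmds, "C "+f)
	}
	visit(file)
	return cmds
}

func main() {
	imports := map[string][]string{
		"main.go": {"fmt.go"},
		"fmt.go":  {"sys.go"},
	}
	for _, cmd := range script("main.go", imports) {
		fmt.Println(cmd)
	}
}
```

 For the three files above this yields the commands for sys.go,
 fmt.go and main.go, in that order.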

 We could imagine that instead, D actually runs the compiler.
 (Conversely, we could imagine that C uses D to make sure the
 dependencies are built, but that has the danger of causing unnecessary
 dependency checking and compilation; more on that later.)

 To build, therefore, all we need to say is:

 	D -c main.go  # -c means 'run the compiler'
 	L main.o

 Obviously, D at this stage could just run L.  Therefore, we can
 simplify further by having it do so, whereupon

 	D -c main.go

 can automate the complete compilation and linking process.

 Automation of dependencies (II)
 ----

 Let's say we now edit main.go without changing its imports.  To
 recompile, we have two options. First, we could be explicit:

 	C main.go

 Or we could use D to automate running the compiler, as described
 in the previous section:

 	D -c main.go

 The D command will discover the import of fmt, but can see that fmt.o
 already exists.  Assuming its existence implies its currency, it need
 go no further; it can invoke C to compile main.go and link as usual.
 Whether it should make this assumption might be controlled by a flag.
 For the purpose of discussion, let's say it makes the assumption if
 the -c flag is set.

 There are two implications to this scheme. First, running D when D
 is going to turn around and run C anyway implies we could just run
 C directly and save one command invocation.   (We could decide
 independently whether C should automatically invoke the linker.)

 The other implication is more interesting.  If we stop traversing
 the dependency hierarchy as soon as we discover a .o file, then we
 may not realize that fmt.o is out of date and link against a stale
 binary. To fix this problem, we need to stat() or checksum the .o
 and .go files to see if they need recompilation.  Doing this every
 time is expensive and gets us back into the make-like approach.

 The great majority of compilations do not require this full check,
 however; this is especially true when in the compile-debug-edit
 cycle.  We therefore propose splitting the model into two scenarios.

 Scenario 1: General

 In this scenario, we ask D to update the full dependency tree by
 stat()-ing or checksumming files to check currency.  The generated
 go.out will always be up to date but incremental compilation will
 be slower.  Typically, this will be necessary only after a major
 operation like syncing or checking out code, or if there are known
 changes being made to the dependencies.
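
 A sketch of the currency check for this scenario, using modification
 times only.  The file type, its fields and the function stale are
 invented for illustration; the times are plain integers standing in
 for what stat() would report, with 0 meaning the .o does not exist.

```go
package main

import "fmt"

// file describes one source file: the modification times of X.go and
// X.o (0 for a missing .o) and the files it imports. All fields are
// illustrative stand-ins for what stat() and an import scan would report.
type file struct {
	srcTime, objTime int64
	imports          []string
}

// stale reports whether name must be recompiled: its .o is missing or
// older than its source, or one of its imports is stale or has a newer .o.
func stale(name string, files map[string]file, memo map[string]bool) bool {
	if v, ok := memo[name]; ok {
		return v
	}
	f := files[name]
	result := f.objTime < f.srcTime
	memo[name] = result // provisional entry guards against import cycles
	for _, dep := range f.imports {
		if stale(dep, files, memo) || files[dep].objTime > f.objTime {
			result = true
		}
	}
	memo[name] = result
	return result
}

func main() {
	files := map[string]file{
		"sys.go":  {srcTime: 10, objTime: 20},
		"fmt.go":  {srcTime: 30, objTime: 25, imports: []string{"sys.go"}},
		"main.go": {srcTime: 5, objTime: 40, imports: []string{"fmt.go"}},
	}
	memo := map[string]bool{}
	for _, name := range []string{"sys.go", "fmt.go", "main.go"} {
		fmt.Println(name, "stale:", stale(name, files, memo))
	}
}
```

 Here fmt.go was edited after fmt.o was built, so fmt.go and its
 importer main.go are both reported stale while sys.go is not.  This
 is the full-tree check that Scenario 2 is designed to avoid.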

 Scenario 2: Fast

 In this scenario, we explicitly tell D -c what has changed and have
 it compile only what is required.  Typically, this will mean compiling
 only the single active file or maybe a few files.  If an IDE is
 present or there is some watcher tool, it's easy to avoid the common
 mistake of forgetting to compile a changed file.

 If an edit has caused skew between export and import, this will be
 caught by the compiler, so it should be type-safe at least.  If D is
 running the compilation, it might be possible to arrange that C tells
 it there is a dependency problem and have D then try to resolve it
 by reevaluation.


 The contents of .o files (II)
 ----

 For scenario 2, we can make things even faster if the .o files
 identify not just the files that must be imported to satisfy the
 imports, but details about the imports themselves.  Let's say main.go
 uses only one function from fmt.go, called F. If the compiled main.o
 says, in effect

 	from package fmt get F

 then the linker will not need to read all of fmt.o to link main.o;
 instead it can extract only the necessary function.

 Even better, if fmt is a package made of many files, it may be
 possible to store in main.o specific information about the exact
 files needed:

 	from file fmtF.o get F

 The linker then need not even open the other .o files that
 form package fmt.

 The compiler should therefore be explicit and detailed within the .o
 files it generates about what elements of a package are needed by
 the program being compiled.
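
 One way to picture such records is as a list of (package, file,
 symbol) entries from which the linker derives the files it must
 open.  The record layout below is hypothetical, since this document
 fixes no format, and need and filesToOpen are invented names.

```go
package main

import (
	"fmt"
	"sort"
)

// need is one record in a .o file: "from file File get Symbol".
// The record layout is hypothetical; the document fixes no format.
type need struct {
	Package, File, Symbol string
}

// filesToOpen returns the distinct .o files named by the records,
// which is all the linker has to open on behalf of this importer.
func filesToOpen(needs []need) []string {
	seen := map[string]bool{}
	var files []string
	for _, n := range needs {
		if !seen[n.File] {
			seen[n.File] = true
			files = append(files, n.File)
		}
	}
	sort.Strings(files)
	return files
}

func main() {
	// main.o uses only F from package fmt, and F lives in fmtF.o.
	needs := []need{{Package: "fmt", File: "fmtF.o", Symbol: "F"}}
	fmt.Println(filesToOpen(needs))
}
```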

 Earlier, we said that when we run

 	C fmt.go

 it discovers the import of sys, and must then read sys.o to discover
 the details.  Note that if we record the information as specified here,
 when we then do

 	C main.go

 and it reads fmt.o, it does not in turn need to read sys.o; the necessary
 information has already been pulled up into fmt.o by D.

 Thus, once the dependency information is properly constructed, to
 compile a program X.go we must read X.go plus N .o files, where N
 is the number of packages explicitly imported by X.go.  The transitive
 closure need not be evaluated to compile a file, only the explicit
 imports.  By this result, we hope to dramatically reduce the amount
 of I/O necessary to compile a Go source file.

 To put this another way, if a package P imports packages Xi, the
 existence of Xi.o files is all that is needed to compile P because the
 Xi.o files contain the export information.  This is what breaks the
 transitive dependency closure.
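
 The saving can be made concrete by counting .o files read under each
 approach.  This is a counting sketch only, with an invented imports
 map and function name; it does not model the export information
 itself.

```go
package main

import "fmt"

// closureSize counts the packages reachable from pkg through imports,
// excluding pkg itself. This is the number of .o files a compiler would
// read if it had to follow imports transitively.
func closureSize(pkg string, imports map[string][]string) int {
	seen := map[string]bool{pkg: true}
	var visit func(p string)
	visit = func(p string) {
		for _, dep := range imports[p] {
			if !seen[dep] {
				seen[dep] = true
				visit(dep)
			}
		}
	}
	visit(pkg)
	return len(seen) - 1
}

func main() {
	imports := map[string][]string{
		"main": {"fmt"},
		"fmt":  {"sys"},
	}
	// With export information pulled up, only the explicit imports are read.
	n := len(imports["main"])
	fmt.Printf("direct: %d, transitive: %d\n", n, closureSize("main", imports))
}
```

 For main, the explicit imports number one (fmt) while the closure
 numbers two (fmt and sys); the gap widens as import chains deepen.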