Go spec starting point. SVN=111041

commit: 18c5b488a3b2e218c0e0cf2a7d4820d9da93a554
author: Robert Griesemer <gri@golang.org> Sun Mar 02 20:47:34 2008 -0800
committer: Robert Griesemer <gri@golang.org> Sun Mar 02 20:47:34 2008 -0800
tree: dbf6fb4817189534799d50a170a70e4684d2685f
parent: d82b11e4a46307f1f1415024f33263e819c222b8
diff --git a/doc/go_spec b/doc/go_spec
new file mode 100644
index 0000000..b9fc639
--- /dev/null
+++ b/doc/go_spec

@@ -0,0 +1,1197 @@
+The Go Annotated Specification
+
+This document supersedes all previous Go spec attempts.  The intent is
+to make this a reference for syntax and semantics.  It is annotated
+with additional information not strictly belonging in a language
+spec.
+
+
+Recent design decisions
+
+A list of decisions made but for which we haven't incorporated proper
+language into this spec.  Keep this section small and the spec
+up-to-date instead.
+
+- multi-dimensional arrays: implementation restriction for now
+
+- no '->', always '.'
+- (*a)[i] can be sugared into: a[i]
+- '.' to select package elements
+
+- arrays are not automatically pointers, we must always say
+  explicitly: "*array T" if we mean a pointer to that array
+- there is no pointer arithmetic in the language
+- there are no unions
+
+- packages: need to pin it all down
+
+- tuple notation: (a, b) = (b, a);
+  generally: need to make this clear
+
+- for now: no (C) 'static' variables inside functions
+
+- exports: we write: 'export a, b, c;' (with a, b, c, etc.  a list of
+  exported names, possibly also: structure.field)
+- the ordering of methods in interfaces is not relevant
+- structs must be identical (same decl) to be the same
+  (Ken has different implementation: equivalent declaration is the
+  same; what about methods?)
+
+- new methods can be added to a struct outside the package where the
+  struct is declared (need to think through all implications)
+- array assignment by value
+- do we need a type switch?
+
+- write down scoping rules for statements
+
+- semicolons: where are they needed and where are they not needed.
+  need a simple and consistent rule
+  
+- we have: postfix ++ and -- as statements
+
+
+
+Guiding principles
+
+Go is an attempt at a new systems programming language.
+[gri: this needs to be expanded. some keywords below]
+
+- small, concise, crisp
+- procedural
+- strongly typed
+- few, orthogonal, and general concepts
+- avoid repetition of declarations
+- multi-threading support in the language
+- garbage collected
+- containers w/o templates
+- compiler can be written in Go and so can its GC
+- very fast compilation possible (1MLOC/s stretch goal)
+- reasonably efficient (C ballpark)
+- compact, predictable code
+  (local program changes generally have local effects)
+- no macros
+
+
+Syntax
+
+The syntax of Go borrows from the C tradition with respect to
+statements and from the Pascal tradition with respect to declarations.
+Go programs are written using a lean notation with a small set of
+keywords, without filler keywords (such as 'of', 'to', etc.) or other
+gratuitous syntax, and with a slight preference for expressive
+keywords (e.g.  'function') over operators or other syntactic
+mechanisms.  Generally, "light" language features (variables, simple
+control flow, etc.) are expressed using a light-weight notation (short
+keywords, little syntax), while "heavy" language features use a more
+heavy-weight notation (longer keywords, more syntax).
+
+[gri: should say something about syntactic alternatives: if a
+syntactic form foreseeably will lead to a style recommendation, try to
+make that the syntactic form instead.  For instance, Go structured
+statements always require the {} braces even if there is only a single
+sub-statement.  Similar ideas apply elsewhere.]
+
+
+Modularity, identifiers and scopes
+
+A Go program consists of one or more files compiled separately, though
+not independently.  A single file or compilation unit may make
+individual identifiers visible to other files by marking them as
+exported; there is no "header file".  The exported interface of a file
+may be exposed in condensed form (without the corresponding
+implementation) through tools.
+
+A package collects types, constants, functions, and so on into a named
+entity that may be imported to enable its constituents to be used in
+another compilation unit.  Each source file is part of exactly one
+package; each package is constructed from one source file.
+
+Within a file, all identifiers are declared explicitly (except for
+general predeclared identifiers such as true and false) and thus for
+each identifier in a file the corresponding declaration can be found
+in that same file (usually before its use, except for the rare case of
+forward declarations).  Identifiers may denote program entities that
+are implemented in other files.  Nevertheless, such identifiers are
+still declared via an import declaration in the file that is referring
+to them.  This explicit declaration requirement ensures that every
+compilation unit can be read by itself.
+
+The scoping of identifiers is uniform: An identifier is visible from
+the point of its declaration to the end of the immediately surrounding
+block, and nested identifiers shadow outer identifiers with the same
+name.  All identifiers are in the same namespace; i.e., no two
+identifiers in the same scope may have the same name even if they
+denote different language concepts (for instance, a variable vs.
+a function).  Uniform scoping rules make Go programs easier to read
+and to understand.
+
+
+Program structure
+
+A compilation unit consists of a package specifier followed by import
+declarations followed by other declarations.  There are no statements
+at the top level of a file.  [gri: do we have a main function?  or do
+we treat all functions uniformly and instead permit a program to be
+started by providing a package name and a "start" function?  I like
+the latter because it gives a lot of flexibility and should not be
+hard to implement].  [r: i suggest that we define a symbol, main or
+Main or start or Start, and begin execution in the single exported
+function of that name in the program.  the flexibility of having a
+choice of name is unimportant and the corresponding need to define the
+name in order to link or execute adds complexity.  by default it
+should be trivial; we could allow a run-time flag to override the
+default for gri's flexibility.]
+
+
+Typing, polymorphism, and object-orientation
+
+Go programs are strongly typed; i.e., each program entity has a static
+type known at compile time.  Variables also have a dynamic type, which
+is the type of the value they hold at run-time.  Generally, the
+dynamic and the static type of a variable are identical, except for
+variables of interface type.  In that case the dynamic type of the
+variable is a pointer to a structure that implements the variable's
+(static) interface type.  There may be many different structures
+implementing an interface and thus the dynamic type of such variables
+is generally not known at compile time.  Such variables are called
+polymorphic.
+
+Interface types are the mechanism to support an object-oriented
+programming style.  Different interface types are independent of each
+other and no explicit hierarchy is required (such as single or
+multiple inheritance explicitly specified through respective type
+declarations).  Interface types only define a set of functions that a
+corresponding implementation must provide.  Thus interface and
+implementation are strictly separated.
+
+An interface is implemented by associating functions (methods) with
+structures.  If a structure implements all methods of an interface, it
+implements that interface and thus can be used where that interface is
+required.  Unless used through a variable of interface type, methods
+can always be statically bound (they are not "virtual"), and incur no
+runtime overhead compared to an ordinary function.
+
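The implicit relationship between structures, methods, and interfaces described above can be sketched as follows. This is written in present-day Go syntax, which differs from the notation in this draft; the names Shape, Rect, Square, and TotalArea are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// Shape is a hypothetical interface: it only lists the methods an
// implementation must provide.
type Shape interface {
	Area() float64
}

// Rect and Square are hypothetical structs. Neither mentions Shape;
// having an Area method is what makes them implement it.
type Rect struct{ W, H float64 }
type Square struct{ Side float64 }

func (r *Rect) Area() float64   { return r.W * r.H }
func (s *Square) Area() float64 { return s.Side * s.Side }

// TotalArea calls Area through the interface, so each call depends on
// the dynamic type of the value held by the interface variable.
func TotalArea(shapes []Shape) float64 {
	total := 0.0
	for _, s := range shapes {
		total += s.Area()
	}
	return total
}

func main() {
	r := &Rect{W: 2, H: 3}
	fmt.Println(r.Area()) // concrete receiver type: the call can be bound statically
	fmt.Println(TotalArea([]Shape{r, &Square{Side: 2}}))
}
```

Only the calls made inside TotalArea go through an interface variable; the direct call r.Area() uses the concrete type.
+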
+Go has no explicit notion of classes, sub-classes, or inheritance.
+These concepts are trivially modeled in Go through the use of
+functions, structures, associated methods, and interfaces.
+
+Go has no explicit notion of type parameters or templates.  Instead,
+containers (such as stacks, lists, etc.) are implemented through the
+use of abstract data types operating on interface types.  [gri: there
+is some automatic boxing, semi-automatic unboxing support for basic
+types].
+
+
+Pointers and garbage collection
+
+Variables may be allocated automatically (when entering the scope of
+the variable) or explicitly on the heap.  Pointers are used to refer
+to heap-allocated variables.  Pointers may also be used to point to
+any other variable; such a pointer is obtained by "getting the
+address" of that variable.  In particular, pointers may point "inside"
+other variables, or to automatic variables (which are usually
+allocated on the stack).  Variables are automatically reclaimed when
+they are no longer accessible.  There is no pointer arithmetic in Go.
+
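A minimal sketch of the pointer rules above, in present-day Go syntax (an assumption; this draft does not yet fix the notation). The names pair and addrOfLocal are hypothetical.

```go
package main

import "fmt"

// pair is a hypothetical struct used to show a pointer "inside" a variable.
type pair struct{ a, b int }

// addrOfLocal returns the address of a variable declared in the function.
// The variable stays alive for as long as the pointer is reachable.
func addrOfLocal() *int {
	n := 41
	return &n
}

func main() {
	p := addrOfLocal()
	*p++ // increments the variable p points to

	q := pair{1, 2}
	pb := &q.b // pointer to a field inside q
	*pb = 20

	fmt.Println(*p, q.b)
	// p + 1 would not compile: there is no pointer arithmetic.
}
```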
+
+Functions
+
+Functions contain declarations and statements.  They may be invoked
+recursively.  Functions may declare nested functions, and nested
+functions have access to the variables in the surrounding functions;
+they are in fact closures.  Functions may be anonymous and appear as
+literals in expressions.
+
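As an illustration of the closure behavior described above, here is a short sketch in present-day Go syntax (an assumption relative to this draft). The function name counter is hypothetical.

```go
package main

import "fmt"

// counter returns an anonymous function that captures the local
// variable n of the surrounding function.
func counter() func() int {
	n := 0
	return func() int {
		n++ // modifies the captured variable, not a copy
		return n
	}
}

func main() {
	next := counter()
	fmt.Println(next(), next(), next())
}
```

Each call to counter creates a fresh n, so separate closures do not share state.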
+
+Multithreading and channels
+
+[Rob: We need something here]
+
+
+
+
+Notation
+
+The syntax is specified in green productions using Extended
+Backus-Naur Form (EBNF).  In particular:
+
+''  encloses lexical symbols
+|  separates alternatives
+()  used for grouping
+[]  specifies option (0 or 1 times)
+{}  specifies repetition (0 to n times)
+
+A production may be referred to from various places in this document
+but is usually defined close to its first use.  Code examples are
+written in gray.  Annotations are in blue, and open issues are in red.
+One goal is to get rid of all red text in this document. [r: done!]
+
+
+Vocabulary and representation
+
+REWRITE THIS: BADLY EXPRESSED
+
+Go program source is a sequence of characters.  Each character is a
+Unicode code point encoded in UTF-8.
+
+A Go program is a sequence of symbols satisfying the Go syntax.  A
+symbol is a non-empty sequence of characters.  Symbols are
+identifiers, numbers, strings, operators, delimiters, and comments.
+White space must not occur within symbols (except in comments, and in
+the case of blanks and tabs in strings).  It is ignored unless it
+is essential to separate two consecutive symbols.
+
+White space is composed of blanks, newlines, carriage returns, and
+tabs only.
+
+A character is a Unicode code point.  In particular, capital and
+lower-case letters are considered as being distinct.  Note that some
+Unicode characters (e.g., the character ä) may be representable in
+two forms, as a single code point, or as two code points.  For the
+Unicode standard these two encodings represent the same character, but
+for Go, these two encodings correspond to two different characters.
+
+Source encoding
+
+The input is encoded in UTF-8.  In the grammar we use the notation
+
+utf8_char
+
+to refer to an arbitrary Unicode code point encoded in UTF-8.
+
+Digits and Letters
+
+octal_digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' .
+decimal_digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' .
+hex_digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | 'a' |
+            'A' | 'b' | 'B' | 'c' | 'C' | 'd' | 'D' | 'e' | 'E' | 'f' | 'F' .
+letter = 'A' | 'a' | ... 'Z' | 'z' | '_' .
+
+For now, letters and digits are ASCII.  We may expand this to allow
+Unicode definitions of letters and digits.
+
+
+Identifiers
+
+An identifier is a name for a program entity such as a variable, a
+type, a function, etc.
+
+identifier = letter { letter | decimal_digit } .
+
+
+- need to explain scopes, visibility (elsewhere)
+- need to say something about predeclared identifiers, and their
+  (universe) scope (elsewhere)
+
+
+Character and string literals
+
+A RawStringLit is a string literal delimited by back quotes ``; the
+first back quote encountered after the opening back quote terminates
+the string.
+
+RawStringLit = '`' { utf8_char } '`' .
+
+`abc`
+`\n`
+
+Character and string literals are very similar to C except:
+  - Octal character escapes are always 3 digits (\077 not \77)
+  - Hexadecimal character escapes are always 2 digits (\x07 not \x7)
+  - Strings are UTF-8 and represent Unicode
+  - `` strings exist; they do not interpret backslashes
+
+CharLit = '\'' ( UnicodeValue | ByteValue ) '\'' .
+StringLit = RawStringLit | InterpretedStringLit .
+InterpretedStringLit = '"' { UnicodeValue | ByteValue } '"' .
+ByteValue = OctalByteValue | HexByteValue .
+OctalByteValue = '\' octal_digit octal_digit octal_digit .
+HexByteValue = '\' 'x' hex_digit hex_digit .
+UnicodeValue = utf8_char | EscapedCharacter | LittleUValue | BigUValue .
+LittleUValue = '\' 'u' hex_digit hex_digit hex_digit hex_digit .
+BigUValue = '\' 'U' hex_digit hex_digit hex_digit hex_digit
+                    hex_digit hex_digit hex_digit hex_digit .
+EscapedCharacter = '\' ( 'a' | 'b' | 'f' | 'n' | 'r' | 't' | 'v' ) .
+
+An OctalByteValue contains three octal digits.  A HexByteValue
+contains two hexadecimal digits.  (Note: This differs from C but is
+simpler.)
+
+It is erroneous for an OctalByteValue to represent a value larger than 255. 
+(By construction, a HexByteValue cannot.)
+
+A UnicodeValue takes one of four forms:
+
+   1.  The UTF-8 encoding of a Unicode code point.  Since Go source
+       text is in UTF-8, this is the obvious translation from input
+       text into Unicode characters.
+   2.  The usual list of C backslash escapes: \n \t etc.
+   3.  A `little u' value, such as \u12AB.  This represents the Unicode
+       code point with the corresponding hexadecimal value.  It always
+       has exactly 4 hexadecimal digits.
+   4.  A `big U' value, such as '\U00101234'.  This represents the
+       Unicode code point with the corresponding hexadecimal value.
+       It always has exactly 8 hexadecimal digits.
+
+Some values that can be represented this way are illegal because they
+are not valid Unicode code points.  These include values above
+0x10FFFF and surrogate halves.
+
+A character literal is a form of unsigned integer constant.  Its value
+is that of the Unicode code point represented by the text between the
+quotes.
+
+'a'
+'ä'
+'本'
+'\t'
+'\000'
+'\007'
+'\377'
+'\x07'
+'\xff'
+'\u12e4'
+'\U00101234'
+
+A string literal has type 'string'.  Its value is constructed by
+taking the byte values formed by the successive elements of the
+literal.  For ByteValues, these are the literal bytes; for
+UnicodeValues, these are the bytes of the UTF-8 encoding of the
+corresponding Unicode code points.  Note that "\u00FF" and "\xFF" are
+different strings: the first contains the two-byte UTF-8 expansion of
+the value 255, while the second contains a single byte of value 255.
+The same rules apply to raw string literals, except the contents are
+uninterpreted UTF-8.
+
+""
+"Hello, world!\n"
+"日本語"
+"\u65e5本\U00008a9e"
+"\xff\u00FF"
+
+These examples all represent the same string:
+
+"日本語"  // UTF-8 input text
+`日本語`  // UTF-8 input text as a raw literal
+"\u65e5\u672c\u8a9e"  // The explicit Unicode code points
+"\U000065e5\U0000672c\U00008a9e"  // The explicit Unicode code points
+"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"  // The explicit UTF-8 bytes
+
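The equivalence of the five spellings above, and the difference between "\u00FF" and "\xFF", can be checked with a short program. This is a sketch in present-day Go syntax (an assumption relative to this draft); sameStrings and allEqual are hypothetical names.

```go
package main

import "fmt"

// sameStrings holds the five spellings listed in the text.
var sameStrings = []string{
	"日本語",
	`日本語`,
	"\u65e5\u672c\u8a9e",
	"\U000065e5\U0000672c\U00008a9e",
	"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e",
}

// allEqual reports whether every string in ss is byte-for-byte equal
// to the first one.
func allEqual(ss []string) bool {
	for _, s := range ss {
		if s != ss[0] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allEqual(sameStrings))
	// "\u00FF" is the two-byte UTF-8 encoding of code point 255;
	// "\xFF" is the single byte 255.
	fmt.Println(len("\u00FF"), len("\xFF"))
}
```
+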
+The language does not canonicalize Unicode text or evaluate combining
+forms.  The text of source code is passed uninterpreted.
+
+If the source code represents a character as two code points, such as
+a combining form involving an accent and a letter, the result will be
+an error if placed in a character literal (it is not a single code
+point), and will appear as two code points if placed in a string
+literal.  [This simple strategy may be insufficient in the long run
+but is surely fine for now.]
+
+
+Numeric literals
+
+Integer literals take the usual C form, except for the absence of the
+'U', 'L' etc.  suffixes, and represent integer constants.  (Character
+literals are also integer constants.) Similarly, floating point
+literals are also C-like, without suffixes and decimal only.
+
+An integer constant represents an abstract integer value of arbitrary
+precision.  Only when an integer constant (or arithmetic expression
+formed from integer constants) is assigned to a variable (or other
+l-value) is it required to fit into a particular size - that of the
+type of the variable.  In other words, integer constants and arithmetic
+upon them are not subject to overflow; only assignment of integer
+constants (and constant expressions) to an l-value can cause overflow.
+It is an error if the value of the constant or expression cannot be
+represented correctly in the range of the type of the l-value.
+
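A small sketch of arbitrary-precision integer constants, in present-day Go syntax (an assumption relative to this draft). The constant names huge and four are hypothetical.

```go
package main

import "fmt"

// huge does not fit in any fixed-size integer type, but as a constant
// it is an exact integer value.
const huge = 1 << 100

// four is computed exactly from huge and fits easily in an int.
const four = huge >> 98

func main() {
	var x int = four // the range check happens here, on assignment
	fmt.Println(x)
	// var y int = huge would not compile: the constant does not fit in int.
}
```
+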
+Floating point literals also represent an abstract, ideal floating
+point value that is constrained only upon assignment.  [r: what do we
+need to say here?  trickier because of truncation of fractions.]
+
+IntLit = [ '+' | '-' ] UnsignedIntLit .
+UnsignedIntLit = DecimalIntLit | OctalIntLit | HexIntLit .
+DecimalIntLit = ( '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' )
+                { decimal_digit } .
+OctalIntLit = '0' { octal_digit } .
+HexIntLit = '0' ( 'x' | 'X' ) hex_digit { hex_digit } .
+FloatLit = [ '+' | '-' ] UnsignedFloatLit .
+UnsignedFloatLit = "the usual decimal-only floating point representation".
+
+
+
+Compound Literals
+
+THIS SECTION IS WRONG
+Compound literals require some fine tuning.  I think we did ok in
+Sawzall but there are some loose ends.  I don't like that one cannot
+easily distinguish between an array and a struct.  We may need to
+specify a type if these literals appear in expressions, but we don't
+want to specify a type if these literals appear as initializer
+expressions where the variable is already typed.  And we don't want to
+do any implicit conversions.
+
+CompoundLit = ArrayLit | FunctionLit | StructureLit | MapLit.
+ArrayLit = '{' [ ExpressionList ] '}'.  // all elems must have "the same" type
+StructureLit = '{' [ ExpressionList ] '}'.
+MapLit = '{' [ PairList ] '}'.
+PairList = Pair { ',' Pair }.
+Pair = Expression ':' Expression.
+
+Literals
+
+Literal = BasicLit | CompoundLit .
+BasicLit = CharLit | StringLit | IntLit | FloatLit .
+
+
+Function Literals
+[THESE ARE CORRECT]
+
+FunctionLit = FunctionType Block.
+
+// Function literal
+func (a, b int, z float) bool { return a*b < int(z); }
+
+// Method literal
+func (p *T) . (a, b int, z float) bool { return a*b < int(z) + p.x; }
+
+
+Operators
+
+- incomplete
+
+
+Delimiters
+
+- incomplete
+
+
+Comments
+
+There are two forms of comments.
+
+The first starts at '//' and ends at a newline.
+
+The second starts at '/*' and ends at the first '*/'.  It may cross
+newlines.  It does not nest.
+
+Comments are treated like white space.
+
+
+Common productions
+
+IdentifierList = identifier { ',' identifier }.
+ExpressionList = Expression { ',' Expression }.
+
+QualifiedIdent = [ PackageName '.' ] identifier.
+PackageName = identifier.
+
+
+Types
+
+A type specifies the set of values which variables of that type may
+assume, and the operators that are applicable.
+
+Except for variables of interface types, the static type of a variable
+(i.e.  the type the variable is declared with) is the same as the
+dynamic type of the variable (i.e.  the type of the variable at
+run-time).  Variables of interface types may hold variables of
+different dynamic types, but their dynamic types must be compatible
+with the static interface type.  At any given instant during run-time,
+a variable has exactly one dynamic type.  A type declaration
+associates an identifier with a type.
+
+Array and struct types are called structured types, all other types
+are called unstructured.  A structured type cannot contain itself.
+[gri: this needs to be formulated much more precisely].
+
+Type = TypeName | ArrayType | ChannelType | InterfaceType |
+       FunctionType | MapType | StructType | PointerType .
+TypeName = QualifiedIdent.
+
+
+[gri: To make the types specifications more precise we need to
+introduce some general concepts such as what it means to 'contain'
+another type, to be 'equal' to another type, etc.  Furthermore, we are
+imprecise as we sometimes use the word type, sometimes just the type
+name (int), or the structure (array) to denote different things (types
+and variables).  We should explain more precisely.  Finally, there is
+a difference between equality of types and assignment compatibility -
+or isn't there?]
+
+
+Basic types
+
+Go defines a number of basic types which are referred to by their
+predeclared type names.  There are signed and unsigned integer types,
+and floating point types:
+
+  bool     the truth values true and false
+
+  uint8    the set of all unsigned 8bit integers
+  uint16   the set of all unsigned 16bit integers
+  uint32   the set of all unsigned 32bit integers
+  uint64   the set of all unsigned 64bit integers
+
+  byte    same as uint8
+
+  int8    the set of all signed 8bit integers, in 2's complement
+  int16  the set of all signed 16bit integers, in 2's complement
+  int32  the set of all signed 32bit integers, in 2's complement
+  int64  the set of all signed 64bit integers, in 2's complement
+
+  float32    the set of all valid IEEE-754 32bit floating point numbers
+  float64    the set of all valid IEEE-754 64bit floating point numbers
+  float80    the set of all valid IEEE-754 80bit floating point numbers
+  
+  double    same as float64
+
+Additionally, Go declares 3 basic types, uint, int, and float, which
+are platform-specific.  The bit width of these types corresponds to
+the "natural bit width" for the respective types for the given
+platform (e.g.  int is usually the same as int32 on a 32bit
+architecture, or int64 on a 64bit architecture).  These types are by
+definition platform-specific and should be used with the appropriate
+caution.
+
+[gri: do we specify minimal sizes for uint, int, float?  e.g.  int is
+at least int32?] [gri: do we say something about the correspondence of
+sizeof(*T) and sizeof(int)?  Are they the same?] [r: do we want
+int128 and uint128?]
+
+
+Built-in types
+
+Besides the basic types there is a set of built-in types: string, and chan,
+with maybe more to follow.
+
+
+Type string
+
+The string type represents the set of string values (strings).
+Strings behave like arrays of bytes, with the following properties:
+
+- They are immutable: after creation, it is not possible to change the
+  contents of a string
+- No internal pointers: it is illegal to create a pointer to an inner
+  element of a string
+- They can be indexed: given string s1, s1[i] is a byte value
+- They can be concatenated: given strings s1 and s2, s1 + s2 is a value
+  combining the elements of s1 and s2 in sequence
+- Known length: the length of a string s1 can be obtained by the function/
+  operator len(s1).  [r: is it a builtin? do we make it a method? etc.  this is
+  a placeholder].  The length of a string is the number of bytes within.
+  Unlike in C, there is no terminal NUL byte.
+- Creation 1: a string can be created from an integer value by a conversion
+    string('x') yields "x"
+- Creation 2: a string can be created from an array of integer values (maybe
+  just array of bytes) by a conversion
+    a [3]byte; a[0] = 'a'; a[1] = 'b'; a[2] = 'c';  string(a) == "abc";
+
+The language has string literals as discussed above.  The type of a string
+literal is 'string'.
+
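The string properties listed above can be tried out with the following sketch. It uses present-day Go syntax, which is an assumption relative to this draft: in particular, conversion from bytes goes through a slice (a[:]), and len is a built-in function. The helper names fromCodePoint and fromBytes are hypothetical.

```go
package main

import "fmt"

// fromCodePoint converts a single code point to a string,
// corresponding to string('x') in the text.
func fromCodePoint(r rune) string { return string(r) }

// fromBytes converts a byte array to a string. Present-day Go converts
// from a slice, hence a[:].
func fromBytes(a [3]byte) string { return string(a[:]) }

func main() {
	s1, s2 := "héllo", ", world"

	fmt.Println(len(s1)) // length in bytes: 'é' takes two bytes in UTF-8
	fmt.Println(s1[0])   // indexing yields a byte value
	fmt.Println(s1 + s2) // concatenation
	fmt.Println(fromCodePoint('x'))
	fmt.Println(fromBytes([3]byte{'a', 'b', 'c'}))
	// s1[0] = 'H' would not compile: strings are immutable.
}
```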
+
+Array types
+
+An array is a structured type consisting of a number of elements which
+are all of the same type, called the element type.  The number of
+elements of an array is called its length.  The elements of an array
+are designated by indices which are integers between 0 and the length
+- 1.
+
+THIS SECTION NEEDS WORK REGARDING STATIC AND DYNAMIC ARRAYS
+
+An array type specifies a set of arrays with a given element type and
+an optional array length.  The array length must be a (compile-time)
+constant expression, if present.  Arrays without length specification
+are called open arrays.  An open array must not contain other open
+arrays, and open arrays can only be used as parameter types or in a
+pointer type (for instance, a struct may not contain an open array
+field, but only a pointer to an open array).
+
+[gri: Need to define when array types are the same!  Also need to
+define assignment compatibility] [gri: Need to define a mechanism to
+get to the length of an array at run-time.  This could be a
+predeclared function 'length' (which may be problematic due to the
+name).  Alternatively, we could define an interface for array types
+and say that there is a 'length()' method.  So we would write
+a.length() which I think is pretty clean.].  [r: if array types have
+an interface and a string is an array, some stuff (but not enough)
+falls out nicely.]
+
+ArrayType = 'array' { '[' ArrayLength ']' } ElementType.
+ArrayLength = Expression.
+ElementType = Type.
+
+The notation
+
+    array [n][m] T
+
+is a syntactic shortcut for
+
+    array [n] array [m] T.
+
+(the shortcut may be applied recursively).
+
+array uint8
+array [64] struct { x, y int32; }
+array [1000][1000] float64
+
+
+Channel types
+
+
+ChannelType = 'channel' '(' Type '<-' Type ')' .
+
+channel(int <- float)
+
+- incomplete
+
+
+Pointer types
+
+- TODO: Need some intro here.
+
+Two pointer types are the same if they are pointing to variables of
+the same type.
+
+PointerType = '*' Type.
+
+- We do not allow pointer arithmetic of any kind.
+
+Interface types
+
+- TBD: This needs to be much more precise. For now we understand what it means.
+
+An interface type specifies a set of methods, the "method interface"
+of structs.  No two methods in one interface can have the same name.
+
+Two interfaces are the same if their set of functions is the same,
+i.e., if all methods exist in both interfaces and if the function
+names and signatures are the same.  The order of declaration of
+methods in an interface is irrelevant.
+
+A set of interface types implicitly creates an unconnected, ordered
+lattice of types.  An interface type T1 is said to be smaller than or
+equal to an interface type T2 (T1 <= T2) if the entire interface of
+T1 "is part" of T2. Thus, two interface types T1, T2 are the same if
+T1 <= T2, and T2 <= T1, and thus we can write T1 == T2.
+
+
+InterfaceType = 'interface' '{' { MethodDecl } '}' .
+MethodDecl = identifier Signature ';' .
+
+// An empty interface.
+interface {};
+
+// A basic file interface.
+interface {
+  Read(Buffer) bool;
+  Write(Buffer) bool;
+  Close();
+}
+
+
+Interface pointers can be implemented as "fat pointers"; namely a pair
+(ptr, tdesc) where ptr is simply the pointer to a struct instance
+implementing the interface, and tdesc is the struct's type descriptor.
+Only when crossing the boundary from statically typed structs to
+interfaces and vice versa, does the type descriptor come into play.
+In those places, the compiler statically knows the value of the type
+descriptor.
+
+
+Function types
+
+FunctionType = 'func' Signature .
+Signature = [ Receiver '.' ] Parameters [ Result ] .
+Receiver = '(' identifier Type ')' .
+Parameters = '(' [ ParameterList ] ')' .
+ParameterList = ParameterSection { ',' ParameterSection } .
+ParameterSection = [ IdentifierList ] Type .
+Result = [ Type ] | '(' ParameterList ')' .
+
+// Function types
+func ()
+func (a, b int, z float) bool
+func (a, b int, z float) (success bool)
+func (a, b int, z float) (success bool, result float)
+
+// Method types
+func (p *T) . ()
+func (p *T) . (a, b int, z float) bool
+func (p *T) . (a, b int, z float) (success bool)
+func (p *T) . (a, b int, z float) (success bool, result float)
+
+
+Map types
+
+MapType = 'map' '(' Type '<-' Type ')'.
+
+map(int <- string)
+
+- incomplete
+
+
+Struct types
+
+Struct types are similar to C structs.
+
+NEED TO DEFINE STRUCT EQUIVALENCE
+
+Two struct types are the same if and only if they are declared by the
+same struct type; i.e., struct types are compared via equivalence, and
+*not* structurally.  For that
+reason, struct types are usually given a type name so that it is
+possible to refer to the same struct in different places in a program.
+What about equivalence of structs w/ respect to methods?  What if
+methods can be added in another package?  TBD.
+
+Each field of a struct represents a variable within the data
+structure.  In particular, a function field represents a function
+variable, not a method.
+
+StructType = 'struct' '{' { FieldDecl } '}' .
+FieldDecl = IdentifierList Type ';' .
+
+// An empty struct.
+struct {}
+
+// A struct with 5 fields.
+struct {
+    x, y int;
+    u float;
+    a []int;
+    f func();
+}
+
+
+
+Note that a program which never uses interface types can be fully
+statically typed.  That is, the "usual" implementation of structs (or
+classes as they are called in other languages) having an extra type
+descriptor prepended in front of every single struct is not required.
+Only when a pointer to a struct is assigned to an interface variable
+does the type descriptor come into play, and at that point it is
+statically known at compile-time!
+
+Package specifiers
+
+Every source file is an element of a package.  The first element of
+every source file must be a package specifier, which names the package
+the file belongs to:
+
+PackageSpecifier = 'package' PackageName .
+
+package Math
+
+
+Package import declarations
+
+A program can access exported items from another package.  It does so
+by in effect declaring a local name providing access to the package,
+and then using the local name as a namespace with which to address the
+elements of the package.
+
+ImportDecl = 'import' PackageName FileName .
+FileName = DoubleQuotedString .
+DoubleQuotedString = '"' TEXT '"' .
+
+(DoubleQuotedString should be replaced by the correct string literal production!)
+Package import declarations must be the first declarations in a file
+after the package specifier.
+
+A package import associates an identifier with a package, named by a
+file.  In effect, it is a declaration:
+
+import Math "lib/Math";
+import library "my/library";
+
+After such an import, one can use the identifier (e.g. Math) to access
+elements within the package:
+
+x float = Math.sin(y);
+
+Note that this process derives nothing explicit about the type of the
+`imported' function (here Math.sin()).  The import must execute to
+provide this information to the compiler (or the programmer, for that
+matter).
+
+An angled-string refers to official stuff in a public place, in effect
+the run-time library.  A double-quoted-string refers to arbitrary
+code; it is probably a local file name that needs to be discovered
+using rules outside the scope of the language spec.
+
+The file name in a package must be complete except for a suffix.
+Moreover, the package name must correspond to the (basename of) the
+source file name.  For instance, the implementation of package Bar
+must be in file Bar.go, and if it lives in directory foo we write
+
+import Bar "foo/Bar";
+
+to import it.
+
+[This is a little redundant but if we allow multiple files per package
+it will seem less so, and in any case the redundancy is useful and
+protective.]
+
+We assume Unix syntax for file names: / separators, no suffix for
+directories.  If the language is ported to other systems, the
+environment must simulate these properties to avoid changing the
+source code.
+
+
+Declarations
+
+- This needs to be expanded.
+- We need to think about enums (or some alternative mechanism).
+
+Declaration = (ConstDecl | VarDecl | TypeDecl | FunctionDecl |
+               ForwardDecl | AliasDecl) .
+
+
+Const declarations
+
+ConstDecl = 'const' ( ConstSpec | '(' ConstSpecList [ ';' ] ')' ).
+ConstSpec = identifier [ Type ] '=' Expression .
+ConstSpecList = ConstSpec { ';' ConstSpec }.
+
+const pi float = 3.14159265
+const e = 2.718281828
+const (
+  one int = 1;
+  two = 2
+)
+
+
+Variable declarations
+
+VarDecl = 'var' ( VarSpec | '(' VarSpecList [ ';' ] ')' ) | ShortVarDecl .
+VarSpec = IdentifierList ( Type [ '=' ExpressionList ] | '=' ExpressionList ) .
+VarSpecList = VarSpec { ';' VarSpec } .
+ShortVarDecl = identifier ':=' Expression .
+
+var i int
+var u, v, w float
+var k = 0
+var x, y float = -1.0, -2.0
+var (
+  i int;
+  u, v = 2.0, 3.0
+)
+
+If the expression list is present, it must have the same number of elements
+as there are variables in the variable specification.
+
+[ TODO: why is x := 0 not legal at the global level? ]
+
+
+Type declarations
+
+TypeDecl = 'type' ( TypeSpec | '(' TypeSpecList [ ';' ] ')' ).
+TypeSpec = identifier Type .
+TypeSpecList = TypeSpec { ';' TypeSpec }.
+
+
+type IntArray [16] int
+type (
+  Point struct { x, y float };
+  Polar Point
+)
+
+
+Function and method declarations
+
+FunctionDecl = 'func' [ Receiver ] identifier Parameters [ Result ] ( ';' | Block ) .
+Block = '{' { Statement } '}' .
+
+
+func min(x int, y int) int {
+  if x < y {
+    return x;
+  }
+  return y;
+}
+
+func foo (a, b int, z float) bool {
+  return a*b < int(z);
+}
+
+
+A method is a function that also declares a receiver.  The receiver is
+a struct with which the function is associated.  The receiver type
+must denote a pointer to a struct.
+
+func (p *T) foo (a, b int, z float) bool {
+  return a*b < int(z) + p.x;
+}
+
+func (p *Point) Length() float {
+  return Math.sqrt(p.x * p.x + p.y * p.y);
+}
+
+func (p *Point) Scale(factor float) {
+  p.x = p.x * factor;
+  p.y = p.y * factor;
+}
+
+The last two examples are methods of struct type Point.  The variable p is
+the receiver; within the body of the method it represents the value of
+the receiving struct.
+
+Note that methods are declared outside the body of the corresponding
+struct.
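+
+As a rough illustration of the receiver rules above, here is how the
+Point methods look in present-day Go.  This is a sketch, not the syntax
+of this draft: float64 stands in for float, math.Sqrt for Math.sqrt,
+and the main function exists only to exercise the methods.
+
```go
package main

import (
	"fmt"
	"math"
)

// Point mirrors the struct from the type declaration examples;
// float64 is an assumption standing in for the draft's float.
type Point struct{ x, y float64 }

// Length is a method: a function that declares a pointer-to-struct receiver.
func (p *Point) Length() float64 {
	return math.Sqrt(p.x*p.x + p.y*p.y)
}

// Scale modifies the receiving struct through the pointer p.
func (p *Point) Scale(factor float64) {
	p.x = p.x * factor
	p.y = p.y * factor
}

func main() {
	p := &Point{3, 4}
	fmt.Println(p.Length()) // prints 5
	p.Scale(2)
	fmt.Println(p.Length()) // prints 10
}
```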
+
+Functions and methods can be forward declared by omitting the body:
+
+func foo (a, b int, z float) bool;
+func (p *T) foo (a, b int, z float) bool;
+
+
+
+Statements
+
+Statement = EmptyStat | Assignment | CompoundStat | Declaration |
+            ExpressionStat | IncDecStat | IfStat | WhileStat | ReturnStat .
+
+
+Empty statements
+
+EmptyStat = ';' .
+
+
+Assignments
+
+Assignment = Designator '=' Expression .
+
+- no automatic conversions
+- values can be assigned to variables if they are of the same type, or
+if they satisfy the interface type (much more precision needed here!)
+
+
+
+Compound statements
+
+CompoundStat = '{' { Statement } '}' .
+
+
+Expression statements
+
+ExpressionStat = Expression .
+
+
+IncDec statements
+
+IncDecStat = Expression ( '++' | '--' ) .
+
+
+
+
+If statements
+
+IfStat = 'if' ( [ Expression ] '{' { IfCaseList } '}' ) |
+              ( Expression '{' { Statement } '}' [ 'else' { Statement } ] ).
+IfCaseList = ( 'case' ExpressionList | 'default' ) ':' { Statement } .
+
+if x < y {
+  return x;
+} else {
+  return y;
+}
+
+if tag {
+case 0, 1: s1();
+case 2: s2();
+default: ;
+}
+
+if {
+case x < y: f1();
+case x < z: f2();
+}
+
+
+While statements
+
+WhileStat = 'while' ( [ Expression ] '{' { WhileCaseList } '}' ) |
+                    ( Expression '{' { Statement } '}' ).
+WhileCaseList = 'case' ExpressionList ':' { Statement } .
+
+while {
+case i < n: f1();
+case i < m: f2();
+}
+
+
+Return statements
+
+ReturnStat = 'return' [ ExpressionList ] .
+
+There are two ways to return values from a function.  The first is to
+explicitly list the return value or values in the return statement:
+
+func simple_f  () int {
+  return 2;
+}
+
+func complex_f1() (re float, im float) {
+  return -7.0, -4.0;
+}
+
+The second is to provide names for the return values and assign them
+explicitly in the function; the return statement will then provide no
+values:
+
+func complex_f2() (re float, im float) {
+  re = 7.0;
+  im = 4.0;
+  return;
+}
+
+It is legal to name the return values in the declaration even if the
+first form of return statement is used:
+
+func complex_f3() (re float, im float) {
+  return 7.0, 4.0;
+}
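+
+The two ways of returning values can be tried side by side in
+present-day Go.  The sketch below assumes float64 in place of float and
+uses the hypothetical names explicit and named; it is only meant to
+show that both forms hand the same kind of result to the caller.
+
```go
package main

import "fmt"

// explicit lists its result values in the return statement.
func explicit() (float64, float64) {
	return -7.0, -4.0
}

// named assigns to its named results and then uses a bare return.
func named() (re float64, im float64) {
	re = 7.0
	im = 4.0
	return
}

func main() {
	a, b := explicit()
	c, d := named()
	fmt.Println(a, b, c, d) // prints: -7 -4 7 4
}
```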
+
+
+Expressions
+
+Expression = Conjunction { '||' Conjunction }.
+Conjunction = Comparison { '&&' Comparison }.
+Comparison = SimpleExpr [ relation SimpleExpr ].
+relation = '==' | '!=' | '<' | '<=' | '>' | '>='.
+SimpleExpr = Term { add_op Term }.
+add_op = '+' | '-' | '|' | '^'.
+Term = Factor { mul_op Factor }.
+mul_op = '*' | '/' | '%' | '<<' | '>>' | '&'.
+
+The corresponding precedence hierarchy is shown below.  (Five levels of
+precedence is about the maximum people can keep comfortably in their
+heads.  The experience with C and C++ shows that more than that
+usually requires explicit manual consultation...)  [gri: I still
+think we should consider 0 levels of binary precedence: All operators
+are on the same level, but parentheses are required when different
+operators are mixed.  That would make it really easy, and really
+clear.  It would also open the door for straightforward introduction
+of user-defined operators, which would be rather useful.]
+
+Precedence    Operator
+    1         ||
+    2         &&
+    3         ==  !=  <  <=  >  >=
+    4         +  -  |  ^
+    5         *  /  %  <<  >>  &
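+
+To see what the table implies in practice, the following sketch uses
+present-day Go, which groups these binary operators into the same five
+levels.  The function name precedenceDemo is hypothetical.  Note in
+particular that the shift operators sit on the multiplicative level, so
+1<<2 + 1 groups differently than it would in C.
+
```go
package main

import "fmt"

// precedenceDemo evaluates three expressions without parentheses so that
// the results depend only on the operator precedence levels.
func precedenceDemo() (int, int, bool) {
	a := 1 + 2*3                   // level 5 (*) binds before level 4 (+)
	b := 1<<2 + 1                  // << is level 5 too: (1<<2) + 1
	c := 1+1 == 2 && 2 < 1 || true // level 3, then && (2), then || (1)
	return a, b, c
}

func main() {
	a, b, c := precedenceDemo()
	fmt.Println(a, b, c) // prints: 7 5 true
}
```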
+
+
+For integer values, / and % satisfy the following relationship:
+
+    (a / b) * b + a % b == a
+
+and
+
+    (a / b) is "truncated towards zero".
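+
+The relationship can be checked for all four sign combinations.  The
+sketch below is written in present-day Go, whose integer division also
+truncates towards zero; the helper name divIdentityHolds is
+hypothetical.
+
```go
package main

import "fmt"

// divIdentityHolds reports whether (a / b) * b + a % b == a; b must be nonzero.
func divIdentityHolds(a, b int) bool {
	return (a/b)*b+a%b == a
}

func main() {
	pairs := [][2]int{{7, 2}, {-7, 2}, {7, -2}, {-7, -2}}
	for _, p := range pairs {
		a, b := p[0], p[1]
		// The quotient is truncated towards zero, so the remainder
		// takes the sign of the dividend a.
		fmt.Println(a, b, a/b, a%b, divIdentityHolds(a, b))
	}
}
```
+
+For a = -7 and b = 2 the quotient is -3 (not -4) and the remainder is
+-1, so (-3) * 2 + (-1) == -7.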
+
+The shift operators implement arithmetic shifts for signed integers,
+and logical shifts for unsigned integers.  TBD: is there any range
+checking on s in x >> s, or x << s ?
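+
+The difference between the two kinds of right shift shows up when the
+same bit pattern is interpreted as signed and as unsigned.  This is a
+sketch in present-day Go using the sized types int8 and uint8 and a
+hypothetical function shiftRight; it does not address the open
+question about range checking on the shift count.
+
```go
package main

import "fmt"

// shiftRight shifts the bit pattern 1111_1000 right by one position,
// once as a signed value and once as an unsigned value.
func shiftRight() (int8, uint8) {
	var s int8 = -8   // 1111_1000 as a signed integer
	var u uint8 = 248 // 1111_1000 as an unsigned integer
	// Arithmetic shift copies the sign bit; logical shift fills with zero.
	return s >> 1, u >> 1
}

func main() {
	s, u := shiftRight()
	fmt.Println(s, u) // prints: -4 124
}
```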
+
+[gri: We decided on a couple of issues here that we need to write down
+more nicely]
+
+- There are no implicit type conversions except for
+constants/literals.  In particular, unsigned and signed integers
+cannot be mixed in an expression without explicit casting.
+
+- Unary '^' corresponds to C '~' (bitwise negate).
+
+- Arrays can be subscripted (a[i]) or sliced (a[i : j]).  A slice a[i
+: j] is a new array of length (j - i), and consisting of the elements
+a[i], a[i + 1], ...  a[j - 1].  [gri/r: Is the slice array bounds
+check hard (leading to an error), or soft (truncating) ?].
+Furthermore: Array slicing is very tricky!  Do we get a copy (a new
+array) or a new array descriptor?  This is open at this point.  There
+is a simple way out of the mess: Structured types are always passed by
+reference, and there is no value assignment for structured types.  It
+gets very complicated very quickly.
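+
+The first two decisions in the list above can be illustrated with a
+short sketch in present-day Go, which kept both rules.  The function
+names mix and complement are hypothetical.  Slicing is left out on
+purpose, since its semantics are still open at this point.
+
```go
package main

import "fmt"

// mix adds a signed and an unsigned value; the conversion must be explicit.
func mix(i int32, u uint32) int32 {
	// return i + u // rejected by the compiler: mismatched types
	return i + int32(u)
}

// complement applies unary ^, the counterpart of C's ~ operator.
func complement(x uint8) uint8 {
	return ^x
}

func main() {
	fmt.Println(mix(-1, 3))       // prints 2
	fmt.Println(complement(0x0F)) // prints 240 (0xF0)
}
```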
+
+[gri: Syntax below is incomplete - what about method invocation?]
+
+Factor = Literal | Designator | '!' Expression | '-' Expression |
+         '^' Expression | '&' Expression | '(' Expression ')' | Call.
+Designator = QualifiedIdent { Selector }.
+Selector = '.' identifier | '[' Expression [ ':' Expression ] ']'.
+Call = Factor '(' ExpressionList ')'.
+
+[gri: We need a precise definition of a constant expression]
+
+
+
+
+Compilation units
+
+The unit of compilation is a single file.  A compilation unit consists
+of a package specifier followed by a list of import declarations
+followed by a list of global declarations.
+
+CompilationUnit = { ImportDecl } { GlobalDeclaration }.
+GlobalDeclaration = Declaration.
+
+
+Exports
+
+Globally declared identifiers may be exported, thus making the
+exported identifier visible outside the package.  Another package may
+then import the identifier to use it.
+
+Export directives may appear only at the global level of a
+compilation unit (at least for now).  That is, one can export
+compilation-unit global identifiers but not, for example, local
+variables or structure fields.
+
+Exporting an identifier makes the identifier visible externally to the
+package.  If the identifier represents a type, the type structure is
+exported as well.  The exported identifiers may appear later in the
+source than the export directive itself, but it is an error to specify
+an identifier not declared anywhere in the source file containing the
+export directive.
+
+ExportDirective = 'export' ExportIdentifier { ',' ExportIdentifier } .
+ExportIdentifier = identifier .
+
+export sin, cos;
+
+One may export variables and types, but (at least for now), not
+aliases.  [r: what is needed to make aliases exportable?  issue is
+transitivity.]
+
+Exporting a variable does not automatically export the type of the
+variable.  For illustration, consider the program fragment:
+
+package P;
+export v1, v2, p;
+type S struct { a int; b int; }
+var v1 S;
+var v2 S;
+var p *S;
+
+Notice that S is not exported. Another source file may contain:
+
+import P;
+alias v1 P.v1;
+alias v2 P.v2;
+alias p P.p;
+
+This program can use v1, v2 and p but not access the fields (a and b) of
+structure type S explicitly.  For instance, it could legally contain
+
+if p == nil { }
+if v1 == v2 { }
+
+but not
+
+if v1.a == 0 { }
+
+
+