| The Go Annotated Specification |
| |
| This document supersedes all previous Go spec attempts. The intent is |
| to make this a reference for syntax and semantics. It is annotated |
| with additional information not strictly belonging in a language |
| spec. |
| |
| |
| Recent design decisions |
| |
| A list of decisions made but for which we haven't incorporated proper |
| language into this spec. Keep this section small and the spec |
| up-to-date instead. |
| |
| - multi-dimensional arrays: implementation restriction for now |
| |
| - no '->', always '.' |
| - (*a)[i] can be sugared into: a[i] |
| - '.' to select package elements |
| |
| - arrays are not automatically pointers, we must always say |
| explicitly: "*array T" if we mean a pointer to that array |
| - there is no pointer arithmetic in the language |
| - there are no unions |
| |
| - packages: need to pin it all down |
| |
| - tuple notation: (a, b) = (b, a); |
| generally: need to make this clear |
| |
| - for now: no (C) 'static' variables inside functions |
| |
| - exports: we write: 'export a, b, c;' (with a, b, c, etc. a list of |
| exported names, possibly also: structure.field) |
| - the ordering of methods in interfaces is not relevant |
| - structs must be identical (same decl) to be the same |
| (Ken has different implementation: equivalent declaration is the |
| same; what about methods?) |
| |
| - new methods can be added to a struct outside the package where the |
| struct is declared (need to think through all implications) |
| - array assignment by value |
| - do we need a type switch? |
| |
| - write down scoping rules for statements |
| |
| - semicolons: where are they needed and where are they not needed. |
| need a simple and consistent rule |
| |
| - we have: postfix ++ and -- as statements |
| |
| |
| |
| Guiding principles |
| |
| Go is an attempt at a new systems programming language. |
| [gri: this needs to be expanded. some keywords below] |
| |
| - small, concise, crisp |
| - procedural |
| - strongly typed |
| - few, orthogonal, and general concepts |
| - avoid repetition of declarations |
| - multi-threading support in the language |
| - garbage collected |
| - containers w/o templates |
| - compiler can be written in Go and so can its GC |
| - very fast compilation possible (1MLOC/s stretch goal) |
| - reasonably efficient (C ballpark) |
| - compact, predictable code |
| (local program changes generally have local effects) |
| - no macros |
| |
| |
| Syntax |
| |
| The syntax of Go borrows from the C tradition with respect to |
| statements and from the Pascal tradition with respect to declarations. |
| Go programs are written using a lean notation with a small set of |
| keywords, without filler keywords (such as 'of', 'to', etc.) or other |
| gratuitous syntax, and with a slight preference for expressive |
| keywords (e.g. 'function') over operators or other syntactic |
| mechanisms. Generally, "light" language features (variables, simple |
| control flow, etc.) are expressed using a light-weight notation (short |
| keywords, little syntax), while "heavy" language features use a more |
| heavy-weight notation (longer keywords, more syntax). |
| |
| [gri: should say something about syntactic alternatives: if a |
| syntactic form foreseeably will lead to a style recommendation, try to |
| make that the syntactic form instead. For instance, Go structured |
| statements always require the {} braces even if there is only a single |
| sub-statement. Similar ideas apply elsewhere.] |
| |
| |
| Modularity, identifiers and scopes |
| |
| A Go program consists of one or more files compiled separately, though |
| not independently. A single file or compilation unit may make |
| individual identifiers visible to other files by marking them as |
| exported; there is no "header file". The exported interface of a file |
| may be exposed in condensed form (without the corresponding |
| implementation) through tools. |
| |
| A package collects types, constants, functions, and so on into a named |
| entity that may be imported to enable its constituents to be used in |
| another compilation unit. Each source file is part of exactly one |
| package; each package is constructed from one source file. |
| |
| Within a file, all identifiers are declared explicitly (except for |
| general predeclared identifiers such as true and false) and thus for |
| each identifier in a file the corresponding declaration can be found |
| in that same file (usually before its use, except for the rare case of |
| forward declarations). Identifiers may denote program entities that |
| are implemented in other files. Nevertheless, such identifiers are |
| still declared via an import declaration in the file that is referring |
| to them. This explicit declaration requirement ensures that every |
| compilation unit can be read by itself. |
| |
| The scoping of identifiers is uniform: An identifier is visible from |
| the point of its declaration to the end of the immediately surrounding |
| block, and nested identifiers shadow outer identifiers with the same |
| name. All identifiers are in the same namespace; i.e., no two |
| identifiers in the same scope may have the same name even if they |
| denote different language concepts (for instance, a variable vs. a |
| function). Uniform scoping rules make Go programs easier to read |
| and to understand. |
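| |
| A minimal sketch of block scoping and shadowing as described above. |
| It is written in the syntax accepted by current Go compilers, which |
| differs from the draft notation of this document; the names x and |
| shadow are illustrative only. |

```go
package main

import "fmt"

// x lives in the outermost block of this file.
var x = "outer"

// shadow reports the value of x seen inside a nested block and the value
// seen again once that block has ended.
func shadow() (inner, after string) {
	{
		x := "inner" // shadows the outer x until the closing brace
		inner = x
	}
	after = x // the nested declaration is out of scope here
	return inner, after
}

func main() {
	inner, after := shadow()
	fmt.Println(inner, after)
}
```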
| |
| |
| Program structure |
| |
| A compilation unit consists of a package specifier followed by import |
| declarations followed by other declarations. There are no statements |
| at the top level of a file. [gri: do we have a main function? or do |
| we treat all functions uniformly and instead permit a program to be |
| started by providing a package name and a "start" function? I like |
| the latter because it gives a lot of flexibility and should not be |
| hard to implement]. [r: i suggest that we define a symbol, main or |
| Main or start or Start, and begin execution in the single exported |
| function of that name in the program. the flexibility of having a |
| choice of name is unimportant and the corresponding need to define the |
| name in order to link or execute adds complexity. by default it |
| should be trivial; we could allow a run-time flag to override the |
| default for gri's flexibility.] |
| |
| |
| Typing, polymorphism, and object-orientation |
| |
| Go programs are strongly typed; i.e., each program entity has a static |
| type known at compile time. Variables also have a dynamic type, which |
| is the type of the value they hold at run-time. Generally, the |
| dynamic and the static type of a variable are identical, except for |
| variables of interface type. In that case the dynamic type of the |
| variable is a pointer to a structure that implements the variable's |
| (static) interface type. There may be many different structures |
| implementing an interface and thus the dynamic type of such variables |
| is generally not known at compile time. Such variables are called |
| polymorphic. |
| |
| Interface types are the mechanism to support an object-oriented |
| programming style. Different interface types are independent of each |
| other and no explicit hierarchy is required (such as single or |
| multiple inheritance explicitly specified through respective type |
| declarations). Interface types only define a set of functions that a |
| corresponding implementation must provide. Thus interface and |
| implementation are strictly separated. |
| |
| An interface is implemented by associating functions (methods) with |
| structures. If a structure implements all methods of an interface, it |
| implements that interface and thus can be used where that interface is |
| required. Unless used through a variable of interface type, methods |
| can always be statically bound (they are not "virtual"), and incur no |
| runtime overhead compared to an ordinary function. |
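| |
| The following sketch illustrates the idea: two unrelated structures |
| implement one interface simply by providing its methods. It uses the |
| syntax of current Go compilers rather than the draft notation here, |
| and the names Shape, Square, Rect and totalArea are hypothetical. |

```go
package main

import "fmt"

// Shape is a hypothetical interface: a set of methods and nothing else.
type Shape interface {
	Area() float64
}

// Square and Rect are unrelated structures; neither one mentions Shape.
type Square struct{ side float64 }
type Rect struct{ w, h float64 }

func (s *Square) Area() float64 { return s.side * s.side }
func (r *Rect) Area() float64   { return r.w * r.h }

// totalArea calls Area through the interface; the dynamic type of each
// element is known only at run time.
func totalArea(shapes []Shape) float64 {
	sum := 0.0
	for _, s := range shapes {
		sum += s.Area()
	}
	return sum
}

func main() {
	sq := &Square{side: 2}
	fmt.Println(sq.Area()) // static type is known, so no interface is involved
	fmt.Println(totalArea([]Shape{sq, &Rect{w: 2, h: 3}}))
}
```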
| |
| Go has no explicit notion of classes, sub-classes, or inheritance. |
| These concepts are trivially modeled in Go through the use of |
| functions, structures, associated methods, and interfaces. |
| |
| Go has no explicit notion of type parameters or templates. Instead, |
| containers (such as stacks, lists, etc.) are implemented through the |
| use of abstract data types operating on interface types. [gri: there |
| is some automatic boxing, semi-automatic unboxing support for basic |
| types]. |
| |
| |
| Pointers and garbage collection |
| |
| Variables may be allocated automatically (when entering the scope of |
| the variable) or explicitly on the heap. Pointers are used to refer |
| to heap-allocated variables. Pointers may also be used to point to |
| any other variable; such a pointer is obtained by "getting the |
| address" of that variable. In particular, pointers may point "inside" |
| other variables, or to automatic variables (which are usually |
| allocated on the stack). Variables are automatically reclaimed when |
| they are no longer accessible. There is no pointer arithmetic in Go. |
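| |
| A small sketch of the pointer rules above, in the syntax of current Go |
| compilers (the draft notation differs). The type point and the |
| function viaPointers are made up for the illustration. |

```go
package main

import "fmt"

// point is a hypothetical structure used only for this illustration.
type point struct{ x, y int }

// viaPointers updates variables through pointers obtained by taking
// their addresses.
func viaPointers() (point, int) {
	var p point // automatic variable
	var n int   // automatic variable
	px := &p.x  // points "inside" p
	pn := &n    // points to an automatic variable
	*px = 7
	*pn = 3
	h := new(point) // explicit allocation; reclaimed once unreachable
	h.y = *px
	p.y = h.y
	return p, n
}

func main() {
	p, n := viaPointers()
	fmt.Println(p.x, p.y, n)
}
```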
| |
| |
| Functions |
| |
| Functions contain declarations and statements. They may be invoked |
| recursively. Functions may declare nested functions, and nested |
| functions have access to the variables in the surrounding functions; |
| they are in fact closures. Functions may be anonymous and appear as |
| literals in expressions. |
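| |
| As a sketch of a nested, anonymous function acting as a closure |
| (current Go syntax; the name counter is illustrative only): |

```go
package main

import "fmt"

// counter returns an anonymous function that captures the variable n
// of the surrounding function.
func counter() func() int {
	n := 0
	return func() int {
		n++ // n outlives the call to counter because the closure refers to it
		return n
	}
}

func main() {
	next := counter()
	fmt.Println(next(), next(), next())
}
```

| Each call to counter yields an independent n, so two counters do not |
| interfere with each other. |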
| |
| |
| Multithreading and channels |
| |
| [Rob: We need something here] |
| |
| |
| |
| |
| Notation |
| |
| The syntax is specified in green productions using Extended |
| Backus-Naur Form (EBNF). In particular: |
| |
| '' encloses lexical symbols |
| | separates alternatives |
| () used for grouping |
| [] specifies option (0 or 1 times) |
| {} specifies repetition (0 to n times) |
| |
| A production may be referred to from various places in this document |
| but is usually defined close to its first use. Code examples are |
| written in gray. Annotations are in blue, and open issues are in red. |
| One goal is to get rid of all red text in this document. [r: done!] |
| |
| |
| Vocabulary and representation |
| |
| REWRITE THIS: BADLY EXPRESSED |
| |
| Go program source is a sequence of characters. Each character is a |
| Unicode code point encoded in UTF-8. |
| |
| A Go program is a sequence of symbols satisfying the Go syntax. A |
| symbol is a non-empty sequence of characters. Symbols are |
| identifiers, numbers, strings, operators, delimiters, and comments. |
| White space must not occur within symbols (except in comments, and in |
| the case of blanks and tabs in strings). White space is ignored unless |
| it is essential to separate two consecutive symbols. |
| |
| White space is composed of blanks, newlines, carriage returns, and |
| tabs only. |
| |
| A character is a Unicode code point. In particular, capital and |
| lower-case letters are considered as being distinct. Note that some |
| Unicode characters (e.g., the character ä) may be representable in |
| two forms, as a single code point, or as two code points. For the |
| Unicode standard these two encodings represent the same character, but |
| for Go, these two encodings correspond to two different characters. |
| |
| Source encoding |
| |
| The input is encoded in UTF-8. In the grammar we use the notation |
| |
| utf8_char |
| |
| to refer to an arbitrary Unicode code point encoded in UTF-8. |
| |
| Digits and Letters |
| |
| octal_digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' . |
| decimal_digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' . |
| hex_digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | 'a' | |
| 'A' | 'b' | 'B' | 'c' | 'C' | 'd' | 'D' | 'e' | 'E' | 'f' | 'F' . |
| letter = 'A' | 'a' | ... 'Z' | 'z' | '_' . |
| |
| For now, letters and digits are ASCII. We may expand this to allow |
| Unicode definitions of letters and digits. |
| |
| |
| Identifiers |
| |
| An identifier is a name for a program entity such as a variable, a |
| type, a function, etc. |
| |
| identifier = letter { letter | decimal_digit } . |
| |
| |
| - need to explain scopes, visibility (elsewhere) |
| - need to say something about predeclared identifiers, and their |
| (universe) scope (elsewhere) |
| |
| |
| Character and string literals |
| |
| A RawStringLit is a string literal delimited by back quotes ``; the |
| first back quote encountered after the opening back quote terminates |
| the string. |
| |
| RawStringLit = '`' { utf8_char } '`' . |
| |
| `abc` |
| `\n` |
| |
| Character and string literals are very similar to C except: |
| - Octal character escapes are always 3 digits (\077 not \77) |
| - Hexadecimal character escapes are always 2 digits (\x07 not \x7) |
| - Strings are UTF-8 and represent Unicode |
| - `` strings exist; they do not interpret backslashes |
| |
| CharLit = '\'' ( UnicodeValue | ByteValue ) '\'' . |
| StringLit = RawStringLit | InterpretedStringLit . |
| InterpretedStringLit = '"' { UnicodeValue | ByteValue } '"' . |
| ByteValue = OctalByteValue | HexByteValue . |
| OctalByteValue = '\' octal_digit octal_digit octal_digit . |
| HexByteValue = '\' 'x' hex_digit hex_digit . |
| UnicodeValue = utf8_char | EscapedCharacter | LittleUValue | BigUValue . |
| LittleUValue = '\' 'u' hex_digit hex_digit hex_digit hex_digit . |
| BigUValue = '\' 'U' hex_digit hex_digit hex_digit hex_digit |
| hex_digit hex_digit hex_digit hex_digit . |
| EscapedCharacter = '\' ( 'a' | 'b' | 'f' | 'n' | 'r' | 't' | 'v' ) . |
| |
| An OctalByteValue contains three octal digits. A HexByteValue |
| contains two hexadecimal digits. (Note: This differs from C but is |
| simpler.) |
| |
| It is erroneous for an OctalByteValue to represent a value larger than 255. |
| (By construction, a HexByteValue cannot.) |
| |
| A UnicodeValue takes one of four forms: |
| |
| 1. The UTF-8 encoding of a Unicode code point. Since Go source |
| text is in UTF-8, this is the obvious translation from input |
| text into Unicode characters. |
| 2. The usual list of C backslash escapes: \n \t etc. |
| 3. A `little u' value, such as \u12AB. This represents the Unicode |
| code point with the corresponding hexadecimal value. It always |
| has exactly 4 hexadecimal digits. |
| 4. A `big U' value, such as '\U00101234'. This represents the |
| Unicode code point with the corresponding hexadecimal value. |
| It always has exactly 8 hexadecimal digits. |
| |
| Some values that can be represented this way are illegal because they |
| are not valid Unicode code points. These include values above |
| 0x10FFFF and surrogate halves. |
| |
| A character literal is a form of unsigned integer constant. Its value |
| is that of the Unicode code point represented by the text between the |
| quotes. |
| |
| 'a' |
| 'ä' |
| '本' |
| '\t' |
| '\000' |
| '\007' |
| '\377' |
| '\x07' |
| '\xff' |
| '\u12e4' |
| '\U00101234' |
| |
| A string literal has type 'string'. Its value is constructed by |
| taking the byte values formed by the successive elements of the |
| literal. For ByteValues, these are the literal bytes; for |
| UnicodeValues, these are the bytes of the UTF-8 encoding of the |
| corresponding Unicode code points. Note that "\u00FF" and "\xFF" are |
| different strings: the first contains the two-byte UTF-8 expansion of |
| the value 255, while the second contains a single byte of value 255. |
| The same rules apply to raw string literals, except the contents are |
| uninterpreted UTF-8. |
| |
| "" |
| "Hello, world!\n" |
| "日本語" |
| "\u65e5本\U00008a9e" |
| "\xff\u00FF" |
| |
| These examples all represent the same string: |
| |
| "日本語" // UTF-8 input text |
| `日本語` // UTF-8 input text as a raw literal |
| "\u65e5\u672c\u8a9e" // The explicit Unicode code points |
| "\U000065e5\U0000672c\U00008a9e" // The explicit Unicode code points |
| "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e" // The explicit UTF-8 bytes |
| |
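| The two claims above (that "\u00FF" and "\xFF" differ, and that the |
| five spellings denote one string) can be checked with a short sketch |
| in current Go syntax. The helper names byteLengths and sameString are |
| illustrative only. |

```go
package main

import "fmt"

// byteLengths returns the lengths in bytes of "\u00FF" and "\xFF".
// The first is the two-byte UTF-8 encoding of code point 255, the
// second is the single byte 255.
func byteLengths() (int, int) {
	return len("\u00FF"), len("\xFF")
}

// sameString reports whether the five spellings denote one string.
func sameString() bool {
	a := "日本語"                                   // UTF-8 input text
	b := `日本語`                                   // raw literal
	c := "\u65e5\u672c\u8a9e"                    // explicit code points
	d := "\U000065e5\U0000672c\U00008a9e"        // explicit code points
	e := "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e" // explicit UTF-8 bytes
	return a == b && b == c && c == d && d == e
}

func main() {
	u, x := byteLengths()
	fmt.Println(u, x, sameString())
}
```

| |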
| The language does not canonicalize Unicode text or evaluate combining |
| forms. The text of source code is passed uninterpreted. |
| |
| If the source code represents a character as two code points, such as |
| a combining form involving an accent and a letter, the result will be |
| an error if placed in a character literal (it is not a single code |
| point), and will appear as two code points if placed in a string |
| literal. [This simple strategy may be insufficient in the long run |
| but is surely fine for now.] |
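| |
| A sketch of the consequence for strings, in current Go syntax: the |
| precomposed and the combining spelling of one visible character are |
| different strings with different numbers of code points. The constant |
| and function names are illustrative; unicode/utf8 is the standard |
| library package used for counting. |

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const (
	precomposed = "\u00e4"  // a single code point
	combining   = "a\u0308" // letter a followed by a combining diaeresis
)

// codePoints returns the number of code points in each spelling.
func codePoints() (int, int) {
	return utf8.RuneCountInString(precomposed), utf8.RuneCountInString(combining)
}

func main() {
	one, two := codePoints()
	// No canonicalization takes place, so the comparison is bytewise.
	fmt.Println(one, two, precomposed == combining)
}
```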
| |
| |
| Numeric literals |
| |
| Integer literals take the usual C form, except for the absence of the |
| 'U', 'L' etc. suffixes, and represent integer constants. (Character |
| literals are also integer constants.) Similarly, floating point |
| literals are also C-like, without suffixes and decimal only. |
| |
| An integer constant represents an abstract integer value of arbitrary |
| precision. Only when an integer constant (or arithmetic expression |
| formed from integer constants) is assigned to a variable (or other |
| l-value) is it required to fit into a particular size - that of the type |
| of the variable. In other words, integer constants and arithmetic |
| upon them are not subject to overflow; only assignment of integer |
| constants (and constant expressions) to an l-value can cause overflow. |
| It is an error if the value of the constant or expression cannot be |
| represented correctly in the range of the type of the l-value. |
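| |
| A minimal sketch of this rule in current Go syntax; the names huge, |
| small and value are illustrative only. |

```go
package main

import "fmt"

// huge fits in no machine integer type, which is fine for a constant.
const huge = 1 << 100

// small is computed exactly from huge; nothing overflows on the way.
const small = huge >> 98

// value assigns the constant to a variable; only here must it fit.
func value() int {
	var i int = small
	// var j int = huge // would be rejected at compile time
	return i
}

func main() {
	fmt.Println(value())
}
```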
| |
| Floating point literals also represent an abstract, ideal floating |
| point value that is constrained only upon assignment. [r: what do we |
| need to say here? trickier because of truncation of fractions.] |
| |
| IntLit = [ '+' | '-' ] UnsignedIntLit . |
| UnsignedIntLit = DecimalIntLit | OctalIntLit | HexIntLit . |
| DecimalIntLit = ( '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ) |
| { decimal_digit } . |
| OctalIntLit = '0' { octal_digit } . |
| HexIntLit = '0' ( 'x' | 'X' ) hex_digit { hex_digit } . |
| FloatLit = [ '+' | '-' ] UnsignedFloatLit . |
| UnsignedFloatLit = "the usual decimal-only floating point representation". |
| |
| |
| |
| Compound Literals |
| |
| THIS SECTION IS WRONG |
| Compound literals require some fine tuning. I think we did ok in |
| Sawzall but there are some loose ends. I don't like that one cannot |
| easily distinguish between an array and a struct. We may need to |
| specify a type if these literals appear in expressions, but we don't |
| want to specify a type if these literals appear as initializer |
| expressions where the variable is already typed. And we don't want to |
| do any implicit conversions. |
| |
| CompoundLit = ArrayLit | FunctionLit | StructureLit | MapLit. |
| ArrayLit = '[' [ ExpressionList ] ']'. // all elems must have "the same" type |
| StructureLit = '{' [ ExpressionList ] '}'. |
| MapLit = '{' [ PairList ] '}'. |
| PairList = Pair { ',' Pair }. |
| Pair = Expression ':' Expression. |
| |
| Literals |
| |
| Literal = BasicLit | CompoundLit . |
| BasicLit = CharLit | StringLit | IntLit | FloatLit . |
| |
| |
| Function Literals |
| [THESE ARE CORRECT] |
| |
| FunctionLit = FunctionType Block. |
| |
| // Function literal |
| func (a, b int, z float) bool { return a*b < int(z); } |
| |
| // Method literal |
| func (p *T) . (a, b int, z float) bool { return a*b < int(z) + p.x; } |
| |
| |
| Operators |
| |
| - incomplete |
| |
| |
| Delimiters |
| |
| - incomplete |
| |
| |
| Comments |
| |
| There are two forms of comments. |
| |
| The first starts at '//' and ends at a newline. |
| |
| The second starts at '/*' and ends at the first '*/'. It may cross |
| newlines. It does not nest. |
| |
| Comments are treated like white space. |
| |
| |
| Common productions |
| |
| IdentifierList = identifier { ',' identifier }. |
| ExpressionList = Expression { ',' Expression }. |
| |
| QualifiedIdent = [ PackageName '.' ] identifier. |
| PackageName = identifier. |
| |
| |
| Types |
| |
| A type specifies the set of values which variables of that type may |
| assume, and the operators that are applicable. |
| |
| Except for variables of interface types, the static type of a variable |
| (i.e. the type the variable is declared with) is the same as the |
| dynamic type of the variable (i.e. the type of the variable at |
| run-time). Variables of interface types may hold variables of |
| different dynamic types, but their dynamic types must be compatible |
| with the static interface type. At any given instant during run-time, |
| a variable has exactly one dynamic type. A type declaration |
| associates an identifier with a type. |
| |
| Array and struct types are called structured types, all other types |
| are called unstructured. A structured type cannot contain itself. |
| [gri: this needs to be formulated much more precisely]. |
| |
| Type = TypeName | ArrayType | ChannelType | InterfaceType | |
| FunctionType | MapType | StructType | PointerType . |
| TypeName = QualifiedIdent. |
| |
| |
| [gri: To make the types specifications more precise we need to |
| introduce some general concepts such as what it means to 'contain' |
| another type, to be 'equal' to another type, etc. Furthermore, we are |
| imprecise as we sometimes use the word type, sometimes just the type |
| name (int), or the structure (array) to denote different things (types |
| and variables). We should explain more precisely. Finally, there is |
| a difference between equality of types and assignment compatibility - |
| or isn't there?] |
| |
| |
| Basic types |
| |
| Go defines a number of basic types which are referred to by their |
| predeclared type names. There are signed and unsigned integer types, |
| and floating point types: |
| |
| bool the truth values true and false |
| |
| uint8 the set of all unsigned 8bit integers |
| uint16 the set of all unsigned 16bit integers |
| uint32 the set of all unsigned 32bit integers |
| uint64 the set of all unsigned 64bit integers |
| |
| byte same as uint8 |
| |
| int8 the set of all signed 8bit integers, in 2's complement |
| int16 the set of all signed 16bit integers, in 2's complement |
| int32 the set of all signed 32bit integers, in 2's complement |
| int64 the set of all signed 64bit integers, in 2's complement |
| |
| float32 the set of all valid IEEE-754 32bit floating point numbers |
| float64 the set of all valid IEEE-754 64bit floating point numbers |
| float80 the set of all valid IEEE-754 80bit floating point numbers |
| |
| double same as float64 |
| |
| Additionally, Go declares 3 basic types, uint, int, and float, which |
| are platform-specific. The bit width of these types corresponds to |
| the "natural bit width" for the respective types for the given |
| platform (e.g. int is usually the same as int32 on a 32bit |
| architecture, or int64 on a 64bit architecture). These types are by |
| definition platform-specific and should be used with the appropriate |
| caution. |
| |
| [gri: do we specify minimal sizes for uint, int, float? e.g. int is |
| at least int32?] [gri: do we say something about the correspondence of |
| sizeof(*T) and sizeof(int)? Are they the same?] [r: do we want |
| int128 and uint128?.] |
| |
| |
| Built-in types |
| |
| Besides the basic types there is a set of built-in types: string, and chan, |
| with maybe more to follow. |
| |
| |
| Type string |
| |
| The string type represents the set of string values (strings). |
| A string behaves like an array of bytes, with the following properties: |
| |
| - They are immutable: after creation, it is not possible to change the |
| contents of a string |
| - No internal pointers: it is illegal to create a pointer to an inner |
| element of a string |
| - They can be indexed: given string s1, s1[i] is a byte value |
| - They can be concatenated: given strings s1 and s2, s1 + s2 is a value |
| combining the elements of s1 and s2 in sequence |
| - Known length: the length of a string s1 can be obtained by the function/ |
| operator len(s1). [r: is it a builtin? do we make it a method? etc. this is |
| a placeholder]. The length of a string is the number of bytes within. |
| Unlike in C, there is no terminal NUL byte. |
| - Creation 1: a string can be created from an integer value by a conversion |
| string('x') yields "x" |
| - Creation 2: a string can be created from an array of integer values (maybe |
| just array of bytes) by a conversion |
| a [3]byte; a[0] = 'a'; a[1] = 'b'; a[2] = 'c'; string(a) == "abc"; |
| |
| The language has string literals as discussed above. The type of a string |
| literal is 'string'. |
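| |
| The listed properties can be sketched in current Go syntax as follows. |
| Two details belong to present-day Go rather than to this draft: the |
| integer conversion goes through the rune type, and an array is |
| converted via a slice of it (a[:]). The name stringOps is illustrative. |

```go
package main

import "fmt"

// stringOps exercises the operations listed above on sample values.
func stringOps() (b byte, cat string, n int, fromInt, fromBytes string) {
	s1, s2 := "abc", "de"
	b = s1[1]     // indexing yields a byte
	cat = s1 + s2 // concatenation
	n = len(cat)  // length in bytes; there is no terminating NUL
	// s1[0] = 'x' // rejected by the compiler: strings are immutable
	fromInt = string(rune('x')) // creation from an integer value
	a := [3]byte{'a', 'b', 'c'}
	fromBytes = string(a[:]) // creation from the bytes of an array
	return
}

func main() {
	b, cat, n, fromInt, fromBytes := stringOps()
	fmt.Println(b, cat, n, fromInt, fromBytes)
}
```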
| |
| |
| Array types |
| |
| An array is a structured type consisting of a number of elements which |
| are all of the same type, called the element type. The number of |
| elements of an array is called its length. The elements of an array |
| are designated by indices which are integers between 0 and the length |
| - 1. |
| |
| THIS SECTION NEEDS WORK REGARDING STATIC AND DYNAMIC ARRAYS |
| |
| An array type specifies a set of arrays with a given element type and |
| an optional array length. The array length must be a (compile-time) |
| constant expression, if present. Arrays without length specification |
| are called open arrays. An open array must not contain other open |
| arrays, and open arrays can only be used as parameter types or in a |
| pointer type (for instance, a struct may not contain an open array |
| field, but only a pointer to an open array). |
| |
| [gri: Need to define when array types are the same! Also need to |
| define assignment compatibility] [gri: Need to define a mechanism to |
| get to the length of an array at run-time. This could be a |
| predeclared function 'length' (which may be problematic due to the |
| name). Alternatively, we could define an interface for array types |
| and say that there is a 'length()' method. So we would write |
| a.length() which I think is pretty clean.]. [r: if array types have |
| an interface and a string is an array, some stuff (but not enough) |
| falls out nicely.] |
| |
| ArrayType = 'array' { '[' ArrayLength ']' } ElementType. |
| ArrayLength = Expression. |
| ElementType = Type. |
| |
| The notation |
| |
| array [n][m] T |
| |
| is a syntactic shortcut for |
| |
| array [n] array [m] T. |
| |
| (the shortcut may be applied recursively). |
| |
| array uint8 |
| array [64] struct { x, y int32; } |
| array [1000][1000] float64 |
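| |
| A sketch of the shortcut in current Go syntax, where the draft's |
| "array [n][m] T" is spelled [n][m]T. It also shows array assignment by |
| value. The function name shape is illustrative only. |

```go
package main

import "fmt"

// shape declares the same array type in both spellings and assigns one
// variable to the other.
func shape() (rows, cols int, untouched float64) {
	var a [2][3]float64
	var b [2]([3]float64) = a // identical types; the array is copied by value
	b[1][2] = 1.5             // modifies the copy only
	return len(b), len(b[0]), a[1][2]
}

func main() {
	rows, cols, untouched := shape()
	fmt.Println(rows, cols, untouched)
}
```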
| |
| |
| Channel types |
| |
| |
| ChannelType = 'channel' '(' Type '<-' Type ')' . |
| |
| channel(int <- float) |
| |
| - incomplete |
| |
| |
| Pointer types |
| |
| - TODO: Need some intro here. |
| |
| Two pointer types are the same if they are pointing to variables of |
| the same type. |
| |
| PointerType = '*' Type. |
| |
| - We do not allow pointer arithmetic of any kind. |
| |
| Interface types |
| |
| - TBD: This needs to be much more precise. For now we understand what it means. |
| |
| An interface type specifies a set of methods, the "method interface" |
| of structs. No two methods in one interface can have the same name. |
| |
| Two interfaces are the same if their set of functions is the same, |
| i.e., if all methods exist in both interfaces and if the function |
| names and signatures are the same. The order of declaration of |
| methods in an interface is irrelevant. |
| |
| A set of interface types implicitly creates an unconnected, ordered |
| lattice of types. An interface type T1 is said to be smaller than or |
| equal to an interface type T2 (T1 <= T2) if the entire interface of |
| T1 "is part" of T2. Thus, two interface types T1, T2 are the same if |
| T1 <= T2, and T2 <= T1, and thus we can write T1 == T2. |
| |
| |
| InterfaceType = 'interface' '{' { MethodDecl } '}' . |
| MethodDecl = identifier Signature ';' . |
| |
| // An empty interface. |
| interface {}; |
| |
| // A basic file interface. |
| interface { |
| Read(Buffer) bool; |
| Write(Buffer) bool; |
| Close(); |
| } |
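| |
| A sketch of the two rules above in current Go syntax: declaration |
| order of methods does not matter, and a value of a larger interface |
| can be used where a smaller one (T1 <= T2) is required. All type names |
| here, including buffer as a stand-in for Buffer, are hypothetical. |

```go
package main

import "fmt"

type buffer []byte // hypothetical stand-in for the Buffer type above

type readWriter interface {
	Read(buffer) bool
	Write(buffer) bool
}

type writeReader interface { // same methods, declared in the other order
	Write(buffer) bool
	Read(buffer) bool
}

type reader interface { // a subset of the methods above
	Read(buffer) bool
}

type file struct{ reads, writes int }

func (f *file) Read(buffer) bool  { f.reads++; return true }
func (f *file) Write(buffer) bool { f.writes++; return true }

// demo passes one file through all three interface types.
func demo() (reads, writes int) {
	f := &file{}
	var rw readWriter = f
	var wr writeReader = rw // same method set, so either can stand in
	var r reader = wr       // the smaller interface accepts the larger
	r.Read(nil)
	wr.Write(nil)
	rw.Read(nil)
	return f.reads, f.writes
}

func main() {
	fmt.Println(demo())
}
```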
| |
| |
| Interface pointers can be implemented as "fat pointers"; namely a pair |
| (ptr, tdesc) where ptr is simply the pointer to a struct instance |
| implementing the interface, and tdesc is the struct's type descriptor. |
| Only when crossing the boundary from statically typed structs to |
| interfaces and vice versa, does the type descriptor come into play. |
| In those places, the compiler statically knows the value of the type |
| descriptor. |
| |
| |
| Function types |
| |
| FunctionType = 'func' Signature . |
| Signature = [ Receiver '.' ] Parameters [ Result ] . |
| Receiver = '(' identifier Type ')' . |
| Parameters = '(' [ ParameterList ] ')' . |
| ParameterList = ParameterSection { ',' ParameterSection } . |
| ParameterSection = [ IdentifierList ] Type . |
| Result = [ Type ] | '(' ParameterList ')' . |
| |
| // Function types |
| func () |
| func (a, b int, z float) bool |
| func (a, b int, z float) (success bool) |
| func (a, b int, z float) (success bool, result float) |
| |
| // Method types |
| func (p *T) . () |
| func (p *T) . (a, b int, z float) bool |
| func (p *T) . (a, b int, z float) (success bool) |
| func (p *T) . (a, b int, z float) (success bool, result float) |
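| |
| A sketch of a function type with two results being used as a value, |
| in current Go syntax (float64 takes the place of the draft's float). |
| The names op, scaledQuotient and apply are illustrative only. |

```go
package main

import "fmt"

// op mirrors the last function type above: three parameters in two
// sections and two named results.
type op func(a, b int, z float64) (success bool, result float64)

// scaledQuotient has exactly the signature required by op.
func scaledQuotient(a, b int, z float64) (success bool, result float64) {
	if b == 0 {
		return false, 0
	}
	return true, float64(a) / float64(b) * z
}

// apply calls any function value of type op.
func apply(f op, a, b int, z float64) (bool, float64) {
	return f(a, b, z)
}

func main() {
	fmt.Println(apply(scaledQuotient, 6, 3, 1.5))
	fmt.Println(apply(scaledQuotient, 6, 0, 1.5))
}
```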
| |
| |
| Map types |
| |
| MapType = 'map' '(' Type '<-' Type ')'. |
| |
| map(int <- string) |
| |
| - incomplete |
| |
| |
| Struct types |
| |
| Struct types are similar to C structs. |
| |
| NEED TO DEFINE STRUCT EQUIVALENCE |
| |
| Two struct types are the same if and only if they stem from the same |
| struct type declaration; i.e., struct types are compared by |
| declaration, and *not* structurally. For that |
| reason, struct types are usually given a type name so that it is |
| possible to refer to the same struct in different places in a program. |
| What about equivalence of structs w/ respect to methods? What if |
| methods can be added in another package? TBD. |
| |
| Each field of a struct represents a variable within the data |
| structure. In particular, a function field represents a function |
| variable, not a method. |
| |
| StructType = 'struct' '{' { FieldDecl } '}' . |
| FieldDecl = IdentifierList Type ';' . |
| |
| // An empty struct. |
| struct {} |
| |
| // A struct with 5 fields. |
| struct { |
| x, y int; |
| u float; |
| a []int; |
| f func(); |
| } |
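| |
| The difference between a function field and a method can be sketched |
| as follows, in current Go syntax. The type greeter and its members are |
| made up for the illustration. |

```go
package main

import "fmt"

type greeter struct {
	prefix string
	f      func() string // function field: a variable holding a function
}

// Hello is a method; unlike the field f it cannot be reassigned.
func (g *greeter) Hello() string { return g.prefix + " hello" }

// demo reassigns the function field and calls both field and method.
func demo() (first, second, method string) {
	g := &greeter{prefix: "oh"}
	g.f = func() string { return "first" }
	first = g.f()
	g.f = func() string { return "second" } // the field is an ordinary variable
	return first, g.f(), g.Hello()
}

func main() {
	fmt.Println(demo())
}
```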
| |
| |
| |
| Note that a program which never uses interface types can be fully |
| statically typed. That is, the "usual" implementation of structs (or |
| classes as they are called in other languages) having an extra type |
| descriptor prepended in front of every single struct is not required. |
| Only when a pointer to a struct is assigned to an interface variable |
| does the type descriptor come into play, and at that point it is |
| statically known at compile-time! |
| |
| Package specifiers |
| |
| Every source file belongs to exactly one package. The first element of |
| every source file must be a package specifier naming that package: |
| |
| PackageSpecifier = 'package' PackageName . |
| |
| package Math |
| |
| |
| Package import declarations |
| |
| A program can access exported items from another package. It does so |
| by in effect declaring a local name providing access to the package, |
| and then using the local name as a namespace with which to address the |
| elements of the package. |
| |
| ImportDecl = 'import' PackageName FileName . |
| FileName = DoubleQuotedString . |
| DoubleQuotedString = '"' TEXT '"' . |
| |
| (DoubleQuotedString should be replaced by the correct string literal production!) |
| Package import declarations must be the first declarations in a file |
| after the package specifier. |
| |
| A package import associates an identifier with a package, named by a |
| file. In effect, it is a declaration: |
| |
| import Math "lib/Math"; |
| import library "my/library"; |
| |
| After such an import, one can use the identifier (e.g., Math) to access |
| elements within the package: |
| |
| var x float = Math.sin(y); |
| |
| Note that this process derives nothing explicit about the type of the |
| `imported' function (here Math.sin()). The import must execute to |
| provide this information to the compiler (or the programmer, for that |
| matter). |
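| |
| A minimal sketch of the local-name mechanism. It uses present-day Go |
| syntax, which differs from the notation of this draft (float64 stands |
| in for float, semicolons are omitted). The path "math" and the function |
| Sin come from today's standard library and are assumptions here; |
| sineOf is a hypothetical helper. |
| |
```go
package main

import (
	"fmt"
	Math "math" // bind the local name Math to the imported package
)

// sineOf reaches a package element through the local name Math.
func sineOf(y float64) float64 {
	return Math.Sin(y)
}

func main() {
	fmt.Println(sineOf(0)) // 0
}
```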
| |
| An angled-string refers to official stuff in a public place, in effect |
| the run-time library. A double-quoted-string refers to arbitrary |
| code; it is probably a local file name that needs to be discovered |
| using rules outside the scope of the language spec. |
| |
| The file name in a package import must be complete except for a |
| suffix. Moreover, the package name must correspond to the (basename |
| of) the source file name. For instance, the implementation of package |
| Bar must be in file Bar.go, and if it lives in directory foo we write |
| |
| import Bar "foo/Bar"; |
| |
| to import it. |
| |
| [This is a little redundant but if we allow multiple files per package |
| it will seem less so, and in any case the redundancy is useful and |
| protective.] |
| |
| We assume Unix syntax for file names: / separators, no suffix for |
| directories. If the language is ported to other systems, the |
| environment must simulate these properties to avoid changing the |
| source code. |
| |
| |
| Declarations |
| |
| - This needs to be expanded. |
| - We need to think about enums (or some alternative mechanism). |
| |
| Declaration = (ConstDecl | VarDecl | TypeDecl | FunctionDecl | |
| ForwardDecl | AliasDecl) . |
| |
| |
| Const declarations |
| |
| ConstDecl = 'const' ( ConstSpec | '(' ConstSpecList [ ';' ] ')' ). |
| ConstSpec = identifier [ Type ] '=' Expression . |
| ConstSpecList = ConstSpec { ';' ConstSpec }. |
| |
| const pi float = 3.14159265 |
| const e = 2.718281828 |
| const ( |
| one int = 1; |
| two = 2 |
| ) |
| |
| |
| Variable declarations |
| |
| VarDecl = 'var' ( VarSpec | '(' VarSpecList [ ';' ] ')' ) | ShortVarDecl . |
| VarSpec = IdentifierList ( Type [ '=' ExpressionList ] | '=' ExpressionList ) . |
| VarSpecList = VarSpec { ';' VarSpec } . |
| ShortVarDecl = identifier ':=' Expression . |
| |
| var i int |
| var u, v, w float |
| var k = 0 |
| var x, y float = -1.0, -2.0 |
| var ( |
| i int; |
| u, v = 2.0, 3.0 |
| ) |
| |
| If the expression list is present, it must have the same number of elements |
| as there are variables in the variable specification. |
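| |
| The declaration forms above can be sketched as a complete program. |
| This is an illustration in present-day Go syntax, not the notation of |
| this draft: float64 stands in for float, and the names n, p and q are |
| hypothetical, chosen to avoid redeclaring i, u and v in one scope. |
| |
```go
package main

import "fmt"

// Single declarations, with and without initializers.
var i int
var u, v, w float64
var k = 0
var x, y float64 = -1.0, -2.0

// A parenthesized group; the expression list 2.0, 3.0 has exactly as
// many elements as there are variables on its left.
var (
	n    int
	p, q = 2.0, 3.0
)

func main() {
	fmt.Println(i, u, v, w, k, x, y, n, p, q)
}
```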
| |
| [ TODO: why is x := 0 not legal at the global level? ] |
| |
| |
| Type declarations |
| |
| TypeDecl = 'type' ( TypeSpec | '(' TypeSpecList [ ';' ] ')' ). |
| TypeSpec = identifier Type . |
| TypeSpecList = TypeSpec { ';' TypeSpec }. |
| |
| |
| type IntArray [16] int |
| type ( |
| Point struct { x, y float }; |
| Polar Point |
| ) |
| |
| |
| Function and method declarations |
| |
| FunctionDecl = 'func' [ Receiver ] identifier Parameters [ Result ] ( ';' | Block ) . |
| Block = '{' { Statement } '}' . |
| |
| |
| func min(x int, y int) int { |
| if x < y { |
| return x; |
| } |
| return y; |
| } |
| |
| func foo (a, b int, z float) bool { |
| return a*b < int(z); |
| } |
| |
| |
| A method is a function that also declares a receiver. The receiver is |
| a struct with which the function is associated. The receiver type |
| must denote a pointer to a struct. |
| |
| func (p *T) foo (a, b int, z float) bool { |
| return a*b < int(z) + p.x; |
| } |
| |
| func (p *Point) Length() float { |
| return Math.sqrt(p.x * p.x + p.y * p.y); |
| } |
| |
| func (p *Point) Scale(factor float) { |
| p.x = p.x * factor; |
| p.y = p.y * factor; |
| } |
| |
| The last two examples are methods of struct type Point. The variable p is |
| the receiver; within the body of the method it points to the receiving |
| struct. |
| |
| Note that methods are declared outside the body of the corresponding |
| struct. |
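| |
| A runnable sketch of the Point methods, assuming present-day Go syntax |
| (float64 for float, math.Sqrt from today's standard library for |
| Math.sqrt). The values used in main are illustrative only. |
| |
```go
package main

import (
	"fmt"
	"math"
)

// Point mirrors the struct from the type declaration examples.
type Point struct{ x, y float64 }

// Length is a method: its receiver p is a pointer to a Point.
func (p *Point) Length() float64 {
	return math.Sqrt(p.x*p.x + p.y*p.y)
}

// Scale modifies the receiving struct through the pointer.
func (p *Point) Scale(factor float64) {
	p.x = p.x * factor
	p.y = p.y * factor
}

func main() {
	pt := &Point{3, 4}
	fmt.Println(pt.Length()) // 5
	pt.Scale(2)
	fmt.Println(pt.Length()) // 10
}
```
| |
| Both methods are declared outside the struct body, and Scale changes |
| the struct that the caller holds because the receiver is a pointer. |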
| |
| Functions and methods can be forward declared by omitting the body: |
| |
| func foo (a, b int, z float) bool; |
| func (p *T) foo (a, b int, z float) bool; |
| |
| |
| |
| Statements |
| |
| Statement = EmptyStat | Assignment | CompoundStat | Declaration | |
| ExpressionStat | IncDecStat | IfStat | WhileStat | ReturnStat . |
| |
| |
| Empty statements |
| |
| EmptyStat = ';' . |
| |
| |
| Assignments |
| |
| Assignment = Designator '=' Expression . |
| |
| - no automatic conversions |
| - values can be assigned to variables if they are of the same type, or |
| if they satisfy the interface type (much more precision needed here!) |
| |
| |
| |
| Compound statements |
| |
| CompoundStat = '{' { Statement } '}' . |
| |
| |
| Expression statements |
| |
| ExpressionStat = Expression . |
| |
| |
| IncDec statements |
| |
| IncDecStat = Expression ( '++' | '--' ) . |
| |
| |
| |
| |
| If statements |
| |
| IfStat = 'if' ( [ Expression ] '{' { IfCaseList } '}' ) | |
| ( Expression '{' { Statement } '}' [ 'else' { Statement } ] ). |
| IfCaseList = ( 'case' ExpressionList | 'default' ) ':' { Statement } . |
| |
| if x < y { |
| return x; |
| } else { |
| return y; |
| } |
| |
| if tag { |
| case 0, 1: s1(); |
| case 2: s2(); |
| default: ; |
| } |
| |
| if { |
| case x < y: f1(); |
| case x < z: f2(); |
| } |
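| |
| A sketch of the two case forms, assuming they select the first case |
| that applies. Present-day Go spells this construct with the keyword |
| switch rather than if, so the code below is an approximation of the |
| draft syntax, not a transcription. The functions classify and order |
| and their string results are hypothetical. |
| |
```go
package main

import "fmt"

// classify selects on a tag value, like the "if tag" case form.
func classify(tag int) string {
	switch tag {
	case 0, 1:
		return "s1"
	case 2:
		return "s2"
	default:
		return "none"
	}
}

// order has no tag: each case holds a boolean expression, and the
// first case that is true is chosen, like the bare "if" case form.
func order(x, y, z int) string {
	switch {
	case x < y:
		return "f1"
	case x < z:
		return "f2"
	}
	return "no case applied"
}

func main() {
	fmt.Println(classify(1), classify(2), classify(7)) // s1 s2 none
	fmt.Println(order(1, 2, 0))                        // f1
	fmt.Println(order(3, 2, 4))                        // f2
	fmt.Println(order(3, 2, 1))                        // no case applied
}
```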
| |
| |
| While statements |
| |
| WhileStat = 'while' ( [ Expression ] '{' { WhileCaseList } '}' ) | |
| ( Expression '{' { Statement } '}' ). |
| WhileCaseList = 'case' ExpressionList ':' { Statement } . |
| |
| while { |
| case i < n: f1(); |
| case i < m: f2(); |
| } |
| |
| |
| Return statements |
| |
| ReturnStat = 'return' [ ExpressionList ] . |
| |
| There are two ways to return values from a function. The first is to |
| explicitly list the return value or values in the return statement: |
| |
| func simple_f () int { |
| return 2; |
| } |
| |
| func complex_f1() (re float, im float) { |
| return -7.0, -4.0; |
| } |
| |
| The second is to provide names for the return values and assign them |
| explicitly in the function; the return statement will then provide no |
| values: |
| |
| func complex_f2() (re float, im float) { |
| re = 7.0; |
| im = 4.0; |
| return; |
| } |
| |
| It is legal to name the return values in the declaration even if the |
| first form of return statement is used: |
| |
| |
| func complex_f3() (re float, im float) { |
| return 7.0, 4.0; |
| } |
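| |
| The three variants can be compared side by side. This is a sketch in |
| present-day Go syntax (float64 for float); the function names |
| explicit, named and mixed are hypothetical. |
| |
```go
package main

import "fmt"

// explicit lists both result values in the return statement.
func explicit() (float64, float64) {
	return -7.0, -4.0
}

// named assigns to the named results; the bare return yields them.
func named() (re float64, im float64) {
	re = 7.0
	im = 4.0
	return
}

// mixed names its results but still returns an explicit list.
func mixed() (re float64, im float64) {
	return 7.0, 4.0
}

func main() {
	fmt.Println(explicit()) // -7 -4
	fmt.Println(named())    // 7 4
	fmt.Println(mixed())    // 7 4
}
```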
| |
| |
| Expressions |
| |
| Expression = Conjunction { '||' Conjunction }. |
| Conjunction = Comparison { '&&' Comparison }. |
| Comparison = SimpleExpr [ relation SimpleExpr ]. |
| relation = '==' | '!=' | '<' | '<=' | '>' | '>='. |
| SimpleExpr = Term { add_op Term }. |
| add_op = '+' | '-' | '|' | '^'. |
| Term = Factor { mul_op Factor }. |
| mul_op = '*' | '/' | '%' | '<<' | '>>' | '&'. |
| |
| The corresponding precedence hierarchy is as follows: (5 levels of |
| precedence is about the maximum people can keep comfortably in their |
| heads. The experience with C and C++ shows that more than that |
| usually requires explicit manual consultation...). [gri: I still |
| think we should consider 0 levels of binary precedence: All operators |
| are on the same level, but parentheses are required when different |
| operators are mixed. That would make it really easy, and really |
| clear. It would also open the door for straight-forward introduction |
| of user-defined operators, which would be rather useful.] |
| |
| Precedence Operator |
| 1 || |
| 2 && |
| 3 == != < <= > >= |
| 4 + - | ^ |
| 5 * / % << >> & |
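| |
| A small sketch of how the table groups unparenthesized expressions, |
| written in present-day Go on the assumption that it applies the same |
| five levels. The function names and operand values are hypothetical. |
| |
```go
package main

import "fmt"

// shiftThenAdd: << (level 5) binds tighter than + (level 4).
func shiftThenAdd(a, b int, c uint) int { return a + b<<c }

// andThenOr: & (level 5) binds tighter than | (level 4).
func andThenOr(a, b, c int) int { return a | b&c }

// zeroOrInRange: comparisons (level 3) bind tighter than && (level 2),
// which binds tighter than || (level 1).
func zeroOrInRange(x, lo, hi int) bool { return x == 0 || lo <= x && x < hi }

func main() {
	// 1 + (2 << 3) is 17; (1 + 2) << 3 would be 24.
	fmt.Println(shiftThenAdd(1, 2, 3)) // 17
	// 4 | (2 & 1) is 4; (4 | 2) & 1 would be 0.
	fmt.Println(andThenOr(4, 2, 1)) // 4
	// (0 == 0) || ((1 <= 0) && (0 < 0)) is true;
	// ((0 == 0) || (1 <= 0)) && (0 < 0) would be false.
	fmt.Println(zeroOrInRange(0, 1, 0)) // true
}
```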
| |
| |
| For integer values, / and % satisfy the following relationship: |
| |
| (a / b) * b + a % b == a |
| |
| and |
| |
| (a / b) is "truncated towards zero". |
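| |
| The relationship can be checked for all four sign combinations. A |
| minimal sketch in present-day Go, assuming its integer / and % follow |
| the rule stated above; divMod and holds are hypothetical helpers. |
| |
```go
package main

import "fmt"

// divMod returns the quotient and remainder of a and b (b must not be 0).
func divMod(a, b int) (q, r int) {
	return a / b, a % b
}

// holds reports whether (a / b) * b + a % b == a for the given operands.
func holds(a, b int) bool {
	q, r := divMod(a, b)
	return q*b+r == a
}

func main() {
	for _, p := range [][2]int{{7, 2}, {-7, 2}, {7, -2}, {-7, -2}} {
		q, r := divMod(p[0], p[1])
		fmt.Println(p[0], p[1], q, r, holds(p[0], p[1]))
	}
	// Prints:
	// 7 2 3 1 true
	// -7 2 -3 -1 true
	// 7 -2 -3 1 true
	// -7 -2 3 -1 true
}
```
| |
| Because the quotient is truncated towards zero, the remainder takes |
| the sign of the dividend a. |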
| |
| The shift operators implement arithmetic shifts for signed integers, |
| and logical shifts for unsigned integers. TBD: is there any range |
| checking on s in x >> s, or x << s ? |
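| |
| A sketch of the difference between the two kinds of right shift, in |
| present-day Go. The 8-bit types and the unsigned shift count are |
| assumptions made to keep the bit patterns short; the example says |
| nothing about the open question of range checking on s. |
| |
```go
package main

import "fmt"

// shiftSigned shifts a signed value right; the sign bit is replicated
// (arithmetic shift).
func shiftSigned(x int8, s uint) int8 { return x >> s }

// shiftUnsigned shifts an unsigned value right; zeros are shifted in
// (logical shift).
func shiftUnsigned(x uint8, s uint) uint8 { return x >> s }

func main() {
	// Both operands have the bit pattern 1111 1000.
	fmt.Println(shiftSigned(-8, 1))     // -4  (1111 1100)
	fmt.Println(shiftUnsigned(0xF8, 1)) // 124 (0111 1100)
}
```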
| |
| [gri: We decided on a couple of issues here that we need to write down |
| more nicely] |
| |
| - There are no implicit type conversions except for |
| constants/literals. In particular, unsigned and signed integers |
| cannot be mixed in an expression without explicit conversion. |
| |
| - Unary '^' corresponds to C '~' (bitwise negate). |
| |
| - Arrays can be subscripted (a[i]) or sliced (a[i : j]). A slice a[i |
| : j] is a new array of length (j - i), and consisting of the elements |
| a[i], a[i + 1], ... a[j - 1]. [gri/r: Is the slice array bounds |
| check hard (leading to an error), or soft (truncating) ?]. |
| Furthermore: Array slicing is very tricky! Do we get a copy (a new |
| array) or a new array descriptor? This is open at this point. There |
| is a simple way out of the mess: Structured types are always passed by |
| reference, and there is no value assignment for structured types. It |
| gets very complicated very quickly. |
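| |
| The three points above, sketched in present-day Go. The helper names |
| are hypothetical. The slice example only illustrates the length and |
| element range of a[i : j]; it does not settle the open question of |
| copy versus new descriptor. |
| |
```go
package main

import "fmt"

// add mixes a signed and an unsigned operand; the conversion must be
// written out, since there is no implicit one.
func add(i int, u uint) int {
	return i + int(u)
}

// complement applies unary ^, the counterpart of C's ~.
func complement(x uint8) uint8 {
	return ^x
}

// middle returns a[i:j], which has j - i elements: a[i] .. a[j-1].
func middle(a []int, i, j int) []int {
	return a[i:j]
}

func main() {
	fmt.Println(add(-3, 5))                          // 2
	fmt.Println(complement(0x0F))                    // 240
	fmt.Println(middle([]int{10, 11, 12, 13}, 1, 3)) // [11 12]
}
```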
| |
| [gri: Syntax below is incomplete - what about method invocation?] |
| |
| Factor = Literal | Designator | '!' Expression | '-' Expression | |
| '^' Expression | '&' Expression | '(' Expression ')' | Call. |
| Designator = QualifiedIdent { Selector }. |
| Selector = '.' identifier | '[' Expression [ ':' Expression ] ']'. |
| Call = Factor '(' [ ExpressionList ] ')'. |
| |
| [gri: We need a precise definition of a constant expression] |
| |
| |
| |
| |
| Compilation units |
| |
| The unit of compilation is a single file. A compilation unit consists |
| of a package specifier followed by a list of import declarations |
| followed by a list of global declarations. |
| |
| CompilationUnit = PackageSpecifier { ImportDecl } { GlobalDeclaration }. |
| GlobalDeclaration = Declaration. |
| |
| |
| Exports |
| |
| Globally declared identifiers may be exported, thus making the |
| exported identifier visible outside the package. Another package may |
| then import the identifier to use it. |
| |
| Export directives may appear only at the global level of a |
| compilation unit (at least for now). That is, one can export |
| compilation-unit global identifiers but not, for example, local |
| variables or structure fields. |
| |
| Exporting an identifier makes the identifier visible externally to the |
| package. If the identifier represents a type, the type structure is |
| exported as well. The exported identifiers may appear later in the |
| source than the export directive itself, but it is an error to specify |
| an identifier not declared anywhere in the source file containing the |
| export directive. |
| |
| ExportDirective = 'export' ExportIdentifier { ',' ExportIdentifier } . |
| ExportIdentifier = identifier . |
| |
| export sin, cos; |
| |
| One may export variables and types, but (at least for now), not |
| aliases. [r: what is needed to make aliases exportable? issue is |
| transitivity.] |
| |
| Exporting a variable does not automatically export the type of the |
| variable. For illustration, consider the program fragment: |
| |
| package P; |
| export v1, v2, p; |
| type S struct { a int; b int; } |
| var v1 S; |
| var v2 S; |
| var p *S; |
| |
| Notice that S is not exported. Another source file may contain: |
| |
| import P; |
| alias v1 P.v1; |
| alias v2 P.v2; |
| alias p P.p; |
| |
| This program can use v1, v2, and p but cannot access the fields (a |
| and b) of structure type S explicitly. For instance, it could legally |
| contain |
| |
| if p == nil { } |
| if v1 == v2 { } |
| |
| but not |
| |
| if v1.a == 0 { } |
| |
| |
| |