TODO(adonovan): this doc is internal, not for end users. Move it closer to the code in golang or protocol/semtok.
The LSP specifies semantic tokens as a way of telling clients about language-specific properties of pieces of code in a file being edited.
The client asks for a set of semantic tokens and modifiers. This note describes which ones gopls will return, and under what circumstances. Gopls has no control over how the client converts semantic tokens into colors (or some other visible indication). In vscode it is possible to modify the color a theme uses by setting the editor.semanticTokenColorCustomizations object. We provide a little guidance later.
There are 22 semantic tokens, with 10 possible modifiers. The protocol allows each semantic token to be used with any of the 1024 subsets of possible modifiers, but most combinations don't make intuitive sense (although async documentation has a certain appeal).
The 22 semantic tokens are namespace, type, class, enum, interface, struct, typeParameter, parameter, variable, property, enumMember, event, function, method, macro, keyword, modifier, comment, string, number, regexp, operator.
The 10 modifiers are declaration, definition, readonly, static, deprecated, abstract, async, modification, documentation, defaultLibrary.
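To make the count of 1024 concrete: the protocol carries a token's modifiers as a bitmask, with bit i standing for the i-th modifier in the agreed legend. The sketch below decodes such a mask. It assumes the legend is in the order listed above; in a real session the order is whatever the client and server negotiated, and the names standardModifiers and ModifierNames are invented for this illustration.

```go
package main

import "fmt"

// standardModifiers lists the 10 modifiers in the order given in this note.
// This order is an assumption of the sketch, not a negotiated legend.
var standardModifiers = []string{
	"declaration", "definition", "readonly", "static", "deprecated",
	"abstract", "async", "modification", "documentation", "defaultLibrary",
}

// ModifierNames decodes a bitmask into modifier names:
// bit i set means legend[i] applies to the token.
func ModifierNames(mask uint32, legend []string) []string {
	var names []string
	for i, name := range legend {
		if mask&(1<<uint(i)) != 0 {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	// Each of the 10 modifiers is either present or absent.
	subsets := 1 << uint(len(standardModifiers))
	fmt.Println(subsets) // 1024

	// Bits 2 and 9 are set: 4 + 512 = 516.
	fmt.Println(ModifierNames(516, standardModifiers)) // [readonly defaultLibrary]
}
```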
The authoritative lists are in the LSP specification.
For the implementation to work correctly the client and server have to agree on the ordering of the tokens and of the modifiers. Gopls, therefore, will only send tokens and modifiers that the client has asked for. This document says what gopls would send if the client asked for everything. By default, vscode asks for everything.
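The reason ordering matters is that the LSP wire format refers to token types and modifiers by position in the legend, not by name. Each token becomes five integers: line delta, start-character delta (relative to the previous token when on the same line), length, token type index, and modifier bitmask. The following is a minimal sketch of that encoding with filtering by the client's requested legend; the Token type and Encode function are hypothetical and are not gopls's actual implementation.

```go
package main

import "fmt"

// Token is a hypothetical semantic token with an absolute, zero-based position.
type Token struct {
	Line, Start, Len uint32
	Type             string
	Mods             []string
}

// Encode converts tokens (already sorted by position) into the LSP's flat
// array of five integers per token. Tokens whose type the client did not ask
// for are skipped, and modifiers it did not ask for are dropped.
func Encode(toks []Token, typeLegend, modLegend []string) []uint32 {
	typeIdx := make(map[string]uint32)
	for i, t := range typeLegend {
		typeIdx[t] = uint32(i)
	}
	modBit := make(map[string]uint32)
	for i, m := range modLegend {
		modBit[m] = 1 << uint(i)
	}
	var data []uint32
	var prevLine, prevStart uint32
	for _, t := range toks {
		ti, ok := typeIdx[t.Type]
		if !ok {
			continue // type not in the client's legend
		}
		var mask uint32
		for _, m := range t.Mods {
			mask |= modBit[m] // zero for modifiers not in the legend
		}
		deltaLine := t.Line - prevLine
		deltaStart := t.Start
		if deltaLine == 0 {
			deltaStart = t.Start - prevStart
		}
		data = append(data, deltaLine, deltaStart, t.Len, ti, mask)
		prevLine, prevStart = t.Line, t.Start
	}
	return data
}

func main() {
	types := []string{"keyword", "variable"}
	mods := []string{"definition", "readonly"}
	toks := []Token{
		{Line: 0, Start: 0, Len: 5, Type: "keyword"},
		{Line: 0, Start: 6, Len: 1, Type: "variable", Mods: []string{"definition", "readonly"}},
		{Line: 2, Start: 4, Len: 3, Type: "function"}, // not requested, so skipped
	}
	fmt.Println(Encode(toks, types, mods)) // [0 0 5 0 0 0 6 1 1 3]
}
```

If the two sides disagreed on the legend order, the same integers would decode to different token types, which is why gopls restricts itself to what the client requested.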
Gopls sends 11 token types for .go files and 1 for .*tmpl files. Nothing is sent for any other kind of file. This all could change. (When Go has generics, gopls will return typeParameter.)
For .*tmpl files gopls sends macro, and no modifiers, for each {{...}} scope.
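As a rough illustration of what "each {{...}} scope" means, the sketch below finds the byte range of every action in a template by plain string search. This is a deliberately naive stand-in: it does not parse the template, so a }} inside a string literal or comment would end a span early, and it says nothing about how gopls actually locates these ranges. Span and ActionSpans are names invented here.

```go
package main

import (
	"fmt"
	"strings"
)

// Span is a half-open byte range [Start, End) in the template source.
type Span struct{ Start, End int }

// ActionSpans returns the span of every {{...}} action in src, delimiters
// included. An unterminated {{ ends the scan.
func ActionSpans(src string) []Span {
	var spans []Span
	pos := 0
	for {
		i := strings.Index(src[pos:], "{{")
		if i < 0 {
			return spans
		}
		start := pos + i
		j := strings.Index(src[start+2:], "}}")
		if j < 0 {
			return spans
		}
		end := start + 2 + j + 2 // skip "{{", the body, and "}}"
		spans = append(spans, Span{start, end})
		pos = end
	}
}

func main() {
	src := "Hello {{.Name}}, you have {{len .Items}} items."
	for _, s := range ActionSpans(src) {
		// Each span would be reported as a macro token with no modifiers.
		fmt.Printf("macro %d-%d %q\n", s.Start, s.End, src[s.Start:s.End])
	}
}
```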
There are two contrasting guiding principles that might be used to decide what to mark with semantic tokens. All clients already do some kind of syntax marking. E.g., vscode uses a TextMate grammar. The minimal principle would send semantic tokens only for those language features that cannot be reliably found without parsing Go and looking at types. The maximal principle would attempt to convey as much as possible about the Go code, using all available parsing and type information.
There is much to be said for returning minimal information, but the minimal principle is not well-specified. Gopls has no way of knowing what the clients know about the Go program being edited. Even in vscode the TextMate grammars can be more or less elaborate and change over time. (Nonetheless, a minimal implementation would not return keyword, number, comment, or string.)
The maximal position isn't particularly well-specified either. To choose one example, a format string might have formatting codes (%[4]-3.6f), escape sequences (\U00010604), and regular characters. Should these all be distinguished? One could even imagine distinguishing different runes by their Unicode language assignment, or some other Unicode property, such as being confusable.
Gopls does not come close to either of these principles. Semantic tokens are returned for identifiers, keywords, operators, comments, and literals. (Semantic tokens do not cover the file. They are not returned for white space or punctuation, and there is no semantic token for labels.) The following describes more precisely what gopls does, with a few notes on possible alternative choices. References to an object mean the types.Object returned by the type checker. References to a node mean the ast.Node from the parser.
keyword
All Go keywords are marked keyword.
namespace
All package names are marked namespace. In an import, if there is an alias, the alias is marked. Otherwise the last component of the import path is marked.
type
Objects of type types.TypeName are marked type. Gopls also reports a modifier for the top-level constructor of the object's type, one of: interface, struct, signature, pointer, array, map, slice, chan, string, number, bool, invalid.
parameter
The formal arguments in ast.FuncDecl and ast.FuncType nodes are marked parameter.
variable
Identifiers in the scope of const are modified with readonly. nil is usually a variable modified with both readonly and defaultLibrary. (nil is a predefined identifier; the user can redefine it, in which case it would just be a variable, or whatever else it was redefined as.) Identifiers of type types.Var are, not surprisingly, marked variable. Identifiers being defined (node ast.GenDecl) are modified by definition and, if appropriate, readonly. Receivers (in method declarations) are variable.
method
Methods are marked at their definition (func (x foo) bar() {}) or declaration in an interface. Methods are not marked where they are used. In x.bar(), x will be marked either as a namespace if it is a package name, or as a variable if it is an interface value, so distinguishing bar seemed superfluous.
function
Builtins (types.Builtin) are modified with defaultLibrary (e.g., make, len, copy). Identifiers whose object is types.Func or whose node is ast.FuncDecl are function.
comment
Comments and struct tags. (Perhaps struct tags should be property?)
string
Strings. Modifiers could be added for, e.g., escapes or format codes.
number
Numbers. Should the i in 23i be handled specially?
operator
Assignment operators, binary operators, ellipses (...), increment/decrement operators, sends (<-), and unary operators.
Gopls will send the modifier deprecated if it finds a comment // deprecated in the godoc.
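The rules above can be sketched with go/parser and go/types alone. The example below type-checks a small file, resolves each identifier to its types.Object, and maps that object to a token type and modifiers. It is a simplified illustration, not gopls's code: it ignores keywords, literals, operators, comments, the type-constructor modifiers, and deprecated, and the helper names classify and Tokens are invented here.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// classify maps an object to a token type and modifiers using a simplified
// subset of the rules above. An empty type means "send no token".
func classify(obj types.Object, isDef, isParam bool) (string, []string) {
	var mods []string
	if isDef {
		mods = append(mods, "definition")
	}
	switch o := obj.(type) {
	case *types.PkgName:
		return "namespace", mods
	case *types.TypeName:
		return "type", mods
	case *types.Const:
		return "variable", append(mods, "readonly")
	case *types.Nil:
		return "variable", append(mods, "readonly", "defaultLibrary")
	case *types.Var:
		if isParam {
			return "parameter", mods
		}
		return "variable", mods
	case *types.Builtin:
		return "function", append(mods, "defaultLibrary")
	case *types.Func:
		if o.Type().(*types.Signature).Recv() == nil {
			return "function", mods
		}
		if isDef {
			return "method", mods // methods are marked only where declared
		}
	}
	return "", nil
}

// Tokens type-checks src (which must not import anything) and returns one
// "name type [modifiers]" line per classified identifier, in source order.
func Tokens(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		return nil, err
	}
	info := &types.Info{
		Defs: make(map[*ast.Ident]types.Object),
		Uses: make(map[*ast.Ident]types.Object),
	}
	var conf types.Config // no Importer needed: src has no imports
	if _, err := conf.Check("p", fset, []*ast.File{file}, info); err != nil {
		return nil, err
	}
	params := make(map[types.Object]bool)
	var out []string
	ast.Inspect(file, func(n ast.Node) bool {
		switch n := n.(type) {
		case *ast.FuncType: // visited before the identifiers it contains
			for _, f := range n.Params.List {
				for _, name := range f.Names {
					params[info.Defs[name]] = true
				}
			}
		case *ast.Ident:
			obj, isDef := info.Defs[n], true
			if obj == nil {
				obj, isDef = info.Uses[n], false
			}
			if obj == nil {
				return true // e.g. the name in the package clause
			}
			if typ, mods := classify(obj, isDef, params[obj]); typ != "" {
				out = append(out, fmt.Sprintf("%s %s %v", n.Name, typ, mods))
			}
		}
		return true
	})
	return out, nil
}

func main() {
	src := "package p\n\nconst limit = 10\n\nfunc double(n int) int { return n * len(\"ab\") }\n"
	lines, err := Tokens(src)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Parameters are detected from the ast.FuncType node rather than from the object, which mirrors the split in the text between rules phrased in terms of objects and rules phrased in terms of nodes.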
The unused tokens for Go code are class, enum, interface, struct, typeParameter, property, enumMember, event, macro, modifier, regexp.
These comments are about vscode.
The documentation has a helpful description of which semantic tokens correspond to scopes in TextMate grammars. Themes seem to use the TextMate scopes to decide on colors.
Some examples of color customizations are here.
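As one illustration, a settings.json fragment along the following lines overrides a theme's choices for a few of the tokens gopls sends. The selectors combine a token type and a modifier (with * matching any type); the specific color and styles are arbitrary placeholders chosen for this sketch, not recommendations.

```json
{
  "editor.semanticTokenColorCustomizations": {
    "enabled": true,
    "rules": {
      "namespace": "#4EC9B0",
      "variable.readonly": { "italic": true },
      "*.defaultLibrary": { "bold": true }
    }
  }
}
```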
While a file is being edited it may temporarily contain either parsing errors or type errors. In this case gopls cannot determine some (or maybe any) of the semantic tokens. To avoid weird flickering it is the responsibility of clients to maintain the semantic token information in the unedited part of the file, and they do.