gopls/doc: Document how gopls generates semantic tokens.

The new file gopls/doc/semantictokens.md describes how gopls presently
decides which semantic tokens and modifiers to send to clients.

Change-Id: Ifc76e13954dde2ea58b6da446517f94be038a285
Reviewed-on: https://go-review.googlesource.com/c/tools/+/333010
Run-TryBot: Peter Weinberger <pjw@google.com>
gopls-CI: kokoro <noreply+kokoro@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Robert Findley <rfindley@google.com>
Trust: Peter Weinberger <pjw@google.com>
diff --git a/gopls/doc/semantictokens.md b/gopls/doc/semantictokens.md
new file mode 100644
index 0000000..fc541fb
--- /dev/null
+++ b/gopls/doc/semantictokens.md
@@ -0,0 +1,121 @@
+# Semantic Tokens
+
+The [LSP](https://microsoft.github.io/language-server-protocol/specifications/specification-3-17/#textDocument_semanticTokens)
+specifies semantic tokens as a way of telling clients about language-specific
+properties of pieces of code in a file being edited.
+
+The client asks for a set of semantic tokens and modifiers. This note describe which ones
+gopls will return, and under what circumstances. Gopls has no control over how the client
+converts semantic tokens into colors (or some other visible indication). In vscode it
+is possible to modify the color a theme uses by setting the `editor.semanticTokenColorCustomizations`
+object. We provide a little [guidance](#Colors) later.
+
+There are 22 semantic tokens, with 10 possible modifiers. The protocol allows each semantic
+token to be used with any of the 1024 subsets of possible modifiers, but most combinations
+don't make intuitive sense (although `async documentation` has a certain appeal).
+
+The 22 semantic tokens are `namespace`, `type`, `class`, `enum`, `interface`,
+		`struct`, `typeParameter`, `parameter`, `variable`, `property`, `enumMember`,
+		`event`, `function`, `member`, `macro`, `keyword`, `modifier`, `comment`,
+		`string`, `number`, `regexp`, `operator`.
+
+The 10 modifiers are `declaration`, `definition`, `readonly`, `static`,
+		`deprecated`, `abstract`, `async`, `modification`, `documentation`, `defaultLibrary`.
+
+The authoritative lists are in the [specification](https://microsoft.github.io/language-server-protocol/specifications/specification-3-17/#semanticTokenTypes)
+
+For the implementation to work correctly the client and server have to agree on the ordering
+of the tokens and of the modifiers. Gopls, therefore, will only send tokens and modifiers
+that the client has asked for. This document says what gopls would send if the client
+asked for everything. By default, vscode asks for everything.
+
+Gopls sends 11 token types for `.go` files and 1 for `.*tmpl` files.
+Nothing is sent for any other kind of file.
+This all could change. (When Go has generics, gopls will return `typeParameter`.)
+
+For `.*tmpl` files gopls sends `macro`, and no modifiers, for each `{{`...`}}` scope.
+
+## Semantic tokens for Go files
+
+There are two contrasting guiding principles that might be used to decide what to mark
+with semantic tokens. All clients already do some kind of syntax marking. E.g., vscode
+uses a TextMate grammar. The minimal principle would send semantic tokens only for those
+language features that cannot be reliably found without parsing Go and looking at types.
+The maximal principle would attempt to convey as much as possible about the Go code,
+using all available parsing and type information.
+
+There is much to be said for returning minimal information, but the minimal principle is
+not well-specified. Gopls has no way of knowing what the clients know about the Go program
+being edited. Even in vscode the TextMate grammars can be more or less elaborate
+and change over time. (Nonetheless, a minimal implementation would not return `keyword`,
+`number`, `comment`, or `string`.)
+
+The maximal position isn't particularly well-specified either. To chose one example, a
+format string might have formatting codes (`%[4]-3.6f`), escape sequences (`\U00010604`), and regular
+characters. Should these all be distinguished? One could even imagine distinguishing
+different runes by their Unicode language assignment, or some other Unicode property, such as
+being [confusable](http://www.unicode.org/Public/security/10.0.0/confusables.txt).
+
+Gopls does not come close to either of these principles.  Semantic tokens are returned for
+identifiers, keywords, operators, comments, and literals. (Sematic tokens do not
+cover the file. They are not returned for
+white space or punctuation, and there is no semantic token for labels.)
+The following describes more precisely what gopls
+does, with a few notes on possible alternative choices.
+The references to *object* refer to the
+```types.Object``` returned by the type checker. The references to *nodes* refer to the
+```ast.Node``` from the parser.
+
+1. __`keyword`__ All Go [keywords](https://golang.org/ref/spec#Keywords) are marked `keyword`.
+1. __`namespace`__ All package names are marked `namespace`. In an import, if there is an
+alias, it would be marked. Otherwise the last component of the import path is marked.
+1. __`type`__ Objects of type ```types.TypeName``` are marked `type`.
+If they are also ```types.Basic```
+the modifier is `defaultLibrary`. (And in ```type B struct{C}```, ```B``` has modifier `definition`.)
+1. __`parameter`__ The formal arguments in ```ast.FuncDecl``` nodes are marked `parameter`.
+1. __`variable`__  Identifiers in the
+scope of ```const``` are modified with `readonly`. ```nil``` is usually a `variable` modified with both
+`readonly` and `defaultLibrary`. (```nil``` is a predefined identifier; the user can redefine it,
+in which case it would just be a variable, or whatever.) Identifiers of type ```types.Variable``` are,
+not surprisingly, marked `variable`. Identifiers being defined (node ```ast.GenDecl```) are modified
+by `definition` and, if appropriate, `readonly`. Receivers (in method declarations) are
+`variable`.
+1. __`member`__ Members are marked at their definition (```func (x foo) bar() {}```) or declaration
+in an ```interface```. Members are not marked where they are used.
+In ```x.bar()```, ```x``` will be marked
+either as a `namespace` if it is a package name, or as a `variable` if it is an interface value,
+so distinguishing ```bar``` seemed superfluous.
+1. __`function`__ Bultins (```types.Builtin```) are modified with `defaultLibrary`
+(e.g., ```make```, ```len```, ```copy```). Identifiers whose
+object is ```types.Func``` or whose node is ```ast.FuncDecl``` are `function`.
+1. __`comment`__ Comments and struct tags. (Perhaps struct tags should be `property`?)
+1. __`string`__ Strings. Could add modifiers for e.g., escapes or format codes.
+1. __`number`__ Numbers. Should the ```i``` in ```23i``` be handled specially?
+1. __`operator`__ Assignment operators, binary operators, ellipses (```...```), increment/decrement
+operators, sends (```<-```), and unary operators.
+
+Gopls will send the modifier `deprecated` if it finds a comment
+```// deprecated``` in the godoc.
+
+The unused tokens for Go code are `class`, `enum`, `interface`,
+		`struct`, `typeParameter`, `property`, `enumMember`,
+		`event`, `macro`, `modifier`,
+		`regexp`
+
+## Colors
+
+These comments are about vscode.
+
+The documentation has a [helpful](https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide#custom-textmate-scope-mappings)
+description of which semantic tokens correspond to scopes in TextMate grammars. Themes seem
+to use the TextMate scopes to decide on colors.
+
+Some examples of color customizations are [here](https://medium.com/@danromans/how-to-customize-semantic-token-colorization-with-visual-studio-code-ac3eab96141b).
+
+## Note
+
+While a file is being edited it may temporarily contain either
+parsing errors or type errors. In this case gopls cannot determine some (or maybe any)
+of the semantic tokens. To avoid weird flickering it is the responsibility
+of clients to maintain the semantic token information
+in the unedited part of the file, and they do.
\ No newline at end of file