The Cultural Evolution of gofmt
Robert Griesemer
Google, Inc.
gri@golang.org
* gofmt
- Go source code formatter
- Defines _de_facto_ Go formatting
- All submitted Go code must be formatted with `gofmt` in `golang.org` repos.
- Functionality available outside gofmt via `go/format` library.
- No knobs!
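A minimal sketch of calling the formatter as a library, assuming only the standard `go/format` package. The helper name `gofmtString` is made up for this example; note that `format.Source` takes no style options.

```go
package main

import (
	"fmt"
	"go/format"
)

// gofmtString formats Go source text with the standard formatter.
// The name is hypothetical; format.Source is the real library entry point.
func gofmtString(src string) (string, error) {
	out, err := format.Source([]byte(src))
	if err != nil {
		return "", err // src was not syntactically valid Go
	}
	return string(out), nil
}

func main() {
	messy := "package main\n\nimport \"fmt\"\nfunc main(){\nfmt.Println( \"hi\" )\n}\n"
	out, err := gofmtString(messy)
	if err != nil {
		fmt.Println("format error:", err)
		return
	}
	fmt.Print(out)
}
```

Formatting already formatted output again should leave it unchanged.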
* Original motivation
- Code reviews are software engineering best practice.
- Code reviews are informed by style guides, which prescribe formatting.
# Google C++ style guide: ~65 pages (~15p on formatting)
# Go spec: ~50 pages
- *Much*too*much*time*lost*on*reviewing*formatting*rather*than*code.*
# Example: Formatting review time 10 min/day, 600 engineers => 100 manhours/day!
- Yet it's the perfect job for a machine.
- Day 1 decision to write a pretty printer for Go.
# informed by experience with Java and C++ code reviews at Google
* History
- Pretty printers and code beautifiers have existed since the early days of computing.
- Essential to produce readable Lisp code:
	GRINDEF  (Bill Gosper, 1967)              first to measure line length
- Many others:
	SOAP     (R. Scowen et al., 1969)         Simplifies Obscure Algol Programs
	NEATER2  (Ken Conrow, R. Smith, 1970)     PL/1 reformatter, used as an (early) error detection tool
	cb       (Unix Version 7, 1979)           C program beautifier
	indent   (4.2 BSD, 1983)                  indent and format C source code
	etc.
- More recently:
	ClangFormat  C/C++/Objective-C formatter
	Uncrustify   beautifier for C, C++, C#, Objective-C, D, Java, Pawn and Vala
	etc.
* Reality check
- In 2007, nobody seemed to like source code formatters.
- Exception: IDE-imposed formatting.
- But: Many programmers don't use IDEs...
- Problem: If automatic formatting feels too destructive, it is not used.
- Missing insight: "good enough" uniform formatting style is better than having lots of different formats.
- Value of style guide: Uniformity, not perfection.
* The problem with pretty printers
- The more people think about their own formatting style, the more they get attached to it.
# religion
- Wrong conclusion: Automatic formatters must permit a lot of formatting options!
- But: Formatters with too many options defeat the purpose.
# e.g., indent
- Also: Very hard to do a good job.
# combinatorial explosion of styles to test
- Respecting user intent is key.
- Dealing with comments is hard.
- Language may add extra complexity (e.g., C macros)
* Formatting Go
* Keep it as simple as possible
- Small language makes task much simpler.
- Don't fret over line length control.
- Instead, respect user: Consider line breaks in original source.
- Don't support any options.
- Make it easy to use.
*One*formatting*style*to*rule*them*all!*
* Basic structure of gofmt
- Parsing of source code
- Basic formatting
- Enhancement: Handling of comments
- Make it nice: Alignment of code and comments
- But: No fancy general layout algorithms.
- Instead: Node-specific fine tuning.
* Parsing source code
- Use `go/scanner`, `go/parser`, and friends.
- Result is an abstract syntax tree (`go/ast`) for each `.go` file.
# misnomer: AST is actually a concrete syntax tree
- Each syntactic construct has a corresponding AST node.
	// Syntax of an if statement.
	IfStmt = "if" [ SimpleStmt ";" ] Expression Block [ "else" ( IfStmt | Block ) ] .
	// An IfStmt node represents an if statement.
	IfStmt struct {
		If   token.Pos // position of "if" keyword
		Init Stmt      // initialization statement; or nil
		Cond Expr      // condition
		Body *BlockStmt
		Else Stmt // else branch; or nil
	}
- AST nodes have (selected) position information.
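As a sketch of the parsing step, the following program parses a small file with `go/parser` and locates the first `*ast.IfStmt`, using the node's `If` position to recover a line number. The type `ifInfo`, the function `firstIf`, and the sample source are invented for this illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// exampleSrc is sample input made up for this sketch.
const exampleSrc = `package p

func f(x int) int {
	if y := x * 2; y > 10 {
		return y
	} else {
		return x
	}
}
`

// ifInfo summarizes a few fields of an *ast.IfStmt (hypothetical helper type).
type ifInfo struct {
	Line    int  // line of the "if" keyword
	HasInit bool // Init statement present
	HasElse bool // Else branch present
}

// firstIf parses src and describes the first if statement found, if any.
func firstIf(src string) (ifInfo, bool, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return ifInfo{}, false, err
	}
	var info ifInfo
	found := false
	ast.Inspect(file, func(n ast.Node) bool {
		if found {
			return false // stop descending once we have a result
		}
		if s, ok := n.(*ast.IfStmt); ok {
			info = ifInfo{
				Line:    fset.Position(s.If).Line,
				HasInit: s.Init != nil,
				HasElse: s.Else != nil,
			}
			found = true
			return false
		}
		return true
	})
	return info, found, nil
}

func main() {
	info, found, err := firstIf(exampleSrc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(found, info.Line, info.HasInit, info.HasElse)
}
```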
* Basic formatting
- Traverse AST and print nodes.
	case *ast.IfStmt:
		p.print(token.IF)
		p.controlClause(false, s.Init, s.Cond, nil)
		p.block(s.Body, 1)
		if s.Else != nil {
			p.print(blank, token.ELSE, blank)
			switch s.Else.(type) {
			case *ast.BlockStmt, *ast.IfStmt:
				p.stmt(s.Else, nextIsRBrace)
			default:
				p.print(token.LBRACE, indent, formfeed)
				p.stmt(s.Else, true)
				p.print(unindent, formfeed, token.RBRACE)
			}
		}
- Printer (`p.print`) accepts a sequence of tokens, including position and white space information.
* Fine tuning
- Precedence-dependent spacing between operands.
# implemented by rsc in gofmt
- Improves readability of expressions.
	x = a + b
	x = a + b*c
	if a+b <= d {
	if a+b*c <= d {
- Use position information to guide line break decisions.
- Various other heuristics.
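The spacing rule can be observed through `go/format`. The sketch below feeds in inconsistently spaced expressions and prints what the formatter produces; the sample source and the helper name `formatSample` are made up for this example.

```go
package main

import (
	"fmt"
	"go/format"
)

// sample has deliberately inconsistent spacing around operators.
const sample = `package p

func f(a, b, c, d int) int {
	x := a+b
	x = a + b * c
	if a + b*c <= d {
		return x
	}
	return 0
}
`

// formatSample returns the formatter's rendering of src.
func formatSample(src string) (string, error) {
	out, err := format.Source([]byte(src))
	return string(out), err
}

func main() {
	out, err := formatSample(sample)
	if err != nil {
		fmt.Println("format error:", err)
		return
	}
	fmt.Print(out)
}
```

In the output, the tighter-binding `*` loses its surrounding blanks once a lower-precedence operator appears in the same expression, matching the examples on this slide.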
* Handling of comments
- Comments can appear between any two tokens of a program.
- In general, it is not obvious to which AST node a comment belongs.
# In retrospect, a heuristic might have been better than the list of comments
# we have now. See Lessons learned.
- Comments often come in groups:
	// A CommentGroup represents a sequence of comments
	// with no other tokens and no empty lines between.
	//
	type CommentGroup struct {
		List []*Comment // len(List) > 0
	}
- Grouped comments treated as a single larger comment.
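A small sketch of how grouping shows up in practice, assuming the parser is run with the `parser.ParseComments` mode: adjacent comment lines land in one `ast.CommentGroup`, and an empty line starts a new group. The helper `commentGroups` and the sample source are invented for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// groupSrc is sample input: two adjacent comments, an empty line, then one more.
const groupSrc = `package p

// first line
// second line

// separate group
var x int
`

// commentGroups parses src and returns all comment groups of the file.
func commentGroups(src string) ([]*ast.CommentGroup, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	return file.Comments, nil
}

func main() {
	groups, err := commentGroups(groupSrc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for i, g := range groups {
		fmt.Printf("group %d: %d comment(s): %q\n", i, len(g.List), g.Text())
	}
}
```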
* Representation of comments in the AST
- Sequential list of comment groups attached to the ast.File node.
# In retrospect this was not a good decision. It's general but puts burden on AST clients.
- Additionally, comments that are identified as _doc_strings_ are attached to declaration nodes.
.image ./gofmt/comments.jpg 425 600
* Formatting with comments
- Basic idea: Merge "token stream" with "comment stream" based on position information.
.image ./gofmt/merge.jpg 425 700
* Devil is in the details
# It's an entire hell of devils, really.
- Estimate current position in "source code space".
- Compare current position with comment position to decide what's next.
- Token stream also contains "white space" tokens - comments must be properly interspersed!
- Maintain buffer of unprinted white space, flush before next token, intersperse comments.
- Various heuristics to get white space correct.
- Lots of trial and error.
* Formatting individual comments
- Distinguish between line and general comments.
- Try to properly indent multi-line general comments:
	func f() {            func f() {
	/*                            /*
	 * foo                         * foo
	 * bar        ==>              * bar
	 * bal                         * bal
	 */                            */
	        if ...                if ...
	}                     }
- Doesn't always work well.
- Want both: comments indented, and comment contents left alone. No good solution.
* Alignment
- Carefully chosen alignment can make code easier to read:
	var (                                   var (
	        x, y int = 2, 3 // foo                  x, y int     = 2, 3 // foo
	        z float32 // bar            ==>         z    float32        // bar
	        s string // bal                         s    string         // bal
	)                                       )
- Painful to maintain manually (regular tabs don't do the job).
- Perfect job for a formatter.
* Elastic tabstops
Regular tabs (`\t`) advance writing position to fixed tab stops.
Basic idea: Make tab stops _elastic_.
- A tab is used to indicate the _end_ of a text _cell_.
- A _column_block_ is a run of uninterrupted vertically adjacent cells.
- A column block is as wide as the widest piece of text in the cells.
Proposed by Nick Gravgaard, 2006
.link http://nickgravgaard.com/elastic-tabstops/
Implemented by `text/tabwriter` package.
* Elastic tabstops illustrated
.image ./gofmt/tabstops.jpg 500 700
* Putting it all together (1)
- Parser generates AST.
- Printer prints AST recursively, uses tabs (`\t`) to indicate elastic tab stops.
- The resulting token, position, and whitespace stream is merged with the "stream" of comments.
- Tokens are expanded into strings; all text flows through a tabwriter.
- Tabwriter replaces tabs with appropriate amount of blanks.
Works well for fixed-width fonts.
Proportional fonts could be handled by an editor supporting elastic tab stops.
# go/printer can produce output containing elastic tab stops
* Putting it all together (2)
.image ./gofmt/bigpic.jpg 550 800
* The big picture
.image ./gofmt/biggerpic.jpg 400 800
* gofmt applications
* gofmt as source code transformer
- Go rewriter (Russ Cox), `gofmt`-r`
	gofmt -w -r 'a[i:len(x)] -> a[i:]' *.go
- Go simplifier, `gofmt`-s`
- API updater (Russ Cox), `go`fix`
- Language changes (removal of semicolons, others)
- goimports (Brad Fitzpatrick)
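A sketch of the parse, transform, print pattern these tools share. It rewrites `a[i:len(a)]` to `a[i:]` for plain identifiers only. It is not the implementation of `gofmt -r` or `gofmt -s`, and it assumes `len` refers to the builtin, which a real tool would need to verify. The function name `dropRedundantLen` is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
)

// dropRedundantLen rewrites slice expressions of the form a[i:len(a)] to a[i:].
// Assumption: "len" is the builtin and a is a simple identifier.
func dropRedundantLen(src string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	ast.Inspect(file, func(n ast.Node) bool {
		s, ok := n.(*ast.SliceExpr)
		if !ok || s.Slice3 || s.High == nil {
			return true
		}
		x, ok := s.X.(*ast.Ident)
		if !ok {
			return true
		}
		call, ok := s.High.(*ast.CallExpr)
		if !ok || len(call.Args) != 1 || call.Ellipsis.IsValid() {
			return true
		}
		fn, ok := call.Fun.(*ast.Ident)
		if !ok || fn.Name != "len" {
			return true
		}
		if arg, ok := call.Args[0].(*ast.Ident); ok && arg.Name == x.Name {
			s.High = nil // a[i:len(a)] becomes a[i:]
		}
		return true
	})
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	src := "package p\n\nfunc tail(a []int, i int) []int {\n\treturn a[i:len(a)]\n}\n"
	out, err := dropRedundantLen(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```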
* Reactions
- The Go project mandates that all submitted code is gofmt-ed.
- First, complaints: `gofmt` doesn't do _my_ style!
- Eventually, acquiescence: The Go Team really means it!
- Finally, insight: gofmt's style is _nobody's_ favorite, yet
`gofmt` is everybody's favorite.
- Now, praise: `gofmt` is one of the many reasons why people like Go.
Formatting has become a non-issue.
* Others are starting to take note
- Formatter for Google's BUILD files (Russ Cox).
- Java formatter
- Clang formatter
- Dartfmt
.link https://www.dartlang.org/tools/dartfmt/
- etc.
Automatic source code formatting is becoming a requirement
for any kind of language.
* Conclusions
* Evolution in programming culture
- `gofmt` is a significant selling point for Go.
- Insight is spreading that uniform "good enough" formatting is hugely beneficial.
# no need for detailed formatting style guides
# no time wasted on formatting
# improved readability
# smaller diffs when changing code
- Source code manipulation at AST-level enables a new category of tools.
# simple to complex automatic source code transformations
# various auto-completion mechanisms (e.g. goimports)
# enables syntax evolution
- Others are taking note: Programming culture is slowly evolving.
* Lessons learned: Application
- Basic source code formatting is a great initial goal.
- True power lies in source code transformation tools.
- Avoid formatting options.
- Keep it simple.
Want:
- Go parser: source code => syntax tree
- Make it easy to manipulate syntax tree in any way possible.
- Go printer: syntax tree => source code
* Lessons learned: Implementation
- Lots of trial and error in initial version.
- Single biggest mistake: comments not attached to AST nodes.
=> Current design makes it extremely hard to manipulate AST
and maintain comments in right places.
- Kludge: ast.CommentMap
Want:
- Easy to manipulate syntax tree with comments attached.
* Going forward
- Design of new syntax tree in the works (still experimental).
- Syntax tree simpler and easier to manipulate (e.g., declaration nodes)
- Faster and easier to use parser and printer.
- Make it robust and fast. Don't do anything else.
# no semantic analyses in parser
# no options in printer
# ----------------------------------------------------------------------------------
#
# Implementation size
#
# go/token           849 lines    lexical tokens, source positions
# go/scanner         884 lines    tokenization
# go/parser         2689 lines    parsing
# go/ast            2966 lines    abstract syntax tree, tree traversal
# go/printer        2948 lines    actual AST printer
# go/format          115 lines    helper library to make printer easy to use
# internal/format    161 lines
# cmd/gofmt          801 lines    gofmt tool
# ----------------------------
# total            11413 lines