## The Cultural Evolution of gofmt
The Cultural Evolution of gofmt
Robert Griesemer
Google, Inc.
gri@golang.org
* gofmt
## Go source code formatter
- Go source code formatter
## Defines _de_facto_ Go formatting
- Defines the _de_facto_ standard Go formatting
## All submitted Go code must be formatted with `gofmt` in `golang.org` repos.
- All Go code submitted to the `golang.org` repos must be formatted with `gofmt`
## Functionality available outside gofmt via `go/format` library.
- The same functionality is available outside gofmt via the `go/format` library
## No knobs!
- No knobs!
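As a minimal sketch of the `go/format` route mentioned above: the program below calls `format.Source` on a complete source file held in a string. The helper name `formatSource` and the sample input are assumptions made for illustration; only `format.Source` comes from the standard library.

```go
package main

import (
	"fmt"
	"go/format"
)

// formatSource is a hypothetical helper: it runs the standard
// formatter over a complete Go source file and returns the result.
func formatSource(src string) (string, error) {
	out, err := format.Source([]byte(src))
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Deliberately badly formatted input (illustrative only).
	src := "package main\nfunc  main( ){\nprintln( \"hi\" )}\n"
	out, err := formatSource(src)
	if err != nil {
		fmt.Println("format error:", err)
		return
	}
	fmt.Print(out)
}
```

Note that there is nothing to configure: the function takes source bytes and returns formatted bytes, which matches the "no knobs" point.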
## Original motivation
* Original motivation
## Code reviews are software engineering best practice.
- Code reviews are a software engineering best practice
## Code reviews are informed by style guides, prescribe formatting.
- Code reviews are informed by style guides, which prescribe formatting
# Google C++ style guide: ~65 pages (~15p on formatting)
# Go spec: ~50 pages
## *Much*too*much*time*lost*on*reviewing*formatting*rather*than*code.*
- *Much*too*much*time*is*lost*on*reviewing*formatting*rather*than*code*
# Example: Formatting review time 10 min/day, 600 engineers => 100 manhours/day!
## Yet it's the perfect job for a machine.
- Yet it's the perfect job for a machine
## Day 1 decision to write a pretty printer for Go.
- Day 1 decision: write a pretty printer for Go
# informed by experience with Java and C++ code reviews at Google
## History
* History
## Pretty printers and code beautifiers existed since the early days of computing.
- Pretty printers and code beautifiers have existed since the early days of computing
## Essential to produce readable Lisp code:
- Essential for producing readable Lisp code:
	GRINDEF  (Bill Gosper, 1967)             first to measure line length
## Many others:
- Many others:
	SOAP     (R. Scowen et al, 1969)         Simplify Obscure Algol Programs
	NEATER2  (Ken Conrow, R. Smith, 1970)    PL/1 formatter, as (early) error-correcting tool
	cb       (Unix Version 7, 1979)          C program beautifier
	indent   (4.2 BSD, 1983)                 indent and format C code
	etc.
## More recently:
- More recently:
	ClangFormat    C/C++/Objective-C formatter
	Uncrustify     beautifier for C, C++, C#, ObjectiveC, D, Java, Pawn and VALA
	etc.
## Reality check
* Reality check
## In 2007, nobody seemed to like source code formatters.
- In 2007, nobody seemed to like source code formatters
## Exception: IDE-imposed formatting.
- Exception: IDE-imposed formatting
## But: Many programmers don't use IDEs…
- But: many programmers don't use IDEs...
## Problem: If automatic formatting feels too destructive, it is not used.
- Problem: if automatic formatting feels too destructive, nobody uses it
## Missing insight: "good enough" uniform formatting style is better than having lots of different formats.
- Missing insight: a "good enough" uniform formatting style is better than lots of different formats
## Value of style guide: Uniformity, not perfection.
- The value of a style guide lies in uniformity, not perfection
## The problem with pretty printers
* The problem with pretty printers
## The more people think about their own formatting style, the more they get attached to it.
- The more people think about their own formatting style, the more attached to it they become
# religion
## Wrong conclusion: Automatic formatters must permit a lot of formatting options!
- Wrong conclusion: automatic formatters must permit a lot of formatting options!
## But: Formatters with too many options defeat the purpose.
- But: formatters with too many options defeat their own purpose
# e.g., indent
## Also: Very hard to do a good job.
- Also: supporting many options makes it very hard to do a good job
# combinatorial explosion of styles to test
## Respecting user intent is key.
- Respecting user intent is key
## Dealing with comments is hard.
- Dealing with comments is hard
## Language may add extra complexity (e.g., C macros)
- The language itself may add extra complexity (e.g., C macros)
## Formatting Go
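* Formatting Go
## Keep it as simple as possible
* Keep it as simple as possible
## Small language makes task much simpler.
- A small language makes the task much simpler
## Don't fret over line length control.
- Don't fret over line length control
## Instead, respect user: Consider line breaks in original source.
- Instead, respect the user: consider the line breaks in the original source
## Don't support any options.
- Don't support any options
## Make it easy to use.
- Make it easy to use
## *One*formatting*style*to*rule*them*all!*
*One*formatting*style*to*rule*them*all!*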
## Basic structure of gofmt
* Basic structure of gofmt
## Parsing of source code
- Parsing of source code
## Basic formatting
- Basic formatting
## Enhancement: Handling of comments
- Enhancement: handling of comments
## Make it nice: Alignment of code and comments
- Make it nice: alignment of code and comments
## But: No fancy general layout algorithms.
- But: no fancy general layout algorithms
## Instead: Node-specific fine tuning.
- Instead: node-specific fine tuning
## Parsing source code
* Parsing source code
## Use `go/scanner`, `go/parser`, and friends.
- Use `go/scanner`, `go/parser`, and related packages
## Result is an abstract syntax tree (`go/ast`) for each `.go` file.
- The result is an abstract syntax tree (`go/ast`) for each `.go` file
# misnomer: AST is actually a concrete syntax tree
## Each syntactic construct has a corresponding AST node.
- Each syntactic construct has a corresponding AST node
	// Syntax of an if statement.
	IfStmt = "if" [ SimpleStmt ";" ] Expression Block [ "else" ( IfStmt | Block ) ] .
	// An IfStmt node represents an if statement.
	IfStmt struct {
		If   token.Pos // position of "if" keyword
		Init Stmt      // initialization statement; or nil
		Cond Expr      // condition
		Body *BlockStmt
		Else Stmt // else branch; or nil
	}
## AST nodes have (selected) position information.
- AST nodes carry (selected) position information
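To make the parsing step concrete, here is a small sketch that parses a source string with `go/parser` and walks the tree with `ast.Inspect`, reporting every `*ast.IfStmt` and the position of its `if` keyword. The helper `countIfStmts`, the file name `example.go`, and the sample source are assumptions made for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// sample is illustrative input, not taken from the talk.
const sample = `package p

func sign(x int) int {
	if x < 0 {
		return -1
	} else if x > 0 {
		return 1
	}
	return 0
}
`

// countIfStmts (hypothetical helper) parses src and returns the
// number of *ast.IfStmt nodes found in the syntax tree.
func countIfStmts(src string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return 0, err
	}
	n := 0
	ast.Inspect(file, func(node ast.Node) bool {
		if s, ok := node.(*ast.IfStmt); ok {
			n++
			// The If field records where the "if" keyword was found.
			fmt.Println("if at", fset.Position(s.If), "has else:", s.Else != nil)
		}
		return true
	})
	return n, nil
}

func main() {
	n, err := countIfStmts(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("if statements:", n)
}
```

The `else if` branch is itself an `IfStmt` stored in the `Else` field of the outer node, so the sample contains two such nodes.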
## Basic formatting
* Basic formatting
## Traverse AST and print nodes.
- Traverse the AST and print each node
	case *ast.IfStmt:
		p.print(token.IF)
		p.controlClause(false, s.Init, s.Cond, nil)
		p.block(s.Body, 1)
		if s.Else != nil {
			p.print(blank, token.ELSE, blank)
			switch s.Else.(type) {
			case *ast.BlockStmt, *ast.IfStmt:
				p.stmt(s.Else, nextIsRBrace)
			default:
				p.print(token.LBRACE, indent, formfeed)
				p.stmt(s.Else, true)
				p.print(unindent, formfeed, token.RBRACE)
			}
		}
## Printer (`p.print`) accepts a sequence of tokens, including position and white space information.
- The printer (`p.print`) accepts a sequence of tokens, including position and white space information
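The excerpt above is internal to the printer. From the outside, the same traverse-and-print step is available through `go/printer`. The sketch below finds the first `if` statement in a parsed file and prints only that node with `printer.Fprint`. The helper `printFirstIf` and the sample input are assumptions for illustration, and surrounding white space is trimmed for convenience.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
	"strings"
)

// printFirstIf (hypothetical helper) parses src, locates the first
// if statement, and prints just that AST node back to source text.
func printFirstIf(src string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return "", err
	}
	var first *ast.IfStmt
	ast.Inspect(file, func(n ast.Node) bool {
		if s, ok := n.(*ast.IfStmt); ok && first == nil {
			first = s
		}
		return first == nil // stop descending once a match is found
	})
	if first == nil {
		return "", fmt.Errorf("no if statement found")
	}
	var buf bytes.Buffer
	if err := printer.Fprint(&buf, fset, first); err != nil {
		return "", err
	}
	return strings.TrimSpace(buf.String()), nil
}

func main() {
	src := "package p\n\nfunc f(x int) int {\n\tif x<0 {\n\t\treturn -1\n\t}\n\treturn 1\n}\n"
	out, err := printFirstIf(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```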
## Fine tuning
* Fine tuning
## Precedence-dependent spacing between operands.
- Precedence-dependent spacing between operands
# implemented by rsc in gofmt
## Improves readability of expressions.
- Improves readability of expressions
	x = a + b
	x = a + b*c
	if a+b <= d {
	if a+b*c <= d {
## Use position information to guide line break decisions.
- Use position information to guide line break decisions
## Various other heuristics.
- Various other heuristics
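The spacing rule can be observed directly by formatting bare expressions. This sketch parses an expression with `parser.ParseExpr` and prints it with `format.Node`; the helper name `formatExpr` is an assumption, and the result is trimmed so that only the spacing between operands matters.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
	"go/parser"
	"go/token"
	"strings"
)

// formatExpr (hypothetical helper) parses a single Go expression
// and returns it in gofmt style.
func formatExpr(src string) (string, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := format.Node(&buf, token.NewFileSet(), expr); err != nil {
		return "", err
	}
	return strings.TrimSpace(buf.String()), nil
}

func main() {
	// The same operands, written without any blanks.
	for _, src := range []string{"a+b", "a+b*c", "a+b<=d", "a+b*c<=d"} {
		out, err := formatExpr(src)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-10s => %s\n", src, out)
	}
}
```

The four inputs correspond to the four expressions on the slide: blanks are kept around the loosest-binding operator and dropped around tighter ones.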
## Handling of comments
* Handling of comments
## Comments can appear between any two tokens of a program.
- Comments can appear between any two tokens of a program
## In general, not obviously clear to which AST node a comment belongs.
- In general, it is not obvious which AST node a comment belongs to
# In retrospect, a heuristic might have been better than the list of comments
# we have now. See Lessons learned.
## Comments often come in groups:
- Comments often come in groups:
	// A CommentGroup represents a sequence of comments
	// with no other tokens and no empty lines between.
	//
	type CommentGroup struct {
		List []*Comment // len(List) > 0
	}
## Grouped comments treated as a single larger comment.
- Grouped comments are treated as a single larger comment
## Representation of comments in the AST
* Representation of comments in the AST
## Sequential list of comment groups attached to the ast.File node.
- A sequential list of comment groups is attached to the `ast.File` node
# In retrospect this was not a good decision. It's general but puts burden on AST clients.
## Additionally, comments that are identified as _doc_strings_ are attached to declaration nodes.
- Additionally, comments identified as _doc_strings_ are attached to declaration nodes
.image ./gofmt/comments.jpg 425 600
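Both representations are visible from client code. The sketch below parses a file with `parser.ParseComments`, counts the comment groups attached to the `ast.File` node, and reads the doc comment attached to a function declaration. The helper `commentInfo` and the sample source are assumptions made for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// sample is illustrative input with one doc comment and one
// comment inside a function body.
const sample = `package p

// F does nothing.
func F() {
	// inside the body
}
`

// commentInfo (hypothetical helper) returns the number of comment
// groups on the file node and the doc text of the last function
// declaration that has one.
func commentInfo(src string) (groups int, doc string, err error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		return 0, "", err
	}
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok && fn.Doc != nil {
			doc = fn.Doc.Text() // comment markers are stripped
		}
	}
	return len(file.Comments), doc, nil
}

func main() {
	groups, doc, err := commentInfo(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("comment groups on file:", groups)
	fmt.Printf("doc comment of F: %q\n", doc)
}
```

The comment inside the function body appears only in the file-level list; no statement node refers to it, which is the burden on AST clients that the speaker note mentions.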
## Formatting with comments
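* Formatting with comments
## Basic idea: Merge "token stream" with "comment stream" based on position information.
- Basic idea: merge the "token stream" with the "comment stream" based on position information
.image ./gofmt/merge.jpg 425 700
## Devil is in the details
* Devil is in the details
# It's an entire hell of devils, really.
## Estimate current position in "source code space".
- Estimate the current position in "source code space"
## Compare current position with comment position to decide what's next.
- Compare the current position with the comment position to decide what comes next
## Token stream also contains "white space" tokens - comments must be properly interspersed!
- The token stream also contains "white space" tokens - comments must be properly interspersed!
## Maintain buffer of unprinted white space, flush before next token, intersperse comments.
- Maintain a buffer of unprinted white space, flush it before the next token, and intersperse comments
## Various heuristics to get white space correct.
- Various heuristics to get white space correct
## Lots of trial and error.
- Lots of trial and error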
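The merge idea can be sketched in a few lines. This is a toy model, not the code in `go/printer`: the `item` type, the `merge` function, and the offsets are all hypothetical, and the white space buffering and heuristics described above are left out entirely.

```go
package main

import "fmt"

// item is a hypothetical stand-in for anything the printer emits
// (a token or a comment), tagged with its byte offset in the
// original source.
type item struct {
	offset int
	text   string
}

// merge interleaves two offset-sorted streams into one, always
// emitting whichever item comes first in the source.
func merge(tokens, comments []item) []item {
	out := make([]item, 0, len(tokens)+len(comments))
	i, j := 0, 0
	for i < len(tokens) && j < len(comments) {
		if comments[j].offset < tokens[i].offset {
			out = append(out, comments[j])
			j++
		} else {
			out = append(out, tokens[i])
			i++
		}
	}
	out = append(out, tokens[i:]...)
	out = append(out, comments[j:]...)
	return out
}

func main() {
	// Offsets correspond to the source text "x = /* one */ 1".
	tokens := []item{{0, "x"}, {2, "="}, {14, "1"}}
	comments := []item{{4, "/* one */"}}
	for _, it := range merge(tokens, comments) {
		fmt.Print(it.text, " ")
	}
	fmt.Println()
}
```

The hard part in practice is everything this sketch omits: the printer only estimates its current position, and pending white space has to be split around each comment.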
## Formatting individual comments
* Formatting individual comments
## Distinguish between line and general comments.
- Distinguish between line comments (//) and general comments (/* */)
## Try to properly indent multi-line general comments:
- Try to properly indent multi-line general comments:
	func f() {                      func f() {
	/*                                      /*
	 * foo                                   * foo
	 * bar                  ==>              * bar
	 * bal                                   * bal
	 */                                      */
	        if ...                          if ...
	}                               }
## Doesn't always work well.
- Doesn't always work well
## Want both: comments indented, and comment contents left alone. No good solution.
- Want both: comments indented, and comment contents left alone. No good solution yet.
## Alignment
* Alignment
## Carefully chosen alignment can make code easier to read:
- Carefully chosen alignment can make code easier to read:
	var (                                   var (
	        x, y int = 2, 3 // foo                  x, y int     = 2, 3 // foo
	        z float32 // bar        ==>             z    float32        // bar
	        s string // bal                         s    string         // bal
	)                                       )
## Painful to maintain manually (regular tabs don't do the job).
- Painful to maintain manually (regular tabs don't do the job)
## Perfect job for a formatter.
- Perfect job for a formatter
## Elastic tabstops
* Elastic tabstops
## Regular tabs (`\t`) advance writing position to fixed tab stops.
Regular tabs (`\t`) advance the writing position to the next fixed tab stop.
## Basic idea: Make tab stops _elastic_.
Basic idea: make tab stops _elastic_.
## A tab is used to indicate the _end_ of a text _cell_.
- A tab indicates the _end_ of a text _cell_
## A _column_block_ is a run of uninterrupted vertically adjacent cells.
- A _column_block_ is an uninterrupted run of vertically adjacent cells
## A column block is as wide as the widest piece of text in the cells.
- A column block is as wide as the widest piece of text in its cells
## Proposed by Nick Gravgaard, 2006
Proposed by Nick Gravgaard, 2006
.link http://nickgravgaard.com/elastic-tabstops/
## Implemented by `text/tabwriter` package.
Implemented by the `text/tabwriter` package.
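A short sketch of elastic tab stops using `text/tabwriter` directly. Each `\t` ends a cell, and cells stacked in adjacent lines form a column block that is padded to the widest cell. The helper `align`, the sample lines, and the chosen parameters (space padding, one blank of padding per cell) are assumptions for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"text/tabwriter"
)

// align (hypothetical helper) writes the lines through a tabwriter
// so that tab-terminated cells line up in columns.
func align(lines []string) string {
	var buf bytes.Buffer
	// minwidth 0, tabwidth 8, padding 1, pad with spaces, no flags.
	w := tabwriter.NewWriter(&buf, 0, 8, 1, ' ', 0)
	for _, l := range lines {
		fmt.Fprintln(w, l)
	}
	w.Flush() // column widths are only known once all lines are seen
	return buf.String()
}

func main() {
	fmt.Print(align([]string{
		"x\tint\t// foo",
		"name\tstring\t// bar",
		"z\tfloat32\t// bal",
	}))
}
```

The first column is sized by `name` and the second by `float32`, so the trailing comments start in the same column on every line without anyone counting spaces by hand.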
## Elastic tabstops illustrated
* Elastic tabstops illustrated
.image ./gofmt/tabstops.jpg 500 700
## Putting it all together (1)
* Putting it all together (1)
## Parser generates AST.
- The parser generates the AST
## Printer prints AST recursively, uses tabs (`\t`) to indicate elastic tab stops.
- The printer prints the AST recursively, using tabs (`\t`) to indicate elastic tab stops
## The resulting token, position, and whitespace stream is merged with the "stream" of comments.
- The resulting token, position, and white space stream is merged with the "stream" of comments
## Tokens are expanded into strings; all text flows through a tabwriter.
- Tokens are expanded into strings; all text flows through a tabwriter
## Tabwriter replaces tabs with appropriate amount of blanks.
- The tabwriter replaces tabs with the appropriate number of blanks
## Works well for fixed-width fonts.
Works well for fixed-width fonts.
## Proportional fonts could be handled by an editor supporting elastic tab stops.
Proportional fonts could be handled by an editor that supports elastic tab stops.
# go/printer can produce output containing elastic tab stops
## Putting it all together (2)
* Putting it all together (2)
.image ./gofmt/bigpic.jpg 550 800
## The big picture
* The big picture
.image ./gofmt/biggerpic.jpg 400 800
## gofmt applications
* gofmt applications
## gofmt as source code transformer
* gofmt as source code transformer
## Go rewriter (Russ Cox), `gofmt`-r`
- Go rewriter (Russ Cox), `gofmt`-r`
	gofmt -w -r 'a[i:len(a)] -> a[i:]' *.go
## Go simplifier, `gofmt`-s`
- Go simplifier, `gofmt`-s`
## API updater (Russ Cox), `go`fix`
- API updater (Russ Cox), `go`fix`
## Language changes (removal of semicolons, others)
- Language changes (removal of semicolons, others)
## goimports
- goimports (Brad Fitzpatrick)
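The slice rewrite above can also be expressed as a hand-written AST transformation, which shows what such tools do under the hood: parse, edit nodes, print. This is a simplified sketch rather than gofmt's own rewriter; the helper `simplifySlices` is hypothetical, it only handles plain identifiers, and it assumes `len` has not been shadowed.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
)

// simplifySlices (hypothetical helper) rewrites a[i:len(a)] to a[i:]
// wherever a is a plain identifier, then prints the file again.
func simplifySlices(src string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	ast.Inspect(file, func(n ast.Node) bool {
		s, ok := n.(*ast.SliceExpr)
		if !ok || s.Slice3 || s.High == nil {
			return true
		}
		recv, isIdent := s.X.(*ast.Ident)
		call, isCall := s.High.(*ast.CallExpr)
		if !isIdent || !isCall || len(call.Args) != 1 {
			return true
		}
		fn, fnOK := call.Fun.(*ast.Ident)
		arg, argOK := call.Args[0].(*ast.Ident)
		if fnOK && argOK && fn.Name == "len" && arg.Name == recv.Name {
			s.High = nil // a[i:len(a)] becomes a[i:]
		}
		return true
	})
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	src := "package p\n\nfunc tail(a []int, i int) []int {\n\treturn a[i:len(a)]\n}\n"
	out, err := simplifySlices(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Because the printer always produces the one standard style, the rewritten file needs no separate formatting pass.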
## Reactions
* Reactions
## The Go project mandates that all submitted code is gofmt-ed.
- The Go project mandates that all submitted code is gofmt-ed
## First, complaints: `gofmt` doesn't do _my_ style!
- First, complaints: `gofmt` doesn't do _my_ style!
## Eventually, acquiescence: The Go Team really means it!
- Eventually, acquiescence: the Go Team really means it!
## Finally, insight: gofmt's style is _nobody's_ favorite, yet `gofmt` is everybody's favorite.
- Finally, insight: gofmt's style is _nobody's_ favorite, yet `gofmt` is everybody's favorite
## Now, praise: `gofmt` is one of the many reasons why people like Go.
- Now, praise: `gofmt` is one of the many reasons why people like Go
## Formatting has become a non-issue.
Formatting has become a non-issue.
## Others are starting to take note
* Others are starting to take note
## Formatter for Google's BUILD files (Russ Cox).
- Formatter for Google's BUILD files (Russ Cox)
## Java formatter
- Java formatter
## Clang formatter
- Clang formatter
- Dartfmt
.link https://www.dartlang.org/tools/dartfmt/
## etc.
- etc.
## Automatic source code formatting is becoming a requirement for any kind of language.
Automatic source code formatting is becoming a requirement for any kind of language.
## Conclusions
* Conclusions
## Evolution in programming culture
* Evolution in programming culture
## `gofmt` is significant selling point for Go
- `gofmt` is a significant selling point for Go
## Insight is spreading that uniform "good enough" formatting is hugely beneficial.
- The insight is spreading that uniform, "good enough" formatting is hugely beneficial
# no need for detailed formatting style guides
# no time wasted on formatting
# improved readability
# smaller diffs when changing code
## Source code manipulation at AST-level enables a new category of tools.
- Source code manipulation at the AST level enables a new category of tools
# simple to complex automatic source code transformations
# various auto-completion mechanisms (e.g. goimports)
# enables syntax evolution
## Others are taking note: Programming culture is slowly evolving.
- Others are taking note: programming culture is slowly evolving
## Lessons learned: Application
* Lessons learned: Application
## Basic source code formatting is great initial goal.
- Basic source code formatting is a great initial goal
## True power lies in source code transformation tools.
- But the true power lies in source code transformation tools
## Avoid formatting options.
- Avoid formatting options
## Keep it simple.
- Keep it simple
## Want:
Want:
## Go parser: source code => syntax tree
- Go parser: source code => syntax tree
## Make it easy to manipulate syntax tree in any way possible.
- Make it easy to manipulate the syntax tree in any way possible
## Go printer: syntax tree => source code
- Go printer: syntax tree => source code
## Lessons learned: Implementation
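* Lessons learned: Implementation
## Lots of trial and error in initial version.
- Lots of trial and error in the initial version
## Single biggest mistake: comments not attached to AST nodes.
- Single biggest mistake: comments are not attached to AST nodes
## => Current design makes it extremely hard to manipulate AST
## and maintain comments in right places.
=> The current design makes it extremely hard to manipulate the AST and keep comments in the right places.
## Kludge: ast.CommentMap
- Kludge: `ast.CommentMap`
## Want:
Want:
## Easy to manipulate syntax tree with comments attached.
- A syntax tree that is easy to manipulate, with comments attached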
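To illustrate the workaround named above, here is a sketch of how `ast.CommentMap` is typically used when a node is removed: build the map before editing, delete the node, then filter the map so that comments belonging to deleted nodes disappear with them. The helper `dropVars` and the sample source are assumptions made for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
)

// sample is illustrative input: one function and one variable,
// each with a leading comment.
const sample = `package p

// keep documents Keep.
func Keep() {}

// drop documents drop.
var drop int
`

// dropVars (hypothetical helper) removes all top-level var
// declarations together with their comments. It returns the new
// source and the number of comment groups that remain.
func dropVars(src string) (string, int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		return "", 0, err
	}
	// Associate comments with nodes before changing the tree.
	cmap := ast.NewCommentMap(fset, file, file.Comments)
	kept := file.Decls[:0]
	for _, decl := range file.Decls {
		if gen, ok := decl.(*ast.GenDecl); ok && gen.Tok == token.VAR {
			continue
		}
		kept = append(kept, decl)
	}
	file.Decls = kept
	// Keep only comments whose nodes are still part of the file.
	file.Comments = cmap.Filter(file).Comments()
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", 0, err
	}
	return buf.String(), len(file.Comments), nil
}

func main() {
	out, n, err := dropVars(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
	fmt.Println("// remaining comment groups:", n)
}
```

Without the filtering step, the comment of the removed variable would stay in the file-level list and be printed at a position that no longer makes sense, which is exactly the maintenance burden this slide describes.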
## Going forward
* Going forward
## Design of new syntax tree in the works (still experimental).
- Design of a new syntax tree is in the works (still experimental)
## Syntax tree simpler and easier to manipulate (e.g., declaration nodes)
- A syntax tree that is simpler and easier to manipulate (e.g., declaration nodes)
## Faster and easier to use parser and printer.
- A faster and easier-to-use parser and printer
## Make it robust and fast. Don't do anything else.
- Make it robust and fast. Don't do anything else.
# no semantic analyses in parser
# no options in printer
# ----------------------------------------------------------------------------------
#
# Implementation size
#
# go/token          849 lines    lexical tokens, source positions
# go/scanner        884 lines    tokenization
# go/parser        2689 lines    parsing
# go/ast           2966 lines    abstract syntax tree, tree traversal
# go/printer       2948 lines    actual AST printer
# go/format         115 lines    helper library to make printer easy to use
# internal/format   161 lines
# cmd/gofmt         801 lines    gofmt tool
# ----------------------------
#                 11413 lines