blob: 14ce42575be01215db9603b64cd72ac78c897f57 [file] [log] [blame]
Go for C programmers
ACCU, Silicon Valley Chapter, Nov. 14, 2012
# Go is a general-purpose programming language developed at Google.
# Go's design is influenced by the C and Pascal lineage, but instead of providing
# a conglomerate of features, Go takes a fresh look at the essential mechanisms
# required for modern programming tasks. As a result, Go has evolved to a compact
# language of surprising power. In this presentation we give a brief introduction
# to Go, slightly slanted towards programmers with a C/C++ background, and then
# talk about particularly interesting aspects of Go.
Robert Griesemer
Google Inc.
gri@golang.org
* Organization of this talk
Part 1: Introduction to Go
Break
Part 2: Object-oriented and concurrent programming in Go
# if time permits and desire exists
* Motivation
* Have
Incomprehensible and unsafe code.
Extremely slow builds.
Missing concurrency support.
Outdated tools.
# C, C++ are > 30 years old!
# Java is 15 years old, but not really a systems-programming language.
# (No control over data layout, no unsigned, verbose, etc.)
* Want
Readable, safe, and efficient code.
A build system that scales.
Good concurrency support.
Tools that can operate at Google-scale.
# A language for the 21. century!
* Meet Go
Compact, yet expressive
# spec is ~50 pages
Statically typed and garbage collected
Object- but not type-oriented
Strong concurrency support
Efficient implementation
Rich standard library
Fast compilers
Scalable tools
* Hello, World
.play goforc/hello.go
# the canonical first program
# note: immediately readable for a C programmer even without prior knowledge of Go
# (not the case with C++'s Hello, World, which is using the << operator)
# which bring us to the first item on the "Want" list: readable code.
* Syntax
* Syntax
_Syntax_is_not_important..._-_unless_you_are_a_programmer._
Rob Pike.
_The_readability_of_programs_is_immeasurably_more_important_than_their_writeability._
Hints on Programming Language Design
C. A. R. Hoare 1973
# A lot of effort has gone into honing Go's syntax over more than 3 years.
# The result is a clutter- and stutter-free, compact, yet readable notation.
# Consider some alternatives...
* Too verbose
scoped_ptr<logStats::LogStats>
logStats(logStats::LogStats::NewLogStats(FLAGS_logStats, logStats::LogStats::kFIFO));
# real example from C++ (2012) (names changed)
# logStats mentioned 9 times!
# too much clutter
* Too dense
(n: Int) => (2 to n) |> (r => r.foldLeft(r.toSet)((ps, x) =>
if (ps(x)) ps -- (x * x to n by x) else ps))
# prime sieve in Scala
# pretty and short but incomprehensible
* Just right
t := time.Now()
switch {
case t.Hour() < 12:
return "morning"
case t.Hour() < 18:
return "afternoon"
default:
return "evening"
}
# looks like pseudo code
# readable even w/o knowing Go
# lots of things are going on but not relevant for understanding
# no clutter
* Reading Go code
* Packages
A Go program consists of _packages_.
A _package_ consists of one or more _source_files_ (`.go` files).
Each _source_file_ starts with a _package_clause_ followed by declarations.
.code goforc/hello.go
By convention, all files belonging to a single package are located in a single directory.
* Declarations, "Pascal-style" (left to right)
Pattern: keyword names [type] [initializer]
.code goforc/decls.go /START/,/STOP/
* Why?
p, q *Point
func adder(delta int) func(x int) int
# try this in C
# composition of types straight-forward
* Constants
In Go, constants are _mathematically_precise_.
There is no need for qualifiers (`-42LL`, `7UL`, etc.)
.code goforc/consts.go /START1/,/STOP1/
Only when used, constants are truncated to size:
.play goforc/consts.go /START2/,/STOP2/
Huge win in readability and ease of use.
* Types
Familiar:
- Basic types, arrays, structs, pointers, functions.
But:
- `string` is a basic type.
- No automatic conversion of basic types in expressions.
- No pointer arithmetic; pointers and arrays are different.
- A function type represents a function; context and all.
New:
- _Slices_ instead of array pointers + separate length: `[]int`
- _Maps_ because everybody needs them: `map[string]int`
- _Interfaces_ for polymorphism: `interface{}`
- _Channels_ for communication between goroutines: `chan` `int`
* Variables
Familiar:
.code goforc/vars.go /START1/,/STOP1/
New: Type can be inferred from type of initializing expression.
.code goforc/vars.go /START2/,/STOP2/
Shortcut (inside functions):
.code goforc/vars.go /i :=/
It is _safe_ to take the address of _any_ variable:
.code goforc/vars.go /return/
* Functions
Functions may have multiple return values:
func atoi(s string) (i int, err error)
Functions are first-class citizens (_closures_):
.code goforc/adder.go /START1/,/STOP1/
.play goforc/adder.go /START2/,/STOP2/
* Statements
.code goforc/stmts.go /START/,/STOP/
* Go statements for C programmers
Cleanups:
- No semicolons
- Multiple assignments
- `++` and `--` are statements
- No parentheses around conditions; curly braces mandatory
- Implicit `break` in `switch`; explicit `fallthrough`
New:
- `for` `range`
- type `switch`
- `go`, `select`
- `defer`
* Assignments
Assignments may assign multiple values simultaneously:
a, b = x, y
Equivalent to:
t1 := x
t2 := y
a = t1
b = t2
For instance:
a, b = b, a // swap a and b
i, err = atoi(s) // assign results of atoi
i, _ = atoi(991) // discard 2nd value
* Switch statements
Switch statements may have multiple case values, and `break` is implicit:
switch day {
case 1, 2, 3, 4, 5:
tag = "workday"
case 0, 6:
tag = "weekend"
default:
tag = "invalid"
}
The case values don't have to be constants:
switch {
case day < 0 || day > 6:
tag = "invalid"
case day == 0 || day == 6:
tag = "weekend"
default:
tag = "workday"
}
* For loops
.code goforc/forloop.go /START1/,/STOP1/
A `range` clause permits easy iteration over arrays and slices:
.play goforc/forloop.go /START3/,/STOP3/
Unused values are discarded by assigning to the blank (_) identifier:
.code goforc/forloop.go /START2/,/STOP2/
* Dependencies
* Dependencies in Go
An _import_declaration_ is used to express a _dependency_ on another package:
import "net/rpc"
Here, the importing package depends on the Go package "rpc".
The _import_path_ ("net/rpc") uniquely identifies a package; multiple packages may have the same name, but they all reside at different locations (directories).
By convention, the package name matches the last element of the import path (here: "rpc").
_Exported_ functionality of the `rpc` package is available via the _package_qualifier_ (`rpc`):
rpc.Call
A Go import declaration takes the place of a C _include_.
* Naming: An excursion
How names work in a programming language is critical to readability.
Scopes control how names work.
Go has very simple scope hierarchy:
- universe
- package
- file (for imports only)
- function
- block
* Locality of names
Upper case names are _exported_: `Name` _vs._ `name`.
The package qualifier is always present for imported names.
(First component of) every name is always declared in the current package.
One of the best (and hardest!) decisions made in Go.
* Locality scales
No surprises when importing:
- Adding an exported name to my package cannot break your package!
Names do not leak across boundaries.
In C, C++, Java the name `y` could refer to anything.
In Go, `y` (or even `Y`) is always defined within the package.
In Go, `x.Y` is clear: find `x` locally, `Y` belongs to it, and there is only one such `Y`.
Immediate consequences for readability.
* Back to imports
Importing a package means reading a package's exported API.
This export information is self-contained. For instance:
- A imports B
- B imports C
- B exports contain references to C
B's export data contains all necessary information about C. There is no need for a compiler to read the export data of C.
This has huge consequences for build times!
* Dependencies in C
`.h` files are not self-contained.
As a result, a compiler ends up reading core header files over and over again.
`ifdef` still requires the preprocessor to read a lot of code.
No wonder it takes a long time to compile...
At Google scale: dependencies explode, are exponential, become almost non-computable.
A large Google C++ build can read the same header file tens of thousands of times!
# it is real work, too, teaching the compiler what a string is each time
* Tools
* A brief overview
Two compilers: gc, gccgo
Support for multiple platforms: x86 (32/64bit), ARM (32bit), Linux, BSD, OS X, ...
Automatic formatting of source code: `gofmt`
Automatic documentation extraction: `godoc`
Automatic API adaption: `gofix`
All (and more!) integrated into the `go` command.
* Building a Go program
A Go program can be compiled and linked without additional build information (`make` files, etc.).
By convention, all files belonging to a package are found in the same directory.
All depending packages are found by inspecting the import paths of the top-most (_main_) package.
A single integrated tool takes care of compiling individual files or entire systems.
* The go command
Usage:
go command [arguments]
Selected commands:
build compile packages and dependencies
clean remove object files
doc run godoc on package sources
fix run go tool fix on packages
fmt run gofmt on package sources
get download and install packages and dependencies
install compile and install packages and dependencies
list list packages
run compile and run Go program
test test packages
vet run go tool vet on packages
.link http://golang.org/cmd/go/
* Break
# After the break, we will discuss OOP and concurrency in Go.
* Object-oriented programming
* What is object-oriented programming?
"Object-oriented programming (OOP) is a programming paradigm using _objects_ – usually instances of a class – consisting of data fields and methods together with their interactions – to design applications and computer programs. Programming techniques may include features such as data abstraction, encapsulation, messaging, modularity, polymorphism, and inheritance. Many modern programming languages now support forms of OOP, at least as an option."
(Wikipedia)
# Important:
# Notion of _objects_
# _Objects_ have _state_ (data fields), and _methods_ (for interactions)
* OOP requires very little extra programming language support
We only need
- the notion of an _Object_,
- a mechanism to interact with them (_Methods_),
- and support for polymorphism (_Interfaces_).
Claim: Data abstraction, encapsulation, and modularity are mechanisms
in their own right, not OOP specific, and a modern language (OO or not)
should have support for them independently.
* Object-oriented programming in Go
Methods without classes
Interfaces without hierarchies
Code reuse without inheritance
Specifically:
- Any _value_ can be an _object_
- Any _type_ can play the role of a _class_
- _Methods_ can be attached to any _type_
- _Interfaces_ implement polymorphism.
* Methods
.play goforc/point.go
* Methods can be attached to any type
.play goforc/celsius.go
* Interfaces
type Stringer interface {
String() string
}
type Reader interface {
Read(p []byte) (n int, err error)
}
type Writer interface {
Write(p []byte) (n int, err error)
}
type Empty interface{}
An interface defines a set of methods.
A type that implements all methods of an interface is said to implement the interface.
All types implement the empty interface interface{}.
* Dynamic dispatch
.play goforc/interface.go /START/,/STOP/
A value (here: `corner`, `boiling`) of a type (`Point`, `Celsius`) that implements
an interface (`Stringer`) can be assigned to a variable (`v`) of that interface type.
* Composition and chaining
Typically, interfaces are small (1-3 methods).
Pervasive use of key interfaces in the standard library make it easy to chain APIs together.
package io
func Copy(dst Writer, src Reader) (int64, error)
The io.Copy function copies by reading from any Reader and writing to any Writer.
Interfaces are often introduced ad-hoc, and after the fact.
There is no explicit hierarchy and thus no need to design one!
* cat
.code goforc/cat.go
# An `os.File` implements the `Reader` interface;
# `os.Stdout` implements the `Writer` interface.
* Interfaces in practice
Methods on any types and _ad_hoc_ interfaces make for a light-weight OO programming style.
Go interfaces enable post-facto abstraction.
# Existing code may not know about a new interface
No explicit type hierarchies.
# Coming up with the correct type hierarchy is hard
# They are often wrong
# They are difficult and time-consuming to change
"Plug'n play" in a type-safe way.
# There's much more but we won't have time for more detail.
* Concurrency
* What is concurrency?
Concurrency is the composition of independently executing computations.
Concurrency is a way to structure software, particularly as a way to write clean code that interacts well with the real world.
It is not parallelism.
* Concurrency is not parallelism
Concurrency is not parallelism, although it enables parallelism.
If you have only one processor, your program can still be concurrent but it cannot be parallel.
On the other hand, a well-written concurrent program might run efficiently in parallel on a multiprocessor. That property could be important...
For more on that distinction, see the link below. Too much to discuss here.
.link http://golang.org/s/concurrency-is-not-parallelism
* A model for software construction
Easy to understand.
Easy to use.
Easy to reason about.
You don't need to be an expert!
(Much nicer than dealing with the minutiae of parallelism (threads, semaphores, locks, barriers, etc.))
There is a long history behind Go's concurrency features, going back to Hoare's CSP in 1978 and even Dijkstra's guarded commands (1975).
* Basic Examples
* A boring function
We need an example to show the interesting properties of the concurrency primitives.
To avoid distraction, we make it a boring example.
.code goforc/example0.go /START1/,/STOP1/
.play goforc/example0.go /START2/,/STOP2/
* Ignoring it
The `go` statement runs the function as usual, but doesn't make the caller wait.
It launches a goroutine.
The functionality is analogous to the & on the end of a shell command.
.play goforc/example1.go /START/,/STOP/
* Ignoring it a little less
When main returns, the program exits and takes the function `f` down with it.
We can hang around a little, and on the way show that both main and the launched goroutine are running.
.play goforc/example2.go /START/,/STOP/
* Goroutines
What is a goroutine? It's an independently executing function, launched by a `go` statement.
It has its own call stack, which grows and shrinks as required.
It's very cheap. It's practical to have thousands, even hundreds of thousands of goroutines.
It's not a thread.
There might be only one thread in a program with thousands of goroutines.
Instead, goroutines are multiplexed dynamically onto threads as needed to keep all the goroutines running.
But if you think of it as a very cheap thread, you won't be far off.
* Channels
A channel in Go provides a connection between two goroutines, allowing them to communicate.
.code goforc/channels.go /START1/,/STOP1/
.code goforc/channels.go /START2/,/STOP2/
.code goforc/channels.go /START3/,/STOP3/
* Using channels
A channel connects the `main` and `f` goroutines so they can communicate.
.play goforc/communication1.go /START1/,/STOP1/
.code goforc/communication1.go /START2/,/STOP2/
* Synchronization
When the main function executes `<–c`, it will wait for a value to be sent.
Similarly, when the function `f` executes `c<–value`, it waits for a receiver to be ready.
A sender and receiver must both be ready to play their part in the communication. Otherwise we wait until they are.
Thus channels both communicate and synchronize.
Channels can be unbuffered or buffered.
* Using channels between many goroutines
.play goforc/communication2.go /START1/,/STOP1/
A single channel may be used to communicate between many (not just two) goroutines; many goroutines may communicate via one or multiple channels.
This enables a rich variety of concurrency patterns.
# There's enough material for several talks on this subject alone.
* Elements of a work-stealing scheduler
.code goforc/worker1.go /START2/,/STOP2/
The `worker` uses two channels to communicate:
- The `in` channel waits for some work order.
- The `out` channel communicates the result.
- As work load, a worker (very slowly) computes the list of prime factors for a given order.
* A matching producer and consumer
.code goforc/worker1.go /START3/,/STOP3/
The `producer` produces and endless supply of work orders and sends them `out`.
The `consumer` receives `n` results from the `in` channel and then terminates.
* Putting it all together
.play goforc/worker1.go /START1/,/STOP1/
We use one worker to handle the entire work load.
Because there is only one worker, we see the result coming back in order.
This is running rather slow...
* Using 10 workers
.play goforc/worker2.go /START1/,/STOP1/
A ready worker will read the next order from the `in` channel and start working on it. Another ready worker will proceed with the next order, and so forth.
Because we have many workers and since different orders take different amounts of time to work on, we see the results coming back out-of-order.
On a multi-core system, many workers may truly run in parallel.
This is running much faster...
* The Go approach
Don't communicate by sharing memory, share memory by communicating.
* There's much more...
* Links
Go Home Page:
.link http://golang.org
Go Tour (learn Go in your browser)
.link http://tour.golang.org
Package documentation:
.link http://golang.org/pkg
Articles galore:
.link http://golang.org/doc