# Proposal: Go 2 Number Literal Changes
Russ Cox\
Robert Griesemer
Last updated: March 6, 2019
[golang.org/design/19308-number-literals](https://golang.org/design/19308-number-literals)
Discussion at:
- [golang.org/issue/19308](https://golang.org/issue/19308) (binary integer literals)
- [golang.org/issue/12711](https://golang.org/issue/12711) (octal integer literals)
- [golang.org/issue/28493](https://golang.org/issue/28493) (digit separator)
- [golang.org/issue/29008](https://golang.org/issue/29008) (hexadecimal floating point)
## Abstract
We propose four related changes to number literals in Go:
1. Add binary integer literals, as in 0b101.
2. Add alternate octal integer literals, as in 0o377.
3. Add hexadecimal floating-point literals, as in 0x1p-1021.
4. Allow _ as a digit separator in number literals.
## Background
Go adopted C’s number literal syntax and in so doing
joined a large group of widely-used languages
that all broadly agree about how numbers are written.
The group of such “C-numbered languages” includes at least
C, C++, C#, Java, JavaScript, Perl, PHP, Python, Ruby, Rust, and Swift.
In the decade since Go’s initial design,
nearly all the C-numbered languages have extended
their number literals to add one or more of the four changes in this proposal.
Extending Go in the same way makes it easier for developers
to move between these languages, eliminating an unnecessary rough edge
without adding significant complexity to the language.
### Binary Integer Literals
The idea of writing a program’s integer literals in binary is quite old,
dating back at least to
[PL/I (1964)](http://www.bitsavers.org/pdf/ibm/npl/320-0908_NPL_Technical_Report_Dec64.pdf), which used `'01111000'B`.
In C’s lineage,
[CPL (1966)](http://www.ancientgeek.org.uk/CPL/CPL_Elementary_Programming_Manual.pdf)
supported decimal, binary, and octal integers.
Binary and octal were introduced by an underlined 2 or 8 prefix.
[BCPL (1967)](http://web.eah-jena.de/~kleine/history/languages/Richards-BCPL-ReferenceManual.pdf) removed binary but retained octal,
still introduced by an 8 (it’s unclear whether the 8 was underlined or followed by a space).
[B (1972)](https://www.bell-labs.com/usr/dmr/www/kbman.html)
introduced the leading zero syntax for octal, as in `0377`.
[C as of 1974](http://cm.bell-labs.co/who/dmr/cman74.pdf) had only decimal and octal.
Hexadecimal `0x12ab` had been added by the time
[K&R (1978)](http://www.ccapitalia.net/descarga/docs/1978-ritchie-the-c-programming-language.pdf)
was published.
Possibly the earliest use of the exact `0b01111000` syntax was in
[Caml Light 0.5 (1992)](https://discuss.ocaml.org/t/the-origin-of-the-0b-01-notation/3180/2),
which was written in C and borrowed `0x12ab` for hexadecimal.
Binary integer literals using the `0b01111000` syntax were added in
[C++14 (2014)](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3472.pdf),
[C# 7.0 (2017)](https://blogs.msdn.microsoft.com/dotnet/2017/03/09/new-features-in-c-7-0/),
[Java 7 (2011)](https://docs.oracle.com/javase/7/docs/technotes/guides/language/binary-literals.html),
[JavaScript ES6 (2015)](http://www.ecma-international.org/ecma-262/6.0/#sec-literals-numeric-literals),
[Perl 5.005\_55 (1998)](https://perl5.git.perl.org/perl.git/commitdiff/4f19785bce4da39a768aa6210f1f97ab4c0600dd),
[PHP 5.4.0 (2012)](http://php.net/manual/en/language.types.integer.php),
[Python 2.6 (2008)](https://docs.python.org/2.7/whatsnew/2.6.html#pep-3127-integer-literal-support-and-syntax),
[Ruby 1.4.0 (1999)](https://github.com/ruby/ruby/blob/v1_4_0/ChangeLog#L647),
[Rust 0.1 or earlier (2012)](https://github.com/rust-lang/rust/blob/release-0.1/doc/rust.md#integer-literals),
and
[Swift 1.0 or earlier (2014)](https://carlosicaza.com/swiftbooks/SwiftLanguage.pdf).
The syntax is a leading `0b` prefix followed by some number of 0s and 1s.
There is no corresponding character escape sequence
(that is, no `'\b01111000'` for `'x'`, since `'\b'` is already used for backspace, U+0008).
Most of these languages also updated their integer parsing and formatting routines to support binary forms.
Although C++14 added binary integer literals, C itself has not, [as of C18](http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf).
### Octal Integer Literals
As noted earlier, octal was the
most widely-used form for writing bit patterns
in the early days of computing
(after binary itself).
Even though octal today is far less common,
B’s introduction of `0377` as syntax for octal carried forward into
C, C++, Go, Java, JavaScript, Python, Perl, PHP, and Ruby.
But because programmers don't see octal much,
it sometimes comes as a surprise that
`01234` is not 1234 decimal or that `08` is a syntax error.
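To make the surprise concrete, here is a small Go program. It is a sketch that relies only on the long-standing rule that a leading `0` makes an integer literal octal.

```go
package main

import "fmt"

func main() {
	// A leading 0 makes the literal octal: 01234 is 1*512 + 2*64 + 3*8 + 4.
	fmt.Println(01234, 1234) // 668 1234

	// const bad = 08 // does not compile: 8 is not an octal digit
}
```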
[Caml Light 0.5 (1992)](https://discuss.ocaml.org/t/the-origin-of-the-0b-01-notation/3180/2),
mentioned above
as possibly the earliest language with `0b01111000` for binary,
may also have been the first to use the analogous notation `0o377` for octal.
[JavaScript ES3 (1999)](https://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%203rd%20edition,%20December%201999.pdf)
technically removed support for `0377` as octal,
but of course allowed implementations to continue recognizing them.
[ES5 (2009)](https://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262%205th%20edition%20December%202009.pdf)
added “strict mode,” in which, among other restrictions, octal literals are disallowed entirely
(`0377` is an error, not decimal).
[ES6 (2015)](https://www.ecma-international.org/ecma-262/6.0/index.html#sec-literals-numeric-literals)
introduced the `0o377` syntax, allowed even in strict mode.
[Python’s initial release (1991)](https://www.python.org/download/releases/early/)
used `0377` syntax for octal.
[Python 3 (2008)](https://docs.python.org/3.0/reference/lexical_analysis.html#integer-and-long-integer-literals)
changed the syntax to `0o377`,
removing the `0377` syntax (`0377` is an error, not decimal).
[Python 2.7 (2010)](https://docs.python.org/2.7/reference/lexical_analysis.html#integer-and-long-integer-literals)
backported `0o377` as an alternate octal syntax (`0377` is still supported).
[Rust (2012)](https://github.com/rust-lang/rust/blob/release-0.1/doc/rust.md#integer-literals)
initially had no octal syntax but added `0o377` in
[Rust 0.9 (2014)](https://github.com/rust-lang/rust/blob/0.9/doc/rust.md#integer-literals).
[Swift’s initial release (2014)](https://carlosicaza.com/swiftbooks/SwiftLanguage.pdf) used `0o377` for octal.
Both Rust and Swift allow decimals to have leading zeros (`0377` is decimal 377),
creating a potential point of confusion for programmers coming from
other C-numbered languages.
### Hexadecimal Floating-Point
The exact decimal floating-point literal syntax of C and its successors (`1.23e4`)
appears to have originated at IBM in
[Fortran (1956)](https://archive.computerhistory.org/resources/text/Fortran/102649787.05.01.acc.pdf),
some time after the
[1954 draft](https://archive.computerhistory.org/resources/text/Fortran/102679231.05.01.acc.pdf).
The syntax was not used in
[Algol 60 (1960)](http://web.eah-jena.de/~kleine/history/languages/Algol60-Naur.pdf)
but was adopted by [PL/I (1964)](http://www.bitsavers.org/pdf/ibm/npl/320-0908_NPL_Technical_Report_Dec64.pdf)
and
[Algol 68 (1968)](http://web.eah-jena.de/~kleine/history/languages/Algol68-Report.pdf),
and it spread from those into many other languages.
Hexadecimal floating-point literals appear to have originated in
[C99 (1999)](http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf),
spreading to
[C++17 (2017)](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0245r0.html),
[Java 5 (2004)](http://psc.informatik.uni-jena.de/languages/Java/javaspec-3.pdf),
[Perl 5.22 (2015)](https://perldoc.perl.org/perl5220delta.html#Floating-point-parsing-has-been-improved),
and
[Swift's initial release (2014)](https://carlosicaza.com/swiftbooks/SwiftLanguage.pdf).
[IEEE 754-2008](http://www.dsc.ufcg.edu.br/~cnum/modulos/Modulo2/IEEE754_2008.pdf)
also added hexadecimal floating-point literals, citing C99.
All these languages use the syntax `0x123.fffp5`,
where the “`pN`” specifies a decimal number interpreted as a power of two:
`0x123.fffp5` is (0x123 + 0xfff/0x1000) × 2⁵.
In all languages, the exponent is required: `0x123.fff` is not a valid hexadecimal floating-point literal.
The fraction may be omitted, as in `0x1p-1000`.
C, C++, Java, Perl, and the IEEE 754-2008 standard
allow omitting the digits before or after the hexadecimal point:
`0x1.p0` and `0x.fp0` are valid hexadecimal floating-point literals
just as `1.` and `.9` are valid decimal literals.
Swift requires digits on both sides of a decimal or hexadecimal point;
that is, in Swift, `0x1.p0`, `0x.fp0`, `1.`, and `.9` are all invalid.
Adding hexadecimal floating-point literals also requires adding library support.
C99 added the `%a` and `%A` `printf` formats for formatting and `%a` for scanning.
It also redefined `strtod` to accept hexadecimal floating-point values.
The other languages made similar changes.
C# (as of C# 7.3, which has [no published language specification](https://github.com/dotnet/csharplang/issues/64)),
JavaScript (as of [ES8](https://www.ecma-international.org/ecma-262/8.0/index.html#sec-literals-numeric-literals)),
PHP (as of [PHP 7.3.0](http://php.net/manual/en/language.types.float.php)),
Python (as of [Python 3.7.2](https://docs.python.org/3/reference/lexical_analysis.html#floating-point-literals)),
Ruby (as of [Ruby 2.6.0](https://docs.ruby-lang.org/en/2.6.0/syntax/literals_rdoc.html#label-Numbers)),
and
Rust (as of [Rust 1.31.1](https://doc.rust-lang.org/stable/reference/tokens.html#floating-point-literals))
do not support hexadecimal floating-point literals.
### Digit Separators
Allowing the use of an underscore to separate digits in a number literal into groups dates back at least to
[Ada 83](http://archive.adaic.com/standards/83rat/html/ratl-02-01.html#2.1), possibly earlier.
A digit-separating underscore was added to
[C# 7.0 (2017)](https://blogs.msdn.microsoft.com/dotnet/2017/03/09/new-features-in-c-7-0/),
[Java 7 (2011)](https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html),
[Perl 2.0 (1988)](https://perl5.git.perl.org/perl.git/blob/378cc40b38293ffc7298c6a7ed3cd740ad79be52:/toke.c#l1021),
[Python 3.6 (2016)](https://www.python.org/dev/peps/pep-0515/),
[Ruby 1.0 or earlier (1998)](https://github.com/ruby/ruby/blob/v1_0/parse.y#L2282),
[Rust 0.1 or earlier (2012)](https://github.com/rust-lang/rust/blob/release-0.1/doc/rust.md#integer-literals),
and
[Swift 1.0 or earlier (2014)](https://carlosicaza.com/swiftbooks/SwiftLanguage.pdf).
C has not yet added digit separators as of C18.
C++14 uses
[single-quote as a digit separator](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3781.pdf)
to avoid an ambiguity with C++11 user-defined integer suffixes
that might begin with underscore.
JavaScript is
[considering adding underscore as a digit separator](https://github.com/tc39/proposal-numeric-separator)
but ran into a similar problem with user-defined suffixes.
PHP [considered but decided against](https://wiki.php.net/rfc/number_format_separator) adding digit separators.
The design space for a digit separator feature reduces to four questions:
(1) whether to accept a separator immediately after the single-digit octal `0` base prefix, as in `0_1`;
(2) whether to accept a separator immediately after non-digit base prefixes like `0b`, `0o`, and `0x`, as in `0x_1`;
(3) whether to accept multiple separators in a row, as in `1__2`; and
(4) whether to accept trailing separators, as in `1_`.
(Note that a “leading separator” would create an identifier, as in `_1`.)
These four questions produce sixteen possible approaches.
Case 0b0001:
If the name “digit separator” is understood literally,
so that each underscore must separate (appear between) digits,
then the answers should be that `0_1` is allowed but `0x_1`, `1__2`, and `1_` are all disallowed.
This is the approach taken by
[Ada 83](http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4)
(using `8#123#` for octal and so avoiding question 1),
[C++14](http://eel.is/c++draft/lex.icon),
[Java 7](https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html),
and
[Swift](https://docs.swift.org/swift-book/ReferenceManual/LexicalStructure.html#ID415)
(using only `0o` for octal and thereby also avoiding question 1).
Case 0b0011:
If we harmonize the treatment of the `0` octal base prefix
with the `0b`, `0o`, and `0x` base prefixes by allowing a digit separator
between a base prefix and leading digit,
then the answers are that `0_1` and `0x_1` are allowed but `1__2` and `1_` are disallowed.
This is the approach taken in
[Python 3.6](https://www.python.org/dev/peps/pep-0515/#literal-grammar) and
[Ruby 1.8.0](https://github.com/ruby/ruby/blob/v1_8_0/parse.y#L3723).
Case 0b0111:
If we allow runs of multiple separators as well, that allows `0_1`, `0x_1`,
and `1__2`, but not `1_`.
This is the approach taken in
[C# 7.2](https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.2/leading-separator.md)
and
[Ruby 1.6.2](https://github.com/ruby/ruby/blob/v1_6_2/parse.y#L2779).
Case 0b1111:
If we then also accept trailing digit separators,
the implementation becomes trivial: ignore digit separators wherever they appear.
[Perl](https://perl5.git.perl.org/perl.git/blob/378cc40b38293ffc7298c6a7ed3cd740ad79be52:/toke.c#l1021)
takes this approach,
as does [Rust](https://swift.godbolt.org/z/1f72LH).
Other combinations have been tried:
[C# 7.0](https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.0/digit-separators.md)
used 0b0101 (`0x_1` and `1_` disallowed)
before moving to case 0b0111 in
[C# 7.2](https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.2/leading-separator.md).
[Ruby 1.0](https://github.com/ruby/ruby/blob/v1_0/parse.y#L2282)
used 0b1110 (only `0_1` disallowed)
and
[Ruby 1.3.1](https://github.com/ruby/ruby/blob/v1_3_1_/parse.y#L2779)
used 0b1101 (only `0x_1` disallowed),
before Ruby 1.6.2 tried 0b0111 and Ruby 1.8.0 settled on 0b0011.
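The sixteen approaches can be modeled as a four-bit policy mask, numbered like the cases above (bit 0 answers question 1, bit 3 answers question 4). The following Go sketch is illustrative only: `separatorsOK` and the mask constants are hypothetical names, not the Go scanner's implementation. It checks underscore placement only, assumes the rest of the literal is well formed, and uses binary literals for the masks, so it needs Go 1.13 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// Bits of a hypothetical policy mask, numbered like questions (1) to (4).
const (
	allowAfterOctalZero  = 1 << iota // (1) accept 0_1
	allowAfterBasePrefix             // (2) accept 0x_1
	allowRuns                        // (3) accept 1__2
	allowTrailing                    // (4) accept 1_
)

// separatorsOK reports whether the underscores in the integer literal lit
// are placed acceptably under policy. It does not validate the digits.
func separatorsOK(lit string, policy int) bool {
	if lit == "" || lit[0] == '_' {
		return false // a leading underscore makes an identifier, as in _1
	}
	body, afterPrefix := lit, 0
	switch {
	case len(lit) >= 2 && lit[0] == '0' && strings.IndexByte("bBoOxX", lit[1]) >= 0:
		body, afterPrefix = lit[2:], allowAfterBasePrefix
	case lit[0] == '0':
		body, afterPrefix = lit[1:], allowAfterOctalZero
	}
	switch {
	case strings.HasPrefix(body, "_") && policy&afterPrefix == 0:
		return false // underscore directly after a base prefix
	case strings.Contains(body, "__") && policy&allowRuns == 0:
		return false // run of separators
	case strings.HasSuffix(body, "_") && policy&allowTrailing == 0:
		return false // trailing separator
	}
	return true
}

func main() {
	inputs := []string{"0_1", "0x_1", "1__2", "1_"}
	for _, policy := range []int{0b0001, 0b0011, 0b0111, 0b1111} {
		fmt.Printf("case 0b%04b:", policy)
		for _, lit := range inputs {
			fmt.Printf(" %s=%t", lit, separatorsOK(lit, policy))
		}
		fmt.Println()
	}
}
```

Each step up the list of cases flips one more answer from false to true, which is why the “ignore underscores everywhere” case 0b1111 is the simplest to implement.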
A similar question arises for whether to allow underscore between
a decimal point and a decimal digit in a floating-point number,
or between the literal `e` and the exponent.
We won’t enumerate the cases here, but again languages
make surprising choices.
For example, in Rust, `1_.2` is valid but `1._2` is not.
## Proposal
We propose to add binary integer literals,
to add octal `0o377` as an alternate octal literal syntax,
to add hexadecimal floating-point literals,
and to add underscore as a base-prefix-or-digit separator
(case 0b0011 above; see rationale below),
along with appropriate library support.
Finally, to fit the existing imaginary literals seamlessly
into the new number literals, we propose that the imaginary
suffix `i` may be used on any (non-imaginary) number literal.
### Language Changes
The definitions in https://golang.org/ref/spec#Letters_and_digits add:
> ```
> binary_digit = "0" | "1" .
> ```
The https://golang.org/ref/spec#Integer_literals section would be amended to read:
> An integer literal is a sequence of digits representing an integer constant.
> An optional prefix sets a non-decimal base:
> 0, 0o, or 0O for octal, 0b or 0B for binary, 0x or 0X for hexadecimal.
> A single 0 is considered a decimal zero.
> In hexadecimal literals, letters a-f and A-F represent values 10 through 15.
> For readability, an underscore may appear after a base prefix or
> between successive digits; such underscores do not change the literal value.
>
> ```
> int_lit     = decimal_lit | binary_lit | octal_lit | hex_lit .
> decimal_lit = "0" | ( "1" … "9" ) [ [ "_" ] decimal_digits ] .
> binary_lit  = "0" ( "b" | "B" ) [ "_" ] binary_digits .
> octal_lit   = "0" [ "o" | "O" ] [ "_" ] octal_digits .
> hex_lit     = "0" ( "x" | "X" ) [ "_" ] hex_digits .
>
> decimal_digits = decimal_digit { [ "_" ] decimal_digit } .
> binary_digits  = binary_digit { [ "_" ] binary_digit } .
> octal_digits   = octal_digit { [ "_" ] octal_digit } .
> hex_digits     = hex_digit { [ "_" ] hex_digit } .
> ```
>
> ```
> 42
> 4_2
> 0600
> 0_600
> 0o600
> 0O600       // second character is capital letter 'O'
> 0xBadFace
> 0xBad_Face
> 0x_67_7a_2f_cc_40_c6
> 170141183460469231731687303715884105727
> 170_141183_460469_231731_687303_715884_105727
>
> _42         // an identifier, not an integer literal
> 42_         // invalid: _ must separate successive digits
> 4__2        // invalid: only one _ at a time
> 0_xBadFace  // invalid: _ must separate successive digits
> ```
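As a quick check of the valid examples, the following Go program (assuming Go 1.13 or later, where this syntax shipped) compares each new spelling with a traditional one. The constant name `maxInt128` is ours; it holds the large example literal, which as an untyped constant may exceed `int64` and is only compared here, never stored in a variable.

```go
package main

import "fmt"

// maxInt128 is the large example literal, written with separators.
const maxInt128 = 170_141183_460469_231731_687303_715884_105727

func main() {
	checks := []struct {
		desc string
		ok   bool
	}{
		{"4_2 == 42", 4_2 == 42},
		{"0_600 == 0600", 0_600 == 0600},
		{"0o600 == 0600", 0o600 == 0600},
		{"0O600 == 0600", 0O600 == 0600},
		{"0xBad_Face == 0xBadFace", 0xBad_Face == 0xBadFace},
		{"0x_67_7a_2f_cc_40_c6 == 0x677a2fcc40c6", 0x_67_7a_2f_cc_40_c6 == 0x677a2fcc40c6},
		{"0b1010_0101 == 0xA5", 0b1010_0101 == 0xA5},
		{"maxInt128 == 1<<127 - 1", maxInt128 == 1<<127-1},
	}
	for _, c := range checks {
		fmt.Println(c.desc, c.ok) // every line ends in true
	}
}
```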
The https://golang.org/ref/spec#Floating-point_literals section would be amended to read:
> A floating-point literal is a decimal or hexadecimal representation
> of a floating-point constant.
> A decimal floating-point literal consists of
> an integer part (decimal digits),
> a decimal point,
> a fractional part (decimal digits)
> and an exponent part (e or E followed by an optional sign and decimal digits).
> One of the integer part or the fractional part may be elided;
> one of the decimal point or the exponent part may be elided.
> A hexadecimal floating-point literal consists of
> a 0x or 0X prefix,
> an integer part (hexadecimal digits),
> a decimal point,
> a fractional part (hexadecimal digits),
> and an exponent part (p or P followed by an optional sign and decimal digits).
> One of the integer part or the fractional part may be elided;
> the decimal point may be elided as well, but the exponent part is required.
> (This syntax matches the one given in
> [IEEE 754-2008](https://doi.org/10.1109/IEEESTD.2008.4610935) §5.12.3.)
> For readability, an underscore may appear after a base prefix or
> between successive digits; such underscores do not change the literal value.
>
> ```
> float_lit         = decimal_float_lit | hex_float_lit .
>
> decimal_float_lit = decimal_digits "." [ decimal_digits ] [ decimal_exponent ] |
>                     decimal_digits decimal_exponent |
>                     "." decimal_digits [ decimal_exponent ] .
> decimal_exponent  = ( "e" | "E" ) [ "+" | "-" ] decimal_digits .
>
> hex_float_lit     = "0" ( "x" | "X" ) hex_mantissa hex_exponent .
> hex_mantissa      = [ "_" ] hex_digits "." [ hex_digits ] |
>                     [ "_" ] hex_digits |
>                     "." hex_digits .
> hex_exponent      = ( "p" | "P" ) [ "+" | "-" ] decimal_digits .
> ```
>
> ```
> 0.
> 72.40
> 072.40       // == 72.40
> 2.71828
> 1.e+0
> 6.67428e-11
> 1E6
> .25
> .12345E+5
> 1_5.         // == 15.0
> 0.15e+0_2    // == 15.0
>
> 0x1p-2       // == 0.25
> 0x2.p10      // == 2048.0
> 0x1.Fp+0     // == 1.9375
> 0X.8p-0      // == 0.5
> 0X_1FFFP-16  // == 0.1249847412109375
> 0x15e-2      // == 0x15e - 2 (integer subtraction)
>
> 0x.p1        // invalid: mantissa has no digits
> 1p-2         // invalid: p exponent requires hexadecimal mantissa
> 0x1.5e-2     // invalid: hexadecimal mantissa requires p exponent
> 1_.5         // invalid: _ must separate successive digits
> 1._5         // invalid: _ must separate successive digits
> 1.5_e1       // invalid: _ must separate successive digits
> 1.5e_1       // invalid: _ must separate successive digits
> 1.5e1_       // invalid: _ must separate successive digits
> ```
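The values claimed for the valid examples can be confirmed by compiling them, assuming Go 1.13 or later. The literals below are spelled exactly as in the examples (gofmt would lowercase the `0X` and `P`, but the compiler accepts them as written).

```go
package main

import "fmt"

func main() {
	// Hexadecimal floating-point literals: mantissa × 2^exponent.
	hex := []float64{0x1p-2, 0x2.p10, 0x1.Fp+0, 0X.8p-0, 0X_1FFFP-16}
	fmt.Println(hex) // [0.25 2048 1.9375 0.5 0.1249847412109375]

	// Underscores between digits, including in the exponent.
	fmt.Println(1_5., 0.15e+0_2) // 15 15

	// Without a p exponent this is an integer expression: 0x15e is 350.
	fmt.Println(0x15e-2) // 348
}
```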
The https://golang.org/ref/spec#Imaginary_literals section would be amended to read:
> An imaginary literal represents the imaginary part of a complex constant.
> It consists of an integer or floating-point literal followed by the lower-case
> letter i.
> The value of an imaginary literal is the value of the respective
> integer or floating-point literal multiplied by the imaginary unit i.
>
> ```
> imaginary_lit = (decimal_digits | int_lit | float_lit) "i" .
> ```
>
> For backward-compatibility, an imaginary literal's integer part consisting
> entirely of decimal digits (and possibly underscores) is considered a decimal
> integer, not octal, even if it starts with a leading 0.
>
> ```
> 0i
> 0123i         // == 123i for backward-compatibility
> 0o123i        // == 0o123 * 1i == 83i
> 0xabci        // == 0xabc * 1i == 2748i
> 0.i
> 2.71828i
> 1.e+0i
> 6.67428e-11i
> 1E6i
> .25i
> .12345E+5i
> 0x1p-2i       // == 0x1p-2 * 1i == 0.25i
> ```
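A small Go check of the imaginary examples, assuming Go 1.13 or later. The built-in `imag` extracts the imaginary part of each untyped complex constant, which makes the decimal-not-octal rule for `0123i` visible.

```go
package main

import "fmt"

func main() {
	fmt.Println(imag(0123i))     // 123: decimal, kept for backward compatibility
	fmt.Println(imag(0o123i))    // 83
	fmt.Println(imag(0xabci))    // 2748
	fmt.Println(imag(0x1p-2i))   // 0.25
	fmt.Println(1_000i == 1000i) // true
}
```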
### Library Changes
In [`fmt`](https://golang.org/pkg/fmt/),
[`Printf`](https://golang.org/pkg/fmt/#Printf) with `%#b`
will format an integer argument in binary with a leading `0b` prefix.
Today, [`%b` already formats an integer in binary](https://play.golang.org/p/3MPBPo2sZu9)
with no prefix;
[`%#b` does the same](https://play.golang.org/p/wwPshrf3oae)
but is rejected by `go vet`, including during `go test`,
so redefining `%#b` will not break vetted, tested programs.
`Printf` with `%#o` is already defined to format an
integer argument in octal with a leading `0` (not `0o`) prefix,
and all the other available format flags have defined effects too.
It appears no change is possible here.
Clients can use `0o%o`, at least for non-negative arguments.
`Printf` with `%x`
will format a floating-point argument in hexadecimal floating-point syntax.
(Today, `%x` on a floating-point argument formats as a `%!x` error
and also provokes a vet error.)
[`Scanf`](https://golang.org/pkg/fmt/#Scanf) will accept
both decimal and hexadecimal floating-point forms
where it currently accepts decimal.
In [`go/scanner`](https://golang.org/pkg/go/scanner/),
the implementation must change to understand the
new syntax, but the public API needs no changes.
Because [`text/scanner`](https://golang.org/pkg/text/scanner/)
recognizes Go’s number syntax as well,
it will be updated to add the new numbers too.
In [`math/big`](https://golang.org/pkg/math/big/),
[`Int.SetString`](https://golang.org/pkg/math/big/#Int.SetString)
with `base` set to zero accepts binary integer literals already;
it will change to recognize the new octal prefix and the underscore digit separator.
[`ParseFloat`](https://golang.org/pkg/math/big/#ParseFloat) and
[`Float.Parse`](https://golang.org/pkg/math/big/#Float.Parse) with `base` set to zero,
[`Float.SetString`](https://golang.org/pkg/math/big/#Float.SetString),
and [`Rat.SetString`](https://golang.org/pkg/math/big/#Rat.SetString) each
accept binary integer literals and hexadecimal floating-point literals already;
they will change to recognize the new octal prefix and the underscore digit separator.
Calls using non-zero bases will continue to reject inputs with underscores.
In [`strconv`](https://golang.org/pkg/strconv/),
[`ParseInt`](https://golang.org/pkg/strconv/#ParseInt)
and
[`ParseUint`](https://golang.org/pkg/strconv/#ParseUint)
will change behavior.
When the `base` argument is zero,
they will recognize binary literals like `0b0111`
and also allow underscore as a digit separator.
Calls using non-zero bases will continue to reject inputs with underscores.
[`ParseFloat`](https://golang.org/pkg/strconv/#ParseFloat)
will change to accept hexadecimal floating-point literals and
the underscore digit separator.
[`FormatFloat`](https://golang.org/pkg/strconv/#FormatFloat)
will add a new format `x` to generate hexadecimal floating-point.
In [`text/template/parse`](https://golang.org/pkg/text/template/parse),
`(*lexer).scanNumber` will need to recognize the three new syntaxes.
This will provide the new literals to both
[`html/template`](https://golang.org/pkg/html/template/)
and
[`text/template`](https://golang.org/pkg/text/template/).
### Tool Changes
Gofmt will understand the new syntax once
[`go/scanner`](https://golang.org/pkg/go/scanner/)
is updated.
For legibility,
gofmt will also rewrite capitalized base prefixes `0B`, `0O`, and `0X`
and exponent prefixes `E` and `P`
to their lowercase equivalents `0b`, `0o`, `0x`, `e`, and `p`.
This is especially important for `0O377` vs `0o377`.
To avoid introducing incompatibilities into
otherwise backward-compatible code,
gofmt will not rewrite `0377` to `0o377`.
(Perhaps in a few years we will be able to consider doing that.)
## Rationale
As discussed in the background section,
the choices being made in this proposal
match those already made in Go's broader language family.
Making these same changes to Go is useful on its own
and avoids unnecessary lexical differences with the
other languages.
This is the primary rationale for all four changes.
### Octal Literals
We considered using `0o377` in the initial design of Go,
but we decided that even if Go used `0o377`
for octal, it would have to reject `0377` as invalid syntax
(that is, Go could not accept `0377` as decimal 377),
to avoid an unpleasant surprise for programmers coming
from C, C++, Java, Python 2, Perl, PHP, Ruby, and so on.
Given that `0377` cannot be decimal,
it seemed at the time unnecessary
and gratuitously different to avoid it for octal.
It still seemed that way in 2015, when the issue
was raised as [golang.org/issue/12711](https://golang.org/issue/12711).
Today, however, it seems clear that there is agreement
among at least the newer C-numbered languages
for `0o377` as octal (either alone or in addition to `0377`).
Harmonizing Go’s octal integer syntax with these languages
makes sense for the same reasons as harmonizing
the binary integer and hexadecimal floating-point syntax.
For backwards compatibility,
we must keep the existing `0377` syntax in Go 1,
so Go will have two octal integer syntaxes,
like Python 2.7 and non-strict JavaScript.
As noted earlier,
after a few years, once there are no supported Go releases
missing the `0o377` syntax,
we could consider changing
`gofmt` to at least reformat `0377` to `0o377` for clarity.
### Arbitrary Bases
Another obvious change is to consider
arbitrary-radix numbers, like Algol 68’s `2r101`.
Perhaps the form most in keeping with Go’s history
would be to allow `BxDIGITS` where `B` is the base,
as in `2x0101`, `8x377`, and `16x12ab`,
where `0x` becomes an alias for `16x`.
We considered this in the initial design of Go,
but it seemed gratuitously
different from the common C-numbered languages,
and it would still not let us interpret `0377` as decimal.
It also seemed that very few programs would be
aided by being able to write numbers in, say,
base 3 or base 36.
That logic still holds today,
reinforced by the weight of existing Go usage.
Better to add only the syntaxes that other languages use.
For discussion, see [golang.org/issue/28256](https://golang.org/issue/28256).
### Library Changes
In the library changes, the various number parsers
are changed to accept underscores only in the base-detecting case.
For example:
```go
strconv.ParseInt("12_34", 0, 0)   // decimal with underscores
strconv.ParseInt("0b11_00", 0, 0) // binary with underscores
strconv.ParseInt("012_34", 0, 0)  // 01234 (octal)
strconv.ParseInt("0o12_34", 0, 0) // 0o1234 (octal)
strconv.ParseInt("0x12_34", 0, 0) // 0x1234 (hexadecimal)

strconv.ParseInt("12_34", 10, 0)  // error: fixed base cannot use underscores
strconv.ParseInt("11_00", 2, 0)   // error: fixed base cannot use underscores
strconv.ParseInt("12_34", 8, 0)   // error: fixed base cannot use underscores
strconv.ParseInt("12_34", 16, 0)  // error: fixed base cannot use underscores
```
Note that the fixed-base case also rejects base prefixes (and always has):
```go
strconv.ParseInt("0b1100", 2, 0)  // error: fixed base cannot use base prefix
strconv.ParseInt("0o1100", 8, 0)  // error: fixed base cannot use base prefix
strconv.ParseInt("0x1234", 16, 0) // error: fixed base cannot use base prefix
```
The rationale for rejecting underscores when the base is known
is the same as the rationale for rejecting base prefixes:
the caller is likely to be parsing a substring of a larger
input and would not appreciate the “flexibility.”
For example, parsing hex bytes two digits at a time
might use `strconv.ParseUint(input[i:i+2], 16, 8)`,
and parsers for various text formats
use `strconv.ParseInt(field, 10, 64)`
to parse a plain decimal number.
These use cases should not be required to guard
against underscores in the inputs themselves.
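The hex-byte case can be sketched with a hypothetical helper, `decodeHex` (real code would more likely use `encoding/hex`). Because every pair is parsed with a fixed base of 16, a stray `0x` prefix or `_` separator in the input is reported as an error rather than silently accepted. The sketch uses `ParseUint`, since `ParseInt` with bit size 8 would reject `80` through `ff` as out of range.

```go
package main

import (
	"fmt"
	"strconv"
)

// decodeHex is a hypothetical helper that decodes pairs of hex digits
// using a fixed base, so prefixes and underscores are rejected.
func decodeHex(input string) ([]byte, error) {
	if len(input)%2 != 0 {
		return nil, fmt.Errorf("decodeHex: odd length %d", len(input))
	}
	out := make([]byte, 0, len(input)/2)
	for i := 0; i < len(input); i += 2 {
		// Fixed base 16 with bit size 8 accepts "00" through "ff" only.
		v, err := strconv.ParseUint(input[i:i+2], 16, 8)
		if err != nil {
			return nil, err
		}
		out = append(out, byte(v))
	}
	return out, nil
}

func main() {
	for _, in := range []string{"cafe00ff", "0x12", "1_2f"} {
		b, err := decodeHex(in)
		fmt.Printf("%q: % x %v\n", in, b, err) // only "cafe00ff" decodes
	}
}
```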
On the other hand,
uses of `strconv.ParseInt` and `strconv.ParseUint` with `base` argument zero
already accept decimal, octal `0377`, and hexadecimal literals,
so they will start accepting the new binary and octal literals
and digit-separating underscores.
For example, command line flags defined with `flag.Int` will start
accepting these inputs.
Similarly, uses of `strconv.ParseFloat`, like `flag.Float64`
or the conversion of string-typed database entries to `float64`
in [`database/sql`](https://golang.org/pkg/database/sql/),
will start accepting hexadecimal floating-point literals
and digit-separating underscores.
### Digit Separators
The main bike shed to paint is the detail about
where exactly digit separators are allowed.
Following discussion on [golang.org/issue/19308](https://golang.org/issue/19308),
and matching the latest versions of Python and Ruby,
this proposal adopts the rule
that each digit separator must separate
a digit from the base prefix or another digit:
`0_1`, `0x_1`, and `1_2` are all allowed, while `1__2` and `1_` are not.
## Compatibility
The syntaxes being introduced here were all previously invalid,
either syntactically or semantically.
For an example of the latter,
`0x1.fffp-2` parses in current versions of Go
as the value `0x1`’s `fffp` field minus two.
Of course, integers have no fields, so while this program
is syntactically valid, it is still semantically invalid.
The changes to numeric parsing functions like
`strconv.ParseInt` and `strconv.ParseFloat`
mean that programs that might have failed before
on inputs like `0x1.fffp-2` or `1_2_3` will now succeed.
Some users may be surprised.
Part of the rationale for limiting the changes
to calls using `base` zero is to limit the potential surprise
to those cases that already accepted multiple syntaxes.
## Implementation
The implementation requires:
- Language specification changes, detailed above.
- Library changes, detailed above.
- Compiler changes, in gofrontend and cmd/compile/internal/syntax.
- Testing of compiler changes, library changes, and gofmt.
Robert Griesemer and Russ Cox plan to split the work
and aim to have all the changes ready at the start of the Go 1.13 cycle,
around February 1.
As noted in our blog post
[“Go 2, here we come!”](https://blog.golang.org/go2-here-we-come),
the development cycle will serve as a way to collect experience about
these new features and feedback from (very) early adopters.
At the release freeze, May 1, we will revisit the proposed features
and decide whether to include them in Go 1.13.