# Proposal: Localization support in Go

Author: Marcel van Lohuizen

Discussion at https://golang.org/issue/12750.

## Abstract
This proposal gives a big-picture overview of localization support for
Go, explaining how all pieces fit together.
It is intended as a guide to designing the individual packages and to allow
catching design issues early.

## Background
Localization can be a complex matter.
For many languages, localization is more than just translating an English format
string.
For example, a sentence may change depending on properties of the arguments such
as gender or plurality.
In turn, the rendering of the arguments may be influenced by, for example:
language, sentence context (start, middle, list item, standalone, etc.),
role within the sentence (case: dative, nominative, genitive, etc.),
formatting options, and
user-specific settings, like measurement system.

In other words, the format string is selected based on the arguments and the
arguments may be rendered differently based on the format string, or even the
position within the format string.

A localization framework should provide at least the following features:

1. marking and extracting text in code to be translated,
1. injecting translated text received from a translator, and
1. formatting values, such as numbers, currencies, units, names, etc.

Language-specific parsing of values belongs in this list as well,
but we consider it to be out of scope for now.

### Localization in Go
Although we have drawn some ideas for the design from other localization
libraries, the design will inevitably differ in various aspects for Go.

Most frameworks center around the concept of a single user per machine.
This leads to concepts like default locale, per-locale loadable files, etc.
Go applications tend to be multi-user and are typically deployed as single
static binaries.

Also, many frameworks predate CLDR-provided features such as varying values
based on plural and gender.
Retrofitting frameworks to use this data is hard and often results in clunky APIs.
Designing a framework from scratch allows designing with such features in mind.

### Definitions
We call a **message** the abstract notion of some semantic content to be
conveyed to the user.
Each message is identified by a key, which will often be
a fmt- or template-style format string.
A message definition defines concrete format strings for a message,
called **variants**.
A single message will have at least one variant per supported language.

A message may take **arguments** to be substituted at given insertion points.
An argument may have 0 or more features.
An argument **feature** is a key-value pair derived from the value of this argument.
Features are used to select the specific variant for a message for a given
language at runtime.
A **feature value** is the value of an argument feature.
The set of possible feature values for a feature can vary per language.
A **selector** is a user-provided string to select a variant based on a feature
or argument value.

## Proposal
Most messages in Go programs pass through either the fmt or one of the template
packages.
We treat each of these two types of packages separately.

### Package golang.org/x/text/message
Package message has drop-in replacements for most functions in the fmt package.
Replacing one of the print functions in fmt with the equivalent in package
message flags the string for extraction and causes language-specific rendering.

Consider a traditional use of fmt:

```go
fmt.Printf("%s went to %s.", person, city)
```

To localize this message, replace fmt with a message.Printer for a given language:

```go
p := message.NewPrinter(userLang)
p.Printf("%s went to %s.", person, city)
```

To localize all strings in a certain scope, the user could assign such a printer
to `fmt`.

Using the Printf of `message.Printer` has the following consequences:

* it flags the format string for translation,
* the format string is now a key used for looking up translations (the format
  string is still used as a format string in case of a missing translation),
* localizable types, like numbers, are rendered corresponding to p's language.

In practice translations will be automatically injected from
a translator-supplied data source.
But let’s do this manually for now.
The following adds a localized variant for Dutch:

```go
message.Set(language.Dutch, "%s went to %s.", "%s is in %s geweest.")
```

Assuming p is configured with `language.Dutch`, the Printf above will now print
the message in Dutch.
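
The lookup-with-fallback behavior described above can be sketched with the standard library alone.
The `catalog` type and its `set` and `sprintf` methods are hypothetical stand-ins for `message.Set` and `Printer.Printf`, and plain strings stand in for `language.Tag`; this is an illustration of the idea, not the proposed implementation.

```go
package main

import "fmt"

// catalog maps a language code to its translations, keyed by the original
// format string. The type and the language codes are illustrative only.
type catalog map[string]map[string]string

// set registers a translated format string for a language.
func (c catalog) set(lang, key, translation string) {
	if c[lang] == nil {
		c[lang] = map[string]string{}
	}
	c[lang][key] = translation
}

// sprintf looks up the translation for key and falls back to the key itself
// when no translation is registered for lang.
func (c catalog) sprintf(lang, key string, args ...interface{}) string {
	format, ok := c[lang][key]
	if !ok {
		format = key
	}
	return fmt.Sprintf(format, args...)
}

func main() {
	c := catalog{}
	c.set("nl", "%s went to %s.", "%s is in %s geweest.")
	fmt.Println(c.sprintf("nl", "%s went to %s.", "Rens", "Utrecht"))
	fmt.Println(c.sprintf("fr", "%s went to %s.", "Aurélie", "Lyon"))
}
```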

In practice, translators do not see the code and may need more context than just
the format string.
The user may add context to the message by simply commenting the Go code:

```go
p.Printf("%s went to %s.", // Describes the location a person visited.
	person, // The Person going to the location.
	city,   // The location visited.
)
```

The message extraction tool can pick up these comments and pass them to the
translator.

The section on Features and the Rationale chapter present more details on package
message.

### Package golang.org/x/text/{template|html/template}
Templates can be localized by using the drop-in replacement packages of equal name.
They add the following functionality:

* mark to-be-localized text in templates,
* substitute variants of localized text based on the language, and
* use the localized versions of the print builtins, if applicable.

The `msg` action marks text in templates for localization analogous to the
namesake construct in Soy.

Consider code using core’s text/template:

```go
package main

import (
	"log"
	"os"
	"text/template"

	"golang.org/x/text/language"
)

const letter = `
Dear {{.Name}},
{{if .Attended}}
It was a pleasure to see you at the wedding.{{else}}
It is a shame you couldn't make it to the wedding.{{end}}
Best wishes,
Josie
`

// Prepare some data to insert into the template.
type Recipient struct {
	Name     string
	Attended bool
	Language language.Tag
}

var recipients = []Recipient{
	{"Mildred", true, language.English},
	{"Aurélie", false, language.French},
	{"Rens", false, language.Dutch},
}

func main() {
	// Create a new template and parse the letter into it.
	t := template.Must(template.New("letter").Parse(letter))

	// Execute the template for each recipient.
	for _, r := range recipients {
		if err := t.Execute(os.Stdout, r); err != nil {
			log.Println("executing template:", err)
		}
	}
}
```

To localize this program the user may adapt the program as follows:

```go
import "golang.org/x/text/template"

const letter = `
{{msg "Opening of a letter"}}Dear {{.Name}},{{end}}
{{if .Attended}}
{{msg}}It was a pleasure to see you at the wedding.{{end}}{{else}}
{{msg}}It is a shame you couldn't make it to the wedding.{{end}}{{end}}
{{msg "Closing of a letter, followed by name (f)"}}Best wishes,{{end}}
Josie
`
```

and

```go
func main() {
	// Create a new template and parse the letter into it.
	t := template.Must(template.New("letter").Parse(letter))

	// Execute the template for each recipient.
	for _, r := range recipients {
		if err := t.Language(r.Language).Execute(os.Stdout, r); err != nil {
			log.Println("executing template:", err)
		}
	}
}
```

To make this work, we distinguish between normal and language-specific templates.
A normal template behaves exactly like a template in core, but may be associated
with a set of language-specific templates.

A language-specific template is associated with exactly one normal template,
which we call its base template.
It differs from a normal template as follows:

1. A Lookup of an associated template will find the first non-empty result of
   a Lookup on:
   1. the language-specific template itself,
   1. recursively, the result of Lookup on the template for the parent language
      (as defined by language.Tag.Parent) associated with its base template, or
   1. the base template.
1. Any template obtained from a lookup on a language-specific template will itself
   be a language-specific template for the same language.
   The same lookup algorithm applies for such templates.
1. The builtins print, println, and printf will respectively call the Sprint,
   Sprintln, and Sprintf methods of a message.Printer for the associated language.

A top-level template called `Messages` holds all translations of messages
in language-specific templates. This allows registering of variants using
existing methods defined on templates.

```go
dutch := template.Messages.Language(language.Dutch)
template.Must(dutch.New(`Dear {{.Name}},`).Parse(`Lieve {{.Name}},`))
template.Must(dutch.
	New(`It was a pleasure to see you at the wedding.`).
	Parse(`Het was een genoegen om je op de bruiloft te zien.`))
// etc.
```

### Package golang.org/x/text/feature
So far we have addressed cases where messages get translated one-to-one in
different languages.
Translations are often not as simple.
Consider the message `"%[1]s went to %[2]s"`, which has the arguments P (a person)
and D (a destination).
This one variant suffices for English.
In French, one needs two:

    gender of P is female: "%[1]s est allée à %[2]s.", and
    gender of P is male: "%[1]s est allé à %[2]s."

The number of variants needed to properly translate a message can vary
wildly per language.
For example, Arabic has six plural forms.
At worst, the number of variants for a language is equal to the Cartesian product
of all possible values for the argument features for this language.

Package feature defines a mechanism for selecting message variants based on
linguistic features of its arguments.
Both the message and template packages allow selecting variants based on features.
CLDR provides data for plural and gender features.
Likewise-named packages in the text repo provide support for each.

An argument may have multiple features.
For example, a list of persons can have both a count feature (the number of
people in the list) as well as a gender feature (the combined gender of the
group of people in the list, the determination of which varies per language).

The feature.Select struct defines a mapping of selectors to variants.
In practice, it is created by a feature-specific, high-level wrapper.
For the above example, such a definition may look like:

```go
message.SetSelect(language.French, "%s went to %s.",
	gender.Select(1, // Select on gender of the first argument.
		"female", "%[1]s est allée à %[2]s.",
		"other", "%[1]s est allé à %[2]s."))
```

The "1" in the Select statement refers to the first argument, which was our person.
The message definition now expects the first argument to support the gender feature.
For example:

```go
type Person struct {
	Name string
	gender.Gender
}
person := Person{"Joe", gender.Male}
p.Printf("%s went to %s.", person, city)
```

The plural package defines a feature type for plural forms.
An obvious consumer is the numbers package, but any package that deals with
some kind of amount or cardinality (e.g. lists) can use it.
An example usage:

```go
message.SetSelect(language.English, "There are %d file(s) remaining.",
	plural.Select(1,
		"zero", "Done!",
		"one", "One file remaining",
		"other", "There are %d files remaining."))
```

This works in English because the CLDR category "zero" and "one" correspond
exclusively to the values 0 and 1.
This is not the case, for example, for Serbian, where "one" is really a category
for a broad range of numbers ending in 1 but not 11.
To deal with such cases, we borrow a notation from ICU to support exact matching:

```go
message.SetSelect(language.English, "There are %d file(s) remaining.",
	plural.Select(1,
		"=0", "Done!",
		"=1", "One file remaining",
		"other", "There are %d files remaining."))
```
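
To see why a category selector and an exact selector differ, the following sketch classifies non-negative integers using the Serbian cardinal rules as paraphrased from CLDR.
The `serbianCategory` function is a hypothetical helper written for this illustration; it ignores fractional operands and is not part of the proposed plural package.

```go
package main

import "fmt"

// serbianCategory returns the plural category of a non-negative integer n
// under a simplified reading of the CLDR cardinal rules for Serbian.
func serbianCategory(n int) string {
	switch {
	case n%10 == 1 && n%100 != 11:
		return "one"
	case n%10 >= 2 && n%10 <= 4 && (n%100 < 12 || n%100 > 14):
		return "few"
	}
	return "other"
}

func main() {
	for _, n := range []int{1, 11, 21, 22, 101} {
		// The selector "one" matches on the category, while "=1" would
		// match only when n is exactly 1.
		fmt.Printf("%d: category=%s exact=%t\n", n, serbianCategory(n), n == 1)
	}
}
```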

Besides "=", and going beyond ICU, we will also support the "<" and ">" comparators.

The template packages would add a corresponding ParseSelect to add translation variants.

### Value formatting
We now move from localizing messages to localizing values.
This is a non-exhaustive list of value types that support localized rendering:

* numbers
* currencies
* units
* lists
* dates (calendars, formatting with spell-out, intervals)
* time zones
* phone numbers
* postal addresses

Each type maps to a separate package that roughly provides the same types:

* Value: encapsulates a value and implements fmt.Formatter.
  For example, currency.Value encapsulates the amount, the currency, and
  whether it should be rendered as cash, accounting, etc.
* Formatter: a func of the form func(x interface{}) Value that creates or wraps
  a Value to be rendered according to the Formatter's purpose.

Since a Formatter leaves the actual printing to the implementation of
fmt.Formatter, the value is not printed until after it is passed to one of the
print methods.
This allows formatting flags, as well as other context information, to influence
the rendering.

The State object passed to Format needs to provide more information than
what is passed by fmt.State, namely:

* a `language.Tag`,
* locale settings that a user may override relative to the user locale setting
  (e.g. preferred time format, measurement system),
* sentence context, such as standalone, start-, mid-, or end-of-sentence, and
* formatting options, possibly defined by the translator.

To accommodate this, we either need to define a text repo-specific State
implementation that Format implementations can type assert to or
define a different Formatter interface.

#### Example: Currencies
We consider this pattern applied to currencies. The Value and Formatter types:

```go
// A Formatter associates formatting information with the given value. x may be a
// Currency, a Value, or a number if the Formatter is associated with a default currency.
type Formatter func(x interface{}) Value

func (f Formatter) NumberFormat(nf number.Formatter) Formatter
// ...

var Default Formatter = Formatter(formISO)
var Symbol Formatter = Formatter(formSymbol)
var SpellOut Formatter = Formatter(formSpellOut)

type Value struct {
	amount    interface{}
	currency  Currency
	formatter *settings
}

// Format formats v. If State is a format.State, the value is formatted
// according to the given language. If State is not language-specific, it will
// use number plus ISO code for values and the ISO code for Currency.
func (v Value) Format(s fmt.State, verb rune)
func (v Value) Amount() interface{}
func (v Value) Float() (float64, error)
func (v Value) Currency() Currency
// ...
```

Usage examples:

```go
p := message.NewPrinter(language.AmericanEnglish)
p.Printf("You pay %s.", currency.USD.Value(3))                    // You pay USD 3.
p.Printf("You pay %s.", currency.Symbol(currency.USD.Value(3)))   // You pay $3.
p.Printf("You pay %s.", currency.SpellOut(currency.USD.Value(1))) // You pay 1 US Dollar.
spellout := currency.SpellOut.NumberFormat(number.SpellOut)
p.Printf("You pay %s.", spellout(currency.USD.Value(3))) // You pay three US Dollars.
```

Formatters have option methods for creating new formatters.
Under the hood all formatter implementations use the same settings type, a
pointer to which is included as a field in Value.
So option methods can access a formatter’s settings by formatting a dummy value.
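
The dummy-value technique can be illustrated in isolation.
In this sketch the `settings` fields, the `newFormatter` constructor, the `Symbol` option method, and the fixed USD currency are assumptions made for the example; only the overall shape (a func type whose Values carry a settings pointer) follows the text above.

```go
package main

import "fmt"

// settings holds the formatting options shared by all formatters.
type settings struct {
	symbol bool // render "$" rather than the ISO code
}

// Value pairs an amount with the settings of the Formatter that produced it.
type Value struct {
	amount   float64
	currency string
	s        *settings
}

// Formatter wraps a number in a Value carrying the formatter's settings.
type Formatter func(x float64) Value

// newFormatter returns a USD Formatter that closes over its own copy of s.
func newFormatter(s settings) Formatter {
	return func(x float64) Value {
		return Value{amount: x, currency: "USD", s: &s}
	}
}

// Symbol returns a copy of f that renders the currency symbol. A func value
// has no fields, so f's settings are read back from a dummy Value.
func (f Formatter) Symbol() Formatter {
	s := *f(0).s // copy the settings attached to the dummy value
	s.symbol = true
	return newFormatter(s)
}

// String renders v according to its settings.
func (v Value) String() string {
	if v.s.symbol {
		return fmt.Sprintf("$%.2f", v.amount)
	}
	return fmt.Sprintf("%s %.2f", v.currency, v.amount)
}

func main() {
	iso := newFormatter(settings{})
	sym := iso.Symbol()
	fmt.Println(iso(3)) // USD 3.00
	fmt.Println(sym(3)) // $3.00
}
```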

Different kinds of currency values are available for different localized rounding
and accounting practices.

```go
v := currency.CHF.Value(3.123)
p.Printf("You pay %s.", currency.Cash.Value(v)) // You pay CHF 3.15.

spellCash := currency.SpellOut.Kind(currency.Cash).NumberFormat(number.SpellOut)
p.Printf("You pay %s.", spellCash(v)) // You pay three point fifteen Swiss Francs.
```

The API ensures unused tables are not linked in.
For example, the rather large tables for spelling out numbers and currencies
needed for number.SpellOut and currency.SpellOut are only linked in when
the respective formatters are called.

#### Example: Units
Units are like currencies but have the added complexity that the amount and
unit may change per locale.
The Formatter and Value types are analogous to those of Currency.
The package defines "constructors" for a selection of unit types.

```go
type Formatter func(x interface{}) Value

var (
	Symbol   Formatter = Formatter(formSymbol)
	SpellOut Formatter = Formatter(formSpellOut)
)

// Unit sets the default unit for the formatter. This allows the formatter to
// create values directly from numbers.
func (f Formatter) Unit(u Unit) Formatter

// Create formatted values:
func (f Formatter) Value(x interface{}, u Unit) Value
func (f Formatter) Meters(x interface{}) Value
func (f Formatter) KilometersPerHour(x interface{}) Value

type Unit int
const SpeedKilometersPerHour Unit = ...

type Kind int
const Speed Kind = ...
```

Usage examples:

```go
p := message.NewPrinter(language.AmericanEnglish)
p.Printf("%d", unit.KilometersPerHour(250)) // 155 mph
```

Spelling out the unit names:

```go
p.Print(unit.SpellOut.KilometersPerHour(250)) // 155.343 miles per hour
```

Associating a default unit with a formatter allows it to format numbers directly:

```go
kmh := unit.SpellOut.Unit(unit.SpeedKilometersPerHour)
p.Print(kmh(250)) // 155.343 miles per hour
```

Spell out the number as well:

```go
spellout := unit.SpellOut.NumberFormat(number.SpellOut)
p.Print(spellout.KilometersPerHour(250))
// one hundred fifty-five point three four three miles per hour
```

or perhaps also

```go
p.Print(unit.SpellOut.KilometersPerHour(number.SpellOut(250)))
// one hundred fifty-five point three four three miles per hour
```

Using a formatter, like `number.SpellOut(250)`, just returns a Value wrapped
with the new formatting settings.
The underlying value is retained, allowing its features to select
the proper unit names.

There may be an ambiguity as to which unit to convert to when converting from
US to the metric system.
For example, feet can be converted to meters or centimeters.
Moreover, which one is preferred may differ per language.
If this is an issue we may consider allowing a message to override the default
unit to convert to.
For example:

    %[2:unit=km]f

Such a construct would allow translators to annotate the preferred unit override.

## Details and Rationale

### Formatting

The proposed Go API deviates from a common pattern in other localization APIs by
_not_ associating a Formatter with a language.
Passing the language through State has several advantages:

1. the user needs to specify a language for a message only once, which means
   1. less typing,
   1. no possibility of mismatch, and
   1. no need to initialize a formatter for each language (which may mean on
      every usage),
1. the value is preserved up until the variant is selected, and
1. a string is not rendered until its context is known.

It prevents strings from being rendered prematurely, which, in turn, helps
picking the proper variant and allows translators to pass in options in
formatting strings.
The Formatter construct is a natural way of allowing for this flexibility and
allows for a straightforward and natural API for something that is otherwise
quite complex.

The Value types of the formatting packages conflate data with formatting.
However, formatting types are often strongly correlated with value types.
Combining formatting types with values is not unlike associating the time zone
with a Time or rounding information with a number.
Combined with the fact that localized formatting is one of the main purposes
of the text repo, it seems to make sense.

#### Differences from the fmt package
Formatted printing in the message package differs from the equivalent in the
fmt package in various ways:

* An argument may be used solely for its features, or may be unused for
  specific variants.
  It is therefore possible to have a format string that has no
  substitutions even in the presence of arguments.
* Package message dynamically selects a variant based on the
  arguments’ features and the configured language.
  The format string passed to a formatted print method is mostly used as a
  reference or key.
* The variant selection mechanism allows for the definition of variables
  (see the section on package feature).
  It seems unnatural to refer to these by position.
  We contemplate the usage of named arguments for such variables: `%[name]s`.
* Rendered text is always natural language and values render accordingly.
  For example, `[]int{1, 2, 3}` will be rendered, in English, as `"1, 2 and 3"`,
  instead of `"[1 2 3]"`.
* Formatters may use information about sentence context.
  Such meta data must be derived by automated analysis or supplied by a
  translator.

Considering the differences with fmt we expect package message to do its own
parsing.
Different substitution points of the same argument may require a different State
object to be passed.
Using fmt’s parser would require rewriting such arguments into different forms
and/or exposing more internals of fmt in the API.
It seems more straightforward for package message to do its own parsing.
Nonetheless, we aim to utilize as much of the fmt package as possible.

#### Currency
Currency is its own package.
In most localization APIs the currency formatter is part of the number formatter.
Currency data is large, though, and putting it in its own package
avoids linking it in unnecessarily.
Separating the currency package also allows greater control over options.
Currencies have specific locale-sensitive rounding and scale settings that
may interact poorly with options provided for a number formatter.

#### Units
We propose to have one large package that includes all unit types.
We could split this package up into, for example, packages for energy, mass,
length, speed etc.
However, there is a lot of overlap in data (e.g. kilometers and kilometers per hour).
Spreading the tables across packages will make sharing data harder.
Also, not all units belong naturally in a specific package.

To mitigate the impact of including large tables, we can have composable modules
of data from which users can compose smaller formatters
(similar to the display package).

### Features

The proposed mechanism for features takes a somewhat different approach
from that of OS X and ICU.
It allows mitigating the combinatorial explosion that may occur when combining
features while still being legible.

#### Matching algorithm
The matching algorithm returns the first match on a depth-first search on all cases.
We also allow for variable assignment.
We define the following types (in Go-ey pseudo code):

    Select struct {
        Feature  string      // identifier of feature type
        Argument interface{} // Argument reference
        Cases    []Case      // The variants.
    }
    Case struct { Selector string; Value interface{} }
    Var struct { Name string; Value interface{} }
    Value: Select or String
    SelectSequence: [](Select or Var)

To select a variant given a SelectSequence s and a set of arguments:

1. Initialize a map m from argument name to argument value.
1. For each v in s:
   1. If v is of type Var, update m[v.Name] = Eval(v.Value, m).
   1. If v is of type Select, then let v be Eval(v, m).
   1. If v is of type string, return v.

Eval(v, m): Value

1. If v is a string, return it.
1. Let f be the feature value for feature v.Feature of argument v.Argument.
1. For each case in v.Cases,
   1. return Eval(case.Value, m) if Match(case.Selector, f, v.Argument).
1. Return nil (no match).

Match(s, cat, arg): string x string x interface{} // Implementation for numbers.

1. If s[0] == '=' return arg == int(s[1:]).
1. If s[0] == '<' return arg < int(s[1:]).
1. If s[0] == '>' return arg > int(s[1:]).
1. If s == cat return true.
1. Return s == "other".

A simple data structure encodes the entire Select procedure, which makes it
trivially machine-readable, a condition for including it in a translation pipeline.
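
The procedure above can be turned into a small runnable sketch.
Several simplifications are assumptions of this example rather than part of the proposal: arguments are named integers, the only feature is an English-like plural category ("one" or "other"), a Var is only updated when its value matches, and `%[name]s` references are expanded by plain string replacement.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Case pairs a selector with a value: either a string or a *Select.
type Case struct {
	Selector string
	Value    interface{}
}

// Select chooses among Cases using the integer argument named Argument.
type Select struct {
	Argument string
	Cases    []Case
}

// Var binds Name to the result of evaluating Value.
type Var struct {
	Name  string
	Value interface{}
}

// match reports whether selector s matches count n with plural category cat.
func match(s, cat string, n int) bool {
	if x, err := strconv.Atoi(strings.TrimLeft(s, "=<>")); err == nil {
		switch s[0] {
		case '=':
			return n == x
		case '<':
			return n < x
		case '>':
			return n > x
		}
	}
	return s == cat || s == "other"
}

// eval resolves v to a string by depth-first search; ok is false on no match.
func eval(v interface{}, args map[string]int) (string, bool) {
	switch v := v.(type) {
	case string:
		return v, true
	case *Select:
		n := args[v.Argument]
		cat := "other" // English-like categories only: "one" or "other"
		if n == 1 {
			cat = "one"
		}
		for _, c := range v.Cases {
			if match(c.Selector, cat, n) {
				if s, ok := eval(c.Value, args); ok {
					return s, true
				}
			}
		}
	}
	return "", false
}

// selectVariant walks seq in order. A Var updates the variable map when its
// value matches; the first other entry that matches is returned with
// %[name]s references expanded.
func selectVariant(seq []interface{}, args map[string]int) string {
	vars := map[string]string{}
	for _, e := range seq {
		if v, ok := e.(Var); ok {
			if s, ok := eval(v.Value, args); ok {
				vars[v.Name] = s
			}
			continue
		}
		if s, ok := eval(e, args); ok {
			for name, val := range vars {
				s = strings.ReplaceAll(s, "%["+name+"]s", val)
			}
			return s
		}
	}
	return ""
}

// demoSeq is a cut-down, hypothetical message definition.
var demoSeq = []interface{}{
	Var{"noParty", "There is no party. Move on!"},
	&Select{"hosts", []Case{{"=0", "%[noParty]s"}}},
	Var{"subject", &Select{"hosts", []Case{{"one", "Your host invites"}, {"other", "Your hosts invite"}}}},
	"%[subject]s you to the party.",
}

func main() {
	for _, n := range []int{0, 1, 3} {
		fmt.Println(selectVariant(demoSeq, map[string]int{"hosts": n}))
	}
}
```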

#### Full Example

Consider the message `"%[1]s invite %[2]s to their party"`, where arguments 1 and 2
are lists of hosts and guests, respectively, and the following data:

```go
map[string]interface{}{
	"Hosts": []gender.String{
		gender.Male.String("Andy"),
		gender.Female.String("Sheila"),
	},
	"Guests": []string{"Andy", "Mary", "Bob", "Linda", "Carl", "Danny"},
}
```

The following variant selector covers various cases for different values of the
arguments.
It limits the number of guests listed to 4.

```go
message.SetSelect(en, "%[1]s invite %[2]s and %[3]d other guests to their party.",
	plural.Select(1, // Hosts
		"=0", `There is no party. Move on!`,
		"=1", plural.Select(2, // Guests
			"=0", `%[1]s does not give a party.`,
			"other", plural.Select(3, // Other guests count
				"=0", gender.Select(1, // Hosts
					"female", "%[1]s invites %[2]s to her party.",
					"other", "%[1]s invites %[2]s to his party."),
				"=1", gender.Select(1, // Hosts
					"female", "%[1]s invites %#[2]s and one other person to her party.",
					"other", "%[1]s invites %#[2]s and one other person to his party."),
				"other", gender.Select(1, // Hosts
					"female", "%[1]s invites %#[2]s and %[3]d other people to her party.",
					"other", "%[1]s invites %#[2]s and %[3]d other people to his party."))),
		"other", plural.Select(2, // Guests
			"=0", "%[1]s do not give a party.",
			"other", plural.Select(3, // Other guests count
				"=0", "%[1]s invite %[2]s to their party.",
				"=1", "%[1]s invite %#[2]s and one other person to their party.",
				"other", "%[1]s invite %#[2]s and %[3]d other people to their party."))))
```

<!-- ```go
template.Language(language.English).
	New("{{.Hosts}} invite {{.Guests}} to their party.").
	ParseSelect(plural.Select(".Hosts",
		"=0", `There is no party. Move on!`,
		"=1", plural.Select(".Guests",
			"=0", `{{.Hosts}} does not give a party.`,
			"<5", gender.Select(".Hosts",
				"female", `{{.Hosts}} invites {{.Guests}} to her party.`,
				"other", `{{.Hosts}} invites {{.Guests}} to his party.`),
			"=5", gender.Select(".Hosts",
				"female", `{{.Hosts}} invites {{first 4 .Guests}} and one other
					person to her party.`,
				"other", `{{.Hosts}} invites {{first 4 .Guests}} and one other
					person to his party.`),
			"other", gender.Select(".Hosts",
				"female", `{{.Hosts}} invites {{first 4 .Guests}} and {{offset 4 .Guests}}
					other people to her party.`,
				"other", `{{.Hosts}} invites {{first 4 .Guests}} and {{offset 4 .Guests}}
					other people to his party.`),
		),
		"other", plural.Select(".Guests",
			"=0", `{{.Hosts}} do not give a party.`,
			"<5", `{{.Hosts}} invite {{.Guests}} to their party.`,
			"=5", `{{.Hosts}} invite {{first 4 .Guests}} and one other person
				to their party.`,
			"other", `{{.Hosts}} invite {{first 4 .Guests}} and
				{{offset 4 .Guests}} other people to their party.`)))
``` -->

For English, we have three variables to deal with:
the plural form of the hosts and guests and the gender of the hosts.
Both guests and hosts are slices.
Slices have a plural feature (their cardinality) and a gender feature (based on
CLDR data).
We define the flag `#` as an alternate form for lists to drop the comma.

It should be clear how quickly things can blow up when dealing with
multiple features.
There are 12 variants.
For other languages this could be quite a bit more.
Using the properties of the matching algorithm one can often mitigate this issue.
With a bit of creativity, we can remove the two cases where `len(Guests) == 0`
and add another select block at the start of the list:

```go
message.SetSelect(en, "%[1]s invite %[2]s and %[3]d other guests to their party.",
	plural.Select(2, "=0", `There is no party. Move on!`),
	plural.Select(1,
		"=0", `There is no party. Move on!`,
		// etc.
```

<!-- ```go
template.Language(language.English).
	New("{{.Hosts}} invite {{.Guests}} to their party.").
	ParseSelect(
		plural.Select(".Guests", "=0", `There is no party. Move on!`),
		plural.Select(".Hosts",
			"=0", `There is no party. Move on!`,

``` -->

The algorithm will return from the first select when `len(Guests) == 0`,
so this case will not have to be considered later.

Using Var we can do a lot better, though:

```go
message.SetSelect(en, "%[1]s invite %[2]s and %[3]d other guests to their party.",
	feature.Var("noParty", "There is no party. Move on!"),
	plural.Select(1, "=0", "%[noParty]s"),
	plural.Select(2, "=0", "%[noParty]s"),

	feature.Var("their", gender.Select(1, "female", "her", "other", "his")),
	// Variables may be overwritten.
	feature.Var("their", plural.Select(1, ">1", "their")),
	feature.Var("invite", plural.Select(1, "=1", "invites", "other", "invite")),

	feature.Var("guests", plural.Select(3, // other guests
		"=0", "%[2]s",
		"=1", "%#[2]s and one other person",
		"other", "%#[2]s and %[3]d other people")),
	feature.String("%[1]s %[invite]s %[guests]s to %[their]s party."))
```

<!--```go
template.Language(language.English).
	New("{{.Hosts}} invite {{.Guests}} to their party.").
	ParseSelect(
		feature.Var("noParty", "There is no party. Move on!"),
		plural.Select(".Hosts", "=0", `{{$noParty}}`),
		plural.Select(".Guests", "=0", `{{$noParty}}`),

		feature.Var("their", gender.Select(".Hosts",
			"female", "her",
			"other", "his")),
		// Variables may be overwritten.
		feature.Var("their", plural.Select(".Hosts", ">1", "their")),
		feature.Var("invite", plural.Select(".Hosts",
			"=1", "invites",
			"other", "invite")),

		plural.Select(".Guests",
			"<5", `{{.Hosts}} {{$invite}} {{.Guests}} to {{$their}} party.`,
			"=5", `{{.Hosts}} {{$invite}} {{first 4 .Guests}} and one other person
				to {{$their}} party.`,
			"other", `{{.Hosts}} {{$invite}} {{first 4 .Guests | printf "%#v"}}
				and {{offset 4 .Guests}} other people to {{$their}} party.`))
```-->

This is essentially the same as the example before, but with the use of
variables to reduce the verbosity.
If one always shows all guests, there would only be one variant for describing
the guests attending a party!
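
To make the effect of variables concrete, the same sentence can be assembled in plain Go: each variable is computed once by its own small selection and then substituted into a single final format string.
This is a sketch under simplifying assumptions, not the proposed implementation. The `host` type, the `list` helper and `invitation` are hypothetical; the gender of the host list is taken from the first host rather than from CLDR data; and the `alt` parameter is this sketch's own stand-in for the `#` flag, here simply omitting the final "and" so that a trailing clause can follow.

```go
package main

import (
	"fmt"
	"strings"
)

// host is a hypothetical value type carrying a gender feature.
type host struct {
	name   string
	female bool
}

// list renders names as "A, B and C". With alt set it renders "A, B, C",
// leaving room for a trailing "and N other people".
func list(names []string, alt bool) string {
	if alt || len(names) < 2 {
		return strings.Join(names, ", ")
	}
	last := len(names) - 1
	return strings.Join(names[:last], ", ") + " and " + names[last]
}

// invitation computes each variable independently and combines them in one
// final format string. others is the number of guests not listed by name.
func invitation(hosts []host, guests []string, others int) string {
	if len(hosts) == 0 || len(guests) == 0 {
		return "There is no party. Move on!" // plays the role of noParty
	}
	names := make([]string, len(hosts))
	for i, h := range hosts {
		names[i] = h.name
	}

	// "their": selected on gender first, then overwritten for plural hosts.
	their := "his"
	if hosts[0].female {
		their = "her"
	}
	if len(hosts) > 1 {
		their = "their"
	}

	// "invite": selected on the number of hosts.
	invite := "invite"
	if len(hosts) == 1 {
		invite = "invites"
	}

	// "guests": selected on the number of other guests.
	var guestList string
	switch others {
	case 0:
		guestList = list(guests, false)
	case 1:
		guestList = list(guests, true) + " and one other person"
	default:
		guestList = fmt.Sprintf("%s and %d other people", list(guests, true), others)
	}

	return fmt.Sprintf("%s %s %s to %s party.", list(names, false), invite, guestList, their)
}

func main() {
	fmt.Println(invitation([]host{{"Ann", true}}, []string{"Bob"}, 0))
	fmt.Println(invitation([]host{{"Ann", true}, {"Carl", false}}, []string{"Bob", "Dee"}, 3))
}
```

Each variable needs only two or three cases, whereas writing out every combination would multiply them.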

#### Comparison to ICU
ICU has a similar approach to dealing with gender and plurals.
The above example roughly translates to:

```
`{num_hosts, plural,
  =0 {There is no party. Move on!}
  other {
    {gender_of_host, select,
      female {
        {num_guests, plural, offset:1
          =0 {{host} does not give a party.}
          =1 {{host} invites {guest} to her party.}
          =2 {{host} invites {guest} and one other person to her party.}
          other {{host} invites {guest} and # other people to her party.}}}
      male {
        {num_guests, plural, offset:1
          =0 {{host} does not give a party.}
          =1 {{host} invites {guest} to his party.}
          =2 {{host} invites {guest} and one other person to his party.}
          other {{host} invites {guest} and # other people to his party.}}}
      other {
        {num_guests, plural, offset:1
          =0 {{host} do not give a party.}
          =1 {{host} invite {guest} to their party.}
          =2 {{host} invite {guest} and one other person to their party.}
          other {{host} invite {guest} and # other people to their party.}}}}}}`
```

Comparison:

* In Go, features are associated with values, instead of passed separately.
* There is no Var construct in ICU.
* Instead the ICU notation is more flexible and allows for notations like:

    ```
    "{1, plural,
      zero {Personne ne se rendit}
      one {{0} est {2, select, female {allée} other {allé}}}
      other {{0} sont {2, select, female {allées} other {allés}}}} à {3}"
    ```

* In Go, strings can only be assigned to variables or used in leaf nodes of a
  select. We find this to result in more readable definitions.
* The Go notation is fully expressed in terms of Go structs:
    * There is no separate syntax to learn.
    * Most of the syntax is checked at compile time.
    * It is serializable and machine readable without needing another parser.
* In Go, feature types are fully generic.
* Go has no special syntax for constructs like offset (see the third argument
  in ICU’s plural select and the "#" for substituting offsets).
  We can solve this with pipelines in templates and special interpretation for
  flag and verb types for the Format implementation of lists.
* ICU's algorithm seems to prohibit the use of ‘<’ and ‘>’ selectors.
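
For readers unfamiliar with ICU's offset, its behavior in the `female` branch above can be mimicked in a few lines of Go.
The sketch below is ours, not ICU code: `selectPlural` and `invite` are hypothetical helpers, only explicit `=N` cases and `other` are modeled (keyword categories such as `one` are omitted), and the `{host}` and `{guest}` placeholders are expanded with a plain string replacer.
Explicit cases are matched against the raw count, while `#` is replaced by the count minus the offset.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// selectPlural picks the message for n. Explicit "=N" cases are matched
// against the raw count; any "#" in the chosen message is replaced by
// n-offset.
func selectPlural(n, offset int, exact map[int]string, other string) string {
	msg, ok := exact[n]
	if !ok {
		msg = other
	}
	return strings.ReplaceAll(msg, "#", strconv.Itoa(n-offset))
}

// invite renders the female-host branch of the ICU message above.
func invite(host, guest string, numGuests int) string {
	msg := selectPlural(numGuests, 1, map[int]string{
		0: "{host} does not give a party.",
		1: "{host} invites {guest} to her party.",
		2: "{host} invites {guest} and one other person to her party.",
	}, "{host} invites {guest} and # other people to her party.")
	return strings.NewReplacer("{host}", host, "{guest}", guest).Replace(msg)
}

func main() {
	for _, n := range []int{0, 1, 2, 5} {
		fmt.Println(invite("Ann", "Bob", n))
	}
}
```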

#### Comparison to OS X

OS X recently introduced support for handling plurals and prepared for support
for gender.
The data for selecting variants is stored in the stringsdict file.
This example from the referenced link shows how to vary sentences for
"number of files selected" in English:

```
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>%d files are selected</key>
    <dict>
        <key>NSStringLocalizedFormatKey</key>
        <string>%#@num_files_are@ selected</string>
        <key>num_files_are</key>
        <dict>
            <key>NSStringFormatSpecTypeKey</key>
            <string>NSStringPluralRuleType</string>
            <key>NSStringFormatValueTypeKey</key>
            <string>d</string>
            <key>zero</key>
            <string>No file is</string>
            <key>one</key>
            <string>A file is</string>
            <key>other</key>
            <string>%d files are</string>
        </dict>
    </dict>
</dict>
</plist>
```

The equivalent in the proposed Go format:

```go
message.SetSelect(language.English, "%d files are selected",
	feature.Var("numFilesAre", plural.Select(1,
		"zero", "No file is",
		"one", "A file is",
		"other", "%d files are")),
	feature.String("%[numFilesAre]s selected"))
```

A comparison between OS X and the proposed design:

* In both cases, the selection of variants can be represented in a data structure.
* OS X does not have a specific API for defining the variant selection in code.
* Both approaches allow for arbitrary feature implementations.
* OS X allows for a similar construct to Var to allow substitution of substrings.
* OS X has extended its printf-style format specifier to allow for named substitutions.
  The substitution string `"%#@foo@"` will substitute the variable foo.
  The equivalent in Go is the less offensive `"%[foo]v"`.
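
As a rough illustration of named substitution, the sketch below expands `%[name]s` and `%[name]v` placeholders from a map using only the standard library.
It is not the proposed implementation: `namedVerb`, `expand` and `filesSelected` are hypothetical names, and treating "zero" as `n == 0` and "one" as `n == 1` is an English-only simplification of plural categories.

```go
package main

import (
	"fmt"
	"regexp"
)

// namedVerb matches placeholders such as %[numFilesAre]s. Names must start
// with a letter or underscore, so fmt's numeric form %[1]d is left alone.
var namedVerb = regexp.MustCompile(`%\[([A-Za-z_]\w*)\][sv]`)

// expand replaces every named placeholder whose name is present in vars.
// Unknown names are left untouched.
func expand(format string, vars map[string]string) string {
	return namedVerb.ReplaceAllStringFunc(format, func(m string) string {
		name := namedVerb.FindStringSubmatch(m)[1]
		if v, ok := vars[name]; ok {
			return v
		}
		return m
	})
}

// filesSelected selects the numFilesAre variable on n and substitutes it
// into the final string.
func filesSelected(n int) string {
	var numFilesAre string
	switch n {
	case 0:
		numFilesAre = "No file is"
	case 1:
		numFilesAre = "A file is"
	default:
		numFilesAre = fmt.Sprintf("%d files are", n)
	}
	return expand("%[numFilesAre]s selected", map[string]string{
		"numFilesAre": numFilesAre,
	})
}

func main() {
	for _, n := range []int{0, 1, 3} {
		fmt.Println(filesSelected(n))
	}
}
```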

### Code organization
The typical Go deployment is that of a single statically linked binary.
Traditionally, though, most localization frameworks have grouped data in
per-language dynamically-loaded files.
We suggest some code organization methods for both use cases.

#### Example: statically linked package

In the following code, a single file called messages.go contains all collected
translations:

```go
import (
	"golang.org/x/text/language"
	"golang.org/x/text/message"
	// plus the feature package proposed in this document
)

// init registers every collected translation at program start.
func init() {
	for _, e := range entries {
		for _, t := range e.entry {
			message.SetSelect(e.lang, t.key, t.value)
		}
	}
}

// entry pairs a message key with its translation.
type entry struct {
	key   string
	value feature.Value
}

var entries = []struct {
	lang  language.Tag
	entry []entry
}{
	{language.French, []entry{
		{"Hello", feature.String("Bonjour")},
		{"%s went to %s", feature.Select{ … }},
		// …
	}},
}
```

#### Example: dynamically loaded files

We suggest storing per-language data files in a messages subdirectory:

```go
func NewPrinter(t language.Tag) *message.Printer {
	r, err := os.Open(filepath.Join("messages", t.String()+".json"))
	// handle error
	defer r.Close()
	cat := message.NewCatalog()
	d := json.NewDecoder(r)
	for {
		var msg struct {
			Key   string
			Value []feature.Value
		}
		if err := d.Decode(&msg); err == io.EOF {
			break
		} else if err != nil {
			// handle error
		}
		cat.SetSelect(t, msg.Key, msg.Value...)
	}
	return cat.NewPrinter(t)
}
```

## Compatibility

The implementation of the `msg` action will require some modification to core’s
`text/template/parse` package.
Such a change would be backward compatible.

## Implementation Plan

Implementation would start with some of the rudimentary packages in the text
repo, most notably format.
Subsequently, this allows the implementation of the formatting of some specific
types, like currencies.
The messages package will be implemented first.
The template package is more invasive and will be implemented at a later stage.
Work on infrastructure for extracting messages from templates and print
statements will allow integrating the tools with translation pipelines.