Strings are **not** required to be UTF-8. Go source code **is** required
to be UTF-8. There is a complex path between the two.

In short, there are three kinds of strings. They are:

 1. the substring of the source that lexes into a string literal.
 1. a string literal.
 1. a value of type string.

Only the first is required to be UTF-8. The second is required to be
written in UTF-8, but its contents are interpreted various ways
and may encode arbitrary bytes. The third can contain any bytes at
all.

Try this on:

```
var s string = "\xFF語"
```
Source substring: ` "\xFF語" `, UTF-8 encoded. The data:

```
22
5c
78
46
46
e8
aa
9e
22
```

String literal: ` \xFF語 ` (between the quotes). The data:

```
5c
78
46
46
e8
aa
9e
```

The string value (unprintable; these are raw bytes, and the leading ff means they are not valid UTF-8). The data:

```
ff
e8
aa
9e
```

And for the record, the characters (code points):
```
<erroneous byte FF, will appear as U+FFFD if you range over the string value>
語 U+8A9E
```