Strings are **not** required to be UTF-8. Go source code **is** required
to be UTF-8. There is a complex path between the two.
In short, there are three kinds of strings. They are:
1. the substring of the source that lexes into a string literal.
1. a string literal.
1. a value of type string.
Only the first is required to be UTF-8. The second is required to be
written in UTF-8, but its contents are interpreted in various ways
(escape sequences, for instance) and may encode arbitrary bytes. The
third can contain any bytes at all.
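As a minimal sketch of how interpretation changes what ends up in the value, the program below writes the same four source characters once as an interpreted literal and once as a raw (backquoted) literal. The helper name `describe` is made up for this illustration; `utf8.ValidString` is from the standard `unicode/utf8` package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper: it reports the byte length of s
// and whether those bytes form valid UTF-8.
func describe(s string) (int, bool) {
	return len(s), utf8.ValidString(s)
}

func main() {
	interpreted := "\xFF" // escape is interpreted: a single byte, 0xFF
	raw := `\xFF`         // raw literal: the four bytes \ x F F

	n, ok := describe(interpreted)
	fmt.Println(n, ok) // 1 false

	n, ok = describe(raw)
	fmt.Println(n, ok) // 4 true
}
```

Both literals are valid UTF-8 in the source file, but only the raw one yields a value that is still valid UTF-8.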
Try this example:
```
var s string = "\xFF語"
```
Source substring: ` "\xFF語" `, UTF-8 encoded. The data:
```
22
5c
78
46
46
e8
aa
9e
22
```
String literal: ` \xFF語 ` (between the quotes). The data:
```
5c
78
46
46
e8
aa
9e
```
The string value (unprintable; this is a raw byte stream, and it is not valid UTF-8 because of the leading `ff` byte). The data:
```
ff
e8
aa
9e
```
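One way to check these bytes yourself is to print the value with the `% x` verb from `fmt`, which formats each byte of a string as hex, separated by spaces. This is only a sketch; `hexBytes` is a name invented for the example.

```go
package main

import "fmt"

// hexBytes is a hypothetical helper: it returns the bytes of s as
// space-separated lowercase hex.
func hexBytes(s string) string {
	return fmt.Sprintf("% x", s)
}

func main() {
	var s string = "\xFF語"
	fmt.Println(len(s))      // 4
	fmt.Println(hexBytes(s)) // ff e8 aa 9e
}
```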
And for the record, the characters (code points):
```
<erroneous byte FF, will appear as U+FFFD if you range over the string value>
語 U+8A9E
```
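The following sketch ranges over the same string value to show the code points listed above. A `range` loop over a string decodes UTF-8 as it goes and yields U+FFFD for a byte that does not decode. The helper `runes` is a name introduced only for this example.

```go
package main

import "fmt"

// runes is a hypothetical helper: it collects the code points produced
// by ranging over s.
func runes(s string) []rune {
	var out []rune
	for _, r := range s {
		out = append(out, r)
	}
	return out
}

func main() {
	s := "\xFF語"
	for i, r := range s {
		// i is the byte offset at which the code point starts.
		fmt.Printf("%d: %U\n", i, r)
	}
	// Prints:
	// 0: U+FFFD
	// 1: U+8A9E
	fmt.Println(len(runes(s))) // 2
}
```

The value itself is unchanged by the loop: it still holds the byte `ff`, and U+FFFD appears only in the decoded output.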