Strings are not required to be UTF-8. Go source code is required
to be UTF-8. There is a complex path between the two.

In short, there are three kinds of strings. They are:

1. the substring of the source that lexes into a string literal.
2. a string literal.
3. a value of type string.

Only the first is required to be UTF-8. The second is required to be
written in UTF-8, but its contents are interpreted in various ways
and may encode arbitrary bytes. The third can contain any bytes at
all.
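As a minimal sketch of the third kind, the program below checks a few string values with `utf8.ValidString` from the standard library. The helper name `describe` is made up for this illustration. The file itself is valid UTF-8, yet two of the values it produces are not.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper: it reports the byte length of s
// and whether those bytes form valid UTF-8.
func describe(s string) (n int, valid bool) {
	return len(s), utf8.ValidString(s)
}

func main() {
	// The escape sequence yields the single byte 0xFF in the value.
	fromEscape := "\xFF"
	// A value built at run time from arbitrary bytes.
	fromBytes := string([]byte{0xC0, 0x80})

	fmt.Println(describe(fromEscape)) // 1 false
	fmt.Println(describe(fromBytes))  // 2 false
	fmt.Println(describe("語"))        // 3 true
}
```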

Try this example:

```
var s string = "\xFF語"
```

Source substring: ` "\xFF語" `, UTF-8 encoded. The data:

```
22
5c
78
46
46
e8
aa
9e
22
```

String literal: ` \xFF語 ` (between the quotes). The data:

```
5c
78
46
46
e8
aa
9e
```
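One way to see these literal-text bytes from inside a program is to write the same characters in a raw string literal, where backslash escapes are not interpreted. This is only a sketch; `hexBytes` is a hypothetical helper name.

```go
package main

import "fmt"

// hexBytes is a hypothetical helper: it formats the bytes of s as
// space-separated lowercase hex.
func hexBytes(s string) string {
	return fmt.Sprintf("% x", s)
}

func main() {
	// Raw literal: the backslash, x, F, F are kept as four separate bytes.
	raw := `\xFF語`
	fmt.Println(hexBytes(raw)) // 5c 78 46 46 e8 aa 9e
}
```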

The string value (unprintable; because of the FF byte it is not valid UTF-8). The data:

```
ff
e8
aa
9e
```
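The same bytes can be checked by indexing the value byte by byte. The sketch below assumes nothing beyond the standard `fmt` package; `byteLines` is a name invented for the example.

```go
package main

import "fmt"

// byteLines is a hypothetical helper: it returns each byte of s as a
// two-digit hex string, one element per byte.
func byteLines(s string) []string {
	out := make([]string, 0, len(s))
	for i := 0; i < len(s); i++ { // index by byte, not by rune
		out = append(out, fmt.Sprintf("%02x", s[i]))
	}
	return out
}

func main() {
	var s string = "\xFF語"
	fmt.Println(len(s), byteLines(s)) // 4 [ff e8 aa 9e]
}
```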

And for the record, the characters (code points):
```
<erroneous byte FF, will appear as U+FFFD if you range over the string value>
語 U+8A9E
```
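A short sketch of ranging over the value shows the substitution. `describeRunes` is a hypothetical helper, and the offsets it prints are byte offsets into the string, not character counts.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describeRunes is a hypothetical helper: it ranges over s and records
// the byte offset and code point of each decoded rune.
func describeRunes(s string) []string {
	var out []string
	for i, r := range s {
		out = append(out, fmt.Sprintf("offset %d: %U", i, r))
	}
	return out
}

func main() {
	s := "\xFF語"
	for _, line := range describeRunes(s) {
		fmt.Println(line)
	}
	// offset 0: U+FFFD
	// offset 1: U+8A9E

	// The value as a whole is not valid UTF-8.
	fmt.Println(utf8.ValidString(s)) // false
}
```

Note that a string genuinely containing the character U+FFFD would also yield U+FFFD when ranged over, so the rune alone does not distinguish the two cases.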