html: handle '<' before a tag
As pointed out at
https://groups.google.com/forum/#!topic/golang-nuts/LJozHIXAAJY,
`<<p>html</p>` was parsed as `&lt;&lt;p&gt;html</p>`.
There was no test case for this. Chrome parses it as `&lt;<p>html</p>`,
and that seems to be correct. We were missing the
"Reconsume the current input character" step at
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tag-open-state
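
The effect of the missing step can be sketched with a toy scanner. This is
only an illustration under simplifying assumptions, not the real
html.Tokenizer: `tokenize`, `isLetter` and the `reconsume` flag are
hypothetical names, tags are taken to end at the first '>', and comments and
doctypes are ignored. Tokens are joined with "$" in the same style as the
expectations in token_test.go.

```go
package main

import (
	"fmt"
	"strings"
)

// isLetter reports whether c is an ASCII letter.
func isLetter(c byte) bool {
	return 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z'
}

// tokenize is a hypothetical, heavily simplified scanner. It splits s into
// text runs and "<...>" tags and joins them with "$". With reconsume set to
// false it mimics the old behavior of swallowing the character that follows
// a '<' which does not open a tag.
func tokenize(s string, reconsume bool) string {
	var toks []string
	start := 0 // start of the pending text run
	i := 0
	for i < len(s) {
		if s[i] != '<' || i+1 == len(s) {
			i++
			continue
		}
		c := s[i+1] // the character examined in the "tag open" state
		if isLetter(c) || c == '/' {
			end := strings.IndexByte(s[i:], '>')
			if end < 0 {
				break // unterminated tag: leave the rest as text
			}
			if start < i {
				toks = append(toks, s[start:i]) // flush pending text
			}
			toks = append(toks, s[i:i+end+1])
			i += end + 1
			start = i
			continue
		}
		// Not a tag, so the '<' is plain text.
		if reconsume {
			i++ // step past '<' only; c is examined again next iteration
		} else {
			i += 2 // old behavior: c is swallowed, even if it is another '<'
		}
	}
	if start < len(s) {
		toks = append(toks, s[start:])
	}
	return strings.Join(toks, "$")
}

func main() {
	fmt.Println(tokenize("<<p>html</p>", false)) // <<p>html$</p>
	fmt.Println(tokenize("<<p>html</p>", true))  // <$<p>$html$</p>
}
```

Without the reconsume step the second '<' is consumed as ordinary text, so
the `<p>` start tag is never seen. With it, the second '<' gets a fresh
chance to open a tag.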
LGTM=nigeltao
R=golang-codereviews, gobot, nigeltao
CC=golang-codereviews, nigeltao
https://golang.org/cl/96060044
diff --git a/html/token.go b/html/token.go
index 3c17c7e..a226099 100644
--- a/html/token.go
+++ b/html/token.go
@@ -1002,6 +1002,8 @@
// "<!DOCTYPE declarations>" and "<?xml processing instructions?>".
tokenType = CommentToken
default:
+ // Reconsume the current character.
+ z.raw.end--
continue
}
diff --git a/html/token_test.go b/html/token_test.go
index 38d80d7..f6988a8 100644
--- a/html/token_test.go
+++ b/html/token_test.go
@@ -105,6 +105,11 @@
"if x<0 and y < 0 then x*y>0",
"if x<0 and y < 0 then x*y>0",
},
+ {
+ "not a tag #11",
+ "<<p>",
+ "<$<p>",
+ },
// EOF in a tag name.
{
"tag name eof #0",