regexp/syntax: fix factoring of common prefixes in alternations
In the past, `a.*?c|a.*?b` was factored to `a.*?[bc]`. Thus, given
"abc" as its input string, the automaton would consume "ab" and
then stop (when unanchored) whereas it should consume all of "abc"
as per leftmost semantics.
Fixes #13812.
Change-Id: I67ac0a353d7793b3d0c9c4aaf22d157621dfe784
Reviewed-on: https://go-review.googlesource.com/18357
Reviewed-by: Russ Cox <rsc@golang.org>
diff --git a/src/regexp/syntax/parse_test.go b/src/regexp/syntax/parse_test.go
index 626ceea..5ca54bb 100644
--- a/src/regexp/syntax/parse_test.go
+++ b/src/regexp/syntax/parse_test.go
@@ -172,7 +172,7 @@
// Factoring.
{`abc|abd|aef|bcx|bcy`, `alt{cat{lit{a}alt{cat{lit{b}cc{0x63-0x64}}str{ef}}}cat{str{bc}cc{0x78-0x79}}}`},
- {`ax+y|ax+z|ay+w`, `cat{lit{a}alt{cat{plus{lit{x}}cc{0x79-0x7a}}cat{plus{lit{y}}lit{w}}}}`},
+ {`ax+y|ax+z|ay+w`, `cat{lit{a}alt{cat{plus{lit{x}}lit{y}}cat{plus{lit{x}}lit{z}}cat{plus{lit{y}}lit{w}}}}`},
// Bug fixes.
{`(?:.)`, `dot{}`},
@@ -195,12 +195,13 @@
{`abc|x|abd`, `alt{str{abc}lit{x}str{abd}}`},
{`(?i)abc|ABD`, `cat{strfold{AB}cc{0x43-0x44 0x63-0x64}}`},
{`[ab]c|[ab]d`, `cat{cc{0x61-0x62}cc{0x63-0x64}}`},
- {`(?:xx|yy)c|(?:xx|yy)d`,
- `cat{alt{str{xx}str{yy}}cc{0x63-0x64}}`},
+ {`.c|.d`, `cat{dot{}cc{0x63-0x64}}`},
{`x{2}|x{2}[0-9]`,
`cat{rep{2,2 lit{x}}alt{emp{}cc{0x30-0x39}}}`},
{`x{2}y|x{2}[0-9]y`,
`cat{rep{2,2 lit{x}}alt{lit{y}cat{cc{0x30-0x39}lit{y}}}}`},
+ {`a.*?c|a.*?b`,
+ `cat{lit{a}alt{cat{nstar{dot{}}lit{c}}cat{nstar{dot{}}lit{b}}}}`},
// Valid repetitions.
{`((((((((((x{2}){2}){2}){2}){2}){2}){2}){2}){2}))`, ``},