gosrc: Allow Unicode letters in import paths.
Background
The following is a valid vanity import path that works without issues:
dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание
You can run go get, go install, go test, and go doc on it without issues:
$ go get -u dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание
$ go install dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание
$ go test dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание
ok dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание 0.014s
$ go doc dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание
package испытание // import "dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание"
Package испытание demonstrates Unicode capabilities in Go source code.
type Эксперимент struct{ ... }
func Испытание() Эксперимент
You can also call vcs.RepoRootForImportPath("dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание", false)
(from golang.org/x/tools/go/vcs) successfully on the vanity import path:
$ goexec 'vcs.RepoRootForImportPath("dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание", false)'
(*vcs.RepoRoot)(&vcs.RepoRoot{
VCS: (*vcs.Cmd)(&vcs.Cmd{
Name: (string)("Git"),
Cmd: (string)("git"),
CreateCmd: (string)("clone {repo} {dir}"),
DownloadCmd: (string)("pull --ff-only"),
TagCmd: ([]vcs.TagCmd)([]vcs.TagCmd{
(vcs.TagCmd)(vcs.TagCmd{
Cmd: (string)("show-ref"),
Pattern: (string)("(?:tags|origin)/(\\S+)$"),
}),
}),
TagLookupCmd: ([]vcs.TagCmd)([]vcs.TagCmd{
(vcs.TagCmd)(vcs.TagCmd{
Cmd: (string)("show-ref tags/{tag} origin/{tag}"),
Pattern: (string)("((?:tags|origin)/\\S+)$"),
}),
}),
TagSyncCmd: (string)("checkout {tag}"),
TagSyncDefault: (string)("checkout master"),
LogCmd: (string)(""),
Scheme: ([]string)([]string{
(string)("git"),
(string)("https"),
(string)("http"),
(string)("git+ssh"),
}),
PingCmd: (string)("ls-remote {scheme}://{repo}"),
}),
Repo: (string)("https://github.com/shurcooL-test/go-get-issue-unicode"),
Root: (string)("dmitri.shuralyov.com/temp/go-get-issue-unicode"),
})
(interface{})(nil)
However, gosrc.IsValidRemotePath incorrectly reports false for the
"dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание" import path.
Fix
gosrc.IsValidRemotePath reports false for such import paths because
the validPathElement regexp only allows the ASCII letters A-Za-z, not
Unicode letters.
This change fixes that by replacing A-Za-z with \p{L}, the predefined
Unicode character property class that matches any Unicode letter.
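
As a rough illustration of the difference, the sketch below compiles
the old and new patterns (copied from the gosrc/path.go hunk in this
change) and runs both against a few path elements. The variable names
asciiPathElement and unicodePathElement are made up for this example
and do not appear in gosrc.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are copied from the gosrc/path.go diff in this change.
// The variable names are hypothetical and used only for this sketch.
var (
	// Old pattern: ASCII letters only.
	asciiPathElement = regexp.MustCompile(`^[-A-Za-z0-9~+_][-A-Za-z0-9_.]*$`)
	// New pattern: \p{L} matches any Unicode letter.
	unicodePathElement = regexp.MustCompile(`^[-\p{L}0-9~+_][-\p{L}0-9_.]*$`)
)

func main() {
	// "a b" contains a space, which neither pattern allows.
	for _, elem := range []string{"repo", "испытание", "a b"} {
		fmt.Printf("%q: old=%t new=%t\n", elem,
			asciiPathElement.MatchString(elem),
			unicodePathElement.MatchString(elem))
	}
}
```

Only the letter ranges differ between the two patterns, so elements
that were rejected for other reasons (such as containing a space) are
still rejected.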
Additionally, fix an issue where a query parameter value was not
correctly escaped when constructing a URL.
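
A minimal sketch of the escaping fix, assuming a hypothetical helper
named commitsURL (gosrc builds the URL inline, as the gosrc/github.go
hunk shows). It appends the directory as a query parameter using
url.QueryEscape from net/url, so characters such as spaces, "&", and
non-ASCII letters cannot corrupt the query string.

```go
package main

import (
	"fmt"
	"net/url"
)

// commitsURL is a hypothetical helper for illustration only. It appends
// dir as an escaped "path" query parameter when dir is non-empty.
func commitsURL(base, dir string) string {
	if dir == "" {
		return base
	}
	return base + "?path=" + url.QueryEscape(dir)
}

func main() {
	// The owner and repo values here are placeholders.
	base := "https://api.github.com/repos/owner/repo/commits"
	fmt.Println(commitsURL(base, ""))
	fmt.Println(commitsURL(base, "a b&c"))
	fmt.Println(commitsURL(base, "/испытание"))
}
```

The diff also renames the local variable from url to u. A local named
url would shadow the net/url package and make url.QueryEscape
unavailable in that scope.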
Fixes golang/gddo#468.
Updates golang/go#18660.
References
- https://stackoverflow.com/questions/3617797/regex-to-match-only-letters
- https://stackoverflow.com/questions/6005459/is-there-a-way-to-match-any-unicode-non-alphabetic-character
- https://www.regular-expressions.info/unicode.html#prop
Change-Id: I48680749d827cbc63fefca2c21e9790009f20746
Reviewed-on: https://go-review.googlesource.com/41750
Reviewed-by: Chris Broadfoot <cbro@golang.org>
Reviewed-by: Tuo Shan <shantuo@google.com>
Reviewed-by: Francesc Campoy Flores <campoy@golang.org>
diff --git a/gosrc/github.go b/gosrc/github.go
index a96cf92..276a1fa 100644
--- a/gosrc/github.go
+++ b/gosrc/github.go
@@ -75,11 +75,11 @@
status := Active
var commits []*githubCommit
- url := expand("https://api.github.com/repos/{owner}/{repo}/commits", match)
+ u := expand("https://api.github.com/repos/{owner}/{repo}/commits", match)
if match["dir"] != "" {
- url += fmt.Sprintf("?path=%s", match["dir"])
+ u += fmt.Sprintf("?path=%s", url.QueryEscape(match["dir"]))
}
- if _, err := c.getJSON(url, &commits); err != nil {
+ if _, err := c.getJSON(u, &commits); err != nil {
return nil, err
}
if len(commits) == 0 {
diff --git a/gosrc/path.go b/gosrc/path.go
index 8219d68..cde7c5a 100644
--- a/gosrc/path.go
+++ b/gosrc/path.go
@@ -15,7 +15,7 @@
)
var validHost = regexp.MustCompile(`^[-a-z0-9]+(?:\.[-a-z0-9]+)+$`)
-var validPathElement = regexp.MustCompile(`^[-A-Za-z0-9~+_][-A-Za-z0-9_.]*$`)
+var validPathElement = regexp.MustCompile(`^[-\p{L}0-9~+_][-\p{L}0-9_.]*$`)
func isValidPathElement(s string) bool {
return validPathElement.MatchString(s)
diff --git a/gosrc/path_test.go b/gosrc/path_test.go
index 965b75b..faa51d4 100644
--- a/gosrc/path_test.go
+++ b/gosrc/path_test.go
@@ -21,6 +21,7 @@
"launchpad.net/~user/+junk/version",
"github.com/user/repo/_ok/x",
"exampleproject.com",
+ "exampleproject.com/unicode/испытание",
}
var badImportPaths = []string{