cmd/compile,cmd/asm: fix function pointer call perf regression on ppc64

by inserting hint when using bclrl.

Using this instruction as subroutine call is not the expected
default behavior, and as a result confuses the branch predictor.

The default expected behavior is a conditional return from a
subroutine.

We can change this assumption by encoding a hint this is not a
subroutine return.

The regex benchmarks are a pretty good example of how much this
hint can help generic ppc64le code on a power9 machine:

name                          old time/op    new time/op     delta
Find                             606ns ± 0%      447ns ± 0%  -26.27%
FindAllNoMatches                 309ns ± 0%      205ns ± 0%  -33.72%
FindString                       609ns ± 0%      451ns ± 0%  -26.04%
FindSubmatch                     734ns ± 0%      594ns ± 0%  -19.07%
FindStringSubmatch               706ns ± 0%      574ns ± 0%  -18.83%
Literal                          177ns ± 0%      136ns ± 0%  -22.89%
NotLiteral                      4.69µs ± 0%     2.34µs ± 0%  -50.14%
MatchClass                      6.05µs ± 0%     3.26µs ± 0%  -46.08%
MatchClass_InRange              5.93µs ± 0%     3.15µs ± 0%  -46.86%
ReplaceAll                      3.15µs ± 0%     2.18µs ± 0%  -30.77%
AnchoredLiteralShortNonMatch     156ns ± 0%      109ns ± 0%  -30.61%
AnchoredLiteralLongNonMatch      192ns ± 0%      136ns ± 0%  -29.34%
AnchoredShortMatch               268ns ± 0%      209ns ± 0%  -22.00%
AnchoredLongMatch                472ns ± 0%      357ns ± 0%  -24.30%
OnePassShortA                   1.16µs ± 0%     0.87µs ± 0%  -25.03%
NotOnePassShortA                1.34µs ± 0%     1.20µs ± 0%  -10.63%
OnePassShortB                    940ns ± 0%      655ns ± 0%  -30.29%
NotOnePassShortB                 873ns ± 0%      703ns ± 0%  -19.52%
OnePassLongPrefix                258ns ± 0%      155ns ± 0%  -40.13%
OnePassLongNotPrefix             943ns ± 0%      529ns ± 0%  -43.89%
MatchParallelShared              591ns ± 0%      436ns ± 0%  -26.31%
MatchParallelCopied              596ns ± 0%      435ns ± 0%  -27.10%
QuoteMetaAll                     186ns ± 0%      186ns ± 0%   -0.16%
QuoteMetaNone                   55.9ns ± 0%     55.9ns ± 0%   +0.02%
Compile/Onepass                 9.64µs ± 0%     9.26µs ± 0%   -3.97%
Compile/Medium                  21.7µs ± 0%     20.6µs ± 0%   -4.90%
Compile/Hard                     174µs ± 0%      174µs ± 0%   +0.07%
Match/Easy0/16                  7.35ns ± 0%     7.34ns ± 0%   -0.11%
Match/Easy0/32                   116ns ± 0%       97ns ± 0%  -16.27%
Match/Easy0/1K                   592ns ± 0%      562ns ± 0%   -5.04%
Match/Easy0/32K                 12.6µs ± 0%     12.5µs ± 0%   -0.64%
Match/Easy0/1M                   556µs ± 0%      556µs ± 0%   -0.00%
Match/Easy0/32M                 17.7ms ± 0%     17.7ms ± 0%   +0.05%
Match/Easy0i/16                 7.34ns ± 0%     7.35ns ± 0%   +0.10%
Match/Easy0i/32                 2.82µs ± 0%     1.64µs ± 0%  -41.71%
Match/Easy0i/1K                 83.2µs ± 0%     48.2µs ± 0%  -42.06%
Match/Easy0i/32K                2.13ms ± 0%     1.80ms ± 0%  -15.34%
Match/Easy0i/1M                 68.1ms ± 0%     57.6ms ± 0%  -15.31%
Match/Easy0i/32M                 2.18s ± 0%      1.80s ± 0%  -17.52%
Match/Easy1/16                  7.36ns ± 0%     7.34ns ± 0%   -0.24%
Match/Easy1/32                   118ns ± 0%       96ns ± 0%  -18.72%
Match/Easy1/1K                  2.46µs ± 0%     1.58µs ± 0%  -35.65%
Match/Easy1/32K                 80.2µs ± 0%     54.6µs ± 0%  -31.92%
Match/Easy1/1M                  2.75ms ± 0%     1.88ms ± 0%  -31.66%
Match/Easy1/32M                 87.5ms ± 0%     59.8ms ± 0%  -31.62%
Match/Medium/16                 7.34ns ± 0%     7.34ns ± 0%   +0.01%
Match/Medium/32                 2.60µs ± 0%     1.50µs ± 0%  -42.61%
Match/Medium/1K                 78.1µs ± 0%     43.7µs ± 0%  -44.06%
Match/Medium/32K                2.08ms ± 0%     1.52ms ± 0%  -27.11%
Match/Medium/1M                 66.5ms ± 0%     48.6ms ± 0%  -26.96%
Match/Medium/32M                 2.14s ± 0%      1.60s ± 0%  -25.18%
Match/Hard/16                   7.35ns ± 0%     7.35ns ± 0%   +0.03%
Match/Hard/32                   3.58µs ± 0%     2.44µs ± 0%  -31.82%
Match/Hard/1K                    108µs ± 0%       75µs ± 0%  -31.04%
Match/Hard/32K                  2.79ms ± 0%     2.25ms ± 0%  -19.30%
Match/Hard/1M                   89.4ms ± 0%     72.2ms ± 0%  -19.26%
Match/Hard/32M                   2.91s ± 0%      2.37s ± 0%  -18.60%
Match/Hard1/16                  11.1µs ± 0%      8.3µs ± 0%  -25.07%
Match/Hard1/32                  21.4µs ± 0%     16.1µs ± 0%  -24.85%
Match/Hard1/1K                   658µs ± 0%      498µs ± 0%  -24.27%
Match/Hard1/32K                 12.2ms ± 0%     11.7ms ± 0%   -4.60%
Match/Hard1/1M                   391ms ± 0%      374ms ± 0%   -4.40%
Match/Hard1/32M                  12.6s ± 0%      12.0s ± 0%   -4.68%
Match_onepass_regex/16           870ns ± 0%      611ns ± 0%  -29.79%
Match_onepass_regex/32          1.58µs ± 0%     1.08µs ± 0%  -31.48%
Match_onepass_regex/1K          45.7µs ± 0%     30.3µs ± 0%  -33.58%
Match_onepass_regex/32K         1.45ms ± 0%     0.97ms ± 0%  -33.20%
Match_onepass_regex/1M          46.2ms ± 0%     30.9ms ± 0%  -33.01%
Match_onepass_regex/32M          1.46s ± 0%      0.99s ± 0%  -32.02%

name                          old alloc/op   new alloc/op    delta
Find                             0.00B           0.00B         0.00%
FindAllNoMatches                 0.00B           0.00B         0.00%
FindString                       0.00B           0.00B         0.00%
FindSubmatch                     48.0B ± 0%      48.0B ± 0%    0.00%
FindStringSubmatch               32.0B ± 0%      32.0B ± 0%    0.00%
Compile/Onepass                 4.02kB ± 0%     4.02kB ± 0%    0.00%
Compile/Medium                  9.39kB ± 0%     9.39kB ± 0%    0.00%
Compile/Hard                    84.7kB ± 0%     84.7kB ± 0%    0.00%
Match_onepass_regex/16           0.00B           0.00B         0.00%
Match_onepass_regex/32           0.00B           0.00B         0.00%
Match_onepass_regex/1K           0.00B           0.00B         0.00%
Match_onepass_regex/32K          0.00B           0.00B         0.00%
Match_onepass_regex/1M           5.00B ± 0%      3.00B ± 0%  -40.00%
Match_onepass_regex/32M           136B ± 0%        68B ± 0%  -50.00%

name                          old allocs/op  new allocs/op   delta
Find                              0.00            0.00         0.00%
FindAllNoMatches                  0.00            0.00         0.00%
FindString                        0.00            0.00         0.00%
FindSubmatch                      1.00 ± 0%       1.00 ± 0%    0.00%
FindStringSubmatch                1.00 ± 0%       1.00 ± 0%    0.00%
Compile/Onepass                   52.0 ± 0%       52.0 ± 0%    0.00%
Compile/Medium                     112 ± 0%        112 ± 0%    0.00%
Compile/Hard                       424 ± 0%        424 ± 0%    0.00%
Match_onepass_regex/16            0.00            0.00         0.00%
Match_onepass_regex/32            0.00            0.00         0.00%
Match_onepass_regex/1K            0.00            0.00         0.00%
Match_onepass_regex/32K           0.00            0.00         0.00%
Match_onepass_regex/1M            0.00            0.00         0.00%
Match_onepass_regex/32M           2.00 ± 0%       1.00 ± 0%  -50.00%

name                          old speed      new speed       delta
QuoteMetaAll                  75.2MB/s ± 0%   75.3MB/s ± 0%   +0.15%
QuoteMetaNone                  465MB/s ± 0%    465MB/s ± 0%   -0.02%
Match/Easy0/16                2.18GB/s ± 0%   2.18GB/s ± 0%   +0.10%
Match/Easy0/32                 276MB/s ± 0%    330MB/s ± 0%  +19.46%
Match/Easy0/1K                1.73GB/s ± 0%   1.82GB/s ± 0%   +5.29%
Match/Easy0/32K               2.60GB/s ± 0%   2.62GB/s ± 0%   +0.64%
Match/Easy0/1M                1.89GB/s ± 0%   1.89GB/s ± 0%   +0.00%
Match/Easy0/32M               1.89GB/s ± 0%   1.89GB/s ± 0%   -0.05%
Match/Easy0i/16               2.18GB/s ± 0%   2.18GB/s ± 0%   -0.10%
Match/Easy0i/32               11.4MB/s ± 0%   19.5MB/s ± 0%  +71.48%
Match/Easy0i/1K               12.3MB/s ± 0%   21.2MB/s ± 0%  +72.62%
Match/Easy0i/32K              15.4MB/s ± 0%   18.2MB/s ± 0%  +18.12%
Match/Easy0i/1M               15.4MB/s ± 0%   18.2MB/s ± 0%  +18.12%
Match/Easy0i/32M              15.4MB/s ± 0%   18.6MB/s ± 0%  +21.21%
Match/Easy1/16                2.17GB/s ± 0%   2.18GB/s ± 0%   +0.24%
Match/Easy1/32                 271MB/s ± 0%    333MB/s ± 0%  +23.07%
Match/Easy1/1K                 417MB/s ± 0%    648MB/s ± 0%  +55.38%
Match/Easy1/32K                409MB/s ± 0%    600MB/s ± 0%  +46.88%
Match/Easy1/1M                 381MB/s ± 0%    558MB/s ± 0%  +46.33%
Match/Easy1/32M                383MB/s ± 0%    561MB/s ± 0%  +46.25%
Match/Medium/16               2.18GB/s ± 0%   2.18GB/s ± 0%   -0.01%
Match/Medium/32               12.3MB/s ± 0%   21.4MB/s ± 0%  +74.13%
Match/Medium/1K               13.1MB/s ± 0%   23.4MB/s ± 0%  +78.73%
Match/Medium/32K              15.7MB/s ± 0%   21.6MB/s ± 0%  +37.23%
Match/Medium/1M               15.8MB/s ± 0%   21.6MB/s ± 0%  +36.93%
Match/Medium/32M              15.7MB/s ± 0%   21.0MB/s ± 0%  +33.67%
Match/Hard/16                 2.18GB/s ± 0%   2.18GB/s ± 0%   -0.03%
Match/Hard/32                 8.93MB/s ± 0%  13.10MB/s ± 0%  +46.70%
Match/Hard/1K                 9.48MB/s ± 0%  13.74MB/s ± 0%  +44.94%
Match/Hard/32K                11.7MB/s ± 0%   14.5MB/s ± 0%  +23.87%
Match/Hard/1M                 11.7MB/s ± 0%   14.5MB/s ± 0%  +23.87%
Match/Hard/32M                11.6MB/s ± 0%   14.2MB/s ± 0%  +22.86%
Match/Hard1/16                1.44MB/s ± 0%   1.93MB/s ± 0%  +34.03%
Match/Hard1/32                1.49MB/s ± 0%   1.99MB/s ± 0%  +33.56%
Match/Hard1/1K                1.56MB/s ± 0%   2.05MB/s ± 0%  +31.41%
Match/Hard1/32K               2.68MB/s ± 0%   2.80MB/s ± 0%   +4.48%
Match/Hard1/1M                2.68MB/s ± 0%   2.80MB/s ± 0%   +4.48%
Match/Hard1/32M               2.66MB/s ± 0%   2.79MB/s ± 0%   +4.89%
Match_onepass_regex/16        18.4MB/s ± 0%   26.2MB/s ± 0%  +42.41%
Match_onepass_regex/32        20.2MB/s ± 0%   29.5MB/s ± 0%  +45.92%
Match_onepass_regex/1K        22.4MB/s ± 0%   33.8MB/s ± 0%  +50.54%
Match_onepass_regex/32K       22.6MB/s ± 0%   33.9MB/s ± 0%  +49.67%
Match_onepass_regex/1M        22.7MB/s ± 0%   33.9MB/s ± 0%  +49.27%
Match_onepass_regex/32M       23.0MB/s ± 0%   33.9MB/s ± 0%  +47.14%

Fixes #42709

Change-Id: Ice07fec2de4c5b1302febf8c2978ae8c1e4fd3e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/271337
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
2 files changed
tree: 3e609f9922f9a9d2ba92bdfc6650582910d434a2
  1. .gitattributes
  2. .github/
  3. .gitignore
  4. AUTHORS
  5. CONTRIBUTING.md
  6. CONTRIBUTORS
  7. LICENSE
  8. PATENTS
  9. README.md
  10. SECURITY.md
  11. api/
  12. doc/
  13. favicon.ico
  14. lib/
  15. misc/
  16. robots.txt
  17. src/
  18. test/
README.md

The Go Programming Language

Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.

Gopher image Gopher image by Renee French, licensed under Creative Commons 3.0 Attributions license.

Our canonical Git repository is located at https://go.googlesource.com/go. There is a mirror of the repository at https://github.com/golang/go.

Unless otherwise noted, the Go source files are distributed under the BSD-style license found in the LICENSE file.

Download and Install

Binary Distributions

Official binary distributions are available at https://golang.org/dl/.

After downloading a binary release, visit https://golang.org/doc/install or load doc/install.html in your web browser for installation instructions.

Install From Source

If a binary distribution is not available for your combination of operating system and architecture, visit https://golang.org/doc/install/source or load doc/install-source.html in your web browser for source installation instructions.

Contributing

Go is the work of thousands of contributors. We appreciate your help!

To contribute, please read the contribution guidelines: https://golang.org/doc/contribute.html

Note that the Go project uses the issue tracker for bug reports and proposals only. See https://golang.org/wiki/Questions for a list of places to ask questions about the Go language.