internal/postgres: change reprocessing logic

At the moment, we reprocess and requeue modules using the following
logic:

1. Set the status of all modules to be reprocessed to 505.
2. Requeue modules with status=0 or status >= 500. Prioritize the
   following:
   - IsLatest: sorted by release vs prerelease modules
   - IsBig: hardcoded list of modules we know are big

This poses the following problems:

1. Requeue order is not idempotent: priority is given to categories of
modules, but within each category, the order of modules being queued can
change each time requeue is called. This leads to many modules sitting
in the task queue, and a lack of clarity as to how much progress we have
made when looking at the logs.

2. Modules missing from isBig list: there are several large modules
missing from the isBig list, so they are not being accounted for. We
deprioritize large modules because they take a really long time to
process and can time out if too many are being processed at once, so we
want to process them at a slower rate than other modules.

3. Alternative modules have the same priority as non-alternative
modules: we usually don't care about alternative modules, and they will
be deleted from search_documents once identified. These should be
processed after the latest versions of non-alternative modules are
processed to prevent unnecessary deletes.

To address these issues, reprocessing and requeueing now use the
following logic:
1. All modules are marked for reprocessing with a 50x status code based
on their last fetch status in module version states.
2. Modules are requeued in the following order (with the exception of large modules):
- Latest version of modules previously with 20x status
- Latest version of bad modules and alternative modules
- Any version of modules previously with 20x status
- Any version of bad modules and alternative modules
- Any module with a status=0 or status=500 (we expect these to already be in the queue)
3. All large modules are queued last, since these take up a lot of time
and need to be processed at a slower rate.
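
The ordering above can be sketched as a function that maps each module
to a category number, where lower numbers are requeued first. This is a
minimal illustration, not the code in this change: the module type, its
fields, and the prevOutcome values are hypothetical names. Placing
status=0 or status=500 modules in the fifth category regardless of
IsLatest is an assumption about how the list is meant to be read.

```go
package main

import "fmt"

// prevOutcome is a hypothetical summary of a module's last fetch status
// before it was marked for reprocessing.
type prevOutcome int

const (
	prevOK               prevOutcome = iota // previously 20x
	prevBadOrAlternative                    // bad or alternative module
	prevUnprocessed                         // status=0 or status=500
)

// module is a hypothetical, simplified record; the field names are
// illustrative and not taken from the repository.
type module struct {
	Path   string
	Latest bool
	Large  bool
	Prev   prevOutcome
}

// requeuePriority returns the category a module falls into.
// Lower values are requeued first; large modules always go last.
func requeuePriority(m module) int {
	switch {
	case m.Large:
		return 6
	case m.Prev == prevUnprocessed:
		return 5 // assumed to already be in the queue
	case m.Latest && m.Prev == prevOK:
		return 1
	case m.Latest:
		return 2 // latest version of a bad or alternative module
	case m.Prev == prevOK:
		return 3
	default:
		return 4 // any version of a bad or alternative module
	}
}

func main() {
	mods := []module{
		{Path: "example.com/ok", Latest: true, Prev: prevOK},
		{Path: "example.com/alt", Latest: true, Prev: prevBadOrAlternative},
		{Path: "example.com/old", Prev: prevOK},
		{Path: "example.com/big", Latest: true, Large: true, Prev: prevOK},
	}
	for _, m := range mods {
		fmt.Printf("%-18s category %d\n", m.Path, requeuePriority(m))
	}
}
```

Checking Large first is what makes large modules an exception to the
rest of the ordering: a large module lands in the last category even if
it is the latest version and previously had a 20x status.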

Within each category, modules are sorted as follows:
1. num_packages
2. version DESC
3. module_path

This keeps the order idempotent, and prioritizes smaller and newer
modules. It also allows modules of similar sizes to be processed
together.

Change-Id: I49580ed75bf60cc2698b756882bfdc906f72d935
Reviewed-on: https://team-review.git.corp.google.com/c/golang/discovery/+/725873
Reviewed-by: Jonathan Amsterdam <jba@google.com>
6 files changed
tree: fd002cf0adf7f4d348581d998715383aa1012e52
  1. cmd/
  2. content/
  3. doc/
  4. internal/
  5. migrations/
  6. scripts/
  7. third_party/
  8. .gitignore
  9. .prettierrc.yaml
  10. all.bash
  11. cloudbuild.yaml
  12. CONTRIBUTING.md
  13. go.mod
  14. go.sum
  15. LICENSE
  16. PATENTS
  17. README.md
README.md

Pkg.go.dev

Pkg.go.dev is a website for discovering and evaluating Go packages and modules.

Roadmap

Pkg.go.dev launched in November 2019, and is currently under active development by the Go team.

Our current goal is to work towards redirecting godoc.org traffic to pkg.go.dev, and ensure that we address users' needs in the process. Read more about our plans for pkg.go.dev in 2020.

We encourage everyone to begin using pkg.go.dev today for all of their needs and to file feedback! You can redirect all of your requests from godoc.org to pkg.go.dev by visiting godoc.org/?redirect=on. Details at Go issue #37099.

If you are having issues with pkg.go.dev, please first check the known issues before following the troubleshooting guide. If that does not give you the information you need, reach out to us.

Issues

You can chat with us on the #tools Slack channel on the Gophers Slack.

If you think you have an issue that needs fixing, or a feature suggestion, then please make sure you follow the steps to file an issue with the right information to allow us to address it.

Contributing

We would love your help!

Our canonical Git repository is located at go.googlesource.com/discovery. There is a mirror of the repository at github.com/golang/discovery.

To contribute, please read our contributing guide.

License

Unless otherwise noted, the Go source files are distributed under the BSD-style license found in the LICENSE file.