blob: d00495b0349859e7b5e62f6deff2896eeead7907 [file] [log] [blame]
// Copyright 2024 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Gaby is an experimental new bot running in the Go issue tracker as [@gabyhelp],
// to try to help automate various mundane things that a machine can do reasonably well,
// as well as to try to discover new things that a machine can do reasonably well.
//
// The name gaby is short for “Go AI Bot”, because one of the purposes of the experiment
// is to learn what LLMs can be used for effectively, including identifying what they should
// not be used for. Some of the gaby functionality will involve LLMs; other functionality will not.
// The guiding principle is to create something that helps maintainers and that maintainers like,
// which means to use LLMs when they make sense and help but not when they don't.
//
// In the long term, the intention is for this code base or a successor version
// to take over the current functionality of “gopherbot” and become [@gopherbot],
// at which point the @gabyhelp account will be retired.
//
// The [GitHub Discussion] is a good place to leave feedback about @gabyhelp.
//
// # Code Overview
//
// The bot functionality is implemented in internal packages in subdirectories.
// This comment gives a brief tour of the structure.
//
// An explicit goal for the Gaby code base is that it run well in many different environments,
// ranging from a maintainer's home server or even Raspberry Pi all the way up to a
// hosted cloud. (At the moment, Gaby runs on a Linux server in my basement.)
// Due to this emphasis on portability, Gaby defines its own interfaces for all the functionality
// it needs from the surrounding environment and then also defines a variety of
// implementations of those interfaces.
//
// Another explicit goal for the Gaby code base is that it be very well tested.
// (See my [Go Testing talk] for more about why this is so important.)
// Abstracting the various external functionality into interfaces also helps make
// testing easier, and some packages also provide explicit testing support.
//
// The result of both these goals is that Gaby defines some basic functionality
// like time-ordered indexing for itself instead of relying on some specific
// other implementation. In the grand scheme of things, these are a small amount
// of code to maintain, and the benefits to both portability and testability are
// significant.
//
// # Testing
//
// Code interacting with services like GitHub and code running on cloud servers
// is typically difficult to test and therefore undertested.
// It is an explicit requirement in this repo to test all the code,
// even (and especially) when testing is difficult.
//
// A useful command to have available when working in the code is
// [rsc.io/uncover], which prints the package source lines not covered by a unit test.
// A useful invocation is:
//
// % go install rsc.io/uncover@latest
// % go test && go test -coverprofile=/tmp/c.out && uncover /tmp/c.out
// PASS
// ok golang.org/x/oscar/internal/related 0.239s
// PASS
// coverage: 92.2% of statements
// ok golang.org/x/oscar/internal/related 0.197s
// related.go:180,181
// p.slog.Error("triage parse createdat", "CreatedAt", issue.CreatedAt, "err", err)
// continue
// related.go:203,204
// p.slog.Error("triage lookup failed", "url", u)
// continue
// related.go:250,251
// p.slog.Error("PostIssueComment", "issue", e.Issue, "err", err)
// continue
// %
//
// The first “go test” command checks that the test passes.
// The second repeats the test with coverage enabled.
// Running the test twice this way makes sure that any syntax or type errors
// reported by the compiler are reported without coverage,
// because coverage can mangle the error output.
// After both tests pass and second writes a coverage profile,
// running “uncover /tmp/c.out” prints the uncovered lines.
//
// In this output, there are three error paths that are untested.
// In general, error paths should be tested, so tests should be written
// to cover these lines of code. In limited cases, it may not be practical
// to test a certain section, such as when code is unreachable but left
// in case of future changes or mistaken assumptions.
// That part of the code can be labeled with a comment beginning
// “// Unreachable” or “// unreachable” (usually with explanatory text following),
// and then uncover will not report it.
// If a code section should be tested but the test is being deferred to later,
// that section can be labeled “// Untested” or “// untested” instead.
//
// The [golang.org/x/oscar/internal/testutil] package provides a few other testing helpers.
//
// The overview of the code now proceeds from bottom up, starting with
// storage and working up to the actual bot.
//
// # Secret Storage
//
// Gaby needs to manage a few secret keys used to access services.
// The [golang.org/x/oscar/internal/secret] package defines the interface for
// obtaining those secrets. The only implementations at the moment
// are an in-memory map and a disk-based implementation that reads
// $HOME/.netrc. Future implementations may include other file formats
// as well as cloud-based secret storage services.
//
// Secret storage is intentionally separated from the main database storage,
// described below. The main database should hold public data, not secrets.
//
// # Large Language Models
//
// Gaby defines the interface it expects from a large language model.
//
// The [llm.Embedder] interface abstracts an LLM that can take a collection
// of documents and return their vector embeddings, each of type [llm.Vector].
// The two implementations to date are [golang.org/x/oscar/internal/gemini]
// and [golang.org/x/oscar/internal/ollama].
//
// For tests that need an embedder but don't care about the quality of
// the embeddings, [llm.QuoteEmbedder] copies a prefix of the text
// into the vector (preserving vector unit length) in a deterministic way.
// This is good enough for testing functionality like vector search
// and simplifies tests by avoiding a dependence on a real LLM.
//
// At the moment, only the embedding interface is defined.
// In the future we expect to add more interfaces around text generation
// and tool use.
//
// # Storage
//
// As noted above, Gaby defines interfaces for all the functionality it needs
// from its external environment, to admit a wide variety of implementations
// for both execution and testing. The lowest level interface is storage,
// defined in [golang.org/x/oscar/internal/storage].
//
// Gaby requires a key-value store that supports ordered traversal of key ranges
// and atomic batch writes up to a modest size limit (at least a few megabytes).
// The basic interface is [storage.DB].
// [storage.MemDB] returns an in-memory implementation useful for testing.
// Other implementations can be put through their paces using
// [storage.TestDB].
//
// The only real [storage.DB] implementation is [golang.org/x/oscar/internal/pebble],
// which is a [LevelDB]-derived on-disk key-value store developed and used
// as part of [CockroachDB]. It is a production-quality local storage implementation
// and maintains the database as a directory of files.
//
// In the future we plan to add an implementation using [Google Cloud Firestore],
// which provides a production-quality key-value lookup as a Cloud service
// without fixed baseline server costs.
// (Firestore is the successor to Google Cloud Datastore.)
//
// The [storage.DB] makes the simplifying assumption that storage never fails,
// or rather that if storage has failed then you'd rather crash your program than
// try to proceed through typically untested code paths.
// As such, methods like Get and Set do not return errors.
// They panic on failure, and clients of a DB can call the DB's Panic method
// to invoke the same kind of panic if they notice any corruption.
// It remains to be seen whether this decision is kept.
//
// In addition to the usual methods like Get, Set, and Delete, [storage.DB] defines
// Lock and Unlock methods that acquire and release named mutexes managed
// by the database layer. The purpose of these methods is to enable coordination
// when multiple instances of a Gaby program are running on a serverless cloud
// execution platform. So far Gaby has only run on an underground basement server
// (the opposite of cloud), so these have not been exercised much and the APIs
// may change.
//
// In addition to the regular database, package storage also defines [storage.VectorDB],
// a vector database for use with LLM embeddings.
// The basic operations are Set, Get, and Search.
// [storage.MemVectorDB] returns an in-memory implementation that
// stores the actual vectors in a [storage.DB] for persistence but also
// keeps a copy in memory and searches by comparing against all the vectors.
// When backed by a [storage.MemDB], this implementation is useful for testing,
// but when backed by a persistent database, the implementation suffices for
// small-scale production use (say, up to a million documents, which would
// require 3 GB of vectors).
//
// It is possible that the package ordering here is wrong and that VectorDB
// should be defined in the llm package, built on top of storage,
// and not the current “storage builds on llm”.
//
// # Ordered Keys
//
// Because Gaby makes minimal demands of its storage layer,
// any structure we want to impose must be implemented on top of it.
// Gaby uses the [rsc.io/ordered] encoding format to produce database keys
// that order in useful ways.
//
// For example, ordered.Encode("issue", 123) < ordered.Encode("issue", 1001),
// so that keys of this form can be used to scan through issues in numeric order.
// In contrast, using something like fmt.Sprintf("issue%d", n) would visit issue 1001
// before issue 123 because "1001" < "123".
//
// Using this kind of encoding is common when using NoSQL key-value storage.
// See the [rsc.io/ordered] package for the details of the specific encoding.
//
// # Timed Storage
//
// One of the implied jobs Gaby has is to collect all the relevant information
// about an open source project: its issues, its code changes, its documentation,
// and so on. Those sources are always changing, so derived operations like
// adding embeddings for documents need to be able to identify what is new
// and what has been processed already. To enable this, Gaby implements
// time-stamped—or just “timed”—storage, in which a collection of key-value pairs
// also has a “by time” index of ((timestamp, key), no-value) pairs to make it possible
// to scan only the key-value pairs modified after the previous scan.
// This kind of incremental scan only has to remember the last timestamp processed
// and then start an ordered key range scan just after that timestamp.
//
// This convention is implemented by [golang.org/x/oscar/internal/timed], along with
// a [timed.Watcher] that formalizes the incremental scan pattern.
//
// # Document Storage
//
// Various packages take care of downloading state from issue trackers and the like,
// but then all that state needs to be unified into a common document format that
// can be indexed and searched. That document format is defined by
// [golang.org/x/oscar/internal/docs]. A document consists of an ID (conventionally a URL),
// a document title, and document text. Documents are stored using timed storage,
// enabling incremental processing of newly added documents .
//
// # Document Embedding
//
// The next stop for any new document is embedding it into a vector and storing
// that vector in a vector database. The [golang.org/x/oscar/internal/embeddocs] package
// does this, and there is very little to it, given the abstractions of a document store
// with incremental scanning, an LLM embedder, and a vector database, all of which
// are provided by other packages.
//
// # HTTP Record and Replay
//
// None of the packages mentioned so far involve network operations, but the
// next few do. It is important to test those but also equally important not to
// depend on external network services in the tests. Instead, the package
// [golang.org/x/oscar/internal/httprr] provides an HTTP record/replay system specifically
// designed to help testing. It can be run once in a mode that does use external
// network servers and records the HTTP exchanges, but by default tests look up
// the expected responses in the previously recorded log, replaying those responses.
//
// The result is that code making HTTP request can be tested with real server
// traffic once and then re-tested with recordings of that traffic afterward.
// This avoids having to write entire fakes of services but also avoids needing
// the services to stay available in order for tests to pass. It also typically makes
// the tests much faster than using the real servers.
//
// # GitHub Interactions
//
// Gaby uses GitHub in two main ways. First, it downloads an entire copy of the
// issue tracker state, with incremental updates, into timed storage.
// Second, it performs actions in the issue tracker, like editing issues or comments,
// applying labels, or posting new comments. These operations are provided by
// [golang.org/x/oscar/internal/github].
//
// Gaby downloads the issue tracker state using GitHub's REST API, which makes
// incremental updating very easy but does not provide access to a few newer features
// such as project boards and discussions, which are only available in the GraphQL API.
// Sync'ing using the GraphQL API is left for future work: there is enough data available
// from the REST API that for now we can focus on what to do with that data and not
// that a few newer GitHub features are missing.
//
// The github package provides two important aids for testing. For issue tracker state,
// it also allows loading issue data from a simple text-based issue description, avoiding
// any actual GitHub use at all and making it easier to modify the test data.
// For issue tracker actions, the github package defaults in tests to not actually making
// changes, instead diverting edits into an in-memory log. Tests can then check the log
// to see whether the right edits were requested.
//
// The [golang.org/x/oscar/internal/githubdocs] package takes care of adding content from
// the downloaded GitHub state into the general document store.
// Currently the only GitHub-derived documents are one document per issue,
// consisting of the issue title and body. It may be worth experimenting with
// incorporating issue comments in some way, although they bring with them
// a significant amount of potential noise.
//
// # Gerrit Interactions
//
// Gaby will need to download and store Gerrit state into the database and then
// derive documents from it. That code has not yet been written, although
// [rsc.io/gerrit/reviewdb] provides a basic version that can be adapted.
//
// # Web Crawling
//
// Gaby will also need to download and store project documentation into the
// database and derive documents from it corresponding to cutting the page
// at each heading. That code has been written but is not yet tested well enough
// to commit. It will be added later.
//
// # Fixing Comments
//
// The simplest job Gaby has is to go around fixing new comments, including
// issue descriptions (which look like comments but are a different kind of GitHub data).
// The [golang.org/x/oscar/internal/commentfix] package implements this,
// watching GitHub state incrementally and applying a few kinds of rewrite rules
// to each new comment or issue body.
// The commentfix package allows automatically editing text, automatically editing URLs,
// and automatically hyperlinking text.
//
// # Finding Related Issues and Documents
//
// The next job Gaby has is to respond to new issues with related issues and documents.
// The [golang.org/x/oscar/internal/related] package implements this,
// watching GitHub state incrementally for new issues, filtering out ones that should be ignored,
// and then finding related issues and documents and posting a list.
//
// This package was originally intended to identify and automatically close duplicates,
// but the difference between a duplicate and a very similar or not-quite-fixed issue
// is too difficult a judgement to make for an LLM. Even so, the act of bringing forward
// related context that may have been forgotten or never known by the people reading
// the issue has turned out to be incredibly helpful.
//
// # Main Loop
//
// All of these pieces are put together in the main program, this package, [golang.org/x/oscar].
// The actual main package has no tests yet but is also incredibly straightforward.
// It does need tests, but we also need to identify ways that the hard-coded policies
// in the package can be lifted out into data that a natural language interface can
// manipulate. For example the current policy choices in package main amount to:
//
// cf := commentfix.New(lg, gh, "gerritlinks")
// cf.EnableProject("golang/go")
// cf.EnableEdits()
// cf.AutoLink(`\bCL ([0-9]+)\b`, "https://go.dev/cl/$1")
// cf.ReplaceURL(`\Qhttps://go-review.git.corp.google.com/\E`, "https://go-review.googlesource.com/")
//
// rp := related.New(lg, db, gh, vdb, dc, "related")
// rp.EnableProject("golang/go")
// rp.EnablePosts()
// rp.SkipBodyContains("— [watchflakes](https://go.dev/wiki/Watchflakes)")
// rp.SkipTitlePrefix("x/tools/gopls: release version v")
// rp.SkipTitleSuffix(" backport]")
//
// These could be stored somewhere as data and manipulated and added to by the LLM
// in response to prompts from maintainers. And other features could be added and
// configured in a similar way. Exactly how to do this is an important thing to learn in
// future experimentation.
//
// # Future Work and Structure
//
// As mentioned above, the two jobs Gaby does already are both fairly simple and straightforward.
// It seems like a general approach that should work well is well-written, well-tested deterministic
// traditional functionality such as the comment fixer and related-docs poster, configured by
// LLMs in response to specific directions or eventually higher-level goals specified by project
// maintainers. Other functionality that is worth exploring is rules for automatically labeling issues,
// rules for identifying issues or CLs that need to be pinged, rules for identifying CLs that need
// maintainer attention or that need submitting, and so on. Another stretch goal might be to identify
// when an issue needs more information and ask for that information. Of course, it would be very
// important not to ask for information that is already present or irrelevant, so getting that right would
// be a very high bar. There is no guarantee that today's LLMs work well enough to build a useful
// version of that.
//
// Another important area of future work will be running Gaby on top of cloud databases
// and then moving Gaby's own execution into the cloud. Getting it a server with a URL will
// enable GitHub callbacks instead of the current 2-minute polling loop, which will enable
// interactive conversations with Gaby.
//
// Overall, we believe that there are a few good ideas for ways that LLM-based bots can help
// make project maintainers' jobs easier and less monotonous, and they are waiting to be found.
// There are also many bad ideas, and they must be filtered out. Understanding the difference
// will take significant care, thought, and experimentation. We have work to do.
//
// [@gabyhelp]: https://github.com/gabyhelp
// [@gopherbot]: https://github.com/gopherbot
// [GitHub Discussion]: https://github.com/golang/go/discussions/67901
// [LevelDB]: https://github.com/google/leveldb
// [CockroachDB]: https://github.com/cockroachdb/cockroach
// [Google Cloud Firestore]: https://cloud.google.com/firestore
// [Go Testing talk]: https://research.swtch.com/testing
package main