commit	80b5df0f2b0c24f68df32c5ac0a9f3e765322a93
author	Zvonimir Pavlinovic <zpavlinovic@google.com>	Mon Nov 04 20:05:06 2024 +0000
committer	Zvonimir Pavlinovic <zpavlinovic@google.com>	Mon Nov 04 20:23:20 2024 +0000
tree	1e22e38877be0d89bcbc0f6375f6024bb27dc068
parent	600b1f127af7b3ca56b95e5f48077922a281e744

README.md

Oscar, an open-source contributor agent architecture

Oscar is a project aiming to improve open-source software development by creating automated help, or “agents,” for open-source maintenance. We believe there are many opportunities to reduce the amount of toil involved with maintaining open-source projects both large and small.

The ability of large language models (LLMs) to do semantic analysis of natural language (such as issue reports or maintainer instructions) and to convert between natural language instructions and program code creates new opportunities for agents to interact more smoothly with people. LLMs will likely end up being only a small (but critical!) part of the picture; the bulk of an agent's actions will be executing standard, deterministic code.

Oscar differs from many development-focused uses of LLMs by not trying to augment or displace the code writing process at all. After all, writing code is the fun part of writing software. Instead, the idea is to focus on the not-fun parts, like processing incoming issues, matching questions to existing documentation, and so on.

Oscar is very much an experiment. We don't know yet where it will go or what we will learn. Even so, our first prototype, the @gabyhelp bot, has already had many successful interactions in the Go issue tracker.

For now, Oscar is being developed under the auspices of the Go project. At some point in the future it may (or may not) be spun out into a separate project.

The rest of this README explains Oscar in more detail.

Goals

The concrete goals for the Oscar project are:

Reduce maintainer effort to resolve issues [note that resolve does not always mean fix]
Reduce maintainer effort to resolve change lists (CLs) or pull requests (PRs) [note that resolve does not always mean submit/merge]
Reduce maintainer effort to resolve forum questions
Enable more people to become productive maintainers

It is a non-goal to automate away coding. Instead we are focused on automating away maintainer toil.

Approach

Maintainer toil is not unique to the Go project, so we are aiming to build an architecture that any software project can reuse and extend, building their own agents customized to their project's needs. Hence Oscar: open-source contributor agent architecture. Exactly what that will mean is still something we are exploring.

So far, we have identified three capabilities that will be an important part of Oscar:

Indexing and surfacing related project context during contributor interactions.
Using natural language to control deterministic tools.
Analyzing issue reports and CLs/PRs, to help improve them in real time during or shortly after submission, and to label and route them appropriately.

It should make sense that LLMs have something to offer here, because open-source maintenance is fundamentally about interacting with people using natural language, and natural language is what LLMs are best at. So it's not surprising that all three capabilities have an LLM-related component. On the other hand, all three are also backed by significant amounts of deterministic code. Our approach is to use LLMs for what they're good at (semantic analysis of natural language and translation from natural language into programs) and rely on deterministic code to do the rest.

The following sections look at each of those three important capabilities in turn. Note that we are still experimenting, and we expect to identify additional important capabilities as time goes on.

Indexing and surfacing related project context

Software projects are complex beasts. Only at the very beginning can a maintainer expect to keep all the important details and context in their head, and even when that's possible, context that lives in one person's head does not help when a new contributor arrives with a bug report, a feature request, or a question. To address this, maintainers write design documentation, API references, FAQs, manual pages, blog posts, and so on. Now, instead of providing context directly, a maintainer can provide links to written context that already exists. Even so, serving as a project search engine is still not the best use of the maintainer's time. Once a project grows even to modest size, no single maintainer can keep track of all the context that might be relevant, making it even harder to serve as a project search engine.

On the other hand, LLMs turn out to be a great platform for building a project search engine. LLMs can analyze documents and produce embeddings, which are high-dimensional (for example, 768-dimensional) floating-point unit vectors with the property that documents with similar semantic meaning are mapped to vectors that point in similar directions. (For more about embeddings, see this blog post.) Combined with a vector database to retrieve vectors similar to an input vector, LLM embeddings provide a very effective way to index all of an open-source project's context, including documentation, issue reports, CLs/PRs, and forum discussions. When a new issue report arrives, an agent can use the LLM-based project context index to identify highly related context, such as similar previous issues or relevant project documentation.
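
To make the idea of "vectors that point in similar directions" concrete, here is a minimal sketch of similarity search over embeddings. It is an illustration, not Gaby's actual code: the `Vector`, `Result`, and `Search` names are hypothetical, the 3-dimensional vectors are made up (real embeddings come from an LLM), and a linear scan stands in for a real vector database.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Vector is an embedding. Real embeddings come from an LLM and have
// hundreds of dimensions; the 3-dimensional values in main are made up.
type Vector []float64

// Normalize returns v scaled to unit length (or all zeros if v is zero).
func Normalize(v Vector) Vector {
	var sum float64
	for _, x := range v {
		sum += x * x
	}
	out := make(Vector, len(v))
	if sum == 0 {
		return out
	}
	norm := math.Sqrt(sum)
	for i, x := range v {
		out[i] = x / norm
	}
	return out
}

// Dot returns the dot product of a and b over their common length.
// For unit vectors this is the cosine of the angle between them.
func Dot(a, b Vector) float64 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	var sum float64
	for i := 0; i < n; i++ {
		sum += a[i] * b[i]
	}
	return sum
}

// Result pairs a document ID with its similarity to the query.
type Result struct {
	ID    string
	Score float64
}

// Search scores every indexed document against query and returns the
// results ordered from most to least similar (ties broken by ID).
func Search(index map[string]Vector, query Vector) []Result {
	q := Normalize(query)
	results := make([]Result, 0, len(index))
	for id, v := range index {
		results = append(results, Result{ID: id, Score: Dot(q, Normalize(v))})
	}
	sort.Slice(results, func(i, j int) bool {
		if results[i].Score != results[j].Score {
			return results[i].Score > results[j].Score
		}
		return results[i].ID < results[j].ID
	})
	return results
}

func main() {
	index := map[string]Vector{
		"issue: compiler crash on malformed identifier": {0.9, 0.1, 0.0},
		"issue: compiler rejects valid identifier":      {0.8, 0.3, 0.1},
		"doc: coverage tool overview":                   {0.0, 0.2, 0.9},
	}
	for _, r := range Search(index, Vector{1, 0.2, 0}) {
		fmt.Printf("%.3f  %s\n", r.Score, r.ID)
	}
}
```

Because the vectors are normalized to unit length, the dot product alone is enough to rank documents; a production index would replace the linear scan with an approximate nearest-neighbor structure.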

Our prototype agent implements this functionality and replies to new issues in the Go repository with a list of at most ten highly related links that add context to the report. (If the agent cannot find anything that looks related enough, it stays quiet and does not reply at all.) In the first few weeks we ran the agent, we identified the following benefits of such an agent:

- The agent surfaces related context to contributors.

  It is common for new issue reports to duplicate existing issue reports: a new bug might be reported multiple times in a short time window, or a non-bug might be reported every few months. When an agent replies with a link to a duplicate report, the contributor can close their new report and then watch that earlier issue. When an agent replies with a link to a report that looks like a duplicate but is not, the contributor can provide added context to distinguish their report from the earlier one.

  For example, in golang/go#68196, after the agent replied with a near duplicate, the original reporter commented:

  > Good bot :). Based on the discussion in this issue, I understand that it might not be possible to do what's being suggested here. If that's the case I'd still suggest to leave the issue open for a bit to see how many Go users care about this problem.

  As another example, on golang/go#67986, after the agent replied with an exact duplicate, the original reporter commented:

  > Drats, I spent quite a bit of time searching existing issues. Not sure how I missed [that one].

- The agent surfaces related context even to project maintainers.

  Once a project reaches even modest size, no one person can remember all the context, not even a highly dedicated project maintainer. When an agent replies with a link to a related report, that eliminates the time the maintainer must spend to find it. If the maintainer has forgotten the related report entirely, or never saw it in the first place (perhaps it was handled by someone else), the reply is even more helpful, because it can point the maintainer in the right direction and save them the effort of repeating the analysis done in the earlier issue.

  For example, in golang/go#68183, a project maintainer filed a bug against the Go compiler for mishandling certain malformed identifiers. The agent replied with a link to a similar report of the same bug, filed almost four years earlier but triaged to low priority. The added context allowed closing the earlier bug and provided an argument for raising the priority of the new bug.

  As another example, in golang/go#67938, a project maintainer filed a bug against the Go coverage tool for causing the compiler to report incorrect sub-line position information. The agent replied with an earlier related issue (incorrect line numbers) from a decade earlier as well as a more recent issue about coverage not reporting sub-line position information at all. The first bug was important context, and the second bug's “fix” was the root cause of the bug in the new report: the sub-line position information added then was not added correctly. Those links pinpointed the exact code where the bug was. Once that was identified, it was also easy to determine the fix.

- The agent interacts with bug reporters immediately.

  In all of the previous examples, the fact that the agent replied only a minute or two after the report was filed meant that the reporter was still available and engaged enough to respond in a meaningful way: adding details to clarify the suggestion, closing the report as a duplicate, raising bug priority based on past reports, or identifying a fix. In contrast, if hours or days (or more) go by after the initial report, the original reporter may no longer be available, interested, or able to provide context or additional details. The moments right after a bug report is filed are the best time to engage the reporter and refine the report. Maintainers cannot be expected to be engaged in this work all the time, but an agent can.
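
The reply policy described above (at most ten highly related links, and silence when nothing is related enough) can be sketched as a filter followed by a cutoff. This is a hypothetical illustration rather than the prototype's code: the `Match` type, the function names, the `example.com` URLs, and the `DefaultMinScore` threshold of 0.8 are all assumptions; only the limit of ten comes from the description above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const (
	MaxLinks        = 10  // "at most ten" links per reply
	DefaultMinScore = 0.8 // assumed cutoff for "related enough"
)

// Match is a candidate link with its similarity score.
type Match struct {
	URL   string
	Score float64
}

// SelectRelated keeps matches scoring at least minScore, orders them
// best first, and truncates the list to maxLinks entries.
func SelectRelated(matches []Match, minScore float64, maxLinks int) []Match {
	var kept []Match
	for _, m := range matches {
		if m.Score >= minScore {
			kept = append(kept, m)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].Score > kept[j].Score })
	if maxLinks < 0 {
		maxLinks = 0
	}
	if len(kept) > maxLinks {
		kept = kept[:maxLinks]
	}
	return kept
}

// FormatReply renders the reply text. The boolean is false when there
// is nothing worth posting, in which case the agent should stay quiet.
func FormatReply(related []Match) (string, bool) {
	if len(related) == 0 {
		return "", false
	}
	var b strings.Builder
	b.WriteString("Related issues and documentation:\n")
	for _, m := range related {
		fmt.Fprintf(&b, " - %s\n", m.URL)
	}
	return b.String(), true
}

func main() {
	matches := []Match{
		{URL: "https://example.com/issues/101", Score: 0.93},
		{URL: "https://example.com/docs/faq", Score: 0.55},
		{URL: "https://example.com/issues/87", Score: 0.88},
	}
	if reply, ok := FormatReply(SelectRelated(matches, DefaultMinScore, MaxLinks)); ok {
		fmt.Print(reply)
	}
}
```

Returning an explicit "nothing to post" signal keeps the decision to stay quiet in deterministic code, where it is easy to test and tune.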

Finally, note that surfacing project context is extensible, so that projects can incorporate their context no matter what form it takes. Our prototype's context sources are tailored to the Go project, reading issues from GitHub, documentation from go.dev, and (soon) code reviews from Gerrit, but the architecture makes it easy to add additional sources.
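
One way to picture that extensibility is a small interface that every context source implements. The sketch below is hypothetical: `Source`, `Document`, `staticSource`, and `Collect` are names invented for illustration and are not taken from the Oscar code, and the toy sources return fixed data instead of calling GitHub or go.dev.

```go
package main

import "fmt"

// Document is one piece of project context to be indexed.
type Document struct {
	ID    string // stable identifier, typically a URL
	Title string
	Text  string
}

// Source is a hypothetical plug-in point: anything that can list
// documents can feed the project context index.
type Source interface {
	Name() string
	Documents() ([]Document, error)
}

// staticSource is a toy Source backed by a fixed slice.
type staticSource struct {
	name string
	docs []Document
}

func (s staticSource) Name() string                   { return s.name }
func (s staticSource) Documents() ([]Document, error) { return s.docs, nil }

// Collect gathers documents from every source in order, keeping the
// first document seen for each ID.
func Collect(sources ...Source) ([]Document, error) {
	seen := make(map[string]bool)
	var all []Document
	for _, src := range sources {
		docs, err := src.Documents()
		if err != nil {
			return nil, fmt.Errorf("source %s: %w", src.Name(), err)
		}
		for _, d := range docs {
			if seen[d.ID] {
				continue
			}
			seen[d.ID] = true
			all = append(all, d)
		}
	}
	return all, nil
}

func main() {
	issues := staticSource{name: "issues", docs: []Document{
		{ID: "https://example.com/issues/1", Title: "compiler crash"},
	}}
	docs := staticSource{name: "docs", docs: []Document{
		{ID: "https://example.com/doc/faq", Title: "FAQ"},
	}}
	all, err := Collect(issues, docs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, d := range all {
		fmt.Println(d.ID, "-", d.Title)
	}
}
```

Under this shape, adding a new kind of context (for example, code reviews) would mean writing one more type that satisfies `Source`, with no change to the indexing loop.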

Using natural language to control deterministic tools

The second important agent capability is using natural language to control deterministic tooling. As open-source projects grow, the number of helpful tools increases, and it can be difficult to keep track of all of them and remember how to use each one. For example, our prototype includes a general facility for editing GitHub issue comments to add or fix links. We envision also adding facilities for adding labels to an issue or assigning or CC'ing people when it matches certain criteria. If a maintainer does not know this functionality exists, it might be difficult to find. And even if they know it exists, perhaps they aren't familiar with the specific API and don't want to take the time to learn it.

On the other hand, LLMs are very good at translating between intentions written in natural language and executable forms of those intentions such as program code or tool invocations. We have done preliminary experiments with Gemini selecting from and invoking available tools to satisfy natural language requests made by a maintainer. We don't have anything running for real yet, but it looks like a promising approach.

A different approach would be to rely more heavily on LLMs, letting them edit code, issues, and so on entirely based on natural language prompts with no deterministic tools. This “magic wand” approach demands more of LLMs than they are capable of today. We believe it will be far more effective to use LLMs to convert from natural language to deterministic tool use once and then apply those deterministic tools automatically. Our approach also limits the amount of “LLM supervision” needed: a person can check that the tool invocation is correct and then rely on the tool to operate deterministically.
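
The "convert once, then run deterministically" idea can be sketched as a two-step flow: parse and validate the structured tool call an LLM produced, let a person review it, and only then execute it. Everything here is an assumption made for illustration: the JSON shape, the `ToolCall`, `Tool`, and `Registry` types, and the `add_label` tool are hypothetical, and the LLM output is a hard-coded string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolCall is the structured output we assume an LLM produces from a
// natural language request.
type ToolCall struct {
	Tool string            `json:"tool"`
	Args map[string]string `json:"args"`
}

// Tool is a deterministic action with a list of required arguments.
type Tool struct {
	Required []string
	Run      func(args map[string]string) (string, error)
}

// Registry maps tool names to the tools a project has plugged in.
type Registry map[string]Tool

// Parse decodes the LLM output and checks it against the registry
// without running anything, so a person can review the call first.
func (r Registry) Parse(llmOutput string) (ToolCall, error) {
	var call ToolCall
	if err := json.Unmarshal([]byte(llmOutput), &call); err != nil {
		return ToolCall{}, fmt.Errorf("invalid tool call: %w", err)
	}
	tool, ok := r[call.Tool]
	if !ok {
		return ToolCall{}, fmt.Errorf("unknown tool %q", call.Tool)
	}
	for _, name := range tool.Required {
		if _, ok := call.Args[name]; !ok {
			return ToolCall{}, fmt.Errorf("tool %q: missing argument %q", call.Tool, name)
		}
	}
	return call, nil
}

// Execute runs a call that has already been parsed and reviewed.
func (r Registry) Execute(call ToolCall) (string, error) {
	tool, ok := r[call.Tool]
	if !ok {
		return "", fmt.Errorf("unknown tool %q", call.Tool)
	}
	return tool.Run(call.Args)
}

func main() {
	reg := Registry{
		"add_label": {
			Required: []string{"issue", "label"},
			Run: func(a map[string]string) (string, error) {
				return fmt.Sprintf("would add label %q to issue %s", a["label"], a["issue"]), nil
			},
		},
	}
	// Stand-in for what an LLM might return for the request
	// "label issue 123 as performance".
	out := `{"tool": "add_label", "args": {"issue": "123", "label": "performance"}}`
	call, err := reg.Parse(out)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Printf("review: %s %v\n", call.Tool, call.Args)
	result, err := reg.Execute(call)
	if err != nil {
		fmt.Println("failed:", err)
		return
	}
	fmt.Println(result)
}
```

Separating `Parse` from `Execute` is what limits the supervision needed: the reviewer checks one small, explicit tool call rather than the effects of free-form LLM edits.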

We have not built this part of Oscar yet, but when we do, it will be extensible, so that projects can easily plug in their own tools.

Analyzing issue reports and CLs/PRs

The third important agent capability is analyzing issue reports and CLs/PRs (change lists / pull requests). Posting about related issues is a limited form of analysis, but we plan to add other kinds of semantic analysis, such as determining that an issue is primarily about performance and should have a “performance” label added.

We also plan to explore whether it is possible to analyze reports well enough to identify whether more information is needed to make the report useful. For example, if a report does not include a link to a reproduction program on the Go playground, the agent could ask for one. And if there is such a link, the agent could make sure to inline the code into the report to make it self-contained. The agent could potentially also run a sandboxed execution tool to identify which Go releases contain the bug and even use git bisect to identify the commit that introduced the bug.
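
The playground check is the kind of analysis that needs no LLM at all. As a minimal sketch, assuming reproduction links take the form `https://go.dev/play/p/<id>` or the older `https://play.golang.org/p/<id>`, a deterministic check might look like the following. The function names, the follow-up wording, and the `abc123` link ID are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"regexp"
)

// playgroundLink matches Go playground share links in either the
// go.dev/play or the older play.golang.org form.
var playgroundLink = regexp.MustCompile(
	`https?://(?:go\.dev/play|play\.golang\.org)/p/[A-Za-z0-9_-]+`)

// PlaygroundLinks returns every playground link found in body.
func PlaygroundLinks(body string) []string {
	return playgroundLink.FindAllString(body, -1)
}

// FollowUp returns a request for a reproduction program when the
// report links none, and the empty string otherwise.
func FollowUp(body string) string {
	if len(PlaygroundLinks(body)) > 0 {
		return ""
	}
	return "Could you share a small reproduction program, for example as a Go playground link?"
}

func main() {
	reports := []string{
		"The compiler crashes on my code.",
		"Crash reproduced at https://go.dev/play/p/abc123 on tip.",
	}
	for _, body := range reports {
		if msg := FollowUp(body); msg != "" {
			fmt.Println("ask:", msg)
		} else {
			fmt.Println("found:", PlaygroundLinks(body))
		}
	}
}
```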

As discussed earlier, all of these analyses and resulting interactions work much better when they happen immediately after the report is filed, when the reporter is still available and engaged. Automated agents can be on duty 24/7.

We have not built this part of Oscar yet, but when we do, it too will be extensible, so that projects can easily define their own analyses customized to the reports they receive.

Prototype

Our first prototype to explore open-source contributor agents is called Gaby (for “Go AI bot”) and runs in the Go issue tracker, posting as @gabyhelp. The source code is in internal/gaby in this repository. The gaby package's documentation explains the overall structure of the code in the repository as well.

So far, Gaby indexes Go issue content from GitHub as well as Go documentation from go.dev and replies to new issues with relevant links. We plan to add Gerrit code reviews in the near future.

Gaby's structure makes it easy to run on any kind of hosting service, using any LLM, any storage layer, and any vector database. Right now, it runs on a local workstation, using Google's Gemini LLM, Pebble key-value storage files, and an in-memory vector database.
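
A common way to get this kind of flexibility in Go is to have the agent depend only on small interfaces, one per backend. The sketch below shows the general pattern and is not a description of the gaby package: `Embedder`, `Store`, `VectorIndex`, and `Agent` are hypothetical names, and the in-memory implementations are toys (the hash-based embedder measures word overlap, not meaning).

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// Hypothetical interfaces: the agent logic sees only these, so each
// backend can be swapped without touching that logic.
type Embedder interface {
	Embed(text string) ([]float64, error)
}

type Store interface {
	Get(key string) (string, bool)
	Set(key, value string)
}

type VectorIndex interface {
	Add(id string, vec []float64)
	Nearest(vec []float64, n int) []string
}

// Agent wires the three backends together.
type Agent struct {
	LLM   Embedder
	Docs  Store
	Index VectorIndex
}

// AddDocument stores text and indexes its embedding under id.
func (a *Agent) AddDocument(id, text string) error {
	vec, err := a.LLM.Embed(text)
	if err != nil {
		return fmt.Errorf("embed %s: %w", id, err)
	}
	a.Docs.Set(id, text)
	a.Index.Add(id, vec)
	return nil
}

// Related returns the IDs of up to n documents closest to text.
func (a *Agent) Related(text string, n int) ([]string, error) {
	vec, err := a.LLM.Embed(text)
	if err != nil {
		return nil, err
	}
	return a.Index.Nearest(vec, n), nil
}

// hashEmbedder is a toy stand-in for an LLM: it counts words into 16
// hash buckets, so it captures word overlap rather than meaning.
type hashEmbedder struct{}

func (hashEmbedder) Embed(text string) ([]float64, error) {
	vec := make([]float64, 16)
	for _, w := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New32a()
		h.Write([]byte(w))
		vec[h.Sum32()%16]++
	}
	return vec, nil
}

// mapStore is an in-memory Store.
type mapStore map[string]string

func (m mapStore) Get(k string) (string, bool) { v, ok := m[k]; return v, ok }
func (m mapStore) Set(k, v string)             { m[k] = v }

// memIndex is an in-memory VectorIndex ranked by dot product.
// It assumes all vectors have the same length.
type memIndex map[string][]float64

func (m memIndex) Add(id string, vec []float64) { m[id] = vec }

func (m memIndex) Nearest(vec []float64, n int) []string {
	ids := make([]string, 0, len(m))
	for id := range m {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		si, sj := dot(vec, m[ids[i]]), dot(vec, m[ids[j]])
		if si != sj {
			return si > sj
		}
		return ids[i] < ids[j]
	})
	if n >= 0 && len(ids) > n {
		ids = ids[:n]
	}
	return ids
}

func dot(a, b []float64) float64 {
	var sum float64
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	agent := &Agent{LLM: hashEmbedder{}, Docs: mapStore{}, Index: memIndex{}}
	docs := [][2]string{{"issue/1", "compiler crash in parser"}, {"issue/2", "website typo"}}
	for _, d := range docs {
		if err := agent.AddDocument(d[0], d[1]); err != nil {
			panic(err)
		}
	}
	ids, _ := agent.Related("compiler crash", 1)
	fmt.Println(ids)
}
```

Swapping the workstation setup for a hosted one would then amount to passing different implementations of the same three interfaces when constructing the `Agent`.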

We plan to add support for a variety of other options, including Google Cloud Firestore for both key-value storage and the vector database. Firestore in particular will make it easy to run Gaby on hosted platforms like Cloud Run.

Running on hosted platforms with their own URLs (as opposed to a local workstation) will enable subscribing to GitHub webhooks, so that Gaby can respond even more quickly to issues and also carry on conversations.

Our experience with all of this will inform the eventual generalized Oscar design.

There is much work left to do.

Relationship to Gopherbot

The Go project has run its own completely deterministic agent, @gopherbot, for many years. That agent is configured by writing, reviewing, and checking in Go code in the golang.org/x/build/cmd/gopherbot package. Having the agent has been an incredible help to the Go project and is part of the inspiration for Oscar. At the same time, we are aiming for an even lighter-weight way to configure new agent behaviors: using natural language to control general behaviors. Over time, our goal is to merge @gabyhelp back into @gopherbot by re-building @gopherbot as an Oscar agent.

Discussion and Feedback

We are excited about the opportunities here, but we recognize that we may be missing important concerns as well as important opportunities to reduce open-source maintainer toil. We have created this GitHub discussion to discuss both concerns and new ideas for ways that Oscar-based agents can help improve open-source maintenance. Feedback there is much appreciated.