| A new Go API for Protocol Buffers |
| 02 Mar 2020 |
| Tags: protobuf, technical |
| |
| Joe Tsai, Damien Neil, and Herbie Ong |
| |
| * Introduction |
| |
| We are pleased to announce the release of a major revision of the Go API for |
| [[https://developers.google.com/protocol-buffers][protocol buffers]], |
| Google's language-neutral data interchange format. |
| |
| * Motivations for a new API |
| |
| The first protocol buffer bindings for Go were |
| [[https://blog.golang.org/third-party-libraries-goprotobuf-and][announced by Rob Pike]] |
| in March of 2010. Go 1 would not be released for another two years. |
| |
| In the decade since that first release, the package has grown and |
| developed along with Go. Its users' requirements have grown too. |
| |
| Many people want to write programs that use reflection to examine protocol |
| buffer messages. The |
| [[https://pkg.go.dev/reflect][`reflect`]] |
| package provides a view of Go types and |
| values, but omits information from the protocol buffer type system. For |
| example, we might want to write a function that traverses a log entry and |
| clears any field annotated as containing sensitive data. The annotations |
| are not part of the Go type system. |
| |
| Another common desire is to use data structures other than the ones |
| generated by the protocol buffer compiler, such as a dynamic message type |
| capable of representing messages whose type is not known at compile time. |
| |
| We also observed that a frequent source of problems was that the |
| [[https://pkg.go.dev/github.com/golang/protobuf/proto?tab=doc#Message][`proto.Message`]] |
| interface, which identifies values of generated message types, does very |
| little to describe the behavior of those types. When users create types |
| that implement that interface (often inadvertently by embedding a message |
| in another struct) and pass values of those types to functions expecting |
| a generated message value, programs crash or behave unpredictably. |
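| |
| The embedding pitfall can be reproduced without the protobuf library. |
| The sketch below is only an illustration: the local `Message` interface |
| copies the three-method shape of the APIv1 `proto.Message`, while |
| `LogEntry`, `Annotated`, and `IsGenerated` are hypothetical names that |
| stand in for a generated type, a user wrapper, and library code. |
| |

```go
package main

import "fmt"

// Message is a local stand-in with the same three methods as the APIv1
// proto.Message interface.
type Message interface {
	Reset()
	String() string
	ProtoMessage()
}

// LogEntry plays the role of a generated message type (hypothetical).
type LogEntry struct {
	Text string
}

func (e *LogEntry) Reset()         { *e = LogEntry{} }
func (e *LogEntry) String() string { return "text:" + e.Text }
func (*LogEntry) ProtoMessage()    {}

// Annotated embeds a message to attach extra data. The embedded methods are
// promoted, so *Annotated satisfies Message even though nobody asked for that.
type Annotated struct {
	*LogEntry
	Source string
}

// IsGenerated reports whether m is the "generated" type itself, as opposed to
// some other type that merely satisfies the interface.
func IsGenerated(m Message) bool {
	_, ok := m.(*LogEntry)
	return ok
}

func main() {
	var wrapped Message = &Annotated{LogEntry: &LogEntry{Text: "hi"}, Source: "web"}
	fmt.Println(IsGenerated(wrapped))               // false
	fmt.Println(IsGenerated(&LogEntry{Text: "hi"})) // true
}
```

| |
| Code that assumes every `Message` is a generated struct has no safe way |
| to handle the wrapper, which is the kind of surprise described above. |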
| |
| All three of these problems have a common cause, and a common solution: |
| The `Message` interface should fully specify the behavior of a message, |
| and functions operating on `Message` values should freely accept any |
| type that correctly implements the interface. |
| |
| Since it is not possible to change the existing definition of the |
| `Message` type while keeping the package API compatible, we decided that |
| it was time to begin work on a new, incompatible major version of the |
| protobuf module. |
| |
| Today, we're pleased to release that new module. We hope you like it. |
| |
| * Reflection |
| |
| Reflection is the flagship feature of the new implementation. Similar |
| to how the `reflect` package provides a view of Go types and values, the |
| [[https://pkg.go.dev/google.golang.org/protobuf/reflect/protoreflect?tab=doc][`google.golang.org/protobuf/reflect/protoreflect`]] |
| package provides a view of values according to the protocol buffer |
| type system. |
| |
| A complete description of the `protoreflect` package would run too long |
| for this post, but let's look at how we might write the log-scrubbing |
| function we mentioned previously. |
| |
| First, we'll write a `.proto` file defining an extension of the |
| [[https://github.com/protocolbuffers/protobuf/blob/b96241b1b716781f5bc4dc25e1ebb0003dfaba6a/src/google/protobuf/descriptor.proto#L509][`google.protobuf.FieldOptions`]] |
| type so we can annotate fields as containing |
| sensitive information or not. |
| |
|     syntax = "proto3"; |
|     import "google/protobuf/descriptor.proto"; |
|     package golang.example.policy; |
|     extend google.protobuf.FieldOptions { |
|         bool non_sensitive = 50000; |
|     } |
| |
| We can use this option to mark certain fields as non-sensitive. |
| |
|     message MyMessage { |
|         string public_name = 1 [(golang.example.policy.non_sensitive) = true]; |
|     } |
| |
| Next, we will write a Go function which accepts an arbitrary message |
| value and removes all the sensitive fields. |
| |
|     // Redact clears every sensitive field in pb. |
|     func Redact(pb proto.Message) { |
|         // ... |
|     } |
| |
| This function accepts a |
| [[https://pkg.go.dev/google.golang.org/protobuf/proto?tab=doc#Message][`proto.Message`]], |
| an interface type implemented by all generated message types. This type |
| is an alias for one defined in the `protoreflect` package: |
| |
|     type ProtoMessage interface { |
|         ProtoReflect() Message |
|     } |
| |
| To avoid filling up the namespace of generated |
| messages, the interface contains only a single method returning a |
| [[https://pkg.go.dev/google.golang.org/protobuf/reflect/protoreflect?tab=doc#Message][`protoreflect.Message`]], |
| which provides access to the message contents. |
| |
| (Why an alias? Because `protoreflect.Message` has a corresponding |
| method returning the original `proto.Message`, and we need to avoid an |
| import cycle between the two packages.) |
| |
| The |
| [[https://pkg.go.dev/google.golang.org/protobuf/reflect/protoreflect?tab=doc#Message.Range][`protoreflect.Message.Range`]] |
| method calls a function for every populated field in a message. |
| |
|     m := pb.ProtoReflect() |
|     m.Range(func(fd protoreflect.FieldDescriptor, v protoreflect.Value) bool { |
|         // ... |
|         return true |
|     }) |
| |
| The range function is called with a |
| [[https://pkg.go.dev/google.golang.org/protobuf/reflect/protoreflect?tab=doc#FieldDescriptor][`protoreflect.FieldDescriptor`]] |
| describing the protocol buffer type of the field, and a |
| [[https://pkg.go.dev/google.golang.org/protobuf/reflect/protoreflect?tab=doc#Value][`protoreflect.Value`]] |
| containing the field value. |
| |
| The |
| [[https://pkg.go.dev/google.golang.org/protobuf/reflect/protoreflect?tab=doc#Descriptor.Options][`protoreflect.FieldDescriptor.Options`]] |
| method returns the field options as a `google.protobuf.FieldOptions` |
| message. |
| |
|     opts := fd.Options().(*descriptorpb.FieldOptions) |
| |
| (Why the type assertion? Since the generated `descriptorpb` package |
| depends on `protoreflect`, the `protoreflect` package can't return the |
| concrete options type without causing an import cycle.) |
| |
| We can then check the options to see the value of our extension boolean: |
| |
|     if proto.GetExtension(opts, policypb.E_NonSensitive).(bool) { |
|         return true // don't redact non-sensitive fields |
|     } |
| |
| Note that we are looking at the field _descriptor_ here, not the field |
| _value_. The information we're interested in lies in the protocol |
| buffer type system, not the Go one. |
| |
| This is also an example of an area where we |
| have simplified the `proto` package API. The original |
| [[https://pkg.go.dev/github.com/golang/protobuf/proto?tab=doc#GetExtension][`proto.GetExtension`]] |
| returned both a value and an error. The new |
| [[https://pkg.go.dev/google.golang.org/protobuf/proto?tab=doc#GetExtension][`proto.GetExtension`]] |
| returns just a value, returning the default value for the field if it is |
| not present. Extension decoding errors are reported at `Unmarshal` time. |
| |
| Once we have identified a field that needs redaction, clearing it is simple: |
| |
|     m.Clear(fd) |
| |
| Putting all the above together, our complete redaction function is: |
| |
|     // Redact clears every sensitive field in pb. |
|     func Redact(pb proto.Message) { |
|         m := pb.ProtoReflect() |
|         m.Range(func(fd protoreflect.FieldDescriptor, v protoreflect.Value) bool { |
|             opts := fd.Options().(*descriptorpb.FieldOptions) |
|             if proto.GetExtension(opts, policypb.E_NonSensitive).(bool) { |
|                 return true |
|             } |
|             m.Clear(fd) |
|             return true |
|         }) |
|     } |
| |
| A more complete implementation might recursively descend into |
| message-valued fields. We hope that this simple example gives a |
| taste of protocol buffer reflection and its uses. |
| |
| * Versions |
| |
| We call the original version of Go protocol buffers APIv1, and the |
| new one APIv2. Because APIv2 is not backwards compatible with APIv1, |
| we need to use different module paths for each. |
| |
| (These API versions are not the same as the versions of the protocol |
| buffer language: `proto1`, `proto2`, and `proto3`. APIv1 and APIv2 |
| are concrete implementations in Go that both support the `proto2` and |
| `proto3` language versions.) |
| |
| The |
| [[https://pkg.go.dev/github.com/golang/protobuf?tab=overview][`github.com/golang/protobuf`]] |
| module is APIv1. |
| |
| The |
| [[https://pkg.go.dev/google.golang.org/protobuf?tab=overview][`google.golang.org/protobuf`]] |
| module is APIv2. We have taken advantage of the need to change the |
| import path to switch to one that is not tied to a specific hosting |
| provider. (We considered `google.golang.org/protobuf/v2`, to make it |
| clear that this is the second major version of the API, but settled on |
| the shorter path as being the better choice in the long term.) |
| |
| We know that not all users will move to a new major version of a package |
| at the same rate. Some will switch quickly; others may remain on the old |
| version indefinitely. Even within a single program, some parts may use |
| one API while others use another. It is essential, therefore, that we |
| continue to support programs that use APIv1. |
| |
| - `github.com/golang/protobuf@v1.3.4` is the most recent pre-APIv2 version of APIv1. |
| |
| - `github.com/golang/protobuf@v1.4.0` is a version of APIv1 implemented in terms of APIv2. |
| The API is the same, but the underlying implementation is backed by the new one. |
| This version contains functions to convert between the APIv1 and APIv2 `proto.Message` |
| interfaces to ease the transition between the two. |
| |
| - `google.golang.org/protobuf@v1.20.0` is APIv2. |
| This module depends upon `github.com/golang/protobuf@v1.4.0`, |
| so any program which uses APIv2 will automatically pick a version of APIv1 |
| which integrates with it. |
| |
| (Why start at version `v1.20.0`? To provide clarity. |
| We do not expect APIv1 ever to reach `v1.20.0`, |
| so the version number alone should be enough to distinguish |
| APIv1 from APIv2 unambiguously.) |
| |
| We intend to maintain support for APIv1 indefinitely. |
| |
| This organization ensures that any given program will use only a single |
| protocol buffer implementation, regardless of which API version it uses. |
| It permits programs to adopt the new API gradually, or not at all, while |
| still gaining the advantages of the new implementation. The principle of |
| minimal version selection means that programs may remain on the old |
| implementation until the maintainers choose to update to the new one |
| (either directly, or by updating a dependency). |
| |
| * Additional features of note |
| |
| The |
| [[https://pkg.go.dev/google.golang.org/protobuf/encoding/protojson][`google.golang.org/protobuf/encoding/protojson`]] |
| package converts protocol buffer messages to and from JSON using the |
| [[https://developers.google.com/protocol-buffers/docs/proto3#json][canonical JSON mapping]], |
| and fixes a number of issues with the old `jsonpb` package |
| that were difficult to change without causing problems for existing users. |
| |
| The |
| [[https://pkg.go.dev/google.golang.org/protobuf/types/dynamicpb][`google.golang.org/protobuf/types/dynamicpb`]] |
| package provides an implementation of `proto.Message` for messages whose |
| protocol buffer type is derived at runtime. |
| |
| The |
| [[https://pkg.go.dev/google.golang.org/protobuf/testing/protocmp][`google.golang.org/protobuf/testing/protocmp`]] |
| package provides functions to compare protocol buffer messages with the |
| [[https://pkg.go.dev/github.com/google/go-cmp/cmp][`github.com/google/go-cmp/cmp`]] |
| package. |
| |
| The |
| [[https://pkg.go.dev/google.golang.org/protobuf/compiler/protogen?tab=doc][`google.golang.org/protobuf/compiler/protogen`]] |
| package provides support for writing protocol compiler plugins. |
| |
| * Conclusion |
| |
| The `google.golang.org/protobuf` module is a major overhaul of |
| Go's support for protocol buffers, providing first-class support |
| for reflection, custom message implementations, and a cleaned-up API |
| surface. We intend to maintain the previous API indefinitely as a wrapper |
| of the new one, allowing users to adopt the new API incrementally at |
| their own pace. |
| |
| Our goal in this update is to improve upon the benefits of the old |
| API while addressing its shortcomings. As we completed each component of |
| the new implementation, we put it into use within Google's codebase. This |
| incremental rollout has given us confidence in both the usability of the new |
| API and the performance and correctness of the new implementation. We believe |
| it is production ready. |
| |
| We are excited about this release and hope that it will serve the Go |
| ecosystem for the next ten years and beyond! |