blob: 3dd7c386b2226a2c167c068254681edc4849027c [file] [log] [blame] [view]
# Proposal: XML Stream
Author(s): Sam Whited <sam@samwhited.com>
Last updated: 2017-03-09
Discussion at https://golang.org/issue/19480
## Abstract
The `encoding/xml` package contains an API for tokenizing an XML stream, but no
API exists for processing or manipulating the resulting token stream.
This proposal describes such an API.
## Background
The [`encoding/xml`][encoding/xml] package contains APIs for tokenizing an XML
stream and decoding that token stream into native data types.
Once unmarshaled, the data can then be manipulated and transformed.
However, this is not always ideal.
If we cannot change the type we are unmarshaling into and it does not match the
XML format we are attempting to deserialize, eg. if the type is defined in a
separate package or cannot be modified for API compatibility reasons, we may
have to first unmarshal into a type we control, then copy each field over to the
original type; this is cumbersome and verbose.
Unmarshaling into a struct is also lossy.
As stated in the XML package:
> Mapping between XML elements and data structures is inherently flawed:
> an XML element is an order-dependent collection of anonymous values, while a
> data structure is an order-independent collection of named values.
This means that transforming the XML stream itself cannot necessarily be
accomplished by deserializing into a struct and then reserializing the struct
back to XML; instead it requires manipulating the XML tokens directly.
This may require re-implementing parts of the XML package, for instance, when
renaming an element the start and end tags would have to be matched in user code
so that they can both be transformed to the new name.
To address these issues, an API for manipulating the token stream itself, before
marshaling or unmarshaling occurs, is necessary.
Ideally, such an API should allow for the composition of complex XML
transformations from simple, well understood building blocks.
The transducer pattern, widely available in functional languages, matches these
requirements perfectly.
Transducers (also called, transformers, adapters, etc.) are iterators that
provide a set of operations for manipulating the data being iterated over.
Common transducer operations include Map, Reduce, Filter, etc. and these
operations are are already widely known and understood.
## Proposal
The proposed API introduces two concepts that do not already exist in the
`encoding/xml` package:
```go
// A Tokenizer is anything that can decode a stream of XML tokens, including an
// xml.Decoder.
type Tokenizer interface {
Token() (xml.Token, error)
Skip() error
}
// A Transformer is a function that takes a Tokenizer and returns a new
// Tokenizer which outputs a transformed token stream.
type Transformer func(src Tokenizer) Tokenizer
```
Common transducer operations will also be included:
```go
// Inspect performs an operation for each token in the stream without
// transforming the stream in any way.
// It is often injected into the middle of a transformer pipeline for debugging.
func Inspect(f func(t xml.Token)) Transformer {}
// Map transforms the tokens in the input using the given mapping function.
func Map(mapping func(t xml.Token) xml.Token) Transformer {}
// Remove returns a Transformer that removes tokens for which f matches.
func Remove(f func(t xml.Token) bool) Transformer {}
```
Because Go does not provide a generic iterator concept, this (and all
transducers in the Go libraries) are domain specific, meaning operations that
only make sense when discussing XML tokens can also be included:
```go
// RemoveElement returns a Transformer that removes entire elements (and their
// children) if f matches the elements start token.
func RemoveElement(f func(start xml.StartElement) bool) Transformer {}
```
## Rationale
Transducers are commonly used in functional programming and in languages that
take inspiration from functional programming languages, including Go.
Examples include [Clojure transducers][clojure/transducer], [Rust
adapters][rust/adapter], and the various "Transformer" types used throughout Go,
such as in the [`golang.org/x/text/transform`][transform] package.
Because transducers are so widely used (and already used elsewhere in Go), they
are well understood.
## Compatibility
This proposal introduces two new exported types and 4 exported functions that
would be covered by the compatibility promise.
A minimal set of Transformers is proposed, but others can be added at a later
date without breaking backwards compatibility.
## Implementation
A version of this API is already implemented in the
[`mellium.im/xmlstream`][xmlstream] package.
If this proposal is accepted, the author volunteers to copy the relevant parts
to the correct location before the 1.9 (or 1.10, depending on the length of this
proposal process) planning cycle closes.
## Open issues
- Where does this API live?
It could live in the `encoding/xml` package itself, in another package (eg.
`encoding/xml/stream`) or, temporarily or permanently, in the subrepos:
`golang.org/x/xml/stream`.
- A Transformer for removing attributes from `xml.StartElement`'s was originally
proposed as part of this API, but its implementation is more difficult to do
efficiently since each use of `RemoveAttr` in a pipeline would need to iterate
over the `xml.Attr` slice separately.
- Existing APIs in the XML package such as `DecodeElement` require an
`xml.Decoder` to function and could not be used with the Tokenizer interface
used in this package.
A compatibility API may be needed to create a new Decoder with an underlying
tokenizer.
This would require that the new functionality reside in the `encoding/xml`
package.
Alternatively, general Decoder methods could be reimplemented in a new package
with the Tokenizer API.
[encoding/xml]: https://golang.org/pkg/encoding/xml/
[clojure/transducer]: https://clojure.org/reference/transducers
[rust/adapter]: https://doc.rust-lang.org/std/iter/#adapters
[transform]: https://godoc.org/golang.org/x/text/transform
[xmlstream]: https://godoc.org/mellium.im/xmlstream