design: audio for mobile

Updates golang/go#13432.

Change-Id: I718006e8f039c476d456c1276c54132bd66d9410
Reviewed-on: https://go-review.googlesource.com/17262
Reviewed-by: Burcu Dogan <jbd@google.com>
diff --git a/design/13432-mobile-audio.md b/design/13432-mobile-audio.md
new file mode 100644
index 0000000..3967d4b
--- /dev/null
+++ b/design/13432-mobile-audio.md
@@ -0,0 +1,273 @@
+# Proposal: Audio for Mobile
+
+Author: Burcu Dogan
+
+With input from David Crawshaw, Hyang-Ah Kim and Andrew Gerrand.
+
+Last updated: November 30, 2015
+
+Discussion at https://golang.org/issue/13432.
+
+## Abstract
+
+This proposal suggests core abstractions to support audio decoding
+and playback on mobile devices.
+
+## Background
+
+In the scope of the Go mobile project, an audio package that supports
+decoding and playback is a top priority. The current status of audio
+support under x/mobile is limited to OpenAL bindings and an experimental
+high-level audio player that is backed by OpenAL.
+
+The experimental audio package fails to
+- provide high level abstractions to represents audio and audio processors,
+- implement a memory-efficient playback model,
+- implement decoders (e.g. an mp3 decoder),
+- support live streaming or other networking audio sources.
+
+In order to address these concerns, I am proposing core abstractions and
+a minimal set of features based on the proposed abstractions to provide
+decoding and playback support.
+
+## Proposal
+
+I (Burcu Dogan) surveyed the top iOS and Android apps for audio features.
+Three major categories with majorly different requirements have revealed
+as a result of the survey. A good audio package shouldn't address the
+different class of requirements with isolated audio APIs, but must introduce
+common concepts and types that could be the backbone of both high- and low-
+level audio packages. This is how we will enable users to expand their audio
+capabilities by partially delegating their work to lower-level layers of the
+audio package without having to rewrite their entire audio stack.
+
+### Features considered
+This section briefly explains the features required in order to support common
+audio requirements of the mobile applications. The abstractions we introduce
+today should be extendable to meet a majority of the features listed below in
+the long run.
+
+#### Playback
+Single or multi-channel playback with player controls such as play, pause,
+stop, etc. Games use a looping sample as the background music, so looping
+functionality is also essential. Multiple playback instances are needed. Most
+games require a background audio track and one-shot audio effects on the
+foreground.
+
+#### Decoding
+Codec library and decoding support. Most radio-like apps and music players
+need to play a variety of audio sources. Codec support in the parity of
+AudioUnit on iOS and OpenMAX on Android is good to have.
+
+#### Remote streaming
+Audio players, radios and tools that streams audio need to be able to work
+with remote audio sources. HTTP Live Streaming works on both platforms but
+used to be inefficient on Android devices.
+
+#### Synchronization and composition
+- Synchronization between channels/players
+- APIs that allow developers to schedule the playback, frame-level timers
+- Mixers, multiple channels need to be multiplexed into a single device buffer
+- Music software apps that require audio composition and filtering features
+
+#### Playlist features
+Music players and radios require playlisting features, so the users can queue,
+unqueue tracks on the player. Player also need shuffling and repeating
+features.
+
+More information on the classification of the audio apps based on the features
+listed above is available at Appendix: Audio Apps Classification.
+
+### Goals
+
+#### Short-term goals
+
+- Playback of generated data (such as a PCM sine wave).
+- Playback of an audio asset.
+- Playback from streaming network sources.
+- Core interfaces to represent decoders.
+- Initial decoder implementations, ideally delegating the decoding to the
+- system codecs (OpenMax for Android and AudioUnit for iOS).
+- Basic play functions such as play (looping and one-shot), stop, pause,
+gain control.
+- Prefetching before user invokes playback.
+
+#### Longer-term goals
+- Multi channel playback (Playing multiple streams at the same time.)
+- Multi channel synchronization and an internal clock
+- Composition and filtering (mixing of multiple signals, low-pass filter,
+reverb, etc)
+- Tracklisting features to queue, unqueue multiple sources to a player;
+playback features such as prefetching the next song
+
+### Non-goals
+- Audio capture. Recording and encoding audio is not in the roadmap initially.
+Both could be added to the package without touching any API surface.
+- Dependency on the visual frame rate. This feature requires the audio
+scheduler to work in cooperation with the graphics layer and currently not
+in our radar.
+
+### Core abstractions
+
+The section proposes the core interfaces and abstractions to represent audio,
+audio sources and decoding primitives. The goal of introducing and agreeing on
+the core abstractions is to be able to extend the audio package features in
+the light of the considered features listed above without breaking the APIs.
+
+####  Clip
+The audio package will represent audio data as linear PCM formatted in-memory
+audio chuncks. A fundamental interface, Clip, will define how to consume audio
+data and how audio attributes (such as bit and sample rate) are reported to
+the consumers of an audio media source.
+
+Clip is an io.ReadSeeker and must be considered as a small window into the
+underlying audio data.
+
+```
+// FrameInfo represents the frame-level information.
+type FrameInfo struct {
+    // Channels represent the number of audio channels
+    // (e.g. 1 for mono, 2 for stereo).
+    Channels int
+
+    // Bit depth is the number of bits used to represent
+    // a single sample.
+    BitDepth int
+
+    // Sample rate is the number of samples to be played
+    // at each second.
+    SampleRate int64
+}
+
+// Clip is an io.ReadSeeker that represents linear PCM formatted audio.
+// Clip can seek and read from a section and allow users to
+// consume a small section of the underlying audio data.
+//
+// FrameInfo returns the basic frame-level information about the clip audio.
+//
+// Size returns the total number of bytes of the underlying audio data.
+// TODO(jbd): Support cases where size is unknown?
+type Clip interface {
+    io.ReadSeeker
+    FrameInfo() FrameInfo
+    Size() int64
+}
+```
+
+#### Decoders
+Decoders take any arbitrary input and is responsible to output a clip.
+
+```
+// Decoder that reads from a Reader and converts the input
+// to a PCM clip output.
+func Decode(r io.ReadSeeker) (Clip, error) {
+  panic("not implemented")
+}
+
+// A decoder that decodes the given data WAV byte slice and decodes it
+// into a PCM clip output. An error is returned if any of the decoding
+// steps fail. (e.g. ClipInfo cannot be determined from the WAV header.)
+func DecodeWAVBytes(data []byte) (Clip, error) {
+  panic("not implemented")
+}
+
+```
+
+#### Clip sources
+Any arbitrary valid audio data source can be converted into a clip.
+
+```
+// NewBufferClip converts a buffer to a Clip.
+func NewBufferClip(buf []byte, info FrameInfo) Clip {
+    panic("not implemented")
+}
+
+// NewRemoteClip converts the HTTP live streaming media
+// source into a Clip.
+func NewRemoteClip(url string) (Clip, error) {
+    panic("not implemented")
+}
+```
+
+#### Players
+
+A player plays a series of clips back-to-back, provides basic control
+functions (play, stop, pause, seek, etc).
+
+Note: Currently, x/mobile/exp/audio package provides an experimental and
+highly immature player. With the introduction of the new core interfaces, we
+will break the API surface in order to bless the new abstractions.
+
+```
+// NewPlayer returns a new Player. It initializes the underlying
+// audio devices and the related resources.
+// A player can play multiple clips back-to-back. Players will begin
+// prefetching the next clip to provide a smooth and uninterrupted
+// playback.
+func NewPlayer(c ...Clip) (*Player, error)
+```
+
+## Compatibility
+
+No compatibility issues.
+
+## Implementation
+
+The current scope of the implementation will be restricted to meet the
+requirements listed in the "Short-term goals" sections.
+
+The interfaces will be contributed by Burcu Dogan. The implementation of the
+decoders and playback is a team effort and requires additional planning.
+
+The audio package has no dependencies to the next Go releases and therefore
+doesn't have to fit in the Go release cycle.
+
+## Open issues
+
+- WAV and AIFF both support float PCM values even though the use of float
+values is unpopular. Should we consider supporting float values?
+- Decoding on desktop. The package will use the system codec libraries
+provided by Android and iOS on mobile devices. It is not possible to provide
+feature parity for desktop envs in the scope of decoding.
+- Playback on desktop. The playback may directly use AudioUnit on iOS, and
+libmedia (or stagefright) on Android. The media libraries on the desktop are
+highly fragmented and cross-platform libraries are third-party dependencies.
+It is unlikely that we can provide an audio package that works out of the box
+on desktop if we don't write an audio backend for each platform.
+- Hardware acceleration. Should we allow users to bypass the decoders and
+stream to the device buffer in the longer term? The scope of the audio package
+is primarily mobile devices (which case-by-case supports hardware
+acceleration). But if the package will cover beyond the mobile, we should
+consider this case.
+
+## Appendix: Audio Apps Classification
+
+Classification of the audio apps are based on thet survey results mentioned
+above. This section summarizes which features are highly related to each other.
+
+### Class A
+Class A mostly represents games that require to play a background sound (in
+looping mode or not) and occasionally need to play one-shot audio effects fit
+in this category.
+- Single channel player with looping audio
+- Buffering audio files entirely in memory is efficient enough, audio files
+are small
+- Timing of the playback doesn’t have to be precise, latency is neglectable
+
+### Class B
+Class B represents games with advanced audio. Most apps that fit in this
+category are using advanced audio engines as their audio backend.
+- Multi channel player
+- Synchronization between channels/players
+- APIs that allow developers to schedule the playback, such as frame-level
+timers
+- Low latency, timing of the playback needs to be precise
+- Mixers, multiple channels need to be multiplexed into a single device buffer
+- Music software apps require audio composition, filtering, etc
+
+### Class C
+Class C represents the media players.
+- Remote streaming
+- Playlisting features, multitrack playback features such as prefetching and cross fading
+- High-level player controls such as looping and shuffling
+- Good decoder support