# Proposal: Audio for Mobile
Author: Jaana Burcu Dogan
With input from David Crawshaw, Hyang-Ah Kim and Andrew Gerrand.
Last updated: November 30, 2015
Discussion at https://golang.org/issue/13432.
## Abstract
This proposal suggests core abstractions to support audio decoding
and playback on mobile devices.
## Background
In the scope of the Go mobile project, an audio package that supports
decoding and playback is a top priority. The current status of audio
support under x/mobile is limited to OpenAL bindings and an experimental
high-level audio player that is backed by OpenAL.
The experimental audio package fails to
- provide high-level abstractions to represent audio and audio processors,
- implement a memory-efficient playback model,
- implement decoders (e.g. an mp3 decoder),
- support live streaming or other networking audio sources.
In order to address these concerns, I am proposing core abstractions and
a minimal set of features based on the proposed abstractions to provide
decoding and playback support.
## Proposal
I (Burcu Dogan) surveyed the top iOS and Android apps for audio features.
The survey revealed three major categories of apps with markedly different
requirements. A good audio package shouldn't address each class of
requirements with an isolated audio API; it must introduce common concepts
and types that can be the backbone of both high- and low-level audio
packages. This lets users expand their audio capabilities by partially
delegating their work to lower-level layers of the audio package without
having to rewrite their entire audio stack.
### Features considered
This section briefly explains the features required to support the common
audio requirements of mobile applications. The abstractions we introduce
today should be extensible enough to cover a majority of the features listed
below in the long run.
#### Playback
Single or multi-channel playback with player controls such as play, pause,
stop, etc. Games use a looping sample as the background music, so looping
functionality is also essential. Multiple playback instances are needed: most
games require a background audio track and one-shot audio effects in the
foreground.
#### Decoding
Codec library and decoding support. Most radio-like apps and music players
need to play a variety of audio sources. Codec support on par with
AudioUnit on iOS and OpenMAX on Android is good to have.
#### Remote streaming
Audio players, radios and tools that stream audio need to be able to work
with remote audio sources. HTTP Live Streaming works on both platforms but
used to be inefficient on Android devices.
#### Synchronization and composition
- Synchronization between channels/players
- APIs that allow developers to schedule the playback, frame-level timers
- Mixers, multiple channels need to be multiplexed into a single device buffer
- Music software apps that require audio composition and filtering features
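As a rough illustration of the mixing requirement above, multiplexing two sample streams into one buffer can be sketched as a clamped sum. The `mix16` helper is hypothetical and not part of the proposal; it assumes both inputs are signed 16-bit samples at the same sample rate and channel layout.

```go
package main

import (
	"fmt"
	"math"
)

// mix16 is a hypothetical helper, not part of the proposed API.
// It sums two streams sample by sample and clamps each sum to the
// int16 range so that loud passages saturate instead of wrapping.
// The output is as long as the shorter input.
func mix16(a, b []int16) []int16 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	out := make([]int16, n)
	for i := range out {
		sum := int32(a[i]) + int32(b[i])
		switch {
		case sum > math.MaxInt16:
			sum = math.MaxInt16
		case sum < math.MinInt16:
			sum = math.MinInt16
		}
		out[i] = int16(sum)
	}
	return out
}

func main() {
	music := []int16{100, 30000, -30000}
	effect := []int16{-50, 30000, -30000}
	fmt.Println(mix16(music, effect)) // prints [50 32767 -32768]
}
```

Hard clamping is the simplest choice; a real mixer would more likely scale its inputs or apply a limiter to avoid audible distortion.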
#### Playlist features
Music players and radios require playlisting features, so that users can queue
and unqueue tracks on the player. Players also need shuffling and repeating
features.
More information on the classification of the audio apps based on the features
listed above is available in Appendix: Audio Apps Classification.
### Goals
#### Short-term goals
- Playback of generated data (such as a PCM sine wave).
- Playback of an audio asset.
- Playback from streaming network sources.
- Core interfaces to represent decoders.
- Initial decoder implementations, ideally delegating the decoding to the
system codecs (OpenMAX for Android and AudioUnit for iOS).
- Basic play functions such as play (looping and one-shot), stop, pause,
gain control.
- Prefetching before the user invokes playback.
#### Longer-term goals
- Multi-channel playback (playing multiple streams at the same time)
- Multi-channel synchronization and an internal clock
- Composition and filtering (mixing of multiple signals, low-pass filter,
reverb, etc.)
- Tracklisting features to queue, unqueue multiple sources to a player;
playback features such as prefetching the next song
### Non-goals
- Audio capture. Recording and encoding audio is not on the roadmap initially.
Both could be added to the package without touching any API surface.
- Dependency on the visual frame rate. This feature requires the audio
scheduler to work in cooperation with the graphics layer and is currently not
on our radar.
### Core abstractions
This section proposes the core interfaces and abstractions to represent audio,
audio sources and decoding primitives. The goal of introducing and agreeing on
the core abstractions is to be able to extend the audio package features in
the light of the considered features listed above without breaking the APIs.
#### Clip
The audio package will represent audio data as linear PCM formatted in-memory
audio chunks. A fundamental interface, Clip, will define how to consume audio
data and how audio attributes (such as bit depth and sample rate) are reported
to the consumers of an audio media source.
Clip is a small window into the underlying audio data.
```
// FrameInfo represents the frame-level information.
type FrameInfo struct {
	// Channels represents the number of audio channels
	// (e.g. 1 for mono, 2 for stereo).
	Channels int
	// BitDepth is the number of bits used to represent
	// a single sample.
	BitDepth int
	// SampleRate is the number of samples to be played
	// each second.
	SampleRate int64
}

// Clip represents linear PCM formatted audio.
// Clip can seek and read a small number of frames to allow users to
// consume a small section of the underlying audio data.
//
// Frames returns audio frames up to the number that can fit into buf.
// n is the total number of returned frames.
// err is io.EOF if there are no frames left to read.
//
// FrameInfo returns the basic frame information about the clip audio.
//
// Seek seeks to byte (offset*framesize*channels) of the source audio data.
// Seeking to a negative offset is illegal.
// An error is returned if the offset is out of the bounds of the
// audio data source.
//
// Size returns the total number of bytes of the underlying audio data.
// TODO(jbd): Support cases where size is unknown?
type Clip interface {
	Frames(buf []byte) (n int, err error)
	Seek(offset int64) error
	FrameInfo() FrameInfo
	Size() int64
}
```
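To make the contract concrete, here is a minimal sketch of an in-memory implementation and a consumer loop. The `memClip` type and its `frameBytes` helper are hypothetical. The sketch assumes that a Seek offset counts frames, that one frame holds one sample per channel, and that BitDepth is a positive multiple of 8; the proposal does not settle these details.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// FrameInfo and Clip repeat the definitions proposed above.
type FrameInfo struct {
	Channels   int
	BitDepth   int
	SampleRate int64
}

type Clip interface {
	Frames(buf []byte) (n int, err error)
	Seek(offset int64) error
	FrameInfo() FrameInfo
	Size() int64
}

// memClip is a hypothetical in-memory Clip, not part of the proposal.
// It assumes BitDepth is a positive multiple of 8 and Channels > 0.
type memClip struct {
	data []byte
	info FrameInfo
	pos  int // read position in bytes, always frame-aligned
}

// frameBytes is the size of one frame: one sample for every channel.
func (c *memClip) frameBytes() int { return c.info.BitDepth / 8 * c.info.Channels }

func (c *memClip) Frames(buf []byte) (int, error) {
	fb := c.frameBytes()
	left := (len(c.data) - c.pos) / fb
	if left == 0 {
		return 0, io.EOF
	}
	n := len(buf) / fb
	if n == 0 {
		return 0, io.ErrShortBuffer // buf cannot hold a single frame
	}
	if n > left {
		n = left
	}
	copy(buf, c.data[c.pos:c.pos+n*fb])
	c.pos += n * fb
	return n, nil
}

// Seek interprets offset as a frame index (an assumption of this sketch).
func (c *memClip) Seek(offset int64) error {
	pos := offset * int64(c.frameBytes())
	if offset < 0 || pos > int64(len(c.data)) {
		return errors.New("audio: seek offset out of range")
	}
	c.pos = int(pos)
	return nil
}

func (c *memClip) FrameInfo() FrameInfo { return c.info }
func (c *memClip) Size() int64          { return int64(len(c.data)) }

func main() {
	// Three frames of 16-bit stereo silence: 3 frames * 2 channels * 2 bytes.
	var c Clip = &memClip{
		data: make([]byte, 12),
		info: FrameInfo{Channels: 2, BitDepth: 16, SampleRate: 44100},
	}
	buf := make([]byte, 8) // room for two frames per call
	total := 0
	for {
		n, err := c.Frames(buf)
		total += n
		if err == io.EOF {
			break
		}
		if err != nil {
			fmt.Println("read error:", err)
			return
		}
	}
	fmt.Println("frames read:", total)
}
```

The consumer adds `n` before checking the error, mirroring the `io.Reader` convention, so it would also work with an implementation that returns the final frames together with io.EOF.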
#### Decoders
A decoder takes arbitrary input and is responsible for outputting a clip.
TODO(jbd): Proposal should also mention how the decoders will be organized.
e.g. image package's support for png, jpeg, gif, etc decoders.
```
// Decode reads from r and converts the input
// to a PCM clip output.
func Decode(r io.ReadSeeker) (Clip, error) {
	panic("not implemented")
}

// DecodeWAVBytes decodes the given WAV byte slice
// into a PCM clip output. An error is returned if any of the decoding
// steps fail (e.g. FrameInfo cannot be determined from the WAV header).
func DecodeWAVBytes(data []byte) (Clip, error) {
	panic("not implemented")
}
```
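One decoding step mentioned above is determining the frame information from a WAV header. The sketch below shows roughly how that could look for the standard RIFF/WAVE layout. The `wavInfo` and `sampleHeader` helpers are hypothetical, and the sketch accepts only integer PCM (format tag 1); it is not the proposed decoder.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// FrameInfo repeats the definition proposed above.
type FrameInfo struct {
	Channels   int
	BitDepth   int
	SampleRate int64
}

// wavInfo is a hypothetical helper: it walks the RIFF chunks in data
// until it finds the "fmt " chunk and extracts the frame information.
func wavInfo(data []byte) (FrameInfo, error) {
	if len(data) < 12 || string(data[0:4]) != "RIFF" || string(data[8:12]) != "WAVE" {
		return FrameInfo{}, errors.New("wav: missing RIFF/WAVE header")
	}
	le := binary.LittleEndian
	for p := 12; p+8 <= len(data); {
		id := string(data[p : p+4])
		size := int(le.Uint32(data[p+4 : p+8]))
		body := p + 8
		if id != "fmt " {
			p = body + size + (size & 1) // chunks are padded to an even size
			continue
		}
		if size < 16 || body+16 > len(data) {
			return FrameInfo{}, errors.New("wav: truncated fmt chunk")
		}
		f := data[body : body+16]
		if le.Uint16(f[0:2]) != 1 { // 1 is the integer PCM format tag
			return FrameInfo{}, errors.New("wav: only integer PCM is handled in this sketch")
		}
		return FrameInfo{
			Channels:   int(le.Uint16(f[2:4])),
			SampleRate: int64(le.Uint32(f[4:8])),
			BitDepth:   int(le.Uint16(f[14:16])),
		}, nil
	}
	return FrameInfo{}, errors.New("wav: fmt chunk not found")
}

// sampleHeader builds a 44-byte header for 16-bit stereo audio at
// 44.1 kHz with an empty data chunk. It exists only for the demo.
func sampleHeader() []byte {
	h := make([]byte, 44)
	le := binary.LittleEndian
	copy(h[0:], "RIFF")
	le.PutUint32(h[4:], 36) // size of everything after this field
	copy(h[8:], "WAVEfmt ")
	le.PutUint32(h[16:], 16)     // fmt chunk size
	le.PutUint16(h[20:], 1)      // format tag: integer PCM
	le.PutUint16(h[22:], 2)      // channels
	le.PutUint32(h[24:], 44100)  // sample rate
	le.PutUint32(h[28:], 176400) // byte rate
	le.PutUint16(h[32:], 4)      // block align
	le.PutUint16(h[34:], 16)     // bits per sample
	copy(h[36:], "data")         // data chunk with size 0
	return h
}

func main() {
	info, err := wavInfo(sampleHeader())
	fmt.Println(info, err)
}
```

A real decoder would go on to locate the data chunk and wrap its bytes in a Clip; that part depends on decisions the proposal leaves open.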
#### Clip sources
Any arbitrary valid audio data source can be converted into a clip. Examples
of clip sources are networking streams, file assets and in-memory buffers.
```
// NewBufferClip converts a buffer to a Clip.
func NewBufferClip(buf []byte, info FrameInfo) Clip {
	panic("not implemented")
}

// NewRemoteClip converts the HTTP live streaming media
// source into a Clip.
func NewRemoteClip(url string) (Clip, error) {
	panic("not implemented")
}
```
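As an example of an in-memory source, the sketch below generates a PCM sine wave that could be handed to NewBufferClip together with its FrameInfo. The `sinePCM` helper is hypothetical. It assumes 16-bit signed little-endian samples with interleaved channels; the proposal does not specify byte order or channel layout.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// FrameInfo repeats the definition proposed above.
type FrameInfo struct {
	Channels   int
	BitDepth   int
	SampleRate int64
}

// sinePCM is a hypothetical helper, not part of the proposed API.
// It returns the given number of frames of a sine wave at freq Hz,
// encoded as 16-bit signed little-endian samples, with the same
// sample written to every channel of a frame.
func sinePCM(info FrameInfo, freq float64, frames int) []byte {
	buf := make([]byte, frames*info.Channels*2)
	for i := 0; i < frames; i++ {
		t := float64(i) / float64(info.SampleRate)
		s := int16(math.Round(math.Sin(2*math.Pi*freq*t) * math.MaxInt16))
		for ch := 0; ch < info.Channels; ch++ {
			off := (i*info.Channels + ch) * 2
			binary.LittleEndian.PutUint16(buf[off:], uint16(s))
		}
	}
	return buf
}

func main() {
	info := FrameInfo{Channels: 2, BitDepth: 16, SampleRate: 44100}
	buf := sinePCM(info, 440, 44100) // one second of A4
	fmt.Println("bytes generated:", len(buf))
}
```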
#### Players
A player plays a series of clips back-to-back and provides basic control
functions (play, stop, pause, seek, etc).
Note: Currently, the x/mobile/exp/audio package provides an experimental and
highly immature player. With the introduction of the new core interfaces, we
will break the API surface in order to bless the new abstractions.
```
// NewPlayer returns a new Player. It initializes the underlying
// audio devices and the related resources.
// A player can play multiple clips back-to-back. Players will begin
// prefetching the next clip to provide a smooth and uninterrupted
// playback.
func NewPlayer(c ...Clip) (*Player, error)
```
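The back-to-back behavior can be pictured as a loop that drains each clip in turn into the output device. The sketch below is only an illustration: `playAll`, `deviceBuffer` and `byteClip` are hypothetical names, an `io.Writer` stands in for the device, and prefetching and the player controls are left out.

```go
package main

import (
	"fmt"
	"io"
)

// FrameInfo and Clip repeat the definitions proposed above.
type FrameInfo struct {
	Channels   int
	BitDepth   int
	SampleRate int64
}

type Clip interface {
	Frames(buf []byte) (n int, err error)
	Seek(offset int64) error
	FrameInfo() FrameInfo
	Size() int64
}

// deviceBuffer is a hypothetical stand-in for the audio device.
// It only records the bytes written to it.
type deviceBuffer struct{ data []byte }

func (d *deviceBuffer) Write(p []byte) (int, error) {
	d.data = append(d.data, p...)
	return len(p), nil
}

// playAll is a hypothetical sketch of a player's inner loop. It drains
// each clip in order into w and returns the number of frames written.
func playAll(w io.Writer, clips ...Clip) (int, error) {
	buf := make([]byte, 4096)
	frames := 0
	for _, c := range clips {
		info := c.FrameInfo()
		fb := info.BitDepth / 8 * info.Channels // bytes per frame
		for {
			n, err := c.Frames(buf)
			if n > 0 {
				if _, werr := w.Write(buf[:n*fb]); werr != nil {
					return frames, werr
				}
				frames += n
			}
			if err == io.EOF {
				break // move on to the next clip
			}
			if err != nil {
				return frames, err
			}
		}
	}
	return frames, nil
}

// byteClip is a minimal Clip over 8-bit mono data (one byte per frame),
// used only to exercise playAll.
type byteClip struct {
	data []byte
	pos  int
}

func (c *byteClip) Frames(buf []byte) (int, error) {
	if c.pos >= len(c.data) {
		return 0, io.EOF
	}
	n := copy(buf, c.data[c.pos:])
	c.pos += n
	return n, nil
}

func (c *byteClip) Seek(offset int64) error {
	if offset < 0 || offset > int64(len(c.data)) {
		return fmt.Errorf("seek offset %d out of range", offset)
	}
	c.pos = int(offset)
	return nil
}

func (c *byteClip) FrameInfo() FrameInfo {
	return FrameInfo{Channels: 1, BitDepth: 8, SampleRate: 8000}
}

func (c *byteClip) Size() int64 { return int64(len(c.data)) }

func main() {
	dev := &deviceBuffer{}
	n, err := playAll(dev, &byteClip{data: []byte{1, 2, 3}}, &byteClip{data: []byte{4, 5}})
	fmt.Println(n, dev.data, err) // prints 5 [1 2 3 4 5] <nil>
}
```

A real player would also have to handle clips whose FrameInfo differs from the device format, which this sketch ignores.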
## Compatibility
No compatibility issues.
## Implementation
The current scope of the implementation will be restricted to meet the
requirements listed in the "Short-term goals" section.
The interfaces will be contributed by Burcu Dogan. The implementation of the
decoders and playback is a team effort and requires additional planning.
The audio package has no dependencies on upcoming Go releases and therefore
doesn't have to fit in the Go release cycle.
## Open issues
- WAV and AIFF both support float PCM values even though the use of float
values is unpopular. Should we consider supporting float values? Float values
mean more expensive encoding and decoding. Even if float values are supported,
they must be optional -- not the primary type to represent values.
- Decoding on desktop. The package will use the system codec libraries
provided by Android and iOS on mobile devices. It is not possible to provide
feature parity for desktop environments in the scope of decoding.
- Playback on desktop. The playback may directly use AudioUnit on iOS, and
libmedia (or stagefright) on Android. The media libraries on the desktop are
highly fragmented and cross-platform libraries are third-party dependencies.
It is unlikely that we can provide an audio package that works out of the box
on desktop if we don't write an audio backend for each platform.
- Hardware acceleration. Should we allow users to bypass the decoders and
stream to the device buffer in the longer term? The scope of the audio package
is primarily mobile devices (which support hardware acceleration on a
case-by-case basis). But if the package is to cover more than mobile, we
should consider this case.
- Seeking on variable bit rate encoded audio data is hard without a seek table.
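Regarding the float PCM question in the first item above: if float input were accepted but integer samples stayed the primary representation, each sample would need a conversion along these lines. The `floatToInt16` helper is hypothetical. It assumes float samples are nominally in the range [-1, 1] and that 16-bit output is wanted, and it does not handle NaN input.

```go
package main

import (
	"fmt"
	"math"
)

// floatToInt16 is a hypothetical helper, not part of the proposed API.
// It clamps a float sample to [-1, 1] and scales it to the symmetric
// 16-bit range [-32767, 32767], rounding to the nearest integer.
func floatToInt16(s float32) int16 {
	if s > 1 {
		s = 1
	} else if s < -1 {
		s = -1
	}
	return int16(math.Round(float64(s) * math.MaxInt16))
}

func main() {
	for _, s := range []float32{0, 0.5, 1, -1, 2} {
		fmt.Println(s, "->", floatToInt16(s))
	}
}
```

The extra clamp, multiply and round per sample is a small example of the added decoding cost mentioned in that item.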
## Appendix: Audio Apps Classification
The classification of the audio apps is based on the survey results mentioned
above. This section summarizes which features are highly related to each other.
### Class A
Class A mostly represents games that need to play a background sound (in
looping mode or not) and occasionally need to play one-shot audio effects.
- Single channel player with looping audio
- Buffering audio files entirely in memory is efficient enough because audio
files are small
- Timing of the playback doesn’t have to be precise; latency is negligible
### Class B
Class B represents games with advanced audio. Most apps that fit in this
category are using advanced audio engines as their audio backend.
- Multi channel player
- Synchronization between channels/players
- APIs that allow developers to schedule the playback, such as frame-level
timers
- Low latency, timing of the playback needs to be precise
- Mixers, multiple channels need to be multiplexed into a single device buffer
- Music software apps require audio composition, filtering, etc
### Class C
Class C represents the media players.
- Remote streaming
- Playlisting features, multitrack playback features such as prefetching and cross fading
- High-level player controls such as looping and shuffling
- Good decoder support