| # Proposal: Audio for Mobile |
| |
| Author: Jaana Burcu Dogan |
| |
| With input from David Crawshaw, Hyang-Ah Kim and Andrew Gerrand. |
| |
| Last updated: November 30, 2015 |
| |
| Discussion at https://golang.org/issue/13432. |
| |
| ## Abstract |
| |
| This proposal suggests core abstractions to support audio decoding |
| and playback on mobile devices. |
| |
| ## Background |
| |
| In the scope of the Go mobile project, an audio package that supports |
| decoding and playback is a top priority. The current status of audio |
| support under x/mobile is limited to OpenAL bindings and an experimental |
| high-level audio player that is backed by OpenAL. |
| |
| The experimental audio package fails to |
| - provide high-level abstractions to represent audio and audio processors, |
| - implement a memory-efficient playback model, |
| - implement decoders (e.g. an mp3 decoder), |
| - support live streaming or other networked audio sources. |
| |
| In order to address these concerns, I am proposing core abstractions and |
| a minimal set of features based on the proposed abstractions to provide |
| decoding and playback support. |
| |
| ## Proposal |
| |
| I (Burcu Dogan) surveyed the top iOS and Android apps for audio features. |
| The survey revealed three major categories with markedly different |
| requirements. A good audio package shouldn't address these different classes |
| of requirements with isolated audio APIs; it must introduce common concepts |
| and types that can be the backbone of both high- and low-level audio |
| packages. This is how we will enable users to expand their audio |
| capabilities by partially delegating their work to lower-level layers of the |
| audio package without having to rewrite their entire audio stack. |
| |
| ### Features considered |
| This section briefly explains the features required to support the common |
| audio requirements of mobile applications. The abstractions we introduce |
| today should be extensible enough to cover the majority of the features |
| listed below in the long run. |
| |
| #### Playback |
| Single or multi-channel playback with player controls such as play, pause, |
| stop, etc. Games use a looping sample as the background music, so looping |
| functionality is also essential. Multiple playback instances are needed: most |
| games require a background audio track and one-shot audio effects in the |
| foreground. |
| |
| #### Decoding |
| Codec library and decoding support. Most radio-like apps and music players |
| need to play a variety of audio sources. Codec support on par with |
| AudioUnit on iOS and OpenMAX on Android is good to have. |
| |
| #### Remote streaming |
| Audio players, radios and tools that stream audio need to be able to work |
| with remote audio sources. HTTP Live Streaming works on both platforms but |
| used to be inefficient on Android devices. |
| |
| #### Synchronization and composition |
| - Synchronization between channels/players |
| - APIs that allow developers to schedule the playback, frame-level timers |
| - Mixers, multiple channels need to be multiplexed into a single device buffer |
| - Music software apps that require audio composition and filtering features |
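As an illustration of the mixer item above, multiplexing two signals into one buffer can be as simple as summing samples and clamping the result. This is a minimal sketch, assuming 16-bit signed samples that have already been decoded into `[]int16`; the helper name `mix` is hypothetical and not part of the proposed API.

```go
package main

import (
	"fmt"
	"math"
)

// mix is a hypothetical helper (not part of the proposal). It sums two
// 16-bit signals sample by sample and clamps the result to the int16
// range so that loud passages saturate instead of wrapping around.
func mix(a, b []int16) []int16 {
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	out := make([]int16, n)
	for i := range out {
		var sum int32
		if i < len(a) {
			sum += int32(a[i])
		}
		if i < len(b) {
			sum += int32(b[i])
		}
		if sum > math.MaxInt16 {
			sum = math.MaxInt16
		}
		if sum < math.MinInt16 {
			sum = math.MinInt16
		}
		out[i] = int16(sum)
	}
	return out
}

func main() {
	fmt.Println(mix([]int16{1000, 30000, -30000}, []int16{-500, 10000, -10000, 7}))
	// Output: [500 32767 -32768 7]
}
```

A real mixer would also need to reconcile sample rates and channel layouts before summing; the sketch assumes both inputs already share them.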
| |
| #### Playlist features |
| Music players and radios require playlisting features so that users can queue |
| and unqueue tracks on the player. Players also need shuffling and repeating |
| features. |
| |
| More information on the classification of the audio apps based on the features |
| listed above is available in Appendix: Audio Apps Classification. |
| |
| ### Goals |
| |
| #### Short-term goals |
| |
| - Playback of generated data (such as a PCM sine wave). |
| - Playback of an audio asset. |
| - Playback from streaming network sources. |
| - Core interfaces to represent decoders. |
| - Initial decoder implementations, ideally delegating the decoding to the |
| system codecs (OpenMAX for Android and AudioUnit for iOS). |
| - Basic play functions such as play (looping and one-shot), stop, pause, |
| gain control. |
| - Prefetching before user invokes playback. |
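To make the first goal concrete, generated data could be a sine wave encoded as linear PCM. The following is a minimal sketch, assuming mono, 16-bit, little-endian samples; `sineWave` is a hypothetical helper and not part of the proposed API.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// sineWave is a hypothetical helper (not part of the proposal). It returns
// n frames of a mono, 16-bit, little-endian linear PCM sine wave at freq Hz.
func sineWave(freq float64, sampleRate, n int) []byte {
	buf := make([]byte, 2*n) // two bytes per 16-bit mono frame
	for i := 0; i < n; i++ {
		v := math.Sin(2 * math.Pi * freq * float64(i) / float64(sampleRate))
		s := int16(v * math.MaxInt16) // scale [-1, 1] to the int16 range
		binary.LittleEndian.PutUint16(buf[2*i:], uint16(s))
	}
	return buf
}

func main() {
	pcm := sineWave(440, 44100, 44100) // one second of A4 at 44.1 kHz
	fmt.Println(len(pcm), "bytes")
}
```

A buffer like this, together with its channel count, bit depth and sample rate, is the kind of input an in-memory clip source would be expected to accept.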
| |
| #### Longer-term goals |
| - Multi-channel playback (playing multiple streams at the same time) |
| - Multi-channel synchronization and an internal clock |
| - Composition and filtering (mixing of multiple signals, low-pass filter, |
| reverb, etc.) |
| - Tracklisting features to queue and unqueue multiple sources on a player; |
| playback features such as prefetching the next song |
| |
| ### Non-goals |
| - Audio capture. Recording and encoding audio are not on the roadmap initially. |
| Both could be added to the package later without changing the existing API |
| surface. |
| - Dependency on the visual frame rate. This feature requires the audio |
| scheduler to work in cooperation with the graphics layer and is currently not |
| on our radar. |
| |
| ### Core abstractions |
| |
| This section proposes the core interfaces and abstractions to represent audio, |
| audio sources and decoding primitives. The goal of introducing and agreeing on |
| the core abstractions is to be able to extend the audio package features in |
| the light of the considered features listed above without breaking the APIs. |
| |
| #### Clip |
| The audio package will represent audio data as linear PCM formatted in-memory |
| audio chunks. A fundamental interface, Clip, will define how to consume audio |
| data and how audio attributes (such as bit depth and sample rate) are reported |
| to the consumers of an audio media source. |
| |
| Clip is a small window into the underlying audio data. |
| |
| ``` |
| // FrameInfo represents the frame-level information. |
| type FrameInfo struct { |
| 	// Channels is the number of audio channels |
| 	// (e.g. 1 for mono, 2 for stereo). |
| 	Channels int |
| |
| 	// BitDepth is the number of bits used to represent |
| 	// a single sample. |
| 	BitDepth int |
| |
| 	// SampleRate is the number of samples to be played |
| 	// each second. |
| 	SampleRate int64 |
| } |
| |
| // Clip represents linear PCM formatted audio. |
| // Clip can seek and read a small number of frames to allow users to |
| // consume a small section of the underlying audio data. |
| // |
| // Frames returns audio frames up to the number that can fit into buf. |
| // n is the total number of returned frames. |
| // err is io.EOF if there are no frames left to read. |
| // |
| // FrameInfo returns the basic frame information about the clip audio. |
| // |
| // Seek seeks to the frame at the given offset, that is, to byte |
| // offset*(BitDepth/8)*Channels of the source audio data. |
| // Seeking to a negative offset is illegal. |
| // An error is returned if the offset is out of the bounds of the |
| // audio data source. |
| // |
| // Size returns the total number of bytes of the underlying audio data. |
| // TODO(jbd): Support cases where size is unknown? |
| type Clip interface { |
| 	Frames(buf []byte) (n int, err error) |
| 	Seek(offset int64) error |
| 	FrameInfo() FrameInfo |
| 	Size() int64 |
| } |
| ``` |
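To show how the interface might be satisfied, here is a minimal sketch of an in-memory clip. The type `bufferClip` and its fields are hypothetical, and the sketch assumes a valid FrameInfo (non-zero channels, bit depth a multiple of 8) and that a frame holds one sample per channel.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

type FrameInfo struct {
	Channels   int
	BitDepth   int
	SampleRate int64
}

type Clip interface {
	Frames(buf []byte) (n int, err error)
	Seek(offset int64) error
	FrameInfo() FrameInfo
	Size() int64
}

// bufferClip is a hypothetical in-memory Clip used only for illustration.
type bufferClip struct {
	data []byte
	info FrameInfo
	pos  int // byte offset of the next frame to read
}

// frameSize is the number of bytes in one frame: one sample per channel.
func (c *bufferClip) frameSize() int {
	return c.info.Channels * c.info.BitDepth / 8
}

// Frames copies as many whole frames as fit into buf and reports io.EOF
// once no complete frame is left.
func (c *bufferClip) Frames(buf []byte) (int, error) {
	fs := c.frameSize()
	if c.pos+fs > len(c.data) {
		return 0, io.EOF
	}
	n := len(buf) / fs
	if avail := (len(c.data) - c.pos) / fs; n > avail {
		n = avail
	}
	copy(buf, c.data[c.pos:c.pos+n*fs])
	c.pos += n * fs
	return n, nil
}

// Seek moves to the frame at offset, rejecting out-of-bounds offsets.
func (c *bufferClip) Seek(offset int64) error {
	pos := offset * int64(c.frameSize())
	if offset < 0 || pos > int64(len(c.data)) {
		return errors.New("audio: seek offset out of bounds")
	}
	c.pos = int(pos)
	return nil
}

func (c *bufferClip) FrameInfo() FrameInfo { return c.info }
func (c *bufferClip) Size() int64          { return int64(len(c.data)) }

func main() {
	var c Clip = &bufferClip{
		data: make([]byte, 16), // four stereo 16-bit frames of silence
		info: FrameInfo{Channels: 2, BitDepth: 16, SampleRate: 44100},
	}
	buf := make([]byte, 8) // room for two frames
	n, err := c.Frames(buf)
	fmt.Println(n, err) // prints "2 <nil>"
}
```

Because consumers only see the small window copied into `buf`, the same calling pattern would apply to clips backed by files or network streams that never hold the whole audio data in memory.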
| |
| #### Decoders |
| A decoder takes arbitrary input and is responsible for producing a clip. |
| TODO(jbd): Proposal should also mention how the decoders will be organized. |
| e.g. image package's support for png, jpeg, gif, etc decoders. |
| |
| ``` |
| // Decode reads from r and converts the input |
| // to a PCM clip output. |
| func Decode(r io.ReadSeeker) (Clip, error) { |
| 	panic("not implemented") |
| } |
| |
| // DecodeWAVBytes decodes the given WAV byte slice into a PCM clip |
| // output. An error is returned if any of the decoding steps fail |
| // (e.g. FrameInfo cannot be determined from the WAV header). |
| func DecodeWAVBytes(data []byte) (Clip, error) { |
| 	panic("not implemented") |
| } |
| ``` |
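One possible way to organize decoders, in the spirit of the image package mentioned in the TODO, is a registry keyed by a magic prefix. The sketch below is only an assumption about how that could look: `RegisterFormat`, `ErrFormat`, `DecodeBytes` and `decodeWAVStub` are hypothetical names, `Clip` is reduced to an empty interface to keep the example short, and `Decode` additionally returns the format name the way `image.Decode` does.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// Clip stands in for the proposed interface; it is left empty here to
// keep the sketch short.
type Clip interface{}

// format describes one registered decoder (hypothetical type).
type format struct {
	name   string
	magic  []byte
	decode func(io.ReadSeeker) (Clip, error)
}

var formats []format

// ErrFormat is returned when no registered decoder matches the input.
var ErrFormat = errors.New("audio: unknown format")

// RegisterFormat registers a decoder for inputs that start with magic.
func RegisterFormat(name string, magic []byte, decode func(io.ReadSeeker) (Clip, error)) {
	formats = append(formats, format{name, magic, decode})
}

// Decode sniffs the leading bytes of r, rewinds, and dispatches to the
// first registered decoder whose magic prefix matches.
func Decode(r io.ReadSeeker) (Clip, string, error) {
	head := make([]byte, 12)
	n, err := io.ReadFull(r, head)
	if err != nil && err != io.ErrUnexpectedEOF && err != io.EOF {
		return nil, "", err
	}
	if _, err := r.Seek(0, io.SeekStart); err != nil {
		return nil, "", err
	}
	for _, f := range formats {
		if bytes.HasPrefix(head[:n], f.magic) {
			c, err := f.decode(r)
			return c, f.name, err
		}
	}
	return nil, "", ErrFormat
}

// DecodeBytes is a convenience wrapper for in-memory input.
func DecodeBytes(data []byte) (Clip, string, error) {
	return Decode(bytes.NewReader(data))
}

// decodeWAVStub is a placeholder; a real decoder would parse the header.
func decodeWAVStub(io.ReadSeeker) (Clip, error) {
	return nil, errors.New("wav decoding not implemented in this sketch")
}

// A decoder package would register itself from its init function.
func init() { RegisterFormat("wav", []byte("RIFF"), decodeWAVStub) }

func main() {
	_, name, err := DecodeBytes([]byte("RIFF....WAVEfmt "))
	fmt.Println(name, err)
}
```

With this layout, adding a new codec would not require changes to `Decode`; whether the proposal adopts such a registry is left open by the TODO above.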
| |
| #### Clip sources |
| Any arbitrary valid audio data source can be converted into a clip. Examples |
| of clip sources are network streams, file assets and in-memory buffers. |
| |
| ``` |
| // NewBufferClip converts a buffer to a Clip. |
| func NewBufferClip(buf []byte, info FrameInfo) Clip { |
| 	panic("not implemented") |
| } |
| |
| // NewRemoteClip converts the HTTP live streaming media |
| // source into a Clip. |
| func NewRemoteClip(url string) (Clip, error) { |
| 	panic("not implemented") |
| } |
| ``` |
| |
| #### Players |
| |
| A player plays a series of clips back-to-back and provides basic control |
| functions (play, stop, pause, seek, etc.). |
| |
| Note: Currently, the x/mobile/exp/audio package provides an experimental and |
| highly immature player. With the introduction of the new core interfaces, we |
| will break the API surface in order to bless the new abstractions. |
| |
| ``` |
| // NewPlayer returns a new Player. It initializes the underlying |
| // audio devices and the related resources. |
| // A player can play multiple clips back-to-back. Players will begin |
| // prefetching the next clip to provide a smooth and uninterrupted |
| // playback. |
| func NewPlayer(c ...Clip) (*Player, error) |
| ``` |
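The back-to-back behaviour can be sketched as a loop that drains each clip in order into an output sink. This is not the proposed Player: it ignores real-time pacing, prefetching and controls, and `playAll`, `render` and `memClip` are hypothetical names. An `io.Writer` stands in for the audio device buffer.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

type FrameInfo struct {
	Channels   int
	BitDepth   int
	SampleRate int64
}

type Clip interface {
	Frames(buf []byte) (n int, err error)
	Seek(offset int64) error
	FrameInfo() FrameInfo
	Size() int64
}

// playAll reads each clip to completion, in order, and writes the PCM
// bytes to dev, which stands in for the audio device buffer.
func playAll(dev io.Writer, clips ...Clip) error {
	buf := make([]byte, 4096)
	for _, c := range clips {
		info := c.FrameInfo()
		frameSize := info.Channels * info.BitDepth / 8
		for {
			n, err := c.Frames(buf)
			if n > 0 {
				if _, werr := dev.Write(buf[:n*frameSize]); werr != nil {
					return werr
				}
			}
			if err == io.EOF {
				break // move on to the next clip
			}
			if err != nil {
				return err
			}
		}
	}
	return nil
}

// render collects the output of playAll in memory.
func render(clips ...Clip) ([]byte, error) {
	var dev bytes.Buffer
	err := playAll(&dev, clips...)
	return dev.Bytes(), err
}

// memClip is a minimal in-memory Clip used to exercise playAll.
type memClip struct {
	data []byte
	info FrameInfo
}

func (m *memClip) Frames(buf []byte) (int, error) {
	fs := m.info.Channels * m.info.BitDepth / 8
	n := len(buf) / fs
	if avail := len(m.data) / fs; n > avail {
		n = avail
	}
	if n == 0 {
		return 0, io.EOF
	}
	copy(buf, m.data[:n*fs])
	m.data = m.data[n*fs:]
	return n, nil
}

func (m *memClip) Seek(int64) error     { return nil } // seeking omitted in this sketch
func (m *memClip) FrameInfo() FrameInfo { return m.info }
func (m *memClip) Size() int64          { return int64(len(m.data)) }

func main() {
	info := FrameInfo{Channels: 1, BitDepth: 16, SampleRate: 8000}
	out, err := render(
		&memClip{data: []byte{1, 2, 3, 4}, info: info},
		&memClip{data: []byte{5, 6}, info: info},
	)
	fmt.Println(out, err) // prints "[1 2 3 4 5 6] <nil>"
}
```

A real player would additionally start fetching the next clip before the current one ends, which is what the prefetching mentioned in the NewPlayer comment refers to.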
| |
| ## Compatibility |
| |
| No compatibility issues. |
| |
| ## Implementation |
| |
| The current scope of the implementation will be restricted to meet the |
| requirements listed in the "Short-term goals" section. |
| |
| The interfaces will be contributed by Burcu Dogan. The implementation of the |
| decoders and playback is a team effort and requires additional planning. |
| |
| The audio package has no dependencies on upcoming Go releases and therefore |
| doesn't have to fit in the Go release cycle. |
| |
| ## Open issues |
| |
| - WAV and AIFF both support float PCM values even though the use of float |
| values is unpopular. Should we consider supporting float values? Float values |
| mean more expensive encoding and decoding. Even if float values are supported, |
| they must be optional -- not the primary type to represent values. |
| - Decoding on desktop. The package will use the system codec libraries |
| provided by Android and iOS on mobile devices. It is not possible to provide |
| feature parity for desktop environments in the scope of decoding. |
| - Playback on desktop. The playback may directly use AudioUnit on iOS, and |
| libmedia (or stagefright) on Android. The media libraries on the desktop are |
| highly fragmented and cross-platform libraries are third-party dependencies. |
| It is unlikely that we can provide an audio package that works out of the box |
| on desktop if we don't write an audio backend for each platform. |
| - Hardware acceleration. Should we allow users to bypass the decoders and |
| stream to the device buffer in the longer term? The scope of the audio package |
| is primarily mobile devices (which support hardware acceleration on a |
| case-by-case basis). But if the package is to cover more than mobile, we |
| should consider this case. |
| - Seeking on variable bit rate encoded audio data is hard without a seek table. |
| |
| ## Appendix: Audio Apps Classification |
| |
| The classification of audio apps is based on the survey results mentioned |
| above. This section summarizes which features are highly related to each other. |
| |
| ### Class A |
| Class A mostly represents games that need to play a background sound (in |
| looping mode or not) and occasionally need to play one-shot audio effects. |
| - Single-channel player with looping audio |
| - Buffering audio files entirely in memory is efficient enough because audio |
| files are small |
| - Timing of the playback doesn’t have to be precise; latency is negligible |
| |
| ### Class B |
| Class B represents games with advanced audio. Most apps that fit in this |
| category are using advanced audio engines as their audio backend. |
| - Multi-channel player |
| - Synchronization between channels/players |
| - APIs that allow developers to schedule the playback, such as frame-level |
| timers |
| - Low latency, timing of the playback needs to be precise |
| - Mixers, multiple channels need to be multiplexed into a single device buffer |
| - Music software apps require audio composition, filtering, etc |
| |
| ### Class C |
| Class C represents the media players. |
| - Remote streaming |
| - Playlisting features, multitrack playback features such as prefetching and cross fading |
| - High-level player controls such as looping and shuffling |
| - Good decoder support |