This document provides an overview of the audio capabilities in Cinder. You can use this, along with the samples in the samples/_audio folder, as an entry point into the ci::audio namespace.
The design is meant to accommodate a wide range of applications.
The core of the design is the modular API, which provides a set of extendable audio tools that can be connected in flexible ways depending on an application's needs. Those who plan to do extensive audio work with Cinder will be best served by the Node system. It draws from concepts found in other popular modular audio APIs, namely Web Audio and Pure Data, while combining many of Cinder's existing design patterns. We also take full advantage of C++11 features such as smart pointers, std::atomic, and std::mutex.
A modular API is advantageous because it is proven to be very flexible and allows for reusability without a significant loss in performance. Still, higher level constructs exist and more will be added as time permits. The Cinder philosophy remains, "easy things easy and hard things possible."
For those looking for raw DSP functionality, the audio::dsp namespace provides lower-level building blocks that can be used on their own.
To keep the code snippets short, it is assumed that the code is in a .cpp file and that a using namespace ci; declaration is in scope, so the ci:: qualifier has been dropped.
For those who need only to play back a sound file or want a simple processing function, you may not need to look further than what is provided by the audio::Voice API.
The following is an example of how to play a sound file with an audio::Voice:
// declare somewhere in your class interface:
audio::VoiceRef mVoice;
void MyApp::setup()
{
    audio::SourceFileRef sourceFile = audio::load( app::loadAsset( "soundfile.wav" ) );
    mVoice = audio::Voice::create( sourceFile );

    // Start playing audio from the voice:
    mVoice->start();
}
Later, you can call mVoice->stop() when you want the audio to stop playing.
The Voice::create() call above actually returns a VoiceSamplePlayerNodeRef, but the user will generally only need to maintain this by storing it in a VoiceRef instance variable. It is necessary to store the returned VoiceRef because once it goes out of scope, it will be disconnected from the audio graph and destroyed. Creating a Voice is fairly cheap, however (much cheaper than creating the SourceFile), which is why in the example above it is simply held onto via the mVoice instance variable.
You can also pass a std::function callback to the Voice::create() method, which will be called to process the audio. This can be useful for educational purposes or quick experiments.
mPhase = 0.0f; // float stored in class

mVoice = audio::Voice::create( [this] ( audio::Buffer *buffer, size_t sampleRate ) {
    float *channel0 = buffer->getChannel( 0 );

    // generate a 440 hertz sine wave
    float phaseIncr = ( 440.0f / (float)sampleRate ) * 2 * (float)M_PI;
    for( size_t i = 0; i < buffer->getNumFrames(); i++ ) {
        mPhase = fmodf( mPhase + phaseIncr, 2 * (float)M_PI );
        channel0[i] = std::sin( mPhase );
    }
} );
Users should be aware that the callback is executed on the background audio thread, so any data it shares with the main thread needs to be synchronized.
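One straightforward option is to share values with the callback through a std::atomic. The following is only a rough sketch reusing the sine callback above; the mFreq member is hypothetical and not part of the original example.

// assumed members of MyApp (requires #include <atomic>):
//     std::atomic<float> mFreq { 440.0f };
//     float              mPhase = 0.0f;

mVoice = audio::Voice::create( [this] ( audio::Buffer *buffer, size_t sampleRate ) {
    float freq = mFreq.load(); // read the shared value once per block on the audio thread
    float phaseIncr = ( freq / (float)sampleRate ) * 2 * (float)M_PI;
    float *channel0 = buffer->getChannel( 0 );
    for( size_t i = 0; i < buffer->getNumFrames(); i++ ) {
        mPhase = fmodf( mPhase + phaseIncr, 2 * (float)M_PI );
        channel0[i] = std::sin( mPhase );
    }
} );

// meanwhile, on the main thread, the frequency can be changed safely at any time:
mFreq = 220.0f;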
Each Voice also has controls for volume and 2D panning:
void MyApp::mouseDown( app::MouseEvent event )
{
    mVoice->setVolume( 1.0f - (float)event.getPos().y / (float)getWindowHeight() );
    mVoice->setPan( (float)event.getPos().x / (float)getWindowWidth() );
}
Related Samples:
Before jumping into the meat and potatoes, it's a good idea to be introduced to audio::Buffer, the class used to pass blocks of audio samples around. Its layout is non-interleaved (each channel's samples are stored contiguously), though an interleaved variant, BufferInterleaved, also exists for cases that require it.
In both cases, there are a number of frames (a frame consists of one sample for each channel) and a number of channels that make up the layout of the Buffer. You can pass around shared_ptr<Buffer>'s (BufferRef's) if it makes sense to hold onto a shared buffer. This isn't required though, and in some cases you indeed want to copy the entire Buffer, such as multi-threaded situations where you are passing a Buffer from the audio thread back to the main thread. For these cases, the copy constructor is enabled.
It is also important to note that samplerate is not a property of the buffer data - this is to allow for flexibility of how the data is interpreted. Instead, samplerate is determined from the context in which the audio data is being used.
The standard sample type is float; currently all processing is handled in single floating point precision. However, the Buffer class is actually a typedef'ed BufferT<float>, so if a different sample format is required, one can use the BufferT class directly and provide the sample type. This is not directly interchangeable with the public interfaces in the ci::audio namespace, which operate on float buffers, but it can still be useful on its own.
The standard Buffer is fixed in size once created; if you need a resizable buffer, use one of the BufferDynamic* variants. BufferDynamic has the methods setNumFrames(), setNumChannels(), and setSize() (for both frames and channels at the same time), which will reallocate the internal data store if required. If the new size (frames * channels) is smaller than or equal to the previous size, no reallocation will occur - by default BufferDynamic will only grow, in order to prevent runtime reallocations where possible. If you would like to free up extra space, you can use the provided shrinkToFit() method.
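A brief sketch of this behavior (the sizes below are arbitrary):

audio::BufferDynamic buffer( 512, 2 ); // 512 frames, 2 channels
buffer.setNumFrames( 1024 );           // larger than before, so the internal store grows
buffer.setSize( 256, 2 );              // smaller than before, so no reallocation occurs
buffer.shrinkToFit();                  // optionally free the unused space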
At the core of the modular API are two types: audio::Node, the basic building block for audio processing, and audio::Context, which manages the platform-specific audio I/O and the graph of connected Nodes. Nodes are created with the Context's makeNode() method:
auto ctx = audio::Context::master();
mNode = ctx->makeNode( new NodeType );
There are a couple of important parameters governed by the current Context, which all Nodes within its graph share: the samplerate and the number of frames processed per block.
These parameters are ultimately configured by the Context's OutputNode (accessible with Context::getOutput()), which is currently always of type OutputDeviceNode. This means that the samplerate and frames-per-block settings are governed by your system's hardware settings.
It is worth noting that these values can change at runtime, either by the user or the system, which will cause all Nodes within a Context to be reconfigured. This should in general just work, though authors of custom Nodes should keep it in mind.
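For reference, both values can be queried from the Context at any time (the numbers in the comments are only typical examples):

auto ctx = audio::Context::master();
size_t sampleRate     = ctx->getSampleRate();     // e.g. 44100 or 48000, taken from the hardware
size_t framesPerBlock = ctx->getFramesPerBlock(); // e.g. 512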
The following creates two Nodes within the master Context and connects them to its output using Node::connect():
auto ctx = audio::Context::master();
mSine = ctx->makeNode( new audio::GenSineNode );
mGain = ctx->makeNode( new audio::GainNode );
mSine->connect( mGain );
mGain->connect( ctx->getOutput() );
// or, equivalently, using the shorthand >> operator:
mSine >> mGain >> ctx->getOutput();
To process audio, each Node must be enabled. While effect Nodes (such as GainNode) are enabled by default, InputNodes must be turned on before they produce any audio, and OutputNodes are managed by their owning Context:
mSine->enable();
ctx->enable();
It is important to note that enabling or disabling the Context controls the processing of the entire audio graph - no audio will be processed at all and 'audio time' will not progress while it is off, whether or not an individual Node is enabled.
The reason the above is true is that processing is driven from the Context's output: each processing block, the OutputNode recursively calls Node::pullInputs() on its inputs, which ultimately ends up calling the virtual Node::process() method with the Buffer of samples to be processed.
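As a rough sketch of what that looks like for a custom Node (the class here is hypothetical and kept intentionally simple), you override process() and operate on the Buffer you are handed:

class MyAttenuateNode : public audio::Node {
  public:
    MyAttenuateNode() : Node( Format() ) {}

  protected:
    // called on the audio thread with the samples pulled from this Node's inputs
    void process( audio::Buffer *buffer ) override
    {
        float *data = buffer->getData();
        for( size_t i = 0; i < buffer->getSize(); i++ )
            data[i] *= 0.5f; // attenuate by half
    }
};

// created like any other Node: ctx->makeNode( new MyAttenuateNode );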
Other Node features include support for cycles in the graph: a Node can be connected in a feedback loop if its supportsCycles() override returns true. The built-in Delay is the primary user of this feature.

Related Samples:
The endpoint in an audio graph is currently always an OutputDeviceNode, which sends audio to a hardware output device such as speakers or headphones.
In order to support non-device Contexts in the future, Context::getOutput() returns an OutputNode, the parent class of OutputDeviceNode. However, you can safely typecast it, which will allow you to get at the audio::DeviceRef and more information related to your hardware device:
#include "cinder/audio/OutputNode.h"

auto ctx = ci::audio::master();
audio::DeviceRef dev = std::dynamic_pointer_cast<audio::OutputDeviceNode>( ctx->getOutput() )->getDevice();

ci::app::console() << "device name: " << dev->getName() << std::endl;
ci::app::console() << "num output channels: " << dev->getNumOutputChannels() << std::endl;
There is also an easier way to get the default device, with the static audio::Device::getDefaultOutput() method.
If you need the Context to address a device other than the system default, you must create an OutputDeviceNode with the appropriate Context::createOutputDeviceNode() method and then assign it as the master output:
auto ctx = ci::audio::master();
auto device = ci::audio::Device::findDeviceByName( "Internal Speakers" );
ci::audio::OutputDeviceNodeRef output = ctx->createOutputDeviceNode( device );
ctx->setOutput( output );
The device name can be found in your system settings, or by iterating the DeviceRefs returned by Device::getDevices() and checking each one's getName(). As an alternative to specifying the device by name, you can use findDeviceByKey(), which takes a platform-agnostic, unique identifier that is internally generated.
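For example, a quick way to see what is available (a small sketch, not taken from the samples) is to print each device's name and key to the console:

for( const auto &device : ci::audio::Device::getDevices() ) {
    ci::app::console() << "name: " << device->getName()
                       << ", key: " << device->getKey() << std::endl;
}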
If you intend to handle a channel count other than the default stereo pair, you need to create the OutputDeviceNode with the desired channel count in its optional Node::Format argument.
auto format = ci::audio::Node::Format().channels( 10 );
ci::audio::OutputDeviceNodeRef output = ctx->createOutputDeviceNode( device, format );
ctx->setOutput( output );
note: Replacing the master Context's output causes every Node in the graph to be reinitialized via Node::uninitialize() and Node::initialize(). This is because the Context's output controls variables that the other Nodes depend on, such as the samplerate and frames per block.
InputDeviceNode, which delivers audio from a hardware input such as a microphone, shares much of its interface with OutputDeviceNode, most importantly the getDevice() method, which returns the owned audio::DeviceRef. As with OutputDeviceNode, you must create the InputDeviceNode using the platform-specific virtual method Context::createInputDeviceNode():
ci::audio::InputDeviceNodeRef input = ctx->createInputDeviceNode();
input >> ctx->getOutput();
input->enable();
The above creates an InputDeviceNode with the default Device and default audio::Node::Format, giving you stereo input (or mono if that isn't available), and then connects it directly to the Context's output. As is the case for all InputNodes (InputDeviceNode's parent class), you must then call its enable() method before it will capture any audio.
If you want to force mono input, you can specify that with the optional format argument:
auto format = ci::audio::Node::Format().channels( 1 );
ci::audio::InputDeviceNodeRef input = ctx->createInputDeviceNode( ci::audio::Device::getDefaultInput(), format );
Of course, you can also use a non-default Device, as explained in the section on OutputDeviceNode.
While the above connection simply routes your microphone input to the output speakers, you are able to connect any combination of effects in-between. You can also connect the input to a MonitorNode, which allows you to get the audio samples back on the main thread for visualization and doesn't require a connection to the speaker output. This is explained in more detail below.
There are a couple of Nodes designed for getting audio data back onto the main thread in a thread-safe manner: MonitorNode and its subclass MonitorSpectralNode.
To create and connect the MonitorNode:
mMonitor = ctx->makeNode( new audio::MonitorNode );
mSomeGen >> mMonitor;
And later, to retrieve a copy of its most recent samples (for example from your draw() method):
const audio::Buffer &buffer = mMonitor->getBuffer();

for( size_t ch = 0; ch < buffer.getNumChannels(); ch++ ) {
    for( size_t i = 0; i < buffer.getNumFrames(); i++ ) {
        // draw the sample at buffer.getChannel( ch )[i]...
    }
}
Note that in the above example, mMonitor was not connected to the Context::getOutput() - it doesn't have to be, because it is a special subclass of Node (NodeAutoPullable) that the Context pulls automatically each processing block.
One refers to the slice of samples copied from the audio thread as the 'window'. The window size is the number of frames in the buffer returned from getBuffer(); it is always a power of two and defaults to the number of frames per block in the current Context. A larger window can be requested via the Format argument:
auto format = audio::MonitorNode::Format().windowSize( 4096 );
mMonitor = ctx->makeNode( new audio::MonitorNode( format ) );
If all you need is the current volume of the audio stream, you can use the MonitorNode::getVolume() method, which returns the RMS ('root mean square') value of the entire window on a scale of 0 to 1. This is generally a good indication of how loud the signal is.
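For instance, a tiny sketch of using it to drive a simple visualization:

// in MyApp::draw(), scale a circle by the current RMS volume
float volume = mMonitor->getVolume(); // 0 to 1
gl::drawSolidCircle( getWindowCenter(), 20 + volume * 200 );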
Whereas MonitorNode will give you time-domain samples on the main thread, MonitorSpectralNode is a subclass that will give you frequency-domain samples by way of the Discrete Fourier Transform (popularly referred to as the FFT, which for all practical purposes can be thought of as one and the same). The transform is managed internally, so for common visualization needs MonitorSpectralNode is all you need.
The InputAnalyzer sample makes use of this class, along with an InputDeviceNode as input, to draw a classic spectrogram.
To create and connect up the Nodes for this:
void MyApp::setup()
{
    auto ctx = audio::Context::master();
    mInputDeviceNode = ctx->createInputDeviceNode();
    mMonitorSpectralNode = ctx->makeNode( new audio::MonitorSpectralNode() );

    mInputDeviceNode >> mMonitorSpectralNode;
    mInputDeviceNode->enable(); // an InputDeviceNode must be enabled before it captures audio
}
And later, when you want to draw the magnitude spectrum, you retrieve the data with:
void MyApp::draw()
{
    const vector<float> &magSpectrum = mMonitorSpectralNode->getMagSpectrum();
    // draw the vector here...
}
The MonitorSpectralNode::getMagSpectrum() method first makes a copy of the audio samples, then computes a forward FFT transform and finally converts the result to polar coordinates. What you get back is the magnitudes only; the phase component is commonly ignored when visualizing the spectrum. While a full explanation of the DFT is out of scope here (Julius O. Smith's book is a great reference), you can interpret the result as an array of bins, where each bin covers a frequency range of the decomposed (analyzed) signal and the maximum frequency is the so-called Nyquist frequency (half the samplerate).
Note that whereas in the time domain you receive an audio::Buffer, getMagSpectrum() returns a vector of floats regardless of the number of channels it is processing. If the MonitorSpectralNode has two or more channels, the samples in each channel are first averaged before the magnitude spectrum is computed. Computing the frequency corresponding to a given bin is simple and well defined:
float binFrequency = binIndex * (float)ctx->getSampleRate() / (float)mMonitorSpectralNode->getFftSize();
You can also use the MonitorSpectralNode::getFreqForBin() method to do this calculation for you.
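For example, a brief sketch (not from the samples) that finds the loudest bin and its frequency using the formula above (requires <algorithm>):

const std::vector<float> &magSpectrum = mMonitorSpectralNode->getMagSpectrum();
size_t peakBin = std::distance( magSpectrum.begin(), std::max_element( magSpectrum.begin(), magSpectrum.end() ) );
float peakFreq = peakBin * (float)ctx->getSampleRate() / (float)mMonitorSpectralNode->getFftSize();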
Throughout the samples, you'll find that most of the audio drawing is done with utility methods defined in samples/_audio/common/AudioDrawUtils.h, including 2D waveform and spectrum plotters. This is partly so that the samples are short and easier to follow, but also to provide users with a decent starting point for drawing audio data in their own applications. To make use of them, you are encouraged to copy these functions into your own project's source directory and modify them as you see fit.
Related Samples:
This section explains how to read audio files and play them back within an audio graph, either as a buffer in memory or directly from file.
Audio files are represented by the audio::SourceFile class, instances of which are created with the audio::load() free function:
audio::SourceFileRef sourceFile = audio::load( loadAsset( "audiofile.wav" ) );
If the file could not be opened for decoding, an audio::AudioFileExc is thrown.
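A minimal sketch of guarding against that, reusing the asset name from earlier:

try {
    auto sourceFile = audio::load( app::loadAsset( "soundfile.wav" ) );
}
catch( const audio::AudioFileExc &exc ) {
    app::console() << "failed to load audio file: " << exc.what() << std::endl;
}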
To play the SourceFile within an audio graph, you use one of two flavors of SamplePlayerNode. The first, BufferPlayerNode, loads the entire file into an in-memory audio::Buffer and can be set up like the following:
auto ctx = audio::Context::master();
mBufferPlayer = ctx->makeNode( new audio::BufferPlayerNode() );
mBufferPlayer->loadBuffer( sourceFile );
In contrast, the streaming variant, FilePlayerNode, reads from the SourceFile as it plays:
mFilePlayer = ctx->makeNode( new audio::FilePlayerNode( sourceFile ) );
Which one you use depends on your application. Usually you want to keep small audio sources, such as sound effects, in memory so that they can be accessed quickly without file I/O; the playback latency is also much lower when playing directly from memory. Longer audio files, such as a soundtrack, are good candidates for reading from disk at playback time, where the latency doesn't matter as much.
Both support asynchronous reading: BufferPlayerNode::loadBuffer() can be called on a background thread, and FilePlayerNode can be instructed to read from a background thread with an optional boolean argument.
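A rough sketch of both approaches (error handling and thread lifetime management are omitted; the second constructor argument is the optional read-async boolean mentioned above):

// load the BufferPlayerNode's Buffer on a background thread (requires <thread>)
std::thread( [this, sourceFile] {
    mBufferPlayer->loadBuffer( sourceFile );
} ).detach();

// or ask the FilePlayerNode to do its file reads on a background thread
mFilePlayer = ctx->makeNode( new audio::FilePlayerNode( sourceFile, true ) );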
For practical reasons, audio::SourceFile has built-in support for samplerate conversion, so that the samplerate of the audio you get when processing matches that of the current audio::Context. You can find out the native samplerate of the file with its getSampleRateNative() method.
Sample types supported are 16-bit int, 24-bit int, and 32-bit float, all of which will be converted to floating point for processing in an audio graph.
See also:
There is currently basic support for recording audio with the BufferRecorderNode, which records samples into an audio::Buffer that can then be written to disk or used in other creative ways. The following creates and connects a BufferRecorderNode to record from an InputDeviceNode, along with a GainNode to control the level:
mInputDeviceNode = ctx->createInputDeviceNode();
mGain = ctx->makeNode( new ci::audio::GainNode( 0.7f ) );
mRecorder = ctx->makeNode( new ci::audio::BufferRecorderNode );

// mMonitorSpectralNode here is the analysis Node created in the earlier example
mInputDeviceNode >> mGain >> mMonitorSpectralNode >> mRecorder;
The first thing to note is that, again, mRecorder is not connected to the Context's output - like the MonitorNode above, it doesn't have to be. Also, a Node can feed more than one destination, so if you want to hear the input while recording you can arrange the connections like so:
mInputDeviceNode >> mGain >> mMonitorSpectralNode;
mGain >> mRecorder;
mGain >> ctx->getOutput();
For optimal performance, you should set the length of the recording buffer before recording (the default is 1 second at a samplerate of 44,100 hertz). This can be done with the constructor or at runtime with the setNumFrames() or setNumSeconds() methods.
You start recording samples with the start() method, and when you're ready to write the recording to disk, you use writeToFile():
mRecorder->writeToFile( "recorded.wav" );
Currently the only codec supported for encoding is the popular .wav format. Sample types supported are 16-bit int, 24-bit int, and 32-bit float, which is specified by providing an optional sample-type argument to writeToFile().
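A small sketch of the full record-then-write cycle (the SampleType value shown is an assumption of this sketch, not something prescribed by the text above):

mRecorder->start();
// ... record for a while, then:
mRecorder->writeToFile( "recorded.wav", ci::audio::SampleType::FLOAT_32 );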
ChannelRouterNode is used to map the channels between two connected Nodes. This can be useful in multichannel situations, or to split a stereo Node into two mono Nodes. See above for information on how to set up a multichannel OutputDeviceNode.
The following routes a SamplePlayer to channel 5 of the Context's output (which has already been configured as a multi-channel output):
auto format = ci::audio::Node::Format().channels( 10 );
mChannelRouter = ctx->makeNode( new audio::ChannelRouterNode( format ) );

mSamplePlayer >> mChannelRouter->route( 0, 5 );
The first argument to ChannelRouterNode::route()
is the input channel index, the second is the output channel index.
If mSamplePlayer happens to be stereo, both channels will be mapped, provided that there are enough channels (starting at the ChannelRouterNode's channel index 5) to accommodate them. If instead you need to route only a single channel, the route() method can take a third argument to specify the channel count:
mSamplePlayer >> mChannelRouter->route( 0, 5, 1 );
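As another sketch (the router variables here are hypothetical), the same mechanism can split a stereo Node into two independent mono chains, as mentioned above:

auto monoFormat = ci::audio::Node::Format().channels( 1 );
auto routerLeft = ctx->makeNode( new audio::ChannelRouterNode( monoFormat ) );
auto routerRight = ctx->makeNode( new audio::ChannelRouterNode( monoFormat ) );

// send the player's left channel to one router and its right channel to the other
mSamplePlayer >> routerLeft->route( 0, 0, 1 );
mSamplePlayer >> routerRight->route( 1, 0, 1 );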
Related Samples:
There are quite a few other Node types available, which roughly fall into two groups: those that generate audio and those that process the audio passing through them. Those in the first group inherit from the base class GenNode, which itself inherits from InputNode (they produce audio but take no inputs). The remaining Nodes are effects, such as the GainNode and Delay used earlier, and operate on whatever is connected to their inputs.
Related Samples:
Many of a Node's parameters are represented by the audio::Param class, which allows you to set the value directly or to apply ramps (similar to ci::Tween) that are then interpolated at a sample-accurate rate.
For example, the following connects up a simple audio graph containing a GenSineNode
and GainNode
, and then ramps the parameters on each:
auto ctx = audio::master();
auto sine = ctx->makeNode( new audio::GenSineNode( 220 ) );
auto gain = ctx->makeNode( new audio::GainNode( 0 ) );
sine >> gain >> ctx->getOutput();
sine->getParamFreq()->applyRamp( 440, 0.5f );
gain->getParam()->applyRamp( 0.9f, 0.5f );
The above applies linear ramps on the Params, though in some cases this will sound unnatural. Instead, you can use a curve function, similar to the ci::Timeline API. The following uses audio::rampOutQuad to provide a more natural decay for the GainNode:
auto options = audio::Param::Options().rampFn( &audio::rampOutQuad );
mGain->getParam()->applyRamp( 0, 1.5f, options );
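Ramps can also be chained; a brief sketch continuing from the graph above, assuming appendRamp() schedules its ramp after the one already in progress:

// fade in over half a second, then decay back to silence
mGain->getParam()->applyRamp( 0.9f, 0.5f );
mGain->getParam()->appendRamp( 0, 1.5f, options );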
Now that we've covered how the modular API works, it's worth looking back at the Voice API, which sits above and ties into it. Each Voice has a virtual Voice::getInputNode() member function that returns the NodeRef that does its actual processing. A Voice created from a SourceFile is actually an instance of the subclass VoiceSamplePlayerNode, whose input Node is a SamplePlayerNode. To avoid typecasting, the getSamplePlayerNode() method is provided:
auto sourceFile = audio::load( loadResource( "soundfile.wav" ) );
auto voice = cinder::audio::Voice::create( sourceFile );
app::console() << "length of Voice's file: " << voice->getSamplePlayerNode()->getNumSeconds() << std::endl;
Because the Voice internally manages a chain of Nodes, there is also a Voice::getOutputNode() method, which returns the last Node in that chain. A generic NodeRef is returned because the actual type should be opaque to the user.
By default, a Voice is connected directly to the master Context's output. If you would rather route it through your own Nodes, pass false to the connectToMaster() option:
auto options = audio::Voice::Options().connectToMaster( false );
mVoice = audio::Voice::create( sourceFile, options );
mMonitor = audio::master()->makeNode( new audio::MonitorNode );
mVoice->getOutputNode() >> mMonitor >> audio::master()->getOutput();
The iOS simulator has many problems related to audio hardware, rendering it quite useless for developing or testing projects with audio. Instead, build for OS X desktop during development and test on an iOS device; the two platforms are nearly identical with respect to audio.
While the samples demonstrate many of the techniques and tools available in a straightforward manner, there are more exhaustive test applications for each of the various components. They are currently organized into platform-specific workspaces:
These are meant to be more for feature and regression testing than anything else, but they may also be a useful way to see the entire breadth of the available functionality.