Cutting Your Teeth on FMOD Part 6: Recording and visualizing sound card output

November 24, 2013

In this article, we’ll take a look at how to intercept and record the output from a sound card. Our primary focus here will be to create a visualizer that reflects the output sound using the frequency analysis technique discussed in part 4; however, you can of course use the recording code for any purpose you wish.

I won’t go into the details of FFT frequency analysis here (see part 4 for that); we’ll just look at how to capture the real-time sound output and then provide the visualizer as a usage example.

Why FMOD is annoying

The problem with any kind of analysis or processing of sound in FMOD is that it can only be done on an FMOD::Sound object that is currently playing through your own application on an FMOD::Channel. You can’t take an external sound source and process it. This is obviously a problem. The workaround is the following slightly creative procedure:

  1. Create an FMOD::Sound which is empty and has a short duration
  2. Configure FMOD to record the sound card output into this sound, wrapping around and overwriting the oldest content every time it fills (using the sound buffer as a circular buffer)
  3. Play the sound in a loop, lagging slightly behind the recording, with the volume set to zero
  4. Use any processing or analysis functions we would normally use on the sound as it plays

This works because playing the sound in a loop ensures that its output matches the sound card’s output. The lag is required so that playback never catches up with recording: without it, the buffer would be empty when playback starts, or playback would overrun the recording position and we would end up re-playing old, already-played sound from the buffer. Playing the sound back silently works because the analysis functions operate on the original waveform before FMOD applies volume adjustments.
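To make the circular-buffer idea concrete, here is a minimal, FMOD-free sketch. The class LaggedRingBuffer and its methods are hypothetical names for illustration, and it assumes one sample is recorded and one is played per tick, which real drivers do not guarantee (they work in blocks). It shows that as long as the player stays a fixed lag behind the recorder, and that lag is no larger than the buffer, the player always reads exactly the sample recorded that many ticks earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative model only, not FMOD code: a recorder writes into a
// fixed-size ring buffer and a player reads a fixed number of samples behind.
class LaggedRingBuffer {
public:
    LaggedRingBuffer(std::size_t capacity, std::size_t lag)
        : buffer_(capacity, 0), lag_(lag) {
        // The lag must be at least one sample (never read the slot being
        // written) and at most the capacity (older data is overwritten).
        assert(lag_ >= 1 && lag_ <= buffer_.size());
    }

    // Recorder: store a sample, wrapping to the start when the buffer is full.
    void record(std::int16_t sample) {
        buffer_[written_ % buffer_.size()] = sample;
        ++written_;
    }

    // Playback is only safe once at least `lag` samples have been recorded;
    // this plays the role of the initial wait before playback starts.
    bool ready() const { return written_ >= lag_; }

    // Player: return the sample recorded exactly `lag` samples ago.
    std::int16_t playLagged() const {
        assert(ready());
        return buffer_[(written_ - lag_) % buffer_.size()];
    }

private:
    std::vector<std::int16_t> buffer_;
    std::size_t lag_;
    std::size_t written_ = 0;  // total samples recorded so far
};
```

For example, with an 8-sample buffer and a lag of 3, after recording the values 0 to 19 in order, each call to playLagged() returns the value recorded three ticks earlier. A lag larger than the buffer would read data that has already been overwritten, which is the overrun problem described above.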

This technique is essentially the same as the one used in the FMOD API’s PitchBend example, except that the example records 5 seconds of sound and then pitch-bends the playback, rather than working in real time. Note that we cannot modify the actual sound card output using this technique, so we cannot apply DSP effects such as echo and reverb, but we can analyze the frequency and amplitude characteristics of the audio being played.

The key, of course, is to reduce the latency as much as possible. Using FFT itself increases the latency (more so the larger the sample size), so whether this technique is useful to you depends on how much latency your application can tolerate. With a bog-standard sound card, I found that 60ms was about the best achievable. ASIO drivers can improve this significantly on cards that support them.
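As a rough illustration of why larger FFT sizes add latency: a spectrum computed from N samples summarises the last N / sampleRate seconds of audio, so at 44100 Hz a 1024-sample window spans about 23 ms and an 8192-sample window about 186 ms. The sketch below simply adds that window length to the fixed playback lag. The function names are hypothetical, and it ignores driver, mixer and display buffering, so treat the result as a ballpark figure rather than a measurement.

```cpp
#include <cassert>

// Duration, in milliseconds, of the audio covered by one FFT window.
double fftWindowMs(int fftSize, int sampleRate) {
    assert(sampleRate > 0);
    return 1000.0 * fftSize / sampleRate;
}

// Ballpark delay: the fixed playback lag plus one full FFT window.
// Driver, mixer and display buffering are deliberately ignored.
double estimatedLatencyMs(double playbackLagMs, int fftSize, int sampleRate) {
    return playbackLagMs + fftWindowMs(fftSize, sampleRate);
}
```

With the 60ms lag used in this article and a 1024-sample FFT, estimatedLatencyMs(60.0, 1024, 44100) comes to roughly 83 ms.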

Recording from the correct source

Sound cards generally provide several recording sources. To sample the combined (mixed) output, the corresponding source has to be enabled in the audio mixer in Control Panel. It will generally be called ‘Stereo Mix’, ‘What U Hear’ or similar. As a side note, some sound drivers are buggy: I found that the What U Hear output was warped and corrupted unless the recording frequency was set to certain values in Control Panel, so check that everything works properly before starting (test card: Creative Audigy 2 ZS in Windows 7 64-bit with default drivers).

FMOD numbers all recording sources starting from zero. You can get the number of available sources like this:

FMOD::System *fmod;
...
int recordingSources;
fmod->getRecordNumDrivers(&recordingSources);

To get the friendly text name of a source (for example, to allow the user to select the source from a menu), use the following code:

char name[256];
fmod->getRecordDriverInfo(sourceNumber, name, 256, 0);

where sourceNumber is a number from 0 to recordingSources - 1 specifying which source to query.

Create a sound buffer for recording into

Define a function (I called mine createSoundBuffer()) which describes a sound to create using FMOD_CREATESOUNDEXINFO and allocates memory for it like this:

// Global variables

static int const sampleRate = 44100;
static int const channels = 2;

FMOD::System *fmod;
FMOD::Sound *sound;

...
// Function code

// Release previous buffer if there is one
if (sound != NULL)
	sound->release();

// Create an empty sound buffer where we can capture the sound card output
FMOD_CREATESOUNDEXINFO soundInfo;

memset(&soundInfo, 0, sizeof(FMOD_CREATESOUNDEXINFO));

soundInfo.cbsize			= sizeof(FMOD_CREATESOUNDEXINFO);

// The length of the entire sample in bytes, calculated as:
// sample rate * number of channels * bytes per sample per channel * number of seconds
// (0.5 seconds, written as "/ 2" so the calculation stays in integers)
soundInfo.length			= sampleRate * channels * sizeof(short) / 2;

// Number of channels and sample rate
soundInfo.numchannels		= channels;
soundInfo.defaultfrequency	= sampleRate;

// The sound format (here we use 16-bit signed PCM)
soundInfo.format			= FMOD_SOUND_FORMAT_PCM16;

// Create a user-defined sound with FMOD_SOFTWARE | FMOD_OPENUSER
fmod->createSound(0, FMOD_SOFTWARE | FMOD_LOOP_NORMAL | FMOD_OPENUSER, &soundInfo, &sound);

Here we create a buffer that is 0.5 seconds long. The exact length does not matter much as long as there is room to record a bit of sound, but making it too long wastes memory and, depending on how you write the code, may introduce lag of up to the length of the sound buffer if you pause and then resume recording (the examples here do not suffer from this problem).
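The buffer size calculation can be wrapped in a small helper. This is a sketch with a hypothetical name, pcmBufferBytes; it takes the duration in milliseconds so everything stays in integer arithmetic, and it rounds down to whole sample frames so the byte count is always a multiple of channels * bytes per sample.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed for uncompressed PCM audio of the given duration.
// Whole sample frames are computed first, then multiplied by the frame size.
std::size_t pcmBufferBytes(int sampleRate, int channels, int bytesPerSample, int durationMs) {
    assert(sampleRate > 0 && channels > 0 && bytesPerSample > 0 && durationMs >= 0);
    std::size_t frames = static_cast<std::size_t>(sampleRate) * durationMs / 1000;
    return frames * channels * bytesPerSample;
}
```

With the values used here, pcmBufferBytes(44100, 2, 2, 500) gives 88200 bytes for the 0.5 second buffer; for a fixed-length, non-looping recording you would pass the maximum recording time instead.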

We create the sound with the FMOD_SOFTWARE mixing flag because only sounds with this flag can be analyzed. Note that software mixing adds a small amount of CPU load.

If, instead of analysis, you just want to save the sound to a file or record for a pre-defined amount of time with no looping, simply change the number of seconds above to the maximum amount of time you want to record and remove FMOD_LOOP_NORMAL from the create flags.

Start the recording

Once again, define a function (I called mine startCapture()) to initiate the recording as follows. Note that the call to createSoundBuffer() runs the code above to create a new sound buffer first:

// Global variables

FMOD::Channel *channel;
int recordDriver;

...
// Function code

// Create sound recording buffer
createSoundBuffer();

// Start recording sound card output into empty sound, looping back to the start
// and over-writing the oldest data when the sound is full
fmod->recordStart(recordDriver, sound, true);

// Wait so something is recorded (this figure will introduce lag!)
Sleep(60);

// Start playing the recorded sound back, silently, so we can use its
// channel to get the FFT data. The frequency analysis is done before the
// volume is adjusted so it doesn't matter that we are playing back silently.
// Start paused and only unpause once the volume is zero, so nothing is heard.
fmod->playSound(FMOD_CHANNEL_FREE, sound, true, &channel);
channel->setVolume(0);
channel->setPaused(false);

This should be fairly self-explanatory. The key line is the call to recordStart, which records from the source number recordDriver into sound. As explained above, we wait briefly for recording to start and then play back the sound, looping, with the volume set to zero. We save the playback channel in channel so that we can analyze the sound later.

Note that recordStart is an asynchronous (non-blocking) call, i.e. it returns immediately rather than stalling for as long as recording continues. Recording runs in the background on another thread, and your application continues to execute normally.
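If the asynchronous behaviour is unfamiliar, the following standard C++ sketch models it with a worker thread. It is an analogy only: it does not show how FMOD implements recording internally, and BackgroundRecorder is a hypothetical class. start() returns as soon as the thread is launched, the worker keeps capturing until told to stop, and stop() waits for it to finish.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Illustrative analogy only, not FMOD's implementation: start() returns
// immediately while a worker thread keeps "recording" in the background.
class BackgroundRecorder {
public:
    void start() {
        assert(!worker_.joinable());  // one capture at a time
        running_ = true;
        worker_ = std::thread([this] {
            // Stand-in for copying audio from the driver into a buffer.
            do {
                ++blocksCaptured_;
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            } while (running_);
        });
    }

    // Signals the worker to finish and waits for it to exit.
    void stop() {
        running_ = false;
        if (worker_.joinable())
            worker_.join();
    }

    long captured() const { return blocksCaptured_; }

    ~BackgroundRecorder() { stop(); }

private:
    std::atomic<bool> running_{false};
    std::atomic<long> blocksCaptured_{0};
    std::thread worker_;
};
```

The main thread is free to do other work (here, just sleeping) between start() and stop(), in the same way your per-frame code keeps running while FMOD records.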

Stop the recording

You can halt the recording and free the memory allocated to the sound buffer as follows:

// Stop silent playback
channel->stop();

// Stop recording
fmod->recordStop(recordDriver);

// Free sound recording buffer
sound->release();
sound = NULL;

Analyzing the sound

We now have a captured copy of the sound card output, albeit slightly lagged, playing through our own sound object. We can perform an FFT analysis in our per-frame update code just as in part 4. Remember to call FMOD::System::update() each frame so that the fetched data is up to date.

// Update FMOD
fmod->update();

// Frequency analysis
float *specLeft, *specRight, *spec;
spec = new float[sampleSize];
specLeft = new float[sampleSize];
specRight = new float[sampleSize];

// Get average spectrum for left and right stereo channels
channel->getSpectrum(specLeft, sampleSize, 0, FMOD_DSP_FFT_WINDOW_RECT);
channel->getSpectrum(specRight, sampleSize, 1, FMOD_DSP_FFT_WINDOW_RECT);

for (int i = 0; i < sampleSize; i++)
	spec[i] = (specLeft[i] + specRight[i]) / 2;

// ... do whatever you want with this data ...

// Clean up
delete [] spec;
delete [] specLeft;
delete [] specRight;

This is exactly the same as the code in part 4 of our series.
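The two calculations in this step, averaging the stereo spectra and mapping bins to frequencies, can be sketched independently of FMOD. This assumes, as the full example does with its hzRange calculation, that the spectrum bins are spaced linearly from 0 Hz up to half the sample rate; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average the left and right spectra bin by bin (same idea as the loop above).
std::vector<float> averageSpectrum(const std::vector<float>& left,
                                   const std::vector<float>& right) {
    assert(left.size() == right.size());
    std::vector<float> out(left.size());
    for (std::size_t i = 0; i < left.size(); ++i)
        out[i] = (left[i] + right[i]) / 2.0f;
    return out;
}

// Width of one bin in Hz: 0 Hz up to Nyquist (sampleRate / 2),
// split evenly across numBins entries.
float hzPerBin(int sampleRate, int numBins) {
    assert(sampleRate > 0 && numBins > 0);
    return (sampleRate / 2.0f) / numBins;
}

// Bin index that contains the frequency hz, clamped to the valid range.
int binForFrequency(float hz, int sampleRate, int numBins) {
    int bin = static_cast<int>(hz / hzPerBin(sampleRate, numBins));
    if (bin < 0)
        return 0;
    return bin < numBins ? bin : numBins - 1;
}
```

At 44100 Hz with 64 bins, each bin is about 344.5 Hz wide, so a 1 kHz tone falls in bin 2.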

Complete example

The following example produces the same output as that in part 4 (VU bars and numbers showing the volume of each frequency range), but it uses the sound card output as the source rather than an MP3 that we play ourselves. Try it with Spotify!

By pressing S you can cycle through all of the available recording sources. Pressing 1 and 2 decrease and increase the FFT sample size respectively.
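The key handling boils down to two clamps and a wrap-around counter. This sketch uses hypothetical helper names and adds an explicit guard for the case where no recording sources exist, since (recordDriver + 1) % recordingSources would otherwise divide by zero; nextSource() returns -1 in that case.

```cpp
#include <algorithm>
#include <cassert>

// FFT sizes used by the example: powers of two from 64 up to 8192.
int halveFftSize(int size)  { return std::max(size / 2, 64); }
int doubleFftSize(int size) { return std::min(size * 2, 8192); }

// Next recording source, wrapping back to 0 after the last one.
// Returns -1 when there are no sources, instead of dividing by zero.
int nextSource(int current, int numSources) {
    if (numSources <= 0)
        return -1;
    return (current + 1) % numSources;
}
```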

The example also performs the beat detection and BPM estimation shown in part 4, which, as you will see, does not work very well on arbitrary sounds.
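For reference, the BPM estimate reduces to keeping the timestamps of recent beats and dividing the time between the first and last of them by the number of gaps. This sketch (BpmEstimator is a hypothetical name) mirrors the queue logic in the example but uses std::deque, because std::queue has no clear() member and its empty() only reports whether the queue is empty. It assumes timestamps in milliseconds that never go backwards.

```cpp
#include <cassert>
#include <deque>

// Keeps beat timestamps (ms) from the last cutoffMs and estimates BPM
// from the average gap between the oldest and newest of them.
class BpmEstimator {
public:
    explicit BpmEstimator(unsigned int cutoffMs) : cutoffMs_(cutoffMs) {}

    void addBeat(unsigned int nowMs) {
        beats_.push_back(nowMs);
        // Drop beats that fall outside the tracking window.
        while (!beats_.empty() && nowMs - beats_.front() > cutoffMs_)
            beats_.pop_front();
    }

    // Average beats per minute over the window; 0 if it cannot be estimated.
    float bpm() const {
        if (beats_.size() < 2)
            return 0.0f;
        float msPerBeat = (beats_.back() - beats_.front()) /
                          static_cast<float>(beats_.size() - 1);
        return msPerBeat > 0.0f ? 60000.0f / msPerBeat : 0.0f;
    }

    void reset() { beats_.clear(); }

private:
    std::deque<unsigned int> beats_;
    unsigned int cutoffMs_;
};
```

Beats at 0, 500, 1000 and 1500 ms are 500 ms apart, giving 60000 / 500 = 120 BPM.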

The code uses my SimpleFMOD library, but only for initialization, so it is easily adapted, and Simple2D only for rendering the VU bars. All of the principal code you need to perform the steps above is included verbatim.

// FMOD Frequency Analysis demo
// Written by Katy Coe (c) 2013
// No unauthorized copying or distribution
// www.djkaty.com

#include "../SimpleFMOD/SimpleFMOD.h"
#include "Simple2D.h"

#include <algorithm>	// std::max_element
#include <cmath>		// floor
#include <cstring>		// memset
#include <queue>

using namespace SFMOD;
using namespace S2D;

class LiveFrequencyAnalysis : public Simple2D
{
public:
	// Sample rate
	static int const sampleRate = 44100;

	// Number of channels to sample
	static int const channels = 2;

private:
	// FMOD
	SimpleFMOD fmod;
	FMOD::Sound *sound;
	FMOD::Channel *channel;

	// Sound card recording source
	int recordDriver;

	// Number of recording sources available on user's system
	int recordingSources;

	// Create sound buffer
	void createSoundBuffer();

	// Start/stop recording sound from sound card
	void startCapture();
	void stopCapture();

	// Graphics
	TextFormat freqTextFormat;
	Gradient freqGradient;

	// FFT sample size
	int sampleSize;

	// Beat detection parameters
	float beatThresholdVolume;
	int beatThresholdBar;
	unsigned int beatSustain;
	unsigned int beatPostIgnore;

	int beatLastTick;
	int beatIgnoreLastTick;

	// List of how many ms ago the last beats were
	std::queue<int> beatTimes;
	unsigned int beatTrackCutoff;

	// When the music was last unpaused
	int musicStartTick;

public:
	LiveFrequencyAnalysis(Simple2DStartupInfo);
	void DrawScene();

	virtual bool OnKeyCharacter(int, int, bool, bool);
};

// Initialize application
LiveFrequencyAnalysis::LiveFrequencyAnalysis(Simple2DStartupInfo si) : Simple2D(si)
{
	// Make paintbrushes
	freqTextFormat = MakeTextFormat(L"Verdana", 10.0f);
	freqGradient = MakeBrush(Colour::Green, Colour::Red);

	// Set FFT parameters
	sampleSize = 64;

	// Set beat detection parameters
	beatThresholdVolume = 0.4f;
	beatThresholdBar = 0;
	beatSustain = 100;
	beatPostIgnore = 300;
	beatTrackCutoff = 10000;

	beatLastTick = 0;
	beatIgnoreLastTick = 0;

	musicStartTick = 0;

	// Recording from sound card

	// Get number of recording sources
	fmod.FMOD()->getRecordNumDrivers(&recordingSources);

	// Select default source
	sound = NULL;
	recordDriver = 0;

	// Start capturing
	startCapture();
}

void LiveFrequencyAnalysis::startCapture()
{
	// Create sound recording buffer
	createSoundBuffer();

	// Start recording sound card output into empty sound, looping back to the start
	// and over-writing the oldest data when the sound is full
	fmod.FMOD()->recordStart(recordDriver, sound, true);

	// Wait so something is recorded (this figure will introduce lag!)
	Sleep(60);

	// Start playing the recorded sound back, silently, so we can use its
	// channel to get the FFT data. The frequency analysis is done before the
	// volume is adjusted so it doesn't matter that we are playing back silently.
	// Start paused and only unpause once the volume is zero, so nothing is heard.
	fmod.FMOD()->playSound(FMOD_CHANNEL_FREE, sound, true, &channel);
	channel->setVolume(0);
	channel->setPaused(false);

	// Reset beat detection data (std::queue has no clear(), so assign an empty queue)
	musicStartTick = GetTickCount();
	beatTimes = std::queue<int>();
}

void LiveFrequencyAnalysis::stopCapture()
{
	// Stop silent playback
	channel->stop();

	// Stop recording
	fmod.FMOD()->recordStop(recordDriver);

	// Free sound recording buffer
	sound->release();
	sound = NULL;
}

void LiveFrequencyAnalysis::createSoundBuffer()
{
	// Release previous buffer if there is one
	if (sound != NULL)
		sound->release();

	// Create an empty sound buffer where we can capture the sound card output
	FMOD_CREATESOUNDEXINFO soundInfo;

	memset(&soundInfo, 0, sizeof(FMOD_CREATESOUNDEXINFO));

	soundInfo.cbsize			= sizeof(FMOD_CREATESOUNDEXINFO);

	// The length of the entire sample in bytes, calculated as:
	// sample rate * number of channels * bytes per sample per channel * number of seconds
	// (0.5 seconds, written as "/ 2" so the calculation stays in integers)
	soundInfo.length			= sampleRate * channels * sizeof(short) / 2;

	// Number of channels and sample rate
	soundInfo.numchannels		= channels;
	soundInfo.defaultfrequency	= sampleRate;

	// The sound format (here we use 16-bit signed PCM)
	soundInfo.format			= FMOD_SOUND_FORMAT_PCM16;

	// Create a user-defined sound with FMOD_SOFTWARE | FMOD_OPENUSER
	fmod.FMOD()->createSound(0, FMOD_SOFTWARE | FMOD_LOOP_NORMAL | FMOD_OPENUSER, &soundInfo, &sound);
}

// Handle keypresses
bool LiveFrequencyAnalysis::OnKeyCharacter(int key, int rc, bool prev, bool trans)
{
	// Decrease FFT sample size
	if (key == '1')
		sampleSize = max(sampleSize / 2, 64);

	// Increase FFT sample size
	if (key == '2')
		sampleSize = min(sampleSize * 2, 8192);

	// Change recording source (guarded so the modulo below never divides by zero)
	if ((key == 'S' || key == 's') && recordingSources > 0)
	{
		stopCapture();

		// Change source
		recordDriver = (recordDriver + 1) % recordingSources;

		startCapture();
	}
	return true;
}

// Per-frame code
void LiveFrequencyAnalysis::DrawScene()
{
	// Update FMOD
	fmod.Update();

	// Frequency analysis
	float *specLeft, *specRight, *spec;
	spec = new float[sampleSize];
	specLeft = new float[sampleSize];
	specRight = new float[sampleSize];

	// Get average spectrum for left and right stereo channels
	channel->getSpectrum(specLeft, sampleSize, 0, FMOD_DSP_FFT_WINDOW_RECT);
	channel->getSpectrum(specRight, sampleSize, 1, FMOD_DSP_FFT_WINDOW_RECT);

	for (int i = 0; i < sampleSize; i++)
		spec[i] = (specLeft[i] + specRight[i]) / 2;

	// Find max volume
	auto maxIterator = std::max_element(&spec[0], &spec[sampleSize]);
	float maxVol = *maxIterator;

	// Find frequency range of each array item
	float hzRange = (sampleRate / 2) / static_cast<float>(sampleSize);

	// Detect beat: the threshold bar is loud enough and we are not already sustaining or ignoring a beat
	if (spec[beatThresholdBar] >= beatThresholdVolume && beatLastTick == 0 && beatIgnoreLastTick == 0)
	{
		beatLastTick = GetTickCount();
		beatTimes.push(beatLastTick);

		while(GetTickCount() - beatTimes.front() > beatTrackCutoff)
		{
			beatTimes.pop();
			if (beatTimes.size() == 0)
				break;
		}
	}

	if (GetTickCount() - beatLastTick < beatSustain)
		Text(100, 220, "BEAT", Colour::White, MakeTextFormat(L"Verdana", 48.0f));

	else if (beatIgnoreLastTick == 0 && beatLastTick != 0)
	{
		beatLastTick = 0;
		beatIgnoreLastTick = GetTickCount();
	}

	if (GetTickCount() - beatIgnoreLastTick >= beatPostIgnore)
		beatIgnoreLastTick = 0;

	// Predict BPM
	float msPerBeat, bpmEstimate;

	if (beatTimes.size() >= 2)
	{
		msPerBeat = (beatTimes.back() - beatTimes.front()) / static_cast<float>(beatTimes.size() - 1);
		bpmEstimate = 60000 / msPerBeat;
	}
	else
		bpmEstimate = 0;

	// Draw display
	char name[256];
	fmod.FMOD()->getRecordDriverInfo(recordDriver, name, 256, 0);

	Text(10, 10, "Analyzing source " + StringFactory(recordDriver) + ": " + name, Colour::White, MakeTextFormat(L"Verdana", 14.0f));

	Text(10, 30, "Press 1 and 2 to adjust FFT size, S to change source", Colour::White, MakeTextFormat(L"Verdana", 14.0f));

	Text(10, 50, "Sample size: " + StringFactory(sampleSize) + "  -  Range per sample: " + StringFactory(hzRange) + "Hz  -  Max vol this frame: " + StringFactory(maxVol), Colour::White, MakeTextFormat(L"Verdana", 14.0f));

	// BPM estimation
	if (GetTickCount() - musicStartTick >= beatTrackCutoff && musicStartTick != 0)
		Text(10, ResolutionY - 20, "Estimated BPM: " + StringFactory(bpmEstimate) + " (last " + StringFactory(beatTrackCutoff / 1000) + " seconds)", Colour::White, MakeTextFormat(L"Verdana", 14.0f));
	else if (musicStartTick != 0)
		Text(10, ResolutionY - 20, "Estimated BPM: calculating for next " + StringFactory(beatTrackCutoff - (GetTickCount() - musicStartTick)) + " ms", Colour::White, MakeTextFormat(L"Verdana", 14.0f));
	else
		Text(10, ResolutionY - 20, "Paused", Colour::White, MakeTextFormat(L"Verdana", 14.0f));

	// Numerical FFT display
	int nPerRow = 16;

	for (int y = 0; y < sampleSize / nPerRow; y++)
		for (int x = 0; x < nPerRow; x++)
			Text(x * 40 + 10, y * 20 + 80, StringFactory(floor(spec[y * nPerRow + x] * 1000)), Colour::White, freqTextFormat);

	// VU bars
	int blockGap = 4 / (sampleSize / 64);
	int blockWidth = static_cast<int>((static_cast<float>(ResolutionX) * 0.8f) / static_cast<float>(sampleSize) - blockGap);
	int blockMaxHeight = 200;

	for (int b = 0; b < sampleSize - 1; b++)
		FillRectangleWH(static_cast<int>(ResolutionX * 0.1f + (blockWidth + blockGap) * b),
						ResolutionY - 50,
						blockWidth,
						static_cast<int>(-blockMaxHeight * spec[b]),
						freqGradient);

	// Clean up
	delete [] spec;
	delete [] specLeft;
	delete [] specRight;
}

void Simple2DStart()
{
	Simple2DStartupInfo si;
	si.WindowName = "FMOD Frequency Analysis from Sound Card";
	si.BackgroundColour = D2D1::ColorF(Colour::Black);
	si.ResizableWindow = false;

	LiveFrequencyAnalysis(si).Run();
}

I hope you found this tutorial useful! Until next time.

  1. Ajit
    December 14, 2013 at 13:11

    A very useful article! Thanks!!!

  2. February 14, 2014 at 11:31

    Could you recommend some lag-less ways known to you to do the same as you have described above? Other sound engines or even languages are fine. I’m just curious and ready to do some heavier programming in order to get the almost perfect timing, just for the science ^^

    • March 19, 2014 at 17:55

      The most basic way I know is to use the Windows ACM API to record the sound card output directly, without having to replay it silently on an output channel as it records. ACM doesn’t include FFT (spectrum analysis), though, so you’d need to implement this yourself or pilfer one of the various free source code examples on the web.

      Katy.

  3. Josep Llodrà
    March 23, 2014 at 18:00

    Your posts are very interesting and rare, useful and fun to read. I am looking for a way to read Pattern, Instrument, and Sample information from .mod, .s3m, .xm, and .it files. FMOD can play these formats but I don’t know if it also provides a way to access all that other information, and I thought you might know. Thanks.

    • March 28, 2014 at 13:26

      As far as I know FMOD just converts them internally into normal audio waveforms as they’re played and doesn’t make the instrument or pattern data available to the client. I’m not 100% sure though.

      • March 28, 2014 at 13:46

        Thanks for your answer :). I was finally able to access some data through getTag(); data such as patterns is not available to the client.

  4. supersmash94
    October 25, 2014 at 23:50

    Really nice!!! Thank you so much, the recording example in the fmodex API is not really precise…
    Your tutorial is going to be pretty useful for me and my friends. We’re working on a live frequency analysis program: we have to create a program that generates 3D particles according to a musician’s improvisation. We were looking for a live frequency analysis program, and your tutorial is like a miracle haha
    Best regards from French IMAC School!
