Cutting Your Teeth on FMOD Part 6: Recording and visualizing sound card output
In this article, we’ll look at how to intercept and record the output from a sound card. Our primary focus is to create a visualizer that reflects the output sound using the frequency analysis technique discussed in part 4; however, you can of course use the recording code for any purpose you wish.
I won’t go into the details of FFT frequency analysis here (see part 4 for that); we’ll just look at how to capture the real-time sound output then provide the visualizer as a usage example.
Why FMOD is annoying
The problem with any kind of analysis or processing of sound in FMOD is that it can only be done on an FMOD::Sound object that is currently playing through your own application on an FMOD::Channel. You can’t take an external sound source and process it. This is obviously a problem. The workaround is the following slightly creative procedure:
- Create an FMOD::Sound which is empty and has short duration
- Configure FMOD to record the sound card output to this sound, wrapping around and overwriting the oldest content every time it fills (using the sound buffer as a circular buffer)
- Play the sound in a loop with a slight lag with the volume set to zero
- Use any processing or analysis functions we would normally use on the sound as it plays
This works because the looping playback trails the recording by a short, fixed delay, so what plays through our channel is the same as the sound card’s output. The delay is required so that playback never catches up with the recording position: without it, the buffer would be empty at the start, or playback would overrun the recorder and replay stale audio left in the buffer from an earlier pass. Playing the sound back silently works because the analysis functions operate on the original waveform before FMOD applies volume adjustments.
This technique is essentially the same as that used in the FMOD API’s PitchBend example, except that it records 5 seconds of sound and then pitch bends the playback in non-realtime. Note that we cannot modify the actual sound card output using this technique, so we cannot apply DSP effects like echo and reverb, for example, but we can analyze the frequency and amplitude characteristics of the audio being played.
The key, of course, is to reduce the latency as much as possible. Using FFT itself increases the latency (and more so the higher the sample size), so whether this technique is useful to you depends on how much latency you can tolerate in your application. With a bog-standard sound card, I found that 60ms was about the best you could achieve. ASIO drivers can improve this significantly for cards that support them.
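The circular-buffer idea can be tried out without FMOD at all. The following is a minimal stand-alone sketch, not FMOD code: the RingBuffer type and the simulate() function are hypothetical names invented for illustration. A writer overwrites the oldest samples while a reader trails it, and a read only succeeds if the reader has neither caught up with the writer nor been lapped by it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a looping record buffer (illustrative, not FMOD API)
struct RingBuffer {
    std::vector<int> data;
    std::size_t writePos = 0;   // total samples written so far
    std::size_t readPos = 0;    // total samples read so far

    explicit RingBuffer(std::size_t size) : data(size, 0) { assert(size > 0); }

    // Recorder side: store one sample, wrapping over the oldest content
    void write(int sample) {
        data[writePos % data.size()] = sample;
        ++writePos;
    }

    // Playback side: succeeds only if an unread, not yet overwritten sample exists
    bool read(int &out) {
        if (readPos >= writePos) return false;               // caught up with the recorder
        if (writePos - readPos > data.size()) return false;  // lapped: data was overwritten
        out = data[readPos % data.size()];
        ++readPos;
        return true;
    }
};

// Write the values 0, 1, 2, ... and let the reader start after 'lag' write steps,
// then keep pace with the writer. Returns how many samples were read back correctly.
std::size_t simulate(std::size_t bufferSize, std::size_t lag, std::size_t total) {
    RingBuffer rb(bufferSize);
    std::size_t correct = 0;
    for (std::size_t i = 0; i < total; ++i) {
        rb.write(static_cast<int>(i));
        if (i >= lag) {
            int value = 0;
            if (rb.read(value) && value == static_cast<int>(rb.readPos - 1))
                ++correct;
        }
    }
    return correct;
}
```

In this model a reader that lags by 3 samples in an 8-sample buffer reads everything back in order, whereas a lag of 10 samples exceeds the buffer and every read fails because the data has already been overwritten. Real recording and playback do not run in lock step as they do here, which is why a little extra headroom is needed in practice.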
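To get a feel for the numbers, here is a rough back-of-the-envelope sketch. It assumes the analysis window spans exactly sampleSize samples, which may not match what FMOD does internally, and the function names are made up for this illustration.

```cpp
#include <cassert>

// Duration in milliseconds of an analysis window of 'sampleSize' samples
// (assumption: the FFT looks back over exactly this many samples)
double fftWindowMs(int sampleSize, int sampleRate) {
    assert(sampleSize > 0 && sampleRate > 0);
    return 1000.0 * sampleSize / sampleRate;
}

// Rough latency estimate: the start-up delay before playback plus the FFT window
double estimatedLatencyMs(double startupDelayMs, int sampleSize, int sampleRate) {
    return startupDelayMs + fftWindowMs(sampleSize, sampleRate);
}
```

Under these assumptions, a 1024-sample window at 44100 Hz spans about 23 ms, so with a 60 ms start-up delay the estimate is roughly 83 ms. At 8192 samples the window alone is about 186 ms.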
Recording from the correct source
Sound cards generally have several outputs. To sample the combined (mixed) output, this output has to be enabled in the audio mixer in Control Panel. It will generally be called ‘Stereo Mix’, ‘What U Hear’ or similar. As a side note, some sound drivers are buggy and I found that the What U Hear output was warped and corrupted unless you set the recording frequency to certain values in Control Panel, so be sure to check this all works properly before starting (test card: Creative Audigy 2 ZS in Windows 7 64-bit with default drivers).
FMOD numbers all recording sources starting from zero. You can get the number of available sources like this:
    FMOD::System *fmod;
    ...
    int recordingSources;
    fmod->getRecordNumDrivers(&recordingSources);
To get the friendly text name of a source (for example, to allow the user to select the source from a menu), use the following code:
    char name[256];
    fmod->getRecordDriverInfo(sourceNumber, name, 256, 0);
where sourceNumber is a number from zero to recordingSources – 1 specifying which source to query.
Create a sound buffer for recording into
Define a function (I called mine createSoundBuffer()) which describes a sound to create using FMOD_CREATESOUNDEXINFO and allocates memory for it like this:
    // Global variables
    static int const sampleRate = 44100;
    static int const channels = 2;

    FMOD::System *fmod;
    FMOD::Sound *sound;
    ...

    // Function code

    // Release previous buffer if there is one
    if (sound != NULL)
        sound->release();

    // Create an empty sound buffer where we can capture the sound card output
    FMOD_CREATESOUNDEXINFO soundInfo;
    memset(&soundInfo, 0, sizeof(FMOD_CREATESOUNDEXINFO));
    soundInfo.cbsize = sizeof(FMOD_CREATESOUNDEXINFO);

    // The length of the entire sample in bytes, calculated as:
    // Sample rate * number of channels * bytes per sample per channel * number of seconds
    soundInfo.length = static_cast<unsigned int>(sampleRate * channels * sizeof(unsigned short) * 0.5);

    // Number of channels and sample rate
    soundInfo.numchannels = channels;
    soundInfo.defaultfrequency = sampleRate;

    // The sound format (here we use 16-bit signed PCM)
    soundInfo.format = FMOD_SOUND_FORMAT_PCM16;

    // Create a looping user-defined sound with FMOD_SOFTWARE | FMOD_LOOP_NORMAL | FMOD_OPENUSER
    fmod->createSound(0, FMOD_SOFTWARE | FMOD_LOOP_NORMAL | FMOD_OPENUSER, &soundInfo, &sound);

Here we create a buffer that is 0.5 seconds long. The exact length does not matter much as long as it holds more audio than the playback lag, but making it too long wastes memory and, depending on how you write the code, may introduce lag of up to the length of the buffer if you pause and then resume recording (the examples here do not suffer from this problem).
We create the sound with the FMOD_SOFTWARE mixing flag because only sounds with this flag can be analyzed. Note that the software mixing incurs a small additional load penalty on the CPU.
If, instead of analysis, you just want to save the sound to a file or record for a pre-defined amount of time with no looping, simply change the number of seconds above to the maximum amount of time you want to record and remove FMOD_LOOP_NORMAL from the create flags.
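The buffer-length arithmetic can be checked in isolation. This is a small sketch with a hypothetical helper name (pcmBufferBytes is not an FMOD function). It works in whole frames, so the result never ends partway through a stereo sample.

```cpp
#include <cassert>
#include <cstddef>

// Size in bytes of a PCM buffer holding 'seconds' of audio.
// One frame = one sample for every channel.
unsigned int pcmBufferBytes(int sampleRate, int channels,
                            std::size_t bytesPerSample, double seconds) {
    assert(sampleRate > 0 && channels > 0 && bytesPerSample > 0 && seconds > 0.0);
    unsigned int frames = static_cast<unsigned int>(sampleRate * seconds);
    return frames * static_cast<unsigned int>(channels)
                  * static_cast<unsigned int>(bytesPerSample);
}
```

For the 0.5-second, 16-bit stereo buffer at 44100 Hz used here, this gives 88200 bytes. Ten seconds at the same settings would need 1764000 bytes.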
Start the recording
Once again define a function (I called mine startCapture()) to initiate the recording as follows. Note the call to createSoundBuffer() runs the code above to create a new sound buffer first:
    // Global variables
    FMOD::Channel *channel;
    int recordDriver;
    ...

    // Function code

    // Create sound recording buffer
    createSoundBuffer();

    // Start recording sound card output into empty sound, looping back to the start
    // and over-writing the oldest data when the sound is full
    fmod->recordStart(recordDriver, sound, true);

    // Wait so something is recorded (this figure will introduce lag!)
    Sleep(60);

    // Start playing the recorded sound back, silently, so we can use its
    // channel to get the FFT data. The frequency analysis is done before the
    // volume is adjusted so it doesn't matter that we are playing back silently.
    fmod->playSound(FMOD_CHANNEL_FREE, sound, false, &channel);
    channel->setVolume(0);
This should be fairly self-explanatory. The key line is the call to recordStart which records from the source number recordDriver into sound. As explained above, we wait briefly for recording to start and then play back the sound, looping, with the volume set to zero. We save the playback channel into channel so that we can analyze the sound later.
Note that recordStart is an asynchronous (non-blocking) call, i.e. it returns immediately rather than stalling while recording takes place. Recording instead begins in the background in another thread, and your application can continue to execute normally.
Stop the recording
You can halt the recording and free the memory allocated to the sound buffer as follows:
    // Stop silent playback
    channel->stop();

    // Stop recording
    fmod->recordStop(recordDriver);

    // Free sound recording buffer
    sound->release();
    sound = NULL;
Analyzing the sound
We now have a captured, albeit slightly lagged, copy of the sound card output being played through our own sound object. We can then perform an FFT analysis in our per-frame update code just as in part 4. Remember to call FMOD::System::update() each frame so that the fetched data is up to date.

    // Update FMOD
    fmod->update();

    // Frequency analysis
    float *specLeft, *specRight, *spec;
    spec = new float[sampleSize];
    specLeft = new float[sampleSize];
    specRight = new float[sampleSize];

    // Get average spectrum for left and right stereo channels
    channel->getSpectrum(specLeft, sampleSize, 0, FMOD_DSP_FFT_WINDOW_RECT);
    channel->getSpectrum(specRight, sampleSize, 1, FMOD_DSP_FFT_WINDOW_RECT);

    for (int i = 0; i < sampleSize; i++)
        spec[i] = (specLeft[i] + specRight[i]) / 2;

    // ... do whatever you want with this data ...

    // Clean up
    delete [] spec;
    delete [] specLeft;
    delete [] specRight;
This is exactly the same as the code in part 4 of our series.
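If you would rather not allocate and free three arrays every frame, one possible variation is to keep the spectra in std::vector objects. The sketch below is FMOD-free and uses hypothetical helper names. It shows the left/right averaging and the frequency-range formula used in the complete example, (sampleRate / 2) / sampleSize, plus a lookup from a frequency to its array entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average the left and right spectra into a single spectrum
std::vector<float> averageSpectrum(const std::vector<float> &left,
                                   const std::vector<float> &right) {
    assert(left.size() == right.size());
    std::vector<float> mono(left.size());
    for (std::size_t i = 0; i < left.size(); ++i)
        mono[i] = (left[i] + right[i]) / 2.0f;
    return mono;
}

// Width in Hz covered by each spectrum entry
float hzPerEntry(int sampleRate, int sampleSize) {
    assert(sampleRate > 0 && sampleSize > 0);
    return (sampleRate / 2.0f) / static_cast<float>(sampleSize);
}

// Index of the spectrum entry containing the frequency 'hz', clamped to the array
int entryForFrequency(float hz, int sampleRate, int sampleSize) {
    int index = static_cast<int>(hz / hzPerEntry(sampleRate, sampleSize));
    if (index < 0) index = 0;
    if (index > sampleSize - 1) index = sampleSize - 1;
    return index;
}
```

With 64 entries at 44100 Hz each entry covers about 344.5 Hz, so a 1000 Hz tone falls into entry 2. When filling such vectors from FMOD, you should be able to pass the vector's data() pointer wherever the listing passes a float array, though I have not shown that here.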
Complete example
The following example produces the same output as that in part 4 (VU bars and numbers showing the volume of each frequency range), but it uses the sound card output as the source rather than an MP3 that we play ourselves. Try it with Spotify!
By pressing S you can cycle through all of the available recording sources. Pressing 1 and 2 decrease and increase the FFT sample size respectively.
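The key handling boils down to two clamped adjustments and a wrap-around counter. As a stand-alone sketch (the function names are invented here; the limits of 64 and 8192 are the ones used in the listing):

```cpp
#include <algorithm>
#include <cassert>

const int minSampleSize = 64;
const int maxSampleSize = 8192;

// Key '1': halve the FFT sample size, but never go below the minimum
int halveSampleSize(int sampleSize) {
    return std::max(sampleSize / 2, minSampleSize);
}

// Key '2': double the FFT sample size, but never go above the maximum
int doubleSampleSize(int sampleSize) {
    return std::min(sampleSize * 2, maxSampleSize);
}

// Key 'S': move to the next recording source, wrapping back to source 0
int nextSource(int current, int sourceCount) {
    assert(sourceCount > 0);   // the modulo would divide by zero otherwise
    return (current + 1) % sourceCount;
}
```

The assertion in nextSource() guards a case worth checking in your own code: if no recording sources are reported at all, the modulo is a division by zero.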
The example also performs the beat detection and BPM estimation shown in part 4, which as you will see does not work very well on arbitrary sounds.
The full listing below uses my SimpleFMOD library, but only for initialization so it is easily adapted, and Simple2D just for the rendering of the VU bars. All of the principal code you need to perform the steps above is included verbatim.
    // FMOD Frequency Analysis demo
    // Written by Katy Coe (c) 2013
    // No unauthorized copying or distribution
    // www.djkaty.com

    #include "../SimpleFMOD/SimpleFMOD.h"
    #include "Simple2D.h"
    #include <algorithm>
    #include <queue>

    using namespace SFMOD;
    using namespace S2D;

    class LiveFrequencyAnalysis : public Simple2D
    {
    public:
        // Sample rate
        static int const sampleRate = 44100;

        // Number of channels to sample
        static int const channels = 2;

    private:
        // FMOD
        SimpleFMOD fmod;
        FMOD::Sound *sound;
        FMOD::Channel *channel;

        // Sound card recording source
        int recordDriver;

        // Number of recording sources available on user's system
        int recordingSources;

        // Create sound buffer
        void createSoundBuffer();

        // Start/stop recording sound from sound card
        void startCapture();
        void stopCapture();

        // Graphics
        TextFormat freqTextFormat;
        Gradient freqGradient;

        // FFT sample size
        int sampleSize;

        // Beat detection parameters
        float beatThresholdVolume;
        int beatThresholdBar;
        unsigned int beatSustain;
        unsigned int beatPostIgnore;

        int beatLastTick;
        int beatIgnoreLastTick;

        // List of how many ms ago the last beats were
        std::queue<int> beatTimes;
        unsigned int beatTrackCutoff;

        // When the music was last unpaused
        int musicStartTick;

    public:
        LiveFrequencyAnalysis(Simple2DStartupInfo);
        void DrawScene();
        virtual bool OnKeyCharacter(int, int, bool, bool);
    };

    // Initialize application
    LiveFrequencyAnalysis::LiveFrequencyAnalysis(Simple2DStartupInfo si) : Simple2D(si)
    {
        // Make paintbrushes
        freqTextFormat = MakeTextFormat(L"Verdana", 10.0f);
        freqGradient = MakeBrush(Colour::Green, Colour::Red);

        // Set FFT parameters
        sampleSize = 64;

        // Set beat detection parameters
        beatThresholdVolume = 0.4f;
        beatThresholdBar = 0;
        beatSustain = 100;
        beatPostIgnore = 300;
        beatTrackCutoff = 10000;

        beatLastTick = 0;
        beatIgnoreLastTick = 0;

        musicStartTick = 0;

        // Recording from sound card

        // Get number of recording sources
        fmod.FMOD()->getRecordNumDrivers(&recordingSources);

        // Select default source
        sound = NULL;
        recordDriver = 0;

        // Start capturing
        startCapture();
    }

    void LiveFrequencyAnalysis::startCapture()
    {
        // Create sound recording buffer
        createSoundBuffer();

        // Start recording sound card output into empty sound, looping back to the start
        // and over-writing the oldest data when the sound is full
        fmod.FMOD()->recordStart(recordDriver, sound, true);

        // Wait so something is recorded (this figure will introduce lag!)
        Sleep(60);

        // Start playing the recorded sound back, silently, so we can use its
        // channel to get the FFT data. The frequency analysis is done before the
        // volume is adjusted so it doesn't matter that we are playing back silently.
        fmod.FMOD()->playSound(FMOD_CHANNEL_FREE, sound, false, &channel);
        channel->setVolume(0);

        // Reset beat detection data
        // (std::queue has no clear(), and empty() only tests for emptiness,
        // so assign a fresh queue to discard the old beat times)
        musicStartTick = GetTickCount();
        beatTimes = std::queue<int>();
    }

    void LiveFrequencyAnalysis::stopCapture()
    {
        // Stop silent playback
        channel->stop();

        // Stop recording
        fmod.FMOD()->recordStop(recordDriver);

        // Free sound recording buffer
        sound->release();
        sound = NULL;
    }

    void LiveFrequencyAnalysis::createSoundBuffer()
    {
        // Release previous buffer if there is one
        if (sound != NULL)
            sound->release();

        // Create an empty sound buffer where we can capture the sound card output
        FMOD_CREATESOUNDEXINFO soundInfo;
        memset(&soundInfo, 0, sizeof(FMOD_CREATESOUNDEXINFO));
        soundInfo.cbsize = sizeof(FMOD_CREATESOUNDEXINFO);

        // The length of the entire sample in bytes, calculated as:
        // Sample rate * number of channels * bytes per sample per channel * number of seconds
        soundInfo.length = static_cast<unsigned int>(sampleRate * channels * sizeof(unsigned short) * 0.5);

        // Number of channels and sample rate
        soundInfo.numchannels = channels;
        soundInfo.defaultfrequency = sampleRate;

        // The sound format (here we use 16-bit signed PCM)
        soundInfo.format = FMOD_SOUND_FORMAT_PCM16;

        // Create a looping user-defined sound with FMOD_SOFTWARE | FMOD_LOOP_NORMAL | FMOD_OPENUSER
        fmod.FMOD()->createSound(0, FMOD_SOFTWARE | FMOD_LOOP_NORMAL | FMOD_OPENUSER, &soundInfo, &sound);
    }

    // Handle keypresses
    bool LiveFrequencyAnalysis::OnKeyCharacter(int key, int rc, bool prev, bool trans)
    {
        // Decrease FFT sample size
        if (key == '1')
            sampleSize = max(sampleSize / 2, 64);

        // Increase FFT sample size
        if (key == '2')
            sampleSize = min(sampleSize * 2, 8192);

        // Change recording source
        if (key == 'S' || key == 's')
        {
            stopCapture();

            // Change source
            recordDriver = (recordDriver + 1) % recordingSources;

            startCapture();
        }

        return true;
    }

    // Per-frame code
    void LiveFrequencyAnalysis::DrawScene()
    {
        // Update FMOD
        fmod.Update();

        // Frequency analysis
        float *specLeft, *specRight, *spec;
        spec = new float[sampleSize];
        specLeft = new float[sampleSize];
        specRight = new float[sampleSize];

        // Get average spectrum for left and right stereo channels
        channel->getSpectrum(specLeft, sampleSize, 0, FMOD_DSP_FFT_WINDOW_RECT);
        channel->getSpectrum(specRight, sampleSize, 1, FMOD_DSP_FFT_WINDOW_RECT);

        for (int i = 0; i < sampleSize; i++)
            spec[i] = (specLeft[i] + specRight[i]) / 2;

        // Find max volume
        auto maxIterator = std::max_element(&spec[0], &spec[sampleSize]);
        float maxVol = *maxIterator;

        // Find frequency range of each array item
        float hzRange = (sampleRate / 2) / static_cast<float>(sampleSize);

        // Detect beat
        if (spec[beatThresholdBar] >= beatThresholdVolume && beatLastTick == 0 && beatIgnoreLastTick == 0)
        {
            beatLastTick = GetTickCount();
            beatTimes.push(beatLastTick);

            while(GetTickCount() - beatTimes.front() > beatTrackCutoff)
            {
                beatTimes.pop();
                if (beatTimes.size() == 0)
                    break;
            }
        }

        if (GetTickCount() - beatLastTick < beatSustain)
            Text(100, 220, "BEAT", Colour::White, MakeTextFormat(L"Verdana", 48.0f));

        else if (beatIgnoreLastTick == 0 && beatLastTick != 0)
        {
            beatLastTick = 0;
            beatIgnoreLastTick = GetTickCount();
        }

        if (GetTickCount() - beatIgnoreLastTick >= beatPostIgnore)
            beatIgnoreLastTick = 0;

        // Predict BPM
        float msPerBeat, bpmEstimate;

        if (beatTimes.size() >= 2)
        {
            msPerBeat = (beatTimes.back() - beatTimes.front()) / static_cast<float>(beatTimes.size() - 1);
            bpmEstimate = 60000 / msPerBeat;
        }
        else
            bpmEstimate = 0;

        // Draw display
        char name[256];
        fmod.FMOD()->getRecordDriverInfo(recordDriver, name, 256, 0);

        Text(10, 10, "Analyzing source " + StringFactory(recordDriver) + ": " + name, Colour::White, MakeTextFormat(L"Verdana", 14.0f));
        Text(10, 30, "Press 1 and 2 to adjust FFT size, S to change source", Colour::White, MakeTextFormat(L"Verdana", 14.0f));
        Text(10, 50, "Sample size: " + StringFactory(sampleSize) + " - Range per sample: " + StringFactory(hzRange) + "Hz - Max vol this frame: " + StringFactory(maxVol), Colour::White, MakeTextFormat(L"Verdana", 14.0f));

        // BPM estimation
        if (GetTickCount() - musicStartTick >= beatTrackCutoff && musicStartTick != 0)
            Text(10, ResolutionY - 20, "Estimated BPM: " + StringFactory(bpmEstimate) + " (last " + StringFactory(beatTrackCutoff / 1000) + " seconds)", Colour::White, MakeTextFormat(L"Verdana", 14.0f));
        else if (musicStartTick != 0)
            Text(10, ResolutionY - 20, "Estimated BPM: calculating for next " + StringFactory(beatTrackCutoff - (GetTickCount() - musicStartTick)) + " ms", Colour::White, MakeTextFormat(L"Verdana", 14.0f));
        else
            Text(10, ResolutionY - 20, "Paused", Colour::White, MakeTextFormat(L"Verdana", 14.0f));

        // Numerical FFT display
        int nPerRow = 16;

        for (int y = 0; y < sampleSize / nPerRow; y++)
            for (int x = 0; x < nPerRow; x++)
                Text(x * 40 + 10, y * 20 + 80, StringFactory(floor(spec[y * nPerRow + x] * 1000)), Colour::White, freqTextFormat);

        // VU bars
        int blockGap = 4 / (sampleSize / 64);
        int blockWidth = static_cast<int>((static_cast<float>(ResolutionX) * 0.8f) / static_cast<float>(sampleSize) - blockGap);
        int blockMaxHeight = 200;

        for (int b = 0; b < sampleSize - 1; b++)
            FillRectangleWH(static_cast<int>(ResolutionX * 0.1f + (blockWidth + blockGap) * b), ResolutionY - 50, blockWidth, static_cast<int>(-blockMaxHeight * spec[b]), freqGradient);

        // Clean up
        delete [] spec;
        delete [] specLeft;
        delete [] specRight;
    }

    void Simple2DStart()
    {
        Simple2DStartupInfo si;
        si.WindowName = "FMOD Frequency Analysis from Sound Card";
        si.BackgroundColour = D2D1::ColorF(Colour::Black);
        si.ResizableWindow = false;

        LiveFrequencyAnalysis(si).Run();
    }
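The BPM arithmetic from the listing above can also be pulled out and tested on its own. This is only a sketch: the function names are hypothetical, and it uses std::deque instead of the std::queue in the listing so that a list of beat times can be passed in directly. Times are in milliseconds, as returned by GetTickCount() in the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Return the beat times with every beat older than 'cutoffMs' removed
std::deque<unsigned int> pruneOldBeats(std::deque<unsigned int> beatTimes,
                                       unsigned int nowMs, unsigned int cutoffMs) {
    while (!beatTimes.empty() && nowMs - beatTimes.front() > cutoffMs)
        beatTimes.pop_front();
    return beatTimes;
}

// Estimate BPM from the average gap between the first and last tracked beat.
// Returns 0 when fewer than two beats are known.
float estimateBpm(const std::deque<unsigned int> &beatTimes) {
    if (beatTimes.size() < 2)
        return 0.0f;
    float msPerBeat = (beatTimes.back() - beatTimes.front())
                      / static_cast<float>(beatTimes.size() - 1);
    if (msPerBeat <= 0.0f)
        return 0.0f;
    return 60000.0f / msPerBeat;
}
```

Four beats spaced 500 ms apart give 120 BPM. Because the estimate only looks at the first and last beat in the window, a single false detection shifts the result noticeably, which is one reason it struggles with arbitrary audio.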
I hope you found this tutorial useful! Until next time.
A very useful article! Thanks!!!
You’re welcome 🙂
Could you recommend some lag-less ways known to you to do the same as you have described above? Other sound engines or even languages are fine. I’m just curious and ready to do some heavier programming in order to get the almost perfect timing, just for the science ^^
The most basic way I know is to use the Windows ACM API to record the sound card output directly, without having to replay it silently on an output channel as it records. ACM doesn’t include FFT (spectrum analysis) though, so you’d need to implement this yourself or pilfer from the various free source code examples on the web.
Katy.
If you are doing your own analysis I don’t think you need to pipe the sample through play. See this post: http://www.fmod.org/questions/question/forum-41005
Your posts are very interesting and rare, useful and fun to read. I am looking for a way to read Pattern, Instrument, and Sample information from .mod, .s3m, .xm, and .it files. FMOD can play these formats but I don’t know if it also provides a way to access all that other information, and I thought you might know. Thanks.
As far as I know FMOD just converts them internally into normal audio waveforms as they’re played and doesn’t make the instrument or pattern data available to the client. I’m not 100% sure though.
Thanks for your answer :). I could finally access some data through getTag(), but data like patterns is not available to the client.
Really nice!!! Thank you so much, the recording example on fmodex api is not really precise…
Your tutorial is going to be pretty useful for me and my friends. We’re working on a live frequency analysis program: we have to create a program that generates 3D particles according to a musician’s improvisation. We were looking for a live frequency analysis program, and your tutorial is like a miracle, haha.
Best regards from the French IMAC School!