Cutting Your Teeth on FMOD Part 4: Frequency Analysis, Graphic Equalizer, Beat Detection and BPM Estimation

January 16, 2013

In this article, we shall look at how to do a frequency analysis of a playing song in FMOD and learn how to use this as the basis for a variety of useful applications.

What is frequency analysis?

Note: The SimpleFMOD library contains all of the source code and pre-compiled executables for the examples in this series.

Frequency analysis is where we take some playing audio and identify the volume level of each range of frequencies being played. In a simple example, this lets us identify the current volume of the bass, mid-range and treble in a song individually, or any other desired range of frequencies.

Stuff you don’t really need to know: The analysis is done using a Fast Fourier Transform (FFT), which looks at a block of the most recently played samples to build up a picture of the volume of each frequency range. The FFT covers the spectrum from 0Hz up to the so-called Nyquist frequency of the song: half of the actual sample rate (typically 44.1kHz), which is the highest frequency that can be measured for the audio. You can specify how many equal-sized ranges to break this up into; in this article, the number of ranges is called the sample size. So, for example, a sample size of 100 on a song sampled at 44.1kHz will produce 100 ranges, each covering 220.5Hz (that’s half the sample rate, divided by the sample size, ie. (44100 / 2) / 100). Therefore, the higher the sample size, the finer the frequency resolution, but at the cost of lag, since the FFT must look further back in time from the current playback position the more samples are taken into account.

Luckily, FMOD takes care of all this for you and all you need to do is tell it what sample size you want and it will return a float array containing a breakdown of the volume of each frequency range.

What is frequency analysis for?

Frequency analysis is the lowest level audio processing that must be performed to enable various other functionality. For example, you can use the resulting data to detect when there is a beat in a song (or other specific simple sound types). Here I am mostly concerned with video games and graphics, and it is a quite common effect to sync on-screen effects with the beat of a song. Using beat detection lets us trigger these effects in time with the music.

Once you have beat detection, you can then use the timing information of each beat to estimate the bpm (beats per minute) of the song. While this is generally unimportant for games, it is essential for high-level audio processing applications such as DJ tools which alter the bpm of two songs so they can be mixed (crossfaded) without a break in the music.

In this article, we will mostly look at how to use beat detection to trigger on-screen effects in games.

Retrieving the volume distribution

Down to business then. First, and very importantly, to enable frequency analysis you must use the software mixing flag when creating the sound or stream you want to analyse. This causes FMOD to mix the audio for the channel in software rather than passing it to a hardware-accelerated sound card, but DSP operations such as frequency analysis are only allowed in FMOD when software mixing is used, so we have no choice. Create your sound or stream as follows:

FMOD::Sound *song;
system->createStream("Song.mp3", FMOD_SOFTWARE, 0, &song); // or createSound

You will also need to retrieve the channel of the song once it starts playing as this is used as a handle to the frequency analysis function:

FMOD::Channel *channel;
system->playSound(FMOD_CHANNEL_FREE, song, true, &channel);

All pretty familiar so far. You can now do a frequency analysis at any time. If you are planning to draw a graphic equalizer (VU bars) from the results, or use beat detection, you will want to do this on every frame, so in your per-frame update code, first update FMOD, then perform the analysis:

// Per-frame FMOD update ('system' is a pointer to FMOD::System)
system->update();

// getSpectrum() performs the frequency analysis, see explanation below
int sampleSize = 64;

float *specLeft, *specRight;

specLeft = new float[sampleSize];
specRight = new float[sampleSize];

// Get spectrum for left and right stereo channels
channel->getSpectrum(specLeft, sampleSize, 0, FMOD_DSP_FFT_WINDOW_RECT);
channel->getSpectrum(specRight, sampleSize, 1, FMOD_DSP_FFT_WINDOW_RECT);

The first step is to allocate a float array for each channel, then pass a pointer to it to getSpectrum(), which fills it with the volume distribution. The sample size must be a power of two in the range 64-8192. The third argument specifies which part of the audio to examine; for a stereo track, 0 represents the left channel and 1 the right channel. Here we retrieve the volume distribution for both. The fourth argument specifies the window function applied to the samples before the FFT, which helps guard against false readings (energy from one frequency leaking into neighbouring ranges). FMOD_DSP_FFT_WINDOW_RECT uses a rectangular window, which essentially means the samples are passed through unmodified.
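To give a feel for what a window does, here is a minimal sketch of two sets of window weights using the textbook definitions. It is purely illustrative: FMOD applies its windows internally, and the function names hannWindow and rectWindow are my own. FMOD Ex offers other window types through the same argument (for example FMOD_DSP_FFT_WINDOW_HANNING), but this code makes no claim about how FMOD implements them.

```cpp
#include <cmath>
#include <vector>

// Weights for a Hann window of the given length. Each sample in the block
// being analysed would be multiplied by the matching weight before the FFT,
// fading the block in and out so its edges do not look like sudden jumps.
std::vector<float> hannWindow(int length)
{
    const double pi = 3.14159265358979323846;
    std::vector<float> weights(length > 0 ? length : 0, 1.0f);
    if (length < 2)
        return weights;

    for (int n = 0; n < length; n++)
        weights[n] = static_cast<float>(0.5 * (1.0 - std::cos(2.0 * pi * n / (length - 1))));

    return weights;
}

// A rectangular window leaves every sample unchanged (all weights are 1).
std::vector<float> rectWindow(int length)
{
    return std::vector<float>(length > 0 ? length : 0, 1.0f);
}
```

The rectangular window is the cheapest option and keeps every sample at full strength, which is why it works as a simple default here.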

To get the average volume distribution for a stereo track we need to take the average of the left and right channels:

float *spec;

spec = new float[sampleSize];

for (int i = 0; i < sampleSize; i++)
    spec[i] = (specLeft[i] + specRight[i]) / 2;

Depending on what you want to do with the data, you now have a choice. The floats returned by getSpectrum are linear volume levels in the range 0-1 (not decibels, despite the parameter name dB used in the code below), where 1 is the loudest possible output and 0 is silence. Many of these values may be quite low even when the music is playing at maximum volume, so you can optionally normalise (scale) the data such that the loudest frequency is always represented by 1. To do this, find the maximum of all the volumes returned and then divide each volume by it as follows:

// Find max volume
auto maxIterator = std::max_element(&spec[0], &spec[sampleSize]);
float maxVol = *maxIterator;

// Normalize
if (maxVol != 0)
  std::transform(&spec[0], &spec[sampleSize], &spec[0], [maxVol] (float dB) -> float { return dB / maxVol; });

There are many ways to implement this but the combination of the C++ Standard Library template functions (std::max_element and std::transform, both declared in the algorithm header) and a lambda function in C++11 is quite neat. The transform function performs an in-place transform of each volume to scale it relative to the maximum volume for the distribution.

Again optionally, you can calculate the range in Hz of frequencies covered by each array entry as follows, if you need it:

float hzRange = (44100 / 2) / static_cast<float>(sampleSize);

Don’t forget to change 44100 to the sample rate (in Hz) of the audio.

You can now do whatever you want with the data. Don’t forget to clean up at the end of your function:

delete [] spec;
delete [] specLeft;
delete [] specRight;

Plotting a graphic equalizer (VU bars)

Figure 1. VU bars. In this case the sample size is 128 on a track sampled at 44.1kHz, therefore there are 128 bars representing a range of 172.266Hz each. The lowest frequency ranges are shown on the left, with the height of each bar representing its average volume.

VU bars are usually represented by a row of rectangles (or lines) with a common bottom Y co-ordinate, with the height of each rectangle representing the volume of the frequency range it represents – see Figure 1.

How you plot this depends on the graphics engine you are using. If you want to ensure that at least one bar (the loudest) is always at the maximum height, you should normalize the frequency data before plotting. Here is some example code for my Simple2D library which uses 80% of the screen width with a 10% border at the left and right, and scales the width and gap between each bar according to the sample size:

// Earlier in code...
freqGradient = MakeBrush(Colour::Green, Colour::Red);

...
// VU bars

int blockGap = 4 / (sampleSize / 64);
int blockWidth = static_cast<int>((static_cast<float>(ResolutionX) * 0.8f) / static_cast<float>(sampleSize) - blockGap);
int blockMaxHeight = 200;

// Parameters: Left-hand X co-ordinate of bar, left-hand Y co-ordinate of bar, width of bar, height of bar (negative to draw upwards), paintbrush to use

for (int b = 0; b < sampleSize - 1; b++)
    FillRectangleWH(static_cast<int>(ResolutionX * 0.1f + (blockWidth + blockGap) * b),
                    ResolutionY - 50,
                    blockWidth,
                    static_cast<int>(-blockMaxHeight * spec[b]),
                    freqGradient);

This produces a display identical to that shown in Figure 1 when placed in your application’s DrawScene() function.
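If you are not using Simple2D, the layout arithmetic can be separated from the drawing call. The following sketch computes one rectangle per bar using the same formulas as above; BarRect and layoutVuBars are hypothetical names, and you would pass each rectangle to whatever fill function your own graphics engine provides. Unlike the loop above, it lays out every bar including the last one.

```cpp
#include <vector>

// One VU bar: top-left corner, width, and height (negative means "draw upwards")
struct BarRect { int x, y, width, height; };

std::vector<BarRect> layoutVuBars(const std::vector<float> &spec,
                                  int resolutionX, int resolutionY,
                                  int blockMaxHeight = 200)
{
    std::vector<BarRect> bars;
    const int sampleSize = static_cast<int>(spec.size());
    if (sampleSize == 0)
        return bars;

    // The gap shrinks as the number of bars grows (and guards against
    // dividing by zero when fewer than 64 bars are supplied)
    const int gapDivisor = sampleSize / 64;
    const int blockGap = gapDivisor > 0 ? 4 / gapDivisor : 4;

    // Bars share 80% of the width, leaving a 10% border on each side
    const int blockWidth = static_cast<int>(resolutionX * 0.8f / sampleSize - blockGap);

    for (int b = 0; b < sampleSize; b++)
    {
        BarRect bar;
        bar.x = static_cast<int>(resolutionX * 0.1f + (blockWidth + blockGap) * b);
        bar.y = resolutionY - 50;
        bar.width = blockWidth;
        bar.height = static_cast<int>(-blockMaxHeight * spec[b]);
        bars.push_back(bar);
    }

    return bars;
}
```

Keeping the layout separate like this also makes it easy to check the numbers without opening a window.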

Beat detection

The principle of detecting when a beat occurs in the music is to examine a low frequency range (where the percussion occurs: typically 60-120Hz for a bass kick drum and 120-150Hz for a snare drum) to see if its volume exceeds a certain threshold value. In the following example, we simply consider the beat to have occurred when this threshold is exceeded, then ignore the volume of the track for a given period of time afterwards to avoid false positives. We also simply examine the lowest bar in the volume distribution, which works for small sample sizes but covers too low and too narrow a frequency range for larger sample sizes. Ideally you should examine whichever bars cover the percussion frequencies given above, but I leave that as an exercise for you. With a sample size of 128 and a track sampled at 44.1kHz, the first item in the array covers all frequencies from 0-172Hz, so it’s a reasonable estimate. A more sophisticated approach may average several bars from a larger sample size, and require the threshold to be exceeded for more than a single frame.

Normalization should not be used when coding for beat detection, as it disproportionately distorts the volume distribution during quiet periods of the music.

We need to set some variables to start with. Through trial and error, I found that these work reasonably well for dance music:

float beatThresholdVolume = 0.3f;    // The threshold over which to recognize a beat
int beatThresholdBar = 0;            // The bar in the volume distribution to examine
unsigned int beatPostIgnore = 250;   // Number of ms to ignore track for after a beat is recognized

int beatLastTick = 0;                // Time when last beat occurred

Here is how we detect a beat:

bool beatDetected = false;

// Test for threshold volume being exceeded (if not currently ignoring track)
if (spec[beatThresholdBar] >= beatThresholdVolume && beatLastTick == 0)
{
  beatLastTick = GetTickCount();
  beatDetected = true;
}

if (beatDetected)
{
  // A beat has occurred, do something here
}

// If the ignore time has expired, allow testing for a beat again
if (GetTickCount() - beatLastTick >= beatPostIgnore)
  beatLastTick = 0;

This code can of course be adapted in a variety of ways, but as it stands, beatLastTick will retain the system tick time of the last detected beat while the audio is being ignored, and 0 at all other times. On the first frame that the beat is detected, the trigger code will execute and you can generate on-screen effects or user interactions here.
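The same logic can be wrapped in a small class that is handed the current time, rather than calling GetTickCount() itself. That makes it portable and easy to test. This is a sketch under my own assumptions: the class name BeatDetector is hypothetical, and it uses an explicit flag instead of treating a tick value of 0 as "not ignoring".

```cpp
class BeatDetector
{
public:
    BeatDetector(float thresholdVolume, unsigned int postIgnoreMs)
        : threshold(thresholdVolume), postIgnore(postIgnoreMs),
          ignoring(false), lastBeatMs(0) {}

    // level: volume of the bar being watched this frame
    // nowMs: current time in milliseconds (e.g. from GetTickCount())
    // Returns true only on the frame where a new beat is recognized.
    bool update(float level, unsigned int nowMs)
    {
        // If the ignore time has expired, allow testing for a beat again
        if (ignoring && nowMs - lastBeatMs >= postIgnore)
            ignoring = false;

        if (!ignoring && level >= threshold)
        {
            ignoring = true;
            lastBeatMs = nowMs;
            return true;
        }

        return false;
    }

private:
    float threshold;          // The threshold over which to recognize a beat
    unsigned int postIgnore;  // Number of ms to ignore the track after a beat
    bool ignoring;            // True while inside the ignore period
    unsigned int lastBeatMs;  // Time when the last beat occurred
};
```

In a per-frame update you would call something like detector.update(spec[beatThresholdBar], GetTickCount()) and trigger your effect whenever it returns true.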

BPM Estimation

While not really needed for game development, I thought it would be interesting to include this by way of example. At the start, it should be noted that there are many ways to use beat detection data to calculate bpm – some more accurate than others – and the example below only produces makeshift estimates; it is not the best method. Also, bpm estimation relies on perfect beat detection, which is unlikely to be produced by the unsophisticated code above.

The correct way of determining a song’s bpm: The song should be scanned from start to end in memory without playing it, and a frequency analysis performed on each frame. Beat detection should be performed on each frame from the frequency analysis results, and the relative times of every beat in the song stored in an array. Heuristics should be used to ignore portions of the song with no beat, and the number of beats in the remaining portion should be divided by its playback time in minutes. This will provide a fairly accurate calculation of the song’s true bpm.

In our example, we will use a moving average bpm estimation over the previous 10 seconds of the playing track. First, we’ll need storage for the times of all the beats for the last 10 seconds:

// List of how many milliseconds ago the last beats were
std::queue<int> beatTimes;

// The number of milliseconds of previous beats to keep in the list
unsigned int beatTrackCutoff = 10000;

We’ll then need to modify the beat detection code above to keep the list of detected beats up to date:

if (spec[beatThresholdBar] >= beatThresholdVolume && beatLastTick == 0)
{
    beatLastTick = GetTickCount();
    beatDetected = true;

    // Store time of detected beat
    beatTimes.push(beatLastTick);

    // Remove oldest beat if it is older than the cut-off time
    while(GetTickCount() - beatTimes.front() > beatTrackCutoff)
    {
        beatTimes.pop();
        if (beatTimes.size() == 0)
            break;
    }
}

The rest of the code remains the same. Now we have our updating list of beat times, we can do a simple calculation to estimate the track’s bpm:

// Predict BPM

float msPerBeat, bpmEstimate;

if (beatTimes.size() >= 2)
{
  msPerBeat = (beatTimes.back() - beatTimes.front()) / static_cast<float>(beatTimes.size() - 1);
  bpmEstimate = 60000 / msPerBeat;
}
else
  bpmEstimate = 0;

What happens here is that we take the times of the oldest and newest beats in the list, and divide the difference between them by the size of the list minus one (which is the number of gaps between the beats rather than the number of beats) to get the average number of milliseconds between each beat. We then divide this into one minute (60000 milliseconds) to get the estimate of the bpm, which is stored in bpmEstimate.

Demo application

Here you can download a fully fleshed out demo using all the techniques above plus a few other tweaks. The full source code is presented below – note that Simple2D is used for rendering so the application follows the Simple2D framework. Please forgive the choice of music 🙂

FMOD_FrequencyAnalysis

To try the application, first press P to unpause the music after running the EXE file. You can use N to toggle normalization (beat detection and bpm estimation only run when normalization is off) and 1 and 2 to increase and decrease the FFT sample size. When normalization is off, the word ‘BEAT’ flashes on the screen when a beat is detected, and the current bpm estimation is shown at the bottom of the screen.

Full source code:

// FMOD Frequency Analysis demo
// Written by Katy Coe (c) 2013
// No unauthorized copying or distribution
// www.djkaty.com

#include "../SimpleFMOD/SimpleFMOD.h"
#include "Simple2D.h"

#include <algorithm>	// std::max_element, std::transform
#include <cmath>	// floor
#include <queue>

using namespace SFMOD;
using namespace S2D;

class FrequencyAnalysis : public Simple2D
{
private:
	// FMOD
	SimpleFMOD fmod;
	Song song;

	// Graphics
	TextFormat freqTextFormat;
	Gradient freqGradient;

	// Normalization toggle and sample size
	bool enableNormalize;
	int sampleSize;

	// Beat detection parameters
	float beatThresholdVolume;
	int beatThresholdBar;
	unsigned int beatSustain;
	unsigned int beatPostIgnore;

	int beatLastTick;
	int beatIgnoreLastTick;

	// List of how many ms ago the last beats were
	std::queue<int> beatTimes;
	unsigned int beatTrackCutoff;

	// When the music was last unpaused
	int musicStartTick;

public:
	FrequencyAnalysis();
	void DrawScene();

	virtual bool OnKeyCharacter(int, int, bool, bool);
};

// Initialize application
FrequencyAnalysis::FrequencyAnalysis()
{
	// Make paintbrushes
	freqTextFormat = MakeTextFormat(L"Verdana", 10.0f);
	freqGradient = MakeBrush(Colour::Green, Colour::Red);

	// Load song
	song = fmod.LoadSong("Song.mp3", FMOD_SOFTWARE);
	song.Start(true);

	// Set normalization toggle and sample size
	enableNormalize = true;
	sampleSize = 64;

	// Set beat detection parameters
	beatThresholdVolume = 0.3f;
	beatThresholdBar = 0;
	beatSustain = 150;
	beatPostIgnore = 250;
	beatTrackCutoff = 10000;

	beatLastTick = 0;
	beatIgnoreLastTick = 0;

	musicStartTick = 0;
}

// Handle keypresses
bool FrequencyAnalysis::OnKeyCharacter(int key, int rc, bool prev, bool trans)
{
	// Toggle pause
	if (key == 'P' || key == 'p')
	{
		song.TogglePause();

		// Reset bpm estimation if needed
		if (musicStartTick == 0 && !enableNormalize && !song.GetPaused())
		{
			musicStartTick = GetTickCount();
			beatTimes = std::queue<int>();	// clear the list (empty() only tests whether it is empty)
		}

		else if (song.GetPaused())
			musicStartTick = 0;
	}

	// Toggle normalization
	if (key == 'N' || key == 'n')
	{
		enableNormalize = !enableNormalize;

		// Reset bpm estimation if needed
		if (!enableNormalize && !song.GetPaused())
		{
			musicStartTick = GetTickCount();
			beatTimes = std::queue<int>();	// clear the list (empty() only tests whether it is empty)
		}
	}

	// Decrease FFT sample size
	if (key == '1')
		sampleSize = max(sampleSize / 2, 64);

	// Increase FFT sample size
	if (key == '2')
		sampleSize = min(sampleSize * 2, 8192);

	return true;
}

// Per-frame code
void FrequencyAnalysis::DrawScene()
{
	// Update FMOD
	fmod.Update();

	// Frequency analysis
	float *specLeft, *specRight, *spec;
	spec = new float[sampleSize];
	specLeft = new float[sampleSize];
	specRight = new float[sampleSize];

	// Get average spectrum for left and right stereo channels
	song.GetChannel()->getSpectrum(specLeft, sampleSize, 0, FMOD_DSP_FFT_WINDOW_RECT);
	song.GetChannel()->getSpectrum(specRight, sampleSize, 1, FMOD_DSP_FFT_WINDOW_RECT);

	for (int i = 0; i < sampleSize; i++)
		spec[i] = (specLeft[i] + specRight[i]) / 2;

	// Find max volume
	auto maxIterator = std::max_element(&spec[0], &spec[sampleSize]);
	float maxVol = *maxIterator;

	// Normalize
	if (enableNormalize && maxVol != 0)
		std::transform(&spec[0], &spec[sampleSize], &spec[0], [maxVol] (float dB) -> float { return dB / maxVol; });

	// Find frequency range of each array item
	float hzRange = (44100 / 2) / static_cast<float>(sampleSize);

	// Detect beat if normalization disabled
	if (!enableNormalize)
	{
		if (spec[beatThresholdBar] >= beatThresholdVolume && beatLastTick == 0 && beatIgnoreLastTick == 0)
		{
			beatLastTick = GetTickCount();
			beatTimes.push(beatLastTick);

			while(GetTickCount() - beatTimes.front() > beatTrackCutoff)
			{
				beatTimes.pop();
				if (beatTimes.size() == 0)
					break;
			}
		}

		if (GetTickCount() - beatLastTick < beatSustain)
			Text(100, 220, "BEAT", Colour::White, MakeTextFormat(L"Verdana", 48.0f));

		else if (beatIgnoreLastTick == 0 && beatLastTick != 0)
		{
			beatLastTick = 0;
			beatIgnoreLastTick = GetTickCount();
		}

		if (GetTickCount() - beatIgnoreLastTick >= beatPostIgnore)
			beatIgnoreLastTick = 0;
	}

	// Predict BPM
	float msPerBeat, bpmEstimate;

	if (beatTimes.size() >= 2)
	{
		msPerBeat = (beatTimes.back() - beatTimes.front()) / static_cast<float>(beatTimes.size() - 1);
		bpmEstimate = 60000 / msPerBeat;
	}
	else
		bpmEstimate = 0;

	// Draw display
	Text(10, 10, "Press P to toggle pause, N to toggle normalize, 1 and 2 to adjust FFT size", Colour::White, MakeTextFormat(L"Verdana", 14.0f));

	Text(10, 30, "Sample size: " + StringFactory(sampleSize) + "  -  Range per sample: " + StringFactory(hzRange) + "Hz  -  Max vol this frame: " + StringFactory(maxVol), Colour::White, MakeTextFormat(L"Verdana", 14.0f));

	// BPM estimation
	if (!enableNormalize)
	{
		if (GetTickCount() - musicStartTick >= beatTrackCutoff && musicStartTick != 0)
			Text(10, ResolutionY - 20, "Estimated BPM: " + StringFactory(bpmEstimate) + " (last " + StringFactory(beatTrackCutoff / 1000) + " seconds)", Colour::White, MakeTextFormat(L"Verdana", 14.0f));
		else if (musicStartTick != 0)
			Text(10, ResolutionY - 20, "Estimated BPM: calculating for next " + StringFactory(beatTrackCutoff - (GetTickCount() - musicStartTick)) + " ms", Colour::White, MakeTextFormat(L"Verdana", 14.0f));
		else
			Text(10, ResolutionY - 20, "Paused", Colour::White, MakeTextFormat(L"Verdana", 14.0f));
	}
	else
		Text(10, ResolutionY - 20, "Disable normalization to enable BPM calculation", Colour::White, MakeTextFormat(L"Verdana", 14.0f));

	// Numerical FFT display
	int nPerRow = 16;

	for (int y = 0; y < sampleSize / nPerRow; y++)
		for (int x = 0; x < nPerRow; x++)
			Text(x * 40 + 10, y * 20 + 60, StringFactory(floor(spec[y * nPerRow + x] * 1000)), Colour::White, freqTextFormat);

	// VU bars
	int blockGap = 4 / (sampleSize / 64);
	int blockWidth = static_cast<int>((static_cast<float>(ResolutionX) * 0.8f) / static_cast<float>(sampleSize) - blockGap);
	int blockMaxHeight = 200;

	for (int b = 0; b < sampleSize - 1; b++)
		FillRectangleWH(static_cast<int>(ResolutionX * 0.1f + (blockWidth + blockGap) * b),
						ResolutionY - 50,
						blockWidth,
						static_cast<int>(-blockMaxHeight * spec[b]),
						freqGradient);

	// Clean up
	delete [] spec;
	delete [] specLeft;
	delete [] specRight;
}

void Simple2DStart()
{
	FrequencyAnalysis test;
	test.SetWindowName(L"FMOD Frequency Analysis");
	test.SetBackgroundColour(Colour::Black);
	test.SetResizableWindow(false);
	test.Run();
}

I hope you found this exploration of frequency analysis in FMOD useful! In Part 5 we’ll check out how to generate audio on the fly from user-defined functions. If you want to do frequency analysis of real-time sound card output, check out Part 6 for the details! Enjoy.

  1. Paso
    May 31, 2013 at 09:26

    I’m concerned about your parameters ResolutionX and ResolutionY. What do they mean?

    • May 31, 2013 at 13:54

      The screen or render target width and height in pixels. It is used to determine the width, maximum height and horizontal gap between each VU bar. You are free to ignore the example calculations and define the VU bar’s dimensions in any way you wish. The example code ensures that it auto-scales to the screen resolution.

      • Paso
        May 31, 2013 at 17:44

        Thanks for your reply!!
        I have 4 questions X(.
        Sorry for the disturbance. T_T

        1. When I do GetSpectrum() does this function gets the size of DB by the range of 0 ~ 44100 by itself(default)? or do i need to modify the range?

        2. Right now I can modify the number of bars by changing the parameter sampleSize. If 1 bar range is 0Hz to 2Hz (total 3Hz) does it add up all the values (For Instance 0Hz : 1 , 1Hz : 0.5 , 2Hz : 0) and divides it in to 3? ( 1 + 0.5 + 0 ) /3 ?

        3. Is the value of spec[i] maximum : 1.0 and minimum : 0.0?

        4. Right now when I look at my window media Equalizer the height of the bars are quite similar. However, the example above the heights are way to different. Therefore, does this mean that the Window media’s equalizer is showing only a part of Hz, or is it modifed some how?

        • June 1, 2013 at 22:06

          You’re not disturbing me, I just don’t always have time to reply on the blog 🙂

          1. getSpectrum gets the spectrum up to the recorded sample rate of the audio you have loaded. So if your audio is 96kHz, it will return the values for that range, if it is 44.1kHz, it will return the values for that range etc.

          2. I believe that is how it works, dB is not a linear measurement but a logarithmic one so the way the values are averaged may be logarithmic rather than a linear division (normal average) but in any case, the output is some kind of average across all the frequencies from 0 to 1.

          3. Yes

          4. I don’t know how the windows EQ works but if I had to guess I would say that the VU bars shown there do not have normalized values. The example in the article shows both normalized and non-normalized versions. The normalized version will generally have sharper differences between the height of each bar because the highest volume across the bars is searched for, multiplied up to become 1.0 and all the other bars are multiplied by the same value, so any differences between them become exaggerated.

          Hope that helps.

  2. Paso
    June 2, 2013 at 17:52

    Thanks for your help!! A lot of enhancements done in my project.

    I’m trying to normalize but I can’t go through with the code below.
    std::transform(&spec[0], &spec[sampleSize], &spec[0], [maxVol] (float dB) -> float { return dB / maxVol; });

    1. I’ve never learned [maxVol] (float dB) -> float { return dB / maxVol; before…
    what does it mean? Finding the spec[0] from spec[sampleSize], when spec[0] is found and I don’t know what happens after.

    2. I’ve also tried the Beat Detection in one of my songs. But the detecting accuracy seemed to be low. How can I make the accuracy increase? Of course, I did some googling and got a conclusion that smoothing and some mathematical process is needed. Is there another solution? I am Presently Getting Spectrum of 1024 pieces and Adding it all up, which is perhaps called the sound energy? and comparing the previous energy and the present energy every frames.

    • June 5, 2013 at 15:36

      1. This is a lambda function using C++11 syntax. It is equivalent to:

      float calculateVolume(float maxVol, float dB)
      {
      return dB / maxVol;
      }


      std::transform(&spec[0], &spec[sampleSize], &spec[0], std::bind(calculateVolume, maxVol, std::placeholders::_1));

      (and other permutations) but is much neater.

      What the call to std::transform – part of the standard C++ STL library – does is to iterate over each item in spec[], starting from 0 and finishing at sampleSize-1 (ie. every item in the array), dividing each value by maxVol and replacing the original value, ie:

      spec[0] /= maxVol;
      spec[1] /= maxVol;

      You could equally write it like this:

      for (int i = 0; i < sampleSize; i++)
      spec[i] /= maxVol;

      See my article elsewhere on the site about C++11 lambda functions for more details on both the lambda syntax and std::transform.

      2. The example above for beat detection is very crude and will only work well on eg. dance music with a clear beat. Comparing the energy across the entire sample range won't work because you are looking at all the frequencies at once; what you really want to do is just look at frequencies in the range of a bass drum etc. and compare the energy in that narrow band of the spectrum. The best solution as you mentioned is to use a smoothing function and monitor the small desired range of frequencies over several frames, being careful not to average over too many frames as that would introduce lag into the beat detection.

      I hope that helps! Katy.

  3. Jon
    June 2, 2013 at 21:00

    Is there any way for this to be worked on for a mac?

    • June 5, 2013 at 15:26

      It should do if you download the Mac version of FMOD, but I’m afraid I don’t know the specifics, only the Windows API. You will need to go through the FMOD Mac documentation and make the appropriate changes.

  4. June 19, 2013 at 04:17

    Thanks so much for writing this article. I started working on a music visualizer a couple months ago and finished the graphics engine. The thing is I’ve never done any audio coding. Google was not much help with this. I was about to start using VAMP and was cringing at the thought of having to write a host application and using a bunch of extra libraries to decode compressed audio.

    I’m so glad I did one more round of google searches and found your tutorial. I’ve got the basic stuff up now and it got me on track to doing what I want to do. Finally all that audio theory stuff from school is starting to be useful!

    I have one question. Is there any way to get the frequency of a mp3 so that I don’t have to hardcode 44khz and expect all files to be that way? I saw elsewhere you can determine it by taking the filesize and comparing it to the length of the song but I imagine that wouldn’t work for VBR files.

    Does FMOD have any facilities for returning the frequency of a file?

    thanks again!

    • June 23, 2013 at 23:57

      Once you have loaded a sound with System::createSound (or createStream) you can call Sound::getDefaults on the FMOD::Sound handle returned by System::createSound to get the default frequency, eg.

      FMOD::Sound *sound;

      float frequency;
      sound->getDefaults(&frequency, 0, 0, 0);

      Hope that helps!
      Katy

  5. July 11, 2013 at 00:11

    Hi, Katy,

    I followed your tutorial up until the step you output the data because I will use it for a different purpose. I’m interested in finding the frequency of the wav file I opened with FMOD. For testing purpose, I generated a SINE wav file with 440 Hz, sampled at 8kHz online. From the formula you provided above: freqRange = (8000 / 2) / sampleSize, my sampleSize = 1024; therefore, freqRange = 3.9 hz per bin; since I already know the frequency of the wav file I’m inputting, I’m expecting to have larger value around bin #113 compared to other bins. But What I got was not what I expected, I got the largest value in bin # 512 = 3.469e-9 and second largest at bin # 256 = 3.205e-9.

    Do you have what could be the cause of it? Here is my test code:

    FMOD::Sound *sine_440hz;
    result = system->createSound("C:\\Users\\hp\\Desktop\\sin_440Hz.wav", FMOD_SOFTWARE, 0, &sine_440hz);
    FMODErrorCheck(result);

    FMOD::Channel *channel1;
    result = system->playSound(FMOD_CHANNEL_FREE, sine_440hz, true, &channel1);
    FMODErrorCheck(result);

    int sampleSize = 1024;
    float maxAmp = 0.0;
    int binNumb = 0;
    float *specturm;
    float SampleThres = 3.12 * pow(10.0, -9.0);
    specturm = new float[sampleSize];

    channel1->getSpectrum(specturm, sampleSize, 0, FMOD_DSP_FFT_WINDOW_RECT);

    cout << "bin:\t" << "value: " << endl;
    for(int i = 1; i SampleThres){
    cout << i << "\t" << specturm[i] << endl;
    }
    }

    Please help. Thank you

    • July 11, 2013 at 00:16

      for(int i = 1; i SampleThres){
      cout << i << "\t" << specturm[i] << endl;
      }
      }

      The last part I was messed up for some reason. Above is the correct one I used.

    • July 11, 2013 at 00:18

      it won’t let me post the code I used. but here is the idea
      for(int i = 1; i less than sampleSize ; i increment by 1){
      if(specturm array data less than SampleThres){
      cout << i << "\t" << specturm[i] << endl;
      }
      }

    • July 22, 2013 at 12:41

      If you are running this code as shown then you’re going to have problems. The code itself is correct, but you are fetching the spectrum immediately after the song starts playing, possibly before it starts playing, so you will get inaccurate results. The FFT calculation performed by getSpectrum uses historical data (ie. the last ‘sampleSize’ frames of playback) to calculate the amplitude in each frequency range, so it won’t return any meaningful data until the first ‘sampleSize’ samples have been played. Additionally, you need to call System::update repeatedly in a loop (usually once per frame of animation in a game will suffice, for example) to update FMOD’s internal state, otherwise many functions including getSpectrum won’t return the expected results.

      Hope that helps!

      • Elred
        November 17, 2013 at 17:07

        Hello,

        I’m trying to make some kind of beat detection using FMOD for some personal projects of mine, so your post is very, very useful and I’ll dig seriously into it this week, as I currently use a temporal approach to detect beats (it doesn’t work well). I’m trying to find the best way to use frequency to do the same thing, and quickly looking at your results and testing your demo app with some more musics convinced me FMOD could get me where I needed.

        However, I have a slight restriction, in that I need to do my beat detection BEFORE playing the music (I actually want to generate a file containing the meaningful data I can gather from automated analysis). Everything I’ve read about the getSpectrum function in FMOD leads me to think it can only work if the sound is currently playing. Would you have a clue about how to work around this? Do you think playing the sound at high speed (like, 4x faster) would allow me to do it? While this is not a problem for the file generation I want to do right now, I’d like to reuse my system in a game that generates levels procedurally (using information extracted from a music file), and waiting for the whole length of a song for the level to get ready isn’t really an option.

        Thanks for this article anyway, it really helps!

        • November 18, 2013 at 22:26

          I don’t know a way to do this. In fact the next part of the series will be about doing the same frequency analysis as here, but on the live sound card output, and even for that I had to record and then silently play back the output signal at the same time, calling getSpectrum() on the played back signal. Looking at the “pitch bend” example that comes with the FMOD SDK, they do it the same way. So I’m afraid I really don’t know how to do it in non-real-time using FMOD.

  6. August 20, 2013 at 05:59

    Hi Katy,

    Thanks very much for the excellent article!

    I’m attempting to work on a prototype that requires running FFT (and probably other algorithms) on the entire audio file before drawing anything with the data. Is this sort of iteration possible with FMOD in non-realtime?

    I’ve read elsewhere that perhaps createSound (for loading the file into memory) and seekData() are possible avenues for me to look through. What do you think?

    Thank you in advance for any help!

    • November 24, 2013 at 15:58

      As far as I can see (and I’m not an expert on FMOD by any means), you can only FFT sounds that are playing (see the problems I had with this in part 6 about recording sound card audio). You can of course use an external FFT library to do the processing. I’ve heard a few people mention Arduino lately, and StackOverflow provides some suggestions (http://stackoverflow.com/questions/463181/c-c-fft-library-with-non-gpl-license) on others you could investigate. Hope that helps!
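To illustrate the idea of processing outside FMOD, here is a minimal sketch of analysing a block of samples that is already in memory, with no playback involved. It uses a naive O(N²) discrete Fourier transform rather than a real FFT, so it is only suitable for experimenting; a proper FFT library would be much faster. All function names here are hypothetical, and the code assumes you have already decoded the audio into a buffer of samples by some other means.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

const double kPi = 3.14159265358979323846;

// Hypothetical test-tone generator: 'count' samples of a unit-amplitude sine.
std::vector<double> sineWave(double frequencyHz, double sampleRateHz, std::size_t count) {
    std::vector<double> samples(count);
    for (std::size_t t = 0; t < count; ++t) {
        samples[t] = std::sin(2.0 * kPi * frequencyHz * double(t) / sampleRateHz);
    }
    return samples;
}

// Naive O(N^2) discrete Fourier transform of one window of N samples.
// Returns N/2 magnitudes; entry k corresponds to k * sampleRate / N Hz.
std::vector<double> magnitudeSpectrum(const std::vector<double>& window) {
    const std::size_t n = window.size();
    std::vector<double> magnitudes(n / 2);
    for (std::size_t k = 0; k < n / 2; ++k) {
        double re = 0.0;
        double im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = 2.0 * kPi * double(k) * double(t) / double(n);
            re += window[t] * std::cos(angle);
            im -= window[t] * std::sin(angle);
        }
        // Normalise by the window length so the result does not grow with N.
        magnitudes[k] = std::sqrt(re * re + im * im) / double(n);
    }
    return magnitudes;
}

// Index of the largest magnitude (0 if the spectrum is empty).
std::size_t loudestBin(const std::vector<double>& magnitudes) {
    std::size_t best = 0;
    for (std::size_t k = 1; k < magnitudes.size(); ++k) {
        if (magnitudes[k] > magnitudes[best]) best = k;
    }
    return best;
}
```

For example, `loudestBin(magnitudeSpectrum(sineWave(1000.0, 8000.0, 64)))` picks out bin 8, because 1000 × 64 / 8000 = 8. To cover a whole song you would slide a window like this along the sample buffer and analyse each position in turn, as fast as the CPU allows rather than at playback speed.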

  7. February 13, 2014 at 11:00

    Hi Katy,
    I’m a little confused – in the standard FMOD examples, the update() function is always placed inside a do-while loop so that it is called on every iteration, while in your code there is no such loop. So my question is: how does your code work if, as I understand it, there is only a single call to update() in the whole program run? Does your code loop itself in some magical way? ^^
    Winged

    • February 13, 2014 at 14:16

      The code in the example is in fact in a while loop – you just don’t see it because of the wrapper library I used (Simple2D – see elsewhere on the site). The main while loop of the application polls the Windows message pump, and periodically calls DrawScene() – once “per frame” – where a frame is an arbitrary amount of time of your choosing, or in a game or visual application, as fast as possible after the previous frame is rendered. So in other words, DrawScene() gets called repeatedly and often, triggering the call to FMOD’s update() function and the frequency analysis. Hope that makes sense! – Katy

      • February 14, 2014 at 11:38

        It fully answers my question. It’s just a pity that this wrapper is only available in a Windows version (or maybe I haven’t noticed the Linux one). Anyway, thank you for a really badass tutorial here ;]

        • February 14, 2014 at 12:07

          You don’t need the wrapper, it’s just to make the example simpler and only deal with FMOD-related code and not boilerplate stuff specific to Windows (or Linux).
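For anyone working without the wrapper, the loop described above can be sketched roughly as follows. This is a platform-neutral illustration under assumed names, not the actual Simple2D code: `runFrameLoop` and the `onFrame` callback are hypothetical, and in a real program the callback is where you would call FMOD’s update() function, fetch the spectrum and draw.

```cpp
#include <functional>

// Hypothetical frame loop: calls onFrame(frameNumber) over and over until it
// returns false, and returns the number of frames that asked to continue.
// A real application would also pump OS messages and pace itself here.
int runFrameLoop(const std::function<bool(int)>& onFrame) {
    int frame = 0;
    while (onFrame(frame)) {
        ++frame;
    }
    return frame;
}
```

The point is simply that the per-frame function gets called repeatedly and often; whether the loop lives in a wrapper library, a Windows message pump or a plain do-while in main() makes no difference to FMOD.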

  8. February 20, 2014 at 20:07

    Sorry for bothering you, but recently I’ve encountered unintelligible problems which I cannot overcome – the first one is that channel 0 is completely silent, while channel 1 is working ‘fine’ (getSpectrum of channel 0 returns nothing to an array). The second one is that I cannot set sampleSize below 8192 (on 4096 spectrum readings are ‘choppy’ and below this value they won’t ‘show’ at all). Could you give me a hint as to what may be wrong?

    • March 19, 2014 at 17:53

      Sounds like a bug in your sound card driver – I had almost identical problems with the Creative Audigy 2 ZS drivers. Use a recording program like Audacity and a sine wave generator (eg. YouTube video of sine wave sound), record it and see if it records a clean sine wave or not. If so, the problem is probably with your code; if not, the problem is your sound card driver. Good luck!

      Katy.

  9. stk
    December 3, 2014 at 00:31

    How can I change the bass/treble/other frequency sound in FMOD?

  10. stk
    December 3, 2014 at 01:12

    I want to know about an equalizer, not an analyzer.

  11. Cane Glottis
    February 7, 2015 at 09:23

    Awesome!!! Thank you very much, this really helps while diving into FMOD FFT audio analysis. Great article.

