Beginning Game Audio Programming [Electronic resources] (text version)

THE DISCRETE FOURIER TRANSFORM

The core of spectrum analysis is simple. You have a sequence of numbers representing your sampled sound (or, more precisely, representing the amplitude of your sound at several different points in time). You want to generate another set of numbers, which represent the amplitudes of the various frequencies contained in your original sample. Once you have these amplitudes, you can choose to display them any way you like. You might draw a bar graph, with each bar representing a different frequency, and the height of the bar representing its amplitude (like most popular MP3 players do). Or, you might plot a 2D graph, with frequency on the vertical axis and time on the horizontal axis, using color to represent amplitude. To reiterate: once you get the array of frequency amplitudes, you're home free—from there it's just a matter of GUI programming.

Tip

In mathematical terms, when you're writing a spectrum analyzer, what you're really doing is converting your sample from the time domain to the frequency domain. In other words, you're changing the x-axis of your graph. Instead of the x-axis being time, as it usually is for sampled sounds, you're changing it to be frequency.

One way to get the frequency amplitude array is to use a Discrete Fourier Transform, or DFT. The bulk of this chapter will focus on the theory behind the DFT.

Resonance

I'm sure that you're familiar with the concept of resonance: an opera singer can shatter a wine glass by singing at its resonant frequency. Essentially, the vibrations caused by the opera singer's voice cause the wine glass to vibrate more and more strongly, until eventually the vibration forces are too much for the glass to bear, and it shatters.

What you may not understand is exactly why the forces build up to shatter the glass. To understand that, imagine that you come across a large, heavy silver orb suspended from the ceiling by a sturdy rope. Intuitively, you know that tiny applications of force, when applied at precisely the right time, can make the orb swing higher and higher. It doesn't matter how much the orb weighs; all you have to do is give it a tiny push right when it needs it, and you'll eventually make it swing.

Now, imagine that instead of a silver orb, the thing being pushed at exactly the right time is a molecule inside the wine glass, and the thing doing the pushing is not a pair of human hands, but rather waves of compressed air, also known as sound waves. If the sound waves each arrived at just the right time, they could "swing" the molecule further and further away from its resting place. Further, these sound waves would be hitting many molecules in the glass at once, making all the molecules swing in different directions, until eventually the glass just shatters.

How You Hear

It's this property of resonance that allows you to hear. Your ear is like a biological spectrum analyzer. Inside your ear is an apparatus called the cochlea, shaped like a shell. Inside the cochlea is a membrane called the basilar membrane. The basilar membrane's base is narrow and stiff, which makes it resonate with high-frequency sounds. The other end of the membrane, the apex, is wide and flexible, making its resonant frequency low. In between, the different parts of the basilar membrane have different resonant frequencies. The membrane is covered with nerve endings, and when a particular frequency causes a particular portion of the membrane to resonate, that resonance causes a nerve to send a message to your brain, and you "hear" that frequency.

Resonance Expressed Mathematically

Sinusoidal waves are interesting because, over any whole number of cycles, their values sum to zero: the wave spends exactly as much time below zero as it does above zero. This means that in general, if you sample a wave's amplitude at regular intervals, and then add those samples up, you're going to arrive at an answer that's close to zero. The positive and negative samples will, in general, cancel each other out.

However, something interesting happens if you sample the wave at its own frequency, that is, once per cycle. Unless you happen to start exactly where the wave crosses zero, you will end up with a sum that moves steadily away from zero. If you happen to start sampling when the wave's above zero, the wave will be in that exact same spot the next time you sample it, causing your sum to grow bigger and bigger over time. Similarly, if you happen to start sampling when the wave's below zero, your sum will move further and further negative with every sample. This is resonance, expressed mathematically.
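The two cases can be tried out with a small sketch. The helper name sumOfSamples and its parameters are illustrative assumptions, not code from the sample program.

```cpp
#include <cmath>

const double kPi = 3.14159265358979323846;

// Illustrative helper: add up numSamples samples of a sine wave of frequency
// waveHz, taken sampleHz times per second, starting at the given phase (radians).
double sumOfSamples(double waveHz, double sampleHz, int numSamples, double phase) {
    double sum = 0.0;
    for (int k = 0; k < numSamples; k++) {
        double t = k / sampleHz;  // time of sample k, in seconds
        sum += std::sin(2.0 * kPi * waveHz * t + phase);
    }
    return sum;
}
```

Under these assumptions, sumOfSamples(10.0, 1000.0, 1000, 0.3) comes out close to zero because ten whole cycles cancel, while sumOfSamples(10.0, 10.0, 50, kPi / 2) hits the same peak 50 times and returns about 50.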

This means that if you want to figure out whether a given wave has, contained within it, a wave of a particular frequency, all you need to do is multiply points of one wave with points of the other, and add up the results of all those multiplications. This could be expressed in code like this:


    float sinAmp = 0.0f; // this will contain the final amplitude value
    for (int k = 0; k < numsamples; k++) {
        float x = 2.0f * M_PI * (float)freq * (float)k / (float)numsamples;
        sinAmp += sample[k] * sin(x);
    }

This is fictional code; it doesn't appear in the Ch17p1_Visuals sample program. Assume that numsamples contains the total number of samples in our input wave, that sample is an array of floating point numbers between -1 and 1, and that freq counts how many complete cycles the test wave makes across those numsamples samples. The code loops through the entire array of samples, multiplying each sample by sin(x), then adding that result onto sinAmp. When the loop finishes, sinAmp holds a value proportional to the amplitude of the wave (of frequency freq) present in the source: the bigger the sum, the more of that frequency the source contains.

To calculate x, the input to the sin function, the code multiplies the frequency in question (freq) by 2π (2 times the value of pi, or approximately 6.28). The sin function expects its input in radians, a form of angular measurement in which a full circle runs from 0 to 2π radians instead of from 0 to 360 degrees. Finally, the code multiplies that by k (our loop counter) over the total number of samples (numsamples). This is what "iterates over" the sine wave the code is generating: when k is zero, x is zero. When k reaches numsamples, k divided by numsamples is 1, and x becomes 2π times freq, meaning the sine wave has completed exactly freq cycles. It's this mechanism that gives you the point on the imaginary wave of frequency freq that you then multiply against the corresponding point on the input wave.

Once you've multiplied each point on the source wave by the corresponding point on the imaginary freq wave, and added all those together, you (somewhat magically) arrive at a measure of the amplitude of the freq wave contained within your source wave. In essence, you've simulated mathematically the process of resonance!
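The raw sum grows with the number of samples, so it is often convenient to scale it. The following is a minimal sketch, assuming the input is a zero-phase sine that completes a whole number of cycles in the buffer; in that case multiplying the sum by 2/numsamples recovers the amplitude. The names scaledSineAmp and makeSine are hypothetical helpers, not part of the sample program.

```cpp
#include <cmath>
#include <vector>

const double kPi = 3.14159265358979323846;

// Hypothetical helper: resonate `sample` against a sine wave that completes
// `freq` whole cycles across the buffer, then scale the raw sum by 2/N so the
// result is in the same units as the input amplitude.
double scaledSineAmp(const std::vector<float>& sample, int freq) {
    const int numsamples = static_cast<int>(sample.size());
    if (numsamples == 0) return 0.0;  // nothing to measure
    double sinAmp = 0.0;
    for (int k = 0; k < numsamples; k++) {
        double x = 2.0 * kPi * freq * k / numsamples;
        sinAmp += sample[k] * std::sin(x);
    }
    return 2.0 * sinAmp / numsamples;
}

// Hypothetical test signal: a zero-phase sine completing `cycles` cycles.
std::vector<float> makeSine(int numsamples, int cycles, double amplitude) {
    std::vector<float> out(numsamples);
    for (int k = 0; k < numsamples; k++) {
        double x = 2.0 * kPi * cycles * k / numsamples;
        out[k] = static_cast<float>(amplitude * std::sin(x));
    }
    return out;
}
```

For a buffer built with makeSine(512, 8, 0.5), scaledSineAmp reports roughly 0.5 at freq 8 and roughly zero at freq 5, a frequency the buffer does not contain.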

When Sine Alone Won't Cut It

It's important to realize that when "resonating" your input wave against a sine wave of a given frequency, the final amplitude you get out will vary depending on the phase of the waves. The phase is a measurement of how much the wave is shifted left or right on the graph.

Imagine that you have two source waves, both of which are a sine wave at 440Hz and amplitude 5. Wave 1 has a phase of zero, and wave 2 has a phase of pi/2 radians. If you "resonate" both of these waves, the two numbers you get for your final amplitudes will be different. Wave 1's amplitude will be higher than wave 2's (in this extreme case, wave 2's comes out close to zero). When wave 1 is at its peak of 5 or -5, the resonating wave will be at its peak of 1 or -1, whereas with wave 2, when it's at its peak, the resonating wave will be zero. In other words, the resonating wave "misses out" on the peaks of wave 2, simply because it's shifted (or, in more technical terms, has a different phase). When the input wave is at its maximum, the resonating wave is at zero, and when the resonating wave is at its maximum, the input wave is at zero. This leads to a final amplitude value that is lower than it should be.

To solve this problem, you use a cosine wave. A cosine wave is exactly like the sine wave, only it starts at one, whereas a sine wave starts at zero (see Figure 17.2). In other words, a cosine wave is a sine wave with a different phase.
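The effect of phase on a sine-only measurement can be sketched as follows. This is an illustration under stated assumptions: the helper sineOnlyEstimate is hypothetical, it generates the shifted input wave on the fly, it measures frequency in whole cycles per buffer rather than in Hz, and it scales the sum by 2/numsamples so the result is comparable to the input amplitude.

```cpp
#include <cmath>

const double kPi = 3.14159265358979323846;

// Hypothetical helper: resonate amplitude * sin(x + phase) against sin(x) only.
// `cycles` is the number of whole cycles across numsamples samples.
double sineOnlyEstimate(double amplitude, double phase, int cycles, int numsamples) {
    double sinAmp = 0.0;
    for (int k = 0; k < numsamples; k++) {
        double x = 2.0 * kPi * cycles * k / numsamples;
        sinAmp += amplitude * std::sin(x + phase) * std::sin(x);
    }
    return 2.0 * sinAmp / numsamples;  // scale the raw sum into amplitude units
}
```

With amplitude 5, 8 cycles, and 512 samples, a phase of zero gives about 5, a phase of pi/3 gives about 2.5, and a phase of pi/2 gives about zero: the measured value shrinks by the cosine of the phase shift.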

Figure 17.2: A cosine wave is a sine wave with a different phase.

Tip

In fact, it can be proven mathematically that any wave containing a single frequency at a single amplitude, regardless of phase, can be represented by the sum of a sine wave and a cosine wave. There's not enough room to go into the mathematical proof, but I encourage you to explore it on your own using some of the links I've provided on your CD.
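As a pointer toward that proof, the decomposition follows from the angle-addition identity for sine. The symbols A (amplitude) and φ (phase) are introduced here for illustration only:

```latex
A\sin(x+\varphi) = (A\cos\varphi)\,\sin x + (A\sin\varphi)\,\cos x,
\qquad
\sqrt{(A\cos\varphi)^2 + (A\sin\varphi)^2} = A .
```

The sine part has weight A cos φ and the cosine part has weight A sin φ; whatever the phase, the two weights combine to the original amplitude A.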

If you resonate an input wave against both sine and cosine waves, you're guaranteed to get the same final amplitude number, regardless of the phase of ("amount of shift in") your input wave. This is because the sine and cosine waves cover for each other— when one is at zero, the other is at its maximum, and so you're guaranteed to never "miss" the peaks of your input wave.

Using both sine waves and cosine waves gives us code that looks like the following:


    float sinamp = 0.0f;
    float cosamp = 0.0f;
    for (int k = 0; k < numsamples; k++) {
        float x = 2.0f * M_PI * (float)freq * (float)k / (float)numsamples;
        sinamp -= sample[k] * sin(x);
        cosamp += sample[k] * cos(x);
    }
    float amp = sqrt(sinamp*sinamp + cosamp*cosamp);

The preceding code is similar to what you saw a few sections ago, only instead of using just a sine wave, it uses both a sine and a cosine wave. The code subtracts from the sinamp variable rather than adding to it; this follows the usual DFT sign convention, and because sinamp is squared in the last line, the sign has no effect on the result. Also, there's a new line that calculates amp as the square root of sinamp squared plus cosamp squared.

This is really just a distance calculation in disguise. Suppose you want to find the distance from the origin of a particular point, say, (3,4). To find the distance from the origin to this point, you square both components, then add the squares together and take the square root. 3 squared is 9, 4 squared is 16, 9 plus 16 is 25, so the final distance is the square root of 25, or simply 5. If you plot a 2D graph with sinamp on one axis and cosamp on the other, you can calculate the final amplitude as the distance from the origin to the point (cosamp, sinamp). If this isn't immediately clear to you, there is a ton of background mathematics explaining why this is true. I encourage you to explore the links I've included on your CD, as the road to understanding this is truly a fascinating one, filled with imaginary numbers and several other forms of weirdness.
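To check that the combined measurement really is independent of phase, here is a minimal sketch. The function names amplitudeAt and makeShiftedSine are hypothetical, frequency is measured in whole cycles per buffer, and the 2/numsamples scaling is an added assumption so that the result can be compared directly with the input amplitude.

```cpp
#include <cmath>
#include <vector>

const double kPi = 3.14159265358979323846;

// Hypothetical helper: resonate against both sine and cosine, combine the two
// sums with the distance formula, and scale by 2/N into amplitude units.
double amplitudeAt(const std::vector<float>& sample, int freq) {
    const int numsamples = static_cast<int>(sample.size());
    if (numsamples == 0) return 0.0;  // nothing to measure
    double sinamp = 0.0;
    double cosamp = 0.0;
    for (int k = 0; k < numsamples; k++) {
        double x = 2.0 * kPi * freq * k / numsamples;
        sinamp -= sample[k] * std::sin(x);
        cosamp += sample[k] * std::cos(x);
    }
    return 2.0 * std::sqrt(sinamp * sinamp + cosamp * cosamp) / numsamples;
}

// Hypothetical test signal: amplitude * sin(x + phase), `cycles` cycles long.
std::vector<float> makeShiftedSine(int numsamples, int cycles,
                                   double amplitude, double phase) {
    std::vector<float> out(numsamples);
    for (int k = 0; k < numsamples; k++) {
        double x = 2.0 * kPi * cycles * k / numsamples;
        out[k] = static_cast<float>(amplitude * std::sin(x + phase));
    }
    return out;
}
```

Feeding amplitudeAt a wave of amplitude 5 built with phase 0, phase pi/2, or any other phase should give about 5 each time, which is the property the sine-only version lacks.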

And now, at long last, you have the single most important piece of knowledge needed to implement a spectrum analyzer: you know how to figure out "how much" of a certain frequency is contained in a wave file. Now that you know this, all you need to do is perform the same operation using several different freq values, and you'll arrive at an output array suitable for graphical display.

The Nyquist Number

The only question left is, "which frequencies should I use?" To answer that, you must first come to grips with the Nyquist number. The Nyquist number, in this context, represents the highest frequency that can be properly captured using a certain sampling rate.

For example, consider a source wave with a frequency of 10Hz, sampled at 20Hz. I've kept this number small to illustrate the concept. As you know, waves are usually captured at sample rates of 22,050Hz or 44,100Hz, but for this example, let's throw quality out the window and sample at 20Hz.

Sampling a 10Hz wave using a sample rate of 20Hz doesn't yield a great sample, but it does properly capture the original frequency of 10Hz. Now, imagine that you try to sample a wave of 30Hz at a sampling rate of 20Hz. The set of samples you get is completely bizarre, because the wave is undulating too fast for you to see it; in fact, the samples are indistinguishable from those of a 10Hz wave. This is called aliasing, and it's the audio equivalent of why fast-moving helicopter blades and tires occasionally appear to be moving backwards. Of course, they aren't really moving backwards. Your brain's visual sampling rate is simply too slow to capture the true motion, so it appears as if they're moving backwards.

The Nyquist number is the highest frequency you can capture with a given sampling rate and not get any bad aliasing effects. It is always exactly half the sampling rate. Therefore, at 44,100Hz, the highest frequency you can capture is 22,050Hz.
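The 10Hz/30Hz example can be checked numerically. The following sketch uses hypothetical helper names (nyquist, sampleCosine, apparentFrequency) and assumes pure cosine tones with non-negative frequencies.

```cpp
#include <cmath>

const double kPi = 3.14159265358979323846;

// Highest frequency (in Hz) that a given sampling rate can capture.
double nyquist(double sampleRate) {
    return sampleRate / 2.0;
}

// Value of sample n of a cosine wave of waveHz, sampled sampleHz times a second.
double sampleCosine(double waveHz, double sampleHz, int n) {
    return std::cos(2.0 * kPi * waveHz * n / sampleHz);
}

// Frequency (in Hz) that a pure tone of waveHz appears to have once sampled.
double apparentFrequency(double waveHz, double sampleHz) {
    double f = std::fmod(waveHz, sampleHz);    // samples cannot tell f from f + sampleHz
    if (f > sampleHz / 2.0) f = sampleHz - f;  // fold back below the Nyquist number
    return f;
}
```

At a 20Hz sampling rate, sampleCosine(30.0, 20.0, n) and sampleCosine(10.0, 20.0, n) agree for every n, and apparentFrequency(30.0, 20.0) returns 10: the 30Hz wave has aliased down to 10Hz.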

Discrete Fourier Transforms

Now, at last, you have all the pieces to the puzzle. You know how to figure out if a wave contains a given frequency, and now, thanks to the Nyquist number, you know the highest frequency you need to look for. All that remains is to wrap the code you're already familiar with inside another for loop:



    for (int bin = 0; bin <= numsamples/2; bin++) {
        float cosAmp = 0.0f;
        float sinAmp = 0.0f;
        for (int k = 0; k < numsamples; k++) {
            float x = 2.0f * M_PI * (float)bin * (float)k / (float)numsamples;
            sinAmp += m_Input[k] * sin(x);
            cosAmp += m_Input[k] * cos(x);
        }
        m_AmpOutput[bin] = sqrt(sinAmp*sinAmp + cosAmp*cosAmp);
    }

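The freq variable is now bin, the counter of a for loop running from zero up to the Nyquist frequency (bin numsamples/2), and the final amplitude is now an array of amplitudes, one for each frequency. Bin number bin corresponds to a frequency of bin times the sampling rate divided by numsamples, in Hz, so m_AmpOutput needs numsamples/2 + 1 entries.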
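For experimenting outside the sample program, the loop can be wrapped in a free function. This is a sketch rather than the book's implementation: dftAmplitudes and binToHz are hypothetical names, the sums are accumulated in double for accuracy, and the amplitudes are left unscaled, as in the listing.

```cpp
#include <cmath>
#include <vector>

const double kPi = 3.14159265358979323846;

// Hypothetical free-function version of the DFT loop: returns one raw
// (unscaled) amplitude per bin, for bins 0 through numsamples/2.
std::vector<float> dftAmplitudes(const std::vector<float>& input) {
    const int numsamples = static_cast<int>(input.size());
    std::vector<float> amp(numsamples / 2 + 1, 0.0f);
    for (int bin = 0; bin <= numsamples / 2; bin++) {
        double cosAmp = 0.0;
        double sinAmp = 0.0;
        for (int k = 0; k < numsamples; k++) {
            double x = 2.0 * kPi * bin * k / numsamples;
            sinAmp += input[k] * std::sin(x);
            cosAmp += input[k] * std::cos(x);
        }
        amp[bin] = static_cast<float>(std::sqrt(sinAmp * sinAmp + cosAmp * cosAmp));
    }
    return amp;
}

// Frequency in Hz that a bin represents, given the sampling rate.
double binToHz(int bin, double sampleRate, int numsamples) {
    return bin * sampleRate / numsamples;
}
```

With 512 samples captured at 44,100Hz, binToHz(1, 44100.0, 512) is about 86Hz and binToHz(256, 44100.0, 512) is 22,050Hz, the Nyquist number. A pure sine that completes 8 cycles in a 64-sample buffer shows up as a single spike in bin 8.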

Animating Spectrum Analyzers

The preceding section of code will give you the amplitudes of various frequencies contained in the entire input wave. To implement an animating spectrum analyzer, all you need to do is calculate the frequency amplitudes for a small portion of the input wave, starting at the sample that's currently coming out of the speakers, and ending some number of samples after that point (512 is a good number). You'll learn about this in more detail in the next sections.
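As a rough preview, the windowed analysis might look like the sketch below. The function name analyzeWindow, the playPos parameter, and the choice to treat samples past the end of the wave as silence are all assumptions made for illustration; the sample program may organize this differently.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

const double kPi = 3.14159265358979323846;

// Hypothetical helper: run the DFT loop over windowSize samples of `wave`,
// starting at playPos (the sample currently being played). Samples past the
// end of the wave are treated as silence, which is an assumption.
std::vector<float> analyzeWindow(const std::vector<float>& wave,
                                 std::size_t playPos, std::size_t windowSize) {
    std::vector<float> amp(windowSize / 2 + 1, 0.0f);
    for (std::size_t bin = 0; bin <= windowSize / 2; bin++) {
        double cosAmp = 0.0;
        double sinAmp = 0.0;
        for (std::size_t k = 0; k < windowSize; k++) {
            std::size_t pos = playPos + k;
            double s = (pos < wave.size()) ? wave[pos] : 0.0;
            double x = 2.0 * kPi * static_cast<double>(bin) *
                       static_cast<double>(k) / static_cast<double>(windowSize);
            sinAmp += s * std::sin(x);
            cosAmp += s * std::cos(x);
        }
        amp[bin] = static_cast<float>(std::sqrt(sinAmp * sinAmp + cosAmp * cosAmp));
    }
    return amp;
}
```

Calling analyzeWindow once per animation frame with the current play position and a window of 512 yields 257 amplitudes to draw, one per bin from zero up to the Nyquist number.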
