Now that your compiler is set up, you're finally ready to dive into your first DirectX audio sample program! If you want, follow along by loading the Ch2p1_ToneGenerator sample into your IDE.
AUDIO CLIP |
Play Audio Clip 2.1 to hear the output from this chapter's sample program. |
ToneGenerator is a simple console application that illustrates the basics of initializing and pumping sound through DirectX Audio, as well as the core low-level behavior of sound. It's neither terribly complex nor terribly useful, but I think it does a great job of stripping away all the abstractions and showing you, at a bare-bones level, how sound is created digitally.
ToneGenerator generates a tone of a specific frequency. It does this by calculating the bytes for a sine wave and pumping those bytes into a secondary buffer, which DirectX Audio then mixes with a primary buffer and sends out to the speakers. It's a console app—no GUI code to clutter things up. Also, it doesn't do anything fancy like sawtooth waves or square waves, though if you're into the math behind those, I encourage you to add them. See Figure 2.4 if you're not sure what a sine wave looks like, or what the terms amplitude, period, and frequency refer to.
Figure 2.4: A sine wave.
A few years ago, to a black-hat hacker, this program would have been incredibly useful, but I'm not going to tell you why or how. Also, lest they sic the lawyers on me, I should probably mention that I don't condone wreaking havoc on our nation's phone system, or trying to rip off the phone company even if they are the most incompetently run organization you've ever seen. Please use this humble program for good, not evil.
I'll bet you're all giddy with excitement now. A program with that sort of disclaimer on it has got to be cool, right? Here's how it works!
Initializing DirectSound is the first and most important step. Initialization consists of three main steps: getting the IDirectSound8 interface, setting the cooperative level, and setting the format of the primary buffer.
Tip |
When you call SetCooperativeLevel, one of the required parameters is a handle to your application's main window. Because the tone generator runs as a console window, it uses some Windows black magic to obtain its HWND. Check out the GetConsoleWindowHandle function in the sample program. |
Those first two steps are easy; you can skip over the whole mess of enumerating audio cards and selecting one explicitly by passing NULL as the first parameter to DirectSoundCreate8 (flip to Chapter 4 to learn the gory details of device enumeration). Setting the cooperative level is similarly easy—for most games, you'll want to use DSSCL_PRIORITY, which lets you set the format of the primary buffer. If you're writing a "regular" audio application, like Winamp or ACiD, you might want to use DSSCL_NORMAL instead; it doesn't allow you to change the format of the primary buffer, but it does facilitate faster switching between applications and is generally "nicer" to the system than DSSCL_PRIORITY.
So that leaves the task of setting the format of the primary buffer. This is a little involved because you have to grab the primary buffer's interface, as shown in the following code:
HRESULT hr;
LPDIRECTSOUNDBUFFER pDSBPrimary = NULL;

// Get the primary buffer
DSBUFFERDESC dsbd;
ZeroMemory( &dsbd, sizeof(DSBUFFERDESC) );
dsbd.dwSize = sizeof(DSBUFFERDESC);
dsbd.dwFlags = DSBCAPS_PRIMARYBUFFER;

// "create it" (or get it if it already exists, which it does)
if(FAILED(hr = g_pDS->CreateSoundBuffer(&dsbd, &pDSBPrimary, NULL))) {
  cerr << "CreateSoundBuffer failed: " << DXGetErrorString8(hr) << endl;
  return(false);
}

// set its format
WAVEFORMATEX wfx;
ZeroMemory( &wfx, sizeof(WAVEFORMATEX) );
wfx.wFormatTag = WAVE_FORMAT_PCM;
wfx.nChannels = (WORD)2;
wfx.nSamplesPerSec = 22050;
wfx.wBitsPerSample = (WORD)16;
wfx.nBlockAlign = (WORD)(wfx.wBitsPerSample/8*wfx.nChannels);
wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

if(FAILED(hr = pDSBPrimary->SetFormat(&wfx))) {
  cerr << "pDSBPrimary->SetFormat() failed: " << DXGetErrorString8(hr) << endl;
  return(false);
}

// we're done with it, so release!
SAFE_RELEASE( pDSBPrimary );
Here you can see the code performing three steps. First, it gets the primary buffer. The method it uses to do this is called CreateSoundBuffer. CreateSoundBuffer takes a structure, and depending on the flags in that structure, either creates a secondary buffer and returns its interface, or gives back the primary buffer's interface. In this case we get the primary buffer's interface and store it in pDSBPrimary.
The next step is to fill in the WAVEFORMATEX structure with the audio format properties you want: in this case, 2 channels (stereo), 22,050Hz (half-CD quality!), and 16 bits per sample. That means that for each second of audio, we have 22,050 samples per channel, and each sample is 16 bits (2 bytes), so with two channels we end up burning 88,200 bytes per second of audio. 22,050Hz is an "average quality" rate; for comparison, music on CD is sampled at a rate of 44,100Hz. Consult Table 2.2 for some of the most common sample rates.
| Windows Name | Sample Rate | Bits | Channels |
|---|---|---|---|
| N/A | 8,000Hz | 8 | 1 (Mono) |
| Telephone Quality | 11,025Hz | 8 | 1 (Mono) |
| Radio Quality | 22,050Hz | 8 | 1 (Mono) |
| CD Quality | 44,100Hz | 16 | 2 (Stereo) |
Once wfx is filled in correctly, the code makes a call to the SetFormat method of the primary buffer's interface. This turns the primary buffer into what's specified in wfx.
Finally, notice that the code releases the primary buffer's interface by calling the SAFE_RELEASE macro. Note that this doesn't actually delete the primary buffer; it just frees up the interface we created by the call to CreateSoundBuffer. A DirectX object (and actually, any COM object) is only deleted when all of its interfaces have been freed, and because DirectX Audio keeps an internal interface to the primary buffer, this doesn't happen until the code un-initializes DirectSound itself.
If this code didn't release the interface, however, the primary buffer would theoretically never get deleted, and a leak would develop. In reality, DirectX Audio is smart enough to detect when this happens and recover from it, but regardless, it's a bad thing to do; always remember to release your primary buffer's interface after you're finished with it.
With DirectSound initialized, the next step is to create a secondary buffer. Recall that a secondary buffer is where actual audio data is kept; DirectSound uses the primary buffer to mix all of the secondary buffers together on their way out to the sound card.
The good news is that because the function call is the same, you already know 90 percent of what there is to know about creating a secondary buffer. To create a secondary buffer, call the CreateSoundBuffer method of IDirectSound8, as shown by the following code:
int CreateSecondaryBuffer(LPDIRECTSOUNDBUFFER *ppBuffer, int channels,
  int secs, int samplerate, int bitspersample, DWORD flags)
{
  HRESULT hr;
  stringstream err;
  DSBUFFERDESC dsbd;
  WAVEFORMATEX wfx;

  ZeroMemory( &wfx, sizeof(WAVEFORMATEX) );
  wfx.wFormatTag = WAVE_FORMAT_PCM;
  wfx.nChannels = (WORD)channels;
  wfx.nSamplesPerSec = samplerate;
  wfx.wBitsPerSample = (WORD)bitspersample;
  wfx.nBlockAlign = (WORD)(wfx.wBitsPerSample / 8 * wfx.nChannels);
  wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

  ZeroMemory( &dsbd, sizeof(DSBUFFERDESC) );
  dsbd.dwSize = sizeof(DSBUFFERDESC);
  dsbd.dwFlags = flags;
  dsbd.dwBufferBytes = samplerate * bitspersample / 8 * channels * secs;
  dsbd.guid3DAlgorithm = GUID_NULL;
  dsbd.lpwfxFormat = &wfx;

  if(FAILED(hr = g_pDS->CreateSoundBuffer(&dsbd, ppBuffer, NULL))) {
    err << "CreateSecondaryBuffer: CreateSoundBuffer failed: "
        << DXGetErrorString8(hr) << endl;
    throw(err.str());
  }

  return(dsbd.dwBufferBytes); // return size of buffer
}
Most of this code is concerned with filling in a DSBUFFERDESC structure, which specifically tells DirectSound what you want it to create. The majority of the members of the DSBUFFERDESC structure are self-explanatory, but if something's confusing you, type DSBUFFERDESC into the DirectX SDK Help index and you'll be taken to a page that explains each member in detail.
The code is creating a buffer using the WAVE_FORMAT_PCM flag. This flag tells DirectSound that the format of the data in the buffer is going to be in pulse code modulation (PCM). When you think of a "normal" sound effect, with no compression or funny codecs, you're thinking of PCM.
Creating a PCM buffer involves a little bit of math. For instance, look at the block alignment, nBlockAlign, of the wave format structure. That probably looks a little strange, but it is correct, because the documentation says that for a PCM formatted buffer, nBlockAlign must be equal to the product of nChannels and wBitsPerSample, divided by 8. This is where you shrug and say, "Well, that was easy." There's a bit of code in DirectSound, or in the sound card driver, that wants this calculation, so your best bet is to just give it what it wants and don't worry about why it wants it.
Tip |
Notice the error handling built into the buffer-creation code. You should always use the FAILED macro to check the return value of DirectSound interface methods. If something goes wrong, this macro will return true, allowing you to take the appropriate action. Generally, you'll want to use the DXGetErrorString8 function to turn the error code you got back into a descriptive error message. |
Moving on, check out the calculation for the dwBufferBytes member of the buffer description structure. Normally, if you're dealing with a WAV file loaded from disk, you know how many bytes you need, but this sample program generates the sound data itself, so it has to figure it out. The variables samplerate, bitspersample, channels, and secs are all arguments to the CreateSecondaryBuffer function. The function multiplies them together to determine the total number of bytes the sample needs, then sticks that number in dwBufferBytes. For each second of audio there are samplerate samples, so that's secs × samplerate samples total. Each sample spans channels channels, and each channel takes bitspersample bits, so that's channels × bitspersample / 8 bytes for each sample.
Put the whole thing together and you end up with a long string of multiplications that explains why MP3 compression is so useful; without it, a stereo (two-channel), CD-quality (16-bit samples, 44,100Hz sample rate), three-minute song (180 seconds) takes 44,100 × 16 / 8 × 2 × 180, or 31,752,000 bytes of storage. That's over 31 megabytes.
Returning to the code, once the DSBUFFERDESC structure is filled in, the code calls the CreateSoundBuffer method of IDirectSound8, which creates the buffer and puts an interface for it into ppBuffer. Make note of the double indirection going on here; ppBuffer is a pointer to a pointer to an IDirectSoundBuffer (remember, LPDIRECTSOUNDBUFFER is really an IDirectSoundBuffer *, so an LPDIRECTSOUNDBUFFER * is really an IDirectSoundBuffer **; the LP comes from Hungarian notation and stands for "long pointer," so LPDIRECTSOUNDBUFFER is just a long pointer to a DirectSound buffer).
Once you have created your secondary buffer, you need to fill it with data. To fill it, you must first lock the buffer (by calling the Lock method of IDirectSoundBuffer). Lock tells DirectSound that you're about to modify the buffer's contents. When you call Lock, you get back a pointer to the buffer itself. If your bitspersample is eight, you should cast the pointer DirectSound gives you to an unsigned char *. You can then go wild with filling in the bytes. Once you're done, you must unlock the buffer (via a call to Unlock).
Here's a peek at that process in code:
void FillBuffer(LPDIRECTSOUNDBUFFER pBuffer, float frequency,
  int buffersize, int samplerate)
{
  HRESULT hr;
  stringstream err;
  unsigned char *pBufferBytes;
  DWORD lockedsize;

  // Lock the buffer down
  if(FAILED(hr = pBuffer->Lock(0, buffersize, (void **)(&pBufferBytes),
    &lockedsize, NULL, NULL, 0L)))
  {
    err << "FillBuffer: Lock failed: " << DXGetErrorString8(hr) << endl;
    throw(err.str());
  }

  for (DWORD q=0; q < lockedsize; q++) {
    // Determine cycle we're in
    float pos = frequency/(float)samplerate*(float)q;

    // take remainder and convert to radians
    float r = (pos - floor(pos)) * 2 * PI;
    float value = sin(r);

    // we now have a value somewhere between -1 and 1... multiply
    // by 127, then add 127, to get unsigned value.
    // change multiplier to change amplitude of wave, aka volume
    pBufferBytes[q] = (unsigned char)(127 + (value * 127));
  }

  // Unlock the buffer
  pBuffer->Unlock( pBufferBytes, lockedsize, NULL, 0 );
}
This code begins by locking the buffer, via a call to the Lock method of IDirectSoundBuffer. It passes in the first byte it wants to modify (zero), and the total number of bytes it wants to modify (which is buffersize because it's manipulating the entire buffer).
It may surprise you to learn that when you Lock something, you get back not one, but two pointers. This is to help you with streaming. Check out Figure 2.5.
Figure 2.5: When you lock something, you might get two pointers back.
DirectSound allows you to specify a starting byte and a length that would normally exceed the length of your buffer. For example, it's perfectly legal to allocate a 100-byte buffer, and then request 50 bytes starting at byte 75. You'd think that this would be a major no-no because at byte 75 there are only 25 bytes left to give, but instead, DirectSound "wraps" the buffer. If you request 50 bytes starting at byte 75 of a 100-byte buffer, you get two pointers back from DirectSound. The first pointer starts at byte 75 and goes to byte 99 (25 bytes). The second pointer starts at byte 0 and goes to byte 24, covering the remaining 25 bytes.
Pretty cool, isn't it? By automatically wrapping the buffer for you, DirectSound frees you from having to make two lock/fill/unlock cycles.
If you know that what you're locking isn't going to wrap around, you can safely pass in NULL for the second pointer, and just use the first pointer DirectSound gives you. This is what the tone generator sample program does. However, when you're streaming data, you'll want to give the Lock method two buffers and two sizes, and pay attention to what it gives back to you, so that you can handle the wrapping situation elegantly.
Tip |
The Lock method accepts a couple of useful flags. DSBLOCK_ENTIREBUFFER tells it to lock the entire buffer. I didn't use it in the example program because I wanted to explicitly show you how to lock bytes, but in a normal app, you'll probably want to use it instead of specifying zero for the first byte and the size of the buffer for the length. There's also a DSBLOCK_FROMWRITECURSOR flag that tells DirectSound that you want to write starting at the place you last stopped writing. This is sort of like writing to a file, and is very useful when streaming. |
Once the buffer's locked, the real fun begins. To get a sound to play, this sample program fills up the buffer with data generated via the sine function. If it's been a while since you've done any trigonometry, the sine function returns a value between –1 and 1, in a wave pattern.
The tricky part here is getting the frequency of the wave correct. There's a small algorithm to do this. This algorithm first determines the wave cycle it's in—that is, how many full waves have come before it. When pos is a whole number, it's the beginning of a new wave iteration. Take a peek at Figure 2.6 to help visualize this.
Figure 2.6: Use the fractional part of pos to determine where in the wave you are.
In the code, the expression pos – floor(pos) gives you the "remainder" of pos, which is how far into the current wave you are (and will always be something between zero and one). For example, if pos is 2.54 (meaning you've already completed two waves and are just over halfway done with the third), then floor(pos) is 2, and r is 0.54.
Note that regardless of how long or short the wave actually is, you always wind up with a remainder between zero and one. It has nothing to do with frequency, because you've already taken frequency into account when you determined pos, the cycle. That's the beauty of the algorithm.
From here, it's simple—plug the r value into the sine function to get back the wave height at that point, somewhere between –1 and 1. The next step is to multiply this wave height by 127, making the wave as tall as it can be. Recall that the height of a wave is its amplitude—its volume—so by multiplying by 127, you're giving this tone the maximum volume. If you wanted it quieter, you'd multiply by a smaller number, say 64 for roughly half the amplitude.
Now you have a wave height that's somewhere between –127 and 127 (or –64 and 64, if you halved the volume). The final step is to offset it so that the wave fits into our unsigned char's range of 0 to 255. To do this, you simply add 127.
Whew! So the code does that for every byte of the secondary buffer. Once it has made its way through the for loop, it unlocks the buffer by calling the Unlock method, supplying the pointers and sizes it received from Lock. No flags available here—Unlock simply unlocks what you've previously locked.
Tip |
The ToneGenerator sample program doesn't support task switching, but if you want your game to, remember that you should restore your buffers when your user task switches back into your game. This is accomplished via the Restore method of the IDirectSoundBuffer interface. For an example of how to restore "nicely," consult the DirectX SDK, DirectSound sample programs. |
After plowing through all the complexities of creating and filling the buffer, you'll be happy to learn that it's very easy to get a secondary buffer to start mixing with the primary buffer and playing out the speakers. All you have to do is call the Play method of the buffer:
void PlayBuffer(LPDIRECTSOUNDBUFFER pBuffer)
{
  pBuffer->Play(0, 0, DSBPLAY_LOOPING);
}
The first parameter to Play is reserved for use in future versions of DirectSound, and must be set to zero. The second parameter is a priority value, but requires a special type of buffer (created with the DSBCAPS_LOCDEFER flag). Because you didn't create the secondary buffer with this flag, you don't get to specify a priority (which is okay)—just enter zero. The third parameter is the omnipresent flags parameter. Again, if you had a DSBCAPS_LOCDEFER buffer, there'd be all sorts of nifty flags to choose from (consult the DirectX SDK help for the full list), but because you're just using a simple buffer, the only flag available is DSBPLAY_LOOPING, which tells DirectSound to loop the buffer infinitely.
To stop a sound from playing, use the Stop method. Stop takes no arguments; all it does is silence a buffer that's currently playing.
Once the user has heard the tone and pressed a key to stop it, the code begins cleaning up the things it had previously allocated:
// release secondary buffer
cout << "Releasing sound buffer..." << endl;
SAFE_RELEASE(soundbuf);

// un-init DirectSound
cout << "Un-Initializing Audio..." << endl;
SAFE_RELEASE(g_pDS);
There are only two things the sample program has allocated: the secondary buffer and the DirectSound interface itself (which also contains the primary buffer).
In case you've never seen it before, here's what SAFE_RELEASE looks like:
#define SAFE_RELEASE(p) { if(p) { (p)->Release(); (p)=NULL; } }
SAFE_RELEASE looks at the interface it's given, and if it's not NULL, it releases it (by calling its Release method), and then sets it to NULL (so that if anyone tries to use it again, your program will access violate and you'll be able to see what went wrong). This is standard practice for DirectX, and it behooves you to use SAFE_RELEASE instead of just calling the Release method directly.
Also, make sure you release things in the right order. You should release all secondary buffers before releasing the main DirectSound interface. If you don't, DirectSound will warn you (through debug output) that you forgot to release the secondary buffers because it assumes that any buffer not released by the time it's shutting down has been neglected and forgotten.