MPEG LAYER 3 (MP3) PLAYBACK
Because you're reading this book, I'm going to assume that you know the kitschy basics of MP3s: that MP3 stands for MPEG Layer 3 (formally, MPEG-1 Audio Layer III), and that MPEG stands for Moving Picture Experts Group (the same group of people who brought you the video algorithm used in DVDs!). That information is interesting, but what really matters is how MP3s work and how you can write code to play them.
How MP3s Work
The magic of the MP3 is that it sounds like a WAV file but is much smaller than one. A typical WAV file of a song will suck up 20-40MB of space, depending on your recording options and the length of the song. MP3s are usually one-tenth of that size, and they sound almost the same. Magic? Nope, just psycho-acoustic compression.
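To put rough numbers on that one-tenth claim, here's a back-of-the-envelope sketch. The function names are mine, and the figures I plug in below (CD-quality sound at 44.1kHz, 16 bits, stereo, and a constant 128 kbps MP3) are assumptions chosen for illustration, not measurements of any particular file.

```cpp
#include <cstdint>

// Bytes needed to store uncompressed PCM audio of the given length.
std::uint64_t PcmBytes(std::uint32_t sampleRate, std::uint32_t channels,
                       std::uint32_t bitsPerSample, std::uint32_t seconds)
{
  return static_cast<std::uint64_t>(sampleRate) * channels *
         (bitsPerSample / 8) * seconds;
}

// Approximate bytes for a constant-bit-rate MP3 (ignores any audio tag).
std::uint64_t Mp3Bytes(std::uint32_t bitsPerSecond, std::uint32_t seconds)
{
  return static_cast<std::uint64_t>(bitsPerSecond) / 8 * seconds;
}
```

For a four-minute song (240 seconds), PcmBytes(44100, 2, 16, 240) comes to 42,336,000 bytes, while Mp3Bytes(128000, 240) comes to 3,840,000 bytes, which is roughly an eleven-to-one difference.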
AUDIO CLIP | Play Audio Clips 7.1, 7.2, and 7.3 to hear how the MP3 format compresses sound. |
The reason an MP3 can be one-tenth the size of a WAV file and still sound as good lies in the human brain. Specifically, how stupid the human brain is. A song is nothing but a collection of sound waves of various frequencies and amplitudes, and the human brain can't hear all of those different sound waves at once. Some of them are too high or too low, and some are lost behind louder ones. The magic MP3 compression philosophy is "if you can't hear it, why bother encoding it?"
A loud cymbal crash will drown out the acoustic guitar in the background, so there's no need to spend bytes making sure the acoustic guitar sound wave is re-created perfectly when the cymbal crashes. Additionally, there's no need to spend thousands of bytes recording the shape of a sound wave so high-pitched that it's beyond the audible range of human ears. Your dog might prefer original CD tracks over MP3s, but your ears can't detect the missing high-frequency sound waves, so MP3s sound great to you.
The MP3 File Format
Surprisingly, an MP3 file has no header. It's not like a RIFF file with an embedded hierarchy and several header bytes at the beginning. Instead, an MP3 file consists of many frames, and each frame is a self-contained snippet of the song. Each frame has its own little header, which tells you its bit rate, its MPEG version and layer (Layer III is by far the most popular, but there are also Layers I and II), and other information.
Figure 7.1 shows a detailed layout of a frame header. As you can see, the entire header is 32 bits long, and the information is packed into those 32 bits without regard for byte boundaries. This makes things a little irritating because you have to use bit masks to extract some pieces of information (for example, the bit rate is embedded in the upper four bits of the third byte). The MPEG folks did this to save space: they wanted the MP3s to be as small as possible, and that meant making sure every bit of every byte contained useful information.

Figure 7.1: What's contained in the four bytes of a frame header.
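If you'd like to see that bit masking in action, here's a minimal sketch. It only understands MPEG-1 Layer III headers, and the struct, the function name, and the two lookup tables are my own illustration built from the published MPEG audio header layout; they aren't part of this chapter's sample code. The frame length formula (144 times the bit rate, divided by the sample rate, plus one byte if the frame is padded) applies to MPEG-1 Layer III only.

```cpp
#include <cstdint>

// Hypothetical holder for the handful of fields this sketch extracts.
struct Mp3FrameInfo {
  int bitRateKbps;
  int sampleRateHz;
  bool padded;
  int frameBytes;
};

// Returns true and fills info if the four bytes look like an
// MPEG-1 Layer III frame header; returns false otherwise.
bool ParseMpeg1Layer3Header(const unsigned char bytes[4], Mp3FrameInfo &info)
{
  // Index 0 ("free format") and index 15 (invalid) are rejected below.
  static const int kBitRates[16] = { 0, 32, 40, 48, 56, 64, 80, 96,
                                     112, 128, 160, 192, 224, 256, 320, 0 };
  static const int kSampleRates[4] = { 44100, 48000, 32000, 0 };

  // The header is stored big-endian: the first byte holds the top bits.
  std::uint32_t h = (std::uint32_t(bytes[0]) << 24) |
                    (std::uint32_t(bytes[1]) << 16) |
                    (std::uint32_t(bytes[2]) << 8) |
                     std::uint32_t(bytes[3]);

  if ((h >> 21) != 0x7FF) return false;        // 11 sync bits, all set
  if (((h >> 19) & 0x3) != 0x3) return false;  // version bits: 11 = MPEG-1
  if (((h >> 17) & 0x3) != 0x1) return false;  // layer bits: 01 = Layer III

  int bitRate = kBitRates[(h >> 12) & 0xF];    // upper four bits of byte three
  int sampleRate = kSampleRates[(h >> 10) & 0x3];
  if (bitRate == 0 || sampleRate == 0) return false;

  info.bitRateKbps = bitRate;
  info.sampleRateHz = sampleRate;
  info.padded = ((h >> 9) & 0x1) != 0;
  info.frameBytes = 144 * bitRate * 1000 / sampleRate + (info.padded ? 1 : 0);
  return true;
}
```

Feed it the bytes FF FB 90 00, a common way for an MP3 to begin, and it reports 128 kbps at 44,100Hz with no padding, for a 417-byte frame.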
I'm not going to go into the details of what each individual piece of information means, because for what you're going to learn in this chapter, you really don't need those low-level details (however, if you're curious, I've provided some links on your CD that explain things in detail). What you should be aware of is that an MP3 is divided into frames, and each frame can have its own bit rate.
At the very end of the MP3 is what's called an audio tag (commonly known as an ID3v1 tag). This audio tag contains the title, artist, and album strings for the MP3, along with a year, a comment, and a genre, as shown in Figure 7.2. It's all ASCII data, except for the genre field, which is a one-byte code that stands for a particular genre, as shown in Table 7.1.

Figure 7.2: The format of the 128-byte audio tag.
# | Genre |
---|---|
0 | Blues |
1 | Classic Rock |
2 | Country |
3 | Dance |
4 | Disco |
5 | Funk |
6 | Grunge |
7 | Hip-Hop |
8 | Jazz |
9 | Metal |
10 | New Age |
11 | Oldies |
12 | Other |
13 | Pop |
14 | R&B |
15 | Rap |
16 | Reggae |
17 | Rock |
18 | Techno |
19 | Industrial |
20 | Alternative |
21 | Ska |
22 | Death Metal |
23 | Pranks |
24 | Soundtrack |
25 | Euro-Techno |
26 | Ambient |
27 | Trip-Hop |
28 | Vocal |
29 | Jazz & Funk |
30 | Fusion |
31 | Trance |
32 | Classical |
33 | Instrumental |
34 | Acid |
35 | House |
36 | Game |
37 | Sound Clip |
38 | Gospel |
39 | Noise |
40 | Alternative Rock |
41 | Bass |
42 | Soul |
43 | Punk |
44 | Space |
45 | Meditative |
46 | Instrumental Pop |
47 | Instrumental Rock |
48 | Ethnic |
49 | Gothic |
50 | Darkwave |
51 | Techno-Industrial |
52 | Electronic |
53 | Pop-Folk |
54 | Eurodance |
55 | Dream |
56 | Southern Rock |
57 | Comedy |
58 | Cult |
59 | Gangsta |
60 | Top 40 |
61 | Christian Rap |
62 | Pop/Funk |
63 | Jungle |
64 | Native American |
65 | Cabaret |
66 | New Wave |
67 | Psychedelic |
68 | Rave |
69 | Showtunes |
70 | Trailer |
71 | Lo-Fi |
72 | Tribal |
73 | Acid Punk |
74 | Acid Jazz |
75 | Polka |
76 | Retro |
77 | Musical |
78 | Rock & Roll |
79 | Hard Rock |
80 | Folk |
81 | Folk-Rock |
82 | National Folk |
83 | Swing |
84 | Fast Fusion |
85 | Bebop |
86 | Latin |
87 | Revival |
88 | Celtic |
89 | Bluegrass |
90 | Avant-garde |
91 | Gothic Rock |
92 | Progressive Rock |
93 | Psychedelic Rock |
94 | Symphonic Rock |
95 | Slow Rock |
96 | Big Band |
97 | Chorus |
98 | Easy Listening |
99 | Acoustic |
100 | Humour |
101 | Speech |
102 | Chanson |
103 | Opera |
104 | Chamber Music |
105 | Sonata |
106 | Symphony |
107 | Booty Bass |
108 | Primus |
109 | Porn Groove |
110 | Satire |
111 | Slow Jam |
112 | Club |
113 | Tango |
114 | Samba |
115 | Folklore |
116 | Ballad |
117 | Power Ballad |
118 | Rhythmic Soul |
119 | Freestyle |
120 | Duet |
121 | Punk Rock |
122 | Drum Solo |
123 | A Cappella |
124 | Euro-House |
125 | Dance Hall |
To read this audio tag, seek to 128 bytes from the end of the MP3 you're interested in, and look for the three ASCII letters, TAG. If you don't find them, the MP3 doesn't have an audio tag. If you do find them, read the last 128 bytes and parse it using the information in Figure 7.2.
Here's how that looks in code:
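// needs <fcntl.h>, <io.h>, and <string.h>
bool CMP3AudioTag::Read(std::string filename)
{
  // each buffer has one extra byte so the strings stay null-terminated
  char header[4] = { 0 };
  char title[31] = { 0 };
  char artist[31] = { 0 };
  char album[31] = { 0 };
  char year[5] = { 0 };
  char comment[31] = { 0 };
  char genre = 0;
  // O_BINARY stops the C runtime from translating bytes as it reads
  int handle = open(filename.c_str(), O_RDONLY | O_BINARY);
  if (handle == -1) return(false); // couldn't open the file
  bool result = false;
  // the tag, if there is one, is the last 128 bytes of the file
  if (lseek(handle, -128L, SEEK_END) != -1 &&
      read(handle, header, 3) == 3 &&
      !strncmp(header, "TAG", 3)) {
    // valid header... this MP3 has an audio tag!
    read(handle, title, 30); m_Title = title;
    read(handle, artist, 30); m_Artist = artist;
    read(handle, album, 30); m_Album = album;
    read(handle, year, 4); m_Year = year;
    read(handle, comment, 30); m_Comment = comment;
    read(handle, &genre, 1);
    // genre codes run from 0 to 255, so go through unsigned char
    m_Genre = (MP3_GENRE)(unsigned char)genre;
    m_GenreStr = GenreToString(m_Genre);
    result = true;
  }
  close(handle);
  return(result);
}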
This code is from a new class, CMP3AudioTag. If you've done file I/O in C, this is easy. The only slightly unusual thing is the call to lseek, which puts you 128 bytes back from the end of the file. From there, you just read the data in the order described, and you're all set!
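If you'd rather not lean on the low-level open, lseek, and read calls, the same idea can be sketched with standard C++ streams. This is only an illustration: SimpleAudioTag, FieldToString, and ReadAudioTag are names I made up, not part of the audio engine. The field widths (3, 30, 30, 30, 4, 30, and 1 bytes) are the ones used in the listing above. This version also trims the null or space padding that usually fills out each fixed-width field.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical stand-in for the members of CMP3AudioTag.
struct SimpleAudioTag {
  std::string title, artist, album, year, comment;
  unsigned char genre = 255;
};

// Copies a fixed-width field, stopping at the first null byte
// and trimming any trailing spaces used as padding.
static std::string FieldToString(const char *field, std::size_t width)
{
  std::string s(field, width);
  std::size_t nul = s.find('\0');
  if (nul != std::string::npos) s.erase(nul);
  while (!s.empty() && s.back() == ' ') s.pop_back();
  return s;
}

// Returns true and fills tag if the file ends with a 128-byte audio tag.
bool ReadAudioTag(const std::string &filename, SimpleAudioTag &tag)
{
  std::ifstream file(filename.c_str(), std::ios::binary);
  if (!file) return false;

  // Jump to 128 bytes before the end and grab the whole tag at once.
  // If the file is shorter than that, the seek fails and so does the read.
  char raw[128];
  file.seekg(-128, std::ios::end);
  if (!file.read(raw, sizeof(raw))) return false;

  if (std::memcmp(raw, "TAG", 3) != 0) return false;  // no audio tag

  tag.title   = FieldToString(raw + 3, 30);
  tag.artist  = FieldToString(raw + 33, 30);
  tag.album   = FieldToString(raw + 63, 30);
  tag.year    = FieldToString(raw + 93, 4);
  tag.comment = FieldToString(raw + 97, 30);
  tag.genre   = static_cast<unsigned char>(raw[127]);
  return true;
}
```

Reading all 128 bytes in one go and slicing the buffer afterward means there's only one read to check for failure, instead of one per field.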
Tip | My fingers are tired. I made a giant enum, MP3_GENRE, which contains the contents of Table 7.1. I also made a method, CMP3AudioTag::GenreToString, which converts an MP3_GENRE into an STL string. |
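The conversion from a genre code to a string can be as simple as an array lookup. Here's a minimal sketch, assuming a plain array indexed by the one-byte code; the array, the function name, and the "Unknown" fallback are my own inventions for illustration, and only the first eight entries of Table 7.1 are filled in.

```cpp
#include <cstddef>
#include <string>

// Only the first few rows of Table 7.1 appear here; a real table would
// continue in the same order so that the index matches the genre code.
static const char *const kGenreNames[] = {
  "Blues", "Classic Rock", "Country", "Dance",
  "Disco", "Funk", "Grunge", "Hip-Hop"
};

// Maps a one-byte genre code to its name, or "Unknown" if it's off the table.
std::string GenreCodeToString(unsigned char code)
{
  const std::size_t count = sizeof(kGenreNames) / sizeof(kGenreNames[0]);
  return (code < count) ? std::string(kGenreNames[code])
                        : std::string("Unknown");
}
```

The bounds check matters because the genre byte can hold any value from 0 to 255, and plenty of taggers write 255 to mean "no genre."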
MP3s in DirectMusic—When a WAV Is Not a WAV
Chapter 3 (Ch3p1_WAVPlayback), and you'll encounter this error.
Even with this DirectMusic limitation, there's a way you can get Ch3p1_WAVPlayback to play your favorite MP3, and it doesn't even involve changing code or switching formats! Here's the trick:
Take your favorite MP3, and convert it to a WAV file. You should end up with a file that's around 30 or 40 megabytes. Now, this would be a pretty lame trick if that's all there was to it. Of course you can get WAV files to play, but you want MP3s! Don't give up yet; continue reading.
Take that WAV file and load it into Sound Recorder (you know, the lame recording program that comes with Windows as an accessory, shown in Figure 7.3).

Figure 7.3: The Windows Sound Recorder in all its glory.
Select Properties from the File menu, then click the Convert Now button. A dialog box will appear (as shown in Figure 7.4).

Figure 7.4: The Convert Now dialog box.
From the Format drop-down, select MPEG Layer 3.
Click OK twice to dismiss the little dialog as well as the property page. Nothing will appear to happen.
From the File menu, select Save. Now if you go to Explorer and look at the file size of the WAV file, you'll see it's now on the order of one or two megabytes, about the size of an MP3.
Sure enough, now Ch3p1 will load and play this WAV file.
What's going on? First of all, remember that a WAV file is a RIFF file, and a RIFF can store many different types of data. Up until now, you've used WAV files to store PCM formatted data (a.k.a. uncompressed sound). But WAV files can actually store sound in a variety of compressed formats, including MPEG Layer 3.
If you open your converted WAV file in a binary viewer, you'll notice that it still has the RIFF header, except the PCM data has been replaced by MP3 data. This is the secret: the DirectMusic loader can load an MP3 if it's got a RIFF header on it, like a WAV file. The RIFF header says, "Hey, this WAV is compressed using MP3 technology," and the Loader then uses the MP3 codec to decompress the pseudo-MP3 and load it into a segment.
Remember that normal MP3 files have no header data. They just contain a whole bunch of frames, and maybe an audio tag at the end. The DirectMusic Loader doesn't know what to make of this, so it assumes it's an unsupported file format and issues the DMUS_E_UNSUPPORTED_STREAM error. But if you take that same set of MP3 frames and wrap it inside a WAV RIFF header, the Loader can make sense of it, and load it without problems.
Tip | The converted WAV file should actually be smaller than the MP3 you started with. This is because the converter can only convert into MP3s with a 56kbps bit rate, whereas your original was probably somewhere in the 128–192 kbps range. The 56kbps rate severely degrades the sound, so you don't want to use it in an actual game (even though it does make for a cool demo). |
So, if you're in a pinch, and you absolutely must load things directly through the Loader, but you still want the small file sizes of MP3 files, this is one way to get the best of both worlds.
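If you want to check for yourself which kind of data a WAV file is carrying, the answer lives in the first two bytes of its "fmt " chunk, a little-endian format tag. Here's a minimal sketch that walks the RIFF chunks of a file already loaded into memory and returns that tag. The function and constant names are mine; the values 0x0001 and 0x0055 are the ones the Windows headers call WAVE_FORMAT_PCM and WAVE_FORMAT_MPEGLAYER3.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

const std::uint16_t kFormatPcm = 0x0001;         // WAVE_FORMAT_PCM
const std::uint16_t kFormatMpegLayer3 = 0x0055;  // WAVE_FORMAT_MPEGLAYER3

// Reads a 32-bit little-endian value, the byte order RIFF files use.
static std::uint32_t ReadLE32(const unsigned char *p)
{
  return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
         (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Returns the format tag from the "fmt " chunk, or 0 if the buffer
// doesn't look like a WAV file or has no usable "fmt " chunk.
std::uint16_t WavFormatTag(const std::vector<unsigned char> &data)
{
  if (data.size() < 12 ||
      std::memcmp(&data[0], "RIFF", 4) != 0 ||
      std::memcmp(&data[8], "WAVE", 4) != 0) {
    return 0;
  }
  std::size_t pos = 12;  // first chunk starts right after "WAVE"
  while (pos + 8 <= data.size()) {
    std::uint32_t size = ReadLE32(&data[pos + 4]);
    if (std::memcmp(&data[pos], "fmt ", 4) == 0) {
      if (size < 2 || pos + 10 > data.size()) return 0;
      return std::uint16_t(data[pos + 8] | (data[pos + 9] << 8));
    }
    // Not the chunk we want; make sure it fits, then skip it.
    if (size > data.size() - (pos + 8)) break;
    pos += 8 + std::size_t(size) + (size & 1);  // chunks are padded to even sizes
  }
  return 0;
}
```

Run this against a WAV before and after the Sound Recorder conversion, and you'd expect the tag to change from kFormatPcm to kFormatMpegLayer3.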
Playing MP3s Using DirectShow
To play a regular MP3 file, you'll need to enlist the help of another DirectX component: DirectShow.
DirectShow's specialty is in dealing with complex compressed file formats, like MP3. You can also use it to play WMA files, as well as AIFF, AU, SND, and even WAVs and MIDIs. Happily, the process I'm about to describe to you will work without one change for all of those types of files. This makes it well worth the effort to learn a bit about how DirectShow works.
DirectShow Interface Basics
DirectShow works by stringing together filters. Essentially, a filter is something that does one thing to a multimedia stream. Think of filters as the software equivalent of the rack hardware you so desperately want to buy at your favorite music shop. For example, a record player's job is to feel out the grooves on some vinyl and send electronic pulses out the audio L/R output jacks. If you're like me, you'll probably want to connect the output of the record player to the input of an equalizer, whose sole job is to adjust the amplitudes of sound frequencies. Then, you'll probably want to connect the output of the equalizer to the input of the amplifier, whose job is to send those electronic pulses out to your speakers.
It works the same way in DirectShow. For example, to play an MP3 file, you might start with an MP3 codec, which will unpack the MP3 and send the audio bytes out. You might connect the output of the codec to the input of the Default DirectSound device, which is analogous to the amplifier: its job is to send the audio bytes out to the speakers. Of course, you could get fancier than that, but you don't have to. What you've created is a simple filter graph. In DirectShow terminology, a filter graph is a set of filters all wired together.
In fact, you don't even have to deal with connecting the filters at all. You can use DirectShow's filter graph manager to automatically create standard filter graphs for different types of source files. You just say, "Hey, Mr. Filter Graph Manager sir, I want to play this MP3," and the graph manager takes care of all the dirty work of hooking up the filters.
So, getting an MP3 to play takes a couple of steps, but it's nothing too complex. Here's how it works.
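To make the "output of one box feeds the input of the next" idea concrete, here's a toy model in plain C++. It has nothing to do with the real DirectShow API (real filters connect through pins and are far more capable); every name in it is made up for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

typedef std::vector<float> Samples;
// A toy filter takes a block of samples and returns a processed block.
typedef std::function<Samples(const Samples &)> ToyFilter;

// Runs the input through each filter in order, feeding the output of
// one filter into the input of the next, like boxes wired up in a rack.
Samples RunToyGraph(const std::vector<ToyFilter> &graph, Samples input)
{
  for (std::size_t i = 0; i < graph.size(); ++i) {
    input = graph[i](input);
  }
  return input;
}

// Builds a filter that scales every sample, a crude stand-in
// for the equalizer or amplifier in the analogy.
ToyFilter MakeGain(float gain)
{
  return [gain](const Samples &in) -> Samples {
    Samples out(in);
    for (std::size_t i = 0; i < out.size(); ++i) out[i] *= gain;
    return out;
  };
}
```

A "graph" here is just a vector of filters in the order the signal visits them, which is about as simple as a filter graph can get. The real thing can branch and merge, but the chaining idea is the same.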
Streaming an MP3
The first thing you must do is create some interfaces. You'll need three: an IGraphBuilder interface, which you'll tell to build the filter graph; an IMediaControl interface (no relation to Windows's Media Control Interface API), which has methods for playing, pausing, and stopping a filter graph; and an IMediaEventEx interface (an extended version of IMediaEvent), so you can listen for events (like the event that says, "Hey, your MP3 is done playing"). Some people like to also create a fourth interface, IMediaSeeking, so they can have virtual fast forward and rewind buttons.
Of those interfaces, the graph builder interface is the only one you have to create. You can query the graph builder for the others.
Here's how the creation code works:
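// create the filter graph manager and ask it for IGraphBuilder
HRESULT hr = CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
  IID_IGraphBuilder, (void **)&pGraphBuilder);
ThrowIfFailed(hr, "CoCreateInstance failed.");
// the graph builder hands out the other interfaces via QueryInterface
ThrowIfFailed(pGraphBuilder->QueryInterface(IID_IMediaControl,
  (void **)&pMediaControl), "QueryInterface (IMediaControl) failed.");
ThrowIfFailed(pGraphBuilder->QueryInterface(IID_IMediaEventEx,
  (void **)&pMediaEventEx), "QueryInterface (IMediaEventEx) failed.");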
Here you can see the code to create a graph builder and query for the other two interfaces. This is pretty standard stuff for you COM programmers. The IIDs are declared in dshow.h.
Once you have your interfaces created, you tell the graph builder to build the filter graph by calling its RenderFile method, like so:
WCHAR widefilename[MAX_PATH];
DXUtil_ConvertGenericStringToWide( widefilename, filename.c_str());
mp3->m_GraphBuilder->RenderFile(widefilename, NULL);
As with most things DirectX, RenderFile wants a wide-character version of the name of the file for which you want it to create the graph. The second parameter is currently reserved, and must be set to NULL.
Once you have the filter graph in place, you can press the virtual play button by using the media control interface:
ThrowIfFailed(pMediaControl->Run(), " Run failed.");
There are also media control interface methods for stopping (Stop) and pausing (Pause). However, Stop doesn't automatically rewind the MP3 to the beginning; for that, you'll need to use the IMediaSeeking interface.
Once the MP3 is done playing, you can Release all of your interfaces. Again, follow good COM programming tactics and release them in the opposite order you acquired them: media seeking (if applicable), media event, media control, and finally the graph builder.
DirectShow Messages
As you play, stop, and otherwise manipulate the filter graph, DirectShow sends you messages, called events. You retrieve these messages using the IMediaEvent interface, like so:
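long eventcode;
LONG_PTR param1, param2; // what these mean depends on the event
// a timeout of 0 means "don't wait" if the queue is empty
while (m_MediaEventEx->GetEvent(
  &eventcode, &param1, &param2, 0) == S_OK) {
  if (eventcode == EC_COMPLETE) {
    // MP3 is done playing
  }
  m_MediaEventEx->FreeEventParams(eventcode, param1, param2);
}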
This code looks very similar to what you did last chapter with DirectMusic notifications. Essentially, you call the GetEvent method, which pulls the first event off the internal queue and returns it in the spaces you've provided. If there are no more messages, GetEvent doesn't return S_OK, and the loop ends. Once you're finished with the event, you must call FreeEventParams to clean up the memory DirectShow allocated.
Nothing to it! Now that you know all that, here's how to add MP3 support to the audio engine.
Adding MP3 Support to the Audio Engine
For starters, you need to link with strmiids.lib, which contains the IIDs of the interfaces you're using. That's easy enough.
Next up: some new classes. I've created a CDirectShowMusic class, which represents any type of audio file you can stream from DirectShow. CDirectShowMusic is derived from CSound, the top-level generic sound object.
I derived a CMP3 class from CDirectShowMusic just in case you wanted to add any methods specific to MP3s (say, manipulating their audio tags). For that matter, I also created a new CMP3AudioTag class, which you got a glimpse of a couple of sections ago.
Right now, CMP3 is essentially an empty derived class. It just contains a CMP3AudioTag. All the interesting code is in CDirectShowMusic. The CDirectShowMusic class implements the pure virtual functions Play, Stop, and IsPlaying (defined in CSound).
The most difficult part of adding MP3 support to the beloved audio engine is handling the events. There's no easy way to ask a filter graph if it's still playing, so the audio engine needs a bulletproof way of emulating this, so that the IsPlaying pure virtual function works reliably.
At first it seems easy: make a Boolean member variable and set it to false in the constructor. Whenever Play is called, set it to true, and whenever Stop is called, set it to false. But this doesn't handle the case in which the MP3 file plays in its entirety, then stops on its own. To capture this requires listening for DirectShow events.
So, I tacked some new code onto the DispatchNotificationMessages method of the audio manager (the method you learned about in the last chapter). DispatchNotificationMessages now calls a static method of CDirectShowMusic, called DispatchDirectShowNotifications.
DispatchDirectShowNotifications loops through all of the CDirectShowMusic objects currently instantiated, and calls the non-static member function ProcessMyDirectShowNotifications on each one. This method works as described in the previous section: it calls GetEvent repeatedly until no more events are in the queue. If it sees an EC_COMPLETE event, it sets the Boolean m_IsPlaying to false. This keeps m_IsPlaying accurate as of the most recent call to DispatchNotificationMessages.
Finally, I added one new method to CAudioManager. The new method, LoadMP3, loads an MP3 into a new CMP3 object and returns it in a CSoundPtr. Pretty run of the mill stuff, but be sure to step through the code if you're unclear on anything.
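If the moving parts of that bookkeeping are hard to picture, here's a stripped-down model of it in plain C++, with no DirectShow involved. Everything in it is a stand-in I made up for illustration: ToyStreamedSound plays the role of CDirectShowMusic, a std::queue stands in for the filter graph's event queue, kToyEventComplete stands in for EC_COMPLETE, and PostEvent exists only so you can fake an event. The real class surely differs in its details.

```cpp
#include <queue>
#include <set>

const long kToyEventComplete = 1;  // stand-in for DirectShow's EC_COMPLETE

class ToyStreamedSound {
public:
  // Every object registers itself so the static dispatcher can find it.
  ToyStreamedSound() : m_IsPlaying(false) { Instances().insert(this); }
  ~ToyStreamedSound() { Instances().erase(this); }

  void Play() { m_IsPlaying = true; }
  void Stop() { m_IsPlaying = false; }
  bool IsPlaying() const { return m_IsPlaying; }

  // Pretend the filter graph just queued an event for this sound.
  void PostEvent(long code) { m_Events.push(code); }

  // Static dispatcher: gives every live object a chance to drain its queue.
  static void DispatchNotifications()
  {
    std::set<ToyStreamedSound *>::iterator it;
    for (it = Instances().begin(); it != Instances().end(); ++it) {
      (*it)->ProcessMyNotifications();
    }
  }

private:
  // Drains this object's queue; a "complete" event means playback ended.
  void ProcessMyNotifications()
  {
    while (!m_Events.empty()) {
      if (m_Events.front() == kToyEventComplete) m_IsPlaying = false;
      m_Events.pop();
    }
  }

  static std::set<ToyStreamedSound *> &Instances()
  {
    static std::set<ToyStreamedSound *> instances;
    return instances;
  }

  // Copies would never be registered with the dispatcher, so forbid them.
  ToyStreamedSound(const ToyStreamedSound &) = delete;
  ToyStreamedSound &operator=(const ToyStreamedSound &) = delete;

  bool m_IsPlaying;
  std::queue<long> m_Events;
};
```

Notice that IsPlaying is only as fresh as the last call to the dispatcher. A sound that finished a moment ago still reports that it's playing until its queue gets drained, which is why the dispatcher needs to be called regularly.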