Beginning Game Audio Programming [Electronic resources]

RIFFs

To understand WAV, one must first understand RIFF. That's because the WAV file format is a subset of Microsoft's RIFF technology. RIFF stands for Resource Interchange File Format, a file format designed specifically to contain multimedia data. Interestingly enough, DirectMusic songs and other musical data are also stored using the RIFF format.

The RIFF file format is a beast of mind-boggling complexity. Fortunately, to load a WAV file, you don't have to understand all of it. In fact, unless you're doing something very advanced, you don't even have to understand most of it. All you need to know for a normal WAV file is where each element of data is stored. That's what this section is going to teach you. If you're morbidly curious, you can explore the higher-level RIFF philosophy by cruising around MSDN or by following some of the links I've provided on your CD.

Chunks

Nope, I didn't make that up. The different parts of a RIFF file are actually called chunks. These chunks are like pieces of fruit at the grocery store. Just like a Granny Smith apple has a 4017 tag on it, a chunk has a four-character "chunk ID" that tells you what the chunk is. This special kind of ID is called a FOURCC (four-character code).

A FOURCC is really just a 32-bit integer (it's typedef'd as a DWORD) that has four 8-bit ASCII values packed into it (see Figure 4.2).

Figure 4.2: The FOURCC character code for RIFF—note the character swapping.
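To make the byte packing concrete, here is a minimal sketch of how four characters end up inside one 32-bit value. The helper name MakeFourCC is hypothetical (it is not from the book's engine or the Win32 headers); it only illustrates the layout, assuming the first character belongs in the lowest byte, which is how the bytes sit in a little-endian file.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only: packs four ASCII characters into
// a 32-bit value with the first character in the lowest byte.
constexpr std::uint32_t MakeFourCC(char a, char b, char c, char d)
{
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// 'R' = 0x52, 'I' = 0x49, 'F' = 0x46, 'F' = 0x46
static_assert(MakeFourCC('R', 'I', 'F', 'F') == 0x46464952u,
              "R lands in the low byte, so the hex value reads as FFIR");
```

Printed as a hexadecimal number, the value reads back to front ("FFIR"), which is the character swapping the figure caption mentions.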

A WAV file is made up of three chunks. First, there's the main chunk, which has a FOURCC of "RIFF". Inside the RIFF chunk is data telling you you're dealing with a wave format. Also inside the RIFF chunk are two sub-chunks: "fmt " (notice the trailing space in there) and "data". The fmt chunk contains the format of the wave file (8 or 16 bits, stereo or mono, sample rate, and so on). The data chunk contains the actual bytes that form the sound wave.

Every chunk has two things in common. The very first four bytes of a chunk will always contain the FOURCC of that chunk. Because the WAV file itself is a RIFF file, the first four bytes of a valid WAV file will always be RIFF.

Tip

Uppercase FOURCCs are standardized; lowercase FOURCCs usually signify a proprietary format.

Next, in every chunk, immediately following the chunk's ID, are 4 bytes that tell you the size of that chunk. Note that this size does not include the size of the chunk's ID, or the size of the size bytes. In other words, the size contained in here is 8 bytes less than the total size of the chunk because the first four bytes are the chunk's FOURCC, and the next four bytes are the size of the chunk. This is tremendously important to remember; if it helps, think of the size bytes as specifying "size from this point forward."
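The "size from this point forward" rule can be sketched in a few lines. The ChunkHeader struct and both function names below are hypothetical, chosen only for this illustration; the sketch reads the size one byte at a time so it does not depend on the host's byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical struct for illustration; the names are not from the book.
struct ChunkHeader
{
    char          id[4];  // FOURCC, for example "RIFF", "fmt " or "data"
    std::uint32_t size;   // payload size, NOT counting these 8 header bytes
};

// Reads the 8-byte header that starts at p. RIFF stores the size
// little-endian, so the bytes are assembled lowest byte first.
inline ChunkHeader ReadChunkHeader(const unsigned char *p)
{
    ChunkHeader h;
    std::memcpy(h.id, p, 4);
    h.size =  static_cast<std::uint32_t>(p[4])
           | (static_cast<std::uint32_t>(p[5]) << 8)
           | (static_cast<std::uint32_t>(p[6]) << 16)
           | (static_cast<std::uint32_t>(p[7]) << 24);
    return h;
}

// Total bytes the chunk occupies: the 8 header bytes plus the payload.
inline std::uint32_t TotalChunkBytes(const ChunkHeader &h)
{
    return 8u + h.size;
}
```

So a data chunk whose size field holds 16 actually occupies 24 bytes of the file.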

The following sections will examine the main, format, and data chunks in detail.

The Main Chunk

This chunk contains the other two chunks. Its FOURCC is RIFF.

This chunk contains one piece of data. After the chunk ID and the chunk size, the only thing left is a FOURCC that describes what's actually in this RIFF file. For a WAV file, these four data bytes will always contain "WAVE." For other types of RIFF files, there will be different codes here.

So really, all the main chunk says is, "I'm a RIFF file, and I contain a wave."

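Note that the size stored in this chunk is the total size of the next two chunks (including each one's 8-byte ID and size header) plus the four bytes used to contain the "WAVE" format code. For a standard PCM file with a 16-byte format chunk, that works out to 36 plus the length of the wave data.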
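As a quick arithmetic check, here is a sketch of how the main chunk's size field adds up. It assumes the simplest layout (one 16-byte PCM format chunk followed by one data chunk); RiffChunkSize is a hypothetical name used only here.

```cpp
#include <cstdint>

// Hypothetical helper: the value stored in the RIFF chunk's size field,
// assuming exactly one 16-byte PCM "fmt " chunk and one "data" chunk.
inline std::uint32_t RiffChunkSize(std::uint32_t dataLen)
{
    const std::uint32_t waveTag   = 4;            // the "WAVE" FOURCC
    const std::uint32_t fmtChunk  = 8 + 16;       // header + PCM format fields
    const std::uint32_t dataChunk = 8 + dataLen;  // header + sample bytes
    return waveTag + fmtChunk + dataChunk;        // 36 + dataLen
}
```

Adding the main chunk's own 8 header bytes gives 44 + dataLen for the whole file, which matches the constant used by GetTotalSize in the CWAVFile class.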

The Format Chunk

Now it gets interesting. The format chunk contains everything you need to properly interpret the data chunk. As you can see from Figure 4.1, this chunk tells you the audio format, number of channels, sample rate, byte rate, block alignment, and bits per sample of the data.

Audio Format

First, there's the audio format. This is a two-byte value that tells you whether the data chunk is compressed. For uncompressed WAVs, this attribute is 1, which means the data chunk is stored using Pulse Code Modulation (PCM). PCM is a fancy way of saying that the wave data is uncompressed.

Values other than 1 indicate that this wave file was compressed somehow (see Table 4.1).

Table 4.1: Common Wave Format IDs
Wave Format	#define in MMREG.H	Number
Unknown (this is bad)	`WAVE_FORMAT_UNKNOWN`	0
Pulse Code Modulation (PCM)	`WAVE_FORMAT_PCM`	1
Adaptive Differential PCM	`WAVE_FORMAT_ADPCM`	2
32-bit floating point	`WAVE_FORMAT_IEEE_FLOAT`	3
CCITT G.711 A-law	`WAVE_FORMAT_ALAW`	6
CCITT G.711 u-law	`WAVE_FORMAT_MULAW`	7
MPEG Layer 3 (MP3)	`WAVE_FORMAT_MPEGLAYER3`	85 (0x0055)

There are many other formats besides the ones listed in the table; check out MMREG.H for the entire list.

Tip

There are a few libraries available online that allow you to read various compressed wave formats. I've included links to a few of them on your CD.

As you can see, there are a ton of formats available, and writing code for all of them would take forever. Fortunately, the vast majority of the WAVE files out there are PCM, and you can always use a program to convert a non-PCM wave into a PCM one. If you do have to (or want to) write your own WAV file parser, my advice would be to only support PCM.

Number of Channels

Immediately following the audio format are two bytes that tell you how many channels the WAV file has. If it's mono, there will be a one here; if it's stereo, there will be a two. There's the potential for other numbers—for example, a WAV could contain six-channel surround-sound data. Mono and stereo are the most common, however.

Sample Rate

This one is self-explanatory: it's the number of samples per second. These four bytes will contain 8,000 for 8,000 Hz, 44,100 for 44.1 kHz, and so on.

Byte Rate

This field tells you how many bytes are used per second of audio. This number is always the sample rate multiplied by the number of channels, multiplied by the bits per sample divided by eight (remember, 16 bits per sample really means two bytes per sample).

Block Align

This tells you the number of bytes for one sample across all channels (often called a sample frame). It is always the number of channels multiplied by the bits per sample divided by eight.

Bits per Sample

An 8 here means that you're dealing with an 8-bit sample, a 16 means 16 bits, and so on. 8 and 16 bits are the most common values.
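The byte rate and block align fields are fully determined by the other three, so they can be recomputed as a sanity check. The sketch below assumes PCM data; the PcmFormat struct and the function names are hypothetical and are not part of CWAVFile.

```cpp
#include <cstdint>

// Hypothetical struct for illustration; the field names are assumptions.
struct PcmFormat
{
    std::uint16_t channels;
    std::uint32_t sampleRate;
    std::uint16_t bitsPerSample;
};

// Bytes for one sample across all channels.
inline std::uint16_t BlockAlign(const PcmFormat &f)
{
    return static_cast<std::uint16_t>(f.channels * (f.bitsPerSample / 8));
}

// Bytes consumed by one second of audio.
inline std::uint32_t ByteRate(const PcmFormat &f)
{
    return f.sampleRate * BlockAlign(f);
}
```

For CD-quality audio (two channels, 44,100 Hz, 16 bits) this gives a block align of 4 and a byte rate of 176,400.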

The Data Chunk

Once you've read all of the format data, you know exactly how to interpret the data in this section. That's good, because in the data chunk, after the FOURCC and size fields, is an ocean of raw bytes. How these bytes are interpreted depends on the format block.

Tip

Again, it's important to remember that you usually don't have to deal with the wave file format directly; most of the time you'll use the DirectMusic Loader.

For uncompressed WAV files, the size of this data block is always going to be the number of samples per channel multiplied by the number of channels, multiplied by the bits per sample divided by eight.
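That formula, and its inverse (how long a given data chunk plays), can be sketched as follows. Both function names are hypothetical, and the sketch assumes uncompressed PCM with a bits-per-sample value that is a multiple of eight.

```cpp
#include <cstdint>

// Size in bytes of the data chunk's payload. "frames" is the number of
// samples per channel.
inline std::uint32_t PcmDataBytes(std::uint32_t frames,
                                  std::uint16_t channels,
                                  std::uint16_t bitsPerSample)
{
    return frames * channels * (bitsPerSample / 8u);
}

// Playing time of a data chunk, given the byte rate from the format chunk.
inline double DurationSeconds(std::uint32_t dataBytes, std::uint32_t byteRate)
{
    return byteRate == 0 ? 0.0 : static_cast<double>(dataBytes) / byteRate;
}
```

One second of 44.1 kHz, 16-bit stereo therefore needs 176,400 bytes of wave data.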

Putting All This to Use

Now that you know what's inside a WAV file, it seems only natural to enhance our beloved audio engine with a few new methods.

CWAVFile

These methods allow direct manipulation of wave file data through a class called CWAVFile:


class CWAVFile
{
public:
    friend class CAudioManager;
    void Init();
    CWAVFile() { Init(); }
    CWAVFile(const CWAVFile &r) { Init(); Copy(r); }
    virtual ~CWAVFile();

    // format chunk
    int m_AudioFormat;
    int m_NumberOfChannels;
    int m_SampleRate;
    int m_ByteRate;
    int m_BlockAlign;
    int m_BitsPerSample;
    unsigned int m_DataLen;

    CWAVFile &operator=(const CWAVFile &r) { Copy(r); return(*this); }
    unsigned char *GetData() { return(m_Data); }
    void SetData(const unsigned char *data, unsigned int len);
    void Load(const unsigned char *data);
    void Load(std::string filename);
    unsigned char *Save() const;
    bool Save(std::string filename) const;
    int GetTotalSize() const { return(44 + m_DataLen); }
    std::string GetInfo();

protected:
    void Copy(const CWAVFile &r); // snip
    // data chunk
    unsigned char *m_Data;
};

This class keeps track of a WAV file's format and data chunks. It has exposed variables for all of the pieces of information found in the format block, as well as a protected m_Data pointer that stores the data block. I didn't build accessors onto this class because none of these data types are going to change, but if you feel more comfortable using Get/Set methods, go for it.

Tip

CWAVFile has an overloaded assignment operator (and a matching copy constructor) so that it can properly copy the memory behind its m_Data pointer. Failing to define operator= and a copy constructor for a class that dynamically allocates memory is a major C++ no-no; be careful. And remember your virtual destructors, too.

Loading into CWAVFile

Loading a WAV file into CWAVFile is fairly straightforward once you understand the file format. Notice in the CWAVFile class that there are two overloads for loading: one for a file and one for in-memory data. The file overload just reads the file into memory and calls the in-memory data overload. The in-memory overload does all the real work:


#include <cstring>     // memcpy
#include <windows.h>   // base Win32 types (include before mmsystem.h)
#include <mmsystem.h>  // FOURCC, mmioFOURCC

// Assumes the standard 44-byte layout: RIFF header, a 16-byte "fmt " chunk,
// then the "data" chunk. Files with extra or larger chunks are rejected.
void CWAVFile::Load(const unsigned char *data)
{
    FOURCC riff, wave, fmt, datacc;
    // check for RIFF ChunkID
    memcpy(&riff, &data[0], sizeof(FOURCC));
    if (riff != mmioFOURCC('R', 'I', 'F', 'F')) {
        Throw("Invalid WAV file data.");
    }
    // check for WAVE format
    memcpy(&wave, &data[8], sizeof(FOURCC));
    if (wave != mmioFOURCC('W', 'A', 'V', 'E')) {
        Throw("Invalid WAV file data (not WAVE format).");
    }
    // check for FMT ChunkID
    memcpy(&fmt, &data[12], sizeof(FOURCC));
    if (fmt != mmioFOURCC('f', 'm', 't', ' ')) {
        Throw("Invalid WAV file data (FMT subchunk not found).");
    }
    // check for DATA ChunkID
    memcpy(&datacc, &data[36], sizeof(FOURCC));
    if (datacc != mmioFOURCC('d', 'a', 't', 'a')) {
        Throw("Invalid WAV file data (DATA subchunk not found).");
    }
    // clear the int members first; the 2-byte copies below fill only
    // the low half of each one (little-endian host assumed)
    m_AudioFormat = m_NumberOfChannels = m_BlockAlign = m_BitsPerSample = 0;
    // load in data members, in file order
    memcpy(&m_AudioFormat,      &data[20], sizeof(short int));
    memcpy(&m_NumberOfChannels, &data[22], sizeof(short int));
    memcpy(&m_SampleRate,       &data[24], sizeof(int)); // sample rate is a 4-byte field
    memcpy(&m_ByteRate,         &data[28], sizeof(int));
    memcpy(&m_BlockAlign,       &data[32], sizeof(short int));
    memcpy(&m_BitsPerSample,    &data[34], sizeof(short int));
    // the data chunk's size field, then the wave data itself
    unsigned int datasize = 0;
    memcpy(&datasize, &data[40], sizeof(int));
    SetData(&data[44], datasize);
}

This code is pretty pessimistic; before it does any real work, it does everything it can to make sure that data actually points to a valid WAV file. If all the FOURCCs check out, it proceeds to pick the relevant data bits out of the memory chunk, using memcpy. Finally, it grabs the wave data itself, via a call to its SetData method. Note that this class doesn't assume ownership of the data pointer; it makes its own copy of the wave data, which means the client still owns the original buffer and should free it as soon as it's no longer needed.
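The fixed offsets used by Load only hold when the format chunk is exactly 16 bytes and nothing sits between it and the data chunk. As an alternative (not the book's method), here is a minimal sketch that walks the chunk list instead. WavView, ReadU16, ReadU32 and ParseWav are all hypothetical names, and the sketch returns false on bad input instead of using the engine's Throw.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical result type; these are not CWAVFile members.
struct WavView
{
    std::uint16_t audioFormat = 0, channels = 0;
    std::uint32_t sampleRate = 0, byteRate = 0;
    std::uint16_t blockAlign = 0, bitsPerSample = 0;
    const unsigned char *data = nullptr;  // points into the caller's buffer
    std::uint32_t dataLen = 0;
};

static std::uint16_t ReadU16(const unsigned char *p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

static std::uint32_t ReadU32(const unsigned char *p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns true and fills out if buf holds a RIFF/WAVE file that contains
// both a "fmt " chunk and a "data" chunk. Unknown chunks are skipped.
bool ParseWav(const unsigned char *buf, std::size_t len, WavView &out)
{
    if (len < 12 || std::memcmp(buf, "RIFF", 4) != 0 ||
        std::memcmp(buf + 8, "WAVE", 4) != 0)
        return false;

    bool haveFmt = false, haveData = false;
    std::size_t pos = 12;                    // first sub-chunk header
    while (pos + 8 <= len)                   // room for another header
    {
        const unsigned char *hdr  = buf + pos;
        const unsigned char *body = hdr + 8;
        const std::uint32_t  size = ReadU32(hdr + 4);
        if (size > len - pos - 8)
            return false;                    // chunk runs past the buffer

        if (std::memcmp(hdr, "fmt ", 4) == 0 && size >= 16)
        {
            out.audioFormat   = ReadU16(body + 0);
            out.channels      = ReadU16(body + 2);
            out.sampleRate    = ReadU32(body + 4);
            out.byteRate      = ReadU32(body + 8);
            out.blockAlign    = ReadU16(body + 12);
            out.bitsPerSample = ReadU16(body + 14);
            haveFmt = true;
        }
        else if (std::memcmp(hdr, "data", 4) == 0)
        {
            out.data    = body;
            out.dataLen = size;
            haveData    = true;
        }
        pos += 8 + size + (size & 1);        // RIFF pads odd sizes to even
    }
    return haveFmt && haveData;
}
```

The trade-off is a little more code in exchange for accepting files that carry extra chunks, such as the metadata some audio editors write.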

The mmioFOURCC macro is a handy Win32 helper, defined in mmsystem.h, that takes four characters and turns them into a FOURCC. A FOURCC is really a 32-bit value, so there's no compression magic going on, just some clever byte rearranging and casting.

Saving from CWAVFile

The CWAVFile Save methods are symmetrical to the Load ones. There are two Saves, one for saving to disk, and one for saving to memory. The disk overload just calls the memory overload and then saves out that memory.

I'm not going to include the Save method here, but suffice it to say that it allocates enough memory, re-creates the FOURCCs, puts the CWAVFile members into the proper offsets, and tacks on the wave data. Once that's all done, it returns a pointer to the memory it's created, again with the understanding that the caller owns the memory once Save returns.
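Since the Save listing is omitted, here is a rough sketch of what building a WAV image in memory can look like. This is not the book's Save method: BuildPcmWav and the Put helpers are hypothetical names, the sketch assumes PCM data only, and it returns a std::vector rather than a raw pointer so that ownership is not an issue.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

static void PutTag(std::vector<unsigned char> &out, const char *tag)
{
    out.insert(out.end(), tag, tag + 4);               // four FOURCC characters
}

static void PutU16(std::vector<unsigned char> &out, std::uint16_t v)
{
    out.push_back(static_cast<unsigned char>(v & 0xFF));  // low byte first
    out.push_back(static_cast<unsigned char>(v >> 8));
}

static void PutU32(std::vector<unsigned char> &out, std::uint32_t v)
{
    for (int shift = 0; shift < 32; shift += 8)            // little-endian
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
}

// Hypothetical helper: returns a complete PCM WAV image (44-byte header
// followed by dataLen bytes copied from samples).
std::vector<unsigned char> BuildPcmWav(std::uint16_t channels,
                                       std::uint32_t sampleRate,
                                       std::uint16_t bitsPerSample,
                                       const unsigned char *samples,
                                       std::uint32_t dataLen)
{
    const std::uint16_t blockAlign =
        static_cast<std::uint16_t>(channels * (bitsPerSample / 8));
    const std::uint32_t byteRate = sampleRate * blockAlign;

    std::vector<unsigned char> out;
    out.reserve(44u + dataLen);

    PutTag(out, "RIFF"); PutU32(out, 36u + dataLen); PutTag(out, "WAVE");

    PutTag(out, "fmt "); PutU32(out, 16);   // PCM format chunk is 16 bytes
    PutU16(out, 1);                         // audio format 1 = PCM
    PutU16(out, channels);
    PutU32(out, sampleRate);
    PutU32(out, byteRate);
    PutU16(out, blockAlign);
    PutU16(out, bitsPerSample);

    PutTag(out, "data"); PutU32(out, dataLen);
    out.insert(out.end(), samples, samples + dataLen);
    return out;
}
```

The fields land at the same offsets that Load reads from (20, 22, 24, 28, 32, 34, 40 and 44), which is what makes the two operations symmetrical.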

Tip

For extra credit, try integrating into the audio engine a library that reads compressed wave file formats. It would be really cool if you could tell the audio engine to read a wave file from disk and have it automatically decompress that wave file as it put it into a CWAVFile object.

The Ch4p1_WAVFileXRay sample program shows how to read a WAV file into memory.