Chapter 13: 3D Sound Using DirectX Audio

Have you had enough of 2D sound? Are you sick of having to manually set a sound's volume and panning position? Do you want cool effects (like the Doppler effect) in your game? Then read on, because this chapter will teach you how to do all that and more! This chapter introduces you to the concepts and techniques behind 3D sound using DirectX Audio.
PRINCIPLES OF 3D SOUND
Before you start learning how to code 3D sound, you first need to spend some time thinking about what 3D sound actually is. When you say a game has 3D sound, what do you mean? This is similar to saying that a game is 3D—for both graphics and sound, there are different degrees to which a game can be 3D. Obviously, first-person shooters have fully 3D graphics (and probably fully 3D sound as well), but what about an RTS whose background is 2D but whose units are 3D?
To understand what 3D sound means, you need to understand the principles of how the human brain interprets sound. Scientists studying the perception of sound have discovered that your brain relies on the following cues to orient sound relative to itself:
Loudness: The louder the sound, the closer your brain thinks it is. Of course, different things have different natural volumes, but for any given sound sample, if it's louder, your brain will think it is closer.
Interaural intensity difference: A sound directly to your left will still come in through your right ear, but it will be a little less loud, cueing your brain to interpret the sound as coming from your left.
Interaural time difference: This is similar to the concept above; sounds coming from your left will hit your right ear a millisecond or so later than your left (depending on how far apart your ears are).
Muffling: The shape of your ear is best suited to hear things in front of you. If a sound is behind you, it will sound muffled. To exaggerate this effect, cup your hands behind your ears, with your palms facing out, and then listen to a sound while facing it and with it behind you. It will sound different, and amazingly, scientists can simulate this difference mathematically.
Fortunately, DirectX Audio takes care of all these details for you. All you need to do is tell it where in the virtual world the sound is coming from, and where the player's ears are. To do that, you use listeners and buffers.
Listeners and Buffers
In DirectX Audio's world of 3D sound, there are only two kinds of objects—buffers, which create sound, and listeners, which "hear" sound. To put it more playfully: in THX, the audience is listening, but in 3D sound, the audience is a listener. Usually, you'll have several buffers and one listener. To compute the sound coming out of your speakers for a particular buffer, DirectX looks at the position of that buffer relative to the position of the listener.
Tip | If a tree falls in a forest, and no one is around, does it still make a sound? If a 3D sound buffer containing a sample of a tree falling starts playing, and there's no 3D listener, it doesn't make a sound. In 3D sound, you need at least one buffer and one listener to hear anything out your speakers. |
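In code, these two objects show up as COM interfaces on your DirectSound buffers. Here's one reasonable way to get at them, sketched under a few assumptions: you've already created a primary buffer with the DSBCAPS_PRIMARYBUFFER and DSBCAPS_CTRL3D flags and a secondary buffer with DSBCAPS_CTRL3D, and the pointer names below are just placeholders for your own variables.

#include <dsound.h>   // link with dsound.lib and dxguid.lib

// A minimal sketch: pull the 3D interfaces out of buffers you've already
// created. The primary buffer must have been created with
// DSBCAPS_PRIMARYBUFFER | DSBCAPS_CTRL3D, and the secondary buffer with
// DSBCAPS_CTRL3D, or these queries will fail.
HRESULT Get3DInterfaces(IDirectSoundBuffer*       pPrimaryBuffer,   // primary buffers use the non-8 interface
                        IDirectSoundBuffer8*      pSecondaryBuffer,
                        IDirectSound3DListener8** ppListener,
                        IDirectSound3DBuffer8**   pp3DBuffer)
{
    // The one-and-only listener "lives" on the primary buffer.
    HRESULT hr = pPrimaryBuffer->QueryInterface(IID_IDirectSound3DListener8,
                                                (void**)ppListener);
    if (FAILED(hr))
        return hr;

    // Each sound source gets its own 3D buffer interface.
    return pSecondaryBuffer->QueryInterface(IID_IDirectSound3DBuffer8,
                                            (void**)pp3DBuffer);
}

The rest of the snippets in this chapter assume you're holding onto a listener pointer and one or more 3D buffer pointers like these.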
For example, if your listener is at position (0,0,0) in your 3D world, and there's a chainsaw buffer at (-10,0,0), the sound will come (mostly) out the left speaker. If the chainsaw buffer is at (-20,0,0), it'll still be mostly out the left speaker, but its volume will be reduced.
Tip | Again, just because a 3D sound buffer is positioned directly left of a listener doesn't mean that sound from that buffer will only come out the left speaker, and the right speaker will be completely silent. In real life, if a sound is generated to your left, you hear it through both ears, but the sound waves arrive at your right ear just a little bit later and softer than at your left, and your brain interprets this to mean that the sound is coming from the left. 3D sound simulates what you'd hear in real life, so sound will come out the right speaker, but it will be softer and slightly delayed (based on mathematical principles of sound perception), so that together both speakers "trick" your brain into thinking the sound is coming from your left. This is an important principle of 3D sound—it's not about just scaling a sound's volume and panning it to one speaker or another; it's about simulating actual sound waves. |
The Doppler Effect
If you haven't heard the term before, imagine you're standing next to some train tracks. In the distance you see a train. It's going fast. Just as the train is about to pass you, it sounds its whistle at a constant pitch. As the train passes you, the pitch of the whistle appears to drop. That's the Doppler effect, named for the Austrian mathematician and physicist Christian Doppler, who discovered it and first modeled it mathematically in the nineteenth century.
DirectX Audio can simulate the Doppler effect, but in addition to the positions, it also needs the velocities of the listener and the buffer.
AUDIO CLIP | Listen to Audio Clip 13.1 for an example of the Doppler effect. |
Properties of Buffers and Listeners
Now that you've got a grip on all of the concepts, here are the actual variables that DirectX Audio uses to properly simulate 3D sound.
Position
For starters, both listeners and buffers have a position, a coordinate triplet (x,y,z) specifying their position in world space. Note that there are no matrices or local coordinate spaces; you simply tell DirectX Audio, "This chainsaw buffer is at (50,-50,10) in world space."
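In code, telling DirectX Audio that could look something like the following sketch (pListener and pChainsawBuffer stand in for the 3D interfaces you obtained earlier):

// Put the listener at the world origin and the chainsaw at (50,-50,10),
// both in world space. DS3D_IMMEDIATE applies the change right away;
// deferred changes are covered at the end of this chapter.
pListener->SetPosition(0.0f, 0.0f, 0.0f, DS3D_IMMEDIATE);
pChainsawBuffer->SetPosition(50.0f, -50.0f, 10.0f, DS3D_IMMEDIATE);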
Velocity
Listeners and buffers also have velocities, specified as direction vectors with magnitude. In practice, you can form a velocity vector by subtracting the object's position in the last frame from its position in the current frame and dividing by the time elapsed between the two frames, since DirectX Audio expects velocity in units per second.
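For example, something like this sketch, which assumes you track the last-frame and current-frame positions yourself and know the frame's elapsed time in seconds (all the variable names are placeholders):

// Velocity is (change in position) / (elapsed time), in units per second.
// It's only used for the Doppler calculation; setting it doesn't move the buffer.
float vx = (curX - lastX) / dt;
float vy = (curY - lastY) / dt;
float vz = (curZ - lastZ) / dt;
p3DBuffer->SetVelocity(vx, vy, vz, DS3D_IMMEDIATE);
// The listener has an identical SetVelocity method.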
Minimum and Maximum Distance (Buffers Only)
A 3D sound buffer has two properties that specify how far away it can be heard and how close you have to get before it plays at maximum volume. These two properties are the minimum and maximum distances (see Figure 13.1).

Figure 13.1: The minimum and maximum distances tell DirectX Audio how far away a sound can be heard.
The minimum distance specifies how close you have to get for full volume on the buffer. If you get closer than this, the volume is capped—DirectX Audio won't amplify your buffers. Similarly, the maximum distance is the distance at which the sound fades to nothing. If you're a smidgen closer than this distance, you can just barely hear the sound; if you're slightly farther away, you hear nothing. Skip to the rolloff factor section if you want to know how DirectX Audio interpolates between the minimum and maximum volumes of a buffer.
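Setting the two distances is a single call each. A sketch with made-up values:

// Full volume within 2 units of the chainsaw; attenuation stops at 100 units.
// One caveat: by default DirectSound keeps playing the buffer at its
// 100-unit volume even beyond the maximum distance. Create the buffer with
// the DSBCAPS_MUTE3DATMAXDISTANCE flag if you want it to go silent instead.
pChainsawBuffer->SetMinDistance(2.0f,   DS3D_IMMEDIATE);
pChainsawBuffer->SetMaxDistance(100.0f, DS3D_IMMEDIATE);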
Orientation (Listener Only)
A DirectX Audio listener also has an orientation, so that DirectX Audio knows which direction is "front" and can properly muffle sounds coming from the back. Orientation is specified by two vectors: an up vector and a front vector. If you're making a first-person shooter, this is easy—you can get your orientation vectors from your camera.
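In DirectSound terms, that's a single call that takes both vectors, front first and then top/up. A sketch, assuming your camera code can hand you those two vectors (the cam... variables are placeholders):

// Orient the listener from the camera: the look direction ("front")
// comes first, then the up direction ("top").
pListener->SetOrientation(camFrontX, camFrontY, camFrontZ,   // front vector
                          camUpX,    camUpY,    camUpZ,      // up vector
                          DS3D_IMMEDIATE);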
Sound Cones (Buffers Only)
Buffers don't have an orientation in that sense, though they have something similar called a sound cone. A sound cone is akin to a spotlight's cone. Essentially, you specify an inner and outer cone. A listener inside the inner cone will hear the sound at the maximum volume possible based on his distance from the sound (see Figure 13.2).

Figure 13.2: The various buffer properties that influence the volume of a sound.
Tip | By default, a sound's cone is set up so that the outer cone and inner cone are the same, and the angle of both cones is 360 degrees. If you think about it, when the angle of the sound cone is 360 degrees, it's not really a sound cone any more. Instead, it becomes a sound sphere, and the sound no longer has any orientation—it's like a point light. |
A listener in between the inner and outer cones will experience two volume cuts: one based on how far away he is from the object, and one based on his position between the inner and outer cone. (In practice there can be more than two volume cuts, because the listener's orientation and a few other things are also thrown into the mix, but the basic idea is that the listener's position between the inner and outer sound cone is just one more factor in determining the overall volume of the sound.)
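In code, a sound cone boils down to three settings on the buffer: which way the cone points, the inner and outer angles, and how much quieter the sound gets outside the outer cone. A sketch with made-up values:

// A megaphone-like sound pointed down the +Z axis: full (distance-based)
// volume inside a 60-degree inner cone, and 30 dB quieter once the listener
// is outside the 120-degree outer cone. In between, DirectSound blends the
// two volumes.
p3DBuffer->SetConeOrientation(0.0f, 0.0f, 1.0f, DS3D_IMMEDIATE);
p3DBuffer->SetConeAngles(60, 120, DS3D_IMMEDIATE);       // inner, outer (degrees)
p3DBuffer->SetConeOutsideVolume(-3000, DS3D_IMMEDIATE);  // hundredths of a decibel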
Rolloff Factor (Listener Only)
The rolloff factor is a property unique to the listener (buffers don't have rolloff factors). It specifies the attenuation of all the buffers; in other words, how quickly sounds fade out as the listener moves away from them, or how quickly they fade in as the listener travels toward them. DirectX Audio's rolloff factor is actually a multiplication factor specifying how much to deviate from the real world. With a rolloff factor of one, sound volumes in the game attenuate exactly as they would in real life. A rolloff factor less than one (but greater than zero—negative values are illegal) specifies that sound attenuation happens less in DirectX Audio than it does in the real world. In other words, a value of 0.5 means that you can hear sounds from farther away: if a sound becomes inaudible at 40 meters away in the real world, in DirectX Audio's virtual world it becomes inaudible at 80 meters away. If the rolloff factor is zero, sound volumes don't change based on distance—you can either hear something, or you can't.
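Setting it is a one-liner on the listener. A sketch:

// Real-world attenuation is a rolloff factor of 1.0 (DS3D_DEFAULTROLLOFFFACTOR).
// Halving it makes sounds carry roughly twice as far; 0.0 disables
// distance-based attenuation entirely. Legal values run from
// DS3D_MINROLLOFFFACTOR to DS3D_MAXROLLOFFFACTOR.
pListener->SetRolloffFactor(0.5f, DS3D_IMMEDIATE);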
Distance Factor (Listener Only)
There's also a property of the listener called the distance factor. Similar to the rolloff factor, the distance factor is a multiplier for the unit of measure used in the game. By default, DirectX Audio assumes that all units are in meters. That is, if you have a listener at (10,0,0) and a sound at (40,0,0), DirectX Audio interprets that as "the buffer is 30 meters away from the listener." You need to make sure that if DirectX Audio thinks something is 30 meters away, it appears on the screen as if it were 30 meters away. Sure, you could do this by modeling your game world in meters, but if you've already modeled it in some other real-world unit, you'll need to set this scaling value appropriately. For example, if all of your models are in feet (that is, the two local coordinates for a yardstick in your virtual world are (-1.5, 0, 0) and (1.5, 0, 0)), you'll need to set your scale value to 0.3048 (the number of meters in a foot). The distance factor is mainly used when DirectX Audio calculates the Doppler effect.
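For the feet example, that works out to a single call on the listener (a sketch):

// All of this game's coordinates are in feet, so tell DirectX Audio that
// one unit equals 0.3048 meters. After this call, a buffer 30 units from
// the listener is treated as roughly 9 meters away.
pListener->SetDistanceFactor(0.3048f, DS3D_IMMEDIATE);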
Tip | At first thought, it may seem weird that DirectX Audio made the distance and rolloff factors a property of the listener, and not the buffers. After all, what if you want to set distance factors on a per-buffer basis, so that certain buffers could be heard further away than other buffers? To simulate that, you should use buffer sound cones. Think of the distance factor and rolloff factor as global scaling values you can use in a pinch. You can use the distance factor to avoid re-scaling all of your graphics, and you can use the rolloff factor for special effects. For example, if your player finds a "Helmet of Keen Hearing" powerup, you can pull the rolloff factor closer to zero to simulate superhuman hearing abilities. |
Doppler Factor (Listener Only)
Like the distance and rolloff factors, the Doppler factor is a multiplier that scales all Doppler effects. This is useful when you want to create the sensation of speed without actually having to make a big world and put fast moving things in it. You can create a smaller world, with slower moving objects, and just scale up the Doppler factor so that they sound like they're going really fast.
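As with the other listener factors, this is a single call. A sketch that quadruples every Doppler shift:

// Exaggerate all Doppler shifts to four times their real-world strength.
// 1.0 (DS3D_DEFAULTDOPPLERFACTOR) is realistic, 0.0 turns the effect off,
// and values up to DS3D_MAXDOPPLERFACTOR are allowed.
pListener->SetDopplerFactor(4.0f, DS3D_IMMEDIATE);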
Processing Mode (Buffers Only)
DirectX Audio allows you to feed it the buffer properties you've just learned about in one of two modes: normal mode or head-relative mode. You can set the mode for each buffer independently.
In normal mode, you specify position and orientation in world space. That's easy enough.
In head-relative mode, position and orientation are relative to the listener. This means that if your listener is at position (7,3,0), and you tell DirectX Audio that there's a buffer at position (-3,5,0) in head-relative mode, you're saying that the buffer is three units to the left and five units above the listener. DirectX Audio will calculate the actual world position of the buffer to be (4,8,0) (see Figure 13.3).

Figure 13.3: In head-relative mode, buffer positions are relative to the listener.
The mode you should use depends on your sound effect. You may find it easier to use head-relative mode for buffers that are always going to be attached to the listener (imagine those things bolted directly onto the player's head). For example, in a simple flight simulator, you could use head-relative mode for the plane's engines, provided that your player could never rotate his virtual pilot's head in the game—he could only rotate the plane itself. If the player could rotate his own head in the game, you wouldn't want to use head-relative mode for the engines.
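Here's roughly what the engine example might look like in code (pEngineBuffer is a placeholder for the engine sound's 3D buffer interface, and the offset values are made up):

// Bolt an engine-drone buffer to the player's head: in head-relative mode
// its position is interpreted relative to the listener, so a fixed offset
// keeps the sound just behind and above the pilot no matter where the
// plane flies. New buffers start in DS3DMODE_NORMAL (world space).
pEngineBuffer->SetMode(DS3DMODE_HEADRELATIVE, DS3D_IMMEDIATE);
pEngineBuffer->SetPosition(0.0f, 0.5f, -1.0f, DS3D_IMMEDIATE);  // relative to the listener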
Immediate versus Deferred Settings
Every time you change one of the variables of a buffer or the listener, DirectX Audio recalculates things and remixes the final sound output right then and there. This can lead to a huge waste of resources if you're changing a whole bunch of buffers, because every time you change one, DirectX Audio recalculates everything!
To save resources in this situation, you'd want to use deferred settings. With deferred settings, you tell DirectX Audio to hold off on recalculating until you've given it all of the new parameters—definitely the way to go if you're updating all the buffers and the listener every frame. This chapter's sample code will show you how to use deferred settings—essentially, you pass the DS3D_DEFERRED flag when you're calling the methods to set the buffer and listener parameters, and then at the very end you call CommitDeferredSettings to apply all of the changes. It's a slight added hassle, but one that's well worth it to conserve processing power!
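Here's a rough sketch of that pattern. It isn't the chapter's actual sample code, and the function and array names are invented, but it shows the shape of a typical per-frame update:

#include <dsound.h>

// Update the listener and every 3D buffer with DS3D_DEFERRED, then apply
// everything with a single CommitDeferredSettings call so DirectX Audio
// only recalculates and remixes once.
void Update3DAudio(IDirectSound3DListener8* pListener,
                   IDirectSound3DBuffer8*   apBuffers[], int numBuffers,
                   const float listenerPos[3], const float bufferPos[][3])
{
    pListener->SetPosition(listenerPos[0], listenerPos[1], listenerPos[2],
                           DS3D_DEFERRED);

    for (int i = 0; i < numBuffers; i++)
    {
        apBuffers[i]->SetPosition(bufferPos[i][0], bufferPos[i][1],
                                  bufferPos[i][2], DS3D_DEFERRED);
        // ...other deferred SetVelocity, SetMinDistance, etc. calls go here.
    }

    // One recalculation and remix for the whole batch of changes.
    pListener->CommitDeferredSettings();
}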