A BRIEF HISTORY OF GAME AUDIO
Game audio has undergone many changes throughout the years. When fanny packs were high style and Pac-Man was hi-tech, it was much too difficult for games to use samples as sound effects. The most powerful gaming machines of the ’80s had between 8KB and 128KB of memory, and a typical sound effect, even at poor quality, was between 30 and 50KB.
Early PC audio revolved around what's known as frequency modulation synthesis, or FM synth. I remember the first Sound Blaster card I bought for my 386, proudly proclaiming that it supported FM synth.
AUDIO CLIP | Play Audio Clip 1.5 to hear early FM synth music. |
FM synthesis uses algorithms to re-create sound waves on the fly. Instead of storing an actual sampled wave, FM synth approximates that wave by mathematically combining waves of different shapes, frequencies, and volumes. For example, FM synth says something like, "Well, if I create this kind of wave with frequency x, and combine it with another wave of frequency y, I get a sound that's pretty close to a trumpet."
Unfortunately, pretty close didn't sound too great, especially on instruments like trumpets and snare drums. Flutes and other instruments whose sound was close to a pure sine wave sounded tolerable, and synthesizer instruments sounded fine, but everything else was pretty bad: cheap electronic keyboard bad.
Fast forward a couple of years to the next revolution, which occurred when the Amiga catapulted the MOD music format into popularity. MOD music sounded vastly better than FM synth because it used real samples instead of mathematical approximations. The MOD music format originally supported four tracks, and at any given time each of those four tracks could be playing one sample at any pitch. Computers now had enough memory to store instrument samples, and enough processing power to manipulate the pitch of those instruments in real time. The sound quality wasn't great (most of the samples used an 11kHz or 22kHz sample rate), but it was vastly better than FM synth. Drums especially sounded much better because they were recordings of actual drums.
AUDIO CLIP | Play Audio Clip 1.6 to hear MOD music from the early days of tracking, and listen to Audio Clip 1.7 for a clip from the later days. |
This technique was a primitive form of what's now known as Wave Table synthesis. Wave Table synth gets its name from the fact that the creation of the music relies on a table of sampled waves. Check out Figure 1.5 for a comparison between FM synth and Wave Table synth.

Figure 1.5: A comparison between FM synth and Wave Table synth.
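To make the comparison concrete, here is a minimal Python sketch of both ideas. It is an illustration under stated assumptions, not how any particular sound card or tracker worked: the function names `fm_tone` and `wavetable_play`, the 22,050Hz rate, and the use of linear interpolation are all choices made for this example.

```python
import math

SAMPLE_RATE = 22050  # Hz; an assumed rate, in line with the 11-22kHz samples of the era


def fm_tone(carrier_hz, modulator_hz, mod_index, seconds, rate=SAMPLE_RATE):
    """FM synth: compute every output sample from a formula, storing no audio."""
    count = int(seconds * rate)
    out = []
    for n in range(count):
        t = n / rate
        # The modulator wave bends the phase of the carrier wave.
        phase = 2 * math.pi * carrier_hz * t
        phase += mod_index * math.sin(2 * math.pi * modulator_hz * t)
        out.append(math.sin(phase))
    return out


def wavetable_play(sample, pitch_ratio, length):
    """Wave table synth: step through a stored recording at a variable speed.

    A pitch_ratio of 1.0 plays the recording as stored; 2.0 plays it an
    octave higher (and in half the time).
    """
    out = []
    position = 0.0
    for _ in range(length):
        index = int(position)
        if index >= len(sample) - 1:
            break  # ran off the end of the recording
        frac = position - index
        # Linear interpolation between the two nearest stored points.
        out.append(sample[index] * (1 - frac) + sample[index + 1] * frac)
        position += pitch_ratio
    return out
```

The FM function needs only a handful of numbers to describe an instrument, which is why it suited machines with tiny memories. The wave table function needs the whole recording in memory, but whatever was recorded (a real drum, say) is what comes out.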
It's debatable which tracked music format came first, but once the idea of tracked music caught on, dozens of trackers sprang up: StarTrekker, ProTracker, NoiseTracker, Oktalyzer, and many others. A demo group calling themselves the Future Crew created and popularized the STM (ScreamTracker) and, later, S3M (ScreamTracker 3) music formats, which allowed more channels, better effects, and higher instrument sample rates. Impulse Tracker took tracked music even further, bringing even more advanced sampling abilities to the video gaming masses. For a few years in the early to mid-nineties, tracked music was all the rage.
Tip | Descendants of the MOD music format are still in use today. Frequently, games distributed online use some form of tracked music because it's much smaller than sampled songs. A typical tracked piece of music is in the 200–300KB range, whereas a similar MP3 is usually measured in megabytes. Tracked music has a unique sound to it. Often, game developers use it for techno songs. I've included a small smattering of tracked music, including some tracked songs I created back in the day, on the CD. Tracking is still somewhat popular, so I've also thrown in links to a couple of tracking Web sites. |
MIDI
Parallel to all of this FM synth and tracked music development, there was another revolution occurring, this time in the land of MIDI. MIDI is an acronym for Musical Instrument Digital Interface. MIDI version 1.0 went gold in 1983, and was quickly adopted by professional digital musicians. As the price of MIDI synthesizers fell, amateurs and the public in general started using MIDI.
MIDI is very similar to tracked music, only with a slightly different philosophy behind it. Tracked music started as a closed system in which the CPU did everything: it stored the instruments, stored the notes, applied the effects, and played the finished product through its own speakers.
Conversely, MIDI's philosophy is that of several machines working together to create great sound. A typical professional MIDI setup includes several devices, each with its own purpose. A MIDI buff might have a drum machine, a synthesizer, multiple instrument banks, a computer, and a few effects processors. All of these devices need to talk to each other, and MIDI was developed so they could.
At its core, MIDI is really just a special kind of networking protocol. It specifies how notes are stored, and how they're transmitted across wires to different devices. In particular, General MIDI specified a set of 128 different instruments, everything from flutes and violins to special effects like gunshots and helicopters. These 128 different instruments were called the General MIDI Level 1 Sound Set, or GM sound set for short. That way, when a MIDI synthesizer got a message saying, "Play a fifth octave B-flat at this volume using instrument #47," it would know what instrument had code 47, and could re-create the appropriate timbre. The GM sound set was sort of like the ASCII character set for MIDI.
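To give a feel for how small these messages are, here is a rough Python sketch that builds the bytes for "use instrument #47" and "play a fifth octave B-flat." The status bytes (0xC0 for Program Change, 0x90 for Note On) come from the MIDI 1.0 specification; everything else is an assumption made for illustration, including the helper names and the octave numbering, since the MIDI spec itself doesn't dictate which octave is "the fifth."

```python
B_FLAT = 10  # semitones above C


def program_change(channel, program):
    """Build the 2-byte message that selects an instrument on a channel.

    `program` is the 1-128 number used in GM charts; on the wire it is 0-127.
    """
    if not 1 <= channel <= 16 or not 1 <= program <= 128:
        raise ValueError("channel must be 1-16 and program 1-128")
    return bytes([0xC0 | (channel - 1), program - 1])


def note_on(channel, note, velocity):
    """Build the 3-byte message that starts a note at a given velocity."""
    if not 1 <= channel <= 16:
        raise ValueError("channel must be 1-16")
    if not 0 <= note <= 127 or not 0 <= velocity <= 127:
        raise ValueError("note and velocity must be 0-127")
    return bytes([0x90 | (channel - 1), note, velocity])


def note_number(pitch_class, octave):
    """Convert a pitch to a MIDI note number.

    Assumes the common (but not universal) convention that middle C,
    note 60, is called C4.
    """
    return 12 * (octave + 1) + pitch_class


# "Play a fifth octave B-flat at this volume using instrument #47."
select_instrument = program_change(channel=1, program=47)
play_note = note_on(channel=1, note=note_number(B_FLAT, 5), velocity=100)
```

Five bytes in total describe the instrument and the note. Nothing in them says what instrument #47 actually sounds like, which is the receiving device's job.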
It's important to note, however, that although the MIDI specification dictated which number went with which instrument, it didn't say a thing about how the instruments actually sounded. (Imagine your MIDI data as sheet music for a player piano. The song data, the sheet music, is the same sequence of note on/note off commands, but how good it sounds is directly related to the quality of the player piano that plays it.) The sound of the instruments was left up to the MIDI hardware manufacturer. Most professional hardware manufacturers sampled real instruments and stored the samples in ROM. Most consumer-level sound hardware used FM synth to re-create the GM sound set, because storing real samples in ROM would have made the cards too expensive for your average consumer (remember, this was back when a couple of megabytes of memory cost what half a gig does now). It was much more efficient to store the mathematical recipes for making "close enough" instrument approximations than to store the sampled instrument waves themselves.
The GM sound set was broken down into the categories shown in Table 1.1.
Numeric Range | Family |
---|---|
1–8 | Pianos |
9–16 | Chromatic Percussion |
17–24 | Organs |
25–32 | Guitars |
33–40 | Basses |
41–48 | Strings |
49–56 | Ensembles |
57–64 | Brass Instruments |
65–72 | Reed Instruments |
73–80 | Pipe Instruments |
81–88 | Synth Leads |
89–96 | Synth Pads |
97–104 | Synth Effects |
105–112 | Ethnic Instruments |
113–120 | Percussion Instruments |
121–128 | Sound Effects and Special Effects |
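Because each family in Table 1.1 covers exactly eight instrument numbers, finding the family for any instrument is a single integer division. The following Python sketch shows the idea; the `gm_family` name is made up for this example, and the family names are copied from the table.

```python
# Family names in the order they appear in Table 1.1; each covers 8 numbers.
GM_FAMILIES = [
    "Pianos",
    "Chromatic Percussion",
    "Organs",
    "Guitars",
    "Basses",
    "Strings",
    "Ensembles",
    "Brass Instruments",
    "Reed Instruments",
    "Pipe Instruments",
    "Synth Leads",
    "Synth Pads",
    "Synth Effects",
    "Ethnic Instruments",
    "Percussion Instruments",
    "Sound Effects and Special Effects",
]


def gm_family(program):
    """Return the family for a GM instrument number in the range 1-128."""
    if not 1 <= program <= 128:
        raise ValueError("GM instrument numbers run from 1 to 128")
    # Shift to a 0-based number, then divide into groups of eight.
    return GM_FAMILIES[(program - 1) // 8]
```

For example, `gm_family(47)` lands in the Strings family, since 47 falls in the 41–48 range.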
DLS1
So, in the nineties, all of these different approaches to game music were swimming together. MIDI and tracked music were the two most popular, and they both had their benefits and drawbacks.
MIDI was great because MIDI files were typically very small (less than 100KB), since the instruments of a song weren't stored with the song itself.
Unfortunately, this meant that good playback of a MIDI song depended significantly on the quality of the instruments. Professionals had instrument libraries with high-quality samples of the 128 General MIDI instruments, but everyone else (game players in particular) had shoddy approximations of those instruments, frequently using FM synth instead of sampled sound. This meant that a MIDI file would sound great when played through the (professional) setup of the composer, but would sound awful when played on the typical consumer (game player) hardware.
The alternative was tracked music. Tracked music was great because the instruments were contained directly in the song. This meant that the songs sounded the same regardless of the hardware that played them. Also, because tracked music used actual sound samples instead of FM synth, the quality was often better, and musicians could create their own instruments instead of being limited to the 128 MIDI ones. Some tracked music actually had lyrics created by making custom instruments for each word or phrase in the song!
The drawback was that these songs were huge (remember, 100–200KB was a lot back then: almost a quarter of a 3.5-inch disk's capacity, and that was for just one song). Also, they required more CPU power because the CPU had to calculate the appropriate sample pitches and mix all the data together itself.
Progress eventually saved the day. Once hardware got faster and storage space got bigger, the scene was set for tracked music and MIDI music to merge into one unstoppable super format.
In 1997, the MMA (MIDI Manufacturers Association) ratified the specification for Downloadable Sounds Level 1 (DLS1). DLS1 gave consumers real wave-table synthesis (that is, synthesizing instruments by using a table of wave samples to represent their notes).
DLS1 also took MIDI one step further by allowing each instrument to have multiple samples. The samples were broken out by region; you could have one "low note" sample for the low notes of a piano and another "high note" sample for upper octaves. In fact, you could have up to 16 different regions for each instrument. To do this same thing in tracked music required a little trickery.
DLS1 greatly advanced the popularity of MIDI as a game music format. The DLS spec included guidelines on how the instruments should sound, and all of a sudden, MIDI files played on consumer hardware began to sound very similar (and certainly close enough) to their professional hardware counterparts. Game music composers could now safely write MIDI knowing that what they heard would be very close to what the player heard. Further, musicians were no longer limited by the 128 GM instruments. They could sample whatever they wanted and include their custom instruments in the MIDI song, like the tracked music formats could.
Microsoft further solidified DLS's reign by including, as part of DirectX, a GM sound set of sampled instruments created by Roland (a highly respected manufacturer of musical instruments, MIDI keyboards, and MIDI software). Now, as long as game developers used that same DirectX software synthesizer, what they heard was what the player heard.
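The region idea described above is easy to picture as a lookup: each region covers a range of notes and points at one sample. Here is a minimal Python sketch of that lookup. It is not the DLS file format or any real API; the `Region` and `Instrument` classes, their field names, and the note ranges in the usage example are all hypothetical, and only the limit of 16 regions comes from the text.

```python
from dataclasses import dataclass

MAX_REGIONS = 16  # the per-instrument DLS1 limit mentioned in the text


@dataclass
class Region:
    low_note: int     # lowest MIDI note this region covers
    high_note: int    # highest MIDI note this region covers
    sample_name: str  # hypothetical label standing in for the wave data


class Instrument:
    def __init__(self, name, regions):
        if len(regions) > MAX_REGIONS:
            raise ValueError("an instrument can have at most 16 regions")
        self.name = name
        self.regions = list(regions)

    def region_for(self, note):
        """Return the first region whose note range contains `note`, or None."""
        for region in self.regions:
            if region.low_note <= note <= region.high_note:
                return region
        return None


# A piano split into a "low note" sample and a "high note" sample.
piano = Instrument("piano", [
    Region(0, 59, "piano_low"),
    Region(60, 127, "piano_high"),
])
```

With this setup, `piano.region_for(40)` picks the low sample and `piano.region_for(72)` picks the high one, so neither recording has to be stretched across the whole keyboard.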
DLS2
The MMA knew that when you had a hit, as they so clearly did with DLS, you made a sequel. In 1999 they ratified the DLS level 2 specification. DLS2 fixed a number of technical shortcomings in the DLS1 specification. I won't list them all here, but suffice it to say, DLS2 is better than DLS1, and there was much rejoicing.
DirectMusic 8 was the first version to support DLS level 2. Microsoft's synthesizer, included in DirectMusic, is DLS level 2 compliant and requires very little CPU usage, even on slower machines.
CD Audio (Redbook Audio)
I haven't talked about CD audio yet; I've decided to save the best for last. For commercial games (distributed on a CD), CD audio has been by far the music storage mechanism of choice.
Redbook audio allowed games published on CD to skip the whole MIDI/tracked music mess. Songs on CD sounded exactly the same no matter where they were played. The CD-ROM drive did most of the work associated with playing a track, so there was little CPU usage, and a CD held so much data (650MB) that storage space was a non-issue.
Tip | CD audio is commonly called Redbook audio because the original spec, from Sony and Philips, came in a red binder. It is the same specification that all audio CDs are written to. If you have a game stored on CD and you can play its music in your car's CD player, the game is using Redbook audio as its music format. |
Redbook Audio: Squeezed Out?
The first CD prototype was developed by Philips in 1979, and the first CD player was released by Sony in 1983 (and cost $1,000). The first album you could play on it was Billy Joel's 52nd Street. In 1983 there were 30,000 players and 80,000 discs in the US. Seven years later there were 9.2 million players and 288 million CDs.
The skyrocketing popularity of the CD, combined with the seemingly limitless storage capacity it made available to PCs, earned it the reputation of being the best way to store audio content for video games. However, nowadays, Redbook audio is becoming less attractive for game development. 650MB isn't nearly as much as it once was, and why use uncompressed Redbook audio tracks when you can deliver (virtually) the same audio experience using MP3s, for a tenth of the storage space? Also, CD music has no chance of being dynamic, whereas MIDI synthesis, or even sampled music snippets, can form the building blocks of dynamic soundtracks that change based on what's going on in the game.
For many games, Redbook audio remains a great choice, but it's lost a lot of ground lately to compressed and dynamic music scores. The constraints that once made Redbook audio a great idea are slowly becoming a thing of the past, and it remains to be seen whether Redbook audio will continue to be a popular format for games.
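The "tenth of the storage space" figure is easy to sanity-check with some quick arithmetic. Redbook audio is 44,100 samples per second, 16 bits per sample, in stereo. The MP3 bitrate of 128 kilobits per second used below is an assumption picked for illustration (a common setting, not a number from this chapter), and the function names are made up for the example.

```python
REDBOOK_RATE_HZ = 44_100       # samples per second, per channel
REDBOOK_BYTES_PER_SAMPLE = 2   # 16-bit samples
REDBOOK_CHANNELS = 2           # stereo


def redbook_bytes(seconds):
    """Size of uncompressed Redbook audio for a whole number of seconds."""
    return seconds * REDBOOK_RATE_HZ * REDBOOK_BYTES_PER_SAMPLE * REDBOOK_CHANNELS


def compressed_bytes(seconds, kbps):
    """Size of a constant-bitrate stream; kbps is kilobits (1,000 bits) per second."""
    return seconds * kbps * 1000 // 8


one_minute_cd = redbook_bytes(60)           # 10,584,000 bytes
one_minute_mp3 = compressed_bytes(60, 128)  # 960,000 bytes (assumed 128 kbps)
ratio = one_minute_cd / one_minute_mp3
```

Under that assumption, a minute of Redbook audio takes roughly 10.6 million bytes against under a million for the MP3, a ratio of about 11 to 1, which is in line with the rough "tenth" quoted above.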
So Where Does All This Leave Me?
Chapter 6, "MIDI/DirectMusic Playback," for details).
In the end, for commercial games, it usually comes down to a choice between MP3 (for static soundtracks) and DirectMusic or your own system for dynamic music. For online games, it's usually a choice between MP3 and tracked music.
At any rate, I hope you've enjoyed this whirlwind tour of game music through the ages. Your CD contains additional links to Web pages that can supplement what I've talked about here. Check them out!