17.1 PC Audio Types
Sound cards support two categories of
audio, waveform audio and MIDI audio, which are detailed in the following sections:
Waveform audio files, also called simply
sound files, store actual audio data. When you
record waveform audio, the sound card encodes the analog audio data
in digital format and stores it as a file. When you play waveform
audio, the sound card reads the digital audio data contained in the
file and converts it to analog audio, which is then reproduced on
speakers or headphones. Waveform audio files can store any type of
audio, including speech, singing, instrumental music, and sound
effects. The playback quality of waveform audio depends primarily on
how much detail was captured in the original recording and how much
of that data, if any, was lost when compressing the data before
storing it on disk. Uncompressed waveform audio files (such as .WAV
files) are large, requiring as much as 10 MB per minute of audio
stored. Compressed audio files may be 1/20 that size or smaller,
although high compression generally results in lower sound quality.
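The roughly 10 MB-per-minute figure follows directly from the CD audio parameters discussed later in this section (44,100 samples per second, 16-bit samples, two channels); a quick back-of-the-envelope sketch:

```python
# Size of one minute of uncompressed CD-quality PCM audio.
SAMPLE_RATE = 44_100   # samples per second (CD audio)
BYTES_PER_SAMPLE = 2   # 16-bit samples
CHANNELS = 2           # stereo
SECONDS = 60

size_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * SECONDS
print(f"{size_bytes / 1_000_000:.2f} MB per minute of CD audio")  # 10.58 MB
```

A 20:1 lossy compressor brings that same minute down to roughly 0.5 MB, which is why compressed formats dominate for casual listening.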
Rather
than storing actual audio data, Musical Instrument Digital
Interface (MIDI) files store
instructions that a sound card can use to create audio on the fly.
MIDI audio files store only instrumental music and sound effects, not
speech or singing. Originally used almost solely by professional
musicians, MIDI is now commonly used by games and other applications
for background music and sound effects, so MIDI support has become an
important sound card issue. Because MIDI sound is created
synthetically by the sound card, playback quality of MIDI files
depends both on the quality of the MIDI file itself and on the
features and quality of the MIDI support in the sound card. A MIDI
file of an orchestral concert, for example, may sound like a
child's toy when played by a cheap sound card, but
may closely resemble a CD recording when played by a high-end sound
card. MIDI audio files are small, requiring only a few KB per minute
of audio stored.
17.1.1 Waveform Audio
Waveform audio
files are created using a process called
sampling or digitizing to
convert analog sound to digital format. Sampling takes periodic
snapshots, or samples, of the instantaneous state of the analog
signal, encodes the data, and stores the audio in digital form. Just
as digital images can be stored at different resolutions according to
their intended use, audio data can be stored at different resolutions
to trade off sound quality against file size. Five parameters
determine the quality of digital sound files and how much space they
occupy:
Sample size specifies how much data is stored
for each sample. A larger sample size stores more information about
each sample, contributing to higher sound quality. Sample size is
specified as the number of bits stored for each sample. CD audio, for
example, uses 16-bit samples, which allow the waveform amplitude to
be specified as one of 65,536 discrete values. All sound cards
support at least 16-bit samples.
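The relationship between sample size and the number of discrete amplitude levels is simply 2 raised to the number of bits. The following sketch illustrates the quantization step (a simplified illustration of what an analog-to-digital converter does, not an actual codec):

```python
def quantize(amplitude, bits):
    """Map an analog amplitude in [-1.0, 1.0] to one of 2**bits
    integer levels, as an analog-to-digital converter does."""
    half = 2 ** (bits - 1)                   # 16 bits -> 32,768
    level = round(amplitude * (half - 1))
    return max(-half, min(half - 1, level))  # clamp to the legal range

print(2 ** 16)             # 65536 discrete values at 16 bits
print(quantize(0.25, 16))  # 8192
print(quantize(-1.0, 8))   # -127 (8-bit audio is far coarser)
```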
Sampling rate specifies how often samples are
taken. Sampling rate is specified in Hz (Hertz, or cycles/second) or
kHz (kilohertz, 1000 Hertz). Higher-frequency data inherently changes
more often. Changes that occur between samples are lost, so the
sampling rate determines the highest-frequency sounds that can be
sampled. At least two samples per cycle are required to capture a
frequency, so the highest
frequency that can be sampled, called the Nyquist
frequency, is half the sampling rate. For example, a
10,000 Hz sampling rate captures sounds no higher than 5,000 Hz. In
practice, the danger is that higher frequencies will be improperly
sampled, leading to distortion, so real-world implementations filter
the analog signal to cut off audio frequencies higher than some
arbitrary fraction of the Nyquist frequency; for example, by
filtering all frequencies higher than 4,500 Hz when using a 10,000 Hz
sampling rate. CD audio, for example, uses a 44,100 Hz sampling rate,
which provides a Nyquist frequency of 22,050 Hz, allowing
full-bandwidth response up to ~20,000 Hz after filtering. All sound
cards support at least 44,100 Hz sampling, and many support the
Digital Audio Tape (DAT)
standard of 48,000 Hz.
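The Nyquist relationship described above is just a division by two; a quick sketch:

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency a given sampling rate can theoretically capture."""
    return sample_rate_hz / 2

for rate in (10_000, 44_100, 48_000):
    print(f"{rate} Hz sampling -> {nyquist_frequency(rate):.0f} Hz Nyquist limit")

# Real hardware filters somewhat below the Nyquist limit; the example
# cutoff of 4,500 Hz is 90% of the 5,000 Hz Nyquist frequency.
print(0.9 * nyquist_frequency(10_000))  # 4500.0
```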
Sampling method specifies how samples are taken
and encoded. For example, Windows WAV files use
either Pulse Code Modulation
(PCM), a linear method that encodes the absolute
value of each sample as an 8-bit or 16-bit value, or
Adaptive Delta PCM (ADPCM),
which encodes 4-bit samples based on the differences (delta) between
one sample and the preceding sample. ADPCM generates smaller files,
but at the expense of reduced audio quality and the increased
processor overhead needed to encode and decode the data.
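The delta idea behind ADPCM can be shown with a much-simplified sketch. Real ADPCM also adapts its step size and packs each delta into 4 bits, which is what makes it lossy, but the core store-the-difference scheme looks like this:

```python
def delta_encode(samples):
    """Store the first sample, then only each sample's difference from
    the previous one -- the core idea behind ADPCM (much simplified)."""
    encoded = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        encoded.append(cur - prev)
    return encoded

def delta_decode(encoded):
    """Rebuild the original samples by accumulating the deltas."""
    samples = [encoded[0]]
    for delta in encoded[1:]:
        samples.append(samples[-1] + delta)
    return samples

pcm = [1000, 1004, 1003, 995, 990]   # absolute sample values, PCM-style
print(delta_encode(pcm))             # [1000, 4, -1, -8, -5]
assert delta_decode(delta_encode(pcm)) == pcm
```

Because adjacent samples are usually close in value, the deltas are small numbers that need fewer bits to store than the absolute values do.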
Recording format specifies how data is
structured and encoded within the file and what means of compression,
if any, is used. Common formats, indicated by filename extensions,
include WAV (Windows audio);
AU (Sun audio format, commonly used by Unix
systems and on the Internet); AIFF or
AIF (Audio Interchange File Format, used by
Apple and SGI); RA (RealAudio, a proprietary
streaming audio format); MP3 (MPEG-1 Layer 3);
and OGG (Ogg Vorbis). Some formats use
lossless compression, which provides lower
compression ratios, but allows all the original data to be recovered.
Others use lossy compression, which sacrifices
some less-important data in order to produce the smallest possible
file sizes. Some, such as PCM WAV, do not compress the data at all.

Compressed formats, such as MP3 and OGG, may use fixed
bitrate (FBR) compression (also called constant
bitrate [CBR] compression), variable bitrate
(VBR) compression, or both (although not at the same
time). FBR compresses each second of source material to the same
amount of disk space, regardless of the contents of that material.
For example, after FBR compression, 10 seconds of silence occupies
the same amount of disk space as 10 seconds of complex chamber music.
VBR dynamically varies compression ratio according to the complexity
of the material being compressed. For example, after VBR compression,
10 seconds of silence may occupy only a few bytes of disk space,
while 10 seconds of chamber music may occupy many kilobytes. VBR
typically provides better sound quality than FBR for a given file
size because VBR uses space more efficiently.

Either compression type may use selectable compression ratios or a
fixed ratio. For example, standard MP3 uses FBR compression, but most
MP3 compressors allow you to select among various fixed bitrates,
typically from 64 kilobits/second (kb/s) to 320 kb/s. FBR compression
is exact. If you compress 100 seconds of audio at 256 kb/s, the
resulting file always occupies 25,600 kilobits. Conversely, VBR
compression is approximate because compression varies with the
complexity of the source material. If you use VBR to compress 100
seconds of audio at a nominal 256 kb/s, the resulting file will
probably occupy about 25,600 kilobits, but can be larger or smaller
depending on how easily the source material could be compressed.

Some VBR applications use an arbitrary number to specify compression
ratio. For example, Ogg Vorbis allows you to specify quality on a
scale of 0 through 10, where 0-quality is roughly equivalent to 64
kb/s FBR, 5-quality to 160 kb/s FBR, and 10-quality to 400 kb/s FBR.
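The fixed-bitrate arithmetic above is easy to sketch; the Ogg quality table simply restates the rough FBR equivalents the text gives:

```python
def fbr_size_kilobits(bitrate_kbps, seconds):
    """FBR output size is exact: bitrate times duration."""
    return bitrate_kbps * seconds

print(fbr_size_kilobits(256, 100))   # 25600 kilobits, always

# Rough FBR equivalents of selected Ogg Vorbis quality settings
OGG_QUALITY_KBPS = {0: 64, 5: 160, 10: 400}
for quality, kbps in OGG_QUALITY_KBPS.items():
    print(f"Ogg quality {quality} ~ {kbps} kb/s FBR")
```

A VBR file at a nominal 256 kb/s would land near the same 25,600-kilobit figure, but only on average.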
Number of channels specifies how many audio
streams are stored. Depending on the recording setup, one channel
(monaural or mono sound),
two channels (stereo sound), or more can be
recorded. Additional channels provide audio separation, which
increases the realism of the sound during playback. Various formats
store one, two, four, or five audio channels. Some formats store only
two channels, but with additional data that can be used to simulate
additional channels.
Table 17-1 lists the three standard Windows
recording modes for PCM WAV, which is the most common uncompressed
waveform audio format, and representative MP3 and OGG modes. MP3 at
256 kb/s uses a little more storage than Windows' AM
radio mode, but produces sound files that are nearly CD quality.
OGG-3 produces files that average about 17.5% smaller than 128 kb/s
MP3 files, but provide superior sound quality. OGG-5 produces files
that average about 40% smaller than 256 kb/s MP3 files, but provide
comparable sound quality. OGG-10 produces files that average about
one-third the size of uncompressed .WAV files, but provide sound
quality that to our ears is indistinguishable from the original CD
audio, even when played on a high-quality home audio system. MP3 and
OGG bitrates are approximate.
17.1.2 MIDI Audio
A
MIDI file is the digital equivalent of sheet music. Rather than
containing actual audio data, a MIDI file contains detailed
instructions for creating the sounds represented by that file. And,
just as the same sheet music played by different musicians can sound
different, the exact sounds produced by a MIDI file depend on which
sound card you use to play it.
MIDI was developed in the early 1980s,
originally as a method to provide a standard interface between
electronic music keyboards and electronic sound generators such as
Moog synthesizers. A MIDI interface supports 16 channels, allowing up
to 16 instruments or groups of instruments (selected from a palette
of 128 available instruments) to play simultaneously. MIDI interfaces
can be stacked. Some MIDI devices support 16 or more interfaces
simultaneously, allowing 256 or more channels.

The MIDI
specification defines both a serial communication protocol and the
formatting of the MIDI messages transferred via that protocol. MIDI
transfers 8-bit data at 31,250 bps over a 5 mA current loop, using
optoisolators to electrically isolate MIDI devices from each other.
All MIDI devices use a standard 5-pin DIN connector, but the MIDI
port on a sound card is simply a subset of the pins on the standard
DB-15 gameport connector (see Chapter 21). That
means a gameport-to-MIDI adapter is needed to connect a sound card to
an external MIDI device such as a MIDI keyboard.

MIDI messages are simply strings of binary
bytes encoded to represent the important characteristics of a musical
score, including instrument to be used, note to be played, volume,
and so on. MIDI messages usually comprise a status byte followed by
one, two, or three data bytes, but a MIDI feature called
Running Status allows any number of additional
bytes received to be treated as data bytes until a second status byte
is received. Here are the functions of those byte types:
MIDI messages always begin with a status byte,
which identifies the type of message and is flagged as a status byte
by having the high-order bit set to 1. The most significant
(high-order) four bits (nibble) of this byte define the action to be
taken, such as a command to turn a note on or off or to modify the
characteristics of a note that is already playing. The least
significant nibble defines the channel to which the message is
addressed, which in turn determines the instrument to be used to play
the note. Although represented in binary as a 4-bit value between 0
and 15, channels are actually designated 1 through 16.
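Splitting a status byte into its command and channel nibbles is simple bit arithmetic; a sketch (0x9 is the standard Note On command nibble, 0x8 Note Off):

```python
def parse_status_byte(byte):
    """Split a MIDI status byte into its command and channel nibbles."""
    if not byte & 0x80:
        raise ValueError("not a status byte (high bit is 0)")
    command = byte >> 4            # high nibble: message type
    channel = (byte & 0x0F) + 1    # low nibble 0-15 maps to channel 1-16
    return command, channel

print(parse_status_byte(0x93))    # (9, 4): Note On, channel 4
print(parse_status_byte(0x80))    # (8, 1): Note Off, channel 1
```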
A data byte is flagged as such by having its
high-order bit set to zero, which limits it to communicating 128
states. What those states represent depends on the command type of
the status byte. When it follows a Note On command, for example, the
first data byte defines the pitch of the note. Assuming standard
Western tuning (A=440 Hz), this byte can assume any of 128 values,
from a low C (8.18 Hz) to a high G (12,543.85 Hz). The second data
byte specifies velocity, or how hard the key was
pressed, which corresponds generally to volume, depending on the MIDI
device and instrument. The note continues playing until a status byte
with a Note Off command for that note is received, although it may,
under programmatic control, decay to inaudibility in the
interim.
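A complete Note On message, together with the standard equal-temperament formula that maps the pitch byte to a frequency (MIDI note 69 is the A above middle C, 440 Hz), can be sketched as:

```python
def note_frequency(note_number, a4=440.0):
    """Frequency of a MIDI note number (0-127) in standard equal
    temperament, where note 69 is A above middle C (440 Hz)."""
    return a4 * 2 ** ((note_number - 69) / 12)

# A complete three-byte Note On message: status, pitch, velocity
note_on = bytes([0x90, 69, 100])   # Note On, channel 1, A4, velocity 100

print(round(note_frequency(note_on[1]), 1))   # 440.0
print(round(note_frequency(0), 3))            # 8.176 (lowest MIDI note, a C)
print(round(note_frequency(127), 2))          # 12543.85 (highest MIDI note, a G)
```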