6.2 Oracle and Disk I/O Resources
From the perspective of machine resources, an input/output
operation, or I/O, can be defined as the operating system of the
computer reading or writing some bytes from or to the underlying disk
subsystem of the database server. I/Os can be small, such as 4 KB of
data, or large, such as 64 KB or 128 KB of data. The lower and upper
limits on the size of an I/O operation vary according to the
operating system. Your Oracle database also has a block size that you
can define; we refer to this as the database block
size.
An Oracle database issues I/O requests in two basic sizes:
Single database block I/Os
For example, one 8 KB data block at a time. This type of request reads
or writes a specific block. For example, after looking up a row in an
index, Oracle uses a single block I/O to retrieve the desired
database block.
Multiblock I/Os
For example, 32 database blocks, each consisting of 8 KB, for a total
I/O size of 256 KB. Multiblock I/O is used for large-scale
operations, such as full table scans. The number of blocks in one
multiblock I/O is determined by the
initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT.
The Oracle database can read larger amounts of data with multiblock
I/Os, so there are times when a full table scan might actually
retrieve data faster than an index-based retrieval (e.g., if the
selectivity of the index is low). Oracle can perform multiblock
operations faster than the corresponding collection of single-block
operations.
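The relationship between these sizes is simple arithmetic: the multiblock I/O size is the database block size multiplied by DB_FILE_MULTIBLOCK_READ_COUNT. A minimal sketch (the parameter values shown are illustrative examples, not recommendations):

```python
# Illustrative calculation of Oracle's multiblock I/O size.
# These parameter values are example settings, not recommendations.
DB_BLOCK_SIZE = 8 * 1024            # 8 KB database block size, in bytes
DB_FILE_MULTIBLOCK_READ_COUNT = 32  # blocks read per multiblock I/O

multiblock_io_bytes = DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT
print(multiblock_io_bytes // 1024, "KB")  # → 256 KB per multiblock I/O
```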
6.2.1 I/O Planning Principles for an Oracle Database
When you're planning
the disk layout and subsequent placement of
the various files that make up your database, you need to consider
the different reasons Oracle performs I/O and the potential
performance impacts.
The main destinations of the I/O operations Oracle performs are the
following:
Redo logs
Data contained in tables
Indexes on the tables
The data dictionary, which goes in the SYSTEM tablespace
Sort activity, which goes in the TEMP tablespace of the user
performing the sort
Rollback information, which is spread across the datafiles of the
tablespace containing the database's rollback segments
Archived redo logs, which go to the archived log destination
(assuming the database is in ARCHIVELOG mode)
The following simple principles for managing these types of I/O can
optimize Oracle's use of the database
server's disk subsystem:
Use disk-striping technologies to spread I/O evenly across multiple spindles
These technologies are covered in detail in Section 6.2.2.
Use tablespaces to clearly segregate and target different types of I/O
Using tablespaces to segregate objects simplifies tuning later on.
Oracle implements I/O activity at the level of the datafile, or the
physical object the operating system sees as a file. Placing specific
objects in specific tablespaces allows you to accurately measure and
direct the I/O for those objects by tracking and moving the
underlying datafiles as needed. For example, consider a database with
several large, busy tables. Placing multiple large tables in a single
tablespace makes it difficult to determine which table is causing the
I/O to the underlying datafiles. Segregating the objects allows you
to directly monitor the I/O associated with each object. Your Oracle
documentation details the other factors to consider in mapping
objects to tablespaces.
Place redo logs and redo log mirrors on the two least-busy devices
This placement maximizes throughput
for transactional systems. Oracle writes to all copies of the redo
log file, and this I/O is not completed until all copies have been
successfully written to. If you have two copies of the redo log file,
one on a slow device and the other on a fast device, your redo log
I/O performance will be constrained by the slower device.
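The constraint described above can be expressed as a simple maximum: a redo write completes only when the slowest log member has been written. A small sketch with hypothetical device latencies:

```python
# A redo write is not complete until every copy of the redo log has
# been written, so the effective write latency is the slowest
# member's latency. The latencies below (in ms) are hypothetical.
member_latencies_ms = {"fast_device": 2.0, "slow_device": 11.0}

effective_write_latency_ms = max(member_latencies_ms.values())
print(effective_write_latency_ms)  # → 11.0: the slow device dominates
```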
Distribute "system overhead" evenly over the available drives
System overhead consists of I/O to
the SYSTEM tablespace for the data dictionary, the TEMP tablespace
for sorting, and the tablespaces that contain rollback segments for
undo information. You should consider the system profile in spreading
the system overhead over multiple drives. For example, if the
application generates a lot of data changes versus data reads, the
I/O to the rollback segments may increase due to higher writes for
changes and higher reads for consistent read functionality.
Sort activity can also affect disk I/O. If you can get the majority
of sorts to occur in memory through tuning the SORT_AREA_SIZE
parameter in the initialization file, you can then minimize physical
I/O to the TEMP tablespace. Oracle constantly queries and updates the
data dictionary stored in the SYSTEM tablespace. This information is
cached in the shared pool section of the SGA, so sizing your shared
pool properly is key to overall performance. With Oracle Database
10g, you can allow Oracle to automatically and
dynamically size the different pools in the SGA.
Use a different device for archiving and redo log files
To avoid archiving performance issues due
to I/O contention, make sure that the archive log destination uses
different devices from those used for the redo logs and redo log
mirrors.
Some other principles to consider from the perspective of database
availability include the following:
If taking database backups to disk, store the backups on devices that don't contain any database components
This protects the system from a single I/O device failure wiping out
both the database and the backups needed to restore it.
Make sure the device used for the archive log destination doesn't contain any database components or database backups
If the failure of a single device results
in the loss of both database components and archived redo logs, or
backup components and archived redo logs, recovery will be
endangered.
6.2.2 Using RAID Disk Array Technology
One
of the most powerful ways to reduce performance bottlenecks due to
disk I/O is the use of RAID disk arrays. RAID stands for
Redundant Array of Inexpensive (or Independent) Disks and is used to
group disks into arrays for two reasons: redundancy and performance.
The use of RAID for redundancy is detailed in Chapter 10. Our focus
in this chapter is on the performance aspects of RAID technology.
Striping data across the disks in an array spreads I/O
operations across multiple spindles, reducing contention on
individual drives. For example, suppose you place a datafile
containing an index on a single drive. If multiple processes use the
index simultaneously, they will all issue I/O requests to the one
disk drive, resulting in contention for the use of that drive.
Instead, suppose you place the same datafile on a
"disk" that is actually an array of
five physical disks. Each physical disk in the array can perform I/O
operations independently on different data blocks of the index,
automatically increasing the amount of I/O Oracle can perform without
causing contention.
Simply using disk arrays won't, by itself, give you
optimal I/O performance. As discussed earlier, you also have to think
about how to logically place the different types of Oracle files
across the available drives, even if the drives are grouped into
arrays.
Disk-striping technology for RAID arrays can be implemented in
various ways, as described in the following sections. All of these
are transparent to your Oracle database. Oracle Database 10g
striping considerations are made simpler through the
introduction of Automatic Storage
Management (ASM). ASM provides automatic striping and rebalancing of
stripe sets. By default, ASM also provides automated mirroring.
6.2.2.1 Host-based software
With
host-based striping, logical volume-management software runs on the
database server. Examples of this type of software include
Hewlett-Packard's Logical Volume Manager (LVM) and Veritas
Software's Volume Manager, which act as an interface
between the operating system that requests I/O and the underlying
physical disks. The volume-management software groups disks
into arrays, which are then seen by the operating system as single
"disks." The actual disks are
usually individual devices attached to controllers or disks contained
in a prepackaged array containing multiple disks and controllers.
This striping is handled by the volume-management software and is
completely transparent to Oracle. Figure 6-1 illustrates host-based
volume management.
Figure 6-1. Host-based volume management

Oracle9i Release 2 introduced its own volume
manager for Linux and Windows with the Real Application Clusters
option. Oracle Database 10g includes a portable
cluster filesystem and volume manager in the database that is
leveraged by ASM.
6.2.2.2 Dedicated storage subsystems
Dedicated storage systems, often
referred to as disk farms, contain
disks, controllers, CPUs, and
(usually) memory used as an I/O cache. Vendors include
EMC, Hewlett-Packard
(formerly Compaq), and StorageTek. These subsystems offload the task
of managing the disk arrays from the database server. The I/O
subsystem is attached to the server using controllers. These
dedicated storage devices are sometimes grouped into
storage area networks (SANs) to denote their logical
organization as a separate set of networked devices. The disk arrays
are defined and managed within the dedicated I/O subsystem, and the
resulting logical "disks" are seen
by the operating system as physical disks.
This type of disk-volume management is completely transparent to the
database server and offers many benefits:
The database server does not spend CPU resources managing the disk
arrays.
The I/O subsystem uses memory for an I/O cache, so the performance of
Oracle I/O can improve dramatically (for example, from an average I/O
time of 10-12 milliseconds to 3-5 milliseconds).
Write I/O is completed as soon as the data has been written to the
subsystem's cache. The I/O subsystem will de-stage the data from
cache to actual disk later.
Read I/O can be satisfied from the cache. The subsystem can employ
some type of algorithm to sense I/O patterns and preload the cache in
anticipation of pending read activity.
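The dramatic improvement cited above follows from weighted-average arithmetic: the higher the fraction of reads satisfied from the subsystem's cache, the closer the average latency gets to cache speed. A sketch using hypothetical latency figures:

```python
# Hypothetical latencies: physical disk read ~12 ms, cache read ~1 ms.
def avg_latency_ms(cache_hit_ratio, cache_ms=1.0, disk_ms=12.0):
    # Weighted average over cache hits and physical disk reads.
    return cache_hit_ratio * cache_ms + (1 - cache_hit_ratio) * disk_ms

print(round(avg_latency_ms(0.0), 1))  # no cache hits: 12.0 ms
print(round(avg_latency_ms(0.7), 1))  # 70% hit ratio: 4.3 ms
```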
Note that you must back up the cache with
some type of battery so a power failure doesn't
result in the loss of data that was written to the cache, but
hasn't yet been de-staged to the physical disk.
Otherwise, data that Oracle assumes made it to disk may be lost,
thereby potentially corrupting the database. Figure 6-2 illustrates a
database server with a dedicated I/O subsystem.
Figure 6-2. Dedicated I/O subsystems

6.2.2.3 Combined host-based and I/O subsystem volume management
In this configuration, disks are
grouped into arrays within the I/O subsystem and grouped again into
coarser arrays using operating system volume management. On EMC
systems, for example, the physical disks are grouped into either
RAID-1 mirrored disk pairs or into a RAID-S
striped
configuration using four disks per stripe set.
RAID-S is the term EMC
(http://www.emc.com) uses for their specialized
striping hardware and software.
Using EMC technology as an example, the operating system sees
horizontal sections of disk space across each RAID-1 disk or RAID-S
array as single "disks." You can
use the operating system volume management to group these
"disks" into arrays. With RAID-1
disks, this configuration delivers the benefits of using a dedicated
I/O subsystem with its own cache and processing power while
leveraging striping for simplicity. With RAID-S arrays you get the
benefit of the dedicated I/O subsystem and further simplify disk
management by a striping multiplier effect. An array of five
"disks" at the operating system
level could map back to five arrays of four disks each in the I/O
subsystem. This configuration maps a logical disk seen by Oracle to
20 physical disks in the underlying I/O subsystem. Figure 6-3
illustrates a logical drive on the database server mapping to
horizontal sections across multiple RAID-S arrays.
Figure 6-3. Combining host-based striping and an EMC I/O subsystem

6.2.3 Flexibility, Manageability, and Disk Arrays
Many systems today use some type of RAID technology that groups
multiple individual disk drives, also referred to as
spindles,
into arrays. Each disk array is then treated as a single
logical disk for the purpose of planning I/O.
Striping allows you
to simply spread I/O across multiple disks, without incurring the
planning and administrative overhead of dealing with many individual
disk drives. Striping is transparent to Oracle, but this
doesn't mean that a DBA or system administrator can
ignore it. Understanding the mapping of Oracle database files to
arrays, and the arrays to the underlying disks and controllers, is
crucial to planning and maintaining your Oracle database.
The decision about how many disks should be in each array is often
the topic of intense debate. At one extreme, using multiple disks
without grouping any of them into arrays provides the most control
and flexibility because every disk is visible and can be targeted in
isolation by placing certain files on each disk. However, this
approach requires more planning and can result in more ongoing
administration, because you will have to deal with every individual
disk drive. As databases become larger and larger, this approach
becomes unmanageable.
At the other extreme, you can group all disks into one single array,
seen by the operating system and Oracle as a single
"disk." This makes for extremely
simple planning and administration; no effort is required to analyze
where you should place the various files, as there is only one
"disk." However, this approach
sacrifices flexibility and leads to brute force solutions to I/O
bottlenecks. If I/O performance across the array is unsatisfactory,
the solution is to add more controllers and disks. The entire set of
disks becomes a black box that either works or
doesn't work as a unit.
The most useful configuration is one that balances manageability with
flexibility. For example, consider a system with 1,000 disks. Neither
a single array of 1,000 disks nor a set of 1,000 individual disks is
likely to be appropriate. Perhaps 50 arrays of 20 disks each would
provide the needed I/O performance without any undue administrative
burden. If less flexibility is needed, 20 arrays of 50 disks each may be more
suitable. On the other hand, grouping all the disks into one array
may be the simplest way to manage a system with only five disks. For
the "right" answer, you must assess
your needs to determine the appropriate balance.
Oracle Database 10g
simplifies this debate by automating the striping and
stripe set rebalancing process. ASM divides files into 1 MB extents
and spreads the extents evenly across each disk group. Pointers are
used to track placement of each extent (instead of using a
mathematical function such as a hashing algorithm to stripe the
data). So when the disk group configuration changes, individual
extents can be moved. In comparison to traditional algorithm-based
striping techniques, the need to rerun that algorithm and reallocate
all of the data is eliminated. Extent maps are updated when
rebalancing the load after a change in disk configuration, opening a
new database file, or extending a database file by enlarging a
tablespace. By default, each 1 MB extent is also mirrored, so
management of redundancy is also simplified. Mirroring can be
extended to triple mirroring or can be turned off. Although you still
have to consider how many disk groups to use, implementation of these
groups with striping and redundancy is automated with ASM.
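The pointer-based placement described above can be modeled in a few lines: each 1 MB extent gets an explicit map entry, so a configuration change only requires moving individual extents rather than rehashing all data. This is a simplified illustration, not ASM's actual algorithm:

```python
# Simplified model of ASM-style extent placement: a file is divided
# into 1 MB extents, and each extent's location is tracked in an
# explicit extent map (a list of pointers) rather than derived from
# a hash function.
EXTENT_MB = 1

def place_file(file_mb, disks):
    # Spread extents evenly (round-robin here) across the disk group.
    return [disks[i % len(disks)] for i in range(file_mb // EXTENT_MB)]

extent_map = place_file(10, ["disk1", "disk2"])
# Rebalancing after adding a disk moves individual extents by updating
# map entries -- no global re-striping of all data is needed.
extent_map[0] = "disk3"
print(extent_map[:3])
```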
6.2.4 How Oracle I/O and Striped Arrays Interact
In almost all large databases,
some type of disk striping increases disk I/O rates without adding
too heavy an administrative burden for managing a large number of
datafiles across many individual disks. The disks may be organized
into RAID arrays using a volume manager on the database server, a
dedicated I/O subsystem, or a combination of both.
If you are using an Oracle release without ASM, when you set up
striped
disk arrays, you can set the chunk size used to
stripe across the disks. The chunk size is the amount of data written
to one disk before moving to the next disk in the array.
Understanding the interaction between different stripe chunk sizes
and the two sizes of Oracle I/O is critical in maximizing your I/O
performance.
Consider an Oracle database with an 8 KB data block size and the
DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter set to 32.
There will be two sizes of I/O by Oracle: a single 8 KB data block
and a 256 KB multiblock read (32 times 8 KB). Suppose you then
configure a four-disk array for use by Oracle with a chunk size of 64
KB so that the 256 KB of data will be spread across the four drives,
with 64 KB on each.
Each 8-KB I/O will hit one spindle, as the 8 KB will lie within one
64-KB chunk.[2] Striping can increase performance for small
I/Os by maximizing concurrency: each disk can service a different
I/O. The multiblock I/Os of 256 KB may hit all four disks. If the
chunk size were 256 KB instead of 64 KB, on average each 256-KB I/O
call would hit one disk. In this case, the multiblock I/O will
require fewer I/O calls with a larger chunk size on the disks. In
either case, a single disk will clearly satisfy single-data-block I/O
calls. Striping can increase I/O rates for large reads by driving
multiple disks with a single I/O call, as illustrated with a 64-KB
chunk size and a 256-KB multiblock I/O.
[2] It's difficult to say
exactly what will occur due to the alignment of the stripe-chunk
boundaries with Oracle data blocks, but to illustrate the single
versus multiple disk point, let's assume the simple
case: they line up! For a more detailed discussion of striping
issues, see the document "Configuring Oracle Server
for VLDB," by Cary Millsap of hotsos.com, formerly
with Oracle Corporation (see Appendix B). Anyone who is tempted is
welcome to perform detailed testing for all the permutations of
stripe chunk size and Oracle I/O. If you happen to perform this
extensive testing, please tell all the rest of us what you
find!
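Under the footnote's simplifying alignment assumption, the number of spindles a single I/O touches is just the I/O size divided by the chunk size, rounded up and capped at the array width. A rough sketch of the scenarios discussed above:

```python
import math

def disks_touched(io_bytes, chunk_bytes, disks_in_array):
    # Assuming the I/O starts on a chunk boundary (the simple aligned
    # case), it spans ceil(io/chunk) chunks, each on a different
    # spindle, up to the width of the array.
    return min(math.ceil(io_bytes / chunk_bytes), disks_in_array)

KB = 1024
print(disks_touched(8 * KB, 64 * KB, 4))     # single-block read: 1 disk
print(disks_touched(256 * KB, 64 * KB, 4))   # multiblock read: 4 disks
print(disks_touched(256 * KB, 256 * KB, 4))  # larger chunk: 1 disk
```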
Figure 6-4 illustrates the interaction of different-sized Oracle I/Os
with arrays striped using different chunk sizes.
Figure 6-4. Oracle I/O and chunk size interaction
