Oracle Essentials: Oracle Database 10g, 3rd Edition

Jonathan Stern


6.2 Oracle and Disk I/O Resources

From the perspective of machine resources, an input/output operation, or I/O, can be defined as the operating system reading bytes from, or writing bytes to, the underlying disk subsystem of the database server. I/Os can be small, such as 4 KB of data, or large, such as 64 KB or 128 KB of data. The lower and upper limits on the size of an I/O operation vary according to the operating system. Your Oracle database also has a block size that you can define, which we refer to as the database block size.

An Oracle database issues I/O requests in two basic sizes:

Single database block I/Os

For example, one 8 KB database block at a time. This type of request reads or writes a specific block: after looking up a row in an index, Oracle uses a single-block I/O to retrieve the desired database block.

Multiblock I/Os

For example, 32 database blocks, each consisting of 8 KB, for a total I/O size of 256 KB. Multiblock I/O is used for large-scale operations, such as full table scans. The number of blocks in one multiblock I/O is determined by the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT.

The Oracle database can read larger amounts of data with multiblock I/Os, so there are times when a full table scan might actually retrieve data faster than an index-based retrieval (e.g., if the selectivity of the index is low). Oracle can perform multiblock operations faster than the corresponding collection of single-block operations.
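You can check both quantities directly. The following query is a minimal sketch using the standard V$PARAMETER view; the values in the closing comment assume an 8 KB block size and a multiblock read count of 32.

    -- Report the database block size, the multiblock read count, and
    -- the resulting size of one multiblock I/O in bytes.
    SELECT MAX(DECODE(name, 'db_block_size', TO_NUMBER(value)))
             AS block_size_bytes,
           MAX(DECODE(name, 'db_file_multiblock_read_count', TO_NUMBER(value)))
             AS multiblock_count,
           MAX(DECODE(name, 'db_block_size', TO_NUMBER(value))) *
           MAX(DECODE(name, 'db_file_multiblock_read_count', TO_NUMBER(value)))
             AS multiblock_io_bytes
    FROM   v$parameter
    WHERE  name IN ('db_block_size', 'db_file_multiblock_read_count');
    -- With an 8 KB block size and a count of 32, multiblock_io_bytes
    -- is 262,144 (256 KB).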

6.2.1 I/O Planning Principles for an Oracle Database

When you're planning the disk layout and subsequent placement of the various files that make up your database, you need to consider the different reasons Oracle performs I/O and the potential performance impacts.

The main destinations of the I/O operations Oracle performs are the following:

Redo logs

Data contained in tables

Indexes on the tables

The data dictionary, which goes in the SYSTEM tablespace

Sort activity, which goes in the TEMP tablespace of the user performing the sort

Rollback information, which is spread across the datafiles of the tablespace containing the database's rollback segments

Archived redo logs, which go to the archived log destination (assuming the database is in ARCHIVELOG mode)

The following simple principles for managing these types of I/O can optimize Oracle's use of the database server's disk subsystem:

Use disk-striping technologies to spread I/O evenly across multiple spindles

These technologies are covered in detail in Section 6.2.2.

Use tablespaces to clearly segregate and target different types of I/O

Using tablespaces to segregate objects simplifies tuning later on. Oracle implements I/O activity at the level of the datafile, or the physical object the operating system sees as a file. Placing specific objects in specific tablespaces allows you to accurately measure and direct the I/O for those objects by tracking and moving the underlying datafiles as needed. For example, consider a database with several large, busy tables. Placing multiple large tables in a single tablespace makes it difficult to determine which table is causing the I/O to the underlying datafiles. Segregating the objects allows you to directly monitor the I/O associated with each object. Your Oracle documentation details the other factors to consider in mapping objects to tablespaces.
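As a sketch of this approach (the table, tablespace, and file path here are hypothetical), you can dedicate a tablespace to one busy table and then attribute I/O on its datafile through the V$FILESTAT view:

    -- Dedicate a tablespace, and therefore a datafile, to one busy table.
    CREATE TABLESPACE orders_data
      DATAFILE '/u03/oradata/orcl/orders_data01.dbf' SIZE 2000M;

    CREATE TABLE orders (
      order_id    NUMBER PRIMARY KEY,
      customer_id NUMBER,
      order_date  DATE
    ) TABLESPACE orders_data;

    -- Physical reads and writes to this datafile now belong to this table.
    SELECT d.name, f.phyrds, f.phywrts, f.readtim, f.writetim
    FROM   v$datafile d JOIN v$filestat f ON f.file# = d.file#
    WHERE  d.name LIKE '%orders_data%';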

Place redo logs and redo log mirrors on the two least-busy devices

This placement maximizes throughput for transactional systems. Oracle writes to all copies of the redo log file, and this I/O is not completed until all copies have been successfully written to. If you have two copies of the redo log file, one on a slow device and the other on a fast device, your redo log I/O performance will be constrained by the slower device.
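A minimal sketch of this placement (the paths and size are hypothetical): give each redo log group one member on each of the two least-busy devices.

    -- Both members must be written before the redo I/O completes,
    -- so both devices should be equally fast and equally idle.
    ALTER DATABASE ADD LOGFILE GROUP 4
      ('/u01/oradata/orcl/redo04a.log',
       '/u02/oradata/orcl/redo04b.log') SIZE 100M;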

Distribute "system overhead" evenly over the available drives

System overhead consists of I/O to the SYSTEM tablespace for the data dictionary, the TEMP tablespace for sorting, and the tablespaces that contain rollback segments for undo information. You should consider the system profile in spreading the system overhead over multiple drives. For example, if the application generates a lot of data changes versus data reads, the I/O to the rollback segments may increase due to higher writes for changes and higher reads for consistent read functionality.

Sort activity can also affect disk I/O. If you can get the majority of sorts to occur in memory through tuning the SORT_AREA_SIZE parameter in the initialization file, you can then minimize physical I/O to the TEMP tablespace. Oracle constantly queries and updates the data dictionary stored in the SYSTEM tablespace. This information is cached in the shared pool section of the SGA, so sizing your shared pool properly is key to overall performance. With Oracle Database 10g, you can allow Oracle to automatically and dynamically size the different pools in the SGA.
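To see whether sorts are in fact spilling to disk, you can compare the two sort counters in V$SYSSTAT; the SGA_TARGET statement sketches the Oracle Database 10g automatic SGA sizing mentioned above (the size shown is illustrative):

    -- Compare in-memory sorts against sorts that spilled to TEMP.
    SELECT name, value
    FROM   v$sysstat
    WHERE  name IN ('sorts (memory)', 'sorts (disk)');

    -- A nonzero SGA_TARGET lets Oracle Database 10g size the shared pool
    -- and the other SGA pools automatically and dynamically.
    ALTER SYSTEM SET sga_target = 800M;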

Use a different device for archiving and redo log files

To avoid archiving performance issues due to I/O contention, make sure that the archive log destination uses different devices from those used for the redo logs and redo log mirrors.
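For example, the archive log destination can be pointed at a device that holds neither the redo logs nor their mirrors (the path is hypothetical):

    -- Archived redo logs go to a device separate from the online redo logs.
    ALTER SYSTEM SET log_archive_dest_1 = 'LOCATION=/u05/oradata/orcl/arch';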

Some other principles to consider from the perspective of database availability include the following:

If taking database backups to disk, store the backups on devices that don't contain any database components

This protects the system from a single I/O device failure destroying both the database and the backups needed to restore it.

Make sure the device used for the archive log destination doesn't contain any database components or database backups

If the failure of a single device results in the loss of both database components and archived redo logs, or backup components and archived redo logs, recovery will be endangered.
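In Oracle Database 10g, one way to direct disk backups to a dedicated location is the flash recovery area; the following is a brief sketch, assuming a dedicated device mounted at a hypothetical path:

    -- The size must be set before the destination. Keep this device free of
    -- datafiles, redo logs, and archived logs, so that one device failure
    -- cannot destroy both the database and its backups.
    ALTER SYSTEM SET db_recovery_file_dest_size = 100G;
    ALTER SYSTEM SET db_recovery_file_dest = '/u07/flash_recovery_area';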

Fault-tolerant disk arrays don't eliminate the need for a sound backup and recovery strategy. Fault-tolerant storage merely reduces the likelihood of undergoing database recovery due to the failure of a single drive. For full coverage of Oracle databases and high availability, see Chapter 10.

6.2.2 Using RAID Disk Array Technology

One of the most powerful ways to reduce performance bottlenecks due to disk I/O is the use of RAID disk arrays. RAID stands for Redundant Array of Inexpensive (or Independent) Disks and is used to group disks into arrays for two reasons: redundancy and performance. The use of RAID for redundancy is detailed in Chapter 10. Our focus in this chapter is on the performance aspects of RAID technology.

RAID Basics

RAID disk arrays provide a hardware solution for both reliability and performance. There are different levels of RAID hardware; the following are most relevant to performance:

RAID-0

Where availability isn't a concern, the disks can be configured as RAID-0, which is nonredundant disk striping.

RAID-1

Provides the simplest form of redundancy, full duplication of data, which is referred to as mirroring.

RAID-0+1

Combines the one-to-one mirroring of RAID-1 with the striping of RAID-0.

RAID-3

Provides redundancy by storing parity information on a single disk in the array. This parity information can help to recover the data on other disks should they fail. RAID-3 saves on disk storage compared to RAID-1, but isn't often used because the parity disk can be a bottleneck.

RAID-5

Uses parity data for redundancy in a way that is similar to RAID-3, but stripes the parity data across all of the disks, much as the actual data is striped. This alleviates the bottleneck on the parity disk.

There are additional levels of RAID, including RAID-6, which adds dual parity data, and RAID-7 and RAID-8, which add performance enhancements to the characteristics of RAID-5.

RAID groups disk drives into arrays to automatically spread I/O operations across multiple spindles, reducing contention on individual drives. For example, suppose you place a datafile containing an index on a single drive. If multiple processes use the index simultaneously, they will all issue I/O requests to the one disk drive, resulting in contention for the use of that drive.

Instead, suppose you place the same datafile on a "disk" that is actually an array of five physical disks. Each physical disk in the array can perform I/O operations independently on different data blocks of the index, automatically increasing the amount of I/O Oracle can perform without causing contention.

Simply using disk arrays won't, by itself, give you optimal I/O performance. As discussed earlier, you also have to think about how to logically place the different types of Oracle files across the available drives, even if the drives are grouped into arrays.

Disk-striping technology for RAID arrays can be implemented in various ways, as described in the following sections. All of these are transparent to your Oracle database. Striping considerations are made simpler in Oracle Database 10g through the introduction of Automatic Storage Management (ASM), which provides automatic striping and rebalancing of stripe sets. By default, ASM also provides automated mirroring.
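As a brief sketch of ASM (the disk group name and disk paths are hypothetical), a single CREATE DISKGROUP statement, issued on the ASM instance, yields striping and mirroring in one step:

    -- Four disks become one striped disk group; with NORMAL redundancy,
    -- each extent is mirrored across failure groups automatically.
    CREATE DISKGROUP data NORMAL REDUNDANCY
      DISK '/dev/raw/raw1', '/dev/raw/raw2',
           '/dev/raw/raw3', '/dev/raw/raw4';

    -- Database files placed in the group are striped with no further effort.
    CREATE TABLESPACE app_data DATAFILE '+DATA' SIZE 1000M;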

6.2.2.1 Host-based software

With host-based striping, logical volume-management software runs on the database server. Examples of this type of software include Hewlett-Packard's Logical Volume Manager (LVM) and Veritas Software's Volume Manager, which act as an interface between the operating system that requests I/O and the underlying physical disks. The volume-management software groups disks into arrays, which are then seen by the operating system as single "disks." The actual disks are usually individual devices attached to controllers or disks contained in a prepackaged array containing multiple disks and controllers. This striping is handled by the volume-management software and is completely transparent to Oracle. Figure 6-1 illustrates host-based volume management.

Figure 6-1. Host-based volume management

Oracle9i Release 2 introduced its own volume manager for Linux and Windows with the Real Application Clusters option. Oracle Database 10g includes a portable cluster filesystem and volume manager in the database that is leveraged by ASM.

6.2.2.2 Dedicated storage subsystems

Dedicated storage systems, often referred to as disk farms, contain disks, controllers, CPUs, and (usually) memory used as an I/O cache. Vendors include EMC, Hewlett-Packard (formerly Compaq), and StorageTek. These subsystems offload the task of managing the disk arrays from the database server. The I/O subsystem is attached to the server using controllers. These dedicated storage devices are sometimes grouped into storage area networks (SANs) to denote their logical organization as a separate set of networked devices. The disk arrays are defined and managed within the dedicated I/O subsystem, and the resulting logical "disks" are seen by the operating system as physical disks.

This type of disk-volume management is completely transparent to the database server and offers many benefits:

The database server does not spend CPU resources managing the disk arrays.

The I/O subsystem uses memory for an I/O cache, so the performance of Oracle I/O can improve dramatically (for example, from an average I/O time of 10-12 milliseconds to 3-5 milliseconds).

Write I/O is completed as soon as the data has been written to the subsystem's cache; the I/O subsystem de-stages the data from cache to the actual disk later.

Read I/O can be satisfied from the cache. The subsystem can employ some type of algorithm to sense I/O patterns and preload the cache in anticipation of pending read activity.

Note that you must back up the cache with some type of battery so a power failure doesn't result in the loss of data that was written to the cache, but hasn't yet been de-staged to the physical disk. Otherwise, data that Oracle assumes made it to disk may be lost, thereby potentially corrupting the database. Figure 6-2 illustrates a database server with a dedicated I/O subsystem.

Figure 6-2. Dedicated I/O subsystems

6.2.2.3 Combined host-based and I/O subsystem volume management

In this configuration, disks are grouped into arrays within the I/O subsystem and grouped again into coarser arrays using operating system volume management. On EMC systems, for example, the physical disks are grouped into either RAID-1 mirrored disk pairs or a RAID-S striped configuration using four disks per stripe set. RAID-S is the term EMC (http://www.emc.com) uses for its specialized striping hardware and software.

Using EMC technology as an example, the operating system sees horizontal sections of disk space across each RAID-1 disk or RAID-S array as single "disks." You can use operating system volume management to group these "disks" into arrays. With RAID-1 disks, this configuration delivers the benefits of using a dedicated I/O subsystem with its own cache and processing power while leveraging striping for simplicity. With RAID-S arrays, you get the benefit of the dedicated I/O subsystem and further simplify disk management through a striping multiplier effect: an array of five "disks" at the operating system level could map back to five arrays of four disks each in the I/O subsystem, so a single logical disk seen by Oracle maps to 20 physical disks in the underlying I/O subsystem. Figure 6-3 illustrates a logical drive on the database server mapping to horizontal sections across multiple RAID-S arrays.

Figure 6-3. Combining host-based striping and an EMC I/O subsystem

6.2.3 Flexibility, Manageability, and Disk Arrays

Many systems today use some type of RAID technology that groups multiple individual disk drives, also referred to as spindles, into arrays. Each disk array is then treated as a single logical disk for the purpose of planning I/O. Striping allows you to simply spread I/O across multiple disks, without incurring the planning and administrative overhead of dealing with many individual disk drives. Striping is transparent to Oracle, but this doesn't mean that a DBA or system administrator can ignore it. Understanding the mapping of Oracle database files to arrays, and the arrays to the underlying disks and controllers, is crucial to planning and maintaining your Oracle database.

The decision about how many disks should be in each array is often the topic of intense debate. At one extreme, using multiple disks without grouping any of them into arrays provides the most control and flexibility because every disk is visible and can be targeted in isolation by placing certain files on each disk. However, this approach requires more planning and can result in more ongoing administration, because you will have to deal with every individual disk drive. As databases become larger and larger, this approach becomes unmanageable.

At the other extreme, you can group all disks into one single array, seen by the operating system and Oracle as a single "disk." This makes for extremely simple planning and administration; no effort is required to analyze where you should place the various files, as there is only one "disk." However, this approach sacrifices flexibility and leads to brute force solutions to I/O bottlenecks. If I/O performance across the array is unsatisfactory, the solution is to add more controllers and disks. The entire set of disks becomes a black box that either works or doesn't work as a unit.

The most useful configuration is one that balances manageability with flexibility. For example, consider a system with 1,000 disks. Neither a single array of 1,000 disks nor a set of 1,000 individual disks is likely to be appropriate. Perhaps 50 arrays of 20 disks each would provide the needed I/O performance without any undue administrative burden; if less flexibility is needed, 20 arrays of 50 disks might be more suitable. On the other hand, grouping all the disks into one array may be the simplest way to manage a system with only five disks. There is no universally "right" answer; you must assess your needs to determine the appropriate balance.

Oracle Database 10g simplifies this debate by automating the striping and stripe set rebalancing process. ASM divides files into 1 MB extents and spreads the extents evenly across each disk group. Pointers are used to track the placement of each extent, instead of a mathematical function such as a hashing algorithm, so when the disk group configuration changes, individual extents can be moved. This eliminates the need, inherent in traditional algorithm-based striping techniques, to rerun the algorithm and reallocate all of the data. Extent maps are updated when rebalancing the load after a change in disk configuration, opening a new database file, or extending a database file by enlarging a tablespace. By default, each 1 MB extent is also mirrored, so management of redundancy is simplified as well. Mirroring can be extended to triple mirroring or can be turned off. Although you still have to consider how many disk groups to use, implementation of these groups with striping and redundancy is automated with ASM.
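A sketch of what this looks like in practice (the disk group name and disk path are hypothetical): adding a disk triggers a rebalance that updates the extent maps and moves only the extents needed to even out the load.

    -- Add a disk to an existing ASM disk group; ASM rebalances by moving
    -- individual 1 MB extents rather than restriping all of the data.
    ALTER DISKGROUP data ADD DISK '/dev/raw/raw5';

    -- The optional REBALANCE clause controls how aggressively extents move.
    ALTER DISKGROUP data REBALANCE POWER 5;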

6.2.4 How Oracle I/O and Striped Arrays Interact

In almost all large databases, some type of disk striping increases disk I/O rates without adding too heavy an administrative burden for managing a large number of datafiles across many individual disks. The disks may be organized into RAID arrays using a volume manager on the database server, a dedicated I/O subsystem, or a combination of both.

If you are using an Oracle release without ASM, when you set up striped disk arrays, you can set the chunk size used to stripe across the disks. The chunk size is the amount of data written to one disk before moving to the next disk in the array. Understanding the interaction between different stripe chunk sizes and the two sizes of Oracle I/O is critical in maximizing your I/O performance.

Consider an Oracle database with an 8 KB data block size and the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter set to 32. There will be two sizes of I/O by Oracle: a single 8 KB data block and a 256 KB multiblock read (32 times 8 KB). Suppose you then configure a four-disk array for use by Oracle with a chunk size of 64 KB so that the 256 KB of data will be spread across the four drives, with 64 KB on each.

Each 8-KB I/O will hit one spindle, as the 8 KB will lie within one 64-KB chunk.[2] Striping can increase performance for these small I/Os by maximizing concurrency: each disk can service a different I/O. The 256-KB multiblock I/Os, on the other hand, may hit all four disks. If the chunk size were 256 KB instead of 64 KB, each 256-KB I/O call would, on average, hit only one disk, so each multiblock read would require fewer physical disk operations. In either case, a single disk will satisfy single-data-block I/O calls. Striping can increase I/O rates for large reads by driving multiple disks with a single I/O call, as illustrated with the 64-KB chunk size and a 256-KB multiblock I/O.
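The arithmetic behind this example can be checked with a trivial query; the figures mirror the 8 KB block, the 32-block multiblock read, and the 64 KB versus 256 KB chunk sizes discussed above.

    -- 8 KB blocks * 32 blocks = 256 KB per multiblock I/O; the chunk size
    -- then determines how many disks one such I/O touches.
    SELECT 8 * 32          AS multiblock_io_kb,  -- 256 KB
           (8 * 32) / 64   AS disks_hit_64kb,    -- 4 disks
           (8 * 32) / 256  AS disks_hit_256kb    -- 1 disk
    FROM   dual;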

[2] It's difficult to say exactly what will occur due to the alignment of the stripe-chunk boundaries with Oracle data blocks, but to illustrate the single versus multiple disk point, let's assume the simple case: they line up! For a more detailed discussion of striping issues, see the document "Configuring Oracle Server for VLDB," by Cary Millsap of hotsos.com, formerly with Oracle Corporation (see Appendix B). Anyone who is tempted is welcome to perform detailed testing for all the permutations of stripe chunk size and Oracle I/O. If you happen to perform this extensive testing, please tell all the rest of us what you find!

Figure 6-4 illustrates the interaction of different-sized Oracle I/Os with arrays striped using different chunk sizes.

Figure 6-4. Oracle I/O and chunk size interaction