Oracle Essentials [Electronic resources]: Oracle Database 10g, 3rd Edition (text version)

10.3 Protecting Against System Crashes

There are a variety of approaches you can
take to help protect your system against the ill effects of system
crashes, including the following:

Providing component redundancy

Using Real Application Clusters/Oracle Parallel Server

Using Transparent Application Failover software services

10.3.1 Component Redundancy

As
basic protection, the various hardware components that make up the
database server itself must be fault-tolerant.
Fault-tolerance,
as the name implies, allows the overall hardware system to continue
to operate even if one of its components fails. This feature, in
turn, implies redundant components and the ability to detect
component failure and seamlessly integrate the failed
component's replacement. The major system components
that should be fault-tolerant include the following:

Disk drives

Disk controllers

CPUs

Power supplies

Cooling fans

Network cards

System buses

Disk failure is the largest area of exposure for hardware failure,
because disks have the shortest times between failure of any of the
components in a computer system. Disks also present the greatest
variety of redundant solutions, so discussing that type of failure in
detail should provide the best example of how high availability can
be implemented with hardware.

10.3.1.1 Disk redundancy

Disk failure is the most common cause
of system failure. Although the mean time to failure of an individual
disk drive is very high, the ever-increasing number of disks used for
today's very large databases results in more
frequent disk failures. Protection from disk failure is usually
accomplished using RAID technology. The term RAID
(Redundant Array of Inexpensive Disks) originated in a paper
published in 1987 by Patterson, Gibson, and Katz at the University of
California. (RAID also means Redundant Array of Independent Disks.)
The use of redundant storage has become common for systems of all
sizes and types for two primary reasons: the real threat of disk
failure and the proliferation of packaged, relatively affordable RAID
solutions.

RAID technology uses one of two concepts to achieve redundancy:

Mirroring

The actual data is
duplicated on another disk in the system.

Striping with parity

Data is striped on multiple disks, but
instead of duplicating the data itself for redundancy, a mathematical
calculation termed
parity
is performed on the data and the result is stored on another disk.
You can think of parity as the sum of the striped data. If one of the
disks is lost, you can reconstruct the data on that disk using the
surviving disks and the parity data. The lost data represents the
only unknown variable in the equation and can be derived. You can
conceptualize this as a simple formula:

A + B + C + D = E

in which A-D are data striped across four disks and E is the parity
data on a fifth disk. If you lose any of the disks, you can solve the
equation to identify the missing component. For example, if you lose
the B drive you can solve the formula as

B = E - A - C - D.
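
The formula above can be tried out in a few lines of code. The following is a minimal sketch, not a description of any particular RAID controller: it uses a bytewise XOR as the parity calculation (the operation parity-based arrays typically use in place of an arithmetic sum), and the block contents and function names are invented for illustration.

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length blocks byte by byte to produce the parity block."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def rebuild(surviving_blocks, parity_block):
    """Recover the single missing block from the survivors plus the parity."""
    return parity(surviving_blocks + [parity_block])

# Hypothetical data striped across four disks (A-D).
a, b, c, d = b"ORCL", b"DATA", b"REDO", b"UNDO"
e = parity([a, b, c, d])              # parity block stored on the fifth disk

# The disk holding B fails; solve for B from A, C, D, and E.
recovered_b = rebuild([a, c, d], e)
print(recovered_b == b)
```

Because XOR is its own inverse, the same function serves both to compute the parity and to solve for the missing block.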

There are a number of different disk configurations or types of
RAID
technology, which are formally termed levels.
The basics of RAID technology were introduced in Chapter 6, but Table
10-2 summarizes the most relevant levels of RAID in a bit more
detail, in terms of their cost, high availability, and the way Oracle uses
each RAID level.

Table 10-2. RAID levels relevant to high availability
Level	Disk configuration	Cost	Comments	Oracle usage
0	Simple striping, no redundancy	Same cost as unprotected storage.	Also referred to as JBOD (Just a Bunch of Disks). The term RAID-0 is used to describe striping, which increases read and write throughput. However, this is not really RAID, as there is no actual redundancy.	Striping simplifies administration for Oracle datafiles. Suitable for all types of data for which redundancy isn't required.
1	Mirroring	Twice the cost of unprotected storage.	Same write performance as a single disk. Read performance may improve through servicing reads from both copies.	Lack of striping adds complexity of managing a larger number of devices for Oracle. Often used for redo logs, because the I/O for redo is typically relatively small sequential writes. Striped arrays are more suited to large I/Os or to multiple smaller, random I/Os.
0+1	Striping and mirroring	Twice the cost of unprotected storage.	Best of both worlds: striping increases read and write performance, and mirroring provides redundancy while avoiding the "read-modify-write" overhead of RAID-5.	Same usage as RAID-0, but provides protection from disk failure.
5	Striping with rotating or distributed parity	Storage capacity is reduced by 1/N, where N is the number of disks in the array. For example, the storage is reduced by 20%, or 1/5 of the total disk storage, for a 5-disk array.	Parity data is spread across all disks, avoiding the potential bottleneck found in some other types of RAID arrays. Striping increases read performance. Maintaining parity data adds additional I/O, decreasing write performance. For each write, the associated parity data must be read, modified, and written back to disk. This is referred to as the "read-modify-write" penalty.	Cost-effective solution for all Oracle data except redo logs. Degraded write performance must be taken into account. Popular for data warehouses as they involve mostly read activity. However, write penalties may slow loads and index builds. Often avoided for high-volume OLTP due to write penalties. Some storage vendors, such as EMC, have proprietary solutions (RAID-S) to minimize parity overhead on writes.
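
The cost column of Table 10-2 reduces to simple arithmetic. The sketch below restates it as the fraction of raw disk capacity left for data; the function name and the level labels are our own, and it assumes two-way mirroring and a single parity disk's worth of overhead for RAID-5.

```python
def usable_fraction(level, disks):
    """Fraction of raw capacity available for data at a given RAID level."""
    if level == "0":
        return 1.0                    # striping only, no redundancy
    if level in ("1", "0+1"):
        return 0.5                    # every block is stored twice
    if level == "5":
        if disks < 3:
            raise ValueError("RAID-5 needs at least three disks")
        return (disks - 1) / disks    # capacity reduced by 1/N
    raise ValueError(f"unknown RAID level: {level}")

# A 5-disk RAID-5 array gives up 1/5 of its raw capacity to parity.
five_disk_raid5 = usable_fraction("5", 5)
print(f"{five_disk_raid5:.0%} usable")
```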

Figure 10-3 illustrates the disk configurations for various RAID
levels.

Figure 10-3. RAID levels commonly used with an Oracle database

10.3.2 Automatic Storage Management

Oracle Database 10g includes a new capability
called Automatic Storage Management (ASM), which provides striping
and mirroring for many types of disks, including JBOD, as described
earlier. You can specify groups of disks, and designate a failover
group to be used in the event of a disk failure. ASM includes the
ability to detect disk "hot spots"
and redistribute data to avoid disk bottlenecks, as well as the
capability of adding disks to a disk group without any interruption
in service.

ASM is designed both to simplify disk management and to allow you to
use cheaper disk systems and still obtain higher levels of
reliability and performance.

Which RAID Levels Should You Use with Oracle?

Some people say that you
should never use RAID-5 for an Oracle database because of the
degraded write performance of this level of RAID. RAID-1 and RAID-0+1
offer better performance, but double the cost of disk storage. RAID-5
offers a cheaper and reasonable solution, provided that you can meet
performance requirements despite the extra write overhead for
maintaining parity data. Use these generic guidelines to help
determine the appropriate uses of different RAID levels:

Use RAID-1 for redo log files

Use RAID-5 for database files, provided that the write overhead is
acceptable

Use RAID-1 or RAID-0+1 for database files if RAID-5 write overhead is
unacceptable

10.3.3 Simple Hardware Failover

Oracle recovers automatically from a
system crash. This automatic recovery protects the integrity of the
data, which is the most essential feature of any relational database,
but it also results in downtime as the database recovers from a
crash. When a hardware failure occurs, the ability to quickly detect
a system crash and initiate recovery is crucial to minimizing the
associated downtime.

When an individual server fails, the instance running on that node
fails as well. Depending on the cause, the failed node may not return
to service quickly, or the failure may not be detected immediately by
a human operator. Either way, companies that wish to protect their
systems from the failure of a node typically employ a cluster of
machines to achieve simple hardware failover.
Failover is the ability of a surviving node in a cluster to assume
the responsibilities of a failed node. Although failover
doesn't directly address the issue of the
reliability of the underlying hardware, automated failover can reduce
the downtime from hardware failure.

The concept is very simple: a combination of software and hardware
"watches" over the cluster.
Typically, this monitoring is done by regularly checking a
"heartbeat," which is a message
sent between machines in the cluster. If Machine A fails, Machine B
will detect the failure through the loss of the heartbeat and will
execute scripts to take over control of the disks, assume Machine
A's network address, and restart the processes that
failed with Machine A. From an Oracle database perspective, the
entire set of events is identical to an instance crash followed by an
instance recovery. The instance uses the control files, redo log
files, and database files to perform crash recovery. The fact that
the instance is now running on another machine is
irrelevant; the various Oracle files on disk are the key.
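
The heartbeat check described above can be sketched as follows. This is an illustrative model only, not the logic of any vendor's cluster software: the class name, the 30-second interval, and the rule of tolerating two missed heartbeats are all assumptions. Timestamps are passed in explicitly so that the example is deterministic.

```python
class HeartbeatMonitor:
    """Machine B's view of Machine A, based on heartbeat arrival times."""

    def __init__(self, interval_s, missed_limit):
        self.interval_s = interval_s      # expected gap between heartbeats
        self.missed_limit = missed_limit  # missed beats tolerated before failover
        self.last_seen = None

    def record_heartbeat(self, now):
        self.last_seen = now

    def peer_failed(self, now):
        if self.last_seen is None:
            return False                  # no heartbeat received yet
        return now - self.last_seen > self.interval_s * self.missed_limit

monitor = HeartbeatMonitor(interval_s=30, missed_limit=2)
monitor.record_heartbeat(now=0)
print(monitor.peer_failed(now=45))   # one beat late: keep waiting
print(monitor.peer_failed(now=61))   # more than two intervals of silence
```

When `peer_failed` returns true, the surviving machine would run its takeover scripts: claim the disks, assume the network address, and start the Oracle instance.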

Most failover
solutions include software that runs on the machine to monitor
specific processes, such as the background processes of the Oracle
instance. If the primary node itself has not failed but some process
has, the monitoring software will detect the failure of the process
and take some action based on scripts set up by the system
administrator. For example, if the Oracle instance fails, the
monitoring software may attempt to restart the Oracle instance three
times. If all three attempts are unsuccessful, the software may
initiate physical hardware failover, transferring control to the
alternate node in the cluster.
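
The restart-then-failover policy in this example can be expressed as a short routine. The sketch below is hypothetical: `restart_instance` and `fail_over` stand in for whatever scripts the system administrator has configured, and the limit of three attempts simply mirrors the example in the text.

```python
def handle_instance_failure(restart_instance, fail_over, max_attempts=3):
    """Try local restarts first; fall back to hardware failover."""
    for attempt in range(1, max_attempts + 1):
        if restart_instance():            # returns True when the restart succeeds
            return f"restarted on attempt {attempt}"
    fail_over()                           # hand control to the alternate node
    return "failed over to alternate node"

# Simulate an instance that refuses to restart.
events = []
outcome = handle_instance_failure(
    restart_instance=lambda: events.append("restart") or False,
    fail_over=lambda: events.append("failover"),
)
print(outcome, events)
```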

Figure 10-4 and Figure 10-5 illustrate the process of implementing a simple
failover.

Figure 10-4. Before failover

Figure 10-5. After failover

10.3.3.1 Outage duration for hardware failover

The
time for failover to take effect, and therefore the length of the
associated database downtime, depends on the following intervals:

Time for the alternate node to detect the failure of the primary node

The alternate node monitors the primary node using a heartbeat
mechanism. The frequency of this check is usually
configurable (for example, every 30 seconds), providing
control over the maximum time that a primary node failure will go
undetected.

Time for the alternate node to execute various startup actions

The time needed for such actions (e.g., assuming control of the disks
used to store the Oracle database) may vary by system and should be
determined through testing. One important consideration is the time
required for a filesystem check. The larger the database, the larger
the number of filesystems that may have been used. When the alternate
node assumes control of the disks, it must check the state of the
various filesystems on the disks. This time can be reduced by using a
journaled filesystem, such as the one provided by Veritas Software
(http://www.veritas.com). This software
essentially uses a logging scheme similar to
Oracle's to protect the integrity of the filesystem,
thus eliminating the need for a complete filesystem check. Even with
journaled filesystems, disk takeover can easily take several minutes,
particularly on a busy system with high levels of disk activity.

Time for Oracle crash recovery

As we mentioned, you can effectively control this time period using
checkpoints. Oracle provides a simple way to control recovery times
using the initialization parameter FAST_START_IO_TARGET and the more
recently introduced FAST_START_MTTR_TARGET parameter.
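
Adding the three intervals gives a rough upper bound on the outage. The figures in the sketch below are invented for illustration; only the 30-second heartbeat comes from the example above, and the takeover and recovery times should be replaced with values measured on your own system.

```python
def worst_case_outage_s(heartbeat_interval_s, takeover_s, crash_recovery_s):
    """Sum the three failover intervals; all arguments are in seconds."""
    return heartbeat_interval_s + takeover_s + crash_recovery_s

estimate = worst_case_outage_s(
    heartbeat_interval_s=30,   # longest time a failure can go undetected
    takeover_s=180,            # assumed: disk takeover and filesystem checks
    crash_recovery_s=60,       # assumed: recovery target set for the instance
)
print(f"about {estimate / 60:.1f} minutes")
```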

When the instance fails, users will typically receive some type of
error message and will then attempt to log in again. Application
developers can deal with this sequence of failover events with
generic or specific error handling in their applications, or they can
use the Transparent Application Failover functionality described
later in this chapter.

10.3.3.2 Failover and operating system platform

Failover capability has long been available in the Unix world; more
recently, it was introduced to Windows
with the availability of Microsoft Cluster Server clustering
technology. The Unix vendors typically offer a simple failover
solution consisting of two machines, an interconnect between the
machines, and the required software. No additional software is
required from Oracle. In the Windows arena, Oracle includes
Fail Safe,
software that provides a GUI for configuring the Oracle
database for hardware failover. The mechanics of the failover are the
same; the GUI is simply an administrative convenience.
Configuration wizards may also be used to enable failover in the
Oracle Application Server middle tier.

Oracle offered a failover solution for Windows NT with Oracle7 even
before Microsoft had delivered its clustering solution. Early
versions of Fail Safe were available on NT hardware from vendors with
clustering and failover experience, such as Digital Equipment
Corporation (subsequently Compaq and now HP). When Microsoft
delivered its Cluster Server, Oracle simply implemented Fail Safe
using the Cluster Server interfaces instead of the hardware vendor
interfaces previously used. Fail Safe can be configured for clusters
of up to four nodes.

10.3.4 Real Application Clusters

Oracle first introduced Oracle Parallel Server (OPS), the predecessor
to Real Application Clusters (RAC), in 1989 on Digital Equipment
Corporation's VAX clusters running the VMS operating
system. OPS became available in the Unix environment in 1993. Oracle
now offers Real Application Clusters on virtually every commercially
available cluster or Massively Parallel Processing (MPP) hardware
configuration.

At first glance, Real Application Clusters may look
similar to the clustered solutions described earlier in Section 10.3.3.
Both failover and Real Application Clusters involve clustered
hardware with access to disks from multiple nodes. The key difference
is that Real Application Clusters uses multiple Oracle instances that
provide concurrent access to the same database. With simple hardware
failover only one node has an active instance, but with Real
Application Clusters each node runs an active Oracle instance. Clients
can connect to any of the instances to access the same database.

Because each Oracle instance runs on its own node, if a node fails,
the instance on that node also fails. The overall Oracle database
remains available from the surviving instances still running on other
working nodes.

Figure 10-6 illustrates Real Application Clusters on a cluster.

Figure 10-6. Oracle Real Application Clusters on a cluster

10.3.4.1 Real Application Clusters and hardware failover

Which technology achieves better availability, Real Application
Clusters or simple hardware failover capability? The
Real Application Clusters option
can typically provide higher levels of availability than simple
hardware failover, as we explain in the remainder of this section.
This option can also provide additional flexibility for scaling the
application across multiple machines, although it does require more
sophisticated system and database administration skills. Simply put,
if the higher availability and flexibility of Real Application
Clusters don't justify the additional expense and
complexity, the use of a simple hardware failover solution is
probably a more appropriate choice.

While the Real Application Clusters option is an add-on to your
Oracle database license, it is not a separate database product.
Although it supports Oracle instances across multiple nodes, it is
based on and uses the same core Oracle database product. Installing
Real Application Clusters adds code to the Oracle executable but
doesn't replace the core database program.

Real Application Clusters increases availability by avoiding
complete database blackouts. With simple hardware
failover, the database is completely unavailable until node failover,
instance startup, and crash recovery are complete. With Real
Application Clusters, clients can connect to a surviving instance any
time. Clients may be able to continue working with no interruption,
depending on whether the data they need to work on was under the
control of the failed instance. You can think of the failure of a
Real Application Clusters instance as a potential database
"brownout," as opposed to the
guaranteed blackout caused by hardware failover.

Some other key differences between hardware failover and Real
Application Clusters include the following:

The Real Application Clusters option avoids the various activities
involved in disk takeover: mounting volumes, validating filesystem
integrity, opening Oracle database files, and so on. Not performing
these activities can significantly reduce the time required to
achieve full system availability.

The Real Application Clusters option doesn't require
the creation and maintenance of the complex scripts typically used to
control the activities for hardware failover. For example, there is
no need to script which disk volumes will be taken over by a
surviving node. The automatic nature of Real Application Clusters
avoids the complex initial system administration to set up the
failover environment, as well as the ongoing administration needed as
additional disk volumes are used. In fact, adding disk volumes to
your database but forgetting to add the volumes to the various
failover scripts can cause a hardware failover solution to fail
itself!

In a simple two-way cluster used for hardware failover, both machines
should have equal processing power and should be sized so that each
can handle the entire workload. This equivalence is clearly required
because only one node of the cluster is used at any point for the
entire workload. If one node fails, the other should be capable of
running the same workload with equal performance.

With Real Application Clusters, you can use both nodes of the cluster
concurrently to spread the workload, reducing the load on one machine
or node. You must still make sure that each machine will be powerful
enough to adequately handle the entire workload (albeit at a reduced
performance level) to meet basic business requirements when a node is
not available.

Of course, using Real Application Clusters to spread the workload
over several machines will result in a lower percentage of each
machine's resources being used in normal operating
conditions, which is typically more expensive than using fully
utilized machines. Each machine in the cluster must devote some
overhead to maintaining its role in the cluster, although Oracle
claims that this overhead may reduce overall machine throughput by
as little as 10-15%. You will have to weigh the benefits of
carrying on without any performance degradation in the event of a
node failure versus the cost of buying more powerful machines. The
economics of your situation may dictate that a decrease in
performance in the event of a node failure is more palatable than a
larger initial outlay for larger systems.
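
The sizing trade-off can be made concrete with a little arithmetic. The sketch below assumes that the workload is spread evenly across identical nodes and ignores the clustering overhead mentioned above; the function names and utilization figures are our own.

```python
def utilization_after_failure(nodes, normal_utilization, failed=1):
    """Per-node utilization once the survivors absorb the failed nodes' work."""
    survivors = nodes - failed
    if survivors < 1:
        raise ValueError("at least one node must survive")
    return normal_utilization * nodes / survivors

def max_normal_utilization(nodes, failed=1):
    """Highest normal utilization that still leaves no degradation on failure."""
    if nodes - failed < 1:
        raise ValueError("at least one node must survive")
    return (nodes - failed) / nodes

# Two nodes at 45% each: the survivor runs at about 90% after a failure.
two_node = utilization_after_failure(nodes=2, normal_utilization=0.45)
# Four nodes at 60% each: the three survivors run at about 80%.
four_node = utilization_after_failure(nodes=4, normal_utilization=0.60)
print(f"{two_node:.0%}, {four_node:.0%}")
```

A result above 100% means the surviving nodes cannot carry the full workload, so performance will degrade until the failed node returns.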

Using Real Application Clusters for scalability
is a bit more complex than it is for high availability, but much of
the complexity of tuning and programming has been removed since
Oracle9i. Deployment has been simplified with
Oracle Database 10g through the introduction of integrated
clusterware. You can learn more about Real Application Clusters
scalability in Oracle documentation and in Chapter 8 of this book.

10.3.4.2 Node failure and Real Application Clusters

The database instances provide
protection for each other: if an instance fails, one of the surviving
instances will detect the failure and automatically initiate Real
Application Clusters recovery. This type of recovery is different
from the hardware failover discussed previously. No actual
"failover" occurs: no disk
takeover is required, because all nodes already have access to the
disks used for the database. There is no need to start an Oracle
instance on the surviving node or nodes, because Oracle is already
running on all the nodes. The Oracle software performs the necessary
actions without using scripts; the required steps are an integral
part of Real Application Clusters software.

The phases of Real Application Clusters recovery are the following:

Cluster reorganization

When an instance failure occurs, Real Application Clusters must first
determine which nodes of the cluster remain in service.
Oracle9i introduced a disk-based heartbeat in
which each database group member votes on what members are part of
the current group. Based on arbitration, a correct current group
configuration is established. The time required for this operation is
very brief.

Lock database rebuild

The lock database, which contains the information used to coordinate
Real Application Clusters traffic, is distributed across the multiple
active instances. Therefore, a portion of that information is lost
when a node fails. The remaining nodes have sufficient redundant data
to reconstruct the lost information. Once the cluster membership has
been determined, the surviving instances reconstruct the lock
database. The time for this phase depends on how many locks must be
recovered, as well as whether the rebuild process involves a single
surviving node or multiple surviving nodes. Oracle speeds the lock
remastering process by allowing optimization of lock master locations
in the background while users are accessing the system. In a two-node
cluster, node failure leaves a single surviving node that acts as a
dictator and processes the lock operations very quickly.

Instance recovery

Once the lock database has been rebuilt, the redo logs from the
failed instance are used to perform crash recovery. This is similar to
single-instance crash recovery: a rollforward phase followed by
a nonblocking, deferred rollback phase. The key difference is that
the recovery isn't performed by restarting a failed
instance. Rather, it's performed by the instance
that detected the failure.
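
The rollforward and rollback phases of instance recovery can be illustrated with a toy model. The sketch below is not Oracle's recovery algorithm: the record layout, the function name, and the sample data are invented, and it assumes that no two transactions change the same key.

```python
def crash_recover(datafile, redo, committed):
    """Replay a toy redo log, then undo the work of uncommitted transactions.

    datafile:  dict of values as they were last written to disk
    redo:      list of (transaction, key, old_value, new_value) records
    committed: set of transactions that committed before the crash
    """
    db = dict(datafile)
    # Rollforward: reapply every logged change, committed or not.
    for txn, key, old, new in redo:
        db[key] = new
    # Rollback: reverse the changes of uncommitted transactions, newest first.
    for txn, key, old, new in reversed(redo):
        if txn not in committed:
            db[key] = old
    return db

on_disk = {"x": 1, "y": 10}
redo_log = [("T1", "x", 1, 2), ("T2", "y", 10, 20)]
recovered = crash_recover(on_disk, redo_log, committed={"T1"})
print(recovered)
```

In this example T1 committed and T2 did not, so the recovered data keeps T1's change to `x` and restores the original value of `y`.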

While Real
Application Clusters recovery is in progress, clients connected to
surviving instances remain connected and can continue working. In
some cases users may experience a slight delay in response times, but
their sessions aren't terminated. Clients connected
to the failed instance can reconnect to a surviving instance and can
resume working. Uncommitted transactions will be rolled back and will
have to be resubmitted. Queries that were active will also have been
terminated and will require resubmission. A very powerful feature,
Transparent Application Failover (TAF), can
be used to automatically continue query processing on a surviving
node without requiring users to resubmit their queries. You can also
use TAF to resubmit transactions without user intervention.

10.3.4.3 Parallel Fail Safe / RACGuard

Oracle Parallel Fail Safe was renamed RACGuard in Oracle9i and
integrated into the core RAC product in Oracle Database 10g. Prior to
Oracle Database 10g, it was a feature in Real Application
Clusters that leveraged the clustering technology of systems vendors.
It supported such features as:

Automated, fast, and bounded recovery times from Oracle instance
crashes

Automatic capture of diagnostic data

Guaranteed primary and secondary configuration

Support for features such as Transparent Application Failover
(described in the next section)

Client preconnection to secondary instances to speed
reconnection

10.3.5 Oracle Transparent Application Failover

Oracle introduced the Transparent
Application Failover (TAF) capability in the first release of
Oracle8. As the name implies, TAF provides a seamless migration of
users' sessions from one Oracle instance to another.
You can use TAF to mask the failure of an instance for transparent
high availability or to migrate users from an active instance to a
less active one. Figure 10-7 illustrates TAF with Real Application
Clusters.

Figure 10-7. Failover with TAF and Real Application Clusters

As shown in this figure, TAF can automatically reconnect clients to
another instance of the database, which provides access to the same
database as the original instance. The high-availability
benefits of TAF include the following:

Transparent reconnection

Clients don't have to manually reconnect to a
surviving instance. You can optionally configure TAF to preconnect
clients to an alternate instance in addition to their primary
instance when they log on. Preconnecting clients to an alternate
instance removes the overhead of establishing a new connection when
automatic failover takes place. For systems with a large number of
connected clients, this preconnection avoids the overhead and delays
caused by flooding the alternate instance with a large number of
simultaneous connection requests.

Automatic resubmission of queries

TAF can automatically resubmit queries that were active at the time
the first instance failed and can resume sending results back to the
client. Oracle will re-execute the query as of the time the original
query started. Oracle's read consistency will
therefore provide the correct answer regardless of any activity since
the query began. However, when the user requests the
"next" row from a query, Oracle
will have to process through all rows from the start of the query
until the requested row, which may result in a performance lag.

Callback functions

Oracle8i
enhanced TAF by enabling the application developer to register a
"callback function" with TAF. Once
TAF has successfully reconnected the client to the alternate
instance, the registered function will be called automatically. The
application developer can use the callback function to reinitialize
various aspects of session state as desired.

Failover-aware applications

Application developers can leverage TAF by writing
"failover-aware" applications that
resubmit transactions that were lost when the
client's primary instance failed, further reducing
the impact of failure. Note that unlike query resubmission, TAF
itself doesn't automatically resubmit the
transactions that were in-flight. Rather, it provides a framework for
a seamless failover that can be leveraged by application developers.
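
The query resubmission and callback behavior described above can be modeled in a few lines. The following is a toy simulation rather than the OCI implementation: `ReplayingQuery`, `run_query`, and the callback are hypothetical names, and a Python `ConnectionError` stands in for the loss of the server process.

```python
class ReplayingQuery:
    """Toy model of TAF-style query failover (not Oracle's implementation)."""

    def __init__(self, run_query, on_failover=None):
        self.run_query = run_query      # callable that returns a fresh row iterator
        self.on_failover = on_failover  # hypothetical callback, run after reconnecting
        self.delivered = 0              # rows already handed to the client
        self.rows = run_query()

    def fetch(self):
        try:
            row = next(self.rows)
        except ConnectionError:
            # "Reconnect": let the callback restore session state, then re-execute.
            if self.on_failover is not None:
                self.on_failover()
            self.rows = self.run_query()
            # Discard the rows the client has already seen; this is the lag.
            for _ in range(self.delivered):
                next(self.rows)
            row = next(self.rows)
        self.delivered += 1
        return row


executions = []

def run_query():
    """Simulated query: the first execution dies after two rows."""
    executions.append(1)
    first_run = len(executions) == 1

    def rows():
        for position, name in enumerate(["ADAMS", "BLAKE", "CLARK", "FORD"]):
            if first_run and position == 2:
                raise ConnectionError("instance failed")
            yield name

    return rows()

callbacks = []
query = ReplayingQuery(run_query, on_failover=lambda: callbacks.append("session reset"))
results = [query.fetch() for _ in range(4)]
print(results, len(executions), callbacks)
```

The client receives all four rows in order even though the query was executed twice. Note that the sketch replays only a query; as the text explains, in-flight transactions are not resubmitted automatically.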

10.3.5.1 How TAF works

TAF is implemented in the Oracle Call Interface (OCI) layer, a
low-level API for establishing and
managing Oracle database connections. When the instance to which a
client is connected fails, the client's server
process ceases to exist. The OCI layer in the client can detect the
absence of a server process on the other end of the channel and
automatically establish a new connection to another instance. The
alternate instance to which TAF reconnects users is specified in the
Oracle Net configuration files, which are described in the Oracle Net
documentation.

Because OCI is a low-level API, writing programs with OCI requires
more effort and sophistication on the part of the developer.
Fortunately, Oracle uses OCI to write client tools and various
drivers, so that applications using these tools can leverage TAF.
Support for TAF in ODBC and JDBC drivers is especially useful; it means
that TAF can be leveraged by any client application that uses these
drivers to connect to Oracle. For example, TAF can provide automatic
reconnection for a third-party query tool that uses ODBC. To
implement TAF with ODBC, set up an ODBC data source that uses an
Oracle Net service name that has been configured to use TAF in the
Oracle Net configuration files. ODBC uses Oracle Net and can
therefore leverage the TAF feature.
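
To give a sense of what such a service name looks like, the sketch below builds the kind of connect descriptor that would be placed in the Oracle Net configuration. The helper function, host names, service name, and retry values are invented for illustration; FAILOVER_MODE and its TYPE, METHOD, BACKUP, RETRIES, and DELAY settings are Oracle Net parameters, but consult the Oracle Net documentation for the exact syntax supported by your release.

```python
def taf_descriptor(service, hosts, port=1521, failover_type="SELECT",
                   method="BASIC", backup=None, retries=20, delay=15):
    """Build a connect descriptor string with a FAILOVER_MODE clause."""
    if failover_type not in ("SESSION", "SELECT", "NONE"):
        raise ValueError("TYPE must be SESSION, SELECT, or NONE")
    if method not in ("BASIC", "PRECONNECT"):
        raise ValueError("METHOD must be BASIC or PRECONNECT")
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))" for host in hosts
    )
    # BACKUP names the service used for the preconnected or backup connection.
    backup_clause = f"(BACKUP={backup})" if backup else ""
    return (
        f"(DESCRIPTION=(ADDRESS_LIST=(FAILOVER=ON){addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service})"
        f"(FAILOVER_MODE=(TYPE={failover_type})(METHOD={method}){backup_clause}"
        f"(RETRIES={retries})(DELAY={delay}))))"
    )

# Hypothetical two-node cluster offering a service named "sales".
descriptor = taf_descriptor("sales", ["node1", "node2"])
print(descriptor)
```

TYPE=SELECT asks for active queries to be resumed after reconnection, while TYPE=SESSION only re-establishes the session. METHOD=PRECONNECT corresponds to the preconnection option described earlier in this section.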

10.3.5.2 TAF and various Oracle configurations

Although the TAF-Real Application Clusters combination is the most
obvious combination for high
availability, TAF can be used with a single Oracle instance or with
multiple databases, each accessible from a single instance. Some
possible configurations are as follows:

TAF can automatically reconnect clients back to their original
instances for cases in which the instance failed but the node did
not. An automated monitoring system, such as the Oracle Enterprise
Manager, can detect instance failure quickly and restart the
instance. The fast-start recovery features in Oracle enable very low
crash recovery times. Users who aren't performing
heads-down data entry work can be automatically reconnected by TAF
and might never be aware that their instance failed and was
restarted.

In simple clusters, TAF can reconnect users to the instance started
by simple hardware failover on the surviving node of a cluster. The
reconnection cannot occur until the alternate node has started Oracle
and has performed crash recovery.

When there are two distinct databases, each with a single instance,
TAF can reconnect clients to an instance that provides access to a
different database running in another data center. This clearly
requires replication of the relevant data between the two databases.
Oracle fortunately provides automated support for data replication,
which is covered in Section 10.5.