10.3 Protecting Against System Crashes

There are a variety of approaches you can take to help protect your system against the ill effects of system crashes, including the following:

Providing component redundancy
Using Real Application Clusters/Oracle Parallel Server
Using Transparent Application Failover software services

10.3.1 Component Redundancy

As basic protection, the various hardware components that make up the database server itself must be fault-tolerant. Fault tolerance, as the name implies, allows the overall hardware system to continue to operate even if one of its components fails. This capability, in turn, implies redundant components and the ability to detect component failure and seamlessly integrate the failed component's replacement. The major system components that should be fault-tolerant include the following:

Disk drives
Disk controllers
CPUs
Power supplies
Cooling fans
Network cards
System buses

Disk failure is the largest area of exposure for hardware failure, because disks have the shortest mean time between failures of any of the components in a computer system. Disks also present the greatest variety of redundant solutions, so discussing that type of failure in detail should provide the best example of how high availability can be implemented with hardware.

10.3.1.1 Disk redundancy

Disk failure is the most common cause of system failure. Although the mean time to failure of an individual disk drive is very high, the ever-increasing number of disks used for today's very large databases results in more frequent disk failures. Protection from disk failure is usually accomplished using RAID technology. The term RAID (Redundant Array of Inexpensive Disks) originated in a paper published in 1987 by Patterson, Gibson, and Katz at the University of California, Berkeley. (RAID is also expanded as Redundant Array of Independent Disks.)
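The claim that more disks mean more frequent failures follows from simple arithmetic. The Python sketch below assumes that each disk fails independently at a constant rate (the usual exponential model). Under that assumption the failure rates add, and the expected time until the first failure among N identical disks is the single-disk MTTF divided by N. The function name mean_hours_to_first_failure and the 1,000,000-hour per-disk MTTF are hypothetical values chosen for illustration. They are not vendor specifications.

```python
def mean_hours_to_first_failure(disk_mttf_hours: float, disk_count: int) -> float:
    """Expected hours until the first failure in a set of identical disks.

    Assumes independent failures at a constant rate (exponential model),
    so the per-disk failure rates add and the combined MTTF is MTTF / N.
    """
    if disk_count <= 0:
        raise ValueError("disk_count must be positive")
    return disk_mttf_hours / disk_count


# Hypothetical per-disk MTTF, for illustration only.
ASSUMED_DISK_MTTF_HOURS = 1_000_000

for disks in (1, 10, 100, 500):
    hours = mean_hours_to_first_failure(ASSUMED_DISK_MTTF_HOURS, disks)
    print(f"{disks:>4} disks: first failure expected after ~{hours:,.0f} hours "
          f"(~{hours / 24:,.0f} days)")
```

With these assumed numbers, a single disk would be expected to run for more than a century. A set of 500 such disks, however, would see its first failure after about 2,000 hours, or roughly 83 days.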
The use of redundant storage has become common for systems of all sizes and types for two primary reasons: the real threat of disk failure and the proliferation of packaged, relatively affordable RAID solutions.

RAID technology uses one of two concepts to achieve redundancy:

Mirroring
The actual data is duplicated on another disk in the system.

Striping with parity
Data is striped on multiple disks, but instead of duplicating the data itself for redundancy, a mathematical calculation termed parity is performed on the data and the result is stored on another disk. You can think of parity as the sum of the striped data. If one of the disks is lost, you can reconstruct the data on that disk using the surviving disks and the parity data. The lost data represents the only unknown variable in the equation and can be derived. You can conceptualize this as a simple formula:

A + B + C + D = E

in which A-D are data striped across four disks and E is the parity data on a fifth disk. If you lose any of the disks, you can solve the equation to identify the missing component. For example, if you lose the B drive, you can solve the formula as B = E - A - C - D.

There are a number of different disk configurations or types of RAID technology, which are formally termed levels. The basics of RAID technology were introduced in Chapter 6, but Table 10-2 summarizes the most relevant levels of RAID in a bit more detail, in terms of their cost, high availability, and the way Oracle uses each RAID level.
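The addition analogy for parity captures the idea. In practice, RAID implementations compute parity with a bitwise exclusive OR (XOR), which has the same useful property: any single missing term can be solved for, because XOR is its own inverse. The following Python sketch is illustrative only. It assumes four equal-sized byte blocks standing in for disks A through D and a fifth block standing in for the parity disk E. The helper name xor_blocks is hypothetical. Real controllers perform this calculation in hardware or driver code, not like this.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (the parity operation used by RAID)."""
    # zip(*blocks) groups the i-th byte of every block; XOR each group together.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe units on four data disks, A through D.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)  # plays the role of E on the fifth disk

# Simulate losing disk B (index 1), then rebuild it from the survivors plus parity.
lost_index = 1
survivors = [block for i, block in enumerate(data) if i != lost_index]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt)  # b'BBBB'
```

The rebuild applies the same equation as the formula, with XOR in place of subtraction: B = E XOR A XOR C XOR D.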
Figure 10-3. RAID levels commonly used with an Oracle database

10.3.2 Automatic Storage Management

Oracle Database 10g includes a new capability called Automatic Storage Management (ASM), which provides striping and mirroring for many types of disks, including JBOD, as described earlier. You can specify groups of disks and designate failure groups, so that mirrored copies of data are kept on separate sets of disks and survive a disk failure. ASM includes the ability to detect disk "hot spots" and redistribute data to avoid disk bottlenecks, as well as the capability of adding disks to a disk group without any interruption in service.

ASM is designed both to simplify disk management and to allow you to use cheaper disk systems while still obtaining higher levels of reliability and performance.

10.3.3 Simple Hardware Failover

Oracle recovers automatically from a system crash. This automatic recovery protects the integrity of the data, which is the most essential feature of any relational database, but it also results in downtime as the database recovers from a crash. When a hardware failure occurs, the ability to quickly detect a system crash and initiate recovery is crucial to minimizing the associated downtime.

When an individual server fails, the instance running on that node fails as well. Depending on the cause, the failed node may not return to service quickly, or the failure may not be detected immediately by a human operator. Either way, companies that wish to protect their systems from the failure of a node typically employ a cluster of machines to achieve simple hardware failover. Failover is the ability of a surviving node in a cluster to assume the responsibilities of a failed node. Although failover doesn't directly address the reliability of the underlying hardware, automated failover can reduce the downtime caused by hardware failure.

The concept is very simple: a combination of software and hardware "watches" over the cluster.
Typically, this monitoring is done by regularly checking a "heartbeat," which is a message sent between machines in the cluster. If Machine A fails, Machine B will detect the failure through the loss of the heartbeat and will execute scripts to take over control of the disks, assume Machine A's network address, and restart the processes that failed with Machine A. From an Oracle database perspective, the entire set of events is identical to an instance crash followed by an instance recovery. The instance uses the control files, redo log files, and database files to perform crash recovery. The fact that the instance is now running on another machine is irrelevant; the various Oracle files on disk are the key.

Most failover solutions include software that runs on the machine to monitor specific processes, such as the background processes of the Oracle instance. If the primary node itself has not failed but some process has, the monitoring software will detect the failure of the process and take some action based on scripts set up by the system administrator. For example, if the Oracle instance fails, the monitoring software may attempt to restart the Oracle instance three times. If all three attempts are unsuccessful, the software may initiate physical hardware failover, transferring control to the alternate node in the cluster.

Figure 10-4 and Figure 10-5 illustrate the process of implementing a simple failover.

Figure 10-4. Before failover

Figure 10-5. After failover

10.3.3.1 Outage duration for hardware failover

The time for failover to take effect, and therefore the length of the associated database downtime, depends on the following intervals:

Time for the alternate node to detect the failure of the primary node
The alternate node monitors the primary node using a heartbeat mechanism. The frequency of this check is usually configurable (for example, every 30 seconds), providing control over the maximum time that a primary node failure will go undetected.
Time for the alternate node to execute various startup actions
The time needed for such actions (e.g., assuming control of the disks used to store the Oracle database) may vary by system and should be determined through testing. One important consideration is the time required for a filesystem check. The larger the database, the larger the number of filesystems that may have been used. When the alternate node assumes control of the disks, it must check the state of the various filesystems on the disks. This time can be reduced by using a journaled filesystem, such as the one provided by Veritas Software (http://www.veritas.com). This software essentially uses a logging scheme similar to Oracle's to protect the integrity of the filesystem, thus eliminating the need for a complete filesystem check. Even with journaled filesystems, disk takeover can easily take several minutes, particularly on a busy system with high levels of disk activity.

Time for Oracle crash recovery
As we mentioned, you can effectively control this time period using checkpoints. Oracle provides a simple way to control recovery times using the initialization parameter FAST_START_IO_TARGET and the more recently introduced FAST_START_MTTR_TARGET parameter.

When the instance fails, users typically receive some type of error message and then attempt to log in again. Application developers can deal with this sequence of failover events with generic or specific error handling in their applications, or they can use the Transparent Application Failover functionality described later in this chapter.

10.3.3.2 Failover and operating system platform

Failover capability has long been available in the Unix world; more recently, it was introduced to Windows with the availability of Microsoft Cluster Server clustering technology. The Unix vendors typically offer a simple failover solution consisting of two machines, an interconnect between the machines, and the required software.
No additional software is required from Oracle. In the Windows arena, Oracle includes Fail Safe, software that provides a GUI for configuring the Oracle database for hardware failover. The mechanics of the failover are the same; the GUI is simply an administrative convenience. Configuration wizards may also be used to enable failover in the Oracle Application Server middle tier.

Oracle offered a failover solution for Windows NT with Oracle7 even before Microsoft had delivered its clustering solution. Early versions of Fail Safe were available on NT hardware from vendors with clustering and failover experience, such as Digital Equipment Corporation (subsequently Compaq and now HP). When Microsoft delivered its Cluster Server, Oracle simply implemented Fail Safe using the Cluster Server interfaces instead of the hardware vendor interfaces previously used. Fail Safe can be configured for clusters of up to four nodes.

10.3.4 Real Application Clusters

Oracle first introduced Oracle Parallel Server (OPS), the predecessor to Real Application Clusters (RAC), in 1989 on Digital Equipment Corporation's VAX clusters running the VMS operating system. OPS became available in the Unix environment in 1993. Oracle now offers Real Application Clusters on virtually every commercially available cluster or Massively Parallel Processing (MPP) hardware configuration.

At first glance, Real Application Clusters may look similar to the clustered solutions described earlier in Section 10.3.3. Both failover and Real Application Clusters involve clustered hardware with access to disks from multiple nodes. The key difference is that Real Application Clusters uses multiple Oracle instances that provide concurrent access to the same database. With simple hardware failover, only one node has an active instance, but with Real Application Clusters, each node runs an active Oracle instance.
Clients can connect to any of the instances to access the same database.

Because each Oracle instance runs on its own node, if a node fails, the instance on that node also fails. The overall Oracle database remains available from the surviving instances still running on other working nodes.

Figure 10-6 illustrates Real Application Clusters on a cluster.

Figure 10-6. Oracle Real Application Clusters on a cluster

10.3.4.1 Real Application Clusters and hardware failover

Which technology achieves better availability, Real Application Clusters or simple hardware failover? The Real Application Clusters option can typically provide higher levels of availability than simple hardware failover, as we explain in the remainder of this section. This option can also provide additional flexibility for scaling the application across multiple machines, although it does require more sophisticated system and database administration skills. Simply put, if the higher availability and flexibility of Real Application Clusters don't justify the additional expense and complexity, a simple hardware failover solution is probably the more appropriate choice.

While the Real Application Clusters option is an add-on to your Oracle database license, it is not a separate database product. Although it supports Oracle instances across multiple nodes, it is based on and uses the same core Oracle database product. Installing Real Application Clusters adds code to the Oracle executable but doesn't replace the core database program.

Real Application Clusters increases availability by avoiding complete database blackouts. With simple hardware failover, the database is completely unavailable until node failover, instance startup, and crash recovery are complete. With Real Application Clusters, clients can connect to a surviving instance at any time.
Clients may be able to continue working with no interruption, depending on whether the data they need to work on was under the control of the failed instance. You can think of the failure of a Real Application Clusters instance as a potential database "brownout," as opposed to the guaranteed blackout caused by hardware failover.

Some other key differences between hardware failover and Real Application Clusters include the following:

The Real Application Clusters option avoids the various activities involved in disk takeover: mounting volumes, validating filesystem integrity, opening Oracle database files, and so on. Not performing these activities can significantly reduce the time required to achieve full system availability.

The Real Application Clusters option doesn't require the creation and maintenance of the complex scripts typically used to control the activities for hardware failover. For example, there is no need to script which disk volumes will be taken over by a surviving node. The automatic nature of Real Application Clusters avoids the complex initial system administration needed to set up the failover environment, as well as the ongoing administration needed as additional disk volumes are used. In fact, adding disk volumes to your database but forgetting to add the volumes to the various failover scripts can cause a hardware failover solution to fail itself!

In a simple two-way cluster used for hardware failover, both machines should have equal processing power and should be sized so that each can handle the entire workload. This equivalence is clearly required because only one node of the cluster carries the entire workload at any one time. If one node fails, the other should be capable of running the same workload with equal performance.

With Real Application Clusters, you can use both nodes of the cluster concurrently to spread the workload, reducing the load on each machine.
You must still make sure that each machine will be powerful enough to adequately handle the entire workload (albeit at a reduced performance level) to meet basic business requirements when a node is not available.

Of course, using Real Application Clusters to spread the workload over several machines will result in a lower percentage of each machine's resources being used under normal operating conditions, which is typically more expensive than using fully utilized machines. Each machine in the cluster must also devote some overhead to maintaining its role in the cluster, although Oracle claims that this overhead may reduce overall machine throughput by as little as 10-15%. You will have to weigh the benefit of carrying on without any performance degradation in the event of a node failure against the cost of buying more powerful machines. The economics of your situation may dictate that a decrease in performance in the event of a node failure is more palatable than a larger initial outlay for larger systems.

Using Real Application Clusters for scalability is a bit more complex than using it for high availability, but much of the complexity of tuning and programming has been removed since Oracle9i. Deployment has been simplified with Oracle Database 10g through the introduction of integrated clusterware. You can learn more about Real Application Clusters scalability in the Oracle documentation and in Chapter 8 of this book.

10.3.4.2 Node failure and Real Application Clusters

The database instances provide protection for each other: if an instance fails, one of the surviving instances will detect the failure and automatically initiate Real Application Clusters recovery. This type of recovery is different from the hardware failover discussed previously. No actual "failover" occurs; no disk takeover is required, because all nodes already have access to the disks used for the database.
There is no need to start an Oracle instance on the surviving node or nodes, because Oracle is already running on all the nodes. The Oracle software performs the necessary actions without using scripts; the required steps are an integral part of the Real Application Clusters software.

The phases of Real Application Clusters recovery are the following:

Cluster reorganization
When an instance failure occurs, Real Application Clusters must first determine which nodes of the cluster remain in service. Oracle9i introduced a disk-based heartbeat in which each database group member votes on which members are part of the current group. Based on this arbitration, a correct current group configuration is established. The time required for this operation is very brief.

Lock database rebuild
The lock database, which contains the information used to coordinate Real Application Clusters traffic, is distributed across the multiple active instances. Therefore, a portion of that information is lost when a node fails. The remaining nodes have sufficient redundant data to reconstruct the lost information. Once the cluster membership has been determined, the surviving instances reconstruct the lock database. The time for this phase depends on how many locks must be recovered, as well as whether the rebuild process involves a single surviving node or multiple surviving nodes. Oracle speeds the lock remastering process by allowing optimization of lock master locations in the background while users are accessing the system. In a two-node cluster, node failure leaves a single surviving node that acts as a dictator and processes the lock operations very quickly.

Instance recovery
Once the lock database has been rebuilt, the redo logs from the failed instance are used to perform crash recovery. This is similar to single-instance crash recovery: a roll-forward phase followed by a nonblocking, deferred rollback phase. The key difference is that the recovery isn't performed by restarting a failed instance.
Rather, it's performed by the instance that detected the failure.

While Real Application Clusters recovery is in progress, clients connected to surviving instances remain connected and can continue working. In some cases users may experience a slight delay in response times, but their sessions aren't terminated. Clients connected to the failed instance can reconnect to a surviving instance and resume working. Uncommitted transactions will be rolled back and will have to be resubmitted. Queries that were active will also have been terminated and will require resubmission. A very powerful feature, Transparent Application Failover (TAF), can be used to automatically continue query processing on a surviving node without requiring users to resubmit their queries. With some additional application code, you can also use TAF to resubmit transactions without user intervention.

10.3.4.3 Parallel Fail Safe / RACGuard

Oracle Parallel Fail Safe was renamed RACGuard in Oracle9i and integrated into the core RAC product in Oracle Database 10g. Prior to Oracle Database 10g, it was a feature in Real Application Clusters that leveraged the clustering technology of systems vendors. It supported such features as:

Automated, fast, and bounded recovery times from Oracle instance crashes
Automatic capture of diagnostic data
Guaranteed primary and secondary configuration
Support for features such as Transparent Application Failover (described in the next section)
Client preconnection to secondary instances to speed reconnection

10.3.5 Oracle Transparent Application Failover

Oracle introduced the Transparent Application Failover (TAF) capability in the first release of Oracle8. As the name implies, TAF provides a seamless migration of users' sessions from one Oracle instance to another. You can use TAF to mask the failure of an instance for transparent high availability or to migrate users from an active instance to a less active one. Figure 10-7 illustrates TAF with Real Application Clusters.

Figure 10-7. Failover with TAF and Real Application Clusters

As shown in this figure, TAF can automatically reconnect clients to another instance of the database, which provides access to the same database as the original instance. The high-availability benefits of TAF include the following:

Transparent reconnection
Clients don't have to manually reconnect to a surviving instance. You can optionally configure TAF to preconnect clients to an alternate instance in addition to their primary instance when they log on. Preconnecting clients to an alternate instance removes the overhead of establishing a new connection when automatic failover takes place. For systems with a large number of connected clients, this preconnection avoids the overhead and delays caused by flooding the alternate instance with a large number of simultaneous connection requests.

Automatic resubmission of queries
TAF can automatically resubmit queries that were active at the time the first instance failed and can resume sending results back to the client. Oracle will re-execute the query as of the time the original query started. Oracle's read consistency will therefore provide the correct answer regardless of any activity since the query began. However, when the user requests the "next" row from a query, Oracle will have to process all rows from the start of the query up to the requested row, which may result in a performance lag.

Callback functions
Oracle8i enhanced TAF by enabling the application developer to register a "callback function" with TAF. Once TAF has successfully reconnected the client to the alternate instance, the registered function will be called automatically. The application developer can use the callback function to reinitialize various aspects of session state as desired.
Failover-aware applications
Application developers can leverage TAF by writing "failover-aware" applications that resubmit transactions that were lost when the client's primary instance failed, further reducing the impact of failure. Note that, unlike queries, in-flight transactions are not automatically resubmitted by TAF itself. Rather, TAF provides a framework for a seamless failover that can be leveraged by application developers.

10.3.5.1 How TAF works

TAF is implemented in the Oracle Call Interface (OCI) layer, a low-level API for establishing and managing Oracle database connections. When the instance to which a client is connected fails, the client's server process ceases to exist. The OCI layer in the client can detect the absence of a server process on the other end of the channel and automatically establish a new connection to another instance. The alternate instance to which TAF reconnects users is specified in the Oracle Net configuration files, which are described in the Oracle Net documentation.

Because OCI is a low-level API, writing programs with OCI requires more effort and sophistication on the part of the developer. Fortunately, Oracle uses OCI to write client tools and various drivers, so applications using these tools can leverage TAF. Support for TAF in ODBC and JDBC drivers is especially useful; it means that TAF can be leveraged by any client application that uses these drivers to connect to Oracle. For example, TAF can provide automatic reconnection for a third-party query tool that uses ODBC. To implement TAF with ODBC, set up an ODBC data source that uses an Oracle Net service name that has been configured to use TAF in the Oracle Net configuration files. ODBC uses Oracle Net and can therefore leverage the TAF feature.
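The reconnect-and-resume behavior of TAF can be modeled in a few lines. The following Python sketch is a simulation only. It does not use OCI, Oracle Net, or any Oracle driver, and the names FakeInstance, InstanceDown, and fetch_with_failover are hypothetical. It assumes an ordered list of instances that all see the same data, standing in for the primary and the alternate instance named in the Oracle Net configuration. When an instance fails, the sketch re-executes the query on the next live instance and skips the rows the client has already received; this re-reading is the source of the performance lag noted earlier. After reconnecting, it invokes an optional callback, in the spirit of a TAF callback function.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


class InstanceDown(Exception):
    """Raised when the (simulated) instance serving the session has failed."""


@dataclass
class FakeInstance:
    """Hypothetical stand-in for one instance; all instances share `rows`."""
    name: str
    rows: List[str]                   # same database contents via every instance
    alive: bool = True
    fail_after: Optional[int] = None  # simulate a crash after this many rows

    def run_query(self) -> Iterator[str]:
        """Stream rows from the start, as a freshly executed query would."""
        for i, row in enumerate(self.rows):
            if not self.alive or (self.fail_after is not None and i >= self.fail_after):
                self.alive = False
                raise InstanceDown(self.name)
            yield row


def fetch_with_failover(
    instances: List[FakeInstance],
    on_failover: Optional[Callable[[str], None]] = None,
) -> List[str]:
    """Fetch every row, moving to the next live instance if one fails."""
    fetched: List[str] = []
    failed_over = False
    for instance in instances:
        if not instance.alive:
            continue  # skip instances already known to be down
        if failed_over and on_failover is not None:
            on_failover(instance.name)  # like a callback: reset session state here
        already_have = len(fetched)
        try:
            # Re-execute from the beginning and discard rows the client already has.
            for position, row in enumerate(instance.run_query()):
                if position < already_have:
                    continue
                fetched.append(row)
            return fetched
        except InstanceDown:
            failed_over = True  # try the next instance in the list
    raise RuntimeError("no surviving instance available")


# Usage: node_a crashes after delivering two rows; node_b finishes the query.
table = ["r1", "r2", "r3", "r4", "r5"]
node_a = FakeInstance("node_a", table, fail_after=2)
node_b = FakeInstance("node_b", table)
reconnected_to: List[str] = []
rows = fetch_with_failover([node_a, node_b], on_failover=reconnected_to.append)
print(rows)            # ['r1', 'r2', 'r3', 'r4', 'r5']
print(reconnected_to)  # ['node_b']
```

The sketch deliberately does not replay transactions. As described for failover-aware applications, resubmitting in-flight work is left to application code.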
10.3.5.2 TAF and various Oracle configurations

Although the combination of TAF and Real Application Clusters is the most obvious choice for high availability, TAF can also be used with a single Oracle instance or with multiple databases, each accessible from a single instance. Some possible configurations are as follows:

TAF can automatically reconnect clients back to their original instances for cases in which the instance failed but the node did not. An automated monitoring system, such as Oracle Enterprise Manager, can detect instance failure quickly and restart the instance. The fast-start recovery features in Oracle enable very low crash recovery times. Users who aren't performing heads-down data entry work can be automatically reconnected by TAF and might never be aware that their instance failed and was restarted.

In simple clusters, TAF can reconnect users to the instance started by simple hardware failover on the surviving node of a cluster. The reconnection cannot occur until the alternate node has started Oracle and has performed crash recovery.

When there are two distinct databases, each with a single instance, TAF can reconnect clients to an instance that provides access to a different database running in another data center. This clearly requires replication of the relevant data between the two databases. Fortunately, Oracle provides automated support for data replication, which is covered in Section 10.5.