10.1 What Is High Availability?

Before we can begin a discussion of how to ensure a high level of availability for your data, you need to understand the exact meaning of the term availability.

Availability can mean different things for different organizations. For this discussion, we'll consider a system to be available when that system is both up (which means that the database can be accessed by users) and working (which means that the database is delivering the expected functionality to users with the expected performance).

Businesses have always depended on their data being available. With the expansion of the user base that has come with increased use of the Web, database failures can have an even more dramatic impact on business. Failures of web-based systems that reach beyond a company's own employees are, unfortunately, immediately visible to the outside world and can seriously affect the company's financial health as well as its image and the loyalty of its customers and partners. Consider the Internet services offered by UPS and FedEx to share package status and tracking with their customers. As customers come to depend heavily on a service offered over the Web, interruptions in that service can cause these same customers to move to competitors.

In addition, the demands posed by integrating multiple systems mean that a single failure can cause entire supply chains to become unavailable.

To implement systems that are highly available, you must include techniques to avoid downtime, such as redundant hardware, as well as techniques to allow recovery from disasters, such as implementing the appropriate backup routines.

10.1.1 Measuring and Planning Availability

Most organizations automatically assume that they need 24/7 availability, meaning that the system must be available 24 hours a day, 7 days a week. Quite often, this requirement is stated with little examination of the business functions the system will support.
With the perceived cost of technology components on the decline and their reliability on the increase, most users and a fair number of IT personnel feel that achieving very high levels of availability should be simple and cheap. Many of us leave our PCs running all the time, and many people have not experienced the joy of losing their hard drives or having their power supplies fail, at least recently.

Unfortunately, while some components are certainly becoming cheaper and more reliable, component availability doesn't equate to system availability. The complex layering of hardware and software in today's two- and three-tier systems introduces multiple interdependencies and points of failure. Achieving very high levels of availability for a system with varied and interdependent components is not necessarily either simple or inexpensive.

To provide some perspective, consider Table 10-1, which translates the percentage of system availability into days, hours, and minutes of annual downtime based on a 365-day year.
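
The translation from an availability percentage to annual downtime is simple arithmetic, and a minimal Python sketch of it follows. The function names, the chosen percentages, and the optional hours_per_day parameter are illustrative assumptions; the printed values are computed from the formula and are not a reproduction of the table.

```python
def annual_downtime_hours(availability_pct, hours_per_day=24.0, days_per_year=365):
    """Return the hours of downtime per year allowed by an availability target.

    availability_pct is a percentage such as 99.9. hours_per_day is the
    window in which availability is measured (24 for round-the-clock).
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    scheduled_hours = hours_per_day * days_per_year
    return scheduled_hours * (100.0 - availability_pct) / 100.0


def format_downtime(hours):
    """Express a downtime figure in the largest convenient unit."""
    if hours >= 24:
        return f"{hours / 24:.2f} days"
    if hours >= 1:
        return f"{hours:.2f} hours"
    return f"{hours * 60:.2f} minutes"


if __name__ == "__main__":
    # Illustrative availability targets, measured over a 24-hour day.
    for pct in (95.0, 99.0, 99.9, 99.99, 99.999):
        downtime = annual_downtime_hours(pct)
        print(f"{pct:>7}% available -> {format_downtime(downtime)} of downtime per year")
```

Under these assumptions, 99% availability over a 365-day year still leaves room for about 3.65 days of downtime, while 99.99% leaves less than an hour. Passing a smaller hours_per_day value, such as 9, models a system that only has to be available during a business day.
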
well into the millions of dollars to design and implement, with high ongoing operational costs as well. Marginal increases in availability require large incremental investments in all system components. Moving from 95% to 99% availability is likely to be very costly, while moving from 99% to 99.99% will probably cost even more.

Another key aspect of measuring availability is the definition of when the system must be available. A required availability of 99% during normal working hours from 8 a.m. to 5 p.m. is very different from 99% availability based on a 24-hour day. In the same way that you must carefully define your required level of availability, you must also consider the hours during which availability is measured. For example, a lot of companies still typically take orders during business hours. The cost of an order-entry system being down is very high during the business day. However, the cost of downtime drops after 5 p.m. This factor points to opportunities for scheduled downtime after hours that will, in turn, help reduce unplanned failures during business hours. At the other end of the spectrum, consider web-based and multinational companies, whose global reach implies that the business day never ends.

The often casually stated default requirement that a system be available 24/7 must be put in the context of the high costs required. Even an initial examination of the complexity and cost of very high availability will often lead to more realistic goals and associated budgets for system availability.

The costs of achieving high availability are certainly justified in some cases. A brokerage house may lose millions of dollars for each hour that its key systems are down, while a less demanding business, such as catalog sales, may lose only thousands of dollars an hour, because it can fall back on a less efficient manual system as a stopgap measure.
But, regardless of the cost of lost business opportunity, an unexpected loss of availability can cut into the productivity of employees and IT staff alike.

10.1.2 Causes of Unplanned Downtime

There are many different causes of unplanned downtime. You can prevent some very easily, while others require significant investments in site infrastructure, telecommunications, hardware, software, and skilled employees. Figure 10-1 summarizes some of the more common causes of system failures.

Figure 10-1. Causes of unplanned downtime

When creating a plan to guarantee the availability of your application, you should make a point of considering all the items shown in this chart as well as other potential causes of system interruption that are specific to your own circumstances. As with all planning, it's much better to consider all options, even if you quickly dismiss them, than to be caught off guard when an unexpected event occurs.

10.1.3 System Availability Versus Component Availability

A complete system is composed of various hardware, software, and networking components operating as a technology stack. Ensuring the availability of individual components doesn't necessarily guarantee system availability. Different strategies and solutions exist for achieving high availability for each of the system components. Figure 10-2 illustrates the technology stack used to deliver a potential system.

Figure 10-2. Components of a system

As this figure shows, a variety of physical and logical layers must cooperate to deliver an application. Some systems may involve fewer components; for example, a two-tier client/server system would not have the additional application server components.

Failures in the components above the database can effectively prevent access to the database even though the database itself may be available. The database server and the database itself serve as the foundation for the stack.
When a database fails, it immediately affects the higher levels of the stack. If the failure results in lost or corrupted data, the overall integrity of the application may be affected.

The potential threats to availability span all the components involved in an application system, but in this chapter we'll examine only availability issues relating specifically to the database.
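
The distinction drawn in section 10.1.3 between component availability and system availability can be illustrated with a short Python sketch. It rests on two simplifying assumptions that the text does not state: every layer of the stack must be up for the system to be up, and the layers fail independently of one another. Under those assumptions the system availability is the product of the component availabilities. The layer names and the 99.9% figures are hypothetical.

```python
from math import prod


def serial_availability(component_availabilities):
    """Availability of a stack in which every component is required.

    Each value is a fraction between 0 and 1. Assumes independent
    failures, so the individual availabilities simply multiply.
    """
    values = list(component_availabilities)
    for value in values:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each availability must be between 0 and 1")
    return prod(values)


if __name__ == "__main__":
    # Hypothetical layers, each assumed to be 99.9% available on its own.
    stack = {
        "network": 0.999,
        "web server": 0.999,
        "application server": 0.999,
        "database server": 0.999,
        "storage": 0.999,
    }
    system = serial_availability(stack.values())
    print(f"System availability: {system * 100:.2f}%")
```

With five layers that are each 99.9% available, the stack as a whole comes out at roughly 99.5%, which is lower than any single component. Real systems rarely satisfy the independence assumption exactly, so treat the result as a rough estimate rather than a prediction.
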