Microsoft Windows Server 2003 Deployment Kit: Planning Server Deployments

Microsoft Corporation


Ensuring Availability in NLB Solutions

By including multiple cluster hosts that provide the same applications and services, Network Load Balancing inherently provides fault tolerance. However, a complete high-availability solution requires more than Network Load Balancing alone: the network infrastructure and system hardware associated with the cluster also affect the availability of the applications and services running on it. In addition, include application-level monitoring, such as the monitoring provided by Microsoft Operations Manager (MOM) or Application Center 2000, to verify that applications are operating correctly. Ensuring availability is the final task in designing a Network Load Balancing solution, as shown in Figure 8.17.

Figure 8.17: Ensuring Availability in Network Load Balancing Solutions
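The application-level monitoring mentioned above can be as simple as a periodic probe against the cluster's virtual IP address. The following Python sketch illustrates the idea; the virtual IP address, port, and request path are hypothetical placeholders, and a production deployment would rely on MOM or Application Center 2000 rather than a custom script.

# Minimal application-level availability probe (illustrative sketch only).
# The cluster virtual IP, port, and request path below are hypothetical;
# substitute the values used in your deployment.
import http.client
import time

CLUSTER_VIP = "10.0.0.50"   # hypothetical NLB virtual IP address
PORT = 80
PROBE_PATH = "/"            # request path that exercises the application
INTERVAL_SECONDS = 30

def probe_once():
    """Return True if the application answers the probe with HTTP 200."""
    try:
        conn = http.client.HTTPConnection(CLUSTER_VIP, PORT, timeout=5)
        conn.request("GET", PROBE_PATH)
        response = conn.getresponse()
        conn.close()
        return response.status == 200
    except (OSError, http.client.HTTPException):
        return False

if __name__ == "__main__":
    while True:
        if not probe_once():
            # In practice, raise an alert in your monitoring system here.
            print("Probe failed: the application on the cluster VIP is not responding.")
        time.sleep(INTERVAL_SECONDS)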

Include the following items to ensure high availability for clients that access applications and services on the cluster:

Cluster hosts with fault-tolerant hardware

Signed device drivers and software only

Fault-tolerant network infrastructure

Also, you can improve the availability of applications and services by using methods that are specific to the applications and services running on the cluster. For more information about improving the availability of services running on Network Load Balancing, see "Additional Resources" later in this chapter.

Note

After you design the specifications for ensuring availability, document your decisions. For a Word document to assist you in recording your decisions, see "NLB Cluster Host Worksheet" (Sdcnlb_1.doc) on the Windows Server 2003 Deployment Kit companion CD (or see "NLB Cluster Host Worksheet" on the Web at http://www.microsoft.com/reskit).

Including Fault-Tolerant Hardware on Cluster Hosts

The cluster host hardware that you specify in your design can affect the uptime of the applications and services in your solution. Including system hardware with a longer mean time between failures (MTBF) helps ensure that you experience fewer cluster host failures. In addition, including cluster hosts with fault-tolerant hardware can prevent unnecessary outages in your cluster.

For more information about including fault-tolerant hardware in your design, see "Planning for High Availability and Scalability" in this book.
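To see why MTBF matters, you can estimate availability from MTBF and mean time to repair (MTTR): a single host is available roughly MTBF / (MTBF + MTTR) of the time, and an NLB cluster remains available as long as at least one host is up. The figures in the following sketch are illustrative assumptions only; substitute the values published for your hardware.

# Rough availability arithmetic for an NLB cluster (illustrative figures only).
# Assumes host failures are independent and that any single host can serve clients.

mtbf_hours = 10000.0   # hypothetical mean time between failures for one host
mttr_hours = 8.0       # hypothetical mean time to repair a failed host
host_count = 4         # number of hosts in the NLB cluster

# Steady-state availability of a single host.
host_availability = mtbf_hours / (mtbf_hours + mttr_hours)

# The cluster is unavailable only if every host is down at the same time.
cluster_availability = 1.0 - (1.0 - host_availability) ** host_count

print(f"Single host availability: {host_availability:.6f}")
print(f"Cluster availability (N={host_count}): {cluster_availability:.12f}")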

Including Signed Device Drivers and Software Only

Another method of improving application and services uptime is to include only signed device drivers and software on the cluster hosts. Drivers and software that are signed have been certified by Microsoft, your organization, or third-party companies that your organization trusts. Because unstable drivers and software can affect cluster uptime, including only signed device drivers and software helps ensure the stability of the cluster.

You can specify Group Policy settings in Active Directory to centrally configure the appropriate driver signing settings on the cluster hosts. If you cannot specify driver signing settings by using Active Directory, configure the Local Security Policy on each cluster host.

For more information about signed device drivers and software, see "Driver signing for Windows" in Help and Support Center for Windows Server 2003. For more information about specifying Group Policy settings, see "Designing a Group Policy Infrastructure" in Designing a Managed Environment of this kit.
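As a quick audit during a pilot, you can list drivers and their signature status with the driverquery command that ships with Windows Server 2003. The following Python sketch wraps that command; it assumes that driverquery /SI /FO CSV is available on the cluster host and that its output includes an IsSigned column, so verify the exact column names on your build before relying on it.

# Sketch: list drivers that driverquery reports as unsigned (Windows only).
# Assumes "driverquery /SI /FO CSV" is available and emits an "IsSigned" column;
# check the exact column names on your Windows Server 2003 build first.
import csv
import io
import subprocess

def unsigned_drivers():
    output = subprocess.run(
        ["driverquery", "/SI", "/FO", "CSV"],
        capture_output=True, text=True, check=True
    ).stdout
    reader = csv.DictReader(io.StringIO(output))
    return [row for row in reader if row.get("IsSigned", "").strip().upper() == "FALSE"]

if __name__ == "__main__":
    for row in unsigned_drivers():
        print(row.get("DeviceName", "<unknown device>"))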

Including a Fault-Tolerant Network Infrastructure

Even after you complete the previous steps to improve application and services uptime, your solution is not complete. No matter how well optimized the cluster itself is, failures in the network infrastructure between the clients and the cluster can still reduce uptime for applications and services.

To include a fault-tolerant network infrastructure between the clients and the cluster, complete the following steps:

    Identify the intermediary network segments, routers, and switches between the clients and the cluster.

    Determine whether any of the intermediary network segments, routers, and switches between the clients and the cluster are potential points of failure that can cause application and services outages (a reachability-probe sketch follows Table 8.24).

    Modify your design to provide a fault-tolerant network infrastructure, based on the information in Table 8.24.

    Table 8.24: Providing Network Infrastructure Fault Tolerance Based on Limitations

    Potential failure point: Network connection failure
    Fault-tolerance solution to include: Redundant network connections to provide fault tolerance in the event that a network connection fails. For example, if you are connected to the Internet by a single T1 connection, a failure of the T1 connection would prevent clients from accessing the cluster. Specify a redundant T1 connection to help prevent this type of failure.

    Potential failure point: Switch failure
    Fault-tolerance solution to include: Redundant switches to provide fault tolerance in the event that a switch fails.

    Potential failure point: Router failure
    Fault-tolerance solution to include: Redundant routers and redundant routes to provide fault tolerance in the event that a router fails.
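Step 1 in the preceding list is largely an inventory exercise, but a simple reachability probe can help with step 2 by confirming which intermediary devices the path between the clients and the cluster actually depends on. The Python sketch below pings a list of device management addresses from a client-side host; the device names and addresses are hypothetical placeholders for the routers and switches you identified.

# Sketch: probe intermediary network devices between clients and the cluster.
# The device names and management addresses below are hypothetical placeholders;
# replace them with the routers and switches identified in step 1.
import platform
import subprocess

DEVICES = {
    "Edge router": "192.168.1.1",
    "Perimeter switch": "192.168.1.2",
}

def is_reachable(address):
    """Return True if a single ICMP echo request to the address succeeds."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", address],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0

if __name__ == "__main__":
    for name, address in DEVICES.items():
        status = "reachable" if is_reachable(address) else "UNREACHABLE - potential point of failure"
        print(f"{name} ({address}): {status}")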

Example: Ensuring Availability in NLB Solutions

An organization provides VPN remote access to the organization's users through the Internet. The organization's design includes Network Load Balancing to minimize application outages and improve performance. The VPN remote access servers, running Routing and Remote Access and Windows Server 2003, reside in the organization's perimeter network, which is located between the Internet and the organization's private network.

The design includes ISA Server, which protects the VPN remote access servers in the perimeter network. The servers running ISA Server belong to a cluster (ISANLB-01) that provides load balancing and fault tolerance.

During the pilot testing of the VPN remote access solution with ISA Server, the deployment team experiences a number of outages that affect the entire solution. Figure 8.18 illustrates the VPN remote access design, incorporating ISA Server, that is tested.

Figure 8.18: VPN Remote Access Test Environment

Table 8.25 lists the results of the pilot test for each portion of the design that is illustrated in Figure 8.18.

Table 8.25: Results of VPN Pilot Test

Design portion tested: Network infrastructure

Results: A failure of Router-01 resulted in a total outage of VPN services. A failure of Switch-01 resulted in a total outage of VPN services. A failure of the network segment between Router-01 and the Internet, or between Switch-01 and Router-01, also resulted in a total outage of VPN services.

Design portion tested: Cluster host hardware

Results: A failure of a disk drive in a cluster host resulted in a total cluster host failure. The network adapters in the cluster hosts have unsigned device drivers.

After the pilot test, the VPN remote access design is modified. Figure 8.19 illustrates the modified version of the VPN design.

Figure 8.19: Revised VPN Remote Access Design

Table 8.26 lists the design decisions that the organization makes to improve the uptime for the VPN remote access solution and the reasons for making those decisions.

Table 8.26: Design Decisions for Improving VPN Solution Uptime and Their Justification

Decision: Added Router-02 and an additional Internet connection.

Reason for the decision: Provides a redundant route to the Internet in the event that Router-01 or the corresponding Internet connection fails.

Decision: Added Switch-02.

Reason for the decision: Provides redundant paths in the event that Switch-01 fails.

Decision: Used RAID disk controllers in each cluster host.

Reason for the decision: Provides disk fault tolerance to help prevent disk failures and cluster host failures.

Decision: Established Group Policy settings that allow cluster hosts to load only signed device drivers.

Reason for the decision: Provides trusted software to help ensure a stable environment and prevent cluster host failures.