The Ultimate Windows Server 2003 System Administrators Guide [Electronic resources]


Robert Williams, Mark Walla




UNDERSTANDING CLUSTER SERVICES


Clustering involves linking two servers (known as nodes) that share one or more disk drives and run clustering software. All configuration and resource data is stored on the shared storage devices, and the nodes are networked through independent interconnects that ensure reliable communication. Since all nodes are aware of what is being processed locally and on sister nodes, a single group name is used to manage the cluster as a whole.

In the Windows Server 2003 clustering implementation, there are two forms of software: clustering software, which manages intercommunication between the nodes, and Microsoft Cluster Services (MSCS), which manages internode activities. The Resource Monitor checks on the viability of cluster communications and the health of the nodes. The Cluster Administrator views, configures, and modifies cluster operations and is invoked through Start menu > Administrative Tools > Cluster Administrator or by running cluster.exe from the command prompt.

As a node goes online, it searches for other nodes to join by polling the designated internal network. In this way, all nodes are notified of the new node's existence. If other nodes cannot be found on a preexisting cluster, the new node takes control of the quorum resources residing on the shared disk that contains state and configuration data. It will theoretically receive current information, since Cluster Services maintains the latest copy of the quorum resource database. The quorum-capable resource selected to maintain the configuration data is necessary for the cluster's recovery because this data contains all of the changes applied to the cluster database. The quorum resource is generally accessible to other cluster resources so that all cluster nodes have access to database changes. If the node is the first to be placed in the cluster, it will automatically create the quorum resources database, which can be shared as other nodes come online.
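The join-or-form decision described above can be pictured with a short sketch (a conceptual illustration only; the function and data shapes are invented here and are not the MSCS implementation):

```python
# Illustrative sketch (not MSCS code): the decision a node makes when it
# comes online -- join an existing cluster if partners answer on the
# internal network, otherwise take control of the quorum resource and
# form the cluster, creating the quorum database if it is the first node.

def node_startup(node, reachable_nodes, quorum_disk):
    """Return a string describing the action the joining node takes."""
    if reachable_nodes:
        # Partners found: join and receive the current cluster database.
        node["state"] = "joined"
        return "joined existing cluster"
    # No partners found: arbitrate for the quorum resource on the shared disk.
    if quorum_disk.get("database") is None:
        # First node in the cluster: create the quorum resources database.
        quorum_disk["database"] = {"members": [node["name"]], "changes": []}
        node["state"] = "formed"
        return "formed new cluster, created quorum database"
    # A cluster existed before: recover from the latest quorum database copy.
    quorum_disk["database"]["members"].append(node["name"])
    node["state"] = "formed"
    return "formed cluster from existing quorum database"

disk = {"database": None}
first = {"name": "NODE1", "state": "offline"}
print(node_startup(first, [], disk))  # first node creates the quorum database
```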

Failover and Failback


Every node is aware of when another node goes offline or returns to service. When one of the clustered servers fails, its processes are transferred to another member node. This process is known as failover. When the failed node comes back online, the workload is transferred back to it in a process known as failback (see Figure 17.1).

Figure 17.1. Basic Cluster Failover


While every node is aware of the activities of all the others, each performs specialized functions while online to balance processing loads. The administrator collects cluster resources into groups and assigns group activities to particular nodes. If a node fails, the affected group functions are transferred to an operational node until the failed node can be brought back online. Every group contains a priority list of which node should execute its functions. Group functions can be owned by only one server at a time. Although they ordinarily comprise related objects, this is not a requirement. However, resources that depend on one another must reside in the same group.
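The preferred-owner behavior above can be sketched as follows (node and group names are hypothetical; this is an illustration, not the Cluster Services code):

```python
# Sketch of group placement: each group carries a priority list of
# preferred owner nodes; on failover, the group moves to the
# highest-priority node that is still online.

def select_owner(preferred_owners, online_nodes):
    """Pick the first preferred owner that is online, else None."""
    for node in preferred_owners:
        if node in online_nodes:
            return node
    return None

group = {"name": "FileShareGroup", "preferred_owners": ["NODE1", "NODE2"]}
# NODE1 fails: the group's functions transfer to NODE2.
owner = select_owner(group["preferred_owners"], {"NODE2"})
print(owner)  # NODE2
```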

Clusters can perform a number of general or special functions, but are most often established as high-availability file, print, application, and Web servers (Figure 17.2). These can take the form of physical or virtual servers. A virtual server is a special group type that ordinarily behaves like a server; however, its functions fail over just like those of any other cluster group. Like a physical server, a virtual server has a network name and an IP address, but its failover capability keeps its resources available even when the hosting system fails.

Figure 17.2. Common Generic and Specific Cluster Uses


Cluster Network Concerns


Clusters depend on the ability to communicate reliably between nodes. Therefore, they will use any privately available network to locate and communicate with their nodes. For reasons of security and reliability, clusters do not use networks that are configured for public use.

Nodes that are unable to communicate are said to be partitioned. When this occurs, Cluster Services shuts down all but one node in order to ensure data consistency.

A problem can arise if each node has only one network adapter. If an adapter on one of the nodes fails, that node will be unable to communicate with the others, and thus all nodes will automatically attempt to take control of quorum resources. Should the node with the failed adapter take control, all cluster resources will be unreachable by the clients that depend on them. For this reason, each node should have more than one network adapter to provide added redundancy.

Traffic is another issue that arises when only one network adapter is available on cluster nodes. The single adapter must handle both node-to-node traffic and cluster-to-client traffic. Thus, a multihomed environment is recommended so that all traffic can be appropriately routed.

The Cluster Database


The cluster database is maintained on every node and housed in the Windows Server 2003 Registry. To ensure consistent data, all updates are globally distributed to all nodes in the cluster through periodic checks. As discussed earlier, the quorum resource database creates a recovery log so that configuration and state data can be used in case of massive failures.

In addition to configuration information, the cluster database contains data about cluster objects and properties. A cluster object can be a group of functions that have been gathered to run on a designated node, for example. Cluster objects also include internal network data, network adapters and interfaces, node resources, and cluster resources such as shared storage devices.

Basic Cluster Planning


A number of commonsense steps should be taken during the cluster-planning phase. Since the primary purpose of a cluster is to provide redundancy, risk assessment is one of the steps. The system administrator must minimize any area of potential failure while ensuring that a redundant system will perform in the event of a node failure. The following sections describe key planning issues that should be resolved prior to deployment of Windows Server 2003 clusters.

APPLICATION SELECTION


The selection of applications to be hosted on a cluster represents one aspect of planning. An application must meet a number of criteria before it can be used in a cluster environment, since not all applications are written to the Cluster API and so are not cluster-aware. Nevertheless, many applications that are not cluster-aware can function in a cluster if they meet the following basic requirements:


Storage of the application data is flexible and configurable. That is, in order to ensure failover of data, it is necessary to direct storage to the shared cluster disks.


The application is able to retry and recover connection in the event of temporary network disconnections.


The application uses TCP/IP. Applications that are aware of DCOM, named pipes, or RPC in concert with TCP/IP will have greater reliability.



NOTE: Applications written solely to support NetBEUI or IPX are not candidates for Windows Server 2003 clusters.

GROUPING AND CAPACITY


Cluster capacity is often determined by how applications and functions are grouped and assigned to nodes. All cluster-qualified applications must be examined in terms of resource requirements and dependencies, and all those with mutual dependencies should be grouped together. Groups of applications should then be assigned to nodes to achieve a relative workload balance. All nodes must have sufficient disk space for the applications and sufficient CPU capacity and memory. Remember, all nodes in a group must be identically configured in terms of hardware capacity.

Nodes must belong to the same domain as either domain controllers or member servers. Cluster nodes that are also domain controllers require additional system capacity, since they can be heavily burdened by authentication and replication services in large networked environments.

NETWORKS


Clusters are connected through one or more independent physical networks, also known as interconnects. It is strongly recommended that there be at least two PCI bus-based interfaces and that, prior to the cluster software installation, the nodes be interconnected using TCP/IP on all interfaces. Routers cannot be used to connect nodes; the nodes must be directly connected and assigned IP addresses on the same network. DHCP can be used to lease IP addresses, but if a lease expires while the DHCP server is unavailable, connectivity can be interrupted or an automatic failover can occur. If DHCP is to be used, obtain permanently leased IP addresses.

Hubs can be used. The hub that connects nodes should create a dedicated segment for the interconnect, which has no other systems connected to it.

Once the nodes are interconnected, the clustering software can be installed. The communications between cluster nodes are known as heartbeats and primarily involve keeping track of node states. If an irregular heartbeat is detected from a cluster partner, the process of failover begins.
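The heartbeat check can be pictured roughly as follows (the timeout value and function names are assumptions for illustration, not MSCS internals):

```python
# Illustrative heartbeat monitoring: each node records when it last heard
# from each partner; if a partner's heartbeat is older than the allowed
# interval, the partner is treated as failed and failover begins.

HEARTBEAT_TIMEOUT = 2.4  # seconds; a hypothetical threshold, not the MSCS default

def check_partners(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """Return the set of node names whose heartbeat has gone irregular."""
    return {node for node, seen in last_heartbeat.items()
            if now - seen > timeout}

# NODE2's last heartbeat is 3.1 seconds old -- past the timeout.
heartbeats = {"NODE1": 100.0, "NODE2": 97.0}
failed = check_partners(heartbeats, now=100.1)
print(sorted(failed))  # NODE2 missed its heartbeats, so failover begins
```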

REQUIRED DOMAIN ACCOUNTS


Computer accounts must be established for each node before Cluster Services is installed, and all nodes must be in the same domain. The account should be made a member of the local Administrators group; if the nodes are configured as domain controllers, however, it must be a member of the Domain Admins group. The User must change password at next logon property should also be disabled.

POSSIBLE POINTS OF FAILURE


Clustering assumes reliability of hardware and interconnectivity, so the system administrator has a responsibility to reduce the number of possible failure points. As discussed in Chapter 14, "Disk Management," Windows Server 2003 supports both disk mirrors and RAID-5 striping with parity. It is strongly recommended that nodes be built with multiple network interfaces in case an adapter fails and that disk redundancy be implemented. A UPS system is also highly recommended.

Cluster Administration


Clusters are managed with a number of tools and utilities. The Application Configuration Wizard manages cluster-serviced software. The Cluster Administrator tool, automatically installed on all nodes with Cluster Services, is used to view cluster configurations and manage failover activities. The cluster.exe utility provides command-line support.

THE APPLICATION CONFIGURATION WIZARD


The Application Configuration Wizard configures software for clusters. It helps to define dependencies, sets failover/failback policies, and creates a virtual server to run the applications. The virtual server must have an IP address in order to run applications, so you should reserve the IP address before running the Application Configuration Wizard.

UNDERSTANDING DEFAULT GROUPS


The Cluster Administrator tool is used to view cluster and disk groups, which contain default settings and resources for generic clusters and failover. The cluster group attributes contain the information necessary for network connectivity. This information, a name and IP address that apply to the entire cluster, is entered during Cluster Services setup. The cluster group should never be deleted, renamed, or modified. If changes must be made, make them through the disk group.

Each shared storage disk has its resources identified through the disk group. Also part of this group is the physical disk resource data. It is possible to add IP address resources and the network name resource. The latter can then be renamed to distinguish its function.

FAILOVER ADMINISTRATION


Failover occurs when either the entire node or a node resource fails. In this case, the Cluster Administrator tool is notified. Resource failures do not always bring down the entire node and may cause only group failovers. Unfortunately, group failures are more difficult to detect. The Cluster Administrator should be employed to periodically check group ownerships to determine if a failover has occurred; even minor failovers can cause performance degradation and decrease resource availability. It should also be used regularly to review cluster states and make appropriate corrections.

USING THE CLUSTER.EXE COMMAND-LINE UTILITY


Many administrative tasks can be accomplished using the cluster.exe utility from the command prompt or Start menu > Run. The syntax is shown here, with the cluster name as an optional component:


cluster [cluster name] /option

These are the cluster.exe command options:


/rename: new name renames the cluster.


/version displays the version number for Cluster Services.


/quorumresource: resource name [/path:path] [/maxlogsize:maximum size in kilobytes] changes the name of the quorum resource and its location and log size.


/list: domain name lists the clusters in the named domain.


/? provides more cluster.exe syntax information.



The following are the cluster.exe functions that can be used for administration:


Cluster [name] Node [name] /option locates the option information for the specified node. Options include status, pause, resume, evict (or remove node), listinterfaces, privproperties (private properties), properties, and help.


Cluster [name] Group [name] /option permits viewing and managing of cluster groups. The options include status, create, delete, rename, moveto, online, offline, properties, privproperties, listowners, setowners, and help.


Cluster [name] Network [name] /option permits the management of cluster networks. Options include status, properties, privproperties, rename, and listinterfaces.


Cluster [name] Netinterfaces [name] /option supports the management of the network interfaces. The options include status, properties, and privproperties.


Cluster [name] Resources [name] /option administers cluster resources. The options include status, create, delete, rename, addowner, removeowner, listowner, moveto, properties, privproperties, online, fail, offline, listdependencies, removedependencies, and help.



Setting Cluster Properties


Cluster configuration can be changed by altering the settings in the selected cluster's File menu > Properties dialog box.

Group properties are viewed and modified from the State option, which also allows the node that currently owns the group to be viewed. The State option is used to change the Name, Description, and Preferred owner properties.

Failover and failback policies are set by selecting their respective tabs and making the appropriate changes. The number of times a group can fail over during a given period is established on the Failover tab. If a group fails over more often than the threshold set in the policy, Cluster Services takes the resource offline until it can be fixed. The default policy is for failback not to occur automatically; however, in some cases it may be appropriate to have an automatic failback after a specific number of hours.

The Properties dialog box is used to check and modify other settings, including


Resource dependencies.
A great number of dependencies affect clusters. For example, DHCP, File share, IIS Server Instance, Message Queuing Services, WINS Services, and Print spoolers all depend on the availability of the physical disk or storage device. Many also depend on network name and IP address availability.


Network settings.
This option is used to view and change the network name, description, state, IP address, and address mask, as well as to enable or disable the network cluster. If enabled, the communication types can be set as client-to-cluster and/or node-to-node.


Advanced settings.
This option is used to determine whether the resource is to be restarted or allowed to fail. If the latter, the Affect the group box must be checked. Then a Threshold (the number of times a restart is attempted within a defined Period of time) must be set. When that threshold is exceeded during that period, failover occurs; it will not occur if the Affect the group option is not checked.
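The Threshold/Period restart policy amounts to counting recent restart attempts and escalating once the count exceeds the threshold. A minimal sketch, with example values (the function and its parameters are illustrative, not the actual Cluster Services interface):

```python
# Sketch of the Advanced-settings restart policy: a resource is restarted
# until the number of restarts within the Period exceeds the Threshold;
# after that, the resource fails, and if "Affect the group" is checked,
# the whole group fails over.

def on_resource_failure(restart_times, now, threshold, period, affect_group):
    """Record a failure and decide: 'restart', 'failover', or 'fail'."""
    restart_times.append(now)
    recent = [t for t in restart_times if now - t <= period]
    if len(recent) <= threshold:
        return "restart"
    return "failover" if affect_group else "fail"

history = []
# Example policy: at most 3 restarts within a 900-second period.
for t in (0, 10, 20):
    print(on_resource_failure(history, t, threshold=3, period=900,
                              affect_group=True))  # restart each time
print(on_resource_failure(history, 30, 3, 900, True))  # failover
```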



Clustering Features New to Windows Server 2003


Windows Server 2003 introduces new or significantly enhanced features to its server clustering technology. The following highlights some of the most relevant enhancements for administrators:


Active Directory support for virtual servers.
A computer object in Active Directory supports cluster-aware and Active Directory-aware applications. The new object in Active Directory permits the association of related configuration information. IT administrators use this feature to find virtual server computer objects. On the downside, viewing of cluster topology information or the association between virtual servers and clusters is not supported.


Cluster password changes without reboot.
A requirement to reboot a clustered system obviously defeats the purpose of a failover; fortunately, Windows Server 2003 reduces this requirement. It allows the cluster account to be updated while the cluster service remains online. For example, administrators can now change passwords on a node without rebooting the server. This is accomplished through the cluster.exe command.


Cluster availability metrics for individual groups.
The availability of an individual group in a cluster can now be reviewed. Online and offline events for each group are recorded in the system event log, as are internode clock-skew events. Administrators can review the event log streams from all the cluster nodes to examine the times between offline and online events, and from them calculate the amount of time a group is online versus offline. This provides a measure of the availability of a cluster resource group.


Network Load Balancing virtual clusters.
Windows Server 2003 provides for the setting of port rules. This means that the administrator can set rules for every hosted application, Web site, or Virtual Internet Protocol (IP) address in a Network Load Balance (NLB) cluster. Virtual clusters can thereby be managed independently.


Quorum of nodes.
Windows Server 2003 introduces a quorum device that is accessible to all nodes in a cluster. Since this disk is used in the event of communication failures to centrally store highly available configuration data, administrators can use the Quorum of Nodes feature to support multiple sites. This is particularly helpful in a geographically dispersed cluster because it lessens the need for shared storage technology to span the sites.


SAN support.
Storage Area Network (SAN) technology is designed to allow all the disks in a cluster to be included on the same storage fabric through a host bus adapter (HBA). All server storage can be centralized into a SAN, including boot, page file, and system disks, using a single HBA or multiple redundant HBAs. All disks, except the boot disk, the system disk, and disks that contain page files, can be configured as shared disks regardless of the storage bus technology. Since SAN technology consolidates data and eases management, administrators of clustered services should consult resources beyond the scope of this book for additional information on Windows Server 2003's expanded SAN facilities.


Virtual name Kerberos authentication.
This feature permits Kerberos authentication when clients access a cluster via a cluster virtual name. When enabled, Active Directory creates a virtual computer object.
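The availability metric described above can be computed from paired online/offline event timestamps. A minimal sketch, assuming a simplified event-log shape (real event-log records are richer than the tuples used here):

```python
# Sketch of computing a group's availability over a review window from
# online/offline event timestamps gathered from the node event logs.

def availability(events, window_start, window_end):
    """events: chronologically sorted (timestamp, 'online'|'offline') pairs.
    Returns the fraction of the window during which the group was online."""
    online_total, online_since = 0.0, None
    for ts, kind in events:
        if kind == "online" and online_since is None:
            online_since = max(ts, window_start)
        elif kind == "offline" and online_since is not None:
            online_total += ts - online_since
            online_since = None
    if online_since is not None:  # group still online at the window's end
        online_total += window_end - online_since
    return online_total / (window_end - window_start)

# Online from t=0 to t=60, offline until t=80, then online again.
log = [(0, "online"), (60, "offline"), (80, "online")]
print(availability(log, 0, 100))  # 0.8 -> the group was online 80% of the window
```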



NOTE: A brief note on printer driver enhancements rounds out this section. Windows 2000 painfully required the manual installation of each printer driver on every node of the cluster. A new feature provides a method to install a printer driver on the virtual cluster; the installation then propagates to all nodes of the cluster. This feature is available via Start > Control Panel > Printers and Other Hardware.

Postscript on Cluster Technology


This section provides a conceptual and administrative overview of Windows Server 2003 clustering technology, with shared disk clustering at its center. Much information needs to be gathered before deploying a cluster. Moreover, the technology is shifting. Thus, it is highly recommended that a system administrator download the latest Microsoft white papers on clusters prior to installing and deploying Cluster Services.

