Designing the Support Structure for Server Clusters
A server cluster depends on network connectivity for communication with clients and end users, and for intra-cluster configuration information between nodes. The other fundamental requirement for a server cluster is storage: the storage of application information that is shared and failed over between nodes, and the storage of cluster configuration data. Note that your storage design, in addition to providing sufficient storage capacity, must be able to deliver information to the cluster in a timely and efficient manner.
Before you design the support structure for your server clusters, assemble the information that you collected during the steps outlined earlier in this chapter, including:
Usage statistics, such as the number of users and clients who will be using applications running on your clusters.
The number of clusters you plan to run on your network, and the number of nodes in each cluster.
The amount of storage your applications need.
The scalability needs in your organization, particularly projections for anticipated growth in storage needs and network traffic in the near future.
Figure 7.14 shows the high-level process for analyzing your network and infrastructure needs.

Figure 7.14: Designing the Support Structure for Server Clusters
| Important | All storage and networking solutions for a server cluster must be supported by Microsoft. For more information about qualified storage and networking solutions, see the Windows Catalog link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources. | 
Choosing a Cluster Storage Method
Currently, the most common storage I/O technologies used with Windows Server 2003 clusters are Parallel SCSI and Fibre Channel.
Beginning with the Windows Server 2003 release, SCSI interconnects are supported only on two-node clusters running the 32-bit version of Windows Server 2003, Enterprise Edition. SCSI is not supported on Windows Server 2003, Datacenter Edition, or any 64-bit version of the Windows Server 2003 family.
SCSI storage and two types of Fibre Channel (arbitrated loops and switched fabric) are currently qualified storage configurations in the Windows Catalog. Both SCSI and Fibre Channel are discussed later in this section.
| Note | This section provides an overview of storage technology and implementation recommendations. The information here is intended to provide deployment guidelines and help administrators make informed deployment decisions. For procedural information about installing or configuring storage in your server cluster, see "Storage configuration options" in Help and Support Center for Windows Server 2003. | 
In addition to configuring adequate storage for applications, you also need a dedicated disk to use as the quorum device. This disk must have a minimum capacity of 50 megabytes (MB). For optimal NTFS file system performance, it is recommended that the quorum disk have at least 500 MB. For more information about the quorum device, see "Choosing a Cluster Model" later in this chapter. It is recommended that you do not use the quorum disk for storing applications or anything other than quorum data.
Using RAID in Server Clusters
It is strongly recommended that you use a RAID solution for all disks running the Windows Server 2003 operating system. A RAID configuration prevents disk failure from being a single point of failure in your server cluster.
It is possible to use hardware-based RAID with Windows Server 2003. By using a controller provided by the hardware vendor or RAID solution provider, you can configure physical disks into fault-tolerant sets or volumes. The sets can be made visible to clusters either as whole disks or as smaller partitions.
Cluster nodes are unaware of the physical implementation of disks and treat each volume as a physical disk. For an ideal configuration, it is recommended that you associate the logical volume size as closely as possible with the actual physical disk size. This practice avoids taxing the physical disk with too many I/O operations from multiple logical volumes.
If your RAID controller supports dynamic logical unit (LUN) expansion, cluster disks can be extended without rebooting, providing excellent scalability for your organization's needs. The command-line tool DiskPart.exe lets administrators apply the physical extension of the disks to the logical partitions with no disruption of service. For more information about DiskPart.exe, in Help and Support Center for Windows Server 2003, click Tools, and then click Command-line reference A–Z.
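As a minimal sketch of how the extension step might be scripted, the following Python helper builds the text of a DiskPart script that selects a volume and extends it into the free space that follows it. The helper name and the volume number 3 are hypothetical. Confirm the volume number with the list volume command, and follow your storage vendor's procedure for clustered disks, before running anything similar.

```python
def diskpart_extend_script(volume_number):
    """Build DiskPart script text that extends one volume.

    The bare "extend" command grows the selected volume into the
    contiguous unallocated space that follows it on the same disk.
    """
    if volume_number < 0:
        raise ValueError("volume number must be non-negative")
    lines = [
        "list volume",                     # show volumes so the log records the state
        f"select volume {volume_number}",  # hypothetical volume number
        "extend",
    ]
    return "\n".join(lines) + "\n"


script_text = diskpart_extend_script(3)
```

The resulting text could be saved to a file and passed to DiskPart with its /s option, which runs the commands in a script file.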
Using Storage Interconnects with Server Clusters
Make sure that all interconnects used on a server cluster are supported by Microsoft by checking the Windows Catalog. This also applies to any software that is used to provide fault tolerance or load balancing for adapters or interconnects. For more information about qualified interconnects and software, see the Windows Catalog link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources.
Consider establishing redundant access routes to stored data. Providing multiple paths to your storage is another way to provide high availability. With versions of Windows operating systems earlier than Windows Server 2003, vendors and storage providers implemented two or more storage fabrics that were configured for load balancing and fault tolerance. The specific solution was designed by the vendor and required specially designed configurations and drivers. With the release of Windows Server 2003, Microsoft developed and supplied vendors with a multipath driver, which they can use in place of customized drivers.
Using SCSI Storage with Server Clusters
SCSI is supported only on two-node clusters running the 32-bit version of Windows Server 2003, Enterprise Edition. Server clusters using SCSI are not supported on the 64-bit version of Windows Server 2003. General guidelines for deploying SCSI in your server cluster are listed below:
All devices on the SCSI bus, including disks, must have unique SCSI IDs (for example, by default, most SCSI adapters have an ID of 7).
SCSI hubs are not supported in server clusters.
The SCSI bus must be terminated. Use physical terminating devices, and do not use controller-based or device-based termination.
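The unique-ID guideline above lends itself to a simple pre-deployment check. The sketch below, written in Python, reports any SCSI ID that is assigned to more than one device on a shared bus. The device labels and ID values are hypothetical; in a two-node cluster, both adapters commonly ship with the default ID of 7, so one of them has to be changed.

```python
from collections import Counter


def find_duplicate_scsi_ids(bus_devices):
    """Return the SCSI IDs that are assigned to more than one device.

    bus_devices maps a device label (hypothetical names) to its SCSI ID.
    """
    counts = Counter(bus_devices.values())
    return sorted(scsi_id for scsi_id, n in counts.items() if n > 1)


# Hypothetical shared bus: both adapters were left at the default ID of 7.
shared_bus = {"node1-adapter": 7, "node2-adapter": 7, "disk0": 0, "disk1": 1}
duplicates = find_duplicate_scsi_ids(shared_bus)
```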
Using Fibre Channel Storage with Server Clusters
There are two supported methods of Fibre Channel-based storage in a Windows Server 2003 server cluster: arbitrated loops and switched fabric.
| Important | When evaluating both types of Fibre Channel implementation, read the vendor's documentation and be sure you understand the specific features and restrictions of each. | 
Although the term Fibre Channel implies the use of fiber-optic technology, copper coaxial cable is also allowed for interconnects.
Arbitrated Loops (FC-AL)
A Fibre Channel arbitrated loop (FC-AL) is a set of nodes and devices connected into a single loop. FC-AL provides a cost-effective way to connect up to 126 devices into a single network. As with SCSI, a maximum of two nodes is supported in an FC-AL server cluster configured with a hub. An FC-AL is illustrated in Figure 7.15.

Figure 7.15: FC-AL Connection
FC-ALs provide a solution for two nodes and a small number of devices in relatively static configurations. All devices on the loop share the media, and any packet traveling from one device to another must pass through all intermediate devices.
If your high-availability needs can be met with a two-node server cluster, an FC-AL deployment has several advantages:
The cost is relatively low.
Loops can be expanded to add storage (although nodes cannot be added).
Loops are easy for Fibre Channel vendors to develop.
The disadvantage is that loops can be difficult to deploy in an organization. Because every device on the loop shares the media, overall bandwidth in the cluster is lowered. Some organizations might also be unduly restricted by the 126-device limit.
Switched Fabric (FC-SW)
For any cluster larger than two nodes, a switched Fibre Channel fabric (FC-SW) is the only supported storage technology. In an FC-SW, devices are connected in a many-to-many topology using Fibre Channel switches (illustrated in Figure 7.16).

Figure 7.16: FC-SW Connection
When a node or device communicates with another node or device in an FC-SW, the source and target set up a point-to-point connection (similar to a virtual circuit) and communicate directly with each other. The fabric itself routes data from the source to the target. In an FC-SW, the media are not shared, any device can communicate with any other device, and communication occurs at full bus speed. This is a fully scalable enterprise solution and, as such, is highly recommended for deployment with server clusters.
FC-SW is the primary technology employed in SANs. Other advantages of FC-SW include ease of deployment, the ability to support millions of devices, and switches that provide fault isolation and rerouting. Also, there are no shared media as there are in FC-AL, allowing for faster communication. However, be aware that FC-SWs can be difficult for vendors to develop, and the switches can be expensive. Vendors also have to account for interoperability issues between components from different vendors or manufacturers.
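The support rules stated in this section (SCSI only on two-node, 32-bit Enterprise Edition clusters; FC-AL for at most two nodes; FC-SW for anything larger) can be summarized in a short Python sketch. The function name and the edition labels are assumptions made for illustration, and the Windows Catalog remains the authority on what is actually qualified.

```python
def supported_interconnects(node_count, edition, is_64bit):
    """List the storage interconnects this section describes as supported.

    edition is "enterprise" or "datacenter" (labels chosen for this sketch).
    """
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    options = []
    # SCSI: two-node clusters on 32-bit Enterprise Edition only.
    if node_count == 2 and edition == "enterprise" and not is_64bit:
        options.append("SCSI")
    # FC-AL: a maximum of two nodes.
    if node_count <= 2:
        options.append("FC-AL")
    # FC-SW: supported at any size, and the only choice above two nodes.
    options.append("FC-SW")
    return options
```

For example, a four-node cluster yields only FC-SW, while a two-node, 32-bit Enterprise Edition cluster yields all three options.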
Using SANs with Server Clusters
For any large-scale cluster deployment, it is recommended that you use a SAN for data storage. Smaller SCSI and stand-alone Fibre Channel storage devices work with server clusters, but SANs provide superior fault tolerance.
A SAN is a set of interconnected devices (such as disks and tapes) and servers that are connected to a common communication and data transfer infrastructure (FC-SW, in the case of Windows Server 2003 clusters). A SAN allows multiple server access to a pool of storage in which any server can potentially access any storage unit.
The information in this section provides an overview of using SAN technology with your Windows Server 2003 clusters. For more information about deploying server clusters on SANs, see the Windows Clustering: Storage Area Networks link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources.
| Note | Vendors that provide SAN fabric components and software management tools have a wide range of tools for setting up, configuring, monitoring, and managing the SAN fabric. Contact your SAN vendor for details about your particular SAN solution. | 
The following sections provide an overview of SAN concepts that directly affect a server cluster deployment.
HBAs
Host bus adapters (HBAs) are the interface cards that connect a cluster node to a SAN, similar to the way that a network adapter connects a server to a typical Ethernet network. HBAs, however, are more difficult to configure than network adapters (unless the HBAs are preconfigured by the SAN vendor).
Zoning and LUN Masking
Zoning and LUN masking are fundamental to SAN deployments, particularly as they relate to a Windows Server 2003 cluster deployment. Both methods provide isolation and protection of server cluster data within a SAN, and in most deployments one or the other is sufficient. Work with your hardware vendor to determine whether zoning or LUN masking is more appropriate for your organization.
Zoning
Many devices and nodes can be attached to a SAN. With data stored in a single cloud, or storage entity, it is important to control which hosts have access to specific devices. Zoning allows administrators to partition devices in logical volumes and thereby reserve the devices in a volume for a server cluster. That means that all interactions between cluster nodes and devices in the logical storage volumes are isolated within the boundaries of the zone; other noncluster members of the SAN are not affected by cluster activity.
Figure 7.17 is a logical depiction of two SAN zones (Zone A and Zone B), each containing a storage controller (S1 and S2, respectively).

Figure 7.17: Zoning
In this implementation, Node A and Node B can access data from the storage controller S1, but Node C cannot. Node C can access data from storage controller S2.
Zoning needs to be implemented at the hardware level (with the controller or switch) and not through software. The primary reason is that zoning is also a security mechanism for a SAN-based cluster, because unauthorized servers cannot access devices inside the zone (access control is implemented by the switches in the fabric, so a host adapter cannot gain access to a device for which it has not been configured). With software zoning, the cluster would be left unsecured if the software component failed.
In addition to providing cluster security, zoning also limits the traffic flow within a given SAN environment. Traffic between ports is routed only to segments of the fabric that are in the same zone.
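The access rule illustrated by Figure 7.17 can be modeled in a few lines of Python. This is only a conceptual sketch: real zoning is enforced by the fabric switches, and the assignment of Node A and Node B to Zone A and of Node C to Zone B is inferred from the description of the figure.

```python
# Zone membership as described for Figure 7.17 (inferred layout).
ZONES = {
    "Zone A": {"hosts": {"Node A", "Node B"}, "controllers": {"S1"}},
    "Zone B": {"hosts": {"Node C"}, "controllers": {"S2"}},
}


def can_access(host, controller, zones=ZONES):
    """True if at least one zone contains both the host and the controller."""
    return any(
        host in zone["hosts"] and controller in zone["controllers"]
        for zone in zones.values()
    )
```

With this table, can_access("Node A", "S1") is true and can_access("Node C", "S1") is false, which matches the behavior described above.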
LUN masking
A LUN is a logical disk defined within a SAN. Server clusters see LUNs and think they are physical disks. LUN masking, performed at the controller level, allows you to define relationships between LUNs and cluster nodes. Storage controllers usually provide the means for creating LUN-level access controls that allow access to a given LUN to one or more hosts. By providing this access control at the storage controller, the controller itself can enforce access policies to the devices.
LUN masking provides more granular security than zoning, because LUNs provide a means for zoning at the port level. For example, many SAN switches allow overlapping zones, which enables a storage controller to reside in multiple zones. Multiple clusters in multiple zones can share the data on those controllers. Figure 7.18 illustrates such a scenario.

Figure 7.18: Storage Controller in Multiple Zones
LUNs used by Cluster A can be masked, or hidden, from Cluster B so that only authorized users can access data on a shared storage controller.
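To make the masking relationship concrete, the following Python sketch models a masking table on one shared storage controller. The LUN names and host names are hypothetical, and an actual controller would enforce this table in firmware rather than in host software.

```python
# Hypothetical masking table: each LUN lists the hosts allowed to see it.
LUN_MASKS = {
    "LUN0": {"ClusterA-Node1", "ClusterA-Node2"},
    "LUN1": {"ClusterA-Node1", "ClusterA-Node2"},
    "LUN2": {"ClusterB-Node1", "ClusterB-Node2"},
}


def visible_luns(host, masks=LUN_MASKS):
    """Return the LUNs the controller would expose to the given host."""
    return sorted(lun for lun, hosts in masks.items() if host in hosts)
```

Here the nodes of Cluster A see only LUN0 and LUN1, the nodes of Cluster B see only LUN2, and a host that appears in no mask sees nothing.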
Requirements for Deploying SANs with Windows Server 2003 Clusters
The following list highlights the deployment requirements to meet when using a SAN storage solution with your server cluster. For a white paper that provides more complete information about using SANs with server clusters, see the Windows Clustering: Storage Area Networks link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources.
| Note | A utility that might be helpful in deploying SANs with server clusters is Mountvol.exe. Mountvol.exe is included with the Windows Server 2003 operating system, and provides a means of creating volume mount points and linking volumes without requiring a drive letter. For more information about Mountvol.exe, see "Mountvol" in Help and Support Center for Windows Server 2003. | 
Each cluster on a SAN must be deployed in its own zone. The mechanism the cluster uses to protect access to the disks can have an adverse effect on other clusters that are in the same zone. By using zoning to separate the cluster traffic from other cluster or noncluster traffic, there is no chance of interference.
All HBAs in a single cluster must be the same type and have the same firmware version. Many storage and switch vendors require that all HBAs in the same zone — and, in some cases, the same fabric — share these characteristics.
All storage device drivers and HBA device drivers in a cluster must have the same software version.
Never allow multiple nodes access to the same storage devices unless they are in the same cluster.
| Caution | If multiple nodes in different clusters can access a given disk, data corruption will result. | 
Never put tape devices into the same zone as cluster disk storage devices. A tape device could misinterpret a bus reset and rewind at inappropriate times, such as during a large backup.
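Several of the requirements above can be checked mechanically from an inventory before deployment. The Python sketch below is one possible shape for such a check. The Hba record, its field names, and the inventory values are all hypothetical, and the check covers only the HBA, zone membership, and tape rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hba:
    model: str
    firmware: str
    driver: str


def check_cluster_san(hbas, zone_members, cluster_nodes, zone_has_tape):
    """Return a list of violated requirements for one cluster's zone."""
    problems = []
    # All HBAs in a cluster must be the same type and firmware version.
    if len({(h.model, h.firmware) for h in hbas}) > 1:
        problems.append("HBAs differ in type or firmware version")
    # All HBA device drivers must have the same software version.
    if len({h.driver for h in hbas}) > 1:
        problems.append("HBA driver versions differ")
    # Each cluster must be in its own zone.
    if set(zone_members) - set(cluster_nodes):
        problems.append("zone contains hosts outside the cluster")
    # Tape devices must not share a zone with cluster disks.
    if zone_has_tape:
        problems.append("tape device shares a zone with cluster disks")
    return problems
```

An empty result means that none of the modeled rules is violated; it does not replace vendor qualification.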
Guidelines for Deploying SANs with Windows Server 2003 Clusters
In addition to the SAN requirements discussed in the previous section, the following practices are highly recommended for server cluster deployment:
In a highly available storage fabric, you need to deploy clustered servers with multiple HBAs. In these cases, always load the multipath driver software. If the I/O subsystem sees two HBAs, it assumes they are different buses and enumerates all the devices as though they were different devices on each bus. The host, meanwhile, is seeing multiple paths to the same disks. Failure to load the multipath driver will disable the second device, because the operating system sees what it thinks are two independent disks with the same signature.
Hardware snapshots can go either to a server outside the server cluster or to a backup clone disk. When the original disk fails, you can replace it in the server cluster with its clone. When using cloned disks, it is very important that only one disk with a given disk signature be exposed to the server cluster at one time. Many controllers provide snapshots at the controller level that can be exposed to the cluster as a completely separate LUN. Cluster performance is degraded when multiple devices have the same signature. If a hardware snapshot is exposed back to the node with the original disk online, the I/O subsystem attempts to rewrite the signature. However, if the snapshot is exposed to another node in the cluster, the Cluster service does not recognize it as a different disk and the result could be data corruption. Although this is not specifically a SAN issue, the controllers that provide this functionality are typically deployed in a SAN environment.
| Caution | Exposing a hardware snapshot to a node on the cluster might result in data corruption. | 
Choosing a Cluster Model
The term cluster model refers to the manner in which the quorum resource is used in the server cluster. Server clusters require a quorum resource, which contains all of the configuration data necessary for recovery of the cluster. The cluster database, which resides in the Windows Server 2003 registry on each cluster node, contains information about all physical and logical elements in a cluster, including cluster objects, their properties, and configuration data. When a cluster node fails and then comes back online, the other nodes update the failed node's copy of the cluster database. It is the quorum resource that allows the Cluster service to keep every active node's database up to date.
The quorum resource, like any other resource, can be owned by only one node at a time. A node can form a cluster only if it can gain control of the quorum resource. Similarly, a node can join a cluster (or remain in an existing cluster) only if it can communicate with the node that controls the quorum resource.
Servers negotiate for ownership of the quorum resource in order to avoid split-brain scenarios, which occur when nodes lose communication with one another and multiple partitions of the cluster converge with a well-defined set of members, each partition believing that it is the one and only instance of the cluster. In such a case, each partition would continue operating in the belief that it had control of any cluster-wide shared resources or state, ultimately leading to data corruption.
The concept of a quorum ensures that only one partition of a cluster survives a split-brain scenario. When the cluster is partitioned, the quorum resource is used as an arbiter. The partition that owns the quorum resource is allowed to continue. The other partitions of the cluster are said to have lost quorum, and the Cluster service, along with any resources hosted on the nodes that are not part of the partition that has quorum, is terminated.
The Cluster service stores cluster configuration data in a quorum log file. This file is usually located on a shared disk that all nodes in the cluster can access, and it acts as the definitive version of the cluster configuration. It holds cluster configuration information such as which servers are part of the cluster, what resources are installed in the cluster, and the state of those resources (for example, online or offline). Because the purpose of a cluster is to have multiple physical servers acting as a single virtual server, it is critical that each of the physical servers has a consistent view of how the cluster is configured.
A quorum resource can be any resource that:
Provides a means for arbitration leading to membership and cluster state decisions.
Provides physical storage to store configuration information.
Uses NTFS.
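The arbitration outcome described earlier in this section (after a split, only the partition that owns the quorum resource continues) can be expressed as a small Python sketch. The node names are hypothetical, and the negotiation by which a node actually gains ownership of the quorum resource is not modeled here.

```python
def surviving_partition(partitions, quorum_owner):
    """Return the partition that keeps running after a communication loss.

    partitions is a list of sets of node names; quorum_owner is the node
    that currently controls the quorum resource.
    """
    for partition in partitions:
        if quorum_owner in partition:
            return partition
    # No partition contains the owner, so none is allowed to continue.
    return None


# A four-node cluster split into two halves; N3 owns the quorum resource.
survivors = surviving_partition([{"N1", "N2"}, {"N3", "N4"}], "N3")
```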
The quorum resource you choose is dictated by the server cluster model you deploy (number of servers, number of sites, and so on). Do not build your server cluster around a particular type of quorum resource; instead, design the cluster that best supports your applications, and then choose one of the cluster models described in the following sections.
Three types of cluster models are available for deployment with server clusters:
Local quorum
Single quorum device
Majority node set
For more information about the cluster database and the quorum resource, see "Quorum resource" in Help and Support Center for Windows Server 2003.
Local Quorum Cluster
This cluster model is for clusters that consist of only one node. This model is also referred to as a local quorum. It is typically used for:
Deploying dynamic file shares on a single cluster node, to ease home directory deployment and administration.
Testing.
Development.
Single Quorum Device Cluster
The most widely used cluster model is the single quorum device cluster (also referred to as the standard quorum model), which is illustrated in Figure 7.19. The definitive copy of the cluster configuration data is on a single cluster storage device connected to all nodes.

Figure 7.19: Single Quorum Device Cluster
Majority Node Set Cluster
With Windows Server 2003, Microsoft introduces the majority node set cluster. The majority node set quorum model is intended for sophisticated, end-to-end clustering solutions. Each node maintains its own copy of the cluster configuration data. The quorum resource ensures that the cluster configuration data is kept consistent across the nodes. For this reason, majority node set quorums are typically found in geographically dispersed clusters. Another advantage of majority node set quorums is that a quorum disk can be taken offline for maintenance and the cluster will continue to operate.
The major difference between majority node set clusters and single quorum device clusters is that single quorum device clusters can survive with just one node, but majority node set clusters need to have a majority of the cluster nodes survive a failure for the server cluster to continue operating. A majority node set cluster is illustrated in Figure 7.20.

Figure 7.20: Majority Node Set Cluster
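The difference in survival rules between the two models comes down to simple arithmetic: a majority of n nodes is more than half of n. The Python sketch below compares the two models under that definition. The model labels are names chosen for this example, and the single quorum device case assumes that the surviving node can still reach the quorum device.

```python
def majority_needed(total_nodes):
    """Smallest node count that is more than half of total_nodes."""
    return total_nodes // 2 + 1


def cluster_survives(model, total_nodes, surviving_nodes):
    """Apply the survival rule for the given cluster model."""
    if model == "single_quorum_device":
        # Can continue with just one node (assuming it reaches the quorum device).
        return surviving_nodes >= 1
    if model == "majority_node_set":
        return surviving_nodes >= majority_needed(total_nodes)
    raise ValueError(f"unknown cluster model: {model}")
```

Under this rule a four-node majority node set cluster needs three surviving nodes and therefore tolerates only one failure, and a two-node majority node set cluster tolerates none.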
A majority node set cluster is a good solution for controlled, targeted scenarios as part of a cluster solution offered by an independent software vendor (ISV) or independent hardware vendor (IHV). By abstracting storage from the Cluster service, majority node set clusters provide ISVs with much greater flexibility for designing sophisticated cluster scenarios.
There are strict requirements to adhere to when you deploy majority node set clusters. The deployment of majority node set clusters is appropriate in some specialized configurations that need tightly consistent cluster features and do not have shared disks, for example:
Clusters that host applications that can fail over, but that use other, application-specific methods to keep data consistent between nodes. Database log shipping and file replication for relatively static data are examples of this kind of application.
Clusters that host applications that have no persistent data, but that need to cooperate in a tightly coupled manner to provide consistent volatile state.
Multisite clusters, which span more than one physical site.
For more information about majority node sets and geographically dispersed clusters, see "Protecting Data From Failure or Disaster" later in this chapter.
Designing the Cluster Network
A network performs one of the following roles in a cluster:
A private network carries internal cluster communication.
A public network provides client systems with access to cluster application services.
A mixed network (public and private) carries internal cluster communication and connects client systems to cluster application services.
Figure 7.21 shows a typical four-node server cluster deployment with a public (intranet) and private network.

Figure 7.21: Four-Node Cluster Service Cluster with Private and Public Networks
A network might have no role in a cluster; this is also known as disabling a network for cluster use. In the case of single quorum device clusters, the Cluster service will not use the network for internal traffic, nor will it bring IP Address resources online. Other cluster-related traffic, such as traffic to and from a domain controller for authentication, might or might not use this network.
Use these guidelines for designing the network segments of the server clusters:
You can provide fault tolerance in your network by grouping network adapters on multiple ports to a single physical segment. This practice is known as teaming network adapters and is described in "Eliminating Single Points of Failure in Your Network Design" later in this chapter.
You can use any network adapter that is certified for Network Driver Interface Specification (NDIS).
Use only network adapters that comply with specifications for the Peripheral Component Interconnect (PCI) local bus. Do not use network adapters compatible with an Industrial Standard Architecture (ISA) or Extended Industrial Standard Architecture (EISA) bus.
For each server cluster network, use only identical network adapters on all nodes (that is, the same make, model, and firmware version). This is not a requirement, but it is strongly recommended. Consider the loads on each type of adapter and plan accordingly; for example, the private network adapters can require 10 MB to 100 MB of throughput, whereas the public network adapters can require 1 gigabyte (GB) of throughput.
Eliminating Single Points of Failure in Your Network Design
You must incorporate failsafe measures within the cluster network design. To achieve this end, adhere to the following rules:
Configure at least two of the cluster networks for internal cluster communication to eliminate a single point of failure. A server cluster whose nodes are connected by only one network is not supported.
Design each cluster network so that it fails independently of other cluster networks. This requirement implies that the components that make up any two networks must be physically independent. For example, the use of a multiport network adapter to attach a node to two cluster networks does not meet this requirement if the ports share circuitry.
If using multihoming nodes, make sure the adapters reside on separate subnets.
Do not connect multiple adapters on one node to the same network; doing so does not provide fault tolerance or load balancing.
| Note | If a cluster node has multiple adapters attached to one network, the Cluster service will recognize only one network adapter per node per subnet. | 
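Because the Cluster service recognizes only one adapter per node per subnet, it can be useful to verify a node's planned addressing before deployment. The following Python sketch uses the standard ipaddress module to report any subnet to which more than one adapter on a node is attached. The adapter names and addresses are hypothetical.

```python
import ipaddress


def duplicate_subnets(adapters):
    """Return subnets that more than one adapter on a node is attached to.

    adapters maps an adapter name to an address in CIDR form,
    for example "10.0.0.1/24".
    """
    seen = {}
    for name, cidr in adapters.items():
        subnet = ipaddress.ip_interface(cidr).network
        seen.setdefault(subnet, []).append(name)
    return {str(net): names for net, names in seen.items() if len(names) > 1}


# Hypothetical node with one public and one private adapter.
good_node = {"public": "192.168.1.10/24", "private": "10.0.0.1/24"}
# Hypothetical node with two adapters on the same subnet.
bad_node = {"nic1": "192.168.1.10/24", "nic2": "192.168.1.11/24"}
```

An empty result for a node indicates that each of its adapters sits on a separate subnet, as the rules above require.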
The best way to guard against network failure in your server cluster is by teaming network adapters. By grouping network adapters on multiple ports to a single physical segment, you can provide fault tolerance to your cluster network. If a port fails (whether the failure occurs on the adapter, cable, switch port, or switch), another port takes over automatically. To the operating system and other devices on the network, this failover is transparent.
An important consideration, however, is that networks dedicated to internal server cluster communication cannot be teamed. Teaming network adapters is supported only on networks that are not dedicated to internal cluster traffic. Teaming network adapters on all cluster networks concurrently is not supported. There are alternative methods for achieving redundancy on networks that are dedicated to internal cluster traffic. These alternatives include adding a second private network, which avoids the cost of adding network adapters (however, this second network must be for intra-cluster network traffic only).
Teaming network adapters on other cluster networks is acceptable; however, if communication problems occur on a teamed network, it is recommended that you disable teaming.
Determining Domain Controller Access for Server Clusters
For Windows Server 2003 clusters to function properly, any node that forms part of the cluster must validate the Cluster service account in the local domain. To do this, each node must be able to establish a secure channel with a domain controller. If a domain controller cannot validate the account, the Cluster service does not start. This is also true for other clustered applications that need account validation, such as SQL Server and Microsoft Exchange 2000. There are three ways to provide the necessary access, presented here in order of preference:
Configure cluster nodes as member servers within a Windows domain and give them fast, reliable connectivity to a local domain controller.
If the connectivity between cluster nodes and domain controllers is slow or unreliable, locate a domain controller within the cluster.
Configure the cluster nodes as domain controllers, so that the Cluster service account can always be validated.
The security rights applicable to a domain controller administrator are often inappropriate or nonfunctional for a cluster administrator. A domain controller administrator can apply global policy settings to a domain controller that might conflict with the cluster role. Domain controller administrative roles span the network; this model generally does not suit the security model for clusters.
It is strongly recommended that you do not configure the cluster nodes as domain controllers, if at all possible. But if you must configure the cluster nodes as domain controllers, follow these guidelines:
If one node in a two-node cluster is a domain controller, both nodes must be domain controllers. If you have more than two nodes, you must configure at least two nodes as domain controllers. This gives you failover assurance for the domain controller services.
A domain controller that is idle can use between 130 MB and 140 MB of Random Access Memory (RAM), which includes running the Cluster service. If these domain controllers have to replicate with other domain controllers within the domain and across domains, replication traffic can saturate bandwidth and degrade overall performance.
If the cluster nodes are the only domain controllers, they must be Domain Name System (DNS) servers as well. You must address the problem of not registering the private interface in DNS, especially if the interface is connected by a crossover cable (two-node cluster only). The DNS servers must support dynamic updates. You must configure each domain controller and cluster node with a primary — and at least one secondary — DNS server. Each DNS server needs to point to itself for primary DNS resolution and to the other DNS servers for secondary resolution.
If the cluster nodes are the only domain controllers, they must also be global catalog servers.
The first domain controller in the forest takes on all single-master operation roles. You can redistribute these roles to each node; however, if a node fails over, the single-master operation roles that the node has taken on are no longer available.
If a domain controller is so busy that the Cluster service is unable to gain access to the quorum as needed, the Cluster service might interpret this as a resource failure and cause the resource group to fail over to the other node.
Clustering other applications, such as SQL Server or Exchange 2000, in a scenario where the nodes are also domain controllers can result in poor performance and low availability due to resource constraints. Be sure to test this configuration thoroughly in a lab environment before you deploy it.
You must promote a cluster node to a domain controller by using the Active Directory Installation Wizard before you create a cluster on that node or add it to an existing cluster.
Practice extreme caution when demoting a domain controller that is also a cluster node. When a node is demoted from a domain controller, security settings are changed. For example, certain domain accounts and groups revert to the default built-in accounts and groups originally on the server. This means the security ID (SID) of the Domain Admins group changes to that of the local Administrators group. As a result, the Administrators entry in the security descriptor for the Cluster service is no longer valid because its SID still matches the Domain Admins group, and not the SID for the local Administrators group.
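The first guideline in this list (both nodes of a two-node cluster, or at least two nodes of a larger cluster, must be domain controllers once any node is one) can be checked with a short Python sketch. The function name and its input format are assumptions made for illustration, and the sketch covers only clusters of two or more nodes.

```python
def dc_placement_ok(node_is_dc):
    """Check the domain controller count rule for a cluster.

    node_is_dc is a list of booleans, one per cluster node.
    """
    if len(node_is_dc) < 2:
        raise ValueError("this sketch covers clusters of two or more nodes")
    dc_count = sum(node_is_dc)
    if dc_count == 0:
        return True  # all nodes are member servers, the preferred option
    if len(node_is_dc) == 2:
        return dc_count == 2  # both nodes must be domain controllers
    return dc_count >= 2  # larger clusters need at least two
```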