Microsoft Windows Server 2003 Deployment Kit: Planning Server Deployments

Microsoft Corporation

Planning File Server Availability


When you plan for file server availability, begin by ensuring the uptime of the physical file server. Next, determine whether the file server contains business-critical data. If it does, you need to decide whether to use an availability strategy, such as replication or clustering, to ensure the availability of the data. If data on the file server is temporary or not business critical, continue to "Planning File Server Security" later in this chapter.

Figure 2.9 describes the planning process for file server availability.


Figure 2.9: Planning File Server Availability



Planning for File Server Uptime


The following guidelines provide basic steps to increase the uptime of file servers. For comprehensive coverage of these topics, see "Planning for High Availability and Scalability" in this book.

Choosing Hardware for Reliability and Availability


Implement a well-planned hardware strategy to increase file server availability while reducing support costs and failure recovery times. You can choose hardware for reliability and availability by following these guidelines:



Choose hardware from the Windows Server Catalog for products in the Windows Server 2003 family.



Establish hardware standards.



Keep spares or stand-by systems for redundancy.



Use fault-tolerant network components, server cooling, and power supplies.



Use error checking and correcting (ECC) memory. ECC memory uses a checking scheme to detect and correct any single-bit error in a byte of information.



Use fault-tolerant storage components, such as redundant disk controllers, hot-swappable disks and hot spares, and disks configured as redundant arrays of independent disks (RAID). For more information about RAID, see "Planning the Layout and RAID Level of Volumes" later in this chapter.



Use disk resource management tools, such as disk quotas, to ensure that users always have available disk space.



Keep a log or database of changes made to the file server. The log should contain dated entries for hardware failures and replacements, service pack and software installations, and other significant changes.



Deploy FRS-compatible antivirus software on link target computers before adding files to FRS-replicated links or adding new members to the replica set.




Maintaining the Physical Environment


Maintain high standards for the environment in which the file servers must run. Neglecting the environment can negate all other efforts to maintain the availability of file servers. Take the following measures to maintain the physical environment:



Maintain proper temperature and humidity.



Protect servers from dust and contaminants.



To protect against power outages, provide a steady supply of power to the servers by using uninterruptible power supply (UPS) units or standby generators.



Maintain server cables.



Secure the server room.



Planning for Backup and Preparing for Recovery


Backups are essential for high-availability file servers, because the ultimate recovery method is to restore from backup. To plan for backup and recovery:



Create a plan for backup.



Monitor backups.



Decide between local and network backups.



Check the condition of backup media.



Perform trial restorations on a regular basis. A trial restoration confirms that your files are properly backed up and can uncover hardware problems that do not show up with software verifications. Be sure to note the time it takes to either restore or re-replicate the data so that you will know how long it takes to bring a server back online after a failure.



Perform regular backups of clustered file servers and test cluster failures and failover policies. For more information about backing up server clusters and testing policies, see "Backing up and restoring server clusters" and "Test cluster failures and failover policies" in Help and Support Center for Windows Server 2003.



If you are using DFS and FRS, put a procedure in place for recovering failed members of an FRS replica set. For more information about troubleshooting FRS, see the Distributed Services Guide of the Windows Server 2003 Resource Kit (or see the Distributed Services Guide on the Web at http://www.microsoft.com/reskit).




Choosing Software for Reliability


Installing incompatible or unreliable software reduces the overall availability of file servers. To choose software for reliability, follow these guidelines:



Select software that is compatible with Windows Server 2003.



Select signed drivers. Microsoft promotes driver signing for designated device classes as a mechanism to advance the quality of drivers, to provide a better user experience, and to reduce support costs for vendors and total cost of ownership for customers. For more information about driver signing, see the Driver Signing and File Protection link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources.



Select software that supports the high-availability features you require, such as server clusters and online backup.




Identifying Business-Critical File Servers


After you plan for file server uptime, identify file servers that must be highly available. These file servers typically contain business-critical data that is required for an organization's central purpose, such as software distribution points, e-mail or business databases, and internal or external Web sites. File servers need to be highly available for the following reasons as well:



The file server contains one or more stand-alone DFS roots. If you create a stand-alone DFS root on a file server, and the server fails or is taken offline for maintenance, the entire DFS namespace is unavailable. Users can access the shared folders only if they know the names of the file servers where the shared folders are located. To make stand-alone DFS namespaces fault tolerant, you can create the roots on clustered file servers.



Your organization is consolidating file servers. Consolidation leads to a greater dependency on fewer file servers, so you must ensure the availability of the remaining file servers.



Your organization has existing SLAs. Service Level Agreements (SLAs) and Operating Level Agreements (OLAs) specify the required uptime for file servers, usually defined as the percentage of time that the file server is available for use. For example, your organization might require that file servers have 99.7-percent uptime regardless of the type of data they contain. To meet these agreements, some organizations deploy clustered file servers and assign experienced administrators to manage those file servers. In addition to deploying hardware, these administrators use defined and tested processes to fulfill these agreements.
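To see what such a target means in practice, it helps to convert an uptime percentage into the downtime it allows. The following Python sketch is illustrative only; the 99.7-percent figure is the example above, and the other values are hypothetical comparisons.

```python
# Convert an uptime SLA percentage into the downtime it allows per year and per month.
HOURS_PER_YEAR = 365 * 24

def allowed_downtime_hours(uptime_percent: float) -> float:
    """Return the hours of downtime per year permitted by an uptime SLA."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)

if __name__ == "__main__":
    for sla in (99.0, 99.7, 99.9):
        yearly = allowed_downtime_hours(sla)
        print(f"{sla}% uptime allows about {yearly:.1f} hours of downtime per year "
              f"({yearly / 12:.1f} hours per month)")
```

For example, a 99.7-percent uptime requirement allows roughly 26 hours of downtime per year, or a little over 2 hours per month.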



You do not need to implement additional availability strategies on file servers that store temporary or non-business-critical files. For example, if the file server fails in some way, but users can tolerate the loss of the file server for the time it takes to repair the file server or restore the data from backup, you do not need to use an availability strategy such as replication or clustering. Instead, you can work to decrease the amount of time it takes to restore the file server.



Choosing an Availability Strategy for Business-Critical Data


If a file server contains business-critical data, you need to make certain that the data is highly available. Windows Server 2003 provides two primary strategies for increasing data availability: FRS and clustering.

FRS This strategy involves creating one or more domain-based DFS namespaces, using link targets that point to multiple file servers and using File Replication service (FRS) to synchronize the data in the link targets. This chapter describes the design and deployment process for FRS, although you can also synchronize data manually by using tools such as Robocopy or by using third-party replication tools.

Clustering A server cluster is a group of individual computer systems working together cooperatively to provide increased computing power and to provide continuous availability of business-critical applications or resources. This group of computers appears to network clients as if it were a single system, by virtue of a common cluster name. A cluster can be configured so that the workload is distributed among the group, and if one of the cluster members fails, another cluster member automatically assumes its duties.

Both of these strategies involve using multiple file servers to ensure data availability. If for some reason you cannot use multiple file servers, follow the guidelines in "Planning for File Server Uptime" earlier in this chapter to increase the availability of the physical server.

When evaluating these two strategies, you must keep in mind your organization's tolerance for inconsistent data. FRS can cause temporary data inconsistency as data is replicated across multiple servers. Clustered file servers maintain only one copy of the data; therefore, data inconsistency does not occur.





Note

If your organization plans to implement geographically dispersed clusters for disaster tolerance, you need to understand your data consistency needs in different failure and recovery scenarios and work with the solution vendors to match your requirements. Different geographically dispersed cluster solutions provide different replication and redundancy strategies, ranging from synchronous mirroring across sites to asynchronous replication. For more information about geographically dispersed clusters, see "Designing and Deploying Server Clusters" in this book.



Using FRS as an Availability Strategy


You can use FRS to replicate data in domain-based DFS namespaces on file servers running a Windows 2000 Server or Windows Server 2003 operating system. When evaluating FRS, you must determine whether your organization can tolerate periods of inconsistent data that can occur within a replica set. Data inconsistency can occur at the file and folder level as follows:



FRS uses a "last writer wins" algorithm for files. This algorithm is applied in two situations: when the same file is changed on two or more servers, and when two or more different files with the same name are added to the replica tree on different servers. The most recent update to a file in a replica set becomes the version of the file that replicates to the other members of the replica set, which might result in data loss if multiple masters have updated the file. In addition, FRS cannot enforce file-sharing restrictions or file locking between two users who are working on the same file on two different replica set members.



FRS uses a "last writer wins" algorithm when a folder on two or more servers is changed, such as by changing folder attributes. However, FRS uses a "first writer wins" algorithm when two or more identically named folders on different servers are added to the replica tree. When this occurs, FRS identifies the conflict during replication, and the receiving member protects the original copy of the folder and renames (morphs) the later inbound copy of the folder. The morphed folder names have a suffix of "_NTFRS_xxxxxxxx," where "xxxxxxxx" represents eight random hexadecimal digits. The folders are replicated to all servers in the replica set, and administrators can later merge the contents of the folders or take some other measure to reestablish the single folder.



Temporary data inconsistency due to replication latency is more likely to occur in geographically diverse sites with infrequent replication across slow WAN links. If you want to use replication among servers in the same site, consistency is probably not an issue, because the replication can occur quickly after the file changes — assuming that only one user makes changes to the data. If two users make changes to the data, replication conflicts occur and one user loses those changes.
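The conflict rules above can be illustrated with a small model. The following Python sketch is not FRS code; it simply models "last writer wins" for identically named files and "first writer wins" with folder morphing for identically named folders, using hypothetical server names and timestamps.

```python
import secrets

def resolve_file_conflict(versions):
    """'Last writer wins': keep the copy with the most recent update time.

    versions: list of (server, update_time) tuples for the same file name.
    """
    return max(versions, key=lambda v: v[1])

def resolve_folder_conflict(folder_name, creations):
    """'First writer wins': the earliest folder keeps its name; later copies
    are renamed (morphed) with an _NTFRS_ suffix of eight hexadecimal digits."""
    ordered = sorted(creations, key=lambda c: c[1])
    kept = [(folder_name, ordered[0][0])]
    for server, _ in ordered[1:]:
        kept.append((f"{folder_name}_NTFRS_{secrets.token_hex(4)}", server))
    return kept

# Hypothetical example: the same file edited on two servers, and the same
# folder created independently on two servers.
print(resolve_file_conflict([("ServerA", 100), ("ServerB", 250)]))            # ServerB's copy wins
print(resolve_folder_conflict("Reports", [("ServerA", 10), ("ServerB", 40)])) # ServerB's copy is morphed
```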

Replication works well in the following scenarios.

When the data is read-only or changes infrequently

Because changes occur infrequently, the data is usually consistent. In addition, FRS has less data to replicate, so network bandwidth is not heavily affected.

When the sites are geographically dispersed and consistency is not an issue

Geographically dispersed sites might have slower bandwidth connections, but if your organization does not require the data in those sites to always be consistent with each other, you can configure replication in those sites on a schedule that makes sense for your organization. For example, if your organization has sites in Los Angeles and Zimbabwe, you can place one or more replicas of the data in servers in those sites and schedule replication to occur at night or during periods of low bandwidth use. Because in this scenario replication could take hours or days to update every member, the delay must be acceptable to your organization.


When each file is changed by only one person from one location

Replication conflicts rarely occur if only a single user changes a given file from a single location. Some common scenarios for single authorship are redirected My Documents folders and other home directories. Conversely, if users roam between sites, replication latency could cause the file to be temporarily inconsistent between sites.

When replication takes place among a small number of servers in the same site

Replication latency is reduced by frequently replicating data using high-speed connections. As a result, data tends to be more consistent.

Replication should not be used in the following scenarios.

In organizations with no operations group or dedicated administrators

Organizations that do not have the staff or the time to monitor FRS event logs on each replica member should not implement FRS. Organizations must also have well-defined procedures in place to prevent the accidental or unintentional deletion of data in the replica set, because deleting a file or folder from one replica member causes the file or folder (and its contents) to be deleted from all replica members. In addition, if a folder is moved out of the replica tree, FRS deletes the folder and its contents on the remaining replica members. To avoid having to restore the files or folders from backup, you can enable shadow copies on some of the replica members so that you can easily restore a file or folder that was accidentally deleted. For more information about shadow copies, see "Designing a Shadow Copy Strategy" later in this chapter. For more information about FRS logs, see the Distributed Services Guide of the Windows Server 2003 Resource Kit (or see the Distributed Services Guide on the Web at http://www.microsoft.com/reskit).

In organizations that do not update virus signatures or closely manage folder permissions

A virus in FRS-replicated content can spread rapidly to replica members and to clients that access the replicated data. Viruses are especially damaging in environments where the Everyone group has share permissions or NTFS permissions to modify content. To prevent the spread of viruses, it is essential that replica members have FRS-compatible, up-to-date virus scanners installed on the servers and on clients that access replicated data. For more information about preventing the spread of viruses, see "Planning Virus Protection for File Servers" and "Planning DFS and FRS Security" later in this chapter.

When the rate of change exceeds what FRS can replicate

If you plan to schedule replication to occur during a specified replication window, verify that FRS can replicate all the changed files within the window. Replication throughput is determined by a number of factors:



The number and size of changed files



The speed of the disk subsystem



The speed of the network



Whether you have optimized the servers by placing the replica tree, the staging directory, and the FRS data on separate disks.




Each organization will have different FRS throughput rates, depending on these factors. In addition, if your data compresses extremely well, your file throughput will be higher. To determine the replication rate, perform testing in a lab environment that resembles your production environment.

If the amount of data changes exceeds what FRS can replicate in a given period of time, you need to change one of these factors, such as increasing the speed of the disk subsystem (number of disks, mechanical speed, or disk cache) or network. If no change is possible, FRS is not recommended for your organization.

In organizations that always use clustered file servers

Some organizations use clustered file servers regardless of whether the server contains business-critical data. Although storing FRS-replicated content on the cluster storage of a clustered file server might imply increased availability of the data, combining clustering and FRS is not recommended. Data might become inconsistent among the members of the replica set, thus defeating the purpose of clustering, which is to have highly available data that remains consistent because only one copy of the data exists. In addition, Windows Server 2003 does not support configuring FRS to replicate data on cluster storage.

In organizations that use Remote Storage

Remote Storage is a feature in Windows Server 2003 that automatically copies infrequently used files on local volumes to a library of magnetic tapes or magneto-optical disks. Organizations that use Remote Storage must not use FRS on the same volume. Specifically, do not perform any of the following tasks:



Do not create a replica set on a volume that is managed by Remote Storage.



Do not add a volume that contains folders that are part of an FRS replica set to Remote Storage.



If you use Remote Storage for volumes that contain FRS replica sets, backup tapes might be damaged or destroyed if FRS recalls a large number of files from Remote Storage. The damage occurs because FRS does not recall files in media order. As a result, files are extracted randomly, and the process can take days to complete and might damage or destroy the tape in the process. Random extraction from magneto-optical disks can also be extremely time consuming.





Caution

Windows Server 2003 does not prevent you from using Remote Storage and FRS replica sets on the same volumes, so take extra precautions to avoid using these two features on the same volume.


When locks by users or processes prevent updates to files and folders

FRS does not replicate locked files or folders to other replica members, nor does FRS update a file on a replica member if the local file is open. If users or processes frequently leave files open for extended periods, consider using clustering instead of FRS.

When the data to be replicated is on mounted drives

If a mounted drive exists in a replica tree, FRS does not replicate the data in the mounted drive.


When the data to be replicated is encrypted by using EFS

FRS does not replicate files encrypted by using EFS, nor does FRS warn you that EFS-encrypted files are present in the replica set.

When the FRS jet database, FRS logs, and staging directory are stored on volumes where NTFS disk quotas are enabled

If you plan to store a replica set on a volume where disk quotas are enabled, you must move the staging directory, FRS jet database, and FRS logs to a volume where disk quotas are disabled. For more information, see "Planning the Staging Directory" later in this chapter.

Using Clustering as an Availability Strategy


If the data changes frequently and your organization requires consistent data that is highly available, use clustered file servers. Clustered file servers allow client access to file services during unplanned and planned outages. When one of the servers in the cluster is unavailable, cluster resources and applications move to other available cluster nodes. Server clusters do not guarantee nonstop operation, but they do provide sufficient availability for most business-critical applications, including file services. A cluster service can monitor applications and resources and automatically recognize and recover from many failure conditions. This ability provides flexibility in managing the workload within a cluster and improves overall system availability.

Server cluster benefits include the following:

High availability Ownership of resources, such as disks and IP addresses, is automatically transferred from a failed server to a surviving server. When a system or application in the cluster fails, the cluster software restarts the failed application on a surviving server, or it disperses the work from the failed node to the remaining nodes. As a result, users experience only a momentary pause in service.

Manageability You can use the Cluster Administrator snap-in to manage a cluster as a single system and to manage applications as if they were running on a single server. You can move applications to different servers within the cluster, and you can manually balance server workloads and free servers for planned maintenance. You can also monitor the status of the cluster, all nodes, and resources from anywhere on the network.

Scalability Server clusters can grow to meet increased demand. When the overall client load for a clustered file server exceeds the cluster's capabilities, you can add additional nodes.

Clustered file servers work well in the following scenarios.

When multiple users access and change the files

Because only one copy of the file exists, Windows Server 2003 can enforce file locking so that only one user can make changes at a time. As a result, data is always consistent.

When large numbers of users access data in the same site

Clustered file servers are useful for providing access to users in a single physical site. In this case, you do not need a replication method to provide data consistency among sites.

When files change frequently and data consistency is a must

Even with a large number of changes, data is always consistent and there is no need to replicate the changes to multiple servers.


When you want to reduce the administrative overhead associated with creating many shared folders

On clustered file servers, you can use the Share Subdirectories feature to automatically share any folders that are created within a folder that is configured as a File Share resource. This feature is useful if you need to create a large number of shared folders.

When you want to ensure the availability of a stand-alone DFS root

Creating stand-alone DFS roots on clustered file servers allows the namespaces to remain available, even if one of the nodes of the cluster fails.

When you want to make encrypted files highly available

Windows Server 2003 supports using EFS on clustered file servers. Using EFS in FRS replica sets is not supported. For more information about using EFS on clustered file servers, see "Planning Encrypted File Storage" later in this chapter.

Some issues to consider when using clustered file servers include the following.

Dynamic disks are not available

If you want to use dynamic disks, you must use them on nonclustered file servers or on the local storage devices of the cluster. If the node hosting the local storage devices fails, the data becomes unavailable until the node is repaired and brought back online.

If you need to extend the size of basic volumes used for shared cluster storage, you can do so by using DiskPart. For more information about extending basic volumes, see "Extend a basic volume" in Help and Support Center for Windows Server 2003.

Clustered file servers must use complete cluster systems

For your server clusters to be supported by Microsoft, you must choose complete cluster systems from the Windows Server Catalog for the Windows Server 2003 family. For more information about support for server clusters, see article Q309395, "The Microsoft Support Policy for Server Clusters and the Hardware." To find this article, see the Microsoft Knowledge Base link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources.


Increasing Data Availability by Using FRS


To design an FRS replication strategy, you need to take the following steps:



    Identify data to replicate.



    Choose the replication topology.



    Design the replication schedule.



    Plan the staging directory.



    Determine connection priorities.




The following sections describe each of these steps.





Important

If you configure FRS to replicate files on file servers running Windows 2000, it is highly recommended that you install the Windows 2000 Service Pack 3 (SP3) and the post-SP3 release of Ntfrs.exe, or install later service packs that include this release. For more information about updating FRS, see article 811370, "Issues That Are Fixed in the Post-Service Pack 3 Release of Ntfrs.exe." To find this article, see the Microsoft Knowledge Base link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources.


For more information about FRS security, see "Planning DFS and FRS Security" later in this chapter.

Identifying Data to Replicate


If you identify which DFS roots and links require multiple replicated targets, as described in "Increasing the Availability of DFS Namespaces" earlier in this chapter, you also need to consider the following issues.

Size of the USN journal

The default USN journal size in Windows Server 2003 is 512 MB. If your volume contains 400,000 files or fewer, no additional configuration is required. For every 100,000 additional files on a volume containing FRS-replicated content, increase the update sequence number (USN) journal by 128 MB. If files on the volume are changed or renamed frequently (regardless of whether they are part of the replica set), consider sizing the USN journal larger than these recommendations to prevent USN journal wraps, which can occur when large numbers of files change so quickly that the USN journal must discard the oldest changes (before FRS has a chance to detect the changes) to stay within the specified size limit. To recover from a journal wrap, you must perform a nonauthoritative restore on the server to synchronize its files with the files on the other replica members. For more information about USN journal wraps, see article Q292438, "Troubleshooting Journal_Wrap Errors on Sysvol and DFS Replica Sets." To find this article, see the Microsoft Knowledge Base link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources.
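A rough sizing rule follows from the guidance above: start at 512 MB for up to 400,000 files, then add 128 MB for each additional 100,000 files. The following Python sketch applies that rule; the round-up behavior and the optional headroom parameter for frequently changing volumes are assumptions you should adjust for your environment.

```python
import math

DEFAULT_USN_JOURNAL_MB = 512     # default journal size in Windows Server 2003
BASELINE_FILES = 400_000         # number of files covered by the default size
MB_PER_EXTRA_100K_FILES = 128

def recommended_usn_journal_mb(files_on_volume: int, headroom_mb: int = 0) -> int:
    """Estimate a USN journal size (MB) for a volume hosting FRS-replicated content.

    headroom_mb is optional extra space for volumes whose files change or are
    renamed frequently, to reduce the risk of journal wraps.
    """
    extra_files = max(0, files_on_volume - BASELINE_FILES)
    extra_mb = math.ceil(extra_files / 100_000) * MB_PER_EXTRA_100K_FILES
    return DEFAULT_USN_JOURNAL_MB + extra_mb + headroom_mb

print(recommended_usn_journal_mb(400_000))          # 512 MB (default is sufficient)
print(recommended_usn_journal_mb(1_000_000))        # 512 + 6 * 128 = 1280 MB
print(recommended_usn_journal_mb(1_000_000, 256))   # with extra headroom for frequent changes
```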


Where to store the replica tree

To provide the best consistency and minimize administrative intervention, carefully plan where to store the replica tree. Your choice will depend on the amount of storage on the server, the volume layout, the number of files on the volume that do not participate in a replica set, and the number of replica sets on the server.



If you have a single replica tree, store it on its own volume if possible, or store it on a volume that also stores other user data. Adjust the USN journal size based on the total number of files on the volume.



If you have more replica trees than volumes, store multiple replica trees on each volume and adjust the USN journal size based on the total number of files on the volume.



If you can create a volume for each replica tree, store each replica tree on its own volume and adjust the USN journal accordingly.



The staging directory also requires careful placement. For more information, see "Planning the Staging Directory" later in this chapter.

Replicated link targets to omit from the DFS namespace

When you omit a replicated link target from the namespace, you disable referrals so that DFS does not direct clients to the link target. However, the link target still replicates with other members. Disabling DFS referrals for replica members is useful for a number of scenarios. For example, you can use such a member for any of the following purposes:



As a backup server. For example, assume you have five servers in a replica set. You might make four of the servers part of the DFS namespace and use the fifth server as a backup source. In this scenario, DFS never refers clients to the fifth server.



As a reference server. For example, you omit (from the namespace) the server where all changes originate, thus ensuring that you can always introduce new content without being blocked because of open files. The reference server becomes the "known good" server; all other servers should contain identical content.



As a way to update application data. For example, you have two replica members, ServerA and ServerB, which contain application data that is frequently updated. You can disable referrals for ServerA, update the application data, and enable referrals to ServerA. You can then repeat the procedure on ServerB. In this scenario, one of the replica members is always available while the other is being updated.



To omit a replicated link target from the namespace, configure the link target within the DFS namespace and configure replication as desired by using the Distributed File System snap-in. Then use the Enable or Disable Referral command in the Distributed File System snap-in to disable referrals to the link targets that you want to omit from the namespace.


Files or folders to exclude from replication

You can use the Distributed File System snap-in to set filters that exclude subfolders (and their contents) or files from replication. You exclude subfolders by specifying their name, and you exclude files by using wild cards to specify file names and extensions. By default, no subfolders are excluded. The default file filters exclude the following files from replication:



File names starting with a tilde (~) character



Files with .bak or .tmp extensions



Filters act as exclusion filters only for new files and folders added to a replica set. They have no effect on existing files in the replica set. For example, if you change the existing file filter from "*.tmp, *.bak" to "*.old, *.bak," FRS does not go through the replica set and exclude all files that match *.old, nor does it go through the replica set and begin to replicate all files that match *.tmp. After the filter change, new files matching *.old that are added to the replica set are not replicated. New files matching *.tmp that are added to the replica set are replicated.

Any file in the replica set that was excluded from replication under the old file filter (such as Test.tmp, created when the old filter was in force) is automatically replicated under the new file filter only after the file is modified. Likewise, changes to any file that was not excluded from replication under the old filter (such as Test.old, created when the old filter was in force) continue to replicate under the new filter, until you explicitly delete the file.

These rules apply in the same manner to the directory exclusion filter. If a directory is excluded, all subdirectories and files under that directory are also excluded.
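Because the filter behavior is easy to misread, the following Python sketch models it for newly added items only: file filters are wildcard patterns, and a directory exclusion covers everything beneath the excluded folder. This is a simplified model under those assumptions, not FRS itself, and the paths shown are hypothetical.

```python
import fnmatch
from pathlib import PureWindowsPath

def is_excluded(new_item_path: str, file_filters, excluded_dirs) -> bool:
    """Return True if a newly added file would be excluded from replication.

    file_filters: wildcard patterns such as "*.bak" or "~*".
    excluded_dirs: folder names excluded from replication; all subdirectories
    and files under an excluded folder are also excluded. Filter changes do
    not affect existing files, so this check applies only to new additions.
    """
    path = PureWindowsPath(new_item_path)
    excluded = {d.lower() for d in excluded_dirs}
    if any(part.lower() in excluded for part in path.parent.parts):
        return True
    return any(fnmatch.fnmatch(path.name.lower(), pat.lower()) for pat in file_filters)

# Default file filters exclude names starting with a tilde and *.bak / *.tmp files.
default_filters = ["~*", "*.bak", "*.tmp"]
print(is_excluded(r"D:\ReplicaTree\report.tmp", default_filters, []))                  # True
print(is_excluded(r"D:\ReplicaTree\Scratch\data.txt", default_filters, ["Scratch"]))   # True
print(is_excluded(r"D:\ReplicaTree\report.doc", default_filters, []))                  # False
```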

Regardless of the filters you set, FRS always excludes the following from replication:



NTFS mounted drives



Files encrypted by using EFS



Any reparse points except those associated with DFS namespaces. If a file has a reparse point used for Hierarchical Storage Management (HSM) or Single Instance Store (SIS), FRS replicates the underlying file but not the reparse point.



Files on which the temporary attribute has been set



For more information about configuring filters, see the Distributed Services Guide of the Windows Server 2003 Resource Kit (or see the Distributed Services Guide on the Web at http://www.microsoft.com/reskit).


Number of servers in the replica set

Will you replicate data among three servers or three hundred servers? As the number of servers increases, configuring the topology and schedule becomes more complex, and it takes longer to replicate data to all members of the replica set. If many members exist in the replica set, and if any member can originate new data, the amount of aggregate bandwidth consumed can be very high. Also note that any server with outbound connections is a server that can potentially originate content and thus requires careful monitoring.

Number of replica sets per server

Although there is no limit to the number of replica sets that can be hosted on a single server, it is recommended that you host no more than 150 replica sets on a single server to provide optimal replication performance. The optimal number of replica sets for servers in your organization depends on the CPU, memory, disk input/output (I/O) throughput, and the amount of data changed.

Network bandwidth between sites

Replicating data between sites that are connected with slow WAN links requires careful planning of the topology and schedule, because FRS does not provide bandwidth throttling. If the sites have a high-bandwidth connection, but business-critical databases and other applications use that connection as well, schedule replication so that it does not consume bandwidth that is needed for other uses.

Amount of data in the replica tree

If the replica tree includes a large amount of data, and a new member is served by a low-bandwidth link, prestage the replica tree on the new member. Prestaging a replica tree preserves security descriptors and object IDs and involves restoring the data to the new member without using the low-bandwidth link. For example, you can restore a backup of the replica tree on the new member, or you can place the new member in the same site as an existing member and prestage the replica tree by using a high-bandwidth local area network (LAN) connection. For more information about adding new replica members, see "Adding a New Member to an Existing Replica Set" later in this chapter.

How frequently the data changes

If the data that is to be replicated changes frequently, estimate the total amount of data that will be generated by each replica member over a period of time. Data that changes frequently is more difficult to keep synchronized across multiple servers, especially if those servers are located across WAN links. If replication latency is a concern, consider clustering instead of replication.


Whether you want to use single master replication

FRS is a multimaster replication engine, which means that new content and changes to existing content can originate from any member. If you want to limit the servers that can originate data, you can set up single master replication in one of the following ways:



Set NTFS permissions on targets to control who can update data in the replica tree.



Configure the replication topology so that a target does not have any outbound connections.







Important

If you use this method, do not directly make changes on replica members that have no outbound connections. Changes made directly on a server that has no outbound connections will not replicate, and the server replica tree will be inconsistent and potentially unpredictable.


Speed of recovery from hardware failures

If a server contains a large amount of replicated, business-critical data, be sure to use redundant server components, such as hardware RAID, redundant disk controllers, and redundant network cards to reduce the likelihood that a hardware failure could cause the server to become unavailable. To further increase availability, use multiple servers in each site so that if a server in the site is unavailable due to hardware problems, clients in that site can access the remaining servers as you repair and restore the data on the failed server. Using multiple servers is also useful for remote sites where repair time is lengthy and in cases where the amount of replicated data is so large that it cannot be restored from a central site within the required restoration window.

Expected growth of replicated data

You need to know whether you plan to replicate larger and larger amounts of data over time so that you can make sure that your topology, schedule, and bandwidth can handle the additional data.

Expected increase in the number of replica members

If you plan to deploy a small number of servers at first and then deploy additional servers over time, you need to make sure that your topology and bandwidth can handle the new servers. For example, if you plan to add 100 more servers, and each of those servers can originate data, you must make sure that your network has the bandwidth to replicate this data. On the other hand, if you are deploying 100 new servers, but only one of those servers can originate data, bandwidth might not be a limiting factor, although you do need to make sure that your bandwidth can handle the initial synchronization of data among servers.


Choosing a Replication Topology


The replication topology describes the logical connections that FRS uses to replicate files among servers. To minimize the network bandwidth required for replication, identify where your bandwidth is highest and lowest on your network, and model your FRS replication topology after the physical topology of your network.

You use the Distributed File System snap-in to select a replication topology. After you add a second link target to a link, you are prompted to configure replication by using the Configure Replication Wizard, which allows you to perform the following tasks:



Choose the initial master whose contents are replicated to the other targets. The initial master is only relevant during the creation of the replica set. After that, the server that acted as the initial master is treated no differently from any other server.



Choose the location of the staging directory. By default, the staging directory is placed on a different volume than the content to be replicated.



Choose the replication topology: ring, hub and spoke, full mesh, or custom. If you choose custom, you can add or delete connections to or from the replica set. For the other three standard topologies, you can only enable or disable connections between target servers.



The following describes the four topology types available in the Distributed File System snap-in. For an Excel spreadsheet to assist you in documenting your decision after you choose the topology, see "FRS Configuration Worksheet" (Sdcfsv_2.xls) on the Windows Server 2003 Deployment Kit companion CD (or see "FRS Configuration Worksheet" on the Web at http://www.microsoft.com/reskit).

Ring topology

Figure 2.10 illustrates a ring topology. In a ring topology, files replicate from one server to another in a circular configuration, with each server connected to the servers on either side of it in the ring. Choose a ring topology if your physical network topology resembles a ring topology. For example, if each server is located in a different site and has existing connectivity with neighboring servers, you can choose the ring topology so that each server connects only to neighboring servers. Because the ring topology is bidirectional, each connection is fault tolerant. If a single connection or server fails, data can still replicate to all members in the opposite direction.


Figure 2.10: Ring Topology


If you plan to create a ring topology with more than seven members, consider adding shortcut connections between some of the members to reduce the number of hops required for data to replicate to all members.

Hub and spoke topology

Figure 2.11 illustrates a hub and spoke topology. In a hub and spoke topology, you designate one server as the hub. Other servers, called spokes, connect to the hub. This topology is typically used for WANs that consist of faster network connections between major computing hubs and slower links connecting branch offices. In this topology, files replicate from the hub server to the spoke servers and vice versa, but files do not replicate directly between two spoke servers. When you choose this topology, you must choose which server will act as the hub. If you want to set up multiple hubs, use a custom topology. Using multiple hubs is recommended, because using only one hub means that the hub is a single point of failure.


Figure 2.11: Hub and Spoke Topology


Full mesh topology

Figure 2.12 illustrates a full mesh topology. In a full mesh topology, every server connects to every other server. A file created on one server replicates directly to all other servers. Because each member connects to every other member, the propagation of change orders for replicating files can impose a heavy burden on the network. To reduce unnecessary traffic, use a different topology or delete connections you do not actually need. A full mesh topology is not recommended for replica sets with five or more members.


Figure 2.12: Full Mesh Topology
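As a rough way to compare the three standard topologies, the following Python sketch generates the connection pairs that each one implies for a given member list. It is a planning aid under simplified assumptions (bidirectional connections, hypothetical server names), not a representation of how the Distributed File System snap-in stores connections.

```python
from itertools import combinations

def ring_connections(members):
    """Each member connects to the neighbors on either side of it in the ring."""
    n = len(members)
    return [(members[i], members[(i + 1) % n]) for i in range(n)]

def hub_and_spoke_connections(hub, spokes):
    """Spokes connect only to the hub; spokes never connect directly to each other."""
    return [(hub, spoke) for spoke in spokes]

def full_mesh_connections(members):
    """Every member connects to every other member."""
    return list(combinations(members, 2))

members = ["ServerA", "ServerB", "ServerC", "ServerD", "ServerE"]
print(len(ring_connections(members)))                  # 5 connection pairs
print(len(hub_and_spoke_connections("Hub", members)))  # 5 connection pairs
print(len(full_mesh_connections(members)))             # 10 pairs; grows as n*(n-1)/2
```

The full mesh count grows quickly with the number of members, which is why it is not recommended for replica sets with five or more members.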

Custom topology

With a custom topology, you create the connections between the servers yourself. One example of a custom topology is a redundant hub and spoke topology. In this configuration, a hub site might contain two file servers that are connected by a high-speed link. Each of these two hub servers might connect with four branch file servers in a hub and spoke arrangement. An example of this topology is shown in "Example: An Organization Designs a Replication Strategy" later in this chapter. Figure 2.13 illustrates another type of hub and spoke topology known as a multitier redundant hub and spoke with an optional staging server used for introducing changes into the replica set.


Figure 2.13: Multitier Redundant Hub and Spoke Topology


Designing a Replication Schedule


When you use the Distributed File System snap-in to configure replication, FRS schedules replication to take place 24 hours a day, seven days a week. Whenever you copy a file to a target that participates in replication, or when an existing file in the replica set is changed, FRS replicates the entire file to the other targets using the connections specified in the replication topology. Continuous replication is advised only for environments that meet the following criteria:



The replica members are all connected by high-bandwidth connections, such as hub servers in a redundant hub and spoke topology.



The amount of data to be replicated is not so large that it interferes with the bandwidth required for other purposes during peak or off-peak hours.



The application whose data is being replicated needs rapid distribution of file changes, such as antivirus signature files or login scripts.




If your environment does not meet these criteria, plan to create a replication schedule so that data is replicated at night or during other times as appropriate. A replication schedule involves enabling and disabling replication so that it occurs at specified times on specified days. For example, you might schedule replication to occur beginning at 12:00 midnight and ending at 2:00 A.M. every day.

When determining the replication schedule, consider the following issues:

Amount of data to be replicated and the length of the replication window Estimate the amount of data to be replicated so that you can choose a duration that is long enough to replicate all the data. In DFS namespaces, replication stops when the schedule window closes, and any remaining files are delayed until the next replication window opens. If you occasionally put a large amount of data into the replica set, FRS might take several replication periods to replicate all the data.
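One way to sanity-check a proposed window is to estimate how long the expected change set takes to transfer at the available bandwidth. The following Python sketch makes simplifying assumptions (a single effective link rate, an assumed compression ratio, and an efficiency factor for protocol overhead), so treat the result as a rough planning figure and validate it with lab testing.

```python
def replication_window_check(changed_data_gb: float,
                             link_mbps: float,
                             window_hours: float,
                             compression_ratio: float = 0.6,
                             efficiency: float = 0.7) -> bool:
    """Return True if the changed data is likely to fit in the replication window.

    compression_ratio: assumed fraction of original size after staging-file compression.
    efficiency: assumed fraction of raw bandwidth usable for replication.
    """
    payload_gbits = changed_data_gb * compression_ratio * 8
    usable_gbits_per_hour = link_mbps * efficiency * 3600 / 1000
    hours_needed = payload_gbits / usable_gbits_per_hour
    print(f"Estimated transfer time: {hours_needed:.1f} hours for a {window_hours}-hour window")
    return hours_needed <= window_hours

# Hypothetical example: 20 GB of changed files, a 10-Mbps WAN link, a 2-hour nightly window.
replication_window_check(20, 10, 2)   # prints roughly 3.8 hours, so the window is too short
```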

Amount of latency your organization can tolerate If you plan to replicate data that changes frequently, consider the amount of latency that you can tolerate in keeping your targets synchronized and the amount of bandwidth that is consumed during replication.

Whether you want to stagger schedules If you are using a redundant hub and spoke topology with several hubs, you can split the replication load among the hubs by staggering the schedule. An example of this schedule is described in "FRS Configuration Worksheet" (Sdcfsv_2.xls) on the Windows Server 2003 Deployment Kit companion CD (or see "FRS Configuration Worksheet" on the Web at http://www.microsoft.com/reskit).


Planning the Staging Directory


The staging directory acts as a buffer by retaining copies of updated files until replication partners successfully receive them and move them into the target directory. The following sections describe the benefits and design considerations for staging directories. For an Excel spreadsheet to assist you in documenting the staging directory settings and location, see "FRS Configuration Worksheet" (Sdcfsv_2.xls) on the Windows Server 2003 Deployment Kit companion CD (or see "FRS Configuration Worksheet" on the Web at http://www.microsoft.com/reskit).


Why FRS Uses a Staging Directory


The staging directory acts as a queue for changes to be replicated to downstream partners. After the changes are made to a file and the file is closed, the file content is compressed, written to the staging directory, and replicated according to schedule. Any further use of that file does not prevent FRS from replicating the staging file to other members. In addition, if the file is replicated to multiple downstream partners or to members with slow data links, using a staging file ensures that the underlying file in the replica tree can still be accessed.





Note

A new or updated file remains in the staging directory of a replica member with downstream partners even after the file has replicated to all members. The file is retained for seven days in order to optimize future synchronizations with new downstream partners. After seven days, FRS deletes the file. Do not attempt to delete any files in the staging directory yourself.


Staging Directory Size


The size of the staging directory governs the maximum amount of disk space that FRS can use to hold those staging files and the maximum file size that FRS can replicate. The default size of the staging directory is approximately 660 MB, the minimum size is 10 MB, and the maximum size is 2 TB. The largest file that FRS can replicate is determined by the staging directory size on both the upstream partner (the replica member that sends out the changed file) and the downstream partner (the replica member that receives the changed file) and whether the replicated file, to the extent that it can be compressed, can be accommodated by the current staging directory size. Therefore, the largest file that FRS can replicate is 2 TB, assuming that the staging directory size is set to the maximum on upstream and downstream partners.
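Under this rule, the largest file that can replicate is bounded by the smaller of the two partners' staging directory limits, adjusted for how well the file compresses. A minimal Python sketch of that check, with the compression ratio treated as an assumption:

```python
def can_replicate(file_size_mb: float,
                  upstream_staging_mb: float,
                  downstream_staging_mb: float,
                  compression_ratio: float = 1.0) -> bool:
    """Return True if the staged (compressed) file fits within both partners' staging limits.

    compression_ratio is the assumed fraction of the original size after compression
    (1.0 assumes no compression benefit).
    """
    staged_size_mb = file_size_mb * compression_ratio
    return staged_size_mb <= min(upstream_staging_mb, downstream_staging_mb)

print(can_replicate(900, 660, 660))        # False: exceeds the default 660-MB staging size
print(can_replicate(900, 660, 660, 0.5))   # True: compresses to about 450 MB
print(can_replicate(900, 2048, 2048))      # True after both partners are resized
```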


Managing Staging Directory Capacity


When FRS tries to allocate space for a staging file and is not successful, either because there is not enough physical disk space or because the size of the files in the staging directory has reached 90 percent of the staging directory size, FRS starts to remove files from the staging directory. Staged files are removed (in the order of the longest time since the last access) until the size of the staging directory has dropped below 60 percent of the staging directory limit. Additionally, staging files for downstream partners that have been inaccessible for more than seven days are deleted. As a result, FRS does not stop replicating if the staging directory runs out of free space. This means that if a downstream partner goes offline for an extended period of time, the offline member does not cause the upstream partner's staging directory to fill with accumulated staging files.
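The cleanup behavior can be summarized simply: when used space reaches 90 percent of the staging limit, remove the least recently accessed staging files until usage drops below 60 percent. The following Python sketch is a simplified model of that policy, with made-up file names and access times; it is not FRS code.

```python
def trim_staging_directory(staged_files, staging_limit_mb):
    """Model FRS staging cleanup.

    staged_files: list of (name, size_mb, last_access_time) tuples.
    Returns the files that remain after cleanup.
    """
    used = sum(size for _, size, _ in staged_files)
    if used < 0.9 * staging_limit_mb:
        return staged_files                                 # below the 90 percent trigger

    # Remove files with the longest time since last access first,
    # until usage drops below 60 percent of the limit.
    remaining = sorted(staged_files, key=lambda f: f[2])    # oldest access time first
    while remaining and used >= 0.6 * staging_limit_mb:
        _, size, _ = remaining.pop(0)
        used -= size
    return remaining

files = [("a.stage", 200, 10), ("b.stage", 250, 50), ("c.stage", 180, 90)]
print(trim_staging_directory(files, 660))   # 630 MB used exceeds 90 percent, so the oldest files go
```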





Note

In Windows Server 2003, FRS detects and suppresses excessive replication that could be caused when you use Group Policy to overwrite file permissions or when you use antivirus software that overwrites security descriptors. In these examples, writes to files cause no net content changes. When FRS detects a significant increase in the number of changes made to a file, FRS logs an event with ID 13567 in the File Replication Service event log. Despite the FRS response to excessive replication caused by applications, services, or security policies, you should investigate the source of the excessive replication and eliminate it.


The fact that FRS removes files from the staging directory does not mean that the underlying file is deleted or will not be replicated. The change order in the outbound log still exists, and the file will eventually be sent to downstream partners when they process the change order. However, before replication can take place, the upstream partner must recreate the file in the staging directory, which can affect performance. Recreating the staging file can also cause a replication delay if the file on the upstream partner is in use, preventing FRS from creating the staging file.

A staging directory can reach 90 percent of its capacity in the following situations:



More data is being replicated than the staging directory can hold. If you plan to replicate more than 660 MB of data, or if you expect that the largest file to replicate will be 660 MB or larger, you must increase the size of the staging directory. Use Table 2.7 for sizing guidelines.
























Table 2.7: Staging Directory Size Guidelines per Replica Set

Adding a new replica member
Minimum, Acceptable, and Best Performance: Whichever is larger: 660 MB, or the size of the 128 largest files in the replica tree, multiplied by the number of downstream partners, and then multiplied by 1.2.

Increasing the staging directory space to account for backlog caused by replication schedules
Minimum: No additional requirement.
Acceptable: Add space equal to the maximum quantity of expected file changes in a seven-day period, multiplied by 1.2.
Best Performance: Add space equal to the expected size of the replica tree, multiplied by 1.2.

Additional recommendations
Minimum: No additional recommendations.
Acceptable: Use dedicated disks for the staging directory.
Best Performance: Use dedicated, high-performance disks for the staging directory.

Suitability for large replica sets
Minimum: Not recommended.
Acceptable: Recommended.
Best Performance: Recommended for organizations that require highest performance.

A slow outbound partner. Construct replica connections that have comparable bandwidth for all outbound partners. It is also a good idea to balance that bandwidth for inbound and outbound connections. If you cannot balance the bandwidth, increase the size of the staging directory.



Replication is disabled. If a file is changed or added to the replica set while replication is disabled, the staging directory must be large enough to hold data awaiting replication during the off-time or else older staged files are removed. Replication still occurs, but less optimally.




Table 2.7 describes a range of sizing guidelines for staging directories. Use the numbers in Table 2.7 as a baseline for testing and then adjust the numbers as necessary for your environment. Also, note that the numbers in Table 2.7 apply to each replica set on a server. If a server contains multiple replica sets, you must follow these guidelines for each staging directory.





Note

The recommendations in Table 2.7 offer sufficient staging directory space for a worst-case scenario. In fact, the staging directory needs to be only as large as the largest 128 files, multiplied by the number of downstream partners that are concurrently performing a full synchronization against the replica member.
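The minimum guideline in Table 2.7, together with the worst-case note above, reduces to a small calculation. The following Python sketch implements that guideline; the file list and partner count are hypothetical inputs.

```python
def minimum_staging_size_mb(file_sizes_mb, downstream_partners, floor_mb=660):
    """Apply the Table 2.7 minimum: the larger of 660 MB or the combined size of
    the 128 largest files in the replica tree, multiplied by the number of
    downstream partners, then multiplied by 1.2."""
    largest_128 = sorted(file_sizes_mb, reverse=True)[:128]
    calculated = sum(largest_128) * downstream_partners * 1.2
    return max(floor_mb, calculated)

# Hypothetical replica tree: 5,000 files of about 2 MB each, plus a few large files.
sizes = [2] * 5000 + [300, 450, 700]
print(f"{minimum_staging_size_mb(sizes, downstream_partners=4):,.0f} MB")   # about 8,160 MB
```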


For more information about configuring the staging directory, see article Q329491, "Configuring Correct Staging Area Space for Replica Sets." To find this article, see the Microsoft Knowledge Base link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources. For more information about staging directory settings, see "Deploying FRS" later in this chapter.


Staging Directory Compression Considerations


FRS replica members running Windows 2000 Server with Service Pack 3 (SP3) or Windows Server 2003 compress the files replicated among them. Compression reduces the size of files in the staging directory on the upstream partners, over the network between compression-enabled partners, and in the staging directory of downstream partners prior to files being moved into their final location.

To maintain backward compatibility, servers running Windows 2000 Server with SP3 or Windows Server 2003 generate uncompressed staging files when they send changed files to downstream partners running Windows 2000 Server with Service Pack 2 (SP2) or earlier (or to servers running Windows 2000 Server with SP3 or Windows Server 2003 that have compression explicitly disabled in the registry).

To accommodate environments that contain a combination of compression-enabled and compression-disabled partners, the server originating or forwarding changes must generate two sets of staging files for each modified file or folder: one compressed version for servers running Windows 2000 Server with SP3 or Windows Server 2003, and one uncompressed version for partners that have compression explicitly disabled or that are running Windows 2000 Server with SP2 or earlier. To mitigate the generation of two sets of staging files, you can take one of the following steps:



Confirm that you have sufficient disk space and an appropriately sized staging directory on all replica members.



Explicitly disable compression in the registry until all members of the replica set are running Windows 2000 Server with SP3 or Windows Server 2003. For information about disabling compression, see article Q288160, "Error Message: Error Compressing Data," in the Microsoft Knowledge Base. To find this article, see the Microsoft Knowledge Base link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources.



Rapidly deploy Windows 2000 Server with SP3 or Windows Server 2003 on all members of the replica set, especially if you are replicating a large amount of content.



Choosing the Staging Directory Location


You can choose the location of the staging directory when you use the Distributed File System snap-in to configure a replica set. By default, all replica sets on the server use the same staging directory, and the staging directory space limit applies to all staging directories on the local server. When the staging directory is shared among multiple replica sets, any one replica set can cause the staging directory to reach 90 percent of its capacity, at which point FRS begins to delete the oldest staging files for any of the replica sets.


To simplify staging directory management and any future recovery procedures, it is recommended that you configure replica sets in one of the following ways:



Give each replica set its own staging directory, and place the staging directories on the same volume. This solution provides good recoverability while minimizing cost.



Give each replica set its own staging directory, and put each staging directory on its own volume. This solution provides better recoverability, but at a higher cost.



Do not store the staging directory on a volume where NTFS disk quotas are enabled.

Improving FRS Performance


To distribute disk traffic, store the staging directory, FRS logs, the FRS jet database (Ntfrs.jdb), and replica trees on separate disks. Using separate disks for the log files is especially important when you configure FRS logging by using a high severity level. For the best replication performance, distribute disk I/O by placing the FRS logs and the staging directory on separate disks from that of the operating system and the disks containing the replica tree. In a hub and spoke topology, using separate disks is especially recommended for the hub servers.





Important

If you store the FRS jet database on a disk that has its write cache enabled, FRS might not recover if power to the drive is interrupted and critical updates are lost. Windows Server 2003 warns you about this by adding an entry with Event 13512 in the File Replication Service event log. To leave write caching enabled for improved performance and to ensure the integrity of the FRS jet database, use an uninterruptible power supply (UPS) or a disk controller and cache with a battery backup. Otherwise, disable write caching until you can install a backup power supply.


When moving the FRS logs and FRS jet database, be sure that NTFS disk quotas are disabled on the target volume. For more information about moving the FRS logs and FRS jet database, see article Q221093, "HOW TO: Relocate the NTFRS Jet Database and Log Files," in the Microsoft Knowledge Base. To find this article, see the Microsoft Knowledge Base link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources.
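
Before moving the logs or the database, you can verify that NTFS disk quotas are not in effect on the target volume by using Fsutil.exe. The drive letter below is only an example.

rem Display the current quota state for the target volume (D: is an example).
fsutil quota query D:

rem If quotas are enabled and the volume will hold FRS files, disable them.
fsutil quota disable D: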

Determining Connection Priorities


Connection priorities control the sequencing of the initial synchronization that occurs when you add a new member to a replica set or when you perform a nonauthoritative restore. By changing connection priorities, you can control which inbound partners go first based on resource considerations. (A server's inbound partners, also known as upstream partners, are those servers from which the new or recovered member can receive replicated content.) For example, you can specify that a new member first synchronize with a partner:



Across a high-bandwidth network connection.



That has low server activity.



That has the most up-to-date files.




You configure connection priorities by using the three priority levels in the Distributed File System snap-in:



High. All connections marked High must successfully synchronize before FRS attempts to synchronize medium-priority connections.



Medium. At least one medium-priority connection must successfully complete an initial synchronization before FRS attempts to synchronize any low-priority connections. FRS attempts to synchronize all connections in the medium-priority level, but only one must be successful before FRS attempts to synchronize low-priority connections.



Low. The default connection priority is low. For these connections, FRS attempts to synchronize one time, but any failure does not delay other synchronization attempts. If no medium- or high-priority connections exist, at least one low-priority connection must succeed before FRS considers the initial synchronization operation complete. If medium- or high-priority connections exist, none of the connections in the low-priority class needs to succeed for FRS to consider the initial synchronization operation complete.







Important

Connection priorities are used for initial synchronizations only on custom topologies. If you use a ring, hub and spoke, or full mesh topology, connection priorities are used only on reinitialization during a nonauthoritative restore operation.


When evaluating your topology to determine connection priority, identify the following details for each replica member:



All inbound connections



The speed of those connections



Based on this data, determine which connection priority to assign to each inbound connection. The following guidelines will assist you in choosing the appropriate connection priority:



Use medium priority or high priority only for high-bandwidth connections.



If you plan to create a high-priority connection, note that if this connection fails, no other synchronization takes place for lower-priority partners until all high-priority connections have succeeded.



Use low priority for low-bandwidth or unreliable connections.







Note

Keep in mind that after the initial synchronization, connection priorities are ignored until you need to perform a nonauthoritative restore to recover a failed replica member.



For an Excel spreadsheet to assist you in documenting connection priorities, see "FRS Configuration Worksheet" (Sdcfsv_2.xls) on the Windows Server 2003 Deployment Kit companion CD (or see "FRS Configuration Worksheet" on the Web at http://www.microsoft.com/reskit).

Figure 2.14 illustrates an organization's planned topology. The letters A through C represent inbound connections. The hub servers are connected by a high-bandwidth network connection, and two of the hub servers connect with branch servers by using low-bandwidth network connections.


Figure 2.14: Sample FRS Topology


In this topology, the administrator assigns connection priorities as follows:



"A" connections use medium priority because the hubs are connected by high-bandwidth connections and the hub servers are expected to be up to date.



"B" and "C" connections use low priority because the branch servers connect to the hubs across a low-bandwidth network.



When the hub servers in this example are deployed, Hub1 will be the initial master. Because the "A" connections are marked as medium priority, the two other hub servers will not attempt replication with the branch servers until initial replication completes with at least one other hub server.

The administrator specifies connections "B" and "C" as low priority because the branch servers have low-bandwidth network connections. In the event of a nonauthoritative restore for any of the hubs, using low priority causes the repaired hub server to replicate first from other hubs before replicating with the branch servers.





Note

To increase the availability of data, the administrator can place multiple servers in each branch. If the servers in the branch use high-bandwidth connections to each other, the administrator can set medium priority for those inbound connections. That way, if one of the branch servers fails and is later restored, it will first attempt to replicate with a local branch server before replicating from the hub server across a slow network connection.



Example: An Organization Designs a Replication Strategy


An organization uses Group Policy to distribute 10 GB of software to users in four sites. The software changes nightly, and it is important that clients have access to these updates the following day. To make the software files and updates highly available, the organization creates a domain-based DFS namespace for each branch and then uses the Distributed File System snap-in to configure replication. To make the updated software available even if a hub server is offline, the organization uses redundant hub servers that are connected by high-speed LAN links. The organization also uses staggered replication schedules to distribute the load between the hub servers. Figure 2.15 describes the FRS topology.


Figure 2.15: Redundant Hub and Spoke Topology for Software Distribution to Branch Offices


The organization uses a separate namespace for each branch so that if the local branch server is not available, the clients transparently access one of the hubs instead of another branch server. The organization enables the least-expensive target selection feature of DFS so that the clients always choose the local branch server when it is available.
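
Least-expensive target selection (also called site costing) is typically enabled per root by using the Dfsutil.exe support tool. The following is a sketch only; the /sitecosting switches reflect the Windows Server 2003 version of Dfsutil.exe as commonly documented, and the root path shown is hypothetical, so verify the syntax against the tool's own help before using it.

rem Enable least-expensive target selection for one branch namespace (example root path).
dfsutil /root:\\contoso.com\Branch01 /sitecosting /enable

rem Confirm the current site costing setting for the root.
dfsutil /root:\\contoso.com\Branch01 /sitecosting /display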

In this scenario, the organization configures replication as follows:



Each branch has its own replica set, and each replica set contains three members: the branch and the two hubs.



Replication between the hubs is enabled at all times, so changes at one hub are immediately replicated to the other hub.



Inbound connections between the hub servers are medium priority; all other inbound connections are low priority.



Odd-numbered branches replicate with Hub1 from 6:00 P.M. until 7:30 P.M. and with Hub2 from 8:00 P.M. until 9:30 P.M.



Even-numbered branches replicate with Hub2 from 6:00 P.M. until 7:30 P.M. and with Hub1 from 8:00 P.M. until 9:30 P.M.



For more information about adding members to an existing replica set, see "Adding a New Member to an Existing Replica Set" later in this chapter.


Increasing Data Availability by Using Clustering


The information presented in this section assumes an understanding of basic cluster terminology. If you are not familiar with clustering, you can refer to a number of sources for further information:



For general information on server clusters, including terminology, procedures, checklists, and best practices, see Help and Support Center for Windows Server 2003.



For information on designing server clusters, see "Designing and Deploying Server Clusters" in this book.




You can use clustered file servers to make sure that users can access business-critical data, such as home directories and software distribution shares. You can also use clustered servers to consolidate multiple nonclustered file servers onto a clustered file server. Each server is then recreated on the cluster as a virtual server.





Important

Because multiple network names might belong to the same server, File Share resources might appear under these names and might also be accessible through them. To avoid client connectivity interruptions, make sure that clients use the network name that is associated with the group to which the File Share resource belongs. For more information about using the proper network name, see article Q170762, "Cluster Shares Appear in Browse List Under Other Names," in the Microsoft Knowledge Base. To find this article, see the Microsoft Knowledge Base link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources.


Choosing Virtual Server Names


If you are migrating nonclustered file servers to clustered file servers, and you do not currently use (or plan to use) DFS, you should use the IP address and network name of each migrated server to create an identical virtual server on the clustered file server. Doing so allows users to continue to access data by using the server names that they are familiar with. You also ensure that links and shortcuts continue to work as they did before the migration.
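
As a rough sketch of recreating a migrated server's identity on the cluster, the commands below use Cluster.exe to create a group that contains IP Address and Network Name resources. The group name, IP address, subnet mask, network name, and network name of "Public" are placeholders; substitute the values of the server you are migrating.

rem Create a group to act as the virtual server for the migrated file server.
cluster group "FS1 Group" /create

rem Create the IP Address resource, using the migrated server's IP address.
cluster resource "FS1 IP Address" /create /group:"FS1 Group" /type:"IP Address"
cluster resource "FS1 IP Address" /priv Address=192.168.10.21
cluster resource "FS1 IP Address" /priv SubnetMask=255.255.255.0
cluster resource "FS1 IP Address" /priv Network="Public"

rem Create the Network Name resource, using the migrated server's name, and make it depend on the IP address.
cluster resource "FS1 Name" /create /group:"FS1 Group" /type:"Network Name"
cluster resource "FS1 Name" /priv Name=FS1
cluster resource "FS1 Name" /adddep:"FS1 IP Address"

rem Bring the group online.
cluster group "FS1 Group" /online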

Choosing the File Share Resource Type


A clustered file server uses the File Share resource type to make data available to users. You have three options when configuring this resource type:

Normal share

Normal shares function similarly to shared folders created on nonclustered file servers, except that you use the Cluster Administrator snap-in to create them. When you create a normal share, you publish a single folder to the network under a single name. Normal shares do not have required dependencies; however, if a normal share is located on cluster storage, it should, at a minimum, depend on the cluster storage device (the Physical Disk or other storage-class resource), and preferably also on the Network Name and IP Address resources.

You can create a maximum of 1,674 resources, including File Share resources, on a cluster. To provide good failover performance, do not approach this limit in production environments. In addition, if you need to create a large number of normal shares, it is better to use the Share Subdirectories option.
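
For reference, a normal share can also be created from the command line with Cluster.exe. This is a sketch only; the resource, group, disk, and path names are placeholders, and the dependencies follow the guidance above (storage at a minimum, plus the virtual server's network name).

rem Create a File Share resource for a normal share in an existing virtual server group.
cluster resource "Public Share" /create /group:"FS1 Group" /type:"File Share"
cluster resource "Public Share" /priv ShareName=Public
cluster resource "Public Share" /priv Path=E:\Public

rem Add dependencies on the disk and on the virtual server's network name.
cluster resource "Public Share" /adddep:"Disk E:"
cluster resource "Public Share" /adddep:"FS1 Name"

rem Bring the share online.
cluster resource "Public Share" /online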


Share subdirectories (or hide subdirectories)

A share subdirectories share publishes several shares: one for the folder itself and one for each of its immediate subfolders. For example, if you create a share subdirectories share called Users, any folder you add to the Users folder is automatically shared.





Important

When sharing subdirectories, do not use the same name for subfolders under different folders (for example, \folder1\name1 and \folder2\name1). The first shared subdirectory to come online remains online, but the second shared subdirectory is not initialized and does not come online.


Share subdirectories offer a number of advantages:



They do not require you to create a File Share resource for every folder. As a result, you reduce the potential time and CPU load needed to detect failures.



They are an efficient way to create large numbers of related file shares on a single file server. For example, you can use this method to create a home directory for each user who has files on the server, and you can hide the subdirectory shares to prevent users from seeing the subdirectory shares when they browse the network.



They allow administrators who are not experienced with server clusters or using the Cluster Administrator snap-in to easily create folders that are automatically shared.
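
If you script this configuration, the share subdirectories and hide options correspond to private properties of the File Share resource. The ShareSubDirs and HideSubDirShares property names below are the ones commonly documented for Windows Server 2003 clusters, but treat them as assumptions and verify them (for example, by running cluster resource "name" /priv) before relying on them; all resource, group, and path names are examples.

rem Create the parent File Share resource for the Users folder.
cluster resource "Users Share" /create /group:"FS1 Group" /type:"File Share"
cluster resource "Users Share" /priv ShareName=Users
cluster resource "Users Share" /priv Path=E:\Users

rem Share all immediate subfolders automatically, and hide the generated subfolder shares.
cluster resource "Users Share" /priv ShareSubDirs=1
cluster resource "Users Share" /priv HideSubDirShares=1

rem Add dependencies and bring the resource online.
cluster resource "Users Share" /adddep:"Disk E:"
cluster resource "Users Share" /adddep:"FS1 Name"
cluster resource "Users Share" /online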



When using the share subdirectories feature, determine the number of file shares you plan to host on the cluster, because the number of file shares affects server cluster capacity planning and failover policies. Specifically, you need to determine the following:



The number of nodes you plan to have in the server cluster



The number of node failures you want the server cluster to withstand while still providing acceptable performance



Whether the remaining nodes can handle the load of the failed nodes




The number of nodes is important because a single node can support a limited number of file shares, which varies according to the amount of RAM in the server and which is described in "Reviewing File Server Limits" later in this chapter. If you want the cluster to be able to survive the loss of all but one node, make sure that the cluster hosts no more than the maximum number of file shares that can be hosted by a single node. This is especially important for two-node clusters, where the failure of one node leaves the single remaining node to pick up all the file shares.

In a four-node or eight-node cluster, you have other options that might be more appropriate, depending on the failure scenarios that you want to protect against. For example, if you want a four-node cluster to survive one node failing at any point, you can configure the file shares so that if one node fails, its file shares are spread across the remaining three nodes. In this scenario, each node can be loaded to 75 percent of the maximum number of file shares for a single node and still remain within that limit after a single failure, because each surviving node then carries its own load plus one-third of the failed node's load. In this case, the cluster can host three times the number of file shares that a single server can host. If you want the cluster to survive two nodes failing, a four-node cluster can hold only twice as many file shares as a single node (because if two nodes fail, the two remaining nodes must pick up the load of the two failed servers), and so on.

For more information about server cluster capacity planning and failover policies, see "Designing and Deploying Server Clusters" in this book.





Note

For more information about using the Share Subdirectories option to create home directories, see article Q256926, "Implementing Home Folders on a Server Cluster," in the Microsoft Knowledge Base. To find this article, see the Microsoft Knowledge Base link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources.



DFS root

You can use the File Share resource type to create a resource that manages a stand-alone DFS root on a cluster. The DFS root File Share resource has required dependencies on a network name, which can be either the cluster name or any other network name for a virtual server. However, it is recommended that you do not use the cluster name (in the Cluster group) for any resources.

The Cluster service manages a resource group as a single unit of failover. Therefore, if you want four DFS roots to fail over independently, create them in different groups. Each group has its own Network Name resource (the virtual server name) with which the root is associated.

When creating a DFS root in clustered file servers, review the following issues:



The name that you specify for one DFS root cannot overlap with the names of other DFS roots in the same cluster. For example, if one DFS root uses the folder C:\Dfsroots\Root1, you cannot create other roots by using paths that overlap with it, such as C:\Dfsroots or C:\Dfsroots\Root1\Root2, and so forth.



On server clusters, do not create clustered DFS roots that have the same name as nonclustered DFS roots or shared folders.



Clustered file servers running Windows Server 2003, Enterprise Edition and Windows Server 2003, Datacenter Edition support multiple stand-alone DFS roots. These DFS roots can exist in multiple resource groups, and each group can be hosted on a different node.



Clustered file servers running Microsoft Windows 2000 Advanced Server or Windows 2000 Datacenter Server support one DFS root per server. Mixed-version clusters running Windows Server 2003, Enterprise Edition or Windows Server 2003, Datacenter Edition on some nodes and Windows 2000 on others also support only one DFS root per cluster. For more information about mixed-version clusters and rolling upgrades, see "Deploying Clustered File Servers" later in this chapter.





Caution

Do not make DFS configuration changes (for example, creating new roots, adding new links, new link targets, and so on) while operating a mixed-version cluster, because all DFS changes are lost upon failover.




For more information about creating DFS roots on a clustered file server, see article Q301588, "HOW TO: Use DFS on a Server Cluster to Maintain a Single Namespace," in the Microsoft Knowledge Base. To find this article, see the Microsoft Knowledge Base link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources.
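
After the DFS root File Share resource is online (created by using Cluster Administrator as described in article Q301588), you can add links to the clustered root from the command line with Dfscmd.exe. The virtual server, root, link, and target names below are placeholders.

rem Add a link named Tools to the clustered stand-alone root \\Home\Folders.
dfscmd /map \\Home\Folders\Tools \\FS1\Tools

rem View the root and its links to confirm the change.
dfscmd /view \\Home\Folders /full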


Using Shadow Copies on Server Clusters


Windows Server 2003 supports creating shadow copies on cluster storage. Shadow copies are point-in-time copies of files that are stored on file servers running Windows Server 2003. By enabling shadow copies, you can reduce the administrative burden of restoring previously backed-up files for users who accidentally delete or overwrite important files.

If you plan to enable shadow copies on a server cluster, review the following issues:



Cluster-managed shadow copies can be created only on cluster storage with a Physical Disk resource. In a cluster without cluster storage, shadow copies can be created and managed only locally.



The recurring scheduled task that generates volume shadow copies must run on the same node that currently owns the storage volume (where the shadow copies are stored).



The cluster resource that manages the scheduled task must be able to fail over with the Physical Disk resource that manages the storage volume.



If you place the source volume (where the user files are stored) and the storage volume (where the shadow copies are stored) on separate disks, those disks must be in the same resource group.



When you configure shadow copies using the procedure described in "Enable Shadow Copies of Shared Folders in a cluster" in Help and Support Center for Windows Server 2003, the resources and dependencies are set up automatically. Do not attempt to modify or delete these resources and dependencies directly.



To ensure availability, plan to enable shadow copies before you deploy the server cluster. Do not enable shadow copies in a deployed cluster. When you enable shadow copies, the shadow copy volumes (as well as all resources that directly or indirectly depend on the disk) go offline for a brief period while the resource dependencies are created. The shadow copy volumes are not accessible to applications and users while they are offline.
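
After shadow copies are enabled through the procedure mentioned above, you can confirm the source-to-storage volume association and the existing copies from a command prompt on the node that currently owns the disks. A short sketch, with E: standing in for the source volume:

rem Show which storage volume holds the shadow copies for each source volume.
vssadmin list shadowstorage

rem List the shadow copies that currently exist for the source volume (E: is an example).
vssadmin list shadows /for=E: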





Note

If you enable shadow copies before clustering your servers, you must disable, and then re-enable, shadow copies. If you cluster a disk containing a previously created shadow copy, the shadow copy might stop functioning after that disk's Physical Disk resource fails over.


For additional guidelines and information about shadow copies, see "Designing a Shadow Copy Strategy" later in this chapter. For in-depth information about using shadow copies on server clusters, see "Using Shadow Copies of Shared Folders in a server cluster" in Help and Support Center for Windows Server 2003.


Example: An Organization Designs a Clustered File Server


A large organization uses centralized file servers to store home directories for 3,000 users. The organization has an existing SLA that specifies its file servers must be 99.99 percent available, which means that the file servers can be offline no more than 53 minutes a year. To meet this level of availability, the organization implements clustered file servers in its data center, using a storage area network (SAN) for storage.

After evaluating storage and performance requirements, the organization determines that it needs one file server per 1,000 users, for a total of three file servers. The organization could implement a three-node cluster, but it decides to implement a four-node cluster instead. It will use three nodes to provide access to user directories and the fourth node to perform the following functions:



Host a stand-alone DFS namespace to provide a unified view of the user directories on the remaining three nodes.



Perform backups of the other three nodes in the evenings.



Take over the resources of another node if the node fails or is taken offline for maintenance.



Figure 2.16 illustrates how the organization designs its four-node cluster.


Figure 2.16: Four-Node Cluster


To create the user directories, the administrator creates a File Share resource on three nodes and specifies the Share Subdirectories option. The administrator places each File Share resource in a virtual server (a resource group containing IP address and Network Name resources) and sets up dependencies between the File Share resource and the Network Name resource. The administrator then uses a script to quickly create the user directories, which are automatically shared after creation, and to set appropriate permissions for each user directory. The administrator creates another File Share resource on the fourth node and specifies the DFS root option and then uses the Distributed File System snap-in to create a link to each user directory. Users can access their shares by using the DFS path \\Home\Folders\Username. Logon scripts also map user shares to each user's U: drive.
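
The organization's directory-creation script is not reproduced in this chapter; the following batch sketch only illustrates one way such a script could look. The Users.txt file, the E:\Users path, the CONTOSO domain, and the \\Home\Folders and \\FS1 names are all hypothetical, and Cacls.exe is used to grant permissions purely as an example.

@echo off
rem For each user name listed in Users.txt, create a folder under the clustered
rem Users share, grant the user Change permission, and add a DFS link for it.
for /f %%u in (Users.txt) do (
    md E:\Users\%%u
    cacls E:\Users\%%u /e /g CONTOSO\%%u:C
    dfscmd /map \\Home\Folders\%%u \\FS1\Users\%%u
)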





Note

If the organization needs to provide access to user data when an entire site fails, whether because of a total loss of power, a natural disaster, or another event, the organization can implement a geographically dispersed cluster.


For more information about designing server clusters, including capacity planning, failover policies, and geographically dispersed clusters, see "Designing and Deploying Server Clusters" in this book.
