The Move to Storage Networks
The first SAN at this firm was built in April 1999, long before the creation of a dedicated storage support team and long before many firms had even considered using Fibre Channel SAN technology in production environments. Rather than move a small test environment to a SAN infrastructure, the storage services team decided to tackle one of the largest data warehouse environments, which at the time approached 30 TB in size. The resulting data warehouse SAN consisted of 10 small, 16-port fixed switches, 12 of the largest hosts, and approximately 10 external storage arrays.

Behind the decision to build a Fibre Channel SAN was a critical business driver: the need for flexibility. Ease of provisioning large amounts of storage is a fundamental component of operational efficiency, and the ability to provision storage quickly for rapidly growing environments became a prerequisite for increasing the number of terabytes managed by each storage administrator. The data warehouse environment in which the first SAN was implemented was growing quickly, with an estimated 2 TB added every six months. Because the environment used direct-attached Small Computer Systems Interface (SCSI) storage, adding new disks required long outages that could not be scheduled during the day because of system availability requirements. Migrating the data warehouse to a Fibre Channel SAN allowed the team to add storage on an ad hoc basis to meet rampant and often unpredictable growth demands.

An additional driver behind the migration to SAN storage was the need for solid, first-hand exposure to what the team believed would ultimately become the new standard for storage deployments. Although many of the system administrators had already been exposed to Fibre Channel protocols while deploying smaller Fibre Channel Arbitrated Loop (FC-AL) JBOD devices, management believed it was crucial for the rest of the staff to become experts quickly in what would inevitably become a heavily leveraged technology. In particular, senior staff wanted to familiarize themselves with switched Fibre Channel ahead of the mainstream adoption curve so that they could better understand how performance and interoperability issues might affect their environments.
Interoperability of Storage Solutions
Although this firm certainly falls into the early adopter category for Fibre Channel switches, it is interesting to note that the storage services team faced few of the tribulations common to other early adopters, in particular, interoperability issues with products from various HBA and disk vendors. Because the operating systems in use at the time had strong support for Fibre Channel, the system administrators were able to make the jump from FC-AL to switched Fibre Channel with little difficulty. The team's earlier experience with FC-AL technology, coupled with the operating systems' capability to integrate easily with switched Fibre Channel, allowed the group to sidestep many of the problems familiar to users of operating systems with weaker Fibre Channel support.

The cutovers in the FC-AL environments, which already used World Wide Names (WWNs), the method of identifying objects on a SAN fabric, were smoother than the typical parallel SCSI migration to Fibre Channel. In these circumstances, HBA compatibility and interoperability issues were minimized by the early adoption of Fibre Channel.
Networked Storage Implementation
The current SAN implementation has grown considerably since 1999, from 10 16-port fixed switches to more than 100 switches (a mix of 16-, 32-, and 128-port models), and now hosts almost all of the organization's mission-critical data. These switches make up multiple fabrics in redundant configurations, for a total of over 2000 SAN ports. The FC fabrics themselves are location-based, similar to typical IP networks. Each datacenter is divided into multiple raised-floor rooms, and each room hosts its own redundant, dual-fabric SAN. Over time, the storage services team expects to see the fabrics in each datacenter connected via SAN extensions; currently, however, the environments remain separate and distinct in an effort to limit the impact of propagated errors. For now, the architecture remains a collection of tiered fabrics with no distinct core and no distinct edge.
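To make the location-based layout concrete, the following is a minimal sketch, assuming hypothetical datacenter and room names, switch counts, and port sizes; it is not the team's actual inventory or tooling, only an illustration of the per-room, dual-fabric pattern described above.

from dataclasses import dataclass
from typing import Dict, List

# Illustrative sketch only: datacenter and room names, switch counts, and
# port sizes below are assumptions, not the firm's actual inventory.

@dataclass
class Fabric:
    name: str
    switch_port_counts: List[int]   # one entry per switch: 16, 32, or 128 ports

    @property
    def ports(self) -> int:
        return sum(self.switch_port_counts)

@dataclass
class Room:
    """Each raised-floor room hosts its own redundant, dual-fabric SAN."""
    name: str
    fabric_a: Fabric
    fabric_b: Fabric

    @property
    def ports(self) -> int:
        return self.fabric_a.ports + self.fabric_b.ports

# Rooms remain separate and distinct (no inter-room SAN extensions yet),
# which limits the impact of propagated errors.
datacenter: Dict[str, List[Room]] = {
    "dc-east": [
        Room("room-1",
             Fabric("room-1-A", [128, 32, 32, 16]),
             Fabric("room-1-B", [128, 32, 32, 16])),
        Room("room-2",
             Fabric("room-2-A", [128, 16, 16]),
             Fabric("room-2-B", [128, 16, 16])),
    ],
}

total = sum(room.ports for rooms in datacenter.values() for room in rooms)
print(f"Total SAN ports in this (hypothetical) datacenter: {total}")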
Tiered Storage Implementation
To lower costs, the storage services team implemented a three-tiered storage architecture. At the high end are the typical large, high-performance arrays with single or multiple FC fabrics as needed for redundancy. In the middle are the same high-performance arrays or mid-sized arrays configured without redundant switch fabrics. At the low end, Serial ATA (SATA) drive arrays are provided for large data sets that are written once and rarely read. Environments that fall into this category are low-criticality applications with minimal performance requirements or database dumps that are used to quickly recreate databases in the event of data loss or corruption.

The use of approximately 200 TB of SATA drives since the second quarter of 2003 has had a significant financial impact in terms of immediate cost avoidance. Large, high-performance arrays previously utilized for these types of applications were released for use by applications with higher I/O performance requirements. SATA disk drives have an acquisition cost of approximately one-third or less that of typical external array drives, and if used appropriately, management costs are minimized. The lower mean time between failures (MTBF) of SATA drives scares off some decision makers; however, using these devices in a RAID or mirrored configuration and for less I/O-intensive applications should offset the costs associated with a higher failure rate. In addition, data with lower availability requirements can be offline on occasion, making SATA drive-based arrays an ideal choice, as long as the corresponding service level agreements (SLAs) are adjusted accordingly.

Even though the storage services team has built a tiered infrastructure, the team is not currently implementing solutions as part of a broader Information Lifecycle Management (ILM) strategy. For the foreseeable future, the team plans to stick with the current architecture, whereby data with the highest availability requirements is hosted on redundant, dual-fabric SANs with multiple, replicated copies of the data stored on large, high-performance storage arrays. Applications or data sets with more modest availability and uptime requirements are stored on SATA drives. Data whose requirements fall in between is hosted on dual fabrics and large or mid-sized arrays with no replication. The storage services team believes that today the costs of implementing a more rigid ILM framework and managing heterogeneous storage across the enterprise outweigh the benefits. Accordingly, there are currently no plans to utilize network-attached storage (NAS) to build out additional tiers.

Note
There are plans, however, to utilize NAS storage for office-automation applications and user home directories, but those environments are not managed by the storage services team.

The storage services team concedes that client SLAs are currently poorly documented and that the few SLAs that are documented do not explicitly state a client's expected availability in terms of percentages. Although these SLAs are not rigid instruments used to adjust charge-back schedules, they are well-established agreements between parties that document which applications reside on what type of storage and which applications receive priority status.

As far as charge-back is concerned, the storage services team knows approximately how much storage its clients consume; however, the mechanism required to do a complete departmental charge-back has not been fully developed.
The storage services team is currently updating the request and reservation systems and plans to move to a full charge-back system by the end of 2004. To execute this initiative, the team needs access to accurate and up-to-date TCO and utilization data, which is difficult to come by. In the absence of these numbers, however, the storage services team can adjust storage cost estimates based on the storage lease schedule, which sees frames regularly rotating out of the datacenter as leases expire.
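A minimal sketch of how such a charge-back estimate might be computed follows, assuming illustrative per-GB monthly rates for the three storage tiers (for example, derived from the lease schedule) and hypothetical departmental allocations; none of the names or figures come from the firm's actual model.

# Hypothetical per-GB monthly rates by storage tier, for example derived from
# the lease schedule. None of these figures come from the firm's actual model.
TIER_RATES = {
    "tier1_dual_fabric_replicated": 0.050,  # high-end arrays, dual fabric, replicated copies
    "tier2_single_fabric":          0.025,  # high-end or mid-sized arrays, no replication
    "tier3_sata":                   0.010,  # SATA arrays for write-once, rarely read data
}

# Hypothetical departmental allocations, in GB per tier.
ALLOCATIONS = {
    "data_warehouse": {"tier1_dual_fabric_replicated": 200_000, "tier3_sata": 50_000},
    "crm":            {"tier1_dual_fabric_replicated": 40_000, "tier2_single_fabric": 20_000},
}

def monthly_chargeback(allocations, rates):
    """Estimate each department's monthly storage charge from its per-tier usage."""
    return {
        dept: sum(gigabytes * rates[tier] for tier, gigabytes in usage.items())
        for dept, usage in allocations.items()
    }

if __name__ == "__main__":
    for dept, charge in monthly_chargeback(ALLOCATIONS, TIER_RATES).items():
        print(f"{dept}: ${charge:,.2f} per month")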
Storage TCO
The storage services team has an estimate of the overall TCO per MB: it is a floating number that depends primarily on the number and frequency of backups and on administrator efficiency. With six individuals supporting approximately 180 TB each, and with data storage still growing, the team continues to push the limit of TB supported per administrator.

The storage services team uses a home-grown spreadsheet solution to model storage usage and associated costs, but with costs continually fluctuating and the number of storage frames onsite changing weekly, determining an exact TCO number is often a futile exercise. To keep the picture as accurate as possible, new incoming arrays and old outgoing arrays are added to and removed from the spreadsheet as soon as possible, so the spreadsheet remains a valuable tool for capacity planning and analysis. Each individual environment must be analyzed separately to determine its actual TCO per MB, but for the purposes of discussing an enterprise-wide cost model, the generalizations are worthy of a closer look.

The most enlightening part of the analysis is that backups are by far the most expensive component of the storage TCO. Like most TCO calculators, the team's model accounts for the costs of datacenter floor tiles, utilization efficiencies, full-time equivalent (FTE) labor, hardware and software acquisition, and backups. The team also factors in the costs of tape retention for multiple years. Factor in numerous backups of multi-terabyte environments along with retention periods of up to seven years for some data sets, and backup costs become significant.

Note
A TCO that includes the costs of seven-year tape retention is significantly higher than the typical one-year snapshot of TCO. Both models are effective mechanisms for calculating storage TCO; however, a seven-year model is better for estimating the value of the data stored on tape. The seven-year model (and the associated backup costs) also suggests that there is potential return on investment (ROI) to be achieved by adjusting tape retention policies and by archiving less critical data. Ultimately, there is no one right way to model storage TCO; either method is appropriate, as long as it provides value to the firm.

In summary, backups represent over 80 percent of the cost structure for storage in the IT hosting environment. Hardware and software acquisition costs are only 10 percent of the TCO, and FTE time is only 3 percent. Approximately 1 percent of the total cost is attributed to datacenter and facilities expenses (power, cooling, and floor space). The TCO cost components are illustrated in Figure 7-1.
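As a rough worked example of how these percentage shares translate into dollars, the following sketch splits an assumed annual storage budget across the reported components; the budget figure is hypothetical, "over 80 percent" is treated as 80 percent, and the unattributed remainder is simply lumped together.

# Component shares as reported for storage TCO in the IT hosting environment;
# "over 80 percent" for backups is treated as 0.80 for simplicity.
TCO_SHARES = {
    "backups_and_tape_retention": 0.80,
    "hardware_and_software":      0.10,
    "fte_labor":                  0.03,
    "datacenter_and_facilities":  0.01,  # power, cooling, and floor space
}
# The source does not itemize the remainder; it is lumped together here.
TCO_SHARES["other"] = round(1.0 - sum(TCO_SHARES.values()), 2)

def breakdown(annual_budget: float) -> dict:
    """Split a hypothetical annual storage budget across the TCO components."""
    return {component: annual_budget * share for component, share in TCO_SHARES.items()}

if __name__ == "__main__":
    for component, cost in breakdown(10_000_000).items():  # assumed $10M annual budget
        print(f"{component:28s} ${cost:12,.0f}")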
Figure 7-1. TCO Components

Figure 7-2. Retention-Adjusted TCO Components

Replication
All 900 TB of business-critical data is spread across four datacenters. Dark fiber between five metro-area endpoints, including the four datacenters and a metropolitan-area network point of presence (MAN POP), provides connectivity for Fibre Channel (FC) over Dense Wavelength Division Multiplexing (DWDM) and for data replication. A portion of the replication is continuous, but the majority of the copies are done with simple sync-and-splits over the wire, either nightly or during business hours, depending on performance requirements and the availability of application downtime windows. For example, the extract, transform, and load (ETL) and data warehouse applications are read-only for most of the day, so that data is synchronized at night when the daily data load processes are complete. Financial and CRM applications are also typically copied after business hours, when utilization of those systems is low.

According to the storage services team, the firm is prepared for typical disaster recovery scenarios (weather events being the most common), with the most critical data copied frequently enough between the regional datacenters to allow for quick recovery and optimal business uptime.
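The split between continuous replication and scheduled sync-and-split copies can be pictured as a simple scheduling policy. The sketch below assumes hypothetical application names, modes, and copy windows; it illustrates the pattern described above rather than the team's actual configuration.

from dataclasses import dataclass
from typing import List

@dataclass
class ReplicationPolicy:
    application: str
    mode: str     # "continuous" or "sync_and_split"
    window: str   # when sync-and-split copies run; not used for continuous mode

# Hypothetical policies reflecting the pattern described above: a minority of
# data is replicated continuously, and the rest is copied with scheduled
# sync-and-splits in whatever downtime window the application allows.
POLICIES: List[ReplicationPolicy] = [
    ReplicationPolicy("order-entry",    "continuous",     "n/a"),
    ReplicationPolicy("data-warehouse", "sync_and_split", "nightly, after daily data loads"),
    ReplicationPolicy("etl-staging",    "sync_and_split", "nightly, after daily data loads"),
    ReplicationPolicy("financials",     "sync_and_split", "after business hours"),
    ReplicationPolicy("crm",            "sync_and_split", "after business hours"),
]

def copies_due(period: str) -> List[str]:
    """Return the applications whose sync-and-split copies run in the given window."""
    return [p.application for p in POLICIES
            if p.mode == "sync_and_split" and period in p.window]

if __name__ == "__main__":
    print("Nightly copies:    ", copies_due("nightly"))
    print("After-hours copies:", copies_due("after business hours"))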
Organizational Impact
From the inception of the first SAN until mid-2002, system administrators performed storage support duties on a strictly ad hoc, server-centric basis. Prior to the creation of the storage services team, system administrators handled storage provisioning requests as part of their daily activities.

In November 2002, the dedicated storage support team, currently consisting of six full-time storage administrators, was formed to focus specifically on the ongoing consolidation efforts and other storage-related initiatives. Working almost full-time on consolidation efforts to reduce points of management, increase utilization efficiencies, and conserve datacenter resources, the storage services team directly addresses storage supply issues by managing the storage inventory onsite.

As the storage support processes mature, the storage services team is better prepared to outline in specific detail how much storage each application group consumes. The team admits, however, that it is not its policy to police storage