Sizing Machine Capacity
We’ve discussed the major concerns many companies have when looking at implementing a new filtering solution. As you may have guessed, supporting 100,000 users may not be something that a single machine can handle by itself. Designing the filter to handle a distributed cluster of machines will make it much more scalable in these environments. One of the factors to consider in deploying a filter on a large network is the approximate number of users a single machine can handle or, alternatively, the number of messages per second it can process. Depending on how the system is designed, a series of processing agents may be distributed on multiple servers, with actual user data residing on another layer of machines behind them, or there might be only one layer of machines, with a certain number of users stored on each machine. We’ll discuss both of these scenarios later in this chapter.
Sizing the machine is important in determining how scalable the filtering software will be. When there is no existing data on which to base an estimate, capacity can be determined only through predictive analysis or trial and error. In other cases, mathematical formulas can be applied to estimate execution time, peak processing, and other variables. A good way to begin sizing a machine for use on a large network is to consider some of the general questions a systems administrator will ask. After we look at these general questions, we’ll discuss the more scientific factors, which are needed to produce the figures for a specific sizing model.
General Resource Planning
General resource planning involves answering several practical questions to uncover the realistic issues involved in a deployment model. These questions focus more on political and financial constraints than on scientific hardware sizing. Some of them include the following:
What type of hardware is available?
What operating system should the software run on?
How many individual machines are available?
What is the company’s methodology?
How many administrators are available?
What Type of Hardware Is Available?
Sometimes companies are bound by specific financial agreements with vendors and have a standard set of hardware they deploy on the network. As a result, only specific hardware, in a limited number of configurations, is available. Depending on the number of users at each location, the cost of the hardware, and the operating system, the beefiest box may not be the best box.
What Operating System Should the Software Run On?
The operating system may play a role in the number of users the hardware is capable of supporting. Some operating systems are built to handle heavier loads than others, and some are designed to run as fast as possible but can handle only a certain load. The networking limitations of the operating system (such as the TCP stack) and other variables may also play a role in which operating system is best suited for the particular implementation.
How Many Individual Machines Are Available?
If there’s an initial limit on the number of machines available, each machine will have no choice but to support a specific number of users. Five machines for 100,000 users creates a minimum requirement of 20,000 users per machine. If a limited amount of hardware is available, what optimizations will be necessary, and what features will need to be dropped in order to support the requirement?
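The arithmetic works in both directions: a fixed machine count sets a floor on users per machine, and a known per-machine limit sets a floor on the machine count. The sketch below shows both, rounding up because any remainder must land on some machine. The function names and the 15,000-users-per-box limit are hypothetical, chosen only for illustration.

```python
import math

def min_users_per_machine(total_users: int, machines: int) -> int:
    """Smallest per-machine user count that still covers every user."""
    if machines <= 0:
        raise ValueError("machines must be positive")
    # Round up: any remainder has to be absorbed by some machine.
    return math.ceil(total_users / machines)

def machines_needed(total_users: int, users_per_box: int) -> int:
    """Machines required if each box can hold at most users_per_box users."""
    if users_per_box <= 0:
        raise ValueError("users_per_box must be positive")
    return math.ceil(total_users / users_per_box)

print(min_users_per_machine(100_000, 5))  # 20000
# Hypothetical per-box limit of 15,000 users.
print(machines_needed(100_000, 15_000))   # 7
```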
What Is the Company’s Methodology?
Sometimes the company’s methodology may play a role in the type and size of hardware that is chosen. If the company has a very strict methodology with regard to testing, staging, and production, X number of machines will be needed to develop a test lab. If the methodology calls for a specific amount of redundancy, it may be necessary to deploy many small machines rather than a few large ones, and a certain level of fail-over resources will need to be planned.
How Many Administrators Are Available?
If only a single administrator is available, having fewer machines will help avoid a maintenance headache. When updates must be applied or additional systems deployed, the one or two available administrators will have to carry out all of that work. Since people are more expensive than machinery, beefier boxes will sometimes be necessary in cases where staff is limited.
Assessing Resource Utilization
A resource-planning matrix takes into account the present-day utilization of the mail system. It is important to identify the following:
The peak hour of processing
The overall volume
The concurrent workload
The target CPU utilization
The estimated fail-over
Identifying the Peak Hour of Processing
The peak hour of processing is the time period during which CPU utilization on the machines is at its highest; this is typically when the most mail enters the system. The peak hour of processing identifies the amount of load that must be considered in a deployment. In identifying this peak, consider how email volume varies throughout the year. If you’re uncertain as to what your peak capacity is, it may be necessary to implement some graphing utilities, such as MRTG (the Multi Router Traffic Grapher).
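If message arrival times can be extracted from the mail logs, the busiest hour can be found by bucketing arrivals by clock hour and counting each bucket. The sketch below assumes the timestamps have already been parsed into datetime objects; the log parsing itself and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import datetime

def peak_hour(timestamps: list[datetime]) -> tuple[datetime, int]:
    """Return the clock hour with the most message arrivals and its count."""
    if not timestamps:
        raise ValueError("no timestamps supplied")
    # Truncate each arrival time to the start of its hour, then count per hour.
    buckets = Counter(ts.replace(minute=0, second=0, microsecond=0)
                      for ts in timestamps)
    hour, count = buckets.most_common(1)[0]
    return hour, count

# Hypothetical arrival times, as if parsed from a mail log.
arrivals = [
    datetime(2004, 3, 1, 9, 5),
    datetime(2004, 3, 1, 9, 40),
    datetime(2004, 3, 1, 9, 59),
    datetime(2004, 3, 1, 14, 10),
]
hour, count = peak_hour(arrivals)
print(hour, count)  # 2004-03-01 09:00:00 3
```

Run over a full year of logs rather than a single day, this also captures the seasonal peaks mentioned above.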
Identifying the Volume
Once you’ve determined the peak hour of processing, it’s necessary to identify the different components of the mail system that contribute significant load. This can help to determine not only whether additional hardware will be required, but also how much of the mail the filter will actually need to process. For example, many users may opt not to be enrolled in a spam-filtering solution. Other companies may choose not to pass internal company email through the filter to conserve load. If a virus scanner is present on the network, what percentage of inbound messages will be caught before they make it to the filter? Determining the overall volume can help to identify the overall peak demand on the filter.
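Each of these factors shrinks the share of peak-hour mail that actually reaches the filter. The sketch below assumes the three factors are independent fractions applied one after another (internal mail bypasses the filter, the virus scanner removes part of the remainder, and only mail for enrolled users is filtered). The function name and every figure in the example are hypothetical.

```python
def filter_volume(peak_messages: int,
                  internal_fraction: float,
                  virus_caught_fraction: float,
                  opted_in_fraction: float) -> float:
    """Estimate how many peak-hour messages actually reach the spam filter.

    Assumes the factors are independent and applied in sequence.
    """
    for f in (internal_fraction, virus_caught_fraction, opted_in_fraction):
        if not 0.0 <= f <= 1.0:
            raise ValueError("fractions must be between 0 and 1")
    external = peak_messages * (1.0 - internal_fraction)        # internal mail skips the filter
    after_scan = external * (1.0 - virus_caught_fraction)       # virus scanner catches some first
    return after_scan * opted_in_fraction                       # only enrolled users are filtered

# Hypothetical: 36,000 messages in the peak hour, 25% internal,
# 10% stopped by the virus scanner, 80% of users enrolled.
print(round(filter_volume(36_000, 0.25, 0.10, 0.80)))  # 19440
```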
Identifying the Concurrent Workload
The concurrent workload takes into account the total number of operations running in parallel on a system. If the mail server is processing 3,600 messages per hour, this load could be calculated as 1 concurrent operation per second, or the messages could arrive in bursts of as many as 10 concurrent operations once every ten seconds. Identifying the concurrent workload will help to identify areas of need in the storage implementation (such as locking) and will also help estimate I/O contention to better size the required storage hardware.
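Given the start and end time of each operation, peak concurrency can be measured with a simple sweep: sort the start and end events by time and track how many operations are running. The sketch below contrasts the two cases above, both averaging one message per second; the 0.5-second processing time per message is a hypothetical figure.

```python
def peak_concurrency(intervals: list[tuple[float, float]]) -> int:
    """Largest number of operations running at the same instant.

    Each interval is (start, end) in seconds. An operation ending at time t
    is counted as finished before another starting at t begins.
    """
    events = []
    for start, end in intervals:
        if end < start:
            raise ValueError("interval ends before it starts")
        events.append((start, 1))   # operation begins
        events.append((end, -1))    # operation ends
    # Sort by time; at equal times, -1 sorts before +1 so ends are processed first.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak

# Both schedules average 1 message per second (3,600 per hour).
steady = [(t, t + 0.5) for t in range(10)]  # one arrival every second
burst = [(0.0, 0.5)] * 10                   # ten arrivals at once, then idle
print(peak_concurrency(steady))  # 1
print(peak_concurrency(burst))   # 10
```

The same hourly volume can therefore demand ten times the parallel capacity, which is what drives the locking and I/O contention concerns.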
Identifying Target CPU Utilization
Based on the peak hour of processing, determine the desired level of CPU utilization during this time period. This should take into account unexpected spikes in load, total capacity, and whether or not it is appropriate for performance to degrade during a fail-over.
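A target utilization translates into a machine count once you know how much a single machine can process. The sketch below assumes CPU use grows linearly with message rate, so a machine benchmarked at some rate under full CPU load can sustain that rate times the target fraction. The benchmark figure, peak rate, and 60% target are all hypothetical.

```python
import math

def machines_for_target(peak_msgs_per_sec: float,
                        msgs_per_sec_at_full_cpu: float,
                        target_utilization: float) -> int:
    """Machines needed so the peak hour runs at or below the target CPU level.

    Assumes CPU use scales linearly with message rate.
    """
    if not 0.0 < target_utilization <= 1.0:
        raise ValueError("target_utilization must be in (0, 1]")
    if msgs_per_sec_at_full_cpu <= 0:
        raise ValueError("per-machine capacity must be positive")
    # Throughput one machine can sustain without exceeding the target.
    usable = msgs_per_sec_at_full_cpu * target_utilization
    return math.ceil(peak_msgs_per_sec / usable)

# Hypothetical: 20 msg/s at peak, 5 msg/s per machine at full CPU, 60% target.
print(machines_for_target(20, 5, 0.6))  # 7
```

Lowering the target leaves more headroom for unexpected spikes at the cost of more hardware.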
Estimating Fail-Over
Based on the peak hour of processing, determine the number of machines that can be allowed to fail while still meeting the requirements for desired CPU utilization at the time of the peak. A minimum of two machines in a distributed environment should be allowed to fail simultaneously. What resources will be required in order to manage this failure? Will fail-over take place instantaneously, or will the systems administrator need to perform other tasks in order to establish fail-over?
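One way to plan for this is to size the cluster so that the surviving machines alone still meet the target utilization, then add the number of machines allowed to fail. The sketch below uses the two-machine minimum from the text as its default and, as before, assumes CPU use scales linearly with message rate; the function names and example figures are hypothetical.

```python
import math

def machines_with_failover(peak_msgs_per_sec: float,
                           msgs_per_sec_at_full_cpu: float,
                           target_utilization: float,
                           allowed_failures: int = 2) -> int:
    """Machines to deploy so that allowed_failures machines can be down at
    once while the survivors stay at or below the target CPU level."""
    if allowed_failures < 0:
        raise ValueError("allowed_failures cannot be negative")
    if not 0.0 < target_utilization <= 1.0:
        raise ValueError("target_utilization must be in (0, 1]")
    usable = msgs_per_sec_at_full_cpu * target_utilization
    survivors_needed = math.ceil(peak_msgs_per_sec / usable)
    return survivors_needed + allowed_failures

def utilization_after_failures(peak_msgs_per_sec: float,
                               msgs_per_sec_at_full_cpu: float,
                               machines: int, failed: int) -> float:
    """Per-machine CPU fraction at peak once `failed` machines are down."""
    survivors = machines - failed
    if survivors <= 0:
        raise ValueError("no machines left to carry the load")
    return peak_msgs_per_sec / (survivors * msgs_per_sec_at_full_cpu)

# Hypothetical: 20 msg/s at peak, 5 msg/s per machine, 60% target.
deployed = machines_with_failover(20, 5, 0.6)
print(deployed)  # 9
print(round(utilization_after_failures(20, 5, deployed, 2), 3))  # 0.571
```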