High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI

Joseph D. Sloan


3.2 Environment


You
are going to need some place to put your computers. If you are lucky
enough to have a dedicated machine room, then you probably have
everything you need. Otherwise, select or prepare a location that
provides physical security, adequate power, and adequate heating and
cooling. While these might not be issues with a small cluster, proper
planning and preparation are essential for large clusters. Keep in
mind, you are probably going to be so happy with your cluster that
you'll want to expand it. Since small clusters have
ways of becoming large clusters, plan for growth from the start.


3.2.1 Cluster Layout


Since the more computers you have, the
more space they will need, plan your layout with wiring, cooling, and
physical access in mind. Ignore any of these at your peril. While it
may be tempting to stack computers or pack them into large shelves,
this can create a lot of problems if not handled with care. First,
you may find it difficult to physically access individual computers
to make repairs. If the computers are packed too tightly,
you'll create heat dissipation problems. And while
this may appear to make wiring easier, in practice it can lead to a
rat's nest of cables, making it difficult to divide
your computers among different power circuits.

From the perspective of maintenance, you'll want to
have physical access to individual computers without having to move
other computers and with a minimum of physical labor. Ideally, you
should have easy access to both the front and back of your computers.
If your nodes are headless (no monitor, mouse, or keyboard), it is a
good idea to assemble a crash cart, i.e., a cart with a monitor,
keyboard, and mouse that can be wheeled to a machine and connected
when you need to work on it directly. So be sure to leave enough space
to both wheel and park your crash cart (and a chair) among your
machines.

To prevent overheating, leave a small gap
between computers and take care not to obstruct any ventilation
openings. (These are occasionally seen on the sides of older
computers!) An inch or two usually provides enough space between
computers, but watch for signs of overheating.

Cable management is also a concern.
For the well-heeled, there are a number of cable management systems
on the market. Ideally, you want to keep power cables and data cables
separated. The traditional rule of thumb was that there should be at
least a foot of separation between parallel data cable and power
cable runs, and that data cables and power cables should cross at
right angles. In practice, the 60 Hz analog power signal
doesn't affect high-speed digital signals. Still,
separating cables can make your cluster more manageable.

Standard equipment racks are very nice if
you can afford them. Cabling is greatly simplified. But keep in mind
that equipment racks pack things very closely and heat can be a
problem. One rule of thumb is to stay under 100 W per square foot.
That is about 1000 W for a 6-foot, 19-inch rack.

Otherwise, you'll probably be using standard
shelving. My personal preference is metal shelves that are open on
all sides. When buying shelves, take into consideration both the size
and the weight of all the equipment you will have.
Don't forget any displays, keyboards, mice, KVM
switches, network switches, or uninterruptible power supplies that
you plan to use. And leave yourself some working room.


3.2.2 Power and Air Conditioning


You'll need to make sure you have adequate power for
your cluster, and to remove all the heat generated by that power,
you'll need adequate air conditioning. For small
clusters, power and air conditioning may not be immediate concerns
(for now!), but it doesn't hurt to estimate your
needs. If you are building a large cluster, take these needs into
account from the beginning. Your best bet is to seek professional
advice if it is readily available. Most large organizations have
heating, ventilation, and air conditioning (HVAC) personnel and
electricians on staff. While you can certainly estimate your needs
yourself, if you have any problems you will need to turn to these
folks for help, so you might want to include them from the beginning.
Also, a second set of eyes can help prevent a costly mistake.


3.2.2.1 Power

In an ideal
universe, you would simply know the power requirements of your
cluster. But if you haven't built it yet, this
knowledge can be a little hard to come by. The only alternative is to
estimate your needs. A rough estimate is fairly straightforward: just
inventory all your equipment and then add up all the wattages. Divide
the total wattage by the voltage to get the amperage for the circuit,
and then figure in an additional 50 percent or so as a safety factor.
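
To make that arithmetic concrete, here is a minimal Python sketch of the
estimate; the equipment list, wattages, and line voltage are hypothetical
placeholders rather than figures from the text, so substitute your own
inventory.

    # Rough circuit-sizing estimate, following the steps above: add up the
    # nameplate wattages, divide by the line voltage, and figure in an
    # additional 50 percent as a safety factor.
    # All wattages below are hypothetical placeholders; use your own inventory.

    equipment_watts = {
        "compute nodes (8 x 250 W)": 8 * 250,
        "head node": 300,
        "network switch": 50,
        "monitor": 75,
    }

    line_voltage = 120.0     # volts; use your local line voltage
    safety_factor = 1.5      # the extra 50 percent suggested above

    total_watts = sum(equipment_watts.values())
    amps_needed = total_watts * safety_factor / line_voltage

    print(f"Total load: {total_watts} W")
    print(f"Suggested circuit capacity: {amps_needed:.1f} A")

With these made-up numbers, the estimate works out to roughly 30 A, which
illustrates how quickly even a modest cluster can outgrow a single 20-amp
circuit.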

For a more careful analysis, you should take into account the power
factor. A switching power supply can draw more current than its
wattage rating suggests. For example, a fully loaded 350 W power
supply may draw 500 W for 70 percent of the time and be off the other
30 percent of the time. And since a power supply may be only 70
percent efficient, delivering those 500 W may require around 715 W.
In practice, your equipment will rarely operate at maximum-rated
capacity. Some power supplies are power-factor corrected (PFC). These
power supplies will have power factors closer to 95 percent than 70
percent.
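
A small Python sketch of that example, using the figures from the
paragraph above; treating the 70 percent figure as both a duty cycle and
an efficiency is an assumption made purely for illustration.

    # A worked version of the example above: a nominally 350 W supply that
    # delivers 500 W while it is on, through a supply assumed to be about
    # 70 percent efficient.

    delivered_watts = 500.0   # load actually delivered to the components
    duty_cycle = 0.70         # assumed fraction of time the supply is drawing power
    efficiency = 0.70         # assumed efficiency of a non-PFC supply

    peak_wall_watts = delivered_watts / efficiency      # about 714 W at the wall
    average_wall_watts = peak_wall_watts * duty_cycle   # about 500 W averaged over time

    print(f"Peak draw at the wall:   {peak_wall_watts:.0f} W")
    print(f"Time-averaged wall draw: {average_wall_watts:.0f} W")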

As you can see, this can get complicated very quickly. Hopefully, you
won't be working with fully loaded systems. On the
other hand, if you expect your cluster to grow, plan for more. Having
said all this, for small clusters a 20-amp circuit should be
adequate, but there are no guarantees.

When doing your inventory, the trick is remembering to include
everything that enters the environment. It is not just the computers,
network equipment, monitors, etc., that make up a cluster. It
includes everything: equipment that is only used occasionally, such as
vacuum cleaners; personal items, such as the refrigerator under your
desk; and fixtures, such as lights. (Ideally, you should keep the
items that potentially draw a lot of current, such as vacuum cleaners,
floor polishers, refrigerators, and laser printers, off the circuits
your cluster is on.) Also, be careful to ensure you aren't sharing a
circuit unknowingly, a potential problem in an older building,
particularly if you have remodeled and added partitions.

The quality of your power can be an issue. If in doubt, put a line
monitor on your circuit to see how it behaves. You might consider an
uninterruptible power supply (UPS),
particularly for your servers or head nodes. However, the cost can be
daunting when trying to provide UPSs for an entire cluster. Moreover,
UPSs should not be seen as an alternative to adequate wiring. If you
are interested in learning more about or sizing a UPS, see the UPS
FAQ at the site of the Linux Documentation Project (http://www.tldp.org/).

While you are buying UPSs, you may also want to consider buying other
power management equipment. There are several vendors that supply
managed power distribution systems.
These often allow management over the Internet, through a serial
connection, or via SNMP. With this equipment, you'll
be able to monitor your cluster and remotely power-down or reboot
equipment.

And one last question to the wise:


Do you know how to kill the power to your system?

This is more than idle curiosity. There may come a time when you
don't want power to your cluster. And you may be in
a big hurry when the time comes.

Knowing where the breakers are is a good start. Unfortunately, these
may not be close at hand. They may even be locked away in a utility
closet. One alternative is a
scram switch. A scram switch should be
installed between the UPS and your equipment. You should take care to
ensure the switch is accessible but will not inadvertently be thrown.

You should also ensure that your maintenance staff knows what a UPS
is. I once had a server/UPS setup in an office that flooded. When I
came in, the UPS had been unplugged from the wall, but the computer
was still plugged into the UPS. Both computer and UPS were drenched, a
potentially deadly situation. Make sure your
maintenance staff knows what they are dealing with.


3.2.2.2 HVAC

As with most everything else,
when it comes to electronics, heat kills. There is no magical
temperature or temperature range such that, if you just keep your
computers and other equipment within that range, everything will be OK.
Unfortunately, it just isn't that simple.

Failure rate is usually a nonlinear function of temperature. As the
temperature rises, the probability of failure also increases. For
small changes in temperature, a rough rule of thumb is that you can
expect the failure rate to double with an 18°F (10°C) increase in
temperature. For larger changes, the rate of failure typically
increases more rapidly than the rise in temperature. Basically, you
are playing the odds. If you operate your machine room at a higher
than average temperature, you'll probably see more
failures. It is up to you to decide if the failure rate is
unacceptable.

Microenvironments
also matter. It doesn't matter if it is nice and
cool in your corner of the room if your equipment rack is sitting in
a corner in direct sunlight where the temperature is 15°F (8°C) warmer.
If the individual pieces of equipment don't have
adequate cooling, you'll have problems. This means
that computers that are spread out in a room with good ventilation
may be better off at a higher room temperature than those in a
tightly packed cluster that lacks ventilation, even when the room
temperature is lower.

Finally, the failure rate will also depend on the actual equipment
you are using. Some equipment is designed and constructed to be more
heat tolerant, e.g., military grade equipment. Consult the
specifications if in doubt.

While occasionally you'll see recommended
temperature ranges for equipment or equipment rooms, these should be
taken with a grain of salt. Usually, recommended temperatures are a
little below 70°F (21°C). So if you are a
little chilly, your machines are probably comfortable.

Maintaining a consistent temperature can be a problem, particularly
if you leave your cluster up and running at night, over the weekend,
and over holidays. Heating and air conditioning are often turned off
or scaled back when people aren't around.
Ordinarily, this makes good economic sense. But when the air
conditioning is cut off for a long Fourth of July weekend, equipment
can suffer. Make sure you discuss this with your HVAC folks before it
becomes a problem. Again, occasional warm spells probably
won't be a problem, but you are pushing your luck.

Humidity
is also an issue. At a high humidity, condensation can become a
problem; at a low humidity, static electricity is a problem. The
optimal range is somewhere in between. Recommended ranges are
typically around 40 percent to 60 percent.

Estimating your air conditioning needs is straightforward but may
require information you don't have. Among other
things, proper cooling depends on the number and area of external
walls, the number of windows and their exposure to the sun, the
external temperature, and insulation. Your maintenance folks may have
already calculated all this or may be able to estimate some of it.

What you are adding is heat contributed by your equipment and staff,
something that your maintenance folks may not have been able to
accurately predict. Once again, you'll start with an
inventory of your equipment. You'll want the total
wattage. You can convert this to British
Thermal Units per hour by multiplying the wattage by 3.412. Add in
another 300 BTU/H for each person working in the area. Add in the
load from the lights, walls, windows, etc., and then figure in
another 50 percent as a safety factor. Since air conditioning is
usually expressed in tonnage, you may need to divide the BTU/H total
by 12,000 to get the tonnage you need. (Or, just let the HVAC folks
do all this for you.)
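
Putting those steps together, here is a minimal Python sketch of the
cooling estimate; the equipment wattage, head count, and building load
are hypothetical placeholders (your HVAC folks would supply the real
building numbers).

    # Cooling-load estimate following the steps above: convert equipment
    # wattage to BTU/H, add 300 BTU/H per person, add the building load,
    # apply a 50 percent safety factor, then convert to tons of cooling.
    # All input figures are hypothetical placeholders.

    equipment_watts = 2425.0      # total wattage from your equipment inventory
    people = 2                    # staff working in the area
    building_btu_per_hr = 4000.0  # lights, walls, windows, etc.

    WATTS_TO_BTU_PER_HR = 3.412
    BTU_PER_HR_PER_TON = 12000.0

    load_btu_per_hr = (equipment_watts * WATTS_TO_BTU_PER_HR
                       + people * 300.0
                       + building_btu_per_hr)
    load_btu_per_hr *= 1.5        # 50 percent safety factor

    tons = load_btu_per_hr / BTU_PER_HR_PER_TON
    print(f"Estimated cooling load: {load_btu_per_hr:.0f} BTU/H ({tons:.1f} tons)")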


3.2.3 Physical Security


Physical security includes both
controlling access to computers and protecting computers from
physical threats such as flooding. If you are concerned about someone
trying to break into your computers, the best solution is to take
whatever steps you can to ensure that they don't
have physical access to the computers. If you can't
limit access to the individual computers, then you should password
protect the CMOS, set the boot order so the system only boots from
the hard drive, and put a lock on each case. Otherwise, someone can
open the case and remove the battery briefly (roughly 15 to 20
minutes) to erase the information in CMOS including the
password.[3] With the password
erased, the boot order can be changed. Once this is done, it is a
simple matter to boot to a floppy or CD-ROM, mount the hard drive,
and edit the password files, etc. (Even if you've
removed both floppy and CD-ROM drives, an intruder could bring one
with them.) Obviously, this solution is only as good as the locks you
can put on the computers and does very little to protect you from
vandals.

[3] Also, there is usually a jumper that will
immediately discharge the CMOS.


Broken pipes and similar disasters can be devastating. Unfortunately,
it can be difficult to assess these potential threats. Computers can
be damaged when a pipe breaks on another floor. Just because there is
no pipe immediately overhead doesn't mean that you
won't be rained on as water from higher floors makes
its way to the basement. Keeping equipment off the floor and off the
top of shelves can provide some protection. It is also a good idea to
keep equipment away from windows.

There are several web sites and books that deal with disaster
preparedness. As the importance of your cluster grows, disaster
preparedness will become more important.

