High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI

Joseph D. Sloan


7.1 Installing Rocks


In
this section we'll look at a default Rocks
installation. We won't go into the same level of
detail as we did with OSCAR, in part because Rocks offers a simpler
installation. This section should give you the basics.


7.1.1 Prerequisites


There are several things you need to do before you begin your
installation. First, you need to plan your system. A Rocks cluster
has the same basic architecture as an OSCAR cluster (see Figure 6-1). The head node or
frontend is a server with two network
interfaces. The public interface is attached to the campus network or
the Internet while the private interface is attached to the cluster.
With Rocks, the first interface (e.g., eth0) is
the private interface and the second (e.g.,
eth1) is the public interface. (This is the
opposite of what was described for OSCAR.)

You'll install the frontend first and then use it to
install the compute nodes. The compute nodes use HTTP to pull the Red
Hat and cluster packages from the frontend. Because Rocks uses
Kickstart and Anaconda (described in Chapter 8), heterogeneous
hardware is supported.

Diskless clusters are not an
option with Rocks. It assumes you will have hard disks in all your
nodes. For a default installation, you'll want at
least an 8 GB disk on the frontend. For
compute nodes, by altering the defaults,
you can get by with smaller drives. It is probably easier to install
the software on the compute nodes by booting from a CD-ROM, but if
your systems don't have CD-ROM drives, you can
install the software by booting from a floppy or by doing a network
boot. Compute nodes should be configured to boot without an attached
keyboard or should have a keyboard or KVM switch attached.

Rocks supports both Ethernet and
Myrinet. For the cluster's private network, use a
private address space distinct from the external address space per
RFC 1918. It's OK to let an external DHCP server
configure the public interface, but you should let Rocks configure
the private interface.
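
If you aren't sure which interface is which, you can verify the
mapping once the frontend is installed by looking at the address
assigned to each interface. The following is just a quick sanity
check with the standard ifconfig tool, assuming the default
eth0/eth1 naming just described:

[root@frontend root]# /sbin/ifconfig eth0 | grep 'inet addr'
[root@frontend root]# /sbin/ifconfig eth1 | grep 'inet addr'

eth0 should report the private (RFC 1918) cluster address and eth1
the public address. If they are reversed, the cables are probably
plugged into the wrong ports.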


7.1.2 Downloading Rocks


To
install Rocks, you'll first need the appropriate
CD-ROMs. Typically, you'll go to the Rocks web site
http://rocks.npaci.edu/Rocks/,
follow the link to the download page, download the ISO images you
want, and burn CD-ROMs from these images. (This is also a good time
to download the user manuals if you haven't already
done so.) Rocks currently supports x86
(Pentium and Athlon), x86_64 (AMD Opteron), and IA-64 (Itanium)
architectures.
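
If checksums are posted on the download page, it is worth verifying
each image before burning it, since a corrupted ISO can lead to
puzzling installation failures. Here is a minimal sketch using
md5sum and cdrecord on whatever machine you used to download the
images; the image name and device address are only examples, so
substitute your own (cdrecord -scanbus will list your burner):

$ md5sum rocks-base-3.2.0-i386.iso
$ cdrecord -scanbus
$ cdrecord dev=1,0,0 rocks-base-3.2.0-i386.iso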

Be sure to download the software that is appropriate for your
systems. You'll need at least two ISO images, maybe
more depending upon the software you want.
Every installation will
require the Rocks Base and HPC Roll. The core install provides
several flavors of MPICH, Ganglia, and PVFS. If you want additional
software that is not part of the core Rocks installation,
you'll need to download additional rolls. For
example, if you want tripwire and
chkrootkit, two common security enhancements,
you could download the Area 51 roll. If you are interested in moving
on to grid computing, Rocks provides rolls that ease that process
(see the sidebar, "Rocks and
Grids").

Currently available rolls include the following:

Sun Grid Engine (SGE) roll



This roll includes the Sun Grid Engine, a job queuing system for
grids. Think of this as a grid-aware alternative to OpenPBS. SGE is
open source distributed management software. For more information on
SGE, visit http://gridengine.sunsource.net.


Grid roll


The NSF Middleware
Initiative (NMI) grid roll contains a full complement of grid
software, including the Globus toolkit, Condor-G, Network Weather
Service, and MPICH-G2, to name only a few. For more information on
the NMI project, visit http://www.nsf-middleware.org.


Intel roll


This roll
installs and configures the Intel C compiler and the Intel FORTRAN
compiler. (You'll still need licenses from Intel.)
It also includes the MPICH environments built for these compilers.
For more information on the Intel compilers and their use with Rocks,
visit http://www.intel.com/software/products/distributors/rock_cluster.


Area 51 roll


This
roll currently includes tripwire and
chkrootkit. tripwire is a
security auditing package. chkrootkit examines a
system for any indication that a root kit has been installed. For
more information on these tools, visit the sites http://www.tripwire.org and http://www.chkrootkit.org.


Scalable Cluster Environment (SCE) roll


This roll includes the OpenSCE software
that originated at Kasetsart University, Thailand. For more
information on OpenSCE, visit http://www.opensce.org.


Java roll


The Java
roll contains the Java Virtual Machine. For more information on Java,
visit http://java.sun.com.


PBS roll


The
Portable Batch System roll includes the
OpenPBS and Maui queuing and scheduling software.
For more information on these packages, see Chapter 11 or visit
http://www.openpbs.org.


Condor roll


This
roll includes the Condor workload management software. Condor
provides job queuing, scheduling, and priority management along with
resource monitoring and management. For more information on Condor,
visit http://www.cs.wisc.edu/condor/.



Some rolls are not available for all architectures.
It's OK to install more than one roll, so get what
you think you may need now. Generally, you won't be
able to add a roll once the cluster is installed. (This should change
in the future.)

Once you've burned CD-ROMs from the ISO images, you
are ready to start the installation. You'll start
with the frontend.


Rocks and Grids


While grids are beyond the scope of this
book, it is worth mentioning that, through its rolls mechanism, Rocks
makes it particularly easy to move into grid computing. The grid roll
is particularly complete, providing pretty much everything
you'll need to get started: literally dozens of
software tools and packages. Software includes:

Globus Toolkit: a collection of modular technologies, including
tools for authentication, scheduling, and file transfer that
simplify collaboration among sites.

Condor-G: the Condor software with grid and Globus compatibility.

Network Weather Service: a monitoring service that dynamically
forecasts network and resource performance.

MPICH-G2: a grid-enabled implementation of MPICH.

Grid Packaging Tools: a collection of packaging tools built
around XML. This is a package management system.

KX.509/KCA: technology that provides a bridge between Kerberos
and PKI infrastructure.

GSI OpenSSH: a modified version of SSH that supports GSI
authentication (Grid Security Infrastructure).

MyProxy: a credential repository for grids.

Gridconfig Tools: a set of tools to configure and tune grid
technologies.

These are just the core. If you are new to grids and want to get
started, this is the way to go. (Appendix A
includes the URLs for these tools.)


7.1.3 Installing the Frontend


The
frontend installation should go very smoothly. After the initial boot
screens, you'll see a half dozen or so screens
asking for additional information along with other screens giving
status information for the installation. If you've
installed Red Hat Linux before, these screens will look very
familiar. On a blue background, you'll see the Rocks
version information at the very top of the screen and interface
directions at the bottom of the screen. In the center of the screen,
you'll see a gray window with fields for user-supplied information
or status information. Although you can probably
ignore them, as with any Red Hat installation, the Linux virtual
consoles are available as shown in Table 7-1. If
you have problems, don't forget these.

Table 7-1. Virtual consoles

Console   Use                 Keystroke
1         Installation        Ctrl-Alt-F1
2         Shell prompt        Ctrl-Alt-F2
3         Installation log    Ctrl-Alt-F3
4         System messages     Ctrl-Alt-F4
5         Other messages      Ctrl-Alt-F5

Boot the frontend with the Rocks Base CD and stay with the machine.
After a moment, you will see a boot screen giving you several
options. Type frontend at the
boot: prompt and press Enter. You need to do this
quickly because the system will default to a compute node
installation after a few seconds and the prompt will disappear. If
you miss the prompt, just reboot the system and pay closer attention.

After a brief pause, the system prompts you to register your roll
CDs. When it asks whether you have any roll CDs, click on Yes. When
the CD drive opens, replace the Rocks Base CD with the HPC Roll CD.
After a moment the system will ask if you have another roll CD.
Repeat this process until you have added all the roll CDs you have.
Once you are done, click on No and the system will prompt you for the
original Rocks Base CD. Registration is now done, but at the end of
the installation you'll be prompted for these disks
again for the purpose of actual software installation.

The next screen prompts you for information that will be included in
the web reports that Ganglia creates. This includes the cluster name,
the cluster owner, a contact, a URL, and the latitude and longitude
for the cluster location. You can skip any or all of this
information, but it only takes a moment to enter. You can change all
this later, but it can be annoying trying to find the right files. By
default, the web interface is not accessible over the public
interface, so you don't have to worry about others
outside your organization seeing this information.

The next step is partitioning the disk drive. You can select
Autopartition and let Rocks partition the disk using default values
or you can manually partition the disk using Disk Druid. The current
defaults are 6 GB for / and 1 GB for swap space.
/export gets the remaining space. If you
manually partition the drive, you need at least 6 GB for
/ and you must have a
/export partition.

The next few screens are used to configure the network. Rocks begins
with the private interface. You can choose to have DHCP configure
this interface, but since this is on the internal network, it
isn't likely that you want to do this. For the
internal network, use a private address range that
doesn't conflict with the external address range.
For example, if your campus LAN uses 10.X.X.X, you
might use 172.16.1.X for your internal network.
When setting up clients, Rocks numbers machines from the highest
number downward, e.g., 172.16.1.254,
172.16.1.253, ....

For the public interface, you can manually enter an IP address and
mask or you can rely on DHCP. If you are manually entering the
information, you'll be prompted for a routing
gateway and DNS servers. If you are using DHCP, you
shouldn't be asked for this information.

The last network setup screen asks for a node name. While it is
possible to retrieve this information by DHCP, it is better to set it
manually. Otherwise, you'll need to edit
/etc/resolv.conf after the installation to add
the frontend to the name resolution path. Choose the frontend name
carefully. It will be written to a number of files, so it is very
difficult to change. It is a very bad idea to try to change hostnames
after installing Rocks.

Once you have the network parameters set, you'll be
prompted for a root password. Then Rocks will format the filesystem
and begin installing the packages. As the installation proceeds,
Rocks provides a status report showing each package as it is
installed, time used, time remaining, etc. This step will take a
while.

Once the Rocks Base CD has been installed, you'll be
prompted for each of the roll CDs once again. Just swap CDs when
prompted to do so. When the last roll CD has been installed, the
frontend will reboot.

Your frontend is now installed. You can move on to the compute nodes
or you can stop and poke around on the frontend first. The first time
you log onto the frontend, you will be prompted for a file and
passphrase for SSH.

Rocks Frontend Node - Wofford Rocks Cluster
Rocks 3.2.0 (Shasta)
Profile built 17:10 29-Jul-2004
Kickstarted 17:12 29-Jul-2004
It doesn't appear that you have set up your ssh key.
This process will make the files:
/root/.ssh/identity.pub
/root/.ssh/identity
/root/.ssh/authorized_keys
Generating public/private rsa1 key pair.
Enter file in which to save the key (/root/.ssh/identity):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/identity.
Your public key has been saved in /root/.ssh/identity.pub.
The key fingerprint is:
86:ad:c4:e3:a4:3a:90:bd:7f:f1:bd:7a:df:f7:a0:1c root@frontend.public

The default file name is reasonable, but you really should enter a
passphrase, one you can remember.
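
If you do set a passphrase and find yourself typing it constantly,
the stock OpenSSH agent can cache it for the rest of your login
session. This isn't Rocks-specific, just standard ssh-agent usage:

[root@frontend root]# eval `ssh-agent`
[root@frontend root]# ssh-add
Enter passphrase for /root/.ssh/identity:

Subsequent SSH connections to the compute nodes won't prompt for the
passphrase until the agent exits.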


7.1.4 Install Compute Nodes


The next step is to install the compute
nodes. Before you do this, you may want to make a few changes to the
defaults. For example, you might want to change how the disks will be
partitioned, what packages will be installed, or even which kernel
will be used. For now, we'll stick with the
defaults. Customizations are described in the next two sections, so
you may want to read ahead before going on. But it's
really easy to reinstall the compute nodes, so don't
feel you have to master everything at once.

To install the compute nodes, you'll begin by
running the program insert-ethers as root on the
frontend. Next, you'll boot a compute node using the
Rocks Base CD. Since the Rocks Base CD defaults to compute node
install, you won't need to type anything on the
cluster node. The
insert-ethers program listens for a DHCP query from
the booting compute node, assigns it a name and IP address, records
information in its database, and begins the installation of the
client.

Let's look at the process in a little more detail.
insert-ethers collects MAC address information
and enters it into the Rocks cluster database. It can also be used to
replace (--replace), update
(--update), and remove
(--remove) information in the database. This
information is used to generate the DHCP configuration file and the
host file.

There is one potential problem you might face when using
insert-ethers. If you have a managed Ethernet
switch, when booted it will issue a DHCP request. You
don't want to treat it like a compute node.
Fortunately, the Rocks implementers foresaw this problem. When you
start insert-ethers, you are given a choice of
the type of appliance to install. You can select Ethernet
Switch
as an option and configure your switch. When you
are done, quit and restart insert-ethers. This
time select Compute. Now you are ready to boot
your compute nodes. If you aren't setting up an
Ethernet switch, you can just select Compute the
first time you run insert-ethers.

The next step is to boot your compute nodes. As previously noted, you
can use the Rocks Base CD to do this. If your compute nodes
don't have CD-ROM drives, you have two other
options. You can use a network boot if your network adapters support
a PXE boot, or you can create a PXE boot floppy. Consult your
hardware documentation to determine how to do a PXE boot using a
network adapter. The Rocks FAQ, included in the NPACI Rocks
Cluster Distribution: Users Guide, has the details for
creating a PXE boot floppy.

When insert-ethers runs, it displays a window
labeled Inserted Appliances. As each compute node is booted, it
displays the node's MAC address and assigned name.
Typically, insert-ethers will name the systems
compute-0-0, compute-0-1,
etc. (The file /etc/hosts defines aliases for
these, c0-0, c0-1, etc.,
for those of us who don't type well.) If you start
insert-ethers with the command-line option
--cabinet=1, it will generate the names
compute-1-0, compute-1-1,
etc. This allows you to create a two-tier naming system, if you want.
You can change the starting point for the second number with the
--rank option. See the
insert-ethers(8) manpage for more details.
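
For example, to add a second rack of nodes whose numbering starts at
8, you might run something like the following; the exact names
generated assume the naming behavior just described:

[root@frontend root]# insert-ethers --cabinet=1 --rank=8

Nodes booted while this is running would then be named
compute-1-8, compute-1-9, and so on.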

A couple of minutes after you reboot your compute node, it will eject
the CD-ROM. You can take the CD-ROM and move on to your next machine.
If you have a terminal connected to the system,
you'll get a status report as the installation
proceeds.

If you need to reinstall a node, you can use the
shoot-node command. This is useful when changing
the configuration of a node, e.g., adding a new package. This command
takes the name of the machine or machines as an argument.

[root@frontend root]# shoot-node compute-0-0

Since this is run on the frontend, it can be used to remotely
reinstall a system. This command is described in the
shoot-node(8) manpage.


7.1.5 Customizing the Frontend


Since Rocks installs Linux for you,
you will need to do a little digging to see how things are set up.
Among other services, Rocks installs and configures 411 (an NIS
replacement), Apache, DHCP, MySQL, NFS, NTP, Postfix, and SSH, as
well as cluster-specific software such as Ganglia and PVFS.
Configuration files are generally where you would expect them.
You'll probably want to browse the files in
/etc, /etc/init.d,
/etc/ssh, and
/etc/xinetd.d. Other likely files include
crontab, dhcpd.conf,
exports, fstab,
gmetad.conf, gmond.conf,
hosts, ntp.conf, and
ntp/step-tickers. You might also run the
commands

[root@frontend etc]# ps -aux | more
...
[root@frontend etc]# /sbin/service --status-all | more
...
[root@frontend etc]# netstat -a | more
...

The cluster software that Rocks installs is in
/opt or /usr/share.

If you have been using Red Hat for a while, you probably have some
favorite packages that Rocks may not have installed. Probably the
best way to learn what you have is to just poke around and try
things.
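
One quick, harmless way to see what did get installed is to query the
RPM database; the package name in the second command is just an
example:

[root@frontend root]# rpm -qa | sort | more
...
[root@frontend root]# rpm -q emacs
...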


7.1.5.1 User management with 411

Starting with Rocks 3.1.0, 411 replaces NIS. 411 automatically
synchronizes the files listed in
/var/411/Files.mk. The password and group files
are among these. When you add users, you'll want to
use useradd.

[root@frontend 411]# useradd -p xyzzy -c "Joe Sloan" \
> -d /export/home/sloanjd sloanjd
...

This automatically invokes 411. When a user changes a password,
you'll need to sync the changes with the compute
nodes. You can do this with the command

[root@frontend root]# make -C /var/411

A more complete discussion of 411 can be found in the Rocks
user's guide. At this time, there
isn't a 411 man page. To remove users, use
userdel.
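
For example, to remove the account created above and push the change
out to the cluster, something like the following should do; the
explicit resync may be redundant, but it does no harm:

[root@frontend root]# userdel -r sloanjd
[root@frontend root]# make -C /var/411

The -r option also removes the user's home directory.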


7.1.5.2 X Window System

You'll probably
want to start the X Window System so you can run useful graphical
tools such as Ganglia. Before you can run X the first time,
you'll need to run
redhat-config-xfree86. If you are comfortable
setting options, go for it. If you are new to the X Window System,
you'll probably be OK just accepting the defaults.
You can then start X with the startx command.
(If you get a warning message about no screen savers, just ignore
it.)

Once X is working, you'll need to do the usual local
customizations such as setting up printers, creating a message of the
day, etc.


7.1.6 Customizing Compute Nodes


Rocks uses Kickstart and
Anaconda to install the individual compute nodes. However, rather
than use the usual flat, text-based configuration file for Kickstart,
Rocks decomposes the Kickstart file into a set of XML files for the
configuration information. The Kickstart configuration is generated
dynamically from these. These files are located in the
/export/home/install/rocks-dist/enterprise/3/en/os/i386/build/nodes/
directory. Don't change these. If you need to create
customization files, you can put them in the directory
/home/install/site-profiles/3.2.0/nodes/ for
Rocks Version 3.2.0. There is a sample file
skeleton.xml that you can use as a template when
creating new configuration files. When you make these changes,
you'll need to apply the configuration change to the
distribution using the rocks-dist command. The
following subsections give examples. (For more information on
rocks-dist, see the
rocks-dist(1) manpage.)


7.1.6.1 Adding packages

If you want to install additional RPM packages, first copy those
packages to the directory
/home/install/contrib/enterprise/3/public/arch/RPMS,
where arch is the architecture you are
using, e.g., i386.

[root@frontend root]# mv ethereal-0.9.8-6.i386.rpm \
> /home/install/contrib/enterprise/3/public/i386/RPMS/
[root@frontend root]# mv ethereal-gnome-0.9.8-6.i386.rpm \
> /home/install/contrib/enterprise/3/public/i386/RPMS/

Next, create a configuration file
extend-compute.xml. Change to the profile
directory, copy skeleton.xml, and edit it with
your favorite text editor such as vi.

[root@frontend root]# cd /home/install/site-profiles/3.2.0/nodes
[root@frontend nodes]# cp skeleton.xml extend-compute.xml
[root@frontend nodes]# vi extend-compute.xml
...

Next, add a line to extend-compute.xml for each
package.

<package> ethereal </package>
<package> ethereal-gnome </package>

Notice that only the base name for a package is used; omit the
version number and .rpm suffix.

Finally, apply the configuration change to the distribution.

[root@frontend nodes]# cd /home/install
[root@frontend install]# rocks-dist dist
...

You can now install the compute nodes and the desired packages will
be included.
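
If some compute nodes are already up and running, you can push the
change to them with the shoot-node command described earlier rather
than waiting for the next reinstall; the node names here are only
examples:

[root@frontend install]# shoot-node compute-0-0 compute-0-1
...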


7.1.6.2 Changing disk partitions

In general, it is
probably a good idea to stick to one disk-partitioning scheme. Unless
you turn the feature off as described in the next subsection, compute
nodes will automatically be reinstalled after a power outage. If you
are using multiple partitioning schemes, the automatic reinstallation
could result in some drives with undesirable partitioning. Of course,
the downside of a single partitioning scheme is that it may limit the
diversity of hardware you can use.

To change the default disk partitioning scheme used by Rocks to
install compute nodes, first create a replacement partition
configuration file, replace-auto-partition.xml. Change to the
directory where the site profiles are stored, copy
skeleton.xml,
and edit it.

[root@frontend root]# cd /home/install/site-profiles/3.2.0/nodes
[root@frontend nodes]# cp skeleton.xml replace-auto-partition.xml
[root@frontend nodes]# vi replace-auto-partition.xml
...

Under the main section, you'll add something like
the following:

<main>
<part> / --size 2048 --ondisk hda </part>
<part> swap --size 500 --ondisk hda </part>
<part> /mydata --size 1 --grow --ondisk hda </part>
</main>

Apart from the XML tags, this is standard Kickstart syntax. This
example, a partitioning scheme for an older machine, uses 2 GB for
the root partition, 500 MB for a swap partition, and the rest of the
disk for the /mydata partition.

The last step is to apply the configuration change to the
distribution.

[root@frontend nodes]# cd /home/install
[root@frontend install]# rocks-dist dist
...

You can now install the system using the new partitioning scheme.


7.1.6.3 Other changes

By default, a compute node will attempt to reinstall itself whenever
it does a hard restart, e.g., after a power failure. You can disable
this behavior by executing the next two commands.

[root@frontend root]# cluster-fork '/etc/rc.d/init.d/rocks-grub stop'
compute-0-0:
Rocks GRUB: Setting boot action to 'boot current kernel': [ OK ]
...
[root@frontend root]# cluster-fork '/sbin/chkconfig --del rocks-grub'
compute-0-0:
...

The command
cluster-fork is used to execute a command on every
machine in the cluster. In this example, the two commands enclosed in
quotes will be executed on each compute node. Of course, if you
really wanted to, you could log onto each, one at a time, and execute
those commands. cluster-fork is a convenient
tool to have around. Additional information can be found in the Rocks
user's guide. There is no manpage at this time.
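
For example, a quick check of the disk layout or load across the
cluster might look like the following; the output, of course, will
vary with your nodes:

[root@frontend root]# cluster-fork 'df -h'
compute-0-0:
...
[root@frontend root]# cluster-fork uptime
compute-0-0:
...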

Creating and installing custom kernels on the compute nodes, although
more involved, is nonetheless straightforward under Rocks.
You'll first need to create a compute node, build a
new kernel on the compute node, package it using
rpm, copy it to the frontend, rebuild the Rocks
distribution with rocks-dist, and reinstall the
compute nodes. The details are provided in the Rocks
user's guide along with descriptions of other
customizations you might want to consider.
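
As a rough sketch of those last steps, suppose you have already built
and packaged a kernel on compute-0-0 with rpm. One approach, and not
the only one, is to copy the package into the same contrib directory
used earlier for extra packages, rebuild the distribution, and
reinstall; the RPM filename and build path below are only examples:

[root@frontend root]# scp compute-0-0:/usr/src/redhat/RPMS/i686/kernel-custom.i686.rpm \
> /home/install/contrib/enterprise/3/public/i386/RPMS/
[root@frontend root]# cd /home/install
[root@frontend install]# rocks-dist dist
...
[root@frontend install]# shoot-node compute-0-0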

