11.1 OpenPBS


Before the emergence of clusters, the Unix-based Network Queuing
System (NQS) from NASA Ames Research Center was a commonly used
batch-queuing system. With the emergence of parallel distributed
systems, NQS began to show its limitations. Consequently, Ames led an
effort to develop requirements and specifications for a newer,
cluster-compatible system. These requirements and specifications
later became the basis for the IEEE 1003.2d POSIX standard. With NASA
funding, PBS, a system conforming to those standards, was developed
by Veridian in the early 1990s.

PBS is available in two forms: OpenPBS and PBSPro. OpenPBS is the
unsupported original
open source version of PBS, while PBSPro is a newer commercial
product. In 2003, PBSPro was acquired by Altair Engineering and is
now marketed by Altair Grid Technologies, a subsidiary of Altair
Engineering. The web site for OpenPBS is http://www.openpbs.org; the web site for
PBSPro is http://www.pbspro.com.
Although much of the following will also apply to PBSPro, the
remainder of this chapter describes OpenPBS, which is often referred
to simply as PBS. However, if you have the resources to purchase
software, it is well worth looking into PBSPro. Academic grants have
been available in the past, so if you are eligible, this is worth
looking into as well.

As an unsupported product,
OpenPBS has its problems. Of the software
described in this book, it was, for me, the most difficult to
install. In my opinion, it is easier to install OSCAR, which has
OpenPBS as a component, or Rocks along with the PBS roll than it is
to install just OpenPBS. With this warning in mind,
we'll look at a typical installation later in this
chapter.


11.1.1 Architecture


Before
we install PBS, it is helpful to describe its architecture. PBS uses
a client-server model and is organized as a set of user-level
commands that interact with three system-level daemons. Jobs are
submitted using the user-level commands and managed by the daemons.
PBS also includes an API.

The pbs_server daemon,
the job server, runs on the server system and is the heart of the PBS
system. It provides basic batch services such as receiving and
creating batch jobs, modifying the jobs, protecting jobs against
crashes, and running the batch jobs. User commands and the other
daemons communicate with the pbs_server over the
network using TCP. The user commands need not be installed on the
server.

The job server manages one or more queues. (Despite the name, queues
are not restricted to first-in, first-out scheduling.) A scheduled
job waiting to be run or a job that is actually running is said to be
a member of its queue. The job server supports two types of queues,
execution and routing. A job in an execution queue is waiting to
execute while a job in a routing queue is waiting to be routed to a
new destination for execution.

The pbs_mom daemon executes the individual batch jobs. This job
executor daemon is often called the MOM because it is the
"mother" of all executing jobs and must run on every system within
the cluster. It creates an execution environment that is as nearly
identical to the user's session as possible. MOM is also responsible
for returning the job's output to the user.

The final daemon,
pbs_sched, implements the
cluster's job-scheduling policy. As such, it
communicates with the pbs_server and
pbs_mom daemons to match available jobs with
available resources. By default, a first-in, first-out scheduling
policy is used, but you are free to set your own policies. The
scheduler is highly extensible.

PBS provides both a GUI and 1003.2d-compliant
command-line utilities. These commands fall into three categories:
management, operator, and user commands. Management and operator
commands are usually restricted commands. The commands are used to
submit, modify, delete, and monitor batch jobs.


11.1.2 Installing OpenPBS


While detailed installation directions can be found in the PBS
Administrator Guide, there are enough "gotchas" that it is worth
going over the process in some detail. Before you begin, be sure you
look over the Administrator Guide as well. Between the guide and this
chapter, you should be able to overcome most obstacles.

Before starting with the installation proper, there are a couple of
things you need to check. As noted, PBS provides both command-line
utilities and a graphical interface. The graphical interface requires
Tcl/Tk 8.0 or later, so if you
want to use it, make sure Tcl/Tk is installed.
You'll want to install Tcl/Tk before you install
PBS. For a Red Hat installation, you can install Tcl/Tk from the
packages supplied with the operating system. For more information on
Tcl/Tk, visit the web site http://www.scriptics.com/. In order to build
the GUI, you'll also need the X11 development
packages, which Red Hat users can install from the supplied RPMs.

The first step in the installation proper is to download the
software. Go to the
OpenPBS web site (http://www-unix.mcs.anl.gov/openpbs/) and
follow the links to the download page. The first time through, you
will be redirected to a registration page. With registration, you
will receive by email an account name and password that you can use
to access the actual download page. Since you have to wait for
approval before you receive the account information,
you'll want to plan ahead and register a couple of
days before you plan to download and install the software. Making
your way through the registration process is a little annoying
because it keeps pushing the commercial product, but it is
straightforward and won't take more than a few
minutes.

Once you reach the download page, you'll have the
choice of downloading a pair of RPMs or the patched source code. The
first RPM contains the full PBS distribution and is used to set up
the server, and the second contains just the software needed by the
client and is used to set up compute nodes within a cluster. While
RPMs might seem the easiest way to go, the available RPMs are based
on an older version of Tcl/Tk (Version 8.0). So unless you want to
backpedal (i.e., track down and install these older packages, a
nontrivial task), installing the source is preferable.
That's what's described here.

Download the source and move it to your directory of choice. With a
typical installation, you'll end up with three
directory trees: the source tree, the installation tree, and the
working directory tree. In this example, I'm setting
up the source tree in the directory
/usr/local/src. Once you have the source package
where you want it, unpack the code.

[root@fanny src]# gunzip OpenPBS_2_3_16.tar.gz
[root@fanny src]# tar -vxpf OpenPBS_2_3_16.tar

When untarring the package, use the -p option to
preserve permission bits.

Since the
OpenPBS code is no longer supported,
it is somewhat brittle. Before you can compile the code, you will
need to apply some patches. Which patches you need will depend on your
configuration, so plan to spend some time on the Internet: the
OpenPBS URL given above is a good place to start. For Red Hat Linux
9.0, start by downloading the scaling patch from http://www-unix.mcs.anl.gov/openpbs/ and the
errno and gcc patches from
http://bellatrix.pcl.ox.ac.uk/~ben/pbs/.
(Working out the details of what you need is the annoying side of
installing OpenPBS.) Once you have the patches you want, install
them.

[root@fanny src]# cp openpbs-gcc32.patch /usr/local/src/OpenPBS_2_3_16/
[root@fanny src]# cp openpbs-errno.patch /usr/local/src/OpenPBS_2_3_16/
[root@fanny src]# cp ncsa_scaling.patch /usr/local/src/OpenPBS_2_3_16/
[root@fanny src]# cd /usr/local/src/OpenPBS_2_3_16/
[root@fanny OpenPBS_2_3_16]# patch -p1 -b < openpbs-gcc32.patch
patching file buildutils/exclude_script
[root@fanny OpenPBS_2_3_16]# patch -p1 -b < openpbs-errno.patch
patching file src/lib/Liblog/pbs_log.c
patching file src/scheduler.basl/af_resmom.c
[root@fanny OpenPBS_2_3_16]# patch -p1 -b < ncsa_scaling.patch
patching file src/include/acct.h
patching file src/include/cmds.h
patching file src/include/pbs_ifl.h
patching file src/include/qmgr.h
patching file src/include/server_limits.h

The scaling patch changes built-in limits that prevent OpenPBS from
working with larger clusters. The other patches correct problems
resulting from recent changes to the gcc
compiler.[1]

[1] Even with the patches, I found it necessary
to manually edit the file srv_connect.c, adding
the line #include <errno.h> with the other
#include lines in the file. If you have this
problem, you'll know because
make will fail when referencing this file. Just
add the line and remake the file.


As noted, you'll want to keep the installation
directory separate from the source tree, so create a new directory
for PBS. /usr/local/OpenPBS is a likely choice.
Change to this directory and run configure, make, make install, and
make clean from it.

[root@fanny src]# mkdir /usr/local/OpenPBS
[root@fanny src]# cd /usr/local/OpenPBS
[root@fanny OpenPBS]# /usr/local/src/OpenPBS_2_3_16/configure \
> --set-default-server=fanny --enable-docs --with-scp
...
[root@fanny OpenPBS]# cd /usr/local/src/OpenPBS_2_3_16/
[root@fanny OpenPBS-2.3.16]# make
...
[root@fanny OpenPBS-2.3.16]# /usr/local/src/OpenPBS
[root@fanny OpenPBS]# make install
...
[root@fanny OpenPBS]# make clean
...

In this example, the configuration options set
fanny as the server, create the documentation,
and use scp (SSH secure copy program) when
moving files between remote hosts. Normally, you'll
create the documentation only on the server. The Administrator Guide
contains several pages of additional options.

By default, the procedure builds all the software. For the compute
nodes, this really isn't necessary since all you
need is pbs_mom on these
machines. Thus, there are several alternatives that you might want to
consider when setting up the clients. You could just go ahead and
build everything like you did for the server, or you could use
different build options to restrict what is built. For example, the
option --disable-server prevents the
pbs_server daemon
from being built. Or you could build and then install just
pbs_mom and the files it needs. To do this,
change to the MOM subdirectory, in this example
/usr/local/OpenPBS/src/resmom, and run
make install to install just MOM.

[root@ida OpenPBS]# cd /usr/local/OpenPBS/src/resmom
[root@ida resmom]# make install
...

Yet another possibility is to use NFS to mount the appropriate
directories on the client machines. The Administrator Guide outlines
these alternatives but
doesn't provide many details. Whatever your
approach, you'll need pbs_mom
on every compute node.
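
If your compute nodes are identical to the build machine, one rough
shortcut is to push the files out from the server rather than building
on every node. The loop below is only a sketch: it assumes passwordless
SSH as root, hypothetical node names (george, hector, and ida), and
that /usr/spool exists on each node; it copies the daemon and the
working directory tree (created by make install, as described next),
so treat it as a starting point rather than a substitute for running
make install on each node.

[root@fanny OpenPBS]# for node in george hector ida
> do
>   scp /usr/local/sbin/pbs_mom ${node}:/usr/local/sbin/
>   scp -r /usr/spool/PBS ${node}:/usr/spool/
> done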

The make install step will create the
/usr/spool/PBS working directory, and will
install the user commands in /usr/local/bin and
the daemons and administrative commands in
/usr/local/sbin. make clean
removes unneeded files.


11.1.3 Configuring PBS


Before
you can use PBS, you'll need to create or edit the
appropriate configuration files, located in the working directory,
e.g., /usr/spool/PBS, or its subdirectories.
First, the server needs the node file, a file listing the machines it
will communicate with. This file provides the list of nodes used at
startup. (This list can be altered dynamically with the
qmgr command.) In the subdirectory
server_priv, create the file
nodes with the editor of your choice. The nodes
file should have one entry per line with the names of the machines in
your cluster. (This file can contain additional information, but this
is enough to get you started.) If this file does not exist, the
server will know only about itself.
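
For example, on a small cluster the nodes file might look something
like the following sketch. The node names are placeholders; np sets
how many jobs may run concurrently on a node (it defaults to 1), and
any additional words on a line are properties you can later use to
select nodes.

george np=2
hector np=2
ida np=1
james np=1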

MOM will need the configuration file config,
located in the subdirectory mom_priv. At a
minimum, you need an entry to start logging and an entry to identify
the server to MOM. For example, your file might look something like
this:

$logevent 0x1ff
$clienthost fanny

The argument to $logevent is a mask that
determines what is logged. A value of 0x0ff will
log all events excluding debug messages, while a value of
0x1ff will log all events including debug
messages. You'll need this file on every machine.
There are a number of other options, such as creating an access list.
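
As a rough illustration only, a slightly fuller config might look
something like the sketch below. The directive names should be checked
against the Administrator Guide, and the host names are placeholders:
$restricted grants a limited set of hosts query access to MOM, and the
load directives let MOM report when the node is busy.

$logevent 0x0ff
$clienthost fanny
$restricted fanny.wofford.int
$ideal_load 1.0
$max_load 2.0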

Finally, you'll want to create a
default_server file in the working directory
with the fully qualified domain name of the machine running the
server daemon.
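
Assuming the working directory is /usr/spool/PBS, creating this file
is a one-line job; substitute your own server's fully qualified name,
of course.

[root@fanny PBS]# echo "fanny.wofford.int" > /usr/spool/PBS/default_server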

PBS uses ports 15001-15004 by default, so it is essential that your
firewall doesn't block these ports. These can be
changed by editing the /etc/services file. A
full list of services and ports can be found in the

Administrator Guide (along with other
configuration options). If you decide to change ports, it is
essential that you do this consistently across your cluster!
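
For reference, the relevant /etc/services entries would look something
like the sketch below. The service names shown are the usual ones, but
verify them, and the exact port assignments, against the Administrator
Guide for your version.

# PBS daemons and their default ports
pbs          15001/tcp
pbs_mom      15002/tcp
pbs_resmom   15003/tcp
pbs_sched    15004/tcp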

Once you have the configuration files in place, the next step is to
start the appropriate daemons, which must be started as root. The
first time through, you'll want to start these
manually. Once you are convinced that everything is working the way
you want, configure the daemons to start automatically when the
systems boot by adding them to the appropriate startup file, such as
/etc/rc.d/rc.local. All three daemons must be
started on the server, but the pbs_mom is the
only daemon needed on the compute nodes. It is best to start
pbs_mom before you start the
pbs_server so that it can respond to the
server's polling.

Typically, no options are needed for pbs_mom.
The first time (and only the first time) you run
pbs_server, start it with the option -t create.

[root@fanny OpenPBS]# pbs_server -t create

This option is used to create a new server database. Unlike
pbs_mom and pbs_sched,
pbs_server can be configured dynamically after
it has been started.

The options to pbs_sched will depend on your
site's scheduling policies. For the default FIFO
scheduler, no options are required. For a more detailed discussion of
command-line options, see the manpages for each daemon.
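
When you do move the daemons into a startup file, the entries
themselves are short. A minimal sketch for the server's
/etc/rc.d/rc.local might look like the following; compute nodes would
get only the pbs_mom line, and note that -t create is deliberately
omitted, since it is used only for the very first start.

# start the PBS daemons at boot
/usr/local/sbin/pbs_mom
/usr/local/sbin/pbs_server
/usr/local/sbin/pbs_sched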


11.1.4 Managing PBS


We'll
begin by looking at the command-line utilities first since the GUI
may not always be available. Once you have mastered these commands,
using the GUI should be straightforward. From a
manager's perspective, the first command
you'll want to become familiar with is
qmgr, the queue management command.
qmgr is used to create job queues and manage
their properties. It is also used to manage nodes and servers,
providing an interface to the batch system. In this section
we'll look at a few basic examples rather than try
to be exhaustive.

First, identify the pbs_server managers, i.e.,
the users who are allowed to reconfigure the batch system. This is
generally a one-time task. (Keep in mind that not all commands
require administrative privileges. Subcommands such as
list and print can be
executed by all users.) Run the qmgr command as
follows, substituting your username:

[root@fanny OpenPBS]# qmgr
Max open servers: 4
Qmgr: set server managers=sloanjd@fanny.wofford.int
Qmgr: quit

You can specify multiple managers by adding their names to the end of
the command, separated by commas. Once done, you'll
no longer need root privileges to manage PBS.

Your next task will be to create a queue. Let's look
at an example.

[sloanjd@fanny PBS]$ qmgr
Max open servers: 4
Qmgr: create queue workqueue
Qmgr: set queue workqueue queue_type = execution
Qmgr: set queue workqueue resources_max.cput = 24:00:00
Qmgr: set queue workqueue resources_min.cput = 00:00:01
Qmgr: set queue workqueue enabled = true
Qmgr: set queue workqueue started = true
Qmgr: set server scheduling = true
Qmgr: set server default_queue = workqueue
Qmgr: quit

In this example we have created a new queue named
workqueue. We have limited CPU time to between 1
second and 24 hours. The queue has been enabled, started, and set as
the default queue for the server, which must have at least one queue
defined. All queues must have a type, be enabled, and be started.

As you can see from the example, the general form of a
qmgr command line is a command
(active, create,
delete, set,
unset, list, or
print) followed by a target
(server, queue, or
node) followed by an attribute assignment. These
keywords can be abbreviated as long as there is no ambiguity. In the
first example in this section, we set a server attribute. In the
second example, the target for most of the commands was the queue we
were creating.
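
Nodes can be managed the same way, and the keywords can be shortened
where unambiguous. As a small illustration, both of the following
lines add a hypothetical node named james with two virtual processors;
the second is simply the abbreviated form of the first.

Qmgr: create node james np=2
Qmgr: c n james np=2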

To examine the configuration of the server, use the command

Qmgr: print server

This can be used to save the configuration you are using. Use the
command

[root@fanny PBS]# qmgr -c "print server" > server.config

Note that with the -c flag,
qmgr commands can be entered on a single line.
To re-create the queue at a later time, use the command

[root@fanny PBS]# qmgr < server.config

This can save a lot of typing or can be automated if needed. Other
actions are described in the documentation.

Another useful command is
pbsnodes, which lists the status of the nodes on
your cluster.

[sloanjd@amy sloanjd]$ pbsnodes -a
oscarnode1.oscardomain
state = free
np = 1
properties = all
ntype = cluster
oscarnode2.oscardomain
state = free
np = 1
properties = all
ntype = cluster
...

On a large cluster, that can create a lot of output.


11.1.5 Using PBS


From
the user's perspective, the place to start is the
qsub command, which submits jobs. The only
jobs that qsub accepts are scripts, so
you'll need to package your tasks appropriately.
Here is a simple example script:

#!/bin/sh
#PBS -N demo
#PBS -o demo.txt
#PBS -e demo.txt
#PBS -q workq
#PBS -l mem=100mb

mpiexec -machinefile /etc/myhosts -np 4 /home/sloanjd/area/area

The first line specifies the shell to use in interpreting the script,
while the next few lines starting with #PBS are
directives that are passed to PBS. The first names the job, the next
two specify where output and error output go, the next to last
identifies the queue that is used, and the last lists a resource that
will be needed, in this case 100 MB of memory. The blank line signals
the end of PBS directives. Lines that follow the blank line indicate
the actual job.

Once you have created the batch script for your job, the
qsub command is used to submit the job.

[sloanjd@amy area]$ qsub pbsdemo.sh
11.amy

When run, qsub returns the job identifier as
shown. A number of different options are available, both as
command-line arguments to qsub or as directives
that can be included in the script. See the qsub(1B) manpage for more
details.

There are several things you should be aware of when using
qsub. First, as noted, it expects a script.
Next, the target script cannot take any command-line arguments.
Finally, the job is launched on one node. The script must ensure that
any parallel processes are then launched on other nodes as needed.
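
One common way around the no-arguments restriction is to pass values
through environment variables using qsub's -v option and reference
them inside the script. A minimal sketch (NP is just an illustrative
variable name):

[sloanjd@amy area]$ qsub -v NP=4 pbsdemo.sh

Inside the script, the mpiexec line would then use $NP rather than a
hard-coded process count.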

In addition to qsub, there are a number of other
useful commands available to the general user. The commands
qstat and
qdel can be used to manage jobs. In this
example, qstat is used to determine what is on
the queue:

[sloanjd@amy area]$ qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
11.amy pbsdemo sloanjd 0 Q workq
12.amy pbsdemo sloanjd 0 Q workq

qdel is used to delete jobs as shown.

[sloanjd@amy area]$ qdel 11.amy
[sloanjd@amy area]$ qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
12.amy pbsdemo sloanjd 0 Q workq

qstat can be called with the job identifier to
get more information about a particular job or with the
-s option to get more details.

A few of the other useful user commands include the following:

qalter


This is used to modify the attributes of an existing job.


qhold


This is used to place a hold on a job.


qmove


This is used to move a job from one queue to another.


qorder


This is used to change the order of two jobs.


qrun


This is used to force a server to start a job.



If you start with the qsub(1B) manpage, other
available commands are listed in the "See
Also" section.


Figure 11-1. xpbs -admin


Figure 11-2. xpbsmon


11.1.6 PBS's GUI


PBS provides two GUIs for queue
management. The command
xpbs will start a general interface. If you
need to do administrative tasks, you should include the argument
-admin. Figure 11-1 shows the
xpbs GUI with the -admin
option. Without this option, the general appearance is the same, but
a number of buttons are missing. You can terminate a server; start,
stop, enable, or disable a queue; or run or rerun a job. To monitor
nodes in your cluster, you can use the
xpbsmon command, shown for a few machines in
Figure 11-2.


11.1.7 Maui Scheduler


If you need
to go beyond the schedulers supplied with PBS, you should consider
installing Maui. In a sense, Maui picks up where PBS leaves off. It
is an external scheduler; that is, it does not include a
resource manager. Rather, it can be used in conjunction with a
resource manager such as PBS to extend the resource
manager's capabilities. In addition to PBS, Maui
works with a number of other resource managers.

Maui controls how, when, and where jobs will be run and can be
described as a policy engine. When used correctly, it can provide
extremely high system utilization and should be considered for any
large or heavily utilized cluster that needs to optimize throughput.
Maui provides a number of very advanced scheduling options.
Administration is through the master configuration file
maui.cfg and through either a text-based or a
web-based interface.
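
To give a feel for what this looks like, a minimal maui.cfg for a
PBS-managed cluster might contain entries along the following lines.
This is only a sketch: the host name is a placeholder, and the
parameter names and values should be checked against the Maui
documentation.

SERVERHOST        fanny
ADMIN1            root
RMCFG[base]       TYPE=PBS
SERVERPORT        42559
SERVERMODE        NORMAL
RMPOLLINTERVAL    00:00:30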

Maui is installed by default as part of OSCAR and Rocks. For the most
recent version of Maui or for further documentation, you should visit
the Maui web site, http://www.supercluster.org.

