Beowulf Cluster Computing with Linux, Second Edition [Electronic resources]

William Gropp; Ewing Lusk; Thomas Sterling

17.2 Using PBS

From the user's perspective, a workload management system enables you to make more efficient use of your time by allowing you to specify the tasks you need run on the cluster. The system takes care of running these tasks and returning the results to you. If the cluster is full, then it holds your tasks and runs them when the resources are available.

PBS provides two user interfaces: a command-line interface (CLI) and a graphical user interface (GUI). You can use either to interact with PBS: both interfaces have the same functionality. (The examples below show the command line interface; see the "Using the PBS Graphical User Interface" section below for examples of the GUI.)

Using either interface, you create a batch job that you then submit to PBS. A batch job is a shell script containing the set of commands you want run on the cluster. It also contains directives that specify the resource requirements (such as memory or CPU time) that your job needs. Once you create your PBS job, you can reuse it, if you wish, or you can modify it for subsequent runs. Example job scripts are shown below.

PBS also provides a special kind of batch job called interactive batch. This job is treated just like a regular batch job (it is queued up and must wait for resources to become available before it can run). But once it is started, the user's terminal input and output are connected to the job in what appears to be an rlogin session. It appears that the user is logged into one of the nodes of the cluster, and the resources requested by the job are reserved for that job. Many users find this feature useful for debugging their applications or for computational steering.
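
As a minimal sketch, an interactive batch job is requested with the -I option of the qsub command (qsub itself is described in Section 17.2.2). The wrapper name start_interactive and the default node count and time limit are illustrative assumptions, not part of PBS.

```shell
# Sketch: request an interactive batch session.
# -I asks qsub for an interactive job; -l supplies the resource list.
# start_interactive and its default values are hypothetical.
start_interactive() {
    nodes=${1:-2}
    walltime=${2:-0:30:00}
    qsub -I -l "nodes=${nodes},walltime=${walltime}"
}

# Usage (on a host with PBS installed):
#   start_interactive 4 1:00:00
```

The job still waits in the queue like any other; the terminal is connected only once the requested resources become available.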


17.2.1 Creating a PBS Job


Previously we mentioned that a PBS job is simply a shell script containing the resource requirements of the job and the command(s) to be executed. (However, if you use the PBS graphical interface, you do not have to edit any batch files; instead, the GUI provides a point-and-click interface that creates the batch job script for you.) A sample PBS job might look like the following:


#!/bin/sh
#PBS -l walltime=1:00:00
#PBS -l nodes=4
#PBS -j oe
cd ${HOME}/PBS/trial
mpiexec -n 4 myprogram

This script would then be submitted to PBS using the qsub command.

Let us look at the script for a moment. The first line tells what shell to use to interpret the script. Lines 2-3 are resource directives, specifying arguments to the "resource list" ("-l") option of qsub. Note that all PBS directives begin with #PBS. These lines tell PBS what to do with your job. Any qsub option can also be placed inside the script by using a #PBS directive. However, PBS stops parsing directives with the first blank line encountered.


Returning to our example above, we see a request for one hour of wall-clock time and four nodes. The fourth line is a request for PBS to merge the stdout and stderr file streams of the job into a single file. The last two lines are the commands the user wants executed: change directory to a particular location, then execute an MPI program called 'myprogram'.

This job script could have been created in one of two ways: using a text editor, or using the xpbs graphical interface (see below).
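
Because a job script is plain text, it can also be generated by another script, which is convenient when the same job is rerun with different resource requests. The following is only a sketch: the function name write_job_script, its parameters, and the file name trial.pbs are hypothetical and not part of PBS.

```shell
# Write a PBS job script like the sample above to the file named in $1.
# $2 = wall-clock limit, $3 = number of nodes, $4 = program to run.
# write_job_script is a hypothetical helper, not a PBS command.
write_job_script() {
    # No blank lines are written, so every #PBS directive is parsed.
    cat > "$1" <<EOF
#!/bin/sh
#PBS -l walltime=$2
#PBS -l nodes=$3
#PBS -j oe
cd \${HOME}/PBS/trial
mpiexec -n $3 $4
EOF
}

write_job_script trial.pbs 1:00:00 4 myprogram
cat trial.pbs
```

The backslash in front of ${HOME} keeps the variable unexpanded in the generated file, so it is evaluated when the job runs rather than when the script is written.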


17.2.2 Submitting a PBS Job


The command used to submit a job to PBS is qsub. For example, say you created a file containing your PBS job called 'myscriptfile'. The following example shows how to submit the job to PBS:


% qsub myscriptfile
12322.sol.pbspro.com

The second line in the example is the job identifier returned by the PBS Server. This unique identifier can be used to act on this job in the future (before it completes running). The next section of this chapter discusses using this "job id" in various ways.
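
In a script, the identifier that qsub prints can be captured in a variable for later use. The sketch below assumes the identifier has the form shown in the example, with the sequence number before the first dot; the helper name submit_and_report is hypothetical.

```shell
# Submit a job script and keep the identifier that qsub prints.
# submit_and_report is a hypothetical helper name.
submit_and_report() {
    jobid=$(qsub "$1") || return 1    # e.g. 12322.sol.pbspro.com
    seqnum=${jobid%%.*}               # text before the first dot
    printf 'submitted %s (sequence number %s)\n' "$jobid" "$seqnum"
}

# Usage (on a host with PBS installed):
#   submit_and_report myscriptfile
```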

The qsub command has a number of options that can be specified either on the command line or in the job script itself. Note that any command-line option will override the same option within the script file.
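
For example, the sample script requests one hour of wall-clock time; a longer limit and a job name can be supplied at submission without editing the file. This is a sketch only: the wrapper name submit_with_overrides and the values in the usage line are illustrative.

```shell
# Resubmit an existing job script with a different wall-clock limit and name.
# Options given on the command line take precedence over matching
# #PBS directives inside the script.
# submit_with_overrides is a hypothetical helper name.
submit_with_overrides() {
    script=$1
    walltime=$2
    name=$3
    qsub -l "walltime=${walltime}" -N "$name" "$script"
}

# Usage (on a host with PBS installed):
#   submit_with_overrides myscriptfile 2:00:00 trial2
```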

Table 17.1 lists the most commonly used options to qsub. See the PBS User Guide for the complete list and full description of the options.

Table 17.1: Qsub options.

Option        Purpose
-----------   ------------------------------------------
-l list       List of resources needed by job
-q queue      Queue to submit job to
-N name       Name of job
-S shell      Shell to execute job script
-p priority   Priority of job relative to your other jobs
-a datetime   Delay job until after datetime
-j oe         Join output and error files
-h            Place a hold on job

The "-l resource_list" option is used to specify the resources needed by the job. Table 17.2 lists all the resources available to jobs running on clusters.

Table 17.2: PBS resources.

Resource     Meaning
----------   ---------------------------------------------
arch         System architecture needed by job
cput         CPU time required by all processes in job
file         Maximum disk space required for a single file
mem          Total amount of memory (RAM) required
ncpus        Number of CPUs (processors) required
nice         Requested "nice" (Unix priority) value
nodes        Number and/or type of nodes needed
pcput        Maximum per-process CPU time required
pmem         Maximum per-process memory required
walltime     Total wall-clock time needed
workingset   Total disk space requirements
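
Several resources can be requested in a single -l option by separating them with commas. The helper below is a sketch with a hypothetical name, join_resources; it only builds the list and does not contact PBS. The memory value is illustrative.

```shell
# Join resource requests into one comma-separated list for "qsub -l".
# join_resources is a hypothetical helper, not a PBS command.
# The body runs in a subshell so the IFS change does not leak out.
join_resources() (
    IFS=,                 # "$*" joins arguments with the first character of IFS
    printf '%s\n' "$*"
)

reslist=$(join_resources walltime=1:00:00 nodes=4 mem=512mb)
echo "$reslist"

# Usage (on a host with PBS installed):
#   qsub -l "$reslist" myscriptfile
```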


17.2.3 Getting the Status of a PBS Job


Once the job has been submitted to PBS, you can use either the qstat or xpbs commands to check the job status. If you know the job identifier for your job, you can request the status explicitly. Note that unless you have multiple clusters, you need only specify the sequence number portion of the job identifier:


% qstat 12322
Job id         Name          User    Time Use  S  Queue
-------------  ------------  ------  --------  -  ------
12322.sol      myscriptfile  jjones  00:06:39  R  submit

If you run the qstat command without specifying a job identifier, then you will receive status on all jobs currently queued and running.
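
In scripts it is often useful to extract just the job state from this output. The following sketch assumes the two-line header and the column order of the example above, where the state is the fifth whitespace-separated field; the helper name job_state is hypothetical, and other qstat output formats would need a different field number.

```shell
# Print the one-letter state (the "S" column) for a job.
# Assumes the default header of two lines and the column order shown
# in the qstat example; job_state is a hypothetical helper name.
job_state() {
    qstat "$1" | awk 'NR > 2 { print $5 }'
}

# Usage (on a host with PBS installed):
#   [ "$(job_state 12322)" = R ] && echo "job is running"
```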

Often users wonder why their job is not running. You can query this information from PBS using the "-s" (status) option of qstat, for example,



% qstat -s 12323
Job id         Name          User    Time Use  S  Queue
-------------  ------------  ------  --------  -  ------
12323.sol      myscriptfile  jjones  00:00:00  Q  submit
    Requested number of CPUs not currently available.

A number of options to qstat change what information is displayed. The PBS User Guide gives the complete list.


17.2.4 PBS Command Summary


So far we have seen several of the PBS user commands. Table 17.3 is provided as a quick reference for all the PBS user commands. Details on each can be found in the PBS manual pages and the PBS User Guide.

Table 17.3: PBS user commands.

Command   Purpose
-------   -----------------------------------------
qalter    Alter job(s)
qdel      Delete job(s)
qhold     Hold job(s)
qmsg      Send a message to job(s)
qmove     Move job(s) to another queue
qrls      Release held job(s)
qrerun    Rerun job(s)
qselect   Select a specific subset of jobs
qsig      Send a signal to job(s)
qstat     Show status of job(s)
qsub      Submit job(s)
xpbs      Graphical Interface (GUI) to PBS commands
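
These commands are designed to be combined. As a sketch, the job identifiers printed by qselect can be passed to qdel to remove all of one user's jobs. The helper name delete_my_jobs is hypothetical, and the fallback to id -un when USER is unset is an assumption about the environment.

```shell
# Delete every job owned by the current user.
# qselect -u prints the identifiers of that user's jobs, one per line.
# delete_my_jobs is a hypothetical helper name.
delete_my_jobs() {
    ids=$(qselect -u "${USER:-$(id -un)}")
    if [ -z "$ids" ]; then
        echo "no jobs to delete"
        return 0
    fi
    # Left unquoted on purpose so each identifier becomes its own argument.
    qdel $ids
}

# Usage (on a host with PBS installed):
#   delete_my_jobs
```

The same pattern works with qhold or qrls in place of qdel.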


17.2.5 Using the PBS Graphical User Interface


PBS provides two GUIs: a Tcl/Tk-based GUI called xpbs and an optional Web-based GUI.

The GUI xpbs provides a user-friendly point-and-click interface to the PBS commands. To run xpbs as a regular, nonprivileged user, type



setenv DISPLAY your_workstation_name:0
xpbs


To run xpbs with the additional purpose of terminating PBS Servers, stopping and starting queues, or running or rerunning jobs, type


xpbs -admin

Note that you must be identified as a PBS operator or manager in order for the additional "-admin" functions to take effect.

From this main xpbs window, you can create and submit jobs, monitor jobs, queues, and servers, as well as perform any of the actions that the command line interface permits you to do.

The optional Web-based user interface provides access to all the functionality of xpbs via almost any Web browser. To access it, you simply type the URL of your PBS Server host into your browser. The layout and usage are similar to those of xpbs.



17.2.6 PBS Application Programming Interface


Part of the PBS package is the PBS Interface Library, or IFL. This library provides a means of building new PBS clients. Any PBS service request can be invoked through calls to the interface library. Users may wish to build a PBS job that will check its status itself or submit new jobs, or they may wish to customize the job status display rather than use the qstat command. Administrators may use the interface library to build new control commands.

The IFL provides a user-callable function that corresponds to each PBS client command. There is (approximately) a one-to-one correlation between commands and PBS service requests. Additional routines are provided for network connection management. The user-callable routines are declared in the header file 'pbs_ifl.h'. Users request service of a batch server by calling the appropriate library routine and passing it the required parameters. The parameters correspond to the options and operands on the commands. The user must ensure that the parameters are in the correct syntax. Each function will return zero upon success and a nonzero error code on failure. These error codes are available in the header file 'pbs_error.h'. The library routine will accept the parameters and build the corresponding batch request. This request is then passed to the server communication routine. (The PBS API is fully documented in the PBS External Reference Specification.)
