Beowulf Cluster Computing with Linux, Second Edition

William Gropp; Ewing Lusk; Thomas Sterling


17.4 Configuring PBS

Now that PBS has been installed, the Server and MOMs can be configured and the scheduling policy selected. Note that further configuration of PBS may not be required since PBS Pro comes preconfigured, and the default configuration may completely meet your needs. However, you are advised to read this section to determine whether the defaults are indeed complete for you or whether any of the optional settings may apply.


17.4.1 Network Addresses and PBS


PBS makes use of fully qualified host names for identifying jobs and their location. A PBS installation is known by the host name on which the Server is running. The name used by the daemons, and used to authenticate messages, is the canonical host name. This name is taken from the primary name field, h_name, in the structure returned by the library call gethostbyaddr(). According to the IETF RFCs, this name must be fully qualified and consistent for any IP address assigned to that host.
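As a quick sanity check (a hedged sketch; the exact commands available depend on your Linux distribution, and the IP address shown is only a placeholder), you can verify that name resolution on a node yields a consistent, fully qualified canonical name:

% # Name this host believes it has, fully qualified
% hostname --fqdn
% # Resolve one of the host's addresses back to a name; the canonical
% # (first) name returned should match the name printed above.
% getent hosts 192.168.1.10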


17.4.2 The Qmgr Command


The PBS manager command, qmgr, provides a command-line administrator interface. The command reads directives from standard input. The syntax of each directive is checked, and the appropriate request is sent to the Server(s). A qmgr directive takes one of the following forms:


command server [names] [attr OP value[,...]]
command queue [names] [attr OP value[,...]]
command node [names] [attr OP value[,...]]

where command is the command to perform on an object. The qmgr commands are listed in Table 17.4.

Table 17.4: qmgr commands.

Command     Explanation
active      Set the active objects.
create      Create a new object; applies to queues and nodes.
delete      Destroy an existing object (queues or nodes).
set         Define or alter attribute values of the object.
unset       Clear the value of attributes of the object.
list        List the current attributes and values of the object.
print       Print all the queue and server attributes.


The list and print subcommands of qmgr can be executed by the general user. Creating or deleting a queue requires PBS Manager privilege. Setting or unsetting server or queue attributes requires PBS Operator or Manager privilege.

Here are several examples that illustrate using the qmgr command. These and other qmgr commands are fully explained below, along with the specific tasks they accomplish.


% qmgr
Qmgr: create node mars np=2,ntype=cluster
Qmgr: create node venus properties="inner,moonless"
Qmgr: set node mars properties = inner
Qmgr: set node mars properties += haslife
Qmgr: delete node mars
Qmgr: d n venus

Commands can be abbreviated to their minimum unambiguous form (as shown in the last line of the example above). A command is terminated by a newline character or a semicolon. Multiple commands may be entered on a single line. A command may extend across lines by escaping the newline character with a backslash. Comments begin with a hash sign ("#") and continue to the end of the line. Comments and blank lines are ignored by qmgr. See the qmgr section of the PBS Administrator Guide for a detailed usage and syntax description.
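The following sketch illustrates these syntax rules by feeding a small script to qmgr on standard input. It is only a hypothetical example: the node name and attribute values are placeholders, although the create node and set node directives themselves follow the forms described above.

% qmgr <<'EOF'
# Comments and blank lines are ignored by qmgr.
c n phobos np=4,ntype=cluster
# Two commands on one line, separated by a semicolon:
set node phobos properties = outer; set node phobos comment = "test node"
# A command continued across lines with a backslash:
set node phobos \
    properties += haslife
EOF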


17.4.3 Nodes


Where jobs will be run is determined by an interaction between the Scheduler and the Server. This interaction is affected by the contents of the PBS 'nodes' file and the system configuration onto which you are deploying PBS. Without this list of nodes, the Server will not establish a communication stream with the MOM(s), and MOM will be unable to report information about running jobs or to notify the Server when jobs complete. In a cluster configuration, distributing jobs across the various hosts is a matter of the Scheduler determining on which host to place a selected job.

Regardless of the type of execution nodes, each node must be defined to the Server in the PBS nodes file (the default location of which is '/usr/spool/PBS/server_priv/nodes'). This is a simple text file with the specification of a single node per line. The format of each line in the file is

node_name[:ts] [attributes]

The node name is the network name of the node (host name); it does not have to be fully qualified (in fact, it is best kept as short as possible). The optional ":ts" appended to the name indicates that the node is a timeshared node, that is, a node on which multiple jobs may be run if the required resources are available.

Nodes can have attributes associated with them. Attributes come in three types: properties, name=value pairs, and name.resource=value pairs. Zero or more properties may be specified. A property is nothing more than a string of alphanumeric characters (the first character must be alphabetic) without meaning to PBS. Properties are used to group classes of nodes for allocation to a series of jobs.
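For example (a hypothetical sketch using the classic PBS node-specification syntax; the script name is a placeholder, and you should check the qsub documentation for your PBS version), a job can request nodes by property so that it runs only on hosts tagged with that property:

% qsub -l nodes=2:inner myjob.sh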

Any legal node name=value pair may be specified in the nodes file in the same format as on a qsub directive: attribute.resource=value. Consider the following example:

NodeA resources_available.ncpus=3 max_running=1

The expression np=N may be used as shorthand for the expression

resources_available.ncpus=N

which can be added to declare the number of virtual processors (VPs) on the node. This syntax specifies a numeric string, for example, np=4. It allows the node to be allocated up to N times to one job or to more than one job. If np=N is not specified for a cluster node, it is assumed to have one VP.

You may edit the nodes list in one of two ways. If the Server is not running, you may directly edit the nodes file with a text editor. If the Server is running, you should use qmgr to edit the list of nodes.

Items on a line must be separated by white space. The items may be listed in any order, except that the host name must always be first. Comment lines may be included if the first nonwhite-space character is the hash sign ("#").

The following is an example of a possible nodes file for a cluster called "planets":


# The first set of nodes are cluster nodes.
# Note that the properties are provided to
# logically group certain nodes together.
# The last node is a timeshared node.
#
mercury inner moonless
venus inner moonless np=1
earth inner np=1
mars inner np=2
jupiter outer np=18
saturn outer np=16
uranus outer np=14
neptune outer np=12
pluto:ts


17.4.4 Creating or Adding Nodes


After pbs_server is started, the node list may be entered or altered via the qmgr command:

create node node_name [attribute=value]

where the attributes and their associated possible values are shown in Table 17.5.

Table 17.5: PBS node attributes.

Attribute                         Value
state                             free, down, offline
properties                        any alphanumeric string
ntype                             cluster, time-shared
resources_available.ncpus (np)    number of virtual processors > 0
resources_available               list of resources available on node
resources_assigned                list of resources in use on node
max_running                       maximum number of running jobs
max_user_run                      maximum number of running jobs per user
max_group_run                     maximum number of running jobs per group
queue                             queue name (if any) associated with node
reservations                      list of reservations pending on the node
comment                           general comment



Below are several examples of setting node attributes via qmgr:


% qmgr
Qmgr: create node mars np=2,ntype=cluster
Qmgr: create node venus properties="inner,moonless"

Once a node has been created, its attributes and/or properties can be modified by using the following qmgr syntax:

set node node_name [attribute[+|-]=value]

where the attributes are the same as for create, for example:


% qmgr
Qmgr: set node mars properties=inner
Qmgr: set node mars properties+=haslife


Nodes can be deleted via qmgr as well, using the delete node syntax, as the following example shows:


% qmgr
Qmgr: delete node mars
Qmgr: delete node pluto

Note that the busy state is set by the execution daemon, pbs_mom, when a load-average threshold is reached on the node (see max_load in MOM's config file). The job-exclusive and job-sharing states are set when jobs are running on the node.


17.4.5 Default Configuration


Server management consists of configuring the Server and establishing queues and their attributes. The default configuration, shown below, sets the minimum server settings and some recommended settings for a typical PBS cluster.


% qmgr
Qmgr: print server
# Create queues and set their attributes
#
# Create and define queue workq
#
create queue workq
set queue workq queue_type = Execution
set queue workq enabled = True
set queue workq started = True
#
# Set Server attributes
#
set server scheduling = True
set server default_queue = workq
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
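
If the defaults are not sufficient, additional queues and site-wide defaults can be layered on with further qmgr directives. The example below is only a hedged sketch: the queue name, limits, and resource values are hypothetical, although queue_type, priority, enabled, started, resources_max, and resources_default are standard PBS queue and server attributes.

% qmgr
Qmgr: create queue longq queue_type=execution
Qmgr: set queue longq resources_max.walltime = 72:00:00
Qmgr: set queue longq priority = 50
Qmgr: set queue longq enabled = true, started = true
Qmgr: set server resources_default.walltime = 01:00:00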


17.4.6 Configuring MOM


The execution server daemons, MOMs, require much less configuration than does the Server. The installation process creates a basic MOM configuration file that contains the minimum entries necessary in order to run PBS jobs. This section describes the MOM configuration file and explains all the options available to customize the PBS installation to your site.

The behavior of MOM is controlled via a configuration file that is read upon daemon initialization (startup) and upon reinitialization (when pbs_mom receives a SIGHUP signal). The configuration file provides several types of runtime information to MOM: access control, static resource names and values, external resources provided by a program to be run on request via a shell escape, and values to pass to internal functions at initialization (and reinitialization). Each configuration entry is on a single line, with the component parts separated by white space. If the line starts with a hash sign ("#"), the line is considered to be a comment and is ignored.

A minimal MOM configuration file should contain the following:


$logevent 0x1ff
$clienthost server-hostname

The first entry, $logevent, specifies the level of message logging this daemon should perform. The second entry, $clienthost, identifies a host that is permitted to connect to this MOM. You should set the server-hostname variable to the name of the host on which you will be running the PBS Server (pbs_server). Advanced MOM configuration options are described in the PBS Administrator Guide.


17.4.7 Scheduler Configuration


Now that the Server and MOMs have been configured, we turn our attention to the PBS Scheduler. As mentioned previously, the Scheduler is responsible for implementing the local site policy regarding which jobs are run and on what resources. This section discusses the recommended configuration for a typical cluster. The full list of tunable Scheduler parameters and detailed explanation of each is provided in the PBS Administrator Guide.

The PBS Pro Scheduler provides a wide range of scheduling policies. It can sort jobs in dozens of different ways, including FIFO order, and it can also sort on user and group priority. The queues are sorted by queue priority to determine the order in which they are to be considered. As distributed, the Scheduler is configured with the defaults shown in Table 17.6.

Table 17.6: Default scheduling policy parameters.

Option                Default Value
round_robin           False
by_queue              True
strict_fifo           False
load_balancing        False
load_balancing_rr     False
fair_share            False
help_starving_jobs    True
backfill              True
backfill_prime        False
sort_queues           True
sort_by               shortest_job_first
smp_cluster_dist      pack
preemptive_sched      True
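
These defaults live in the Scheduler's configuration file, commonly found under the PBS home directory in sched_priv/sched_config. The sketch below shows how a policy change might be applied; it is a hedged example only, since the file path, the exact line format, and whether your pbs_sched rereads its configuration on SIGHUP (and records its PID in sched_priv/sched.lock) all depend on your installation and should be checked against the PBS Administrator Guide.

# Edit the Scheduler configuration file (path may differ on your system)
vi /usr/spool/PBS/sched_priv/sched_config
#   e.g., change the sort_by line to read something like
#   sort_by: longest_job_first ALL
# Then have the Scheduler reread its configuration, either by restarting
# pbs_sched or (if your version supports it) by sending it a SIGHUP:
kill -HUP `cat /usr/spool/PBS/sched_priv/sched.lock`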







Once the Server and Scheduler are configured and running, job scheduling can be initiated by setting the Server attribute scheduling to a value of true:


# qmgr -c "set server scheduling=true"


The value of scheduling is retained across Server terminations or starts. After the Server is configured, it may be placed into service.
