High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI [Electronic resources] (text version)

Joseph D. Sloan

10.2 Ganglia


With a large cluster, it can be a daunting task just to ensure that
every machine is up and running every day if you try to do it
manually. Fortunately, there are several tools that you can use to
monitor the state of your cluster. In clustering circles, the better
known of these include Ganglia, Clumon, and Performance Co-Pilot
(PCP). While this section will describe Ganglia, you might reasonably
consider any of these.

Ganglia is a real-time performance monitor for clusters and grids. If
you are familiar with MRTG, Ganglia uses the same round-robin
database package that was developed for MRTG. Memory efficient and
robust, Ganglia scales well and has been used with clusters with
hundreds of machines. It is also straightforward to configure for use
with multiple clusters so that a single management station can
monitor all the nodes within multiple clusters. It was developed at
the University of California, Berkeley (UCB), is freely available
(via a BSD license), and has been ported to a number of different
architectures.

Ganglia uses a client-server model and is composed of four parts. The
monitor daemon gmond needs to be installed on every machine in the
cluster. The backend for data collection, the daemon gmetad, and the
web interface frontend are installed on a single management station.
(There is also a Python class for sorting and classifying data from
large clusters.) Data are transmitted using XML and XDR via both TCP
and multicasting.

In addition to these core components, there are two command-line
tools. The cluster status tool gstat provides a way to query gmond,
allowing you to create a status report for your cluster. The metric
tool gmetric lets you monitor host metrics beyond Ganglia's
predefined metrics. For instance, suppose you have a program (and
interface) that measures a computer's temperature on each node.
gmetric can be used to pass that program's reading to gmond as an
additional metric. By running the gmetric command under cron, you
could track computer temperature over time.
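
The temperature example might be sketched as a small shell script.
This is only a sketch under stated assumptions: read_temp is a
hypothetical placeholder for whatever sensor program you have, and the
metric name "temperature" is arbitrary. The script hands the reading
to gmetric when gmetric is installed and otherwise just prints the
command it would have run.

```shell
#!/bin/sh
# Sketch: publish a node temperature as an extra Ganglia metric.

# Hypothetical sensor reader; replace the body with your own program.
read_temp() {
    echo 42
}

# Build the gmetric command line for a given reading.
gmetric_command() {
    printf 'gmetric --name temperature --value %s --type int16 --units Celsius\n' "$1"
}

cmd=$(gmetric_command "$(read_temp)")

if command -v gmetric >/dev/null 2>&1; then
    # gmetric is available: announce the value to gmond.
    $cmd || echo "gmetric failed"
else
    # No Ganglia on this machine: show what would have been run.
    echo "would run: $cmd"
fi
```

Saved under a name of your choosing (say /usr/local/bin/node-temp.sh,
a made-up path), the script could be run every five minutes with a
crontab entry such as: */5 * * * * /usr/local/bin/node-temp.sh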

Finally, Ganglia also provides an execution environment. gexec allows
you to run commands across the cluster transparently and forward
stdin, stdout, and stderr. This discussion will focus on the three
core elements of Ganglia: gmond, gmetad, and the web frontend.


10.2.1 Installing and Using Ganglia


Ganglia can be installed by compiling the sources or using RPM
packages. The installation of the software for the management
station, i.e., the node that collects information from the other
nodes and maintains the database, is somewhat more involved than the
installation on the compute nodes. With large clusters, you may want
to use a machine as a dedicated monitor. For smaller clusters, you
may be able to get by with your head node if it is reasonably
equipped. We'll look at the installation of the management station
first.


10.2.1.1 RRDtool

Before you begin, there are several prerequisites for installing
Ganglia. First, your network and hosts must be multicast enabled.
This typically isn't a problem with most Linux installations. Next,
the management station or stations, i.e., the machines on which
you'll install gmetad and the web frontend, will also need RRDtool,
Perl, and a PHP-enabled web server.[2] (Since you will install only
gmond on your compute nodes, these do not require Apache or
RRDtool.)

[2] It appears that only the include file and library from RRDtool
are needed, but I have not verified this. Perl is required for
RRDtool, not Ganglia.


RRDtool is a round-robin database. As you add
information to the database, the oldest data is dropped from the
database. This allows you to store data in a compact manner that will
not expand endlessly over time. Sources can be downloaded from
http://www.rrdtool.org/. To install it, you'll need to unpack it and
run configure, make, and make install.

[root@fanny src]# gunzip rrdtool-1.0.48.tar.gz
[root@fanny src]# tar -vxf rrdtool-1.0.48.tar
...
[root@fanny src]# cd rrdtool-1.0.48
[root@fanny rrdtool-1.0.48]# ./configure
...
[root@fanny rrdtool-1.0.48]# make
[root@fanny rrdtool-1.0.48]# make install
...

You'll see a lot of output along the way. In this example, I've
unpacked and built it under /usr/local/src. (The Ganglia build shown
later in this section looks for the installed files under
/usr/local/rrdtool-1.0.48.) If you want to install it in a different
directory, you can use the --prefix option to specify the directory
when you run configure. It doesn't really matter where you put it,
but when you build Ganglia you'll need to tell Ganglia where to find
the RRDtool library and include files.
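
To see how the install location and the Ganglia build flags have to
agree, here is a minimal sketch. The prefix /opt/rrdtool is purely an
assumption for illustration, and rrd_flags is a hypothetical helper,
not part of either package. The script only prints the two configure
command lines; it does not run them.

```shell
#!/bin/sh
# Sketch: derive the Ganglia configure flags from the RRDtool prefix.

RRD_PREFIX=/opt/rrdtool    # assumed install location, change to suit

# Print the compiler and linker flags that point at a given prefix.
rrd_flags() {
    printf 'CFLAGS="-I%s/include" CPPFLAGS="-I%s/include" LDFLAGS="-L%s/lib"\n' "$1" "$1" "$1"
}

# Command to use in the RRDtool source directory.
echo "./configure --prefix=$RRD_PREFIX"

# Matching command to use later in the Ganglia source directory.
echo "./configure $(rrd_flags "$RRD_PREFIX") --with-gmetad"
```

Keeping the prefix in one variable makes it harder for the include
and library paths to drift apart from where RRDtool actually lives.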


10.2.1.2 Apache and PHP

Next, check the configuration files for
Apache to ensure the PHP module is loaded. For Red Hat 9.0, the
primary configuration file is httpd.conf and is
located in /etc/httpd/conf/. It, in turn,
includes the configuration files in
/etc/httpd/conf.d/, in particular
php.conf. What you are looking for is a
configuration command that loads the PHP module somewhere in one of
the Apache configuration files. That is, one of the configuration
files should have some lines like the following:

LoadModule php4_module modules/libphp4.so
...
<Files *.php>
SetOutputFilter PHP
SetInputFilter PHP
LimitRequestBody 524288
</Files>

If you used the package system to set up Apache and PHP, this should
have been done for you. Finally, make sure Apache is running.
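
Rather than reading the configuration files by eye, you could search
them. The following is a rough sketch: find_php_module is a
hypothetical helper name, and the paths are the Red Hat 9.0 locations
mentioned above, which may differ on your system.

```shell
#!/bin/sh
# Sketch: look for a LoadModule line that loads a PHP module.

# Print every LoadModule line mentioning php in the files given.
find_php_module() {
    grep -h -i -E '^[[:space:]]*LoadModule[[:space:]]+php' "$@" 2>/dev/null
}

# Paths assumed from the Red Hat 9.0 layout described in the text.
find_php_module /etc/httpd/conf/httpd.conf /etc/httpd/conf.d/*.conf ||
    echo "no PHP LoadModule line found"
```

If the function prints nothing, the PHP module is probably not being
loaded and the web frontend will not work until that is fixed.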


10.2.1.3 Ganglia monitor core

Next, you'll need to
download the appropriate software. Go to http://ganglia.sourceforge.net/.
You'll have a number of choices, including both
source files and RPM files, for both Ganglia and related software.
The Ganglia monitor core contains both gmond and gmetad (although by
default it doesn't install gmetad). Here is an example of using the
monitor core download to install from source files. First, unpack the
software.

[root@fanny src]# gunzip ganglia-monitor-core-2.5.6.tar.gz
[root@fanny src]# tar -xvf ganglia-monitor-core-2.5.6.tar
...

As always, once you have unpacked the software, be sure to read the
README file.

Next, change to the installation directory and build the software.

[root@fanny src]# cd ganglia-monitor-core-2.5.6
[root@fanny ganglia-monitor-core-2.5.6]# ./configure \
> CFLAGS="-I/usr/local/rrdtool-1.0.48/include" \
> CPPFLAGS="-I/usr/local/rrdtool-1.0.48/include" \
> LDFLAGS="-L/usr/local/rrdtool-1.0.48/lib" --with-gmetad
...
[root@fanny ganglia-monitor-core-2.5.6]# make
...
[root@fanny ganglia-monitor-core-2.5.6]# make install
...

As you can see, this is a pretty standard install with a couple of
small exceptions. First, you'll need to tell configure where to find
the RRDtool include file and library by setting the various flags as
shown above. Second, you'll need to explicitly tell configure to
build gmetad. This is done with the --with-gmetad option.

Once you've built the software, you'll need to install and configure
it. Both gmond and gmetad have very simple configuration files. The
sample files gmond/gmond.conf and gmetad/gmetad.conf are included as
part of the source tree. You should copy these to /etc and edit them
before you start either program. The sample files are well documented
and straightforward to edit. Most defaults are reasonable. Strictly
speaking, the gmond.conf file is not necessary if you are happy with
the defaults. However, you will probably want to update the cluster
information at a minimum. The gmetad.conf file must be present and
you'll need to identify at least one data source. You may also want
to change the identity information in it.

For gmetad.conf, the data source entry is a list
of the machines that will be monitored. The format is the identifier
data_source followed by a unique string
identifying the cluster. Next is an optional polling interval.
Finally, there is a list of machines and optional port numbers. Here
is a simple example:

data_source "my cluster" 10.0.32.144 10.0.32.145 10.0.32.146 10.0.32.147

The default polling interval is 15 seconds and the default port is
8649.
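
To illustrate the optional fields, the sketch below writes a
data_source line with an explicit polling interval and one explicit
port to a scratch file, then pulls the cluster name back out. The
30-second interval is an arbitrary choice, and cluster_names is a
hypothetical helper, not a Ganglia tool. Nothing here touches
/etc/gmetad.conf.

```shell
#!/bin/sh
# Sketch: a data_source entry with the optional fields filled in.

conf=$(mktemp)

cat > "$conf" <<'EOF'
# name, polling interval in seconds, then host[:port] entries
data_source "my cluster" 30 10.0.32.144:8649 10.0.32.145
EOF

# List the cluster names defined by data_source lines in a file.
cluster_names() {
    sed -n 's/^data_source[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

cluster_names "$conf"
```

Here the first host is polled on the stated port and the second on
the default one.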

Once you have the configuration files in place and edited to your
satisfaction, copy the initialization files and start the programs.
For gmond, it will look something like this:

[root@fanny ganglia-monitor-core-2.5.6]# cp ./gmond/gmond.init \
> /etc/rc.d/init.d/gmond
[root@fanny ganglia-monitor-core-2.5.6]# chkconfig --add gmond
[root@fanny ganglia-monitor-core-2.5.6]# /etc/rc.d/init.d/gmond start
Starting GANGLIA gmond: [ OK ]

As shown, you'll want to ensure that
gmond is started whenever you reboot.

Before you start gmetad, you'll
want to create a directory for the database.

[root@fanny ganglia-monitor-core-2.5.6]# mkdir -p /var/lib/ganglia/rrds
[root@fanny ganglia-monitor-core-2.5.6]# chown -R nobody \
> /var/lib/ganglia/rrds

Next, copy over the initialization file and start the program.

[root@fanny ganglia-monitor-core-2.5.6]# cp ./gmetad/gmetad.init \
> /etc/rc.d/init.d/gmetad
[root@fanny ganglia-monitor-core-2.5.6]# chkconfig --add gmetad
[root@fanny ganglia-monitor-core-2.5.6]# /etc/rc.d/init.d/gmetad start
Starting GANGLIA gmetad: [ OK ]

Both programs should now be running. You can verify this by trying to
telnet to their respective ports, 8649 for gmond and 8651 for gmetad.
When you do this you should see a couple of messages followed by a
fair amount of XML scrolling by.

[root@fanny ganglia-monitor-core-2.5.6]# telnet localhost 8649
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE GANGLIA_XML [
<!ELEMENT GANGLIA_XML (GRID)*>
...

If you see output such as this, everything is up and running. (Since
you are going to the localhost, this should work
even if your firewall is blocking TELNET.)
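
If you capture that XML in a file, a quick count of the hosts it
reports can serve as a sanity check. This is a rough sketch that
assumes each host appears as a HOST element starting on its own line.
count_hosts is a hypothetical helper, and the sample XML below is
made up for the demonstration rather than copied from real gmond
output.

```shell
#!/bin/sh
# Sketch: count the hosts reported in captured gmond XML.

# Count lines that open a HOST element; reads XML on standard input.
count_hosts() {
    grep -c '<HOST '
}

# Made-up sample standing in for a captured telnet session.
sample='<CLUSTER NAME="my cluster">
<HOST NAME="fanny">
</HOST>
<HOST NAME="george">
</HOST>
</CLUSTER>'

printf '%s\n' "$sample" | count_hosts

# With a real capture saved as gmond.xml (a name chosen here):
#   count_hosts < gmond.xml
```

A count lower than the number of nodes you expect suggests that gmond
is not running, or not being heard, on some of them.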


10.2.1.4 Web frontend

The final
step in setting up the monitoring station is to install the frontend
software. This is just a matter of downloading the appropriate file
and unpacking it. Keep in mind that you must install this so that it
is reachable as part of your website. Examine the
DocumentRoot in your Apache configuration file and
install the package under this directory. For example,

[root@fanny root]# grep DocumentRoot /etc/httpd/conf/httpd.conf
...
DocumentRoot "/var/www/html"
...

Now that you know where the document root is, copy the web frontend
to this directory and unpack it.

[root@fanny root]# cp ganglia-webfrontend-2.5.5.tar.gz /var/www/html/
[root@fanny root]# cd /var/www/html
[root@fanny html]# gunzip ganglia-webfrontend-2.5.5.tar.gz
[root@fanny html]# tar -xvf ganglia-webfrontend-2.5.5.tar

There is nothing to build in this case. The configuration file is
conf.php. Among other things, you can use this
to change the appearance of your web site by changing the display
themes.

At this point, you should be able to examine the state of this
machine. (You'll still need to install gmond on the individual nodes
before you can look at the rest of the cluster.) Start your web
browser and visit your site, e.g.,
http://localhost/ganglia-webfrontend-2.5.5/. You should see something
like Figure 10-1.


Figure 10-1. Ganglia on a single node

This shows the host is up. Next, we need to install gmond on the
individual nodes so we can see the rest of the cluster. You could use
the same technique used above; just skip over the prerequisites and
the gmetad steps. But it is much easier to use RPM. Just download the
package to an appropriate location and install it. For example,

[root@george root]# rpm -vih ganglia-monitor-core-gmond-2.5.6-1.i386.rpm
Preparing... ########################################### [100%]
1:ganglia-monitor-core-gm########################################### [100%]
Starting GANGLIA gmond: [ OK ]

gmond is installed in
/usr/sbin and its configuration file in
/etc. Once you've installed
gmond on a machine, it should appear on your web
page when you click on refresh. Repeat the installation for your
remaining nodes.
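
With more than a handful of nodes, you might generate the per-node
commands in a loop. The sketch below is a dry run that only prints
the commands. It assumes root access to each node over ssh and scp,
and the node names other than george are invented for the example.
install_cmds is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print the copy-and-install commands for each compute node.

NODES="george hector ida"    # hector and ida are made-up names
RPM=ganglia-monitor-core-gmond-2.5.6-1.i386.rpm

# Usage: install_cmds RPMFILE NODE...
install_cmds() {
    rpm_file=$1
    shift
    for node in "$@"; do
        echo "scp $rpm_file root@$node:/tmp/"
        echo "ssh root@$node rpm -vih /tmp/$rpm_file"
    done
}

install_cmds "$RPM" $NODES
```

Once the printed commands look right, the output could be piped to
sh to run them.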

Once you have Ganglia running, you may want to revisit the
configuration files. With Ganglia running, it will be easier to see
exactly what effect a change to a configuration file has. Of course,
if you change a configuration file, you'll need to
restart the appropriate services before you will see anything
different.
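
A restart after a configuration change could look something like the
following sketch. It assumes the init scripts installed earlier
accept a restart argument, as Red Hat init scripts conventionally do.
restart_ganglia is a hypothetical helper. It skips any daemon whose
script is absent, so it does no harm on compute nodes that run only
gmond.

```shell
#!/bin/sh
# Sketch: restart whichever Ganglia daemons are installed here.

# Usage: restart_ganglia [INIT_DIR]
restart_ganglia() {
    init_dir=${1:-/etc/rc.d/init.d}
    for svc in gmond gmetad; do
        script=$init_dir/$svc
        if [ -x "$script" ]; then
            # Assumption: the init script understands "restart".
            "$script" restart
        else
            echo "skipping $svc: $script not found"
        fi
    done
}

restart_ganglia
```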

You should have no difficulty figuring out how to use Ganglia. There
are lots of "hot spots" on the
pages, so just click and see what you get. The first page will tell
you how many machines are up and down and their loads. You can select
a physical view or collect information on individual machines.
Figure 10-2 shows information for an individual machine. You can also
change the metric displayed. However, not all metrics are supported.
The Ganglia documentation supplies a list of supported metrics by
architecture.


Figure 10-2. Ganglia Node View

As you can see, these screen captures were made when the cluster was
not otherwise in use. Had it been busy, the highlighted load figures
would reflect that activity.

