Beowulf Cluster Computing with Linux, Second Edition [Electronic resources]

William Gropp; Ewing Lusk; Thomas Sterling

See [51] for a discussion of some of the issues in making reproducible measurements of performance. That paper describes the methods used in the mpptest program for measuring MPI performance.


9.10.1 mpptest


The mpptest program allows you to measure many aspects of the performance of any MPI implementation. The most common MPI performance test is the Ping-Pong test; this test measures the time it takes to send a message from one process to another and then back. The mpptest program provides Ping-Pong tests for the different MPI communication modes, as well as a variety of tests for collective operations and for more realistic variations on point-to-point communication, such as halo communication (like that in Section 8.3) and communication that does not reuse the same memory locations (and thus does not benefit from data that is already in the memory cache). The mpptest program can also test the performance of some MPI-2 functions, including MPI_Put and MPI_Get.
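To make concrete what a Ping-Pong test measures, here is a minimal sketch of such a test in C. It is not part of mpptest (which handles message sizes, repetition counts, and cache effects far more carefully); the message size and repetition count below are arbitrary illustrative choices. Run it with two processes, for example with mpiexec -n 2.

#include <stdio.h>
#include <string.h>
#include <mpi.h>

/* Minimal Ping-Pong sketch: rank 0 sends a message to rank 1 and waits
   for it to come back; the average round-trip time is reported. */
int main(int argc, char *argv[])
{
    char buf[1024];                   /* message buffer                */
    int len = 1024, reps = 1000;      /* illustrative size and count   */
    int rank, i;
    double t0, t1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 0, sizeof(buf));

    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        }
        else if (rank == 1) {
            MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("average round-trip time for %d bytes: %g seconds\n",
               len, (t1 - t0) / reps);

    MPI_Finalize();
    return 0;
}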


Using mpptest


The mpptest program is distributed with MPICH and MPICH2 in the directory 'examples/perftest'. You can also download it separately from www.mcs.anl.gov/mpi/perftest. Building and using mpptest is very simple:


% tar zxf perftest.tar.gz
% cd perftest-1.2.1
% ./configure --with-mpich
% make
% mpiexec -n 2 ./mpptest -logscale
% mpiexec -n 16 ./mpptest -bisect
% mpiexec -n 2 ./mpptest -auto

To run with LAM/MPI, simply configure with the option --with-lammpi. The 'README' file contains instructions for building with other MPI implementations.


9.10.2 SKaMPI


The SKaMPI test suite [94] is a comprehensive test of MPI performance, covering virtually all of the MPI-1 communication functions.

One interesting feature of the SKaMPI benchmarks is the online tables showing the performance of MPI implementations on various parallel computers, ranging from Beowulf clusters to parallel vector supercomputers.


9.10.3 High Performance LINPACK


Perhaps the best-known benchmark in technical computing is the LINPACK benchmark. The version of this benchmark that is appropriate for clusters is the High Performance LINPACK (HPL). Obtaining and running this benchmark are relatively easy, though getting good performance can require a significant amount of effort. In addition, while the LINPACK benchmark is widely known, it tends to significantly overestimate the achievable performance for many applications because it involves n^3 computation on n^2 data and is thus relatively insensitive to the performance of the node memory system.

The HPL benchmark depends on another library, the basic linear algebra subroutines (BLAS), for much of the computation. Thus, to get good performance on the HPL benchmark, you must have a high-quality implementation of the BLAS. Fortunately, several sources of these routines are available. You can often get implementations of the BLAS from the CPU vendor directly, sometimes at no cost. Another possibility is to use the ATLAS implementation of the BLAS.


ATLAS


ATLAS is available from math-atlas.sourceforge.net. If prebuilt binaries fit your system, you should use those. Note that ATLAS is tuned for specific system characteristics including clock speed and cache sizes; if you have any doubts about whether your configuration matches that of a prebuilt version, you should build ATLAS yourself.

To build ATLAS, first download ATLAS from the Web site and then extract it. This will create an 'ATLAS' directory into which the libraries will be built, so extract it where you want the libraries to reside. A directory on a local disk (such as '/tmp') rather than on an NFS-mounted disk can help speed up the ATLAS build.


% cd /tmp
% tar zxf atlas3.4.1.tgz
% cd ATLAS

Check the 'errata.html' file at math-atlas.sourceforge.net/errata.html for updates. You may need to edit various files (no patches are supplied for ATLAS). Pay particular attention to the items that describe various possible ways that the install step may fail; you may choose to update values such as ATL_nkflop before running ATLAS. Next, have ATLAS configure itself. Select a compiler; note that you should not use the Portland Group compiler here.


% make config CC=gcc

Answer yes to most questions, including threaded and express setup, and accept the suggested architecture name. Next, make ATLAS. Here, we assume that the architecture name was Linux-PIIISSE2:


% make install arch=Linux-PIIISSE2 >&make.log

Note that this is not an "install" in the usual sense; the ATLAS libraries are not copied to '/usr/local/lib' and the like by the install. This step may take as long as several hours, unless ATLAS finds a precomputed set of parameters that fits your machine. ATLAS is also sensitive to variations in runtimes, so try to use a machine that has no other users. Make sure that it is the exact same type of machine as your nodes (e.g., if you have login nodes that are different from your compute nodes, make sure that you run ATLAS on the compute nodes).

At the end of the "make install" step, the BLAS are in 'ATLAS/lib/Linux-PIIISSE2'. You are ready for the next step.
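If you like, you can first verify that the freshly built libraries link and give correct results with a tiny CBLAS call. This is only a sketch; the file name and values are illustrative, and the paths in the compile line are the ones from the ATLAS build in '/tmp' described above (adjust them for your build).

/* dgemm_check.c -- multiply two 2x2 matrices with the ATLAS CBLAS and
   print the result, which should be
       19 22
       43 50
   A possible compile line (adjust paths to your ATLAS build):
       cc dgemm_check.c -I/tmp/ATLAS/include \
          -L/tmp/ATLAS/lib/Linux-PIIISSE2 -lcblas -latlas -lm            */
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    double A[4] = {1, 2, 3, 4};   /* row-major 2x2 matrices */
    double B[4] = {5, 6, 7, 8};
    double C[4] = {0, 0, 0, 0};

    /* C = 1.0 * A * B + 0.0 * C */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);

    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);
    return 0;
}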


HPL


Download and unpack the HPL package from www.netlib.org/benchmark/hpl:


% tar zxf hpl.tgz
% cd hpl

Create a 'Make.<archname>' file in the 'hpl' directory. Consider an archname like Linux_PIII_CBLAS_gm for a Linux system on Pentium III processors, using the C version of the BLAS constructed by ATLAS, and using the gm device from the MPICH implementation of MPI. To create this file, look at the samples in the 'hpl/setup' directory, for example,


% cp setup/Make.Linux_PII_CBLAS_gm Make.Linux_PIII_CBLAS_gm
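The lines you will most likely need to change in this file look roughly like the following. This is only a sketch based on the stock HPL 'Make.<archname>' files (check your copy for the exact variable list), and the library path shown assumes the ATLAS build in '/tmp' described above:

ARCH         = Linux_PIII_CBLAS_gm
LAdir        = /tmp/ATLAS/lib/Linux-PIIISSE2
LAinc        =
LAlib        = $(LAdir)/libcblas.a $(LAdir)/libatlas.a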

Edit this file, changing ARCH to the name you selected (e.g., Linux_PIII_CBLAS_gm), and set LAdir to the location of the ATLAS libraries. Then do the following:


% make arch=<thename>
% cd bin/<thename>
% mpiexec -n 4 ./xhpl

Check the output to make sure that you have the right answer. The file 'HPL.dat' controls the actual test parameters. The version of 'HPL.dat' that comes with the hpl package is appropriate for testing hpl; running hpl for performance requires modifying 'HPL.dat'. The file 'hpl/TUNING' contains some hints on setting the values in this file for performance. Here are a few of the most important (a sketch of the corresponding lines in 'HPL.dat' follows the list):



  1. Change the problem size to a large value. Don't make it too large, however, since the total computational work grows as the cube of the problem size (doubling the problem size increases the amount of work by a factor of eight). Problem sizes of around 5,000–10,000 are reasonable.



  2. Change the block size to a modest size. A block size of around 64 is a good place to start.



  3. Change the processor decomposition and number of nodes to match your configuration. In most cases, you should try to keep the decomposition close to square (e.g., P and Q should be about the same value), with P ≤ Q.



  4. Experiment with different values for RFACT and PFACT. On some systems, these parameters can have a significant effect on performance. For one large cluster, setting both to right was preferable.
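The corresponding entries in 'HPL.dat' are value lines followed by a short description. As a rough sketch only (the values are illustrative, the elided lines follow the stock file shipped with hpl, and you should edit that copy rather than retype it), a tuned file for a 4-process run might contain lines like these:

1            # of problems sizes (N)
8000         Ns
1            # of NBs
64           NBs
1            # of process grids (P x Q)
2            Ps
2            Qs
...
2            PFACTs (0=left, 1=Crout, 2=Right)
...
2            RFACTs (0=left, 1=Crout, 2=Right)

Here the value 2 selects the "right" variant for both PFACT and RFACT, matching the suggestion in item 4.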


