High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI

Joseph D. Sloan

15.4 Other Considerations


The issues we have examined up to this point are fairly generic.
There are other programming-specific issues that may need to be
addressed as well. In this section, we will look very briefly at two
of the more common of these: parallel I/O and random numbers.
These are both programming tasks that can cause particular problems
with parallel programs. You'll need to take care
whenever your programs use either of these. In some instances,
dealing with these issues may drive program design.


15.4.1 Parallel I/O


Large,
computationally expensive problems that require clusters often
involve large data sets. Since I/O is always much more costly than
computing, dealing with large data sets can severely diminish
performance and must be addressed.

There are several things you can do
to improve I/O performance even before you start programming. First,
you should buy adequate I/O hardware. If your cluster will be used
for I/O-intensive tasks, you need to pay particular attention when
setting up your cluster to ensure you are using fast disks and
adequate memory. Next, use a fast filesystem. While NFS may be an
easy way to get started with clusters, it is very slow. Other
parallel filesystems optimized for parallel performance should be
considered, such as PVFS, which is described in Chapter 12.

When programming, if memory isn't a problem, it is
generally better to make a few large requests rather than a larger
number of smaller requests. Design your programs so that I/O is
distributed across your processes. Because of historical limitations
in parallel I/O systems, it is typical for parallel programs to do
I/O from a single process. Ideally, you should use an interface, such
as MPI-IO, that spreads I/O across the cluster and has been optimized
for parallel I/O.

The standard Unix or POSIX filesystem interface for I/O provides
relatively poor performance when used in a parallel context, since it
does not support collective operations and does not provide
noncontiguous access to files. While the original MPI specification
avoided the complexities of I/O, the MPI-2 specification dealt with
this issue. This part of the standard (Chapter 9 of the MPI-2
specification) is often known as MPI-IO. It was the joint work of the
Scalable I/O Initiative and the MPI-IO Committee through the MPI Forum.

ROMIO, from Argonne
National Laboratory, is a freely available, portable,
high-performance implementation of the MPI-IO standard that runs on a
number of different architectures. It is included with MPICH and
LAM/MPI and provides interfaces for both C and FORTRAN.

MPI-IO provides three types of data access mechanisms: using an
explicit offset, using individual file pointers, or using shared file
pointers. It also provides support for several different data
representations.
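
As a rough illustration (not taken from the book's example), the fragment
below sketches the same read done each of the three ways; fh, buf, count,
and offset are placeholders for an already opened file handle, a buffer,
an element count, and a byte offset.

/* Illustrative only: fh is an open MPI_File; buf, count, and offset
   are placeholders */
MPI_Status status;

/* 1. Explicit offset: the position is given in the call itself */
MPI_File_read_at(fh, offset, buf, count, MPI_INT, &status);

/* 2. Individual file pointer: each process seeks, then reads */
MPI_File_seek(fh, offset, MPI_SEEK_SET);
MPI_File_read(fh, buf, count, MPI_INT, &status);

/* 3. Shared file pointer: all processes advance one common pointer */
MPI_File_read_shared(fh, buf, count, MPI_INT, &status);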


15.4.2 MPI-IO Functions


MPI-IO optimizations include
collective I/O, data sieving, and hints. With collective I/O, larger
chunks of data are read with a single disk access. The data can then
be distributed among the processes as needed. Data sieving is a
technique that combines a number of smaller noncontiguous reads into
one large read. The system selects and returns the sections requested
by the user and discards the rest. While this can improve disk
performance, it can put a considerable strain on memory. Hints
provide a mechanism to inform the filesystems about a
program's data access patterns, e.g., desired
caching policies or striping.

The following code fragment shows how MPI-IO functions might be used:

...
#define BUFFERSIZE 1000
...
int        buffer[BUFFERSIZE];
int        processId;                /* this process's rank */
MPI_File   filehandle;
MPI_Status status;
...
MPI_File_open(MPI_COMM_WORLD, "filename", MPI_MODE_RDONLY, MPI_INFO_NULL,
              &filehandle);
MPI_File_seek(filehandle, processId * BUFFERSIZE * sizeof(int), MPI_SEEK_SET);
MPI_File_read(filehandle, buffer, BUFFERSIZE, MPI_INT, &status);
MPI_File_close(&filehandle);
...

The last four function calls would be executed by each process and,
like all MPI functions, would be sandwiched between calls to
MPI_Init and MPI_Finalize. In
this example, each process opens the file, moves to and reads a block
of data from the file, and then closes it.


15.4.2.1 MPI_File_open

MPI_File_open
is used to open a file. The first argument is the communication
group. Every process in the group will open the file. The second
argument is the file name. The third argument defines the type of
access required to the file. MPI_MODE_RDONLY is
read-only access. Nine different modes are supported including
MPI_MODE_RDWR (reading and writing),
MPI_MODE_WRONLY (write only),
MPI_MODE_CREATE (create if it
doesn't exist),
MPI_MODE_DELETE_ON_CLOSE (delete file when done),
and MPI_MODE_APPEND (set the file pointer at the
end of the file). C users can use the bit-vector OR
(|) to combine these constants. The next to last
argument is used to pass hints to the filesystem. The constant
MPI_INFO_NULL is used when no hint is available.
Using hints does not otherwise change the semantics of the program.
(See the MPI-2 documentation for the rather complex details of using
hints.) The last argument is the file handle (of type
MPI_File), an opaque object used to reference the
file once opened.
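
As an illustration (not part of the chapter's example), here is how you
might combine access modes and pass a hint when creating a file for
writing. The hint name striping_factor is one of the hints ROMIO
recognizes; whether it has any effect depends on the underlying
filesystem.

MPI_Info info;
MPI_File fh;

MPI_Info_create(&info);
/* Suggest striping the file across four I/O devices; the filesystem
   is free to ignore the hint */
MPI_Info_set(info, "striping_factor", "4");

/* Create the file if necessary and open it write-only */
MPI_File_open(MPI_COMM_WORLD, "outfile",
              MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
...
MPI_Info_free(&info);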


15.4.2.2 MPI_File_seek

This
function is used to position the file pointer. It takes three
arguments: the file handle, an offset into the file, and an update
mode. There are three update modes: MPI_SEEK_SET
(set pointer to offset), MPI_SEEK_CUR (set pointer
to current position plus offset), and MPI_SEEK_END
(set pointer to end of file plus offset). In this example we have set
the pointer to the offset. Notice that processId
is used to calculate a different offset into the file for each
process.
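
For instance, to read the final block of the file from the fragment
above, you could position the pointer relative to the end of the file; a
brief sketch, assuming the file holds at least BUFFERSIZE integers:

/* Back up BUFFERSIZE integers from the end of the file, then read them */
MPI_File_seek(filehandle, -(MPI_Offset)(BUFFERSIZE * sizeof(int)),
              MPI_SEEK_END);
MPI_File_read(filehandle, buffer, BUFFERSIZE, MPI_INT, &status);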


15.4.2.3 MPI_File_read

MPI_File_read allows you to read data from the
file specified by the first argument, the file handle. The second
argument specifies the address of the buffer, while the third element
gives the number of elements in the buffer. The fourth element
specifies the type of the data read. The options are the same as with
other MPI functions, such as MPI_Send. In this
example, we are reading BUFFERSIZE integers into
the array at buffer. The last argument is a
structure describing the status of the read operation. For example, the
number of items actually read can be determined from
status with the MPI_Get_count
function.
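
For instance, after the read in the fragment above you could check how
many integers actually arrived:

int count;
/* Determine how many MPI_INT elements the read actually returned */
MPI_Get_count(&status, MPI_INT, &count);
if (count < BUFFERSIZE)
   printf("Short read: got %d of %d integers\n", count, BUFFERSIZE);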


15.4.2.4 MPI_File_close

MPI_File_close
closes the file referenced by the file handle.

The four new functions in this example, along with
MPI_File_write, are the core functions provided by
MPI-IO. However, a large number of other MPI-IO functions are also
available. These are described in detail in the MPI-2 documentation.
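
As one example of those additional functions, here is a sketch (not from
the chapter) in which each process writes its block back using the
collective call MPI_File_write_at_all; the variables are the same ones
used in the read fragment above.

/* Each process writes its BUFFERSIZE integers at its own offset. The
   _all form is collective, so every process in the communicator calls
   it, letting the implementation combine requests into larger accesses. */
MPI_File_open(MPI_COMM_WORLD, "filename", MPI_MODE_CREATE | MPI_MODE_WRONLY,
              MPI_INFO_NULL, &filehandle);
MPI_File_write_at_all(filehandle, processId * BUFFERSIZE * sizeof(int),
                      buffer, BUFFERSIZE, MPI_INT, &status);
MPI_File_close(&filehandle);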


15.4.3 Random Numbers


Generating random (or pseudorandom)
numbers presents a particular problem for parallel programming.
Pseudorandom number generators
typically produce a stream of
"random" numbers where the next
random number depends upon previously generated random numbers in
some highly nonobvious way.[3] While the numbers appear to be random, and are for most
purposes, they are in fact calculated and reproducible provided you
start with the same parameters, i.e., at the same point in the
stream. By varying the starting parameters, it will appear that you
are generating a different stream of random numbers. In fact, you are
just starting at different points on the same stream. The period for
a random number generator is the number of entries in the stream
before the stream starts over again and begins repeating itself. For
good random number generators, the periods are quite large and
shouldn't create any problems for serial programs
using random number generators.

[3] As you can imagine,
coming up with a good generator is very, very tricky.


For parallel programs, however, there are some potential risks. For
example, if you are using a large number of random numbers on a
number of different processors and using the same random number
generator on each, then there is a chance that some of the streams
will overlap. For some applications, such as parallel Monte Carlo
simulations, this is extremely undesirable.

There are several ways around this. One approach is to have a single
process serve as a random number generator and distribute its random
numbers among the remaining processes. Since only a single generator
is used, it is straightforward to ensure that no random number is
used more than once. The disadvantage to this approach is the
communication overhead required to distribute the random numbers.
This can be minimized, somewhat, by distributing blocks of random
numbers, but this complicates programming since each process must now
manage a block of random numbers.
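
A minimal sketch of this approach follows. It is not from the book and
uses the standard C generator rand purely for illustration (the examples
later in this section use SPRNG instead): rank 0 owns the generator and
deals out a block of values to each of the other processes.

/* Sketch only: rank 0 generates all the random numbers and distributes
   blocks. Assumes <stdlib.h> for rand() and the usual MPI setup for
   processId and noProcesses. */
#define BLOCK 1000
double block[BLOCK];
int    i, p;

if (processId == 0)
{  /* generate a fresh block for each worker and send it */
   for (p = 1; p < noProcesses; p++)
   {  for (i = 0; i < BLOCK; i++)
         block[i] = rand() / (RAND_MAX + 1.0);
      MPI_Send(block, BLOCK, MPI_DOUBLE, p, 0, MPI_COMM_WORLD);
   }
}
else
{  /* each worker receives its own block; no value is handed out twice */
   MPI_Recv(block, BLOCK, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
            MPI_STATUS_IGNORE);
}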

An alternative approach is to use the same random number generator in
each process but to use different offsets into the stream. For
example, if you are using 100 processes, process 0 would use the 1st,
101st, 201st, etc., random numbers in the stream. Process 1 would use
the 2nd, 102nd, 202nd, etc., random numbers in the stream, etc. While
this eliminates communication overhead, it adds to the complexity of
the program.
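
Here is a sketch of the offset, or leapfrog, idea, again with rand used
only for illustration: every process runs the same generator from the
same seed, but each keeps only the values that fall in its own slot and
discards the rest.

/* Sketch only: all processes seed identically and step through one
   common stream; process k keeps the k-th value in each group of
   noProcesses values. */
double myValues[1000], x;
int    i, p, n = 1000;

srand(12345);                        /* the same seed on every process */
for (i = 0; i < n; i++)
{  for (p = 0; p < noProcesses; p++)
   {  x = rand() / (RAND_MAX + 1.0); /* advance the shared stream by one */
      if (p == processId)            /* keep only this process's slot */
         myValues[i] = x;
   }
}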

Fortunately, there are libraries of
random number generators designed specifically for use with parallel
programs. One such library is the Scalable Parallel Random Number
Generators (SPRNG) library. This library actually provides six
different state-of-the-art random number generators (Table 15-1).
SPRNG works nicely with MPI.
(You'll need to download and install SPRNG before
you can use it. See Chapter 9 for details.)
Table 15-1. SPRNG's random number generators

Code   Generator
0      Additive Lagged Fibonacci Generator
1      48-bit Linear Congruential Generator with Prime Addend
2      64-bit Linear Congruential Generator with Prime Addend
3      Combined Multiple Recursive Generator
4      Multiplicative Lagged Fibonacci Generator
5      Prime Modulus Linear Congruential Generator
To give you an idea of how to use SPRNG, we'll look
at a simple Monte Carlo simulation that
estimates the value of π (approximately 3.14159, in case you've forgotten).
The way the simulation works is a little like throwing darts.

Imagine throwing darts at a dart board with a circle in the center
like the one in Figure 15-5. Assuming that you are
totally inept, the darts could land anywhere, but ignore those that
miss the board completely. If you count the total number of darts
thrown (total) and if you count those that land in
the circle (in), then for random tosses,
you'd expect the ratio in/total
to be just the ratio of the area of the circle to the square. If the
square is 1 foot on a side, the area of the circle is π/4 square feet.
Using this information, you can estimate π as
4*in/total, i.e., four times the ratio of the area
of the circle to the area of the square.


Figure 15-5. Monte Carlo dartboard

You'll need to throw a lot of darts to get a
reasonable estimate. If you know a lot of inept dart enthusiasts, you
can recruit them. Each one throws darts and keeps track of their
total. If you add the results, you should get a better estimate.

The code that follows uses this technique to estimate the value of π.
Multiple processes are used to provide a larger number of tosses and
a better estimate. The code uses SPRNG to generate separate streams
of random numbers, one for each process.

#include <stdio.h>
#include <math.h>
#include "mpi.h"
#define SIMPLE_SPRNG            /* simple interface */
#define USE_MPI                 /* MPI version of SPRNG */
#include "sprng.h"

int hits(int n);                /* count darts in target */

int main(int argc, char *argv[])
{
   int    in, n, noProcesses, processId, seed, total;
   double pi;

   n = 1000;
   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &noProcesses);
   MPI_Comm_rank(MPI_COMM_WORLD, &processId);

   seed = make_sprng_seed();
   init_sprng(3, seed, SPRNG_DEFAULT);
   print_sprng();

   in = hits(n);
   MPI_Reduce(&in, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

   /* estimate and print pi */
   if (processId == 0)
   {  pi = (4.0 * total) / (n * noProcesses);
      printf("Pi estimate:       %18.16f \n", pi);
      printf("Number of samples: %12d \n", n * noProcesses);
   }
   MPI_Finalize();
   return 0;
}

/* count darts in target */
int hits(int n)
{
   int    i, in = 0;
   double x, y;

   for (i = 0; i < n; i++)
   {  x = sprng();
      y = sprng();
      if (x * x + y * y < 1.0) in++;
   }
   return in;
}

For simplicity, this code considers only the top right-hand corner
(one-fourth) of the board. But since the board is symmetric, the
ratio of the areas is the same. The code simulates randomly throwing
darts by generating a pair of random numbers for the coordinates of
where the dart might land, and then looks to see if they are within
the circle by calculating the distance to the center of the
circle.[4] This is done in the function
hits.

[4] Strictly speaking, it is using the square of
the distance to avoid evaluating a square root for each point, a
costly operation.


Apart from the SPRNG-specific code, everything should look
familiar. For MPI programming, before including the SPRNG header
file, you need to define two macros. In this example, the macro
SIMPLE_SPRNG is used to specify the simple interface,
which should be adequate for most needs. The alternative, or
default, interface provides for multiple streams
per process. The macro
USE_MPI is necessary to let the
init_sprng routine make the necessary MPI calls to
ensure separate streams for each process.

Before generating random numbers, a seed needs to be generated and an
initialization routine called. The routine
make_sprng_seed generates a seed using time and date
information from the system. When used with MPI, it broadcasts the
seed to all the processes. init_sprng initializes
the random number streams. (This call can be omitted if you want to
use the defaults.) The first argument to
init_sprng is an integer from 0 to 5 inclusive,
specifying which of the random number generators to use. Table 15-1 gives the possibilities. The second argument
is the seed, an encoding of the start state for the random number
generator, while the third argument is used to pass additional
parameters required by some generators.

The call to print_sprng, also optional, will
provide information about each of the streams as shown in the output
below. Finally, to generate a random number, a double between 0 and 1,
sprng is called, as seen in the
hits routine.

Here is an example of compiling the code. On this system, SPRNG has
been installed in the directory
/usr/local/src/sprng2.0.

[sloanjd@fanny SPRNG]$ mpicc pi-mpi.c -I/usr/local/src/sprng2.0/include \
> -L/usr/local/src/sprng2.0/lib -lsprng -lm -o pi-mpi

Note the inclusion of path and library information. (Look at the
Makefile file in the
EXAMPLES subdirectory in the installation tree
for more hints on compiling.)

Here is part of the output for this program.

[sloanjd@fanny SPRNG]$ mpirun -np 4 pi-mpi
Combined multiple recursive generator
seed = 88724496, stream_number = 0 parameter = 0
Pi estimate: 3.0930000000000000
Number of samples: 4000
Combined multiple recursive generator
seed = 88724496, stream_number = 2 parameter = 0
...

The output for two of the streams is shown. It is similar for the
other streams. If rounded, the answer is correct to two significant
figures. That's using only 4,000 darts.

The documentation for SPRNG provides a number of the details glossed
over here. And the installation also includes a large number of
detailed examples.

These two examples, I/O and random numbers, should give you an idea
of the types of problems you may encounter when writing parallel
code. Dealing with problem areas like these may be critical when
determining how your programs should be designed. At other times,
performance may hinge on more general issues such as balancing
parallelism with overhead. It all depends on the individual problem
you face. While good design is essential, often you will need to
tweak your design based on empirical measurements. Chapter 17 provides the tools you will need to do
this.

