High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI [Electronic resources]

Joseph D. Sloan

13.3 An MPI Solution


Now
that we've seen how to create a serial solution,
let's look at a parallel solution.
We'll look at the solution first in C and then in
FORTRAN and C++.


13.3.1 A C Solution


The reason this area problem is both interesting and commonly used is
that it is very straightforward to subdivide this problem. We can let
different computers calculate the areas for different rectangles.
Along the way, we'll introduce two new functions,
MPI_Send and MPI_Recv, used
to exchange information among processes.

Basically, MPI_Comm_size and
MPI_Comm_rank are used to divide the problem among
processors. MPI_Send is used to send the
intermediate results back to the process with rank 0, which collects
the results with MPI_Recv and prints the final
answer. Here is the program:

#include "mpi.h"
#include <stdio.h>
/* problem parameters */
#define f(x) ((x) * (x))
#define numberRects 50
#define lowerLimit 2.0
#define upperLimit 5.0
int main( int argc, char * argv[ ] )
{
/* MPI variables */
int dest, noProcesses, processId, src, tag;
MPI_Status status;
/* problem variables */
int i;
double area, at, height, lower, width, total, range;
/* MPI setup */
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &noProcesses);
MPI_Comm_rank(MPI_COMM_WORLD, &processId);
/* adjust problem size for subproblem*/
range = (upperLimit - lowerLimit) / noProcesses;
width = range / numberRects;
lower = lowerLimit + range * processId;
/* calculate area for subproblem */
area = 0.0;
for (i = 0; i < numberRects; i++)
{ at = lower + i * width + width / 2.0;
height = f(at);
area = area + width * height;
}
/* collect information and print results */
tag = 0;
if (processId == 0) /* if rank is 0, collect results */
{ total = area;
for (src = 1; src < noProcesses; src++)
{ MPI_Recv(&area, 1, MPI_DOUBLE, src, tag, MPI_COMM_WORLD, &status);
total = total + area;
}
fprintf(stderr, "The area from %f to %f is: %f\n",
lowerLimit, upperLimit, total );
}
else /* all other processes only send */
{ dest = 0;
MPI_Send(&area, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
}
/* finish */
MPI_Finalize( );
return 0;
}

This code is fairly straightforward, and you've
already seen most of it. As before, we begin with the definition of
the problem function and parameters. This is followed by the
declaration of the variables that we'll need, first
for the MPI calls and then for the problem. There are a few MPI
variables whose use will be described shortly. Next comes the section
that sets up MPI, but there is nothing new here.

The first real change comes in the next section where we adjust the
problem size. In this example, we are calculating the area between 2
and 5. Since each process only needs to do part of this calculation,
we need to divide the problem among the processes so that each
process gets a different part (and all the parts are accounted for).
MPI_Comm_size is used to determine the number of
parts the problem will be broken into,
noProcesses. That is, we divide the total range (2
to 5) equally among the processes and adjust the start of the range
for an individual process based on its rank. For example, with four
processes, one process could calculate from 2 to 2.75, one from 2.75
to 3.5, one from 3.5 to 4.25, and one from 4.25 to 5.
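
As a quick check of that arithmetic, here is a minimal sketch (plain
C, no MPI calls, reusing the names from the program above) that
prints the subinterval each of four processes would be assigned:

/* Sketch only: print the subinterval each of four ranks would handle,
   using the same range arithmetic as the MPI program above. */
#include <stdio.h>
int main( void )
{
double lowerLimit = 2.0, upperLimit = 5.0;
int noProcesses = 4, processId;
double range = (upperLimit - lowerLimit) / noProcesses;
for (processId = 0; processId < noProcesses; processId++)
{ double lower = lowerLimit + range * processId;
printf("rank %d: %.2f to %.2f\n", processId, lower, lower + range);
}
return 0;
}

Running it prints 2.00 to 2.75 for rank 0, 2.75 to 3.50 for rank 1,
and so on, matching the division just described.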

In the next section of code, each
process calculates the area for its part of the problem. This code
keeps the number of rectangles fixed in this example rather than
adjusting it to the number of processes. That is, regardless of the
number of processes used, each will use the same number of rectangles
to solve its portion of the problem. Thus, if the number of processes
increases from one run to the next, the answer won't
come back any quicker but, up to a point, the answer should be more
accurate. If your goal is speed, you could easily set the total
number of rectangles for the problem, and then, in each process,
divide that number by the number of processes or use some similar
strategy.
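
For instance, here is a sketch (not the book's code) of how the
"adjust problem size" section might look with that strategy; the name
totalRects, the helper localRects, and the value 5,000 are
assumptions made purely for illustration:

/* Illustrative variant of the "adjust problem size" section: divide
   one fixed total number of rectangles among the processes, so more
   processes means less work for each.  totalRects and localRects are
   hypothetical names that do not appear in the original program. */
#define totalRects 5000                /* assumed total for the interval */
int localRects;                        /* rectangles this process handles */
range = (upperLimit - lowerLimit) / noProcesses;
localRects = totalRects / noProcesses;
width = range / localRects;
lower = lowerLimit + range * processId;
/* the area loop then runs for i = 0 to localRects - 1 */

With this change, doubling the number of processes roughly halves the
work each process does rather than refining the answer.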

Once we have completed this section, we need to collect and combine
all our individual results. This is the new stuff. One process will
act as a collector to which the remaining processes will send their
results. Using the process with rank 0 as the receiver is the logical
choice. The remaining processes act as senders. There is nothing
magical about this choice apart from the fact that there will always
be a process of rank 0. If a different process is selected,
you'll need to ensure that a process with that rank
exists. A fair amount of MPI code development can be done on a single
processor system and then moved to a multiprocessor environment, so
this isn't, as it might seem, a moot point.

The test (processId == 0) determines what will be
done by the collector process and what will be done by all the
remaining processes. The first branch following this test will be
executed by the single process with a rank of 0. The second branch
will be executed by each of the remaining processes. It is just this
sort of test that allows us to write a single program that will
execute correctly on each machine in the cluster with different
processes doing different things.


13.3.2 Transferring Data


The defining
characteristic of message passing is that the transfer of data from
one process to another requires operations to be performed by both
processes. This is handled by MPI_Send and
MPI_Recv. The first branch after this test
contains a loop that will execute once for each of the remaining
nodes in the cluster. At each execution of the body of the loop, the
rank 0 process collects information from one of the other processes.
This is done with the call to MPI_Recv. Each of
the other processes executes the second branch after the test once.
Each process uses the call to MPI_Send to pass its
results back to process 0. For example, for 100 processes, there are
99 calls to MPI_Send and 99 calls to
MPI_Recv. (Process 0 already knows what it
calculated.) Let's look at these two functions more
closely.


13.3.2.1 MPI_Send

MPI_Send
is used to send information from one process to another
process.[2] A call to MPI_Send
must be matched with a corresponding call to
MPI_Recv in the receiving process. Information is
both typed and tagged. Typing is needed to support communications in
a heterogeneous environment. The type information is used to insure
that the necessary conversions to data representation are applied as
data moves among machines in a transparent manner.

[2] Actually, a process can send a message to
itself, but this possibility can get tricky so we'll
ignore it.


The first three arguments to MPI_Send,
collectively, are used to specify the transmitted data. The first
argument gives the address of the data, the second gives the number
of items to be sent, and the third gives the data type. In this
sample code, we are sending the area, a single
double, so we specify
MPI_DOUBLE as the type. In addition to
MPI_DOUBLE, the other possible types are
MPI_BYTE, MPI_CHAR,
MPI_UNSIGNED_CHAR, MPI_SHORT,
MPI_UNSIGNED_SHORT, MPI_INT,
MPI_UNSIGNED, MPI_LONG,
MPI_UNSIGNED_LONG,
MPI_LONG_DOUBLE, MPI_FLOAT, and
MPI_PACKED.

The next argument is the destination. This is just the rank of the
receiver. The destination is followed by a tag. Since MPI provides
buffering, several messages can be outstanding. The tag is used to
distinguish among multiple messages. This is a moot point in this
example. MPI_COMM_WORLD is the default
communicator, which has already been described.
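
Putting those pieces together, the send at the bottom of the program
reads as follows, with each argument annotated:

/* The MPI_Send call from the program above, argument by argument. */
MPI_Send(&area,           /* address of the data to send            */
         1,               /* number of items: one double            */
         MPI_DOUBLE,      /* type of each item                      */
         dest,            /* destination: the rank of the receiver  */
         tag,             /* tag used to distinguish messages       */
         MPI_COMM_WORLD); /* communicator                           */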


13.3.2.2 MPI_Recv

The arguments to MPI_Recv
are similar but include one addition, a status field.
MPI_Status is a type definition for a structure
that holds information about the actual message size, its source, and
its tag. In C, the status variable is a structure composed of three
fields, MPI_SOURCE,
MPI_TAG, and
MPI_ERROR, that contain the source, tag, and
error code, respectively. With MPI_Recv, you can
use a wildcard for either or both the source and the
tag (MPI_ANY_SOURCE and
MPI_ANY_TAG). The status field allows you to
determine the actual source and tag in this situation.
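
As a sketch of how the wildcards and the status structure work
together (this fragment is illustrative and not part of the rectangle
program), a receiver could accept a message from any rank and then
look up who actually sent it:

/* Illustrative fragment: receive one double from any sender with any
   tag, then read the actual source and tag from the status. */
double value;
MPI_Status status;
MPI_Recv(&value, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);
printf("received %f from rank %d (tag %d)\n",
       value, status.MPI_SOURCE, status.MPI_TAG);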

You should be aware that MPI_Send and
MPI_Recv are both blocking calls. For example, if
you try to receive information that hasn't been
sent, your process will block (that is, wait until the message is sent) before it
can continue executing. While this is what you might expect, it can
lead to nasty surprises if your code isn't properly
written since you may have two processes waiting for each other.
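
For example, the following small program (illustrative, not part of
the book's example) hangs when run on two processes because each rank
blocks in MPI_Recv waiting for a message the other rank has not yet
sent:

/* Deadlock illustration: both ranks call MPI_Recv first, so neither
   ever reaches its MPI_Send and the run hangs. */
#include "mpi.h"
#include <stdio.h>
int main( int argc, char * argv[ ] )
{
int rank, tag = 0;
double mine = 1.0, other;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0)
{ MPI_Recv(&other, 1, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD, &status);
MPI_Send(&mine, 1, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
}
else if (rank == 1)
{ MPI_Recv(&other, 1, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status);
MPI_Send(&mine, 1, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD);
}
MPI_Finalize( );
return 0;
}

Reversing the order of the send and the receive on one of the two
ranks lets the exchange complete. With that caveat in mind, back to
the rectangle program.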

Here is the output:

[sloanjd@amy sloanjd]$ mpicc mpi-rect.c -o mpi-rect
[sloanjd@amy sloanjd]$ mpirun -np 5 mpi-rect
The area from 2.000000 to 5.000000 is: 38.999964

Of course, all of this assumed you wanted to program in C. The next
two sections provide alternatives to C.


13.3.3 MPI Using FORTRAN


Let's take a look at
the same program written in FORTRAN.

      program main
      include "mpif.h"
      parameter (NORECS = 50, DLIMIT = 2.00, ULIMIT = 5.00)
      integer dest, err, i, noprocs, procid, src, tag
      integer status(MPI_STATUS_SIZE)
      double precision area, at, height, lower, width, total, range
      f(x) = x * x
********** MPI setup **********
      call MPI_INIT(err)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, noprocs, err)
      call MPI_COMM_RANK(MPI_COMM_WORLD, procid, err)
********** adjust problem size for subproblem **********
      range = (ULIMIT - DLIMIT) / noprocs
      width = range / NORECS
      lower = DLIMIT + range * procid
********** calculate area for subproblem **********
      area = 0.0
      do 10 i = 0, NORECS - 1
         at = lower + i * width + width / 2.0
         height = f(at)
         area = area + width * height
10    continue
********** collect information and print results **********
      tag = 0
********** if rank is 0, collect results **********
      if (procid .eq. 0) then
         total = area
         do 20 src = 1, noprocs - 1
            call MPI_RECV(area, 1, MPI_DOUBLE_PRECISION, src, tag,
     +           MPI_COMM_WORLD, status, err)
            total = total + area
20       continue
         print '(1X, A, F5.2, A, F5.2, A, F8.5)', 'The area from ',
     +        DLIMIT, ' to ', ULIMIT, ' is: ', total
      else
********** all other processes only send **********
         dest = 0
         call MPI_SEND(area, 1, MPI_DOUBLE_PRECISION, dest, tag,
     +        MPI_COMM_WORLD, err)
      endif
********** finish **********
      call MPI_FINALIZE(err)
      stop
      end

I'm assuming that, if you are reading this, you
already know FORTRAN and that you have already read the C version of
the code. So this discussion is limited to the differences between
MPI in C and in FORTRAN. As you can see, there
aren't many.


Don't forget to compile this with
mpif77 rather than mpicc.

FORTRAN 77 programs begin with include "mpif.h".
FORTRAN 90 may substitute use mpi if the MPI
implementation supports modules.

In creating the MPI specification, a great deal of effort went into
keeping the C and FORTRAN bindings similar. The biggest difference is
the way error codes are handled. In FORTRAN, an explicit error
parameter is included as the last argument to each function call; it
is set to either MPI_SUCCESS or an
implementation-defined error code.

In C, function arguments tend to be more strongly typed than in
FORTRAN, and you will notice that C tends to use addresses when the
function is returning a value. As you might expect, the parameters to
MPI_Init have changed. Finally, in FORTRAN the status argument is an
integer array (of size MPI_STATUS_SIZE) rather than a structure.

Overall, the differences between C and FORTRAN
aren't that great. You should have little difficulty
translating code from one language to another.


13.3.4 MPI Using C++


Here
is the same code in C++:

#include "mpi.h"
#include <stdio.h>
/* problem parameters */
#define f(x) ((x) * (x))
#define numberRects 50
#define lowerLimit 2.0
#define upperLimit 5.0
int main( int argc, char * argv[ ] )
{
/* MPI variables */
int dest, noProcesses, processId, src, tag;
MPI_Status status;
/* problem variables */
int i;
double area, at, height, lower, width, total, range;
/* MPI setup */
MPI::Init(argc, argv);
noProcesses = MPI::COMM_WORLD.Get_size( );
processId = MPI::COMM_WORLD.Get_rank( );
/* adjust problem size for subproblem*/
range = (upperLimit - lowerLimit) / noProcesses;
width = range / numberRects;
lower = lowerLimit + range * processId;
/* calculate area for subproblem */
area = 0.0;
for (i = 0; i < numberRects; i++)
{ at = lower + i * width + width / 2.0;
height = f(at);
area = area + width * height;
}
/* collect information and print results */
tag = 0;
if (processId == 0) /* if rank is 0, collect results */
{ total = area;
for (src = 1; src < noProcesses; src++)
{ MPI::COMM_WORLD.Recv(&area, 1, MPI::DOUBLE, src, tag);
total = total + area;
}
fprintf (stderr, "The area from %f to %f is: %f\n",
lowerLimit, upperLimit, total );
}
else /* all other processes only send */
{ dest = 0;
MPI::COMM_WORLD.Send(&area, 1, MPI::DOUBLE, dest, tag);
}
/* finish */
MPI::Finalize( );
return 0;
}

If you didn't skip the section on FORTRAN,
you'll notice that there are more differences in
going from C to C++ than in going from C to FORTRAN.


Remember, you'll compile this with
mpiCC, not mpicc.

The C++ bindings were added to MPI as part of the MPI-2 effort.
Rather than try to follow the binding structure used with C and
FORTRAN, the C++ bindings were designed to exploit
MPI's object-oriented structure. Consequently, most
functions become members of C++ classes. For example,
MPI::COMM_WORLD is an instance of the communicator
class. Get_rank and Get_size
are methods in the class. All classes are part of the MPI namespace.

Another difference you'll notice is that
Get_size and Get_rank return
values. Since the usual style of error handling in C++ is throwing
exceptions, which MPI follows, there is no need to return error
codes.

Finally, you'll notice that the type specifications have changed. In
this example, we see MPI::DOUBLE rather than
MPI_DOUBLE, which is consistent with the naming
conventions being adopted here. We won't belabor
this example. By looking at the code, you should have a pretty clear
idea of how the bindings have changed with C++.

Now that we have a working solution, let's look at
some ways it can be improved. Along the way we'll
see two new MPI functions that can make life simpler.

