Beowulf Cluster Computing with Linux, Second Edition [Electronic resources]

William Gropp; Ewing Lusk; Thomas Sterling

8.1 Hello World in MPI


To see what an MPI program looks like, we start with the classic "hello world" program. MPI specifies only the library calls to be used in a C, Fortran, or C++ program; consequently, all of the capabilities of the language are available. The simplest "Hello World" program is shown in Figure 8.1.










#include "mpi.h"
#include <stdio.h>
int main( int argc, char *argv[] )
{
MPI_Init( &argc, &argv );
printf( "Hello World\n" );
MPI_Finalize();
return 0;
}








Figure 8.1: Simple "Hello World" program in MPI.


All MPI programs must contain one call to MPI_Init (or MPI_Init_thread, described in Section 9.9) and one to MPI_Finalize. All other MPI routines[2] must be called after MPI_Init and before MPI_Finalize. C and C++ programs must also include the file 'mpi.h'; Fortran programs must either use the MPI module or include mpif.h.



The simple program in Figure 8.1 is not very interesting. In particular, all processes print the same text. A more interesting version has each process identify itself. This version, shown in Figure 8.2, illustrates several important points. Of particular note are the variables rank and size. Because MPI programs are made up of communicating processes, each process has its own set of variables. In this case, each process has its own address space containing its own variables rank and size (and argc, argv, etc.). The routine MPI_Comm_size returns the number of processes in the MPI job in the second argument. Each of the MPI processes is identified by a number, called the rank, ranging from zero to the value of size minus one. The routine MPI_Comm_rank returns in the second argument the rank of the process. The output of this program might look something like the following:



Hello World from process 0 of 4
Hello World from process 2 of 4
Hello World from process 3 of 4
Hello World from process 1 of 4










#include "mpi.h"
#include <stdio.h>
int main( int argc, char *argv[] )
{
int rank, size;
MPI_Init( &argc, &argv );
MPI_Comm_rank( MPI_COMM_WORLD, &rank );
MPI_Comm_size( MPI_COMM_WORLD, &size );
printf( "Hello World from process %d of %d\n", rank, size );
MPI_Finalize();
return 0;
}








Figure 8.2: A more interesting version of "Hello World".


Note that the output is not ordered from processes 0 to 3. MPI does not specify the behavior of other routines or language statements such as printf; in particular, it does not specify the order of output from print statements. However, there are tools, built using MPI, that can provide ordered output of messages.






8.1.1 Compiling and Running MPI Programs



The MPI standard does not specify how to compile and link programs (neither do C or Fortran). However, most MPI implementations provide tools to compile and link programs.


For example, one popular implementation, MPICH, provides scripts to ensure that the correct include directories are specified and that the correct libraries are linked. The script mpicc can be used just like cc to compile and link C programs. Similarly, the scripts mpif77, mpif90, and mpicxx may be used to compile and link Fortran 77, Fortran, and C++ programs.
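
For instance, assuming the program in Figure 8.2 is saved in a file named helloworld.c (the file name is purely illustrative), it could be compiled and linked with

mpicc -o helloworld helloworld.c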


If you prefer not to use these scripts, you need only ensure that the correct paths and libraries are provided. The MPICH implementation provides the switch -show for mpicc that shows the command lines used with the C compiler and is an easy way to find the paths. Note that the name of the MPI library may be 'libmpich.a', 'libmpi.a', or something similar and that additional libraries, such as 'libsocket.a' or 'libgm.a', may be required. The include path may refer to a specific installation of MPI, such as '/usr/include/local/mpich2-1.0/include'.
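
As a hedged illustration only (the actual paths, library names, and compiler depend entirely on the installation; use mpicc -show to discover the real ones), a manual compile line might resemble

cc -I/usr/include/local/mpich2-1.0/include helloworld.c \
   -L<path to the MPI library> -lmpich -o helloworld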


Running an MPI program (in most implementations) also requires a special program, particularly when parallel programs are started by a batch system as described in Chapter 14. Many implementations provide a program mpirun that can be used to start MPI programs. For example, the command

mpirun -np 4 helloworld

runs the program helloworld using four processes.


The name and command-line arguments of the program that starts MPI programs were not specified by the original MPI standard, just as the C standard does not specify how to start C programs. However, the MPI Forum did recommend, as part of the MPI-2 standard, an mpiexec command and standard command-line arguments to be used in starting MPI programs. A number of MPI implementations, including the all-new version of MPICH, called MPICH2, now provide mpiexec. The name mpiexec was selected because no MPI implementation was using it (many are using mpirun, but with incompatible arguments). The syntax is almost the same as for the MPICH version of mpirun; instead of using -np to specify the number of processes, the switch -n is used:



mpiexec -n 4 helloworld


The MPI standard defines additional switches for mpiexec; for more details, see Section 4.1, "Portable MPI Process Startup," in the MPI-2 standard. For greatest portability, we recommend that the mpiexec form be used; if your preferred implementation does not support mpiexec, point the maintainers to the MPI-2 standard.


Most MPI implementations will attempt to run each process on a different processor, and most provide a way to select particular processors for each MPI process.




8.1.2 Adding Communication to Hello World



The code in Figure 8.2 does not guarantee that the output will be printed in any particular order. To force a particular order for the output, and to illustrate how data is communicated between processes, we add communication to the "Hello World" program. The revised program implements the following algorithm:



Find the name of the processor that is running the process
If the process has rank > 0, then
send the name of the processor to the process with rank 0
Else
print the name of this processor
for each rank,
receive the name of the processor and print it
Endif


This program is shown in Figure 8.3. The new MPI calls are to MPI_Send and MPI_Recv and to MPI_Get_processor_name. The latter is a convenient way to get the name of the processor on which a process is running. MPI_Send and MPI_Recv can be understood by stepping back and considering the two requirements that must be satisfied to communicate data between two processes:





  1. Describe the data to be sent or the location in which to receive the data.





  2. Describe the destination (for a send) or the source (for a receive) of the data.













#include "mpi.h"
#include <stdio.h>
int main( int argc, char *argv[] )
{
int numprocs, myrank, namelen, i;
char processor_name[MPI_MAX_PROCESSOR_NAME];
char greeting[MPI_MAX_PROCESSOR_NAME + 80];
MPI_Status status;
MPI_Init( &argc, &argv );
MPI_Comm_size( MPI_COMM_WORLD, &numprocs );
MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
MPI_Get_processor_name( processor_name, &namelen );
sprintf( greeting, "Hello, world, from process %d of %d on %s",
myrank, numprocs, processor_name );
if ( myrank == 0 ) {
printf( "%s\n", greeting );
for ( i = 1; i < numprocs; i++ ) {
MPI_Recv( greeting, sizeof( greeting ), MPI_CHAR,
i, 1, MPI_COMM_WORLD, &status );
printf( "%s\n", greeting );
}
}
else {
MPI_Send( greeting, strlen( greeting ) + 1, MPI_CHAR,
0, 1, MPI_COMM_WORLD );
}
MPI_Finalize( );
return 0;
}








Figure 8.3: A more complex "Hello World" program in MPI. Only process 0 writes to stdout; each process sends a message to process 0.


In addition, MPI provides a way to tag messages and to discover information about the size and source of the message. We will discuss each of these in turn.


Describing the Data Buffer



A data buffer typically is described by an address and a length, such as "a,100," where a is a pointer to 100 bytes of data. For example, the Unix write call describes the data to be written with an address and length (along with a file descriptor). MPI generalizes this to provide two additional capabilities: describing noncontiguous regions of data and describing data so that it can be communicated between processors with different data representations. To do this, MPI uses three values to describe a data buffer: the address, the (MPI) datatype, and the number or count of the items of that datatype. For example, a buffer a containing four C ints is described by the triple "a, 4, MPI_INT." There are predefined MPI datatypes for all of the basic datatypes defined in C, Fortran, and C++. The most common datatypes are shown in Table 8.1.






















































Table 8.1: The most common MPI datatypes. C and Fortran types on the same row are often but not always the same type. The type MPI_BYTE is used for raw data bytes and does not correspond to any particular datatype. The type MPI_PACKED is used for data that was incrementally packed with the routine MPI_Pack. The C++ MPI datatypes have the same name as the C datatypes but without the MPI_ prefix, for example, MPI::INT.

C type       MPI type        Fortran type        MPI type
int          MPI_INT         INTEGER             MPI_INTEGER
double       MPI_DOUBLE      DOUBLE PRECISION    MPI_DOUBLE_PRECISION
float        MPI_FLOAT       REAL                MPI_REAL
long         MPI_LONG
char         MPI_CHAR        CHARACTER           MPI_CHARACTER
                             LOGICAL             MPI_LOGICAL
             MPI_BYTE                            MPI_BYTE
             MPI_PACKED                          MPI_PACKED

Describing the Destination or Source



The destination or source is specified by using the rank of the process. MPI generalizes the notion of destination and source rank by making the rank relative to a group of processes. This group may be a subset of the original group of processes. Allowing subsets of processes and using relative ranks make it easier to use MPI to write component-oriented software (more on this in Section 9.4). The MPI object that defines a group of processes (and a special communication context that will be discussed in Section 9.4) is called a communicator. Thus, sources and destinations are given by two parameters: a rank and a communicator. The communicator MPI_COMM_WORLD is predefined and contains all of the processes started by mpirun or mpiexec. As a source, the special value MPI_ANY_SOURCE may be used to indicate that the message may be received from any rank of the MPI processes in this MPI program.


Selecting among Messages



The "extra" argument for


MPI_Send is a nonnegative integer tag value. This tag allows a program to send one extra number with the data.


MPI_Recv can use this value either to select which message to receive (by specifying a specific tag value) or to use the tag to convey extra data (by specifying the wild card value


MPI_ANY_TAG ). In the latter case, the tag value of the received message is stored in the status argument (this is the last parameter to


MPI_Recv in the C binding). This is a structure in C, an integer array in Fortran, and a class in C++. The tag and rank of the sending process can be accessed by referring to the appropriate element of


status as shown in Table 8.2.

































Table 8.2: Accessing the source and tag after an MPI_Recv.

C                     Fortran                 C++
status.MPI_SOURCE     status(MPI_SOURCE)      status.Get_source()
status.MPI_TAG        status(MPI_TAG)         status.Get_tag()
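
As a small sketch (illustrative only, reusing the greeting buffer and status variable of Figure 8.3), a receive that accepts any source and any tag and then reads both back out of status might look like

int msg_src, msg_tag;
/* accept a message with any tag from any process */
MPI_Recv( greeting, sizeof( greeting ), MPI_CHAR,
          MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status );
msg_src = status.MPI_SOURCE;   /* rank of the sending process */
msg_tag = status.MPI_TAG;      /* tag attached to the received message */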









Determining the Amount of Data Received



The amount of data received can be found by using the routine MPI_Get_count. For example,

MPI_Get_count( &status, MPI_CHAR, &num_chars );

returns in num_chars the number of characters sent. The second argument should be the same MPI datatype that was used to receive the message. (Since many applications do not need this information, the use of a routine allows the implementation to avoid computing num_chars unless the user needs the value.)
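
As a minimal sketch (reusing the variables of Figure 8.3), the same call could follow the receive in the rank-0 loop to learn how many characters actually arrived:

int num_chars;
MPI_Recv( greeting, sizeof( greeting ), MPI_CHAR,
          i, 1, MPI_COMM_WORLD, &status );
/* count of MPI_CHAR items in the message just received */
MPI_Get_count( &status, MPI_CHAR, &num_chars );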


Our example provides a maximum-sized buffer in the receive. It is also possible to find the amount of memory needed to receive a message by using MPI_Probe, as shown in Figure 8.4.










char *greeting;
int num_chars, src;
MPI_Status status;
...
MPI_Probe( MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &status );
MPI_Get_count( &status, MPI_CHAR, &num_chars );
greeting = (char *)malloc( num_chars );
src      = status.MPI_SOURCE;
MPI_Recv( greeting, num_chars, MPI_CHAR,
          src, 1, MPI_COMM_WORLD, &status );








Figure 8.4: Using MPI_Probe to find the size of a message before receiving it.


MPI guarantees that messages are ordered, that is, that messages sent from one process to another arrive in the same order in which they were sent and that an MPI_Recv after an MPI_Probe will receive the message that the probe returned information on as long as the same message selection criteria (source rank, communicator, and message tag) are used. Note that in this example, the source for the MPI_Recv is specified as status.MPI_SOURCE, not MPI_ANY_SOURCE, to ensure that the message received is the same as the one about which MPI_Probe returned information.


[2] There are a few exceptions, including MPI_Initialized.

