High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI

Joseph D. Sloan

14.2 More on Collective Communication


Unlike
point-to-point communication, collective communication involves every
process in a communication group. In Chapter 13,
you saw two examples of collective communication functions,
MPI_Bcast and MPI_Reduce (along
with MPI_Allreduce). There are two advantages to
collective communication functions. First, they allow you to express
a complex operation using simpler semantics. Second, the
implementation may be able to optimize the operations in ways not
available with simple point-to-point operations.
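
As a reminder of the first advantage, the following sketch (not from the book, but using only MPI_Bcast, which you saw in Chapter 13) distributes a single value with one call; writing the same thing with point-to-point operations would take a loop of MPI_Send calls on the root and a matching MPI_Recv on every other process.

#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
    int processId;
    int value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &processId);
    if (processId == 0) value = 42;    /* only the root knows the value */
    /* one call; every process, root included, ends up with value = 42 */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    fprintf(stderr, "Process %d: value = %d\n", processId, value);
    MPI_Finalize();
    return 0;
}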

Collective
operations fall into three categories: a barrier synchronization
function, global communication or data movement functions (e.g.,
MPI_Bcast), and global reduction or collective
computation functions (e.g., MPI_Reduce). There is
only one barrier synchronization function,
MPI_Barrier. It serves to synchronize the
processes. No data is exchanged. This function is described in Chapter 17. All of the other collective functions are
nonsynchronous. That is, a collective function can return as soon as
its role in the communication process is complete. Unlike
point-to-point operations, nonsynchronous mode is the only mode
supported by collective functions.
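
Although MPI_Barrier is taken up in Chapter 17, a minimal sketch of its use looks like this; no process reaches its second fprintf until every process has executed its first.

#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
    int processId;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &processId);
    fprintf(stderr, "Process %d: before the barrier\n", processId);
    /* no process continues until every process in the communicator
       has made this call; no data is exchanged */
    MPI_Barrier(MPI_COMM_WORLD);
    fprintf(stderr, "Process %d: after the barrier\n", processId);
    MPI_Finalize();
    return 0;
}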

While collective functions don't have to wait for
the corresponding functions to execute on other nodes, they may block
while waiting for space in system buffers. Thus, collective functions
come only in blocking versions.

The requirements that all collective functions be blocking,
nonsynchronous, and support only one communication mode simplify the
semantics of collective operations. There are other features in the
same vein: no tag argument is used, the amount of data sent must
exactly match the amount of data received, and every process must
call the function with the same arguments.


14.2.1 Gather and Scatter


After MPI_Bcast and MPI_Reduce,
the two most useful collective operations are
MPI_Gather and MPI_Scatter.


14.2.1.1 MPI_Gather

MPI_Gather, as the name implies, gathers
information from all the processes in a communication group. It takes
eight arguments. The first three arguments define the information
that is sent. These are the starting address of the send buffer, the
number of items in the buffer, and the data type for the items. The
next three arguments, which define the received information, are the
address of the receive buffer, the number of items for a single
receive, and the data type. The seventh argument is the rank of the
receiving or root process. And the last argument is the communicator.
Keep in mind that each process, including the root process, sends the
contents of its send buffer to the receiver. The root process
receives the messages, which are stored in rank order in the receive
buffer. While this may seem similar to MPI_Reduce,
notice that the data is simply received. It is not combined or
reduced in any way.
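
For reference, the C binding lines up with the eight arguments just described (the argument names here are only illustrative):

int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype,
               void *recvbuf, int recvcount, MPI_Datatype recvtype,
               int root, MPI_Comm comm);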

Figure 14-1 shows the flow of data with a gather
operation. The root or receiver can be any of the processes. Each
process sends a different piece of the data, shown in the figure as
the different x's.


Figure 14-1. Gathering data

Here is an example using MPI_Gather.

#include "mpi.h"
#include <stdio.h>
int main( int argc, char * argv[ ] )
{
int processId;
int a[4] = {0, 0, 0, 0};
int b[4] = {0, 0, 0, 0};
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &processId);
if (processId = = 0) a[0] = 1;
if (processId = = 1) a[1] = 2;
if (processId = = 2) a[2] = 3;
if (processId = = 3) a[3] = 5;
if (processId = = 0)
fprintf(stderr, "Before: b[ ] = [%d, %d, %d, %d]\n", b[0], b[1], b[2],
b[3]);
MPI_Gather(&a[processId], 1, MPI_INT, b, 1, MPI_INT, 0, MPI_COMM_WORLD);
if (processId = = 0)
fprintf(stderr, "After: b[ ] = [%d, %d, %d, %d]\n", b[0], b[1], b[2],
b[3]);
MPI_Finalize( );
return 0;
}

While this is a somewhat contrived example, you can clearly see how
MPI_Gather works. Pay particular attention to the
arguments in this example. Note that both the address of the item
sent and the address of the receive buffer (in this case just an
array name) are used. Here is the output:

[sloanjd@amy C12]$ mpirun -np 4 gath
Before: b[ ] = [0, 0, 0, 0]
After: b[ ] = [1, 2, 3, 5]

MPI_Gather has a couple of useful variants.
MPI_Gatherv has an additional argument, an integer
array giving displacements. MPI_Gatherv is used
when the amount of data varies from process to process.
MPI_Allgather functions just like
MPI_Gather except that all processes receive
copies of the data. There is also an
MPI_Allgatherv.
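
Here is a sketch (not from the book) of how MPI_Gatherv might be used when the amount of data varies, in this case when process i contributes i+1 values. The extra recvcounts and displs arrays tell the root how many items to expect from each process and where to place them in the receive buffer. The array sizes assume the program is run with at most four processes.

#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
    int processId, numProcesses, i;
    int sendbuf[4], recvbuf[16];
    int counts[4], displs[4];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &processId);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcesses);
    /* process i sends i+1 copies of its own rank */
    for (i = 0; i <= processId; i++) sendbuf[i] = processId;
    /* counts[i] and displs[i] describe where process i's data lands;
       only the root actually uses them */
    for (i = 0; i < numProcesses; i++) {
        counts[i] = i + 1;
        displs[i] = (i * (i + 1)) / 2;    /* running total of the counts */
    }
    MPI_Gatherv(sendbuf, processId + 1, MPI_INT,
                recvbuf, counts, displs, MPI_INT, 0, MPI_COMM_WORLD);
    if (processId == 0) {
        for (i = 0; i < (numProcesses * (numProcesses + 1)) / 2; i++)
            fprintf(stderr, "%d ", recvbuf[i]);
        fprintf(stderr, "\n");
    }
    MPI_Finalize();
    return 0;
}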


14.2.1.2 MPI_Scatter

MPI_Scatter is the dual or inverse of
MPI_Gather. If you compare Figure 14-2 to Figure 14-1, the only
difference is the direction of the arrows. The arguments to
MPI_Scatter are the same as
MPI_Gather. You can think of
MPI_Scatter as splitting the send buffer and
sending a piece to each receiver, i.e., each receiver receives a
unique piece of data. MPI_Scatter also has a
vector variant MPI_Scatterv.


Figure 14-2. Scattering data

Here is another contrived example, this time with
MPI_Scatter:

#include "mpi.h"
#include <stdio.h>
int main( int argc, char * argv[ ] )
{
int processId, b;
int a[4] = {0, 0, 0, 0};
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &processId);
if (processId = = 0) { a[0] = 1; a[1] = 2; a[2] = 3; a[3] = 5; }
MPI_Scatter(a, 1, MPI_INT, &b, 1, MPI_INT, 0, MPI_COMM_WORLD);
fprintf(stderr, "Process %d: b = %d \n", processId, b);
MPI_Finalize( );
return 0;
}

Notice that we are sending an array but receiving its individual
elements in this example. In summary, MPI_Gather
sends an element and receives an array while
MPI_Scatter sends an array and receives an
element.

As with point-to-point communication, there are additional collective
operations (for example, MPI_Alltoall,
MPI_Reduce_scatter, and
MPI_Scan). It is even possible to define your own
reduction operator (using MPI_Op_create) for use
with MPI_Reduce, etc. Because the amount and
nature of the data that you need to share varies with the nature of
the problem, it is worth becoming familiar with
MPI's collective functions. You may need them sooner
than you think.
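
As an illustration of that last point, here is a sketch (not from the book) of a user-defined operator created with MPI_Op_create and passed to MPI_Reduce. It keeps, element by element, whichever value has the larger absolute value, something none of the predefined operators provide.

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

/* user-defined combiner: element by element, keep the value with the
   larger absolute value */
void absmax(void *invec, void *inoutvec, int *len, MPI_Datatype *type)
{
    int i;
    int *in = (int *) invec;
    int *inout = (int *) inoutvec;
    for (i = 0; i < *len; i++)
        if (abs(in[i]) > abs(inout[i])) inout[i] = in[i];
}

int main(int argc, char *argv[])
{
    int processId, value, result;
    MPI_Op op;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &processId);
    /* give odd-ranked processes negative values so signs are mixed */
    value = (processId % 2) ? -(processId + 10) : processId;
    MPI_Op_create(absmax, 1, &op);    /* 1 means the operation commutes */
    MPI_Reduce(&value, &result, 1, MPI_INT, op, 0, MPI_COMM_WORLD);
    if (processId == 0)
        fprintf(stderr, "Result: %d\n", result);
    MPI_Op_free(&op);
    MPI_Finalize();
    return 0;
}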

