High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI [Electronic resources]

Joseph D. Sloan

1.4 Limitations


While clusters
have a lot to offer, they are not panaceas. There is a limit to how
much adding another computer to a problem will speed up a
calculation. In the ideal situation, you might expect a calculation
to go twice as fast on two computers as it would on one.
Unfortunately, this is the limiting case and you can only approach
it.

Any calculation can be broken into blocks of code or instructions
that can be classified in one of two exclusive ways. Either a block
of code can be parallelized and shared among two or more machines, or
the code is essentially serial and the instructions must be executed
in the order they are written on a single machine. Any code that
can't be parallelized won't benefit
from any additional processors you may have.

There are several reasons why some blocks of code
can't be parallelized and must be executed in a
specific order. The most obvious example is I/O, where the order of
operations is typically determined by the availability, order, and
format of the input and the format of the desired output. If you are
generating a report at the end of a program, you
won't want the characters or lines of output printed
at random.

Another reason some code can't be parallelized comes from the data
dependencies within the code. If you use the value of x to calculate
the value of y, then you'll need to calculate x before you calculate
y. Otherwise, you won't know what value to use in the calculation.
Basically, to be able to parallelize two instructions, neither can
depend on the other. That is, the order in which the two instructions
finish must not matter.
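
To make the idea of a data dependency concrete, here is a minimal
sketch (in Python, chosen only for brevity; nothing in it is specific
to clusters, and the function names are illustrative). The first loop
carries a dependency between iterations and must run in order; the
second has none, so its iterations could be divided among machines.

    # Loop-carried dependency: each result needs the previous one,
    # so the iterations must execute in order on a single machine.
    def prefix_sums(values):
        results = []
        running = 0.0
        for v in values:
            running += v          # depends on the previous iteration
            results.append(running)
        return results

    # Independent iterations: each result depends only on its own input,
    # so the work could be split among any number of machines.
    def squares(values):
        return [v * v for v in values]

    if __name__ == "__main__":
        data = [1.0, 2.0, 3.0, 4.0]
        print(prefix_sums(data))   # [1.0, 3.0, 6.0, 10.0]
        print(squares(data))       # [1.0, 4.0, 9.0, 16.0]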

Thus, any program can be seen as a series of alternating sections:
sections that can be parallelized and effectively run on different
machines, interspersed with sections that must be executed as written
and that can effectively run only on a single
machine. If a program spends most of its time in code that is
essentially serial, parallel processing will have limited value for
this code. In this case, you will be better served with a faster
computer than with parallel computers. If you can't
change the algorithm, big iron is the best approach for this type of
problem.


1.4.1 Amdahl's Law


As just noted, the amount of code that must be executed serially
limits how much of a speedup you can expect from parallel execution.
This idea has been formalized by what is known as Amdahl's Law, named
after Gene Amdahl, who first stated the law in the late sixties. In a
nutshell, Amdahl's Law states that the serial portion of a program
will be the limiting factor in how much you can speed up the
execution of the program using multiple processors.[2]

[2] While Amdahl's Law is the most widely known and most useful
metric for describing parallel performance, there are others. These
include Gustafson-Barsis's and Sun and Ni's Laws, and the Karp-Flatt
and isoefficiency metrics.


An example should help clarify Amdahl's Law.
Let's assume you have a computation that takes 10
hours to complete on a currently available computer and that 90
percent of your code can be parallelized. In other words, you are
spending one hour doing instructions that must be done serially and
nine hours doing instructions that can be done in parallel.
Amdahl's Law states that you'll
never be able to run this code on this class of computers in less
than one hour, regardless of how many additional computers you have
available. To see this, imagine that you had so many computers that
you could execute all the parallel code instantaneously. You would
still have the serial code to execute, which has to be done on a
single computer, and it would still take an hour.[3]

[3] For
those of you who love algebra, the speedup factor is equal to
1/(s + p/N), where s is the
fraction of the code that is inherently serial,
p is the fraction of the code that can be
parallelized, and N is the number of processors
available. Clearly, p + s = 1. As the number of
processors becomes very large, p/N becomes very
small, and the speedup becomes essentially 1/s.
So if s is 0.1, the largest speedup you can
expect is a factor of 10, no matter how many processors you have
available.
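
The formula in the footnote is easy to evaluate directly. The short
Python sketch below (illustrative only, not taken from the book)
computes the speedup 1/(s + p/N) and the resulting run time for the
ten-hour example, with s = 0.1; the total time approaches, but never
drops below, one hour.

    def amdahl_speedup(serial_fraction, processors):
        """Speedup predicted by Amdahl's Law: 1 / (s + p/N)."""
        parallel_fraction = 1.0 - serial_fraction
        return 1.0 / (serial_fraction + parallel_fraction / processors)

    if __name__ == "__main__":
        total_hours = 10.0      # the ten-hour job from the example
        s = 0.1                 # one hour of inherently serial work
        for n in (1, 2, 4, 8, 16, 64, 1024):
            speedup = amdahl_speedup(s, n)
            print(f"{n:5d} processors: speedup {speedup:5.2f}, "
                  f"time {total_hours / speedup:5.2f} hours")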


In practice, you won't have an unlimited number of
processors, so your total time will always be longer. Figure 1-6 shows the amount of time needed for this
example, depending on the number of processors you have available.


Figure 1-6. Execution time vs. number of processors

You should also remember that Amdahl's Law is an
ideal. In practice, there is the issue of the overhead introduced by
parallelizing the code. For example, coordinating communications
among the various processes will require additional code. This adds
to the overall execution time. And if there is contention for the
network, this can stall processes, further slowing the calculation.
In other words, Amdahl's Law is the best speedup you
can hope for, but not the actual speedup you'll see.
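
One way to picture this is to add a hypothetical coordination cost to
the model. The per-processor overhead below is an assumption made up
for illustration (real overhead depends on the network, the message
pattern, and the application); it simply shows that the measured
speedup falls below Amdahl's ideal and can even decline as more
processors are added.

    def speedup_with_overhead(s, n, overhead_per_proc=0.002):
        """Amdahl speedup degraded by an assumed coordination cost
        that grows with the number of processors (illustrative only)."""
        p = 1.0 - s
        return 1.0 / (s + p / n + overhead_per_proc * n)

    if __name__ == "__main__":
        for n in (2, 8, 32, 128, 512):
            print(f"{n:4d} processors: speedup "
                  f"{speedup_with_overhead(0.1, n):5.2f}")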

What can you do if you need to do this calculation in less than one
hour? As I noted earlier, you have three choices when you want to
speed up a calculation: better algorithms, faster computers, or
more computers. If more computers won't take you all
the way, your remaining choices are better algorithms and faster
computers. If you can rework your code so that a larger fraction can
be done in parallel, you'll see an increased benefit
from a parallel approach. Otherwise, you'll need to
dig deep into your pocket for faster computers.

Surprisingly, a fair amount of controversy still surrounds what
should be obvious once you think about it. This stems in large part
from the misapplication of Amdahl's Law over the
years. For example, Amdahl's Law has been misused as
an argument favoring faster computers over parallel computing.

The most common misuse is based on the assumption that the amount of
speedup is independent of the size of the problem.
Amdahl's Law simply does not address how problems
scale. The fraction of the code that must be executed serially
usually changes as the size of the problem changes. So, it is a
mistake to assume that a problem's speedup factor
will be the same when the scale of the problem changes. For instance,
if you double the length of a simulation, you may find that the
serial portions of the simulation, such as the initialization and
report phases, are basically unchanged, while the parallelizable
portion of the code is what doubles. Hence, the fraction of the time
spent in the serial code will decrease and Amdahl's
Law will specify a greater speedup. This is good news! After all,
it's when problems get bigger that we most need the
speedup. For most problems, the speedup factor will depend upon the
problem size. As the problem size changes, so does the speedup
factor. The amount will depend on the nature of the individual
problem, but typically, the speedup will increase as the size of the
problem increases. As the problem size grows, it is not unusual to
see a linear increase in the amount of time spent in the serial
portion of the code and a quadratic increase in the amount of time
spent in the parallelizable portion of the code. Unfortunately, if
you only apply Amdahl's Law to the smaller problem
size, you'll underestimate the benefit of a parallel
approach.
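
As a rough illustration of this scaling argument, suppose (purely
hypothetically) that the serial setup and reporting work grows
linearly with the problem size while the parallelizable work grows
quadratically, as the text suggests is common. The serial fraction s
then shrinks as the problem grows, and the Amdahl limit of 1/s rises
with it. The coefficients below are arbitrary.

    def serial_fraction(problem_size, serial_coeff=1.0, parallel_coeff=0.1):
        """Hypothetical model: serial work grows linearly with problem
        size, parallelizable work grows quadratically."""
        serial = serial_coeff * problem_size
        parallel = parallel_coeff * problem_size ** 2
        return serial / (serial + parallel)

    if __name__ == "__main__":
        for size in (10, 100, 1000):
            s = serial_fraction(size)
            print(f"size {size:5d}: serial fraction {s:.3f}, "
                  f"maximum speedup {1.0 / s:.1f}")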

Having said this, it is important to remember that
Amdahl's Law does clearly state a limitation of
parallel computing. But this limitation varies not only from problem
to problem, but with the size of the problem as well.

One last word about the limitations of clusters: the limitations
are often tied to a particular approach. It is often possible to mix
approaches and avoid limitations. For example, in constructing your
clusters, you'll want to use the best computers you
can afford. This will lessen the impact of inherently serial code.
And don't forget to look at your
algorithms!

