Recipe 18.15. Summing Numbers with Maximal Accuracy
Credit: Yaroslav Bulatov, Connelly Barnes
Problem
Due to the properties of floating point
arithmetic, the simplest loop to sum a list of numbers (also embodied
in the built-in sum function, as well as similar
functions such as add.reduce in add-on modules
such as Numeric and numarray)
is not maximally accurate. You wish to minimize such numeric
inaccuracy.
Solution
A careful algorithm that progressively accumulates partial sums
(i.e., partials) can reduce the inaccuracy:
import math

def sum_with_partials(arr):
    arr = list(arr)          # work on a copy, so the caller's sequence is untouched
    size = len(arr)
    iters = int(math.ceil(math.log(size) / math.log(2)))
    step = 1
    for itr in xrange(iters):
        # combine elements that are `step' apart, accumulating each
        # partial sum in place in the left element of the pair
        for i in xrange(0, size-step, step+step):
            next_i = i+step
            arr[i] += arr[next_i]
        step += step
    return arr[0]
Discussion
Here is an example of the numeric behavior of this
sum_with_partials function compared to that of the
built-in sum:
if __name__ == '__main__':
    arr = [0.123456789012345]*10000000
    true_answer = arr[0] * len(arr)
    print '"True" result: %r' % true_answer
    sum_result = sum(arr)
    print '"sum" result: %r' % sum_result
    sum_p_resu = sum_with_partials(arr)
    print 'sum_p. result: %r' % sum_p_resu
# emits:
# "True" result: 1234567.89012345
# "sum" result: 1234567.8902233159
# sum_p. result: 1234567.89012345
As you can see, in this case, the built-in sum
accumulated a relative error of almost 10^-10 after summing 10 million
floats all equal to each other (giving less than 11 digits of accuracy in
the result), while sum_with_partials happens to be "perfect" for this case
to within machine precision (15 digits of accuracy). Summing just a million
copies rather than 10 million lowers sum's relative error only to a
bit more than 10^-11.
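To check the relative errors quoted above, you could compute them directly (a small sketch; rel_err is a hypothetical helper, not part of the recipe):

def rel_err(approx, exact):
    # relative error of an approximate result versus the exact one
    return abs(approx - exact) / abs(exact)
print rel_err(sum_result, true_answer)   # roughly 8e-11 for 10 million addends
print rel_err(sum_p_resu, true_answer)   # 0.0 (to machine precision) in this case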
The Trouble with Summations
How come a simple summing loop is less than maximally accurate? The root of the trouble is that summing two floating-point numbers of very different magnitudes loses accuracy. For example, suppose we used decimal floating-point numbers with a precision of four digits: then, summing 1.234 to 123.4 would give 124.6, "losing" 0.034 from the smaller number. Such artefacts are inevitable, as long as we have finite precision during our computations.
Now, imagine we're using a simple loop such as:
total = 0.0
for number in data:
    total += number
to sum a million numbers, all positive and of reasonably similar magnitudes. Built-in sum internally uses exactly this kind of simple loop. By the time we've summed, say, the first 100,000 numbers, the running total has become much larger than each new number we're adding to it. We have thus put ourselves in exactly the situation just shown to be problematic: after a while, we're systematically summing floating-point numbers of very different magnitudes, and thus systematically losing accuracy.
The partials algorithm shown in this recipe works by summing numbers two at a time; therefore, no major loss of accuracy occurs, since we're assuming that the numbers we start with are of reasonably similar magnitudes. So, after the first pass of the partials algorithm, we're left with half as many partials as the number of values we started with. All the partials are of reasonably similar magnitudes, so we just iterate the same procedure: at each step, we keep halving the number of partials that are left, until we're down to just one number, the grand total, having lost along the way as little accuracy as feasible.
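To see the four-digit example above concretely, you could emulate a four-digit precision with the decimal module (a small sketch, not part of the original recipe; decimal is in Python 2.4's standard library):

import decimal
decimal.getcontext().prec = 4                                # emulate 4 significant digits
print decimal.Decimal("123.4") + decimal.Decimal("1.234")    # emits: 124.6

The 0.034 carried by the smaller addend is rounded away, exactly as described above.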
If the argument you pass to sum_with_partials is already a list, and you
don't mind destroying that list as part of
the summation, you can omit from the body of
sum_with_partials the statement:
arr = list(arr)
and recover a little bit of performance. Without this small
enhancement, on one slow laptop, summing a million floats with the
built-in sum takes 360 milliseconds, while the
more accurate function sum_with_partials shown in
this recipe takes 1.8 seconds to perform the same task (a slowdown of
five times). In theory, sum_with_partials should be
asymptotically faster than built-in sum if
you're doing unbounded-precision arithmetic (e.g.,
with Python's built-in longs or
other unbounded-precision data types from such add-ons as
gmpy, which you can download from http://gmpy.sourceforge.net). To sum a list
of n elements with
d digits of precision, in
unbounded-precision exact arithmetic, sum takes
O(n(d + log d))
time, while sum_with_partials takes
O(nd).
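One rough way to probe this asymptotic claim is to time both functions on large Python longs (a sketch only; the sizes chosen here are arbitrary, and results depend heavily on the machine):

import time, random
data = [random.getrandbits(4000) for i in xrange(100000)]   # roughly 1,200-digit longs
for f in (sum, sum_with_partials):
    start = time.clock()
    f(data)
    print f.__name__, 'took', time.clock() - start, 'seconds'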
However, I have not been able to observe that effect in empirical
measurements.
Most of the time, you probably don't want to pay the
price of slowing down a summation by five times in order to increase
your digits of accuracy from 10 or 11 to 15. However, for those
occasions in which this tradeoff is right for your applications, and
you need to sum millions and millions of floating-point numbers, this
recipe might well prove rather useful to you. Another simple way to
increase accuracy, particularly when your input numbers are
not necessarily all of similar magnitude, is to
ensure the small-magnitude ones are summed first. This is
particularly easy to code in Python 2.4, although
it's inevitably
O(n
log n): just
sum(sorted(data, key=abs)).
Finally, if precision is much more important
than speed, consider using decimal.Decimal (which
lets you ask for as much precision as you're willing
to wait for and is part of Python 2.4's standard
library). Or you could use gmpy.mpf (which also
allows any precision you require, may even be faster, but must be
downloaded as part of gmpy from http://gmpy.sourceforge.net).
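For instance, a minimal sketch of the decimal.Decimal approach might look as follows (the 30 digits of precision, and the conversion of each float through repr, are arbitrary choices):

import decimal
decimal.getcontext().prec = 30
data = [0.123456789012345] * 1000
total = sum((decimal.Decimal(repr(x)) for x in data), decimal.Decimal(0))
print total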
See Also
Recipe 18.16 shows how to
use a bounded-precision simulation of floating point to estimate the
accuracy of algorithms;
Douglas M. Priest's Ph.D. thesis On
Properties of Floating Point Arithmetics: Numerical Stability and the
Cost of Accurate Computations covers this entire field
with depth and rigor; gmpy is at http://gmpy.sourceforge.net.