Recipe 18.15. Summing Numbers with Maximal Accuracy
Credit: Yaroslav Bulatov, Connelly Barnes
Problem
Due to the properties of floating point
arithmetic, the simplest loop to sum a list of numbers (also embodied
in the built-in sum function, as well as similar
functions such as add.reduce in add-on modules
such as Numeric and numarray)
is not maximally accurate. You wish to minimize such numeric
inaccuracy.
Solution
A careful algorithm that progressively accumulates partial sums
(i.e., partials) can reduce the inaccuracy:
import math

def sum_with_partials(arr):
    arr = list(arr)          # work on a copy, so the caller's sequence is untouched
    size = len(arr)
    iters = int(math.ceil(math.log(size) / math.log(2)))
    step = 1
    for itr in xrange(iters):
        # combine elements that are `step' apart, accumulating each
        # partial sum in place in the left element of the pair
        for i in xrange(0, size-step, step+step):
            next_i = i+step
            arr[i] += arr[next_i]
        step += step
    return arr[0]
Discussion
Here is an example of the numeric behavior of this
sum_with_partials function compared to that of the
built-in sum:
if __name__ == '__main__':
    arr = [0.123456789012345]*10000000
    true_answer = arr[0] * len(arr)
    print '"True" result: %r' % true_answer
    sum_result = sum(arr)
    print '"sum" result: %r' % sum_result
    sum_p_resu = sum_with_partials(arr)
    print 'sum_p. result: %r' % sum_p_resu
# emits:
# "True" result: 1234567.89012345
# "sum" result: 1234567.8902233159
# sum_p. result: 1234567.89012345
As you can see, in this case, the built-in sum
accumulated a relative error of almost 10^-10 after summing 10 million
floats all equal to each other (giving less than 11 digits of accuracy in
the result), while sum_with_partials happens to be "perfect" for this case
to within machine precision (15 digits of accuracy). Summing just a million
copies rather than 10 million lowers sum's relative error only to a
bit more than 10^-11.
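To check the relative errors quoted above, you could compute them directly (a small sketch; rel_err is a hypothetical helper, not part of the recipe):

def rel_err(approx, exact):
    # relative error of an approximate result versus the exact one
    return abs(approx - exact) / abs(exact)
print rel_err(sum_result, true_answer)   # roughly 8e-11 for 10 million addends
print rel_err(sum_p_resu, true_answer)   # 0.0 (to machine precision) in this case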
The Trouble with Summations
How come a simple summing loop is less than maximally accurate? The root of the trouble is that summing two floating-point numbers of very different magnitudes loses accuracy. For example, suppose we used decimal floating-point numbers with a precision of four digits: then, summing 1.234 to 123.4 would give 124.6, "losing" 0.034 from the smaller number. Such artefacts are inevitable, as long as we have finite precision during our computations.
Now, imagine we're using a simple loop such as:
total = 0.0
for number in data:
    total += number
to sum a million numbers, all positive and of reasonably similar magnitudes. Built-in sum internally uses exactly this kind of simple loop. By the time we've summed, say, the first 100,000 numbers, the running total has become much larger than each new number we're adding to it. We have thus put ourselves in exactly the situation just shown to be problematic: after a while, we're systematically summing floating-point numbers of very different magnitudes, and thus systematically losing accuracy.
The partials algorithm shown in this recipe works by summing numbers two at a time; therefore, no major loss of accuracy occurs, since we're assuming that the numbers we start with are of reasonably similar magnitudes. So, after the first pass of the partials algorithm, we're left with half as many partials as the number of values we started with. All the partials are of reasonably similar magnitudes, so we just iterate the same procedure: at each step, we keep halving the number of partials that are left, until we're down to just one number, the grand total, having lost along the way as little accuracy as feasible.
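To see the four-digit example above concretely, you could emulate a four-digit precision with the decimal module (a small sketch, not part of the original recipe; decimal is in Python 2.4's standard library):

import decimal
decimal.getcontext().prec = 4                                # emulate 4 significant digits
print decimal.Decimal("123.4") + decimal.Decimal("1.234")    # emits: 124.6

The 0.034 carried by the smaller addend is rounded away, exactly as described above.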
If the argument you pass to sum_with_partials is already a list, and you
don't mind destroying that list as part of
the summation, you can omit from the body of
sum_with_partials the statement:
arr = list(arr)
and recover a little bit of performance. Without this small
enhancement, on one slow laptop, summing a million floats with the
built-in sum takes 360 milliseconds, while the
more accurate function sum_with_partials shown in
this recipe takes 1.8 seconds to perform the same task (a slowdown of
five times). In theory, sum_with_partials should be
asymptotically faster than built-in sum if
you're doing unbounded-precision arithmetic (e.g.,
with Python's built-in longs or
other unbounded-precision data types from such add-ons as
gmpy, which you can download from http://gmpy.sourceforge.net). To sum a list
of n elements with
d digits of precision, in
unbounded-precision exact arithmetic, sum takes
O(n(d + log d))
time, while sum_with_partials takes
O(nd).
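One rough way to probe this asymptotic claim is to time both functions on large Python longs (a sketch only; the sizes chosen here are arbitrary, and results depend heavily on the machine):

import time, random
data = [random.getrandbits(4000) for i in xrange(100000)]   # roughly 1,200-digit longs
for f in (sum, sum_with_partials):
    start = time.clock()
    f(data)
    print f.__name__, 'took', time.clock() - start, 'seconds'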
However, I have not been able to observe that effect in empirical
measurements.
Most of the time, you probably don't want to pay the
price of slowing down a summation by five times in order to increase
your digits of accuracy from 10 or 11 to 15. However, for those
occasions in which this tradeoff is right for your applications, and
you need to sum millions and millions of floating-point numbers, this
recipe might well prove rather useful to you. Another simple way to
increase accuracy, particularly when your input numbers are
not necessarily all of similar magnitude, is to
ensure the small-magnitude ones are summed first. This is
particularly easy to code in Python 2.4, although
it's inevitably
O(n
log n): just
sum(sorted(data, key=abs)).
Finally, if precision is much more important
than speed, consider using decimal.Decimal (which
lets you ask for as much precision as you're willing
to wait for and is part of Python 2.4's standard
library). Or you could use gmpy.mpf (which also
allows any precision you require, may even be faster, but must be
downloaded as part of gmpy from http://gmpy.sourceforge.net).
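For instance, a minimal sketch of the decimal.Decimal approach might look as follows (the 30 digits of precision, and the conversion of each float through repr, are arbitrary choices):

import decimal
decimal.getcontext().prec = 30
data = [0.123456789012345] * 1000
total = sum((decimal.Decimal(repr(x)) for x in data), decimal.Decimal(0))
print total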
See Also
Recipe 18.16 shows how to
use a bounded-precision simulation of floating point to estimate the
accuracy of algorithms;
Douglas M. Priest's Ph.D. thesis On
Properties of Floating Point Arithmetics: Numerical Stability and the
Cost of Accurate Computations covers this entire field
with depth and rigor; gmpy is at http://gmpy.sourceforge.net.