Python Cookbook 2Nd Edition Jun 1002005 [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Python Cookbook 2Nd Edition Jun 1002005 [Electronic resources] - نسخه متنی

David Ascher, Alex Martelli, Anna Ravenscroft

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید


Recipe 5.3. Sorting a List of Objects by an Attribute of the Objects


Credit: Yakov Markovitch, Nick Perkins


Problem



You need to sort a
list of objects according to one attribute of each object.


Solution


The DSU idiom shines, as usual:

def sort_by_attr(seq, attr):
intermed = [ (getattr(x, attr), i, x) for i, x in enumerate(seq) ]
intermed.sort( )
return [ x[-1] for x in intermed ]
def sort_by_attr_inplace(lst, attr):
lst[:] = sort_by_attr(lst, attr)

In Python 2.4, DSU is natively supported, so your code can be even
shorter and faster:

import operator
def sort_by_attr(seq, attr):
return sorted(seq, key=operator.attrgetter(attr))
def sort_by_attr_inplace(lst, attr):
lst.sort(key=operator.attrgetter(attr))


Discussion


Sorting a list of objects by an attribute of each object is best done
using the DSU idiom introduced previously in Recipe 5.2. In Python 2.3 and 2.4, DSU
is no longer needed, as it used to be, to ensure that a sort is
stable (sorting is always stable in Python 2.3 and later), but
DSU's speed advantages still shine.

Sorting, in the
general case and with the best algorithms, is
O(n log
n) (as is
often the case in mathematical formulas, the juxtaposition of terms,
here n and log
n, indicates that the terms are
multiplied). DSU's speed comes from maximally
accelerating the
O(n log
n) part,
which dominates sorting time for sequences of substantial length
n, by using only Python's
native (and maximally fast) comparison. The preliminary
decoration step, which prepares an
intermediate auxiliary list of tuples, and the successive
undecoration step, which extracts the
important item from each tuple after the intermediate list is sorted,
are only
O(n).
Therefore any minor inefficiencies in these steps contribute
negligible overhead if n is large enough,
and reasonably little even for many practical values of
n.


The O( )-Notation



The most useful way to reason
about many performance issues is in terms of what is popularly known
as big-O analysis and notation (the
O stands for
"order"). You can find detailed
explanations, for example, at http://en.wikipedia.org/wiki/Big_O_notation,
but here's a summary.

If we consider an algorithm applied to input data of some size
N, running time can be described, for
large enough values of N (and big inputs
are often those for which performance is most critical), as being
proportional to some function of N. This
is indicated with notations such as
O(N)
(running time proportional to N:
processing twice as much data takes about twice as much time, 10
times as much data, 10 times as much time, and so on; also known as
linear time),
O(N
squared) (running time proportional to the square of
N: processing twice as much data takes
about four times as much time, 10 times as much data, 100 times as
much time; also known as quadratic time), and
so on. Another case you will see often is
O(N log
N), which is
faster than
O(N
squared) but not as fast as
O(N).

The constant of proportionality is often ignored (at least in
theoretical analysis) because it depends on such issues as the clock
rate of your computer, not just on the algorithm. If you buy a
machine that's twice as fast as your old one,
everything will run in half the time, but that will not change any of
the comparisons between alternative algorithms.

This recipe puts index i, in each tuple
that is an item of list intermed,
ahead of the corresponding
x (where x is
the i-th item in
seq). This placement ensures that two
items of seq will never be compared
directly, even if they have the same value for the attribute named
attr. Even in that case, their indices
will still differ, and thus Python's lexicographic
comparison of the tuples will never get all the way to comparing the
tuples' last items (the original items from
seq). Avoiding object comparisons may save
us from performing extremely slow operations, or even from attempting
forbidden ones. For example, we could sort a list of complex numbers
by their real attribute: we would get an exception
if we ever tried to compare two complex numbers directly, because no
ordering is defined on complex numbers. But thanks to the precaution
described in this paragraph, such an event can never occur, and the
sorting will therefore proceed correctly.

As mentioned earlier in Recipe 5.2, Python 2.4 supports DSU
directly. You can pass an optional keyword-argument
key, to sort, which is the
callable to use on each item to get the sort key. Standard library
module operator has two new functions,
attrgetter and itemgetter, that
exist specifically to return callables suitable for this purpose. In
Python 2.4, the ideal solution to this problem therefore becomes:

import operator
seq.sort(key=operator.attrgetter(attr))

This snippet performs the sort
in-place, which helps make it blazingly faston my machine,
three times faster than the Python 2.3 function shown first in this
recipe. If you need a sorted copy, without disturbing
seq, you can get it using Python
2.4's new built-in function
sorted:

sorted_copy = sorted(seq, key=operator.attrgetter(attr))

While not quite as fast as an in-place sort, this latest snippet is
still over 2.5 times faster than the function shown first in this
recipe. Python 2.4 also guarantees that, when you pass the optional
key named argument, list items will never be
accidentally compared directly, so you need not take any special
safeguards. Moreover, stability is also guaranteed.


See Also


Recipe 5.2; Python
2.4's Library Reference docs
about the sorted built-in function,
operator module's
attrgetter and itemgetter
functions, and the key argument to
.sort and sorted.


    / 394