Python Cookbook 2Nd Edition Jun 1002005 [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Python Cookbook 2Nd Edition Jun 1002005 [Electronic resources] - نسخه متنی

David Ascher, Alex Martelli, Anna Ravenscroft

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید


Recipe 5.8. Getting the First Few Smallest Items of a Sequence


Credit: Matteo Dell'Amico, Raymond
Hettinger, George Yoshida, Daniel Harding


Problem


You need to get just a few of the
smallest items from a sequence. You could sort the sequence and just
use seq[:n], but is there any way you can do
better?


Solution


Perhaps you can do better, if n, the
number of items you need, is small compared to
n, the sequence's length.
sort is very fast, but it still takes O(n
log n)
time, while we can get the first
n smallest elements in time
O(n) if n is small.
Here is a simple and practical generator for this purpose, which
works equally well in Python 2.3 and 2.4:

import heapq
def isorted(data):
data = list(data)
heapq.heapify(data)
while data:
yield heapq.heappop(data)

In Python 2.4 only, you can use an even simpler and faster way to get
the smallest n items of
data when you know
n in advance:

import heapq
def smallest(n, data):
return heapq.nsmallest(n, data)


Discussion


data can be any bounded iterable; the
recipe's function isorted starts by
calling list on it to ensure that. You can remove
the statement data = list(data) if all the
following conditions hold: you know that
data is a list to start with, you
don't mind the fact that the generator reorders
data's items, and you
want to remove items from data as you
fetch them.

As shown previously in Recipe 5.7, the Python Standard Library
contains module heapq, which supports the data
structures known as heaps. Generator
isorted in this recipe's Solution
relies on making a heap at the start (via
heap.heapify) and then yielding and removing the
heap's smallest remaining item at each step (via
heap.heappop).

In Python 2.4, module heapq has also grown two new
functions. heapq.nlargest(n,
data) returns a list of the
n largest items of data;
heapq.nsmallest(n, data) returns a list of the
n smallest items. These functions do not
require that data satisfy the heap
condition; indeed, they do not even require
data to be a listany bounded
iterable whose items are comparable will do. Function
smallest in this recipe's Solution
just lets heapq.smallest do all the work.

To judge speed,
we must always measure itguessing about
relative speeds of different pieces of code is a
mug's game. So, how does
isorted's performance compare with
Python 2.4's built-in function
sorted, when we're only looping
on the first few (smallest) items? To help measure timing, I wrote a
top10 function that can use either
approach, and I also made sure I had a sorted
function even in Python 2.3, where it's not built
in:

try:
sorted
except:
def sorted(data):
data = list(data)
data.sort( )
return data
import itertools
def top10(data, howtosort):
return list(itertools.islice(howtosort(data), 10))

On my machine running Python 2.4 on thoroughly shuffled lists of
1,000 integers, top10 takes about 260 microseconds
with isorted, while it takes about 850 microseconds
with the built-in sorted. However, Python 2.3 is
much slower and gives vastly different results: about 12 milliseconds
with isorted, about 2.7 milliseconds with
sorted. In other words, Python 2.3 is 3 times
slower than Python 2.4 for sorted, but
it's 50 times slower for
isorted. Lesson to retain: whenever you optimize,
measure. You shouldn't choose
optimizations based on first principles, since the performance
numbers can vary so widely, even between vastly compatible
"point releases". A secondary point
can be made: if you care about performance, move to Python 2.4 as
soon as you can. Python 2.4 has been vastly optimized and accelerated
over Python 2.3, particularly in areas related to searching and
sorting.

If you know that your code need only support Python 2.4, then, as
this recipe's Solution indicates, using
heapq's new function
nsmallest is faster, as well as simpler, than
doing your own coding. To implement top10 in Python
2.4, for example, you just need:

import heapq
def top10(data):
return heapq.nsmallest(10, data)

This version takes about half the time of the previously shown
isorted-based top10, when called
on the same thoroughly shuffled lists of 1,000 integers.


See Also


Library Reference and Python in a
Nutshell
docs about method sort of
type list, and about modules
heapq and timeit; Chapter 19 for more about iteration in Python;
Python in a Nutshell's
chapter on optimization; heapq.py in the Python
sources contains a very interesting discussion of heaps; Recipe 5.7 for more information about
heapq.

/ 394