Python Cookbook 2Nd Edition Jun 1002005 [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Python Cookbook 2Nd Edition Jun 1002005 [Electronic resources] - نسخه متنی

David Ascher, Alex Martelli, Anna Ravenscroft

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید


Recipe 5.9. Looking for Items in a Sorted Sequence


Credit: Noah Spurrier


Problem


You need to look for a lot of items in a sequence.


Solution


If
list L is sorted, module
bisect from the Python Standard Library makes it
easy to check if some item x is present in
L:

import bisect
x_insert_point = bisect.bisect_right(L, x)
x_is_present = L[x_insert_point-1:x_insert_point] == [x]


Discussion



Looking for an item
x in a list L
is very easy in Python: to check whether the item is there at all,
if x in L; to find out where exactly it is,
L.index(x). However, if
L has length n,
these operations take time proportional to
nessentially, they just loop over
the list's items, checking each for equality to
x. If L is
sorted, we can do better.

The
classic algorithm to look for an item in a sorted sequence is known
as binary search, because at each step it
roughly halves the range it's still searching
onit generally takes about
log2n
steps. It's worth considering when
you're going to look for items many times, so you
can amortize the cost of sorting over many searches. Once
you've decided to use binary search for
x in L, after
calling L.sort( ), module
bisect from the Python Standard Library makes the
job easy.

Specifically, we need function
bisect.bisect_right, which returns the index where
an item should be inserted, to keep the sorted
list sorted, but doesn't alter the list; moreover,
if the item already appears in the list,
bisect_right returns an index
that's just to the right of any items with the same
value. So, after getting this "insert
point" by calling bisect.bisect_right(L,
x)
, we need only to check the list immediately
before the insert point, to see if an item equal
to x is already there.

The way we compute x_is_present in the
"Solution" may not be immediately
obvious. If we know that L is not empty,
we can use a simpler and more obvious approach:

x_is_present = L[x_insert_point-1] == x

However, the indexing in this simpler approach raises an exception
when L is empty. When the slice boundaries are
invalid, slicing is less "strict"
than indexing, since it just produces an empty slice without raising
any exception. In general, somelist[i:i+1] is the
same one-item list as [somelist[i]] when
i is a valid index in
somelist: it's an empty list
[ ] when the indexing would raise
IndexError. The computation of
x_is_present in the recipe exploits this important
property to avoid having to deal with exceptions and handle empty and
nonempty cases for L in one uniform way. An
alternative approach is:

x_is_present = L and L[x_insert_point-1] == x

This alternative approach exploits
and's short-circuiting behavior
to guard the indexing, instead of using slicing.

An auxiliary dict, as shown in Recipe 5.12, is also a possibility as
long as items are hashable (meaning that items can be used as keys
into a dict). However, the approach in this
recipe, based on a sorted list, may be the only useful one when the
items are comparable (otherwise, the list could not be sorted) but
not hashable (so a dict can't
have those items as its keys).

When the list is already sorted, and the number of items you need to
look up in it is not extremely large, it may in any case be faster to
use bisect than to build an auxiliary dictionary,
since the investment of time in the latter operation might not be
fully amortized. This is particularly likely in Python 2.4, since
bisect has been optimized very effectively and is
much faster than it was in Python 2.3. On my machine, for example,
bisect.bisect_right for an item in the middle of a
list of 10,000 integers is about four times faster in Python 2.4 than
it was in Python 2.3.


See Also


Documentation for the bisect module in the
Library Reference and Python in a
Nutshell
; Recipe 5.12.

/ 394