Python Cookbook 2nd Edition Jun 1002005 [Electronic resources] (text version)

David Ascher, Alex Martelli, Anna Ravenscroft

Recipe 19.21. Computing a Summary Report with itertools.groupby


Credit: Paul Moore, Raymond Hettinger


Problem


You have a list of data grouped by a key value, typically read from a
spreadsheet or the like, and want to generate a summary of that
information for reporting purposes.


Solution


The itertools.groupby function introduced in
Python 2.4 helps with this task:

    from itertools import groupby
    from operator import itemgetter
    def summary(data, key=itemgetter(0), field=itemgetter(1)):
        """ Summarise the given data (a sequence of rows), grouped by the
            given key (default: the first item of each row), giving totals
            of the given field (default: the second item of each row).
            The key and field arguments should be functions which, given a
            data record, return the relevant value.
        """
        for k, group in groupby(data, key):
            yield k, sum(field(row) for row in group)
    if __name__ == "__main__":
        # Example: given a sequence of sales data for city within region,
        # _sorted on region_, produce a sales report by region
        sales = [('Scotland', 'Edinburgh', 20000),
                 ('Scotland', 'Glasgow', 12500),
                 ('Wales', 'Cardiff', 29700),
                 ('Wales', 'Bangor', 12800),
                 ('England', 'London', 90000),
                 ('England', 'Manchester', 45600),
                 ('England', 'Liverpool', 29700)]
        for region, total in summary(sales, field=itemgetter(2)):
            print "%10s: %d" % (region, total)


Discussion


In many situations, data is available in tabular form, with the
information naturally grouped by a subset of the data values (e.g.,
recordsets obtained from database queries and data read from
spreadsheets, typically with the csv module
of the Python Standard Library). It is often useful to be able to
produce summaries of the detail data.

The new groupby function (added in Python 2.4 to
the itertools module of the Python Standard
Library) is designed exactly for the purpose of handling such grouped
data. It takes as arguments an iterable, whose items are to be
thought of as records, along with a function to extract the
key value from each record.
itertools.groupby yields each key in turn, once per run of
consecutive records that share it, along with a new iterator that
runs through the records in that run.
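
To make the key-plus-iterator behavior concrete, here is a minimal sketch. The records and the region_of helper are hypothetical, invented for illustration, and print is called with parentheses so that the snippet also runs on Python 3, which is an assumption beyond the recipe's Python 2.4 setting.

```python
from itertools import groupby

# Hypothetical records: (region, city) pairs, already ordered by region.
records = [("Scotland", "Edinburgh"),
           ("Scotland", "Glasgow"),
           ("Wales", "Cardiff")]

def region_of(record):
    # Key-extraction function: the first item of each record.
    return record[0]

# Each step yields a key plus an iterator over that key's records.
# The group iterator shares state with groupby itself, so it is
# consumed here (via list) before moving on to the next key.
grouped = [(key, list(group)) for key, group in groupby(records, region_of)]

for key, members in grouped:
    print("%s: %r" % (key, members))
```

Materializing each group with list is a convenience for this demonstration; the recipe's summary function instead consumes each group lazily inside sum.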

The groupby function is often used to generate
summary totals for a dataset. The summary function
defined in this recipe shows one simple way of doing this. For a
summary report, two extraction functions are required: one function
to extract the key, which is the function that you pass to the
groupby function, and another function to extract
the values to be summarized. The recipe uses another innovation of
Python 2.4 for these purposes: the
operator.itemgetter higher-order function. Called
with an index i as its argument,
itemgetter produces a function f such that f(x) extracts the
ith item from x, operating just like the indexing x[i].
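
The equivalence between itemgetter and plain indexing can be checked directly. This is a small sketch; the row and the names get_region and get_sales are hypothetical.

```python
from operator import itemgetter

# Hypothetical record in the same (region, city, sales) shape as the recipe.
row = ("Wales", "Cardiff", 29700)

get_region = itemgetter(0)   # behaves like: lambda x: x[0]
get_sales = itemgetter(2)    # behaves like: lambda x: x[2]

region = get_region(row)
sales_value = get_sales(row)

# Calling the generated function gives the same result as indexing.
assert region == row[0]
assert sales_value == row[2]
```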

The input records must be sorted by the given key; if
you're uncertain about that condition, you can use
groupby(sorted(data, key=key), key) to ensure it,
exploiting the built-in function sorted, also new
in Python 2.4. It's quite convenient that the same
key-extraction function can be passed to both
sorted and groupby in this
idiom. The groupby function itself does not sort
its input, which gives it extra flexibility that may come in
handy, although most of the time you will want to use
groupby only on sorted data. See Recipe 19.10 for a case in which
it's quite handy to use groupby
on nonsorted data.
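
The effect of skipping the sort is easy to see on a small, deliberately unordered dataset. The rows and the totals helper below are hypothetical, written only to illustrate the point.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (region, amount) rows, deliberately NOT sorted by region.
rows = [("Wales", 10), ("England", 5), ("Wales", 7), ("England", 1)]
key = itemgetter(0)

def totals(data):
    # One (key, total) pair per run of consecutive rows with equal keys.
    return [(k, sum(amount for _, amount in group))
            for k, group in groupby(data, key)]

# Without sorting, each region shows up once per run, so it is repeated.
unsorted_totals = totals(rows)

# Sorting with the same key function first gives one entry per region.
sorted_totals = totals(sorted(rows, key=key))
```

Here unsorted_totals has four entries, one per run, while sorted_totals has a single entry for each of the two regions.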

For example, if the sales data was in a CSV file
sales.csv, the usage example in the
recipe's if __name__ == '__main__' section
might become:

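    import csv
    # Assumes sales.csv has the same three columns as the sales list above.
    # Sort on the region column (index 0), the key that summary groups by.
    sales = sorted(csv.reader(open('sales.csv', 'rb')),
                   key=itemgetter(0))
    # csv.reader yields strings, so convert the sales figure before summing.
    for region, total in summary(sales, field=lambda row: int(row[2])):
        print "%10s: %d" % (region, total)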
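
To try the CSV variant without a file on disk, one possible sketch feeds csv.reader from an in-memory buffer. The csv_text contents are hypothetical, io.StringIO stands in for the real sales.csv, and the snippet assumes Python 3 and a header-less file with region, city, and sales columns.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical file contents standing in for sales.csv (no header row).
csv_text = ("Wales,Cardiff,29700\n"
            "Scotland,Glasgow,12500\n"
            "Wales,Bangor,12800\n")

# Every field arrives as a string; sort the rows on the region column.
rows = sorted(csv.reader(io.StringIO(csv_text)), key=itemgetter(0))

# Group on the same column and convert the sales figure to int to sum it.
report = [(region, sum(int(row[2]) for row in group))
          for region, group in groupby(rows, key=itemgetter(0))]

for region, total in report:
    print("%10s: %d" % (region, total))
```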

Overall, this recipe provides a vivid illustration of how the new
Python 2.4 features work well together: in addition to the
groupby function, the
operator.itemgetter used to provide field
extraction functions, and the potential use of the built-in function
sorted, the recipe also uses a generator
expression as the argument to the sum built-in
function. If you need to implement this recipe's
functionality in Python 2.3, you can start by implementing your own
approximate version of groupby, for example as
follows:

    class groupby(dict):
        def __init__(self, seq, key):
            for value in seq:
                k = key(value)
                self.setdefault(k, []).append(value)
        __iter__ = dict.iteritems

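This version doesn't include all the features of
Python 2.4's groupby (for instance, it builds every group as
a list up front, and it yields the groups in the dictionary's
arbitrary order rather than in input order), but
it's very simple and may be sufficient for your
purposes. Similarly, you can write your own simplified versions of
functions itemgetter and
sorted, such as: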

    def itemgetter(i):
        def getter(x): return x[i]
        return getter
    def sorted(seq, key):
        aux = [(key(x), i, x) for i, x in enumerate(seq)]
        aux.sort()
        return [x for k, i, x in aux]
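
The simplified sorted above carries each item's original position in the auxiliary tuples, which keeps items with equal keys in their input order. The sketch below illustrates this; the name sorted_by_key is hypothetical, chosen to avoid shadowing the built-in on newer Python versions, and the rows are invented.

```python
def sorted_by_key(seq, key):
    # Decorate: pair each item with its key and its original position.
    aux = [(key(x), i, x) for i, x in enumerate(seq)]
    # Ties on the key are broken by position i, which is unique, so the
    # items themselves are never compared and equal keys stay in order.
    aux.sort()
    # Undecorate: keep only the original items.
    return [x for k, i, x in aux]

# Hypothetical rows: the two Wales rows are not in alphabetical city order.
rows = [("Wales", "Cardiff"), ("England", "London"), ("Wales", "Bangor")]

result = sorted_by_key(rows, lambda row: row[0])
```

In result, Cardiff still precedes Bangor, which matches what the built-in sorted returns for the same key function.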

As for the generator expression, you can simply use a list
comprehension in its place: just call
sum([field(row) for row in group])
where the recipe has the same call
without the additional square brackets, []. Each
of these substitutions will cost a little performance, but, overall,
you can build the same functionality in Python 2.3 as you can in
version 2.4; the latter just is slicker, simpler, faster,
neater!
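
As a quick check that the two spellings of the sum call agree, here is a short sketch. The rows and the field helper are hypothetical.

```python
# Hypothetical (region, amount) rows belonging to a single group.
rows = [("Wales", 29700), ("Wales", 12800)]

def field(row):
    # Field-extraction function: the second item of each row.
    return row[1]

# Generator expression: values are produced one at a time.
total_gen = sum(field(row) for row in rows)

# List comprehension: the whole list is built first, then summed.
total_list = sum([field(row) for row in rows])

assert total_gen == total_list
```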


See Also


itertools.groupby,
operator.itemgetter, sorted,
and csv in the Library
Reference
(for Python 2.4).

