Recipe 5.4. Sorting Keys or Indices Basedon the Corresponding Values
Credit: John Jensen, Fred Bremmer, Nick Coghlan
Problem
You need to count the
occurrences of various items and present the items in order of their
number of occurrencesfor example, to produce a histogram.
Solution
A histogram, apart
from graphical issues, is based on counting the occurrences of items
(easy to do with a Python list or dictionary) and then sorting the
keys or indices in an order based on corresponding values. Here is a
subclass of dict that adds two methods for the
purpose:
class hist(dict):If the items you're counting are best modeled by
def add(self, item, increment=1):
add 'increment' to the entry for 'item'
self[item] = increment + self.get(item, 0)
def counts(self, reverse=False):
return list of keys sorted by corresponding values
aux = [ (self[k], k) for k in self ]
aux.sort( )
if reverse: aux.reverse( )
return [k for v, k in aux]
small integers in a compact range, so that you want to keep item
counts in a list, the solution is quite similar:
class hist1(list):
def _ _init_ _(self, n):
initialize this list to count occurrences of n distinct items
list._ _init_ _(self, n*[0])
def add(self, item, increment=1):
add 'increment' to the entry for 'item'
self[item] += increment
def counts(self, reverse=False):
return list of indices sorted by corresponding values
aux = [ (v, k) for k, v in enumerate(self) ]
aux.sort( )
if reverse: aux.reverse( )
return [k for v, k in aux]
Discussion
The
add method of hist embodies the
normal Python idiom for counting occurrences of arbitrary (but
hashable) items, using a dict to hold the counts.
In class hist1, based on a list,
we take a different approach, initializing all counts to 0 in
_ _init_ _, so the add method is
even simpler.The counts methods produce the lists of keys, or
indices, sorted in the order given by the corresponding values. The
problem is very similar in both classes, hist and
hist1; therefore, the solutions are also almost
identical, using in each case the DSU approach already shown in
Recipe 5.2 and Recipe 5.3. If we need both
classes in our program, the similarity is so close that we should
surely factor out the commonalities into a single auxiliary function
_sorted_keys:
def _sorted_keys(container, keys, reverse):and then implement the counts methods of each class
return list of 'keys' sorted by corresponding values in 'container'
aux = [ (container[k], k) for k in keys ]
aux.sort( )
if reverse: aux.reverse( )
return [k for v, k in aux]
as thin wrappers over this _sorted_keys function:
class hist(dict):DSU is so important that in Python 2.4, as shown previously in Recipe 5.2 and Recipe 5.3, the sort
...
def counts(self, reverse=False):
return _sorted_keys(self, self, reverse)
class hist1(list):
...
def counts(self, reverse=False):
return _sorted_keys(self, xrange(len(self)), reverse)
method of lists and the new built-in function
sorted offer a fast, intrinsic implementation of
it. Therefore, in Python 2.4, function _sorted_keys
can become much simpler and faster:
def _sorted_keys(container, keys, reverse):The bound-method container._
return sorted(keys, key=container._ _getitem_ _, reverse=reverse)`
_getitem_ _ performs exactly the same operation as the
indexing container[k] in the Python 2.3
implementation, but it's a callable to call on each
k of the sequence that
we're sorting, namely
keysexactly the right kind of value to pass
as the key keyword argument to the
sorted built-in function. Python 2.4 also affords
an easy, direct way to get a list of a dictionary's
items sorted by value:
from operator import itemgetterThe
def dict_items_sorted_by_value(d, reverse=False):
return sorted(d.iteritems( ), key=itemgetter(1), reverse=reverse)
operator.itemgetter higher-order function, also
new in Python 2.4, is a handy way to supply the
key argument when you want to sort a container
whose items are subcontainers, keying on a certain item of each
subcontainer. This is exactly the case here, since a
dictionary's items are a sequence of pairs (two-item
tuples), and we want to sort the sequence keying on the second item
of each tuple.Getting back to this recipe's main theme, here is a
usage example for the class hist shown in this
recipe's Solution:
sentence = ' Hello there this is a test. Hello there this was a test,This code snippet produces the following output:
but now it is not. '
words = sentence.split( )
c = hist( )
for word in words: c.add(word)
print "Ascending count:"
print c.counts( )
print "Descending count:"
print c.counts(reverse=True)
Ascending count:
[(1, 'but'), (1, 'it'), (1, 'not.'), (1, 'now'),
(1, 'test,'), (1, 'test.'),
(1, 'was'), (2, 'Hello'), (2, 'a'), (2, 'is'),
(2, 'there'), (2, 'this')]
Descending count:
[(2, 'this'), (2, 'there'), (2, 'is'), (2, 'a'),
(2, 'Hello'), (1, 'was'),
(1, 'test.'), (1, 'test,'), (1, 'now'), (1, 'not.'),
(1, 'it'), (1, 'but')]
See Also
Recipe "Special Method Names" in
the Language Reference and the OOP chapter in
Python in a Nutshell, about special method
_ _getitem_ _; Library
Reference docs for Python 2.4 sorted
built-in and the key= argument to
sort and sorted.