Recipe 1.24. Making Some Strings Case-Insensitive
Credit: Dale Strickland-Clark, Peter Cogolo, Mark McMahon
Problem
You want to treat some strings so that all comparisons and lookups
are case-insensitive, while all other uses of the strings preserve
the original case.
Solution
The best solution is to wrap the specific strings in question into a
suitable subclass of str:
class iStr(str):
    """
    Case insensitive string class.
    Behaves just like str, except that all comparisons and lookups
    are case insensitive.
    """
    def __init__(self, *args):
        self._lowered = str.lower(self)
    def __repr__(self):
        return '%s(%s)' % (type(self).__name__, str.__repr__(self))
    def __hash__(self):
        return hash(self._lowered)
    def lower(self):
        return self._lowered
def _make_case_insensitive(name):
    ''' wrap one method of str into an iStr one, case-insensitive '''
    str_meth = getattr(str, name)
    def x(self, other, *args):
        ''' try lowercasing 'other', which is typically a string, but
            be prepared to use it as-is if lowering gives problems,
            since strings CAN be correctly compared with non-strings.
        '''
        try: other = other.lower()
        except (TypeError, AttributeError, ValueError): pass
        return str_meth(self._lowered, other, *args)
    # in Python 2.4, only, add the statement: x.func_name = name
    setattr(iStr, name, x)
# apply the _make_case_insensitive function to specified methods
for name in 'eq lt le ge gt ne cmp contains'.split():
    _make_case_insensitive('__%s__' % name)
for name in 'count endswith find index rfind rindex startswith'.split():
    _make_case_insensitive(name)
# note that we don't modify methods 'replace', 'split', 'strip', ...
# of course, you can add modifications to them, too, if you prefer that.
del _make_case_insensitive    # remove helper function, not needed any more
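The Solution targets Python 2.3 and 2.4. As a rough illustration of how the class behaves, here is a minimal sketch of the same idea ported to Python 3. The port is an assumption, not part of the recipe: it drops `__cmp__` (which Python 3 strings do not have) and sets `x.__name__` where the recipe's comment mentions `func_name`.

```python
class iStr(str):
    """Case-insensitive, case-preserving string (Python 3 sketch of the recipe)."""

    def __init__(self, *args):
        # compute the lowercase version once, as the recipe does
        self._lowered = str.lower(self)

    def __repr__(self):
        return '%s(%s)' % (type(self).__name__, str.__repr__(self))

    def __hash__(self):
        return hash(self._lowered)

    def lower(self):
        return self._lowered


def _make_case_insensitive(name):
    """Wrap one method of str into a case-insensitive iStr method."""
    str_meth = getattr(str, name)

    def x(self, other, *args):
        # lowercase 'other' when possible; otherwise use it as-is
        try:
            other = other.lower()
        except (TypeError, AttributeError, ValueError):
            pass
        return str_meth(self._lowered, other, *args)

    x.__name__ = name  # Python 3 counterpart of the func_name assignment
    setattr(iStr, name, x)


# special methods first ('cmp' omitted: no str.__cmp__ in Python 3)
for name in 'eq lt le ge gt ne contains'.split():
    _make_case_insensitive('__%s__' % name)
for name in 'count endswith find index rfind rindex startswith'.split():
    _make_case_insensitive(name)
del _make_case_insensitive

# usage: comparisons and lookups ignore case, other uses preserve it
greeting = iStr('Hello')
print(greeting == 'HELLO')        # True
print('ELL' in greeting)          # True
print(greeting.find('LL'))        # 2
print(str(greeting))              # Hello
ages = {iStr('Alice'): 30}
print(ages[iStr('ALICE')])        # 30
```

Note that the dictionary lookup wraps the key in `iStr` as well: a plain `str` key hashes case-sensitively, so only wrapped keys are guaranteed to land in the same hash bucket.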
Discussion
Some implementation choices in class
iStr are worthy of notice. First, we choose to
generate the lowercase version once and for all, in method
__init__, since we envision that in typical uses of
iStr instances, this version will be required
repeatedly. We hold that version in an attribute that is private, but
not overly so (i.e., has a name that begins with one underscore, not
two), because if iStr gets subclassed (e.g., to make
a more extensive version that also offers case-insensitive splitting,
replacing, etc., as the comment in the
"Solution" suggests),
iStr's subclasses are quite likely
to want to access this crucial "implementation
detail" of superclass iStr!

We do not offer "case-insensitive"
versions of such methods as replace, because
it's anything but clear what kind of input-output
relation we might want to establish in the general case.
Application-specific subclasses may therefore be the way to provide
this functionality in ways appropriate to a given application. For
example, since the replace method is not wrapped,
calling replace on an instance of
iStr returns an instance of str,
not of iStr. If that is a
problem in your application, you may want to wrap all
iStr methods that return strings, simply to ensure
that the results are made into instances of iStr.
For that purpose, you need another, separate helper function, similar
but not identical to the _make_case_insensitive one
shown in the
"Solution":
def _make_return_iStr(name):
    str_meth = getattr(str, name)
    def x(*args):
        return iStr(str_meth(*args))
    setattr(iStr, name, x)
and you need to call this helper function
_make_return_iStr on all the names of relevant
string methods returning strings such as:
for name in 'center ljust rjust strip lstrip rstrip'.split():
    _make_return_iStr(name)
Strings have about 20 methods (including special methods such as
__add__ and __mul__) that
you should consider wrapping in this way. You can also wrap in this
way some additional methods, such as split and
join, which may require special handling, and
others, such as encode and
decode, that you cannot deal with unless you also
define a case-insensitive unicode subtype. In
practice, one can hope that not every single one of these methods
will prove problematic in a typical application. However, as you can
see, the very functional richness of Python strings makes it a bit of
work to customize string subtypes fully, in a general way without
depending on the needs of a specific application.

The implementation of iStr is careful to avoid the
boilerplate code (meaning repetitious and therefore bug-prone code)
that we'd need if we just overrode each needed
method of str in the normal way, with
def statements in the class body. A custom
metaclass or other such advanced technique would offer no special
advantage in this case, so the boilerplate avoidance is simply
obtained with one helper function that generates and installs wrapper
closures, and two loops using that function, one for normal methods
and one for special ones. The loops need to be placed
after the class statement, as
we do in this recipe's Solution, because they need
to modify the class object iStr, and the class
object doesn't exist yet (and thus cannot be
modified) until the class statement has completed.

In Python 2.4, you can reassign the func_name
attribute of a function object, and in this case, you should do so to
get clearer and more readable results when introspection (e.g., the
help function in an interactive interpreter
session) is applied to an iStr instance. However,
Python 2.3 considers attribute func_name of
function objects to be read-only; therefore, in this
recipe's Solution, we have indicated this
possibility only in a comment, to avoid losing Python 2.3
compatibility over such a minor issue.

Case-insensitive (but case-preserving) strings have many uses, from
more tolerant parsing of user input, to filename matching on
filesystems that share this characteristic, such as all of Windows
filesystems and the Macintosh default filesystem. You might easily
find yourself creating a variety of
"case-insensitive" container types,
such as dictionaries, lists, sets, and so on, meaning containers
that go out of their way to treat string-valued keys or items as if
they were case-insensitive. Clearly a better architecture is to
factor out the functionality of
"case-insensitive" comparisons and
lookups once and for all; with this recipe in your toolbox, you can
just add the required wrapping of strings into iStr
instances wherever you may need it, including those times when
you're making case-insensitive container types.

For example, a list whose items are basically strings, but are to be
treated case-insensitively (for sorting purposes and in such methods
as count and index), is
reasonably easy to build on top of iStr:
class iList(list):
    def __init__(self, *args):
        list.__init__(self, *args)
        # rely on __setitem__ to wrap each item into iStr...
        self[:] = self
    wrap_each_item = iStr
    def __setitem__(self, i, v):
        if isinstance(i, slice): v = map(self.wrap_each_item, v)
        else: v = self.wrap_each_item(v)
        list.__setitem__(self, i, v)
    def append(self, item):
        list.append(self, self.wrap_each_item(item))
    def extend(self, seq):
        list.extend(self, map(self.wrap_each_item, seq))
Essentially, all we're doing is ensuring that every
item that gets into an instance of iList gets
wrapped by a call to iStr, and everything else takes
care of itself.

Incidentally, this example class iList is accurately
coded so that you can easily make customized subclasses of
iList to accommodate application-specific subclasses
of iStr: all such a customized subclass of
iList needs to do is override the single class-level
member named wrap_each_item.
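To make the wrap_each_item hook concrete, here is a minimal Python 3 sketch. Several things in it are assumptions for illustration only: the `iStr` shown is a simplified stand-in (just `==`, `!=`, `<`, and hashing) so that the block runs on its own, list comprehensions replace `map` because `map` is lazy in Python 3, and the names `StickyStr` and `StickyList` are hypothetical.

```python
def _lower(obj):
    """Lowercase obj if it supports .lower(); otherwise return it unchanged."""
    try:
        return obj.lower()
    except AttributeError:
        return obj


class iStr(str):
    """Simplified stand-in for the recipe's iStr (==, !=, < and hashing only)."""

    def __init__(self, *args):
        self._lowered = str.lower(self)

    def __hash__(self):
        return hash(self._lowered)

    def __eq__(self, other):
        return self._lowered == _lower(other)

    def __ne__(self, other):
        return self._lowered != _lower(other)

    def __lt__(self, other):
        return self._lowered < _lower(other)


class iList(list):
    wrap_each_item = iStr  # subclasses override this single class-level member

    def __init__(self, *args):
        list.__init__(self, *args)
        # in Python 3, slice assignment goes through __setitem__
        self[:] = self

    def __setitem__(self, i, v):
        if isinstance(i, slice):
            v = [self.wrap_each_item(item) for item in v]
        else:
            v = self.wrap_each_item(v)
        list.__setitem__(self, i, v)

    def append(self, item):
        list.append(self, self.wrap_each_item(item))

    def extend(self, seq):
        list.extend(self, [self.wrap_each_item(item) for item in seq])


class StickyStr(iStr):
    """Hypothetical application-specific subclass: strip() keeps the subtype."""

    def strip(self, *args):
        return type(self)(str.strip(self, *args))


class StickyList(iList):
    wrap_each_item = StickyStr  # the only override needed


names = iList(['bob', 'Alice'])
names.append('CAROL')
names.sort()
print([str(n) for n in names])                    # ['Alice', 'bob', 'CAROL']
print(names.count('ALICE'), names.index('Bob'))   # 1 1

sticky = StickyList(['  padded  '])
print(type(sticky[0].strip()).__name__)           # StickyStr
```

As in the recipe's version, only `__init__`, `__setitem__`, `append`, and `extend` wrap their arguments; other ways of adding items, such as `insert`, are left unwrapped in this sketch.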
See Also
Library Reference and Python in a
Nutshell sections on str, string
methods, and special methods used in comparisons and hashing.