Recipe 1.22. Printing Unicode Characters to Standard Output
Credit: David Ascher
Problem
You want to print Unicode strings to standard output (e.g., for
debugging), but they don't fit in the default
encoding.
Solution
Wrap the
sys.stdout stream with a converter, using the
codecs module of Python's
standard library. For example, if you know your output is going to a
terminal that displays characters according to the ISO-8859-1
encoding, you can code:
import codecs, sys
sys.stdout = codecs.lookup('iso8859-1')[-1](sys.stdout)
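The index [-1] above selects the stream-writer class from the codec lookup result. As a minimal sketch of the same wrapping, the code below uses codecs.getwriter, which names that class directly. It writes to an in-memory io.BytesIO rather than the real sys.stdout, so that the encoded bytes can be inspected; the helper name wrap_for_encoding is hypothetical and not part of the recipe.

```python
import codecs
import io


def wrap_for_encoding(byte_stream, encoding, errors="strict"):
    """Return a writer that encodes Unicode text before writing to byte_stream.

    Hypothetical helper: codecs.getwriter(encoding) returns the same
    stream-writer class that codecs.lookup(encoding)[-1] does.
    """
    writer_class = codecs.getwriter(encoding)
    return writer_class(byte_stream, errors)


# Stand-in for a byte-oriented stdout, so the output can be examined.
raw = io.BytesIO()
out = wrap_for_encoding(raw, "iso8859-1")
out.write(u"\N{LATIN SMALL LETTER A WITH DIAERESIS}")
latin1_bytes = raw.getvalue()  # the single byte 0xE4 in ISO-8859-1
```

On interpreters where sys.stdout already accepts text (Python 3), the byte stream to wrap would be sys.stdout.buffer rather than sys.stdout itself.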
Discussion
Unicode strings live in a large space, big enough for all of the
characters in every language worldwide, but thankfully the internal
representation of Unicode strings is irrelevant for users of Unicode.
Alas, a file stream, such as sys.stdout, deals
with bytes and has an encoding associated with it. You can change the
default encoding that is used for new files by modifying the
site module. That, however, requires changing your
entire Python installation, which is likely to confuse other
applications that may expect the encoding you originally configured
Python to use (typically the Python standard encoding, which is
ASCII). Therefore, this kind of modification is
not to be recommended.
This recipe takes a sounder approach: it rebinds
sys.stdout as a stream that expects Unicode input
and outputs it in ISO-8859-1 (also known as
"Latin-1"). This approach
doesn't change the encoding of any previous
references to sys.stdout, as illustrated here.
First, we keep a reference to the original, ASCII-encoded
sys.stdout:
>>> old = sys.stdout
Then, we create a Unicode string that wouldn't
normally be able to go through sys.stdout:
>>> char = u"\N{LATIN SMALL LETTER A WITH DIAERESIS}"
>>> print char
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
If you don't get an error from this operation,
it's because Python thinks it knows which encoding
your "terminal" is using (in
particular, Python is likely to use the right encoding if your
"terminal" is IDLE, the free
development environment that comes with Python). But, suppose you do
get this error, or get no error but the output is not the character
you expected, because your
"terminal" uses UTF-8 encoding and
Python does not know about it. When that is the case, we can just
wrap sys.stdout in the codecs
stream writer for UTF-8, which is a much richer encoding, then rebind
sys.stdout to it and try again:
>>> sys.stdout = codecs.lookup('utf-8')[-1](sys.stdout)
>>> print char
ä
This approach works only if your
"terminal", terminal emulator, or
other window in which you're running the interactive
Python interpreter supports the UTF-8 encoding, with a font rich
enough to display all the characters you need to output. If you
don't have such a program or device available, you
may be able to find a suitable one for your platform in the form of a
free program downloadable from the Internet.
Python tries to determine which encoding your
"terminal" is using and sets that
encoding's name as attribute
sys.stdout.encoding. Sometimes (alas, not always)
it even manages to get it right. IDLE already wraps your
sys.stdout, as suggested in this recipe, so,
within the environment's interactive Python shell,
you can directly print Unicode strings.
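Because the detected encoding is not always available or correct, a wrapper can fall back to a caller-supplied encoding and substitute unencodable characters instead of raising. The sketch below assumes that policy. The function name make_unicode_writer, the fallback parameter, and the "replace" error handling are illustrative choices, not part of the recipe, and the in-memory streams stand in for a real byte-oriented stdout.

```python
import codecs
import io


def make_unicode_writer(byte_stream, detected=None, fallback="utf-8"):
    """Wrap byte_stream in a codecs stream writer.

    detected is the encoding name reported for the stream (for example,
    the value of sys.stdout.encoding); it may be None or unknown to Python.
    fallback and the "replace" error policy are assumptions of this sketch.
    """
    encoding = detected or fallback
    try:
        writer_class = codecs.getwriter(encoding)
    except LookupError:
        # The reported name is not a codec Python knows about.
        writer_class = codecs.getwriter(fallback)
    # "replace" emits "?" for characters the target encoding cannot represent.
    return writer_class(byte_stream, "replace")


char = u"\N{LATIN SMALL LETTER A WITH DIAERESIS}"

# ASCII cannot represent the character, so a "?" is written instead.
ascii_raw = io.BytesIO()
make_unicode_writer(ascii_raw, detected="ascii").write(char)
ascii_bytes = ascii_raw.getvalue()

# No detected encoding: the UTF-8 fallback encodes it as two bytes.
utf8_raw = io.BytesIO()
make_unicode_writer(utf8_raw, detected=None).write(char)
utf8_bytes = utf8_raw.getvalue()
```

In an interactive session, one might pass getattr(sys.stdout, 'encoding', None) as detected. Substituting "?" trades fidelity for robustness, which is usually acceptable for debugging output but not for data that must round-trip.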
See Also
Documentation for the codecs and
site modules, and
setdefaultencoding in module
sys, in the Library
Reference and Python in a
Nutshell; Recipe 1.20 and Recipe 1.21.