Python Cookbook 2nd Edition Jun 1002005 [Electronic resources] (text version)


David Ascher, Alex Martelli, Anna Ravenscroft

Recipe 1.22. Printing Unicode Characters to Standard Output


Credit: David Ascher


Problem


You want to print Unicode strings to standard output (e.g., for
debugging), but they don't fit in the default
encoding.


Solution


Wrap the
sys.stdout stream with a converter, using the
codecs module of Python's
standard library. For example, if you know your output is going to a
terminal that displays characters according to the ISO-8859-1
encoding, you can code:

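import codecs, sys
# codecs.lookup returns the codec's components; the last one is its
# StreamWriter class, which encodes Unicode strings as they are written.
sys.stdout = codecs.lookup('iso8859-1')[-1](sys.stdout)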
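The same wrapping can be sketched with codecs.getwriter, which returns the same stream writer class as the last item of codecs.lookup. This is only an illustration under stated assumptions: the helper name wrap_stream is hypothetical, and an in-memory byte buffer stands in for the terminal so that the encoded bytes can be inspected.

```python
import codecs
import io

def wrap_stream(byte_stream, encoding):
    """Return a writer that encodes text to `encoding` and writes it to byte_stream."""
    # codecs.getwriter(encoding) is the same class as codecs.lookup(encoding)[-1].
    writer_class = codecs.getwriter(encoding)
    return writer_class(byte_stream)

# Assumption for this sketch: a BytesIO buffer plays the role of the terminal.
buffer = io.BytesIO()
out = wrap_stream(buffer, 'iso8859-1')
out.write(u'\N{LATIN SMALL LETTER A WITH DIAERESIS}\n')

# The character is encoded as the single Latin-1 byte 0xE4, followed by a newline.
latin1_bytes = buffer.getvalue()
```

On Python 3, sys.stdout is already a text stream that does its own encoding, so a wrapper of this kind would be applied to the underlying byte stream, sys.stdout.buffer, rather than to sys.stdout itself.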


Discussion


Unicode strings live in a large space, big enough for all of the
characters in every language worldwide, but thankfully the internal
representation of Unicode strings is irrelevant for users of Unicode.
Alas, a file stream, such as sys.stdout, deals
with bytes and has an encoding associated with it. You can change the
default encoding that is used for new files by modifying the
site module. That, however, requires changing your
entire Python installation, which is likely to confuse other
applications that may expect the encoding you originally configured
Python to use (typically the Python standard encoding, which is
ASCII). Therefore, this kind of modification is
not recommended.

This recipe takes a sounder approach: it rebinds
sys.stdout as a stream that expects Unicode input
and outputs it in ISO-8859-1 (also known as
"Latin-1"). This approach
doesn't change the encoding of any previous
references to sys.stdout, as illustrated here.
First, we keep a reference to the original, ASCII-encoded
sys.stdout:

>>> old = sys.stdout

Then, we create a Unicode string that wouldn't
normally be able to go through sys.stdout:

>>> char = u"\N{LATIN SMALL LETTER A WITH DIAERESIS}"
>>> print char
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
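The failure can be reproduced without involving a terminal by encoding the string explicitly, which is what an ASCII-only stream has to do before writing. The following is a minimal sketch. On more recent interpreters than the one shown above, the exception is reported as UnicodeEncodeError (a subclass of UnicodeError) and the message wording differs.

```python
char = u"\N{LATIN SMALL LETTER A WITH DIAERESIS}"

try:
    char.encode('ascii')           # fails: the code point is outside range(128)
    ascii_ok = True
except UnicodeError:               # UnicodeEncodeError is a subclass of UnicodeError
    ascii_ok = False

latin1 = char.encode('iso8859-1')  # one byte: 0xE4
utf8 = char.encode('utf-8')        # two bytes: 0xC3 0xA4
```

The richer encodings accept the character, which is why wrapping the stream in a Latin-1 or UTF-8 writer avoids the error.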

If you don't get an error from this operation,
it's because Python thinks it knows which encoding
your "terminal" is using (in
particular, Python is likely to use the right encoding if your
"terminal" is IDLE, the free
development environment that comes with Python). But, suppose you do
get this error, or get no error but the output is not the character
you expected, because your
"terminal" uses UTF-8 encoding and
Python does not know about it. When that is the case, we can just
wrap sys.stdout in the codecs
stream writer for UTF-8, which is a much richer encoding, then rebind
sys.stdout to it and try again:

>>> sys.stdout = codecs.lookup('utf-8')[-1](sys.stdout)
>>> print char
ä
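Because the original stream is kept in a separate reference, the rebinding can also be made temporary. The sketch below assumes a hypothetical context manager named encoded_stdout and, so that the result can be checked, sends output to an in-memory byte buffer instead of a real UTF-8 terminal. It is one possible arrangement, not part of the recipe itself.

```python
import codecs
import contextlib
import io
import sys

@contextlib.contextmanager
def encoded_stdout(byte_stream, encoding='utf-8'):
    """Temporarily rebind sys.stdout to a writer that encodes text into byte_stream."""
    old = sys.stdout                    # keep the original stream, as the recipe does
    sys.stdout = codecs.getwriter(encoding)(byte_stream)
    try:
        yield sys.stdout
    finally:
        sys.stdout = old                # always restore the original stream

sink = io.BytesIO()                     # assumption: stands in for a UTF-8 terminal
with encoded_stdout(sink):
    print(u"\N{LATIN SMALL LETTER A WITH DIAERESIS}")

captured = sink.getvalue()              # UTF-8 bytes for the character, plus a newline
```

Restoring the stream in a finally clause means an exception raised while printing does not leave sys.stdout rebound.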

This approach works only if your
"terminal", terminal emulator, or
other window in which you're running the interactive
Python interpreter supports the UTF-8 encoding, with a font rich
enough to display all the characters you need to output. If you
don't have such a program or device available, you
may be able to find a suitable one for your platform in the form of a
free program downloadable from the Internet.

Python tries to determine which encoding your
"terminal" is using and sets that
encoding's name as attribute
sys.stdout.encoding. Sometimes (alas, not always)
it even manages to get it right. IDLE already wraps your
sys.stdout, as suggested in this recipe, so,
within the environment's interactive Python shell,
you can directly print Unicode strings.
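Before deciding whether to wrap the stream, a program can look at what Python detected. The following sketch uses two hypothetical helpers, stdout_encoding and can_print. It treats a missing or empty sys.stdout.encoding as ASCII, which is an assumption made here for safety, not something Python guarantees.

```python
import sys

def stdout_encoding(default='ascii'):
    """Return the encoding Python detected for sys.stdout, or `default` if unknown."""
    # The attribute may be missing (a replaced stream) or None (for example,
    # when output is not going to a terminal on some Python versions).
    return getattr(sys.stdout, 'encoding', None) or default

def can_print(text, encoding=None):
    """Report whether `text` can be encoded in the given (or detected) encoding."""
    try:
        text.encode(encoding or stdout_encoding())
        return True
    except (UnicodeError, LookupError):  # unencodable text, or an unknown codec name
        return False
```

A call such as can_print(char) only tells you what Python believes about the terminal; as noted above, that belief is not always right.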


See Also


Documentation for the codecs and site modules, and
setdefaultencoding in module sys, in the Library Reference
and Python in a Nutshell; Recipe 1.20 and Recipe 1.21.

