Python notes

Using Modules

When running the Python interpreter, if you get an error like:

Traceback (most recent call last):
  File "/opt/docutils-2008aug06/bin/rst2html.py", line 17, in ?
    from docutils.core import publish_cmdline, default_description
ImportError: No module named docutils.core

then you need to tell the interpreter where to look for modules. In the example above, I had installed docutils in /opt/docutils-2008aug06/, so I ran:

export PYTHONPATH=/opt/docutils-2008aug06/lib/python2.4/site-packages

References: python2.4 -h
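The same search path can also be extended from inside a program, because the directories listed in PYTHONPATH end up in sys.path. The following is only a sketch: add_module_dir is a hypothetical helper name, and the directory is the one from the example above.

```python
import os
import sys


def add_module_dir(path):
    """Prepend path to the module search path unless it is already listed.

    Hypothetical helper, not part of docutils or the standard library.
    """
    path = os.path.abspath(path)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


# Directory taken from the docutils example above.
site_dir = add_module_dir("/opt/docutils-2008aug06/lib/python2.4/site-packages")
print(site_dir in sys.path)
```

This only affects the running process, so the export is still the better choice when a script installed elsewhere (like rst2html.py) has to find the package.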

Troubleshooting

UnicodeDecodeError: 'ascii' codec can't decode ...

If you get an error like:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)

This error can have several causes. A good page about this error is this.

I also strongly recommend Ned Batchelder's Pragmatic Unicode presentation, which explains the problem very well. It runs about 40 minutes and is well worth the time.
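To see where the message comes from, the error can be reproduced on purpose. The sketch below is not taken from the pages mentioned above. It assumes a byte string holding the UTF-8 encoding of "café", where the byte 0xc3 starts the two-byte sequence for "é", and decodes it with the ASCII codec explicitly. Python 2 performs the same decode implicitly whenever it mixes byte strings and Unicode strings.

```python
# UTF-8 bytes for "café": 0xc3 0xa9 is the two-byte encoding of "é".
raw = b"caf\xc3\xa9"

message = None
try:
    # ASCII only covers bytes 0-127, so 0xc3 cannot be decoded.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)

# Decoding with the codec the bytes were written in succeeds.
text = raw.decode("utf-8")
print(len(raw), len(text))
```

The byte string is five bytes long but the decoded text has four characters, which is why bytes and text have to be kept apart.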

In summary, why do these commands produce different output?

$ python2.7 -c "import sys; print sys.stdout.encoding"
UTF-8
$ python2.7 -c "import sys; print sys.stdout.encoding" | cat
None

My terminal encoding is UTF-8, so my program works fine there. But when I pipe the output or redirect it to a file, Python cannot detect which encoding I want, so sys.stdout.encoding is set to None. When text then needs to be converted, Python falls back to the ASCII codec, which cannot represent most of the characters that UTF-8 can.
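One way to cope with a stream that reports no encoding is to pick one explicitly before writing. This is a minimal sketch, assuming UTF-8 is the wanted fallback; pick_encoding is a hypothetical helper, and an in-memory io.BytesIO stands in for the piped stdout.

```python
import codecs
import io


def pick_encoding(stream, fallback="utf-8"):
    """Return stream.encoding, or fallback when it is missing or None.

    Hypothetical helper for illustration only.
    """
    return getattr(stream, "encoding", None) or fallback


# Stand-in for a piped stdout: a byte stream with no encoding set.
pipe = io.BytesIO()
encoding = pick_encoding(pipe)

# Wrap the byte stream so that Unicode text is encoded on the way out.
writer = codecs.getwriter(encoding)(pipe)
writer.write(u"caf\u00e9\n")

print(encoding)
print(repr(pipe.getvalue()))
```

In a Python 2 program the same wrapper could be placed around sys.stdout, so that printing behaves the same on a terminal and in a pipe.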

Ned Batchelder recommends always decoding and encoding explicitly: decode bytes to Unicode as early as possible and encode back to bytes as late as possible (note that, in Python 2, a Unicode string is a different type of object, unicode, than a normal byte string, str). The other option is to change your default encoding at the very beginning of your program. According to this link, this can be done in different ways, like:

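import sys
# Python 2 only: setdefaultencoding() is removed from sys at startup,
# and reload(sys) brings it back.
reload(sys)
sys.setdefaultencoding('UTF8')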

Or using the PYTHONIOENCODING environment variable:

$ PYTHONIOENCODING=UTF-8 python2.7 -c "import sys; print sys.stdout.encoding" | cat
UTF-8
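Both switches above change a global default. The explicit alternative mentioned earlier, decoding bytes on input and encoding text on output, can be sketched as follows. The function name normalize_name and the sample data are made up for illustration, and UTF-8 is assumed on both edges.

```python
def normalize_name(raw_bytes, encoding="utf-8"):
    """Decode at the input edge, work on Unicode, encode at the output edge.

    Hypothetical example function, not from any library.
    """
    text = raw_bytes.decode(encoding)   # bytes -> Unicode as early as possible
    text = text.strip().upper()         # all processing happens on Unicode text
    return text.encode(encoding)        # Unicode -> bytes as late as possible


# Sample input: UTF-8 bytes for "café" with surrounding whitespace.
result = normalize_name(b"  caf\xc3\xa9 \n")
print(repr(result))
```

Because the text is never mixed with undecoded bytes in the middle, the implicit ASCII conversion never gets a chance to run.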