Re: codecs.register_error for "strict",unicode.encode() and str.decode()

 
 
Peter Otten
      07-27-2012
Alan Franzoni wrote:

> Hello,
> I think I'm missing some piece here.
>
> I'm trying to register a default error handler for handling exceptions
> for preventing encoding/decoding errors (I know how this works and that
> making this global is probably not a good practice, but I found this
> strange behaviour while writing a proof of concept of how to let Python
> work in a more forgiving way).
>
> What I discovered is that register_error() for "strict" seems to work in
> the way I expect for string decoding, not for unicode encoding.
>
> That's what happens on Mac, Python 2.7.1 from Apple:
>
> melquiades:tmp alan$ cat minimal_test_encode.py
> # -*- coding: utf-8 -*-
>
> import codecs
>
> def handle_encode(e):
>     return ("ASD", e.end)
>
> codecs.register_error("strict", handle_encode)
>
> print u"à".encode("ascii")
>
> melquiades:tmp alan$ python minimal_test_encode.py
> Traceback (most recent call last):
>   File "minimal_test_encode.py", line 10, in <module>
>     u"à".encode("ascii")
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in
> position 0: ordinal not in range(128)
>
>
> OTOH this works properly:
>
> melquiades:tmp alan$ cat minimal_test_decode.py
> # -*- coding: utf-8 -*-
>
> import codecs
>
> def handle_decode(e):
>     return (u"ASD", e.end)
>
> codecs.register_error("strict", handle_decode)
>
> print "à".decode("ascii")
>
> melquiades:tmp alan$ python minimal_test_decode.py
> ASDASD
>
>
> What piece am I missing? The doc at
> http://docs.python.org/library/codecs.html says "For
> encoding /error_handler/ will be called with a UnicodeEncodeError
> <http://docs.python.org/library/exceptions.html#exceptions.UnicodeEncodeError>
> instance, which contains information about the location of the error.", is
> there any reason why the standard "strict" handler cannot be replaced?


The error handling for the standard error handlers "strict", "replace",
"ignore", and "xmlcharrefreplace" is hardwired, see function
unicode_encode_ucs1 in Objects/unicodeobject.c:

if (known_errorHandler==-1) {
    if ((errors==NULL) || (!strcmp(errors, "strict")))
        known_errorHandler = 1;
    ....
switch (known_errorHandler) {
case 1: /* strict */
    raise_encode_exception(&exc, encoding, unicode, collstart,
                           collend, reason);
    goto onError;

You need another gun to shoot yourself in the foot.
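
For comparison, here is a minimal sketch of the approach that does not run into the hardwired shortcut: register the handler under a name of your own and pass that name explicitly. The handler name "asd" and the function name are made up for illustration, and the snippet uses Python 3 syntax rather than the 2.7 syntax quoted above.

```python
import codecs

def replace_with_asd(exc):
    # The codec passes in the UnicodeError instance; the handler returns
    # (replacement text, position at which to resume).
    if isinstance(exc, (UnicodeEncodeError, UnicodeDecodeError)):
        return ("ASD", exc.end)
    raise exc

# "asd" is a hypothetical name chosen for this example, not a built-in handler.
codecs.register_error("asd", replace_with_asd)

# The handler is only used when it is requested by name.
encoded = "\xe0".encode("ascii", "asd")
decoded = b"\xc3\xa0".decode("ascii", "asd")

print(encoded)
print(decoded)
```

Because the name is not one of the built-in ones, the codec has to look it up in the registry, so the Python-level handler is actually called. The catch is that every encode() or decode() call must name the handler; it does not change the default behaviour.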


 