"Decoding unicode is not supported" in unusual situation

 
 
John Nagle
      03-11-2012
On 3/9/2012 4:57 PM, Steven D'Aprano wrote:
> On Fri, 09 Mar 2012 10:11:58 -0800, John Nagle wrote:
> This demonstrates a gross confusion about both Unicode and Python. John,
> I honestly don't mean to be rude here, but if you actually believe that
> (rather than merely expressing yourself poorly), then it seems to me that
> you are desperately misinformed about Unicode and are working on the
> basis of some serious misapprehensions about the nature of strings.
>
> In Python 2.6/2.7, there is no ambiguity between str/bytes. The two names
> are aliases for each other. The older name, "str", is a misnomer, since
> it *actually* refers to bytes (and always has, all the way back to the
> earliest days of Python). At best, it could be read as "byte string" or
> "8-bit string", but the emphasis should always be on the *bytes*.


There's an inherent ambiguity in that "bytes" and "str" are really
the same type in Python 2.6/2.7. That's a hack for backwards
compatibility, and it goes away in 3.x. The notes for PEP 358
admit this.
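
A quick way to see the aliasing for yourself is the sketch below. It is only an illustration; the variable name `bytes_is_str` is made up here, and the check is written so it runs on both 2.6/2.7 and 3.x.

```python
import sys

# On Python 2.6/2.7 "bytes" is just another name for "str", so this is True.
# On Python 3.x the two are distinct types, so this is False.
bytes_is_str = bytes is str

print("Python %d: bytes is str -> %s" % (sys.version_info[0], bytes_is_str))
```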

Allowing

unicode(s)

with no encoding, on type "str", carries an implicit assumption
that s is ASCII. Arguably, "unicode()" should have required an
encoding in all cases.
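
As a rough model of that assumption: with the default encoding left at "ascii", calling unicode(s) on a byte string behaves like a strict ASCII decode. The helper name `implicit_unicode` below is hypothetical, used only to make the behaviour visible on any Python version.

```python
def implicit_unicode(s):
    """Hypothetical stand-in for Python 2's unicode(s) on a byte string.

    Assumes the interpreter's default encoding is still 'ascii'.
    """
    return s.decode("ascii")  # strict: any byte >= 0x80 raises


plain = implicit_unicode(b"hello")  # pure ASCII decodes fine

try:
    implicit_unicode(b"caf\xc3\xa9")  # UTF-8 bytes for "café"
    non_ascii_failed = False
except UnicodeDecodeError:
    non_ascii_failed = True
```

Passing the encoding explicitly, as in s.decode("utf-8"), removes the guess.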

Or "str" and "bytes" should have been made separate types in
Python 2.7, in which case unicode() of a str would be a safe
ASCII to Unicode translation, and unicode() of a bytes object
would require an encoding. But that would break too much old code.
So we have an ambiguity and a hack.

"While Python 2 also has a unicode string type, the fundamental
ambiguity of the core string type, coupled with Python 2's default
behavior of supporting automatic coercion from 8-bit strings to unicode
objects when the two are combined, often leads to UnicodeErrors"
- PEP 404
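
The coercion that quote describes can be sketched as follows. This models the behaviour rather than reproducing the interpreter's code: when a unicode object and an 8-bit string are combined, Python 2 decodes the 8-bit string using the default encoding (assumed here to be "ascii"). The function name `py2_style_concat` is made up for the example.

```python
def py2_style_concat(text, raw):
    """Hypothetical model of Python 2's  u"..." + "..."  coercion.

    The 8-bit string is silently decoded as ASCII before concatenation.
    """
    return text + raw.decode("ascii")


joined = py2_style_concat(u"abc", b"def")  # works: all bytes are ASCII

try:
    py2_style_concat(u"abc", b"d\xe9f")  # a Latin-1 byte sneaks in
    coercion_failed = False
except UnicodeDecodeError:
    coercion_failed = True
```

The failure depends on the data, not the code, which is why these errors tend to show up late.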

John Nagle
 