Unicode question

 
 
Gerhard Häring
Guest
Posts: n/a
 
      07-17-2003
>>> u"äöü"
u'\x84\x94\x81'

(Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")

Why does this work?

Does Python guess which encoding I mean? I thought Python should refuse
to guess.


-- Gerhard

 
 
 
 
 
Thomas Heller
Guest
Posts: n/a
 
      07-17-2003
Gerhard Häring writes:

> >>> u"äöü"
> u'\x84\x94\x81'
>
> (Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
>
> Why does this work?
>
> Does Python guess which encoding I mean? I thought Python should
> refuse to guess


I stumbled over this yesterday, and it seems it is (at least) partially
answered by PEP 263:

In Python 2.1, Unicode literals can only be written using the
Latin-1 based encoding "unicode-escape". This makes the programming
environment rather unfriendly to Python users who live and work in
non-Latin-1 locales such as many of the Asian countries. Programmers
can write their 8-bit strings using the favorite encoding, but are
bound to the "unicode-escape" encoding for Unicode literals.

I have the impression that this is undocumented on purpose, because you
should not write unescaped non-ASCII characters into a source file
whose encoding is unknown.
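The output above is consistent with the console bytes being mapped one-to-one onto code points, which is what a Latin-1 based decoding does. Here is a minimal sketch in current Python 3, assuming the console used a DOS code page such as cp437 (the thread never says which code page was active; the variable names are only for illustration):

```python
# Assumption: the console delivered bytes in a DOS code page (cp437 here);
# the thread does not state which one was actually in use.
text = "\u00e4\u00f6\u00fc"  # the characters a-umlaut, o-umlaut, u-umlaut

console_bytes = text.encode("cp437")       # bytes the terminal would hand over
misread = console_bytes.decode("latin-1")  # each byte becomes the same code point
recovered = console_bytes.decode("cp437")  # decoding with the matching code page

print(repr(console_bytes))  # b'\x84\x94\x81'
print(ascii(misread))       # '\x84\x94\x81'
print(recovered == text)    # True
```

So nothing is really guessed: under that assumption the bytes are passed through unchanged, and the result only looks right if the input happened to be Latin-1.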

Thomas
 
 
 
 
 
Gerhard Häring
Guest
Posts: n/a
 
      07-18-2003
Thomas Heller wrote:
> Gerhard Häring writes:
>
>> >>> u"äöü"
>> u'\x84\x94\x81'
>>
>>(Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
>>
>>Why does this work?
>>
>>Does Python guess which encoding I mean? I thought Python should
>>refuse to guess

>
>
> I stumbled over this yesterday, and it seems it is (at least) partially
> answered by PEP 263:
>
> In Python 2.1, Unicode literals can only be written using the
> Latin-1 based encoding "unicode-escape". This makes the programming
> environment rather unfriendly to Python users who live and work in
> non-Latin-1 locales such as many of the Asian countries. Programmers
> can write their 8-bit strings using the favorite encoding, but are
> bound to the "unicode-escape" encoding for Unicode literals.
>
> I have the impression that this is undocumented on purpose, because you
> should not write unescaped non-ASCII characters into a source file
> whose encoding is unknown.


I agree that using Latin-1 as the default is bad. If there's an encoding
cookie in a 2.3+ source file, then that encoding could be used instead.
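To illustrate what an encoding cookie changes, here is a small sketch for current Python 3, where compile() accepts a bytes source and honors a PEP 263 declaration. The helper name literal_from_source and the two encodings are only assumptions for the demo:

```python
def literal_from_source(encoding, literal_bytes):
    """Compile a tiny module that declares `encoding`; return its `text` value.

    Hypothetical helper for illustration only.
    """
    cookie = b"# -*- coding: " + encoding.encode("ascii") + b" -*-\n"
    body = b'text = u"' + literal_bytes + b'"\n'
    namespace = {}
    exec(compile(cookie + body, "<cookie-demo>", "exec"), namespace)
    return namespace["text"]


same_bytes = b"\x84\x94\x81"

# The same three bytes mean different characters under different cookies.
with_cp437 = literal_from_source("cp437", same_bytes)
with_latin1 = literal_from_source("latin-1", same_bytes)

print(ascii(with_cp437))   # '\xe4\xf6\xfc'
print(ascii(with_latin1))  # '\x84\x94\x81'
```

With the cookie present the literal is decoded using the declared encoding, so no default has to be assumed.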

I stumbled on this when giving another Python user on this list a
pointer to the relevant section in the Python tutorial
(http://www.python.org/doc/current/tu...00000000000000)
where Guido uses u"äöü" in an example.

As this is BAD, the tutorial should probably be changed. I'll file a bug
report.

-- Gerhard

 
 
Gerhard Häring
Guest
Posts: n/a
 
      07-18-2003
Gerhard Häring wrote:
> Ricardo Bugalho wrote:
>> On Fri, 18 Jul 2003 02:07:13 +0200, Gerhard Häring wrote:
>>
>>>> Gerhard Häring writes:
>>>>
>>>>>>>> u"äöü"
>>>>>
>>>>> u'\x84\x94\x81'
>>>>> [this works, but IMO shouldn't]


> [...]
> You'll get warnings if you don't define an encoding (either encoding
> cookie or BOM) and use 8-Bit characters in your source files. These
> warnings will become errors in later Python versions.
>
> It's all in the PEP


I feel like an idiot now. I do get the warnings when I run a Python
script, but I do not get them when I'm using the interactive prompt.
So it's all good (almost). Why not also produce warnings at the
interactive prompt?
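For comparison, here is a rough sketch of how a current Python 3 interpreter treats the same situation. This is an assumption-labelled illustration and not the 2.3 behaviour discussed above: Python 3 assumes UTF-8 when no cookie is present, so source bytes that are not valid UTF-8 are rejected outright instead of producing a warning.

```python
# Non-ASCII bytes in a literal, with no coding cookie and no BOM.
undeclared = b'text = u"\x84\x94\x81"\n'

try:
    compile(undeclared, "<no-cookie>", "exec")
    rejected = False
except (SyntaxError, UnicodeError):
    # The bytes are not valid UTF-8, so the source cannot be decoded.
    rejected = True

# The same bytes are accepted once an encoding is declared.
declared = b"# -*- coding: cp437 -*-\n" + undeclared
compile(declared, "<with-cookie>", "exec")
accepted = True

print(rejected, accepted)  # True True
```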

-- Gerhard

 