Velocity Reviews


Gerhard Häring 07-17-2003 03:47 PM

Unicode question
 
>>> u"äöü"
u'\x84\x94\x81'

(Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")

Why does this work?

Does Python guess which encoding I mean? I thought Python should refuse
to guess :-)


-- Gerhard


Thomas Heller 07-17-2003 04:47 PM

Re: Unicode question
 
Gerhard Häring <gh@ghaering.de> writes:

> >>> u"äöü"
> u'\x84\x94\x81'
>
> (Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
>
> Why does this work?
>
> Does Python guess which encoding I mean? I thought Python should
> refuse to guess :-)


I stumbled over this yesterday, and it seems it is (at least) partially
answered by PEP 263:

In Python 2.1, Unicode literals can only be written using the
Latin-1 based encoding "unicode-escape". This makes the programming
environment rather unfriendly to Python users who live and work in
non-Latin-1 locales such as many of the Asian countries. Programmers
can write their 8-bit strings using the favorite encoding, but are
bound to the "unicode-escape" encoding for Unicode literals.

I have the impression that this is undocumented on purpose, because you
should not write unescaped non-ASCII characters into the source file
(with 'unknown' encoding).
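
A minimal Python 3 sketch of what seems to happen to those bytes. It assumes the console sent "äöü" in code page 437; that is an assumption, since the thread never names the console encoding, but the bytes \x84\x94\x81 match it. Mapping each byte straight to a code point, Latin-1 style, reproduces the value from the original post:

```python
# Python 3 sketch. Assumption: the console encoded the typed text as cp437.
typed = "äöü"

# Bytes the console would have handed to the interpreter under that assumption.
raw = typed.encode("cp437")

# Latin-1 maps every byte 0x00-0xFF to the code point with the same number,
# so no guessing is involved and the decode cannot fail.
as_latin1 = raw.decode("latin-1")

print(raw)                # the three console bytes
print(ascii(as_latin1))   # same numbers, now as code points

# Decoding with the encoding that was actually used recovers the text.
round_trip = raw.decode("cp437")
print(round_trip == typed)
```

So the result is not really a guess: every byte is accepted as-is, which is why the literal "works" but holds the wrong characters unless the input really was Latin-1.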

Thomas

Gerhard Häring 07-18-2003 12:07 AM

Re: Unicode question
 
Thomas Heller wrote:
> Gerhard Häring <gh@ghaering.de> writes:
>
>
>> >>> u"äöü"
>>u'\x84\x94\x81'
>>
>>(Python 2.2.3/2.3b2; sys.getdefaultencoding() == "ascii")
>>
>>Why does this work?
>>
>>Does Python guess which encoding I mean? I thought Python should
>>refuse to guess :-)

>
>
> I stumbled over this yesterday, and it seems it is (at least) partially
> answered by PEP 263:
>
> In Python 2.1, Unicode literals can only be written using the
> Latin-1 based encoding "unicode-escape". This makes the programming
> environment rather unfriendly to Python users who live and work in
> non-Latin-1 locales such as many of the Asian countries. Programmers
> can write their 8-bit strings using the favorite encoding, but are
> bound to the "unicode-escape" encoding for Unicode literals.
>
> I have the impression that this is undocumented on purpose, because you
> should not write unescaped non-ASCII characters into the source file
> (with 'unknown' encoding).


I agree that using latin1 as default is bad. If there's an encoding
cookie in the 2.3+ source file then this encoding could be used.
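
For illustration, a small Python 3 sketch of an encoding cookie in the PEP 263 style. The source text and the variable name `text` are made up for the example. The source is passed to compile() as bytes so that the declared encoding has to be honoured:

```python
# Hypothetical two-line source file, stored as Latin-1 bytes.
# The first line is the PEP 263 encoding cookie.
source_bytes = '# -*- coding: latin-1 -*-\ntext = "äöü"\n'.encode("latin-1")

# compile() accepts bytes and reads the cookie to decode them.
code = compile(source_bytes, "<demo>", "exec")

namespace = {}
exec(code, namespace)

# The literal comes out as the intended three characters.
print(namespace["text"] == "äöü")
```

With the cookie in place the interpreter no longer has to assume anything about the bytes in the file.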

I stumbled on this when giving another Python user on this list a
pointer to the relevant section in the Python tutorial
(http://www.python.org/doc/current/tu...00000000000000)
where Guido uses u"äöü" in an example.

As this is BAD the tutorial should probably be changed. I'll file a bug
report.

-- Gerhard


Gerhard Häring 07-18-2003 09:51 AM

Re: Unicode question
 
Gerhard Häring wrote:
> Ricardo Bugalho wrote:
>> On Fri, 18 Jul 2003 02:07:13 +0200, Gerhard Häring wrote:
>>
>>>> Gerhard Häring <gh@ghaering.de> writes:
>>>>
>>>>>>>> u"äöü"
>>>>>
>>>>> u'\x84\x94\x81'
>>>>> [this works, but IMO shouldn't]


> [...]
> You'll get warnings if you don't define an encoding (either encoding
> cookie or BOM) and use 8-bit characters in your source files. These
> warnings will become errors in later Python versions.
>
> It's all in the PEP :)


I feel like an idiot now :-( I do get the warnings when I run a Python
script, but I do not get the warnings when I'm using the interactive
prompt. So it's all good (almost). Why not also produce warnings at the
interactive prompt?
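
As a hedged illustration of the "warnings will become errors" point, here is a sketch of how Python 3 treats source bytes that carry no encoding declaration. This is Python 3 behaviour, where the default source encoding is UTF-8, and not something shown in this thread; the source line is made up for the example:

```python
# Latin-1 bytes for a string literal, with no coding cookie and no BOM.
undeclared = 'text = "äöü"\n'.encode("latin-1")

try:
    compile(undeclared, "<demo>", "exec")
    rejected = False
except SyntaxError as exc:
    # The bytes are not valid UTF-8, so the source is refused outright.
    rejected = True
    print("rejected:", exc.msg)

# The same text encoded as UTF-8 needs no declaration.
accepted = compile('text = "äöü"\n'.encode("utf-8"), "<demo>", "exec")
```

Adding a matching encoding cookie to the first source would make it compile again.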

-- Gerhard


