Problem with sets and Unicode strings

 
 
Dennis Benzinger
 
      06-29-2006
Diez B. Roggisch wrote:
>> But I'd say that it's not intuitive that for sets x in y can be false
>> (without raising an exception!) while doing the same with a tuple
>> raises an exception. Where is this difference documented?

>
> 2.3.7 Set Types -- set, frozenset
>
> ...
>
> Set elements are like dictionary keys; they need to define both __hash__ and
> __eq__ methods.
> ...
>
> And it has to hold that
>
> a == b => hash(a) == hash(b)
>
> but NOT
>
> hash(a) == hash(b) => a == b
>
> Thus if the hashes vary, the set doesn't bother to actually compare the
> values.
> [...]
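
The lookup rule quoted above can be sketched in a few lines. This is only an illustration, not code from the thread: `AlwaysEqual` is a hypothetical class that deliberately breaks the `a == b => hash(a) == hash(b)` rule, to show that a set consults the hash before it ever calls `__eq__`, while a tuple compares elements directly.

```python
class AlwaysEqual(object):
    """Hypothetical class: compares equal to everything, but each
    instance hashes to its own tag, so equal objects get unequal hashes."""

    def __init__(self, tag):
        self.tag = tag

    def __eq__(self, other):
        return True

    def __hash__(self):
        return hash(self.tag)


a = AlwaysEqual(1)
b = AlwaysEqual(2)

in_tuple = a in (b,)      # tuple membership compares with ==
in_set = a in set([b])    # set membership checks the hash first

print(in_tuple)  # True
print(in_set)    # False
```

Because the hashes differ, the set never gets as far as calling `__eq__`, so no comparison (and no exception a comparison might raise) can occur.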


Ok, I understand.
But isn't it a (minor) problem that using a set like this:

# -*- coding: UTF-8 -*-

FIELDS_SET = set(("Fächer", ))


print u"Fächer" in FIELDS_SET
print u"Fächer" == "Fächer"


shadows the error of not setting sys.defaultencoding()?


Dennis
 
 
 
 
 
Robert Kern
 
      06-29-2006
Dennis Benzinger wrote:
> Ok, I understand.
> But isn't it a (minor) problem that using a set like this:
>
> # -*- coding: UTF-8 -*-
>
> FIELDS_SET = set(("Fächer", ))
>
> print u"Fächer" in FIELDS_SET
> print u"Fächer" == "Fächer"
>
> shadows the error of not setting sys.defaultencoding()?


You can't set the default encoding. If you could, then scripts that run on your
machine wouldn't run on mine.

If there's an error, it's the fact that you use a regular string at the
beginning ("Fächer") and a unicode string later (u"Fächer"). But set objects
can't know that that's the problem or even if it *is* a problem.
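
One way to avoid the mismatch is to decode byte strings explicitly before they reach the set. The sketch below is not from the thread and rests on two assumptions: it targets Python 3 (where `bytes` plays the role of the regular string and `str` that of the unicode string discussed here), and the data is UTF-8. `to_text` is a hypothetical helper name.

```python
# -*- coding: UTF-8 -*-

def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return text, decoding byte strings explicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = "Fächer".encode("utf-8")      # byte string, e.g. as read from a file

mixed = set([raw])                  # byte string stored in the set
normalized = set([to_text(raw)])    # text stored in the set

found_mixed = "Fächer" in mixed             # text looked up among bytes
found_normalized = "Fächer" in normalized   # text looked up among text

print(found_mixed)       # False
print(found_normalized)  # True
```

The set itself stays unaware of encodings; the conversion happens once, at the point where the data enters the program.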

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

 
 
 
 
 
Dennis Benzinger
 
      06-29-2006
Robert Kern wrote:
> Dennis Benzinger wrote:
>> Ok, I understand.
>> But isn't it a (minor) problem that using a set like this:
>>
>> # -*- coding: UTF-8 -*-
>>
>> FIELDS_SET = set(("Fächer", ))
>>
>> print u"Fächer" in FIELDS_SET
>> print u"Fächer" == "Fächer"
>>
>> shadows the error of not setting sys.defaultencoding()?

>
> You can't set the default encoding. If you could, then scripts that run
> on your machine wouldn't run on mine.
> [...]


As Serge Orlov wrote in one of his posts you _can_ set the default
encoding (at least in site.py). See
<http://docs.python.org/lib/module-sys.html>


Bye,
Dennis
 
 
Robert Kern
 
      06-29-2006
Dennis Benzinger wrote:
> Robert Kern wrote:
>> Dennis Benzinger wrote:
>>> Ok, I understand.
>>> But isn't it a (minor) problem that using a set like this:
>>>
>>> # -*- coding: UTF-8 -*-
>>>
>>> FIELDS_SET = set(("Fächer", ))
>>>
>>> print u"Fächer" in FIELDS_SET
>>> print u"Fächer" == "Fächer"
>>>
>>> shadows the error of not setting sys.defaultencoding()?

>> You can't set the default encoding. If you could, then scripts that run
>> on your machine wouldn't run on mine.
>> [...]

>
> As Serge Orlov wrote in one of his posts you _can_ set the default
> encoding (at least in site.py). See
> <http://docs.python.org/lib/module-sys.html>


Okay, *don't* set the default encoding to anything other than 'ascii'. Doing so
would be an error, not the other way around.

--
Robert Kern


 
 
Fredrik Lundh
 
      06-29-2006
Dennis Benzinger wrote:

>>> shadows the error of not setting sys.defaultencoding()?

>>
>> You can't set the default encoding. If you could, then scripts that run
>> on your machine wouldn't run on mine.
>> [...]

>
> As Serge Orlov wrote in one of his posts you _can_ set the default
> encoding (at least in site.py). See
> <http://docs.python.org/lib/module-sys.html>


yes, but you're not supposed to do that, for several reasons, including
the reasons Robert provided: if you mess with the interpreter defaults,
code you write isn't portable, and code written by others may not work
on your machine.

the interpreter isn't fully encoding agnostic either; things are not
guaranteed to work properly if you're not using the default.
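
The interpreter default can be inspected without changing it. The following is a minimal sketch, not part of the original posts, and it assumes a Python 3 interpreter, where the default is fixed at UTF-8 and `sys.setdefaultencoding` does not exist (in Python 2, `site.py` deleted it from `sys` after startup). The variable names are illustrative only.

```python
import sys

default_encoding = sys.getdefaultencoding()
can_change_default = hasattr(sys, "setdefaultencoding")

print(default_encoding)
print(can_change_default)

# Rather than relying on the default, name the codec at every boundary.
data = u"F\u00e4cher".encode("utf-8")   # text -> bytes, codec stated
text = data.decode("utf-8")             # bytes -> text, codec stated
```

Code that names its codec this way behaves the same on every machine, which is the portability concern raised above.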

</F>

 