Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > save dictionary to a file without brackets.

Reply
Thread Tools

save dictionary to a file without brackets.

 
 
Chris Angelico
Guest
Posts: n/a
 
      08-09-2012
On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <(E-Mail Removed)> wrote:
> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>> O(n) for all other entries in the dict which suffer a hash collision
>> with the searched entry.
>>
>> True, a sensible choice of hash function will reduce n to 1 in common
>> cases, but it becomes an important consideration for larger datasets.

>
> I'm glad you're wrong for CPython's dictionaries. The only time the
> lookup would degenerate to O[n] would be if the hash table had only one
> slot. CPython sensibly increases the hash table size when it becomes
> too small for efficiency.
>
> Where have you seen dictionaries so poorly implemented?


In vanilla CPython up to version (I think) 3.3, where it's possible to
DoS the hash generator. Hash collisions are always possible, just
ridiculously unlikely unless deliberately exploited.

(And yes, I know an option was added to older versions to randomize
the hashes there too. It's not active by default, so "vanilla CPython"
is still vulnerable.)

ChrisA
 
Reply With Quote
 
 
 
 
Andrew Cooper
Guest
Posts: n/a
 
      08-09-2012
On 09/08/2012 23:26, Dave Angel wrote:
> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>> On 09/08/2012 22:34, Roman Vashkevich wrote:
>>> Actually, they are different.
>>> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>>> Dict uses hashing to get a value from the dict and this is why it's O(1).
>>>

>> Sligtly off topic, but looking up a value in a dictionary is actually
>> O(n) for all other entries in the dict which suffer a hash collision
>> with the searched entry.
>>
>> True, a sensible choice of hash function will reduce n to 1 in common
>> cases, but it becomes an important consideration for larger datasets.
>>
>> ~Andrew

>
> I'm glad you're wrong for CPython's dictionaries. The only time the
> lookup would degenerate to O[n] would be if the hash table had only one
> slot. CPython sensibly increases the hash table size when it becomes
> too small for efficiency.
>
>
> Where have you seen dictionaries so poorly implemented?
>


Different n, which I should have made more clear. I was using it for
consistency with O() notation. My statement was O(n) where n is the
number of hash collisions.

The choice of hash algorithm (or several depending on the
implementation) should specifically be chosen to reduce collisions to
aid in efficient space utilisation and lookup times, but any
implementation must allow for collisions. There are certainly runtime
methods of improving efficiency using amortized operations.

As for poor implementations,

class Foo(object):

...

def __hash__(self):
return 0

I seriously found that in some older code I had the misfortune of
reading. It didn't remain in that state for long.

~Andrew
 
Reply With Quote
 
 
 
 
Chris Angelico
Guest
Posts: n/a
 
      08-09-2012
On Fri, Aug 10, 2012 at 8:39 AM, Tim Chase
<(E-Mail Removed)> wrote:
> On 08/09/12 17:26, Dave Angel wrote:
>> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>> I'm glad you're wrong for CPython's dictionaries. The only time the
>> lookup would degenerate to O[n] would be if the hash table had only one
>> slot. CPython sensibly increases the hash table size when it becomes
>> too small for efficiency.
>>
>> Where have you seen dictionaries so poorly implemented?

>
> PHP?
>
> http://www.phpclasses.org/blog/post/...f-Servers.html


That's the same hash collision attack that I alluded to above, and it
strikes *many* language implementations. Most released a patch fairly
quickly and quietly (Pike, Lua, V8 (JavaScript/ECMAScript), PHP), but
CPython dared not, on account of various applications depending on
hash order (at least for tests). It's not (for once) an indictment of
PHP (maybe that should be an "inarrayment"?), it's a consequence of a
hashing algorithm that favored simplicity over cryptographic
qualities.

(It feels weird to be defending PHP...)

ChrisA
 
Reply With Quote
 
Roy Smith
Guest
Posts: n/a
 
      08-09-2012
In article <ucXUr.1030527$(E-Mail Removed)4>,
Andrew Cooper <(E-Mail Removed)> wrote:

> As for poor implementations,
>
> class Foo(object):
> def __hash__(self):
> return 0
>
> I seriously found that in some older code I had the misfortune of
> reading.


Python assumes you are a consenting adult. If you wish to engage in
activities which are hazardous to your health, so be it. But then
again, you could commit this particular stupidity just as easily in C++
or any other language which lets you define your own hash() function.
 
Reply With Quote
 
Chris Angelico
Guest
Posts: n/a
 
      08-09-2012
On Fri, Aug 10, 2012 at 9:05 AM, Roy Smith <(E-Mail Removed)> wrote:
> Python assumes you are a consenting adult. If you wish to engage in
> activities which are hazardous to your health, so be it.


.... you mean, Python lets you make a hash of it?

*ducks for cover*

ChrisA
 
Reply With Quote
 
Roy Smith
Guest
Posts: n/a
 
      08-09-2012
In article <(E-Mail Removed)>,
Chris Angelico <(E-Mail Removed)> wrote:

> On Fri, Aug 10, 2012 at 9:05 AM, Roy Smith <(E-Mail Removed)> wrote:
> > Python assumes you are a consenting adult. If you wish to engage in
> > activities which are hazardous to your health, so be it.

>
> ... you mean, Python lets you make a hash of it?
>


Only if you order it with spam, spam, spam, spam, spam, spam, and spam.
 
Reply With Quote
 
Mark Lawrence
Guest
Posts: n/a
 
      08-09-2012
On 10/08/2012 00:24, Roy Smith wrote:
> In article <(E-Mail Removed)>,
> Chris Angelico <(E-Mail Removed)> wrote:
>
>> On Fri, Aug 10, 2012 at 9:05 AM, Roy Smith <(E-Mail Removed)> wrote:
>>> Python assumes you are a consenting adult. If you wish to engage in
>>> activities which are hazardous to your health, so be it.

>>
>> ... you mean, Python lets you make a hash of it?
>>

>
> Only if you order it with spam, spam, spam, spam, spam, spam, and spam.
>


Now now gentlemen we're getting slightly off topic here and wouldn't
want to upset the people who insist on staying on topic. Or would we?

--
Cheers.

Mark Lawrence.

 
Reply With Quote
 
Dave Angel
Guest
Posts: n/a
 
      08-09-2012
On 08/09/2012 06:54 PM, Andrew Cooper wrote:
> On 09/08/2012 23:26, Dave Angel wrote:
>> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>>> On 09/08/2012 22:34, Roman Vashkevich wrote:
>>>> Actually, they are different.
>>>> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>>>> Dict uses hashing to get a value from the dict and this is why it's O(1).
>>>>
>>> Sligtly off topic, but looking up a value in a dictionary is actually
>>> O(n) for all other entries in the dict which suffer a hash collision
>>> with the searched entry.
>>>
>>> True, a sensible choice of hash function will reduce n to 1 in common
>>> cases, but it becomes an important consideration for larger datasets.
>>>
>>> ~Andrew

>> I'm glad you're wrong for CPython's dictionaries. The only time the
>> lookup would degenerate to O[n] would be if the hash table had only one
>> slot. CPython sensibly increases the hash table size when it becomes
>> too small for efficiency.
>>
>>
>> Where have you seen dictionaries so poorly implemented?
>>

> Different n, which I should have made more clear. I was using it for
> consistency with O() notation. My statement was O(n) where n is the
> number of hash collisions.

That's a little like doing a survey, and reporting the results as
showing that 100% of the women hit their husbands, among the population
of women who hit their husbands.

In your original message, you already stated the assumption that a
proper hash algorithm would be chosen, then went on to apparently claim
that large datasets would still have an order n problem. That last is
what I was challenging.

The rest of your message here refers to client code, not the system.
> The choice of hash algorithm (or several depending on the
> implementation) should specifically be chosen to reduce collisions to
> aid in efficient space utilisation and lookup times, but any
> implementation must allow for collisions. There are certainly runtime
> methods of improving efficiency using amortized operations.
>
> As for poor implementations,
>
> class Foo(object):
>
> ...
>
> def __hash__(self):
> return 0
>
> I seriously found that in some older code I had the misfortune of
> reading. It didn't remain in that state for long.
>
> ~Andrew



--

DaveA

 
Reply With Quote
 
Dave Angel
Guest
Posts: n/a
 
      08-09-2012
On 08/09/2012 06:53 PM, Chris Angelico wrote:
> On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <(E-Mail Removed)> wrote:
>> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>>> O(n) for all other entries in the dict which suffer a hash collision
>>> with the searched entry.
>>>
>>> True, a sensible choice of hash function will reduce n to 1 in common
>>> cases, but it becomes an important consideration for larger datasets.

>> I'm glad you're wrong for CPython's dictionaries. The only time the
>> lookup would degenerate to O[n] would be if the hash table had only one
>> slot. CPython sensibly increases the hash table size when it becomes
>> too small for efficiency.
>>
>> Where have you seen dictionaries so poorly implemented?

> In vanilla CPython up to version (I think) 3.3, where it's possible to
> DoS the hash generator. Hash collisions are always possible, just
> ridiculously unlikely unless deliberately exploited.
>
> (And yes, I know an option was added to older versions to randomize
> the hashes there too. It's not active by default, so "vanilla CPython"
> is still vulnerable.)
>
> ChrisA


Thank you to you and others, who have corrected my over-general
response. I was not intending to claim anything about either a
deliberate DOS, nor a foolishly chosen hash function. But the message I
was replying to seemed to me to claim that for large datasets, even a
good hash algorithm would end up giving O(n) performance.



--

DaveA

 
Reply With Quote
 
Tim Chase
Guest
Posts: n/a
 
      08-10-2012
On 08/09/12 18:33, Mark Lawrence wrote:
> On 10/08/2012 00:24, Roy Smith wrote:
>>> ... you mean, Python lets you make a hash of it?

>>
>> Only if you order it with spam, spam, spam, spam, spam, spam, and spam.

>
> Now now gentlemen we're getting slightly off topic here and wouldn't
> want to upset the people who insist on staying on topic. Or would we?


We apologise for the off-topicness in the thread. Those responsible
have been sacked...

-tkc



 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Performance ordered dictionary vs normal dictionary Navkirat Singh Python 6 07-29-2010 10:18 AM
creating a dictionary from a dictionary with regex james_027 Python 1 08-22-2007 07:39 AM
how to automatically "Save " a page after certain intervals without clicking "Save Page As..." subhadip Java 0 03-28-2007 04:15 PM
[DICTIONARY] - Copy dictionary entries to attributes Ilias Lazaridis Python 6 02-21-2006 11:27 AM
save a dictionary in a file Luis P. Mendes Python 10 11-16-2004 09:40 AM



Advertisments