Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Are lists at least as efficient as dictionaries?

Reply
Thread Tools

Are lists at least as efficient as dictionaries?

 
 
Narendra C. Tulpule
Guest
Posts: n/a
 
      08-29-2003
Hi,
if you know the Python internals, here is a newbie question for you.
If I have a list with 100 elements, each element being a long string,
is it more efficient to maintain it as a dictionary (with a key = a
string from the list and value = None) for the purpose of insertion
and removal?
Basically, if Python really implements lists as linked lists but
dictionaries as hash tables, it may well be that hashing a key takes
negligible time as compared to comparing it against every list
element.
Oh and here is (as a non-sequiter) something I don't understand
either:
>>> x = ([],)
>>> x[0] += ['something']

Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: object doesn't support item assignment
>>> x

(['something'],) <---- complained but did it anyway??
>>> x[0].append('and another thing') <------- no complaint!
>>> x

(['something', 'and another thing'],)
 
Reply With Quote
 
 
 
 
Peter Otten
Guest
Posts: n/a
 
      08-29-2003
Narendra C. Tulpule wrote:

> If I have a list with 100 elements, each element being a long string,
> is it more efficient to maintain it as a dictionary (with a key = a
> string from the list and value = None) for the purpose of insertion
> and removal?


Wild guess: I suppose that both implementations will not significantly
affect the overall speed of your application, e.g. if the strings are
*really* large (as opposed to the list of *only* 100 elements), reading
from disk will take much longer than inserting into the list, even at
arbitrary positions.

Also, note that no particular order is preserved for the keys in a
dictionary.

And now for something completely different:

>>>> x = ([],)
>>>> x[0] += ['something']

> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> TypeError: object doesn't support item assignment


+= calls the list.__iadd__(self, other) method, which seems to be
implemented as

def __iadd__(self, other):
self.append(other)
return self

The call of this method succeds, but the following assignment fails, because
tuples are immutable. This could only be remedied if all assignments

a = a

were silently ignored, or, if the += operator would not perform an
assignment, which has the disadvantage that it would no longer work for
immutables, so that
>>> i = 1
>>> i += 1
>>> print i

1 # admittedly faked
>>>


You could change __iadd__() to

def __iadd__(self, other):
return self + other

but then sane behaviour in one special case comes at the cost of always
creating a copy of a potentially large (say more than 100 items list.

By the way, one topic per post is always a good idea

Peter
 
Reply With Quote
 
 
 
 
Alex Martelli
Guest
Posts: n/a
 
      08-29-2003
Narendra C. Tulpule wrote:

> Hi,
> if you know the Python internals, here is a newbie question for you.
> If I have a list with 100 elements, each element being a long string,
> is it more efficient to maintain it as a dictionary (with a key = a
> string from the list and value = None) for the purpose of insertion
> and removal?


If you use a dictionary, there will be no "intrinsic" ordering -- you
may as well use sets, then. "Insertion" for a list would thus just
be an .append, very fast. "removal" may be O(N) for a list, if you
need to search through it for occurrences.

> Basically, if Python really implements lists as linked lists but
> dictionaries as hash tables, it may well be that hashing a key takes
> negligible time as compared to comparing it against every list
> element.


Python's lists are actually vectors, dicts are indeed hash tables.
Hashing a long string does take some time, but the value may be
cached (depending on the Python implementation) so that said time
need be spent only once for a given string object (but separate
string objects will spend the time multiply, even if it so happens
that their values coincide).

All in all, there's no serious alternative to benchmarking both
possibilities in a realistic scenario. Quite likely you may find
out that -- for sensible frequencies of insertiom / removal --
the time doesn't matter for your application overall (dominated
by I/O or other issues), and then you can use the simplest rather
than the fastest approach. At least 9 times out of 10 you will
be happiest with simplicity. If you don't care about ordering,
Python 2.3's sets are likely the simplest data structure (they
are implemented in terms of dictionaries, thus pretty fast too).


Alex


 
Reply With Quote
 
Jp Calderone
Guest
Posts: n/a
 
      08-29-2003
On Fri, Aug 29, 2003 at 10:07:20AM +0200, Peter Otten wrote:
> Narendra C. Tulpule wrote:
>

[ snip]
>
> And now for something completely different:
>
> >>>> x = ([],)
> >>>> x[0] += ['something']

> > Traceback (most recent call last):
> > File "<stdin>", line 1, in ?
> > TypeError: object doesn't support item assignment

>
> += calls the list.__iadd__(self, other) method, which seems to be
> implemented as
>
> def __iadd__(self, other):
> self.append(other)
> return self


self.extend(other), actually. But that's the basic idea, yea.

>
> The call of this method succeds, but the following assignment fails, because
> tuples are immutable. This could only be remedied if all assignments
>
> a = a
>
> were silently ignored, or, if the += operator would not perform an
> assignment, which has the disadvantage that it would no longer work for
> immutables, so that
> >>> i = 1
> >>> i += 1
> >>> print i

> 1 # admittedly faked
> >>>


+= could simply be syntactic sugar for a call to __add__ and then an
assignment. This would work for mutable and immutable objects.

>
> You could change __iadd__() to
>
> def __iadd__(self, other):
> return self + other
>
> but then sane behaviour in one special case comes at the cost of always
> creating a copy of a potentially large (say more than 100 items list.


True, except that list.extend() exists.

Jp

--
"There is no reason for any individual to have a computer in their
home."
-- Ken Olson, President of DEC, World Future Society
Convention, 1977

 
Reply With Quote
 
John Roth
Guest
Posts: n/a
 
      08-29-2003

"Narendra C. Tulpule" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) om...
> Hi,
> if you know the Python internals, here is a newbie question for you.
> If I have a list with 100 elements, each element being a long string,
> is it more efficient to maintain it as a dictionary (with a key = a
> string from the list and value = None) for the purpose of insertion
> and removal?


Lists are more efficient for lookup by index, dictionaries are
more efficient for insertion (except at the end, where Python
maintains extra slots for just this case), removal and lookup
by key. Lists are also much more efficient for sequential
traversal.

John Roth


 
Reply With Quote
 
Jp Calderone
Guest
Posts: n/a
 
      08-30-2003
On Fri, Aug 29, 2003 at 10:24:37AM -0700, Chad Netzer wrote:
> On Fri, 2003-08-29 at 07:54, Jp Calderone wrote:
>
> > += could simply be syntactic sugar for a call to __add__ and then an
> > assignment. This would work for mutable and immutable objects.

>
> But it loses the advantage that some objects would otherwise have of
> being able to mutate in place, without allocating a new object (ie. very
> large matrix additions).
>


But as you removed from my original post, list.extend() exists. All one
has to do to retain the existing functionality of __iadd__ is name the
method something else, then call it. All the advantages, none of the
confusing or difficult to track semantics.

Jp

 
Reply With Quote
 
Chad Netzer
Guest
Posts: n/a
 
      08-30-2003
On Sat, 2003-08-30 at 00:03, Jp Calderone wrote:

> But as you removed from my original post, list.extend() exists. All one
> has to do to retain the existing functionality of __iadd__ is name the
> method something else, then call it. All the advantages, none of the
> confusing or difficult to track semantics.


True, however when working with large Matrices, I much prefer A += B, to
A.add(B), and the performance advantages with __iadd__ can be
substantial.

I prefer the original posters suggestion that self assignment be
ignored. But I see your point about the more difficult semantics of
__iadd__.


--
Chad Netzer


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Need regular expression for at least 7 characters and at least 1 special chatacter AAaron123 ASP .Net 0 10-03-2008 01:25 PM
Efficient Rank Ordering of Nested Lists pablo.mitchell@gmail.com Python 11 08-05-2007 10:29 PM
List of lists of lists of lists... =?UTF-8?B?w4FuZ2VsIEd1dGnDqXJyZXogUm9kcsOtZ3Vleg==?= Python 5 05-15-2006 11:47 AM
Efficient implementation of deeply nested lists Kay Schluehr Python 7 01-20-2006 08:27 AM
efficient intersection of lists with rounding Gordon Williams Python 9 12-06-2004 04:30 PM



Advertisments