Performance: sets vs dicts.

 
 
Arnaud Delobelle
 
      08-31-2010
Paul Rubin <(E-Mail Removed)> writes:

> (E-Mail Removed) (Aahz) writes:
>> Possibly; IMO, people should not need to run timeit to determine basic
>> algorithmic speed for standard Python datatypes.

>
> Indeed. Alex Stepanov (designer of C++ Standard Template Library) was
> emphatic that algorithm complexity assertions should be part of the
> interface of any STL function:
>
> http://www.sgi.com/tech/stl/drdobbs-interview.html
>
> He also said it should be part of the "unwritten contract" between the
> module and its user, but I don't understand why he said "unwritten",
> since in the C++ STL the complexity statements are part of the written
> spec.


The spec is written, not the contract.

--
Arnaud
 
 
 
 
 
Jerry Hill
 
      08-31-2010
On Mon, Aug 30, 2010 at 7:42 PM, Aahz <(E-Mail Removed)> wrote:
> Possibly; IMO, people should not need to run timeit to determine basic
> algorithmic speed for standard Python datatypes.


http://wiki.python.org/moin/TimeComplexity takes a stab at it. IIRC,
last time this came up, there was some resistance to making promises
about time complexity in the official docs, since that would make
those numbers part of the language, and thus binding on other
implementations.

--
Jerry
 
 
 
 
 
Aahz
 
      08-31-2010
In article <(E-Mail Removed)>,
Jerry Hill <(E-Mail Removed)> wrote:
>On Mon, Aug 30, 2010 at 7:42 PM, Aahz <(E-Mail Removed)> wrote:
>>
>> Possibly; IMO, people should not need to run timeit to determine basic
>> algorithmic speed for standard Python datatypes.

>
>http://wiki.python.org/moin/TimeComplexity takes a stab at it. IIRC,
>last time this came up, there was some resistance to making promises
>about time complexity in the official docs, since that would make
>those numbers part of the language, and thus binding on other
>implementations.


I'm thoroughly aware of that page and updated it yesterday to make it
easier to find.

However, I think there are some rock-bottom basic guarantees we can make
regardless of implementation. Does anyone seriously think that an
implementation would be accepted that had anything other than O(1) for
index access into tuples and lists? Dicts that were not O(1) for access
with non-pathological hashing? That we would accept sets having O()
performance worse than dicts?

I suggest that we should agree on these guarantees and document them in
the core.
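For a rough illustration (a quick timeit sketch; the absolute numbers are
machine- and CPython-version-dependent, the shape is what matters):

import timeit

setup = """
n = 100000
as_list = list(range(n))
as_set = set(as_list)
as_dict = dict.fromkeys(as_list)
target = n - 1  # worst case for the linear scan
"""

for expr in ("target in as_list",   # O(n): linear scan
             "target in as_set",    # O(1): hash lookup
             "target in as_dict"):  # O(1): hash lookup
    t = timeit.timeit(expr, setup=setup, number=1000)
    print("%-20s %.4f sec" % (expr, t))

Double n and the list scan roughly doubles while the set and dict lookups
stay flat; that is exactly the expectation I claim is rock-bottom.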
--
Aahz ((E-Mail Removed)) <*> http://www.pythoncraft.com/

"...if I were on life-support, I'd rather have it run by a Gameboy than a
Windows box." --Cliff Wells
 
 
Stefan Behnel
 
      08-31-2010
Jerry Hill, 31.08.2010 14:24:
> On Mon, Aug 30, 2010 at 7:42 PM, Aahz wrote:
>> Possibly; IMO, people should not need to run timeit to determine basic
>> algorithmic speed for standard Python datatypes.

>
> http://wiki.python.org/moin/TimeComplexity takes a stab at it. IIRC,
> last time this came up, there was some resistance to making promises
> about time complexity in the official docs, since that would make
> those numbers part of the language, and thus binding on other
> implementations.


The docs are starting to get clearer about what is language specification
and what is implementation-specific to CPython, so this would just add
another couple of CPython-only bits to what's there already. That said, I
consider facts like list indexing being O(1) to sit pretty much at the
language rather than the implementation level: most Python code simply
expects that, and a lot of code would degrade if that ever changed in any
implementation.

Stefan

 
 
Ian
 
      08-31-2010
On 31/08/2010 15:09, Aahz wrote:
> I suggest that we should agree on these guarantees and document them in
> the core.

I suspect that documenting them will be the easy part.


 
 
Jerry Hill
 
      08-31-2010
On Tue, Aug 31, 2010 at 10:09 AM, Aahz <(E-Mail Removed)> wrote:
> I suggest that we should agree on these guarantees and document them in
> the core.


I can't get to the online python-dev archives from work (stupid
filter!) so I can't give you a link to the archives, but the original
thread that resulted in the creation of that wiki page was started on
March 9th, 2008 and was titled "Complexity documentation request".

At the time, opposition to formally documenting this seemed pretty
widespread, including from yourself and Guido. You've obviously
changed your mind on the subject, so maybe it's something that would
be worth revisiting, assuming someone wants to write the doc change.

--
Jerry
 
 
Terry Reedy
 
      08-31-2010
On 8/31/2010 10:09 AM, Aahz wrote:
> In article<(E-Mail Removed)>,
> Jerry Hill<(E-Mail Removed)> wrote:
>> On Mon, Aug 30, 2010 at 7:42 PM, Aahz<(E-Mail Removed)> wrote:
>>>
>>> Possibly; IMO, people should not need to run timeit to determine basic
>>> algorithmic speed for standard Python datatypes.

>>
>> http://wiki.python.org/moin/TimeComplexity takes a stab at it. IIRC,
>> last time this came up, there was some resistance to making promises
>> about time complexity in the official docs, since that would make
>> those numbers part of the language, and thus binding on other
>> implementations.

>
> I'm thoroughly aware of that page and updated it yesterday to make it
> easier to find.
>
> However, I think there are some rock-bottom basic guarantees we can make
> regardless of implementation. Does anyone seriously think that an
> implementation would be accepted that had anything other than O(1) for
> index access into tuples and lists?


Does anyone seriously think that an implementation should be rejected as
an implementation if it intelligently did seq[n] lookups in log2(n)/31
time units for all n (as humans would do), instead of stupidly taking 1
time unit for all n < 2**31 and rejecting all larger values (as 32-bit
CPython does)?

> Dicts that were not O(1) for access with non-pathological hashing?


You would actually be unhappy if small dicts ran even faster than they
do now (while large dicts ran the same)? Not me!

> I suggest that we should agree on these guarantees and document them in
> the core.


I disagree for several reasons.

1. It would be a category mistake. Python is an abstract algorithm
language; the concept of speed does not apply to it. I happen to think it
one of the best, which is why I once dubbed it 'executable pseudocode',
to compare it with the similar but inferior, non-executable algorithm
languages still too often used today. To human readers, as readers, the
speed of CPython or anything else is not relevant.

2. A Python interpreter is an abstract algorithm. Algorithms can be
characterized by operation counts, but not by physical universe time.

3. Algorithms, including Python interpreters, can be embodied in
anything that can compute, including humans, von Neumann machines of
various types, and possible future non-von Neumann machines. To limit
'Python interpreter' to current von Neumann machines would be
short-sighted. Human interpretation of Python code (and other
specifications) is part of development, testing, and debugging.

4. Guarantee? Who will pay what to whom if what is not performed ;-? The
Python license intentionally disclaims any guarantee of performance. If
you are using 'guarantee' metaphorically, perhaps try another word.

5. Accepted? If you want to claim that an interpreter that is useless
for your purposes is not really an interpreter, you are free to, I
suppose. Asking the PSF to impose your view on me would, to me, violate
its diversity policy.

6. O-notation was developed for characterizing the asymptotic (large-N)
behavior of abstract algorithms in terms of abstract operation counts.
Left out are differences between operations, lower order terms,
multiplicative constants, how large N must be to get large-N behavior,
and what happens for smaller N. These and other factors make the
translation of O(f(n)) into real time extremely mushy or even wrong.

I already suggested that O(1) lookup is a symptom of simple-minded
arithmetic stupidity, of doing unnecessary work when adding small
numbers. When it is the result of doing most additions slower than
necessary, it is a bad thing, not a good thing, and a bad criterion for
an acceptable algorithm and an acceptable interpreter. Similarly, books
say that O(n*logn) sorting is 'better' than O(n**2) sorting. However, Tim
Peters verified that the opposite is true for real times for small n,
and hence the current list.sort smartly uses a fast O(n**2) algorithm
for small lists (len < 64) and small runs in larger lists. Rejecting
list.sort for this reason would be stupid.
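
A toy sketch of that cutover idea (this is not CPython's actual timsort,
and the CUTOFF value here is illustrative only):

CUTOFF = 64  # illustrative; CPython's small-run threshold is of this order

def insertion_sort(a):
    # O(n**2) worst case, but with a tiny constant factor for small n
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a

def hybrid_sort(a):
    # Fall back to insertion sort below the cutoff, merge sort above it.
    if len(a) <= CUTOFF:
        return insertion_sort(list(a))
    mid = len(a) // 2
    left, right = hybrid_sort(a[:mid]), hybrid_sort(a[mid:])
    merged, i, j = [], 0, 0   # standard stable two-way merge
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:]); merged.extend(right[j:])
    return merged

The hybrid is asymptotically O(n*logn) yet faster than a pure merge sort
in practice, which is precisely why judging it by its O(n**2) component
would be backwards.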

--
Terry Jan Reedy

 
 
Paul Rubin
 
      09-01-2010
Terry Reedy <(E-Mail Removed)> writes:
> Does anyone seriously think that an implementation should be rejected
> as an implementation if it intelligently did seq[n] lookups in
> log2(n)/31 time units for all n (as humans would do), instead of
> stupidly taking 1 time unit for all n < 2**31 and rejecting all larger
> values (as 32-bit CPython does)?


Er, how can one handle n > 2**31 at all, in 32-bit CPython?

>> Dicts that were not O(1) for access with non-pathological hashing?

>
> You would actually be unhappy if small dicts ran even faster than they
> do now (while large dicts ran the same)? Not me!


Not sure what you are referring to by that. I'd actually be ok with
using O(log(n)) dicts to eliminate the O(n) behavior of the hash-based
implementation on adverse input sets.
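
For what it's worth, a sketch of such an adverse input set: a perfectly
legal but degenerate __hash__ collapses every key into one bucket, so
building a dict of n keys degrades to O(n**2) overall:

import time

class BadKey(object):
    """All instances hash alike, so every insert probes one long chain."""
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 42              # legal, but pathological
    def __eq__(self, other):
        return self.n == other.n

for size in (1000, 2000, 4000):
    keys = [BadKey(i) for i in range(size)]
    start = time.time()
    d = {}
    for k in keys:
        d[k] = None            # each insert scans the prior colliding keys
    # total time roughly quadruples each time size doubles
    print(size, round(time.time() - start, 3))

A balanced-tree dict would hold that to O(n log n) regardless of hash
quality, which is the trade-off I have in mind.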

> Similarly, books say that O(n*logn) sorting is 'better' than O(n**2)
> sorting. However, Tim Peters verified that the opposite is true for
> real times for small n, and hence the current list.sort smartly uses a
> fast O(n**2) algorithm for small lists (len < 64) and small runs in
> larger lists. Rejecting list.sort for this reason would be stupid.


That algorithm is still O(n log n), since the n**2 behavior below some
constant cutoff contributes only a bounded cost per run, which
asymptotically stays within the specified big-O. 64 actually sounds
suspiciously large for the cutover, but I'm sure Tim tuned it carefully.
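
To put rough numbers on that: an insertion-sorted run of length at most
64 costs at most 64*63/2 = 2016 comparisons in the worst case, and an
n-element list holds on the order of n/64 such runs, so the small-run
phase contributes roughly 31.5*n comparisons in total. That is O(n),
which disappears inside O(n log n).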
 
 
Stefan Behnel
 
      09-01-2010
Lie Ryan, 01.09.2010 15:46:
> On 09/01/10 00:09, Aahz wrote:
>> However, I think there are some rock-bottom basic guarantees we can make
>> regardless of implementation. Does anyone seriously think that an
>> implementation would be accepted that had anything other than O(1) for
>> index access into tuples and lists? Dicts that were not O(1) for access
>> with non-pathological hashing? That we would accept sets having O()
>> performance worse than dicts?
>>
>> I suggest that we should agree on these guarantees and document them in
>> the core.

>
> While I think documenting them would be great for all programmers who
> care about practical and theoretical execution speed, I think including
> these implementation details in the core documentation as a "guarantee"
> would be a bad idea, for the reasons Terry outlined.
>
> One way of resolving that is to have two sets of documentation (or two
> separate sections in the documentation):
> - Python -- the language -- documents Python as an abstract language.
> This is the documentation that can be shared across all Python
> implementations, and it will also be the specification of the Python
> language against which other implementations are measured.
> - CPython -- the Python interpreter -- documents implementation details
> and performance metrics. It should be properly noted that these are not
> part of the language per se. This will be the playground for CPython
> experts who need to fine-tune their applications to the last drop of
> blood and don't mind their applications going nuts with the next release
> of CPython.


I disagree. I think putting the "obvious" guarantees right into the normal
documentation will actually make programmers aware that there *are*
different implementations (and differences between them), simply because
it wouldn't just say "O(1)" but "the CPython implementation of this
method has an algorithmic complexity of O(1); other Python implementations
are known to perform alike at the time of this writing". Maybe drop the
last half of that sentence if we really don't know how other
implementations work here, or if we expect there may well be a reason for
them to behave differently; but in most cases it shouldn't be hard to
make the complete statement.

After all, we basically know what other implementations there are, and we
also know that they tend to match the algorithmic complexities at least for
the major builtin types. It seems quite clear to me as a developer that the
set of builtin types and "collections" types was chosen in order to cover a
certain set of algorithmic complexities and not just arbitrary interfaces.

Stefan

 
 
Lie Ryan
 
      09-01-2010
On 09/01/10 00:09, Aahz wrote:
> In article <(E-Mail Removed)>,
> Jerry Hill <(E-Mail Removed)> wrote:
>> On Mon, Aug 30, 2010 at 7:42 PM, Aahz <(E-Mail Removed)> wrote:
>>>
>>> Possibly; IMO, people should not need to run timeit to determine basic
>>> algorithmic speed for standard Python datatypes.

>>
>> http://wiki.python.org/moin/TimeComplexity takes a stab at it. IIRC,
>> last time this came up, there was some resistance to making promises
>> about time complexity in the official docs, since that would make
>> those numbers part of the language, and thus binding on other
>> implementations.

>
> I'm thoroughly aware of that page and updated it yesterday to make it
> easier to find.
>
> However, I think there are some rock-bottom basic guarantees we can make
> regardless of implementation. Does anyone seriously think that an
> implementation would be accepted that had anything other than O(1) for
> index access into tuples and lists? Dicts that were not O(1) for access
> with non-pathological hashing? That we would accept sets having O()
> performance worse than dicts?
>
> I suggest that we should agree on these guarantees and document them in
> the core.


While I think documenting them would be great for all programmers who
care about practical and theoretical execution speed, I think including
these implementation details in the core documentation as a "guarantee"
would be a bad idea, for the reasons Terry outlined.

One way of resolving that is to have two sets of documentation (or two
separate sections in the documentation):
- Python -- the language -- documents Python as an abstract language.
This is the documentation that can be shared across all Python
implementations, and it will also be the specification of the Python
language against which other implementations are measured.
- CPython -- the Python interpreter -- documents implementation details
and performance metrics. It should be properly noted that these are not
part of the language per se. This will be the playground for CPython
experts who need to fine-tune their applications to the last drop of
blood and don't mind their applications going nuts with the next release
of CPython.
 