Performance: sets vs dicts.

 
 
John Bokma
 
      09-02-2010
Robert Kern <(E-Mail Removed)> writes:

> On 9/1/10 4:40 PM, John Bokma wrote:
>> Arnaud Delobelle<(E-Mail Removed)> writes:
>>
>>> Terry Reedy<(E-Mail Removed)> writes:


[...]

>>> I don't understand what you're trying to say. Aahz didn't claim that
>>> random list element access was constant time, he said it was O(1) (and
>>> that it should be part of the Python spec that it is).

>>
>> Uhm, O(1) /is/ constant time, see page 45 of Introduction to Algorithms,
>> 2nd edition.

>
> While we often use the term "constant time" as a synonym for O(1)
> complexity of an algorithm, Arnaud and Terry are using the term here
> to mean "an implementation takes roughly the same amount of wall-clock
> time every time".


Now that's confusing in a discussion that earlier on provided a link to
a page using big O notation. At least for people following this only
partially, as I do.

--
John Bokma j3b

Blog: http://johnbokma.com/ Facebook: http://www.facebook.com/j.j.j.bokma
Freelance Perl & Python Development: http://castleamber.com/
 
 
 
 
 
rurpy@yahoo.com
 
      09-02-2010
On 09/01/2010 04:51 PM, Raymond Hettinger wrote:
> On Aug 30, 6:03 am, (E-Mail Removed) (Aahz) wrote:
> That reminds me: one co-worker (who really should have known better)
> had the impression that sets were O(N) rather than O(1). Although
> writing that off as a brain-fart seems appropriate, it's also the case
> that the docs don't really make that clear; it's only implied by the
> requirement that elements be hashable. Do you agree that there should
> be a comment?

>
> There probably ought to be a HOWTO or FAQ entry on algorithmic complexity
> that covers classes and functions where the algorithms are interesting.
> That will concentrate the knowledge in one place where performance is a
> main theme and where the various alternatives can be compared and
> contrasted.



> I think most users of sets rarely read the docs for sets. The few lines
> in the tutorial are enough so that most folks "just get it" and don't read
> more detail unless they're attempting something exotic.


I think that attitude is very dangerous. There is
a long history in this world of one group of people
presuming what another group of people does or does
not do or think. This seems to be a characteristic
of human beings and is often used to promote one's
own ideology. And even if you have hard evidence
for what you say, why should 60% of people who don't
read docs justify providing poor quality docs to
the 40% that do?

So while you may "think" most people rarely read
the docs for basic language features and objects
(I presume you don't mean to restrict your statement
to only sets), I and most people I know *do* read
them. And when I read them I expect them, as any good
reference documentation does, to completely and
accurately describe the behavior of the item I am
reading about. If big-O performance is deemed an
intrinsic behavior of an (operation of) an object,
it should be described in the documentation for
that object.

Your use of the word "exotic" is also suspect.
I learned long ago to always click the "advanced
options" box on dialogs because most developers/-
designers really don't have a clue about what
users need access to.

> Our docs have gotten
> somewhat voluminous,


No they haven't (relative to what they attempt to
describe). The biggest problem with the docs is
that they are too terse. They often appear to have
been written by people playing a game of "who can
describe X in the minimum number of words that can
still be defended as correct." While that may be
fun, good docs are produced by considering how to
describe something to the reader, completely and
accurately, as effectively as possible. The test
is not how few words were used, but how quickly
the reader can understand the object or find the
information being sought about the object.

> so it's unlikely that adding that particular
> needle to the haystack would have cured your colleague's "brain-fart"
> unless he had been focused on a single document talking about the
> performance characteristics of various data structures.


I don't know the colleague any more than you do, so I
feel comfortable saying that having it documented very likely
*would* have cured that brain-fart. That is, he
or she very likely would have needed to check some
behavior of sets at some point and would have either
noted the big-O characteristics in passing, or would
have noted that such information was available, and
would have returned to the documentation when the
need for that information arose. The reference
description of sets is the *one* canonical place to
look for information about sets.

There are people who don't read documentation, but
one has to be very careful not to use the existence
of such people as an excuse to justify sub-standard
documentation.

So I think relegating algorithmic complexity information
to some remote document far from the description of the
object it pertains to is exactly the wrong approach.
This is not to say that a performance HOWTO or FAQ
in addition to the reference manual would not be good.

 
 
 
 
 
Terry Reedy
 
      09-02-2010
On 9/1/2010 10:57 PM, (E-Mail Removed) wrote:

> So while you may "think" most people rarely read
> the docs for basic language features and objects
> (I presume you don't mean to restrict your statement
> to only sets), I and most people I know *do* read
> them. And when read them I expect them, as any good
> reference documentation does, to completely and
> accurately describe the behavior of the item I am
> reading about. If big-O performance is deemed an
> intrinsic behavior of an (operation of) an object,
> it should be described in the documentation for
> that object.


However, big-O performance is intentionally NOT so deemed. And I have
argued, and would continue to argue, that it should not be, for multiple
reasons.

Performance is a feature of implementations, and I think that is where it
should be documented.

> This is not to say that a performance HOWTO or FAQ
> in addition to the reference manual would not be good.


I have writing a draft of such for CPython on my personal todo list.

--
Terry Jan Reedy

 
 
Terry Reedy
 
      09-02-2010
On 9/1/2010 8:11 PM, John Bokma wrote:
> Terry Reedy<(E-Mail Removed)> writes:
>
>> On 9/1/2010 5:40 PM, John Bokma wrote:

>
> [..]
>
>> Yes, I switched, because 'constant time' is a comprehensible claim
>> that can be refuted and because that is how some will interpret O(1)
>> (see below for proof).

>
> You make it now sound as if this interpretation is not correct or out of
> place.


Saying that an interpretation is comprehensible and common is hardly a
putdown. It simply is not unique. I also said that the other
interpretation is not coherent for size-limited problems. However, if
you mean 'constant', whatever you mean by that, why not say so?

Here is the technical definition given in
https://secure.wikimedia.org/wikiped...Big_O_notation
I assure you that it is the standard one invented by a mathematician
over a century ago and used by mathematicians ever since. It was
popularized in computer science by Donald Knuth in 1976 and used in The
Art of Computer Programming. It is related to the definition of limits.
'''
Formal definition

Let f(x) and g(x) be two functions defined on some subset of the real
numbers. One writes

f(x) = O(g(x)) as x -> infinity

if and only if, for sufficiently large values of x, f(x) is at most a
constant multiplied by g(x) in absolute value. That is, f(x) = O(g(x))
if and only if there exists a positive real number M and a real number
x0 such that

|f(x)| <= M |g(x)| for all x > x0.
'''
For g(x) == 1, this reduces to |f(x)| <= M for some M (and large x),
which is to say, f(x) is bounded (at least for large x).

Hmmm. By that definition, 1/x is O(1). Let x0 be anything greater than 0
and M = 1/x0. Some might find that disturbing. But it is the nature of
the notation, as with limit notation, that it says *nothing* about the
behavior of f for small x.
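
For the curious, here is a throwaway spot-check of that bound in Python
(the choice of x0 = 1, and hence M = 1/x0 = 1, is arbitrary; any
positive x0 would do):

# Numerically spot-check |f(x)| <= M * |g(x)| for f(x) = 1/x, g(x) = 1,
# with x0 = 1 and M = 1/x0 = 1 (values chosen just for illustration).
M = 1.0
assert all(abs(1.0 / x) <= M for x in range(2, 100000))
print("1/x stays within the bound M = 1 for every sampled x > x0 = 1")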

> People who have bothered to read ItA will use O(1) and constant
> time interchangeably while talking of the order of growth of the running
> time of algorithms


People who have bothered to read the Bible will use terms the way they
are used in the Bible. So what? In other words, I do not regard ItA as
scripture in the way you seem to. (I presume you mean the Cormen, et al.
book. I partially read a library copy over 20 years ago but never
bothered to buy a copy. I see it is now into its third edition.) If they
disagree with Knuth and most others (which I would not claim without
seeing their actual text) then so much the worse for them.

> and most of those are aware that 'big oh' hides a
> constant, and that in the real world a O(log n) algorithm can outperform
> an O(1) algorithm for small values of n.


Right. And if 'small values of n' include all possible values, then
rejecting a particular O(log n) algorithm as 'unacceptable' relative to
all O(1) algorithms is pretty absurd.
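
To make that concrete, here is a rough timeit sketch (the specific
structures and sizes are just illustrative, and the numbers depend on
machine and Python version; the point is only that for small, bounded n
the hidden constants can matter more than the growth rate):

# Rough sketch: compare an O(1) average-case lookup (dict) with an
# O(log n) lookup (binary search on a sorted list) for a small n.
import bisect
import timeit

n = 10
keys = list(range(n))
d = dict.fromkeys(keys)     # hash-based, O(1) average case
sorted_keys = sorted(keys)  # binary search, O(log n)

def dict_lookup():
    return (n - 1) in d

def bisect_lookup():
    i = bisect.bisect_left(sorted_keys, n - 1)
    return i < len(sorted_keys) and sorted_keys[i] == n - 1

print("dict  :", timeit.timeit(dict_lookup, number=100000))
print("bisect:", timeit.timeit(bisect_lookup, number=100000))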

>>> Uhm, O(1) /is/ constant time, see page 45 of Introduction to Algorithms,
>>> 2nd edition.


Given that the Wikipedia article references that section also, I wonder
if it really disagrees with the definition above.

--
Terry Jan Reedy

 
 
Arnaud Delobelle
 
      09-03-2010
Terry Reedy <(E-Mail Removed)> writes:

> On 9/1/2010 8:11 PM, John Bokma wrote:

[...]
>>>> Uhm, O(1) /is/ constant time, see page 45 of Introduction to Algorithms,
>>>> 2nd edition.

>
> Given that the Wikipedia article references that section also, I
> wonder if it really disagrees with the definition above.


In simple terms, O(1) is *bounded* time.

This is not the same as constant, even though it is true that the
misleading expression "constant time" is widely used. I emphasized the
difference as you seemed to say that describing list element access as
O(1) was misleading because for example accessing contiguous elements
would be significantly faster than accessing distant ones. This may or
may not be the case, but it remains true that this time is bounded and
this is what O(1) really means.
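
To illustrate what "bounded" means in practice, here is a quick timing
sketch (purely illustrative; the list size and the exact figures are
machine-dependent, and the point is only that none of the accesses grows
with the index):

# CPython lists are arrays of pointers, so indexing is a bounded-time
# offset calculation wherever the element sits. Caching and timer noise
# may still make individual measurements differ a little.
import timeit

lst = list(range(10 ** 6))
for idx in (0, len(lst) // 2, len(lst) - 1):
    t = timeit.timeit(lambda: lst[idx], number=1000000)
    print("lst[%d]: %.4f s" % (idx, t))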

--
Arnaud
 
 
rurpy@yahoo.com
 
      09-03-2010
On 09/02/2010 02:47 PM, Terry Reedy wrote:
> On 9/1/2010 10:57 PM, (E-Mail Removed) wrote:
>
>> So while you may "think" most people rarely read
>> the docs for basic language features and objects
>> (I presume you don't mean to restrict your statement
>> to only sets), I and most people I know *do* read
>> them. And when read them I expect them, as any good
>> reference documentation does, to completely and
>> accurately describe the behavior of the item I am
>> reading about. If big-O performance is deemed an
>> intrinsic behavior of an (operation of) an object,
>> it should be described in the documentation for
>> that object.

>
> However, big-O performance is intentionally NOT so deemed.


The discussion, as I understood it, was about whether
or not it *should* be so deemed.

> And I have
> and would continue to argue that it should not be, for multiple reasons.


Yes, you have. And others have argued the opposite.
Personally, I did not find your arguments very convincing,
particularly that it would be misleading or that the
limits necessarily imposed by a real implementation
somehow invalidates the usefulness of O() documentation.
But I acknowledged that there was not universal agreement
that O() behavior should be documented in the reference
docs by qualifying my statement with the word "if".

But mostly my comments were directed towards some of the
side comments in Raymond's post I thought should not pass
unchallenged. I think that some of the attitudes expressed
(and shared by others) are likely the direct cause of many
of the faults I find in the current documentation.

 
 
John Bokma
 
      09-03-2010
Terry Reedy <(E-Mail Removed)> writes:

> On 9/1/2010 8:11 PM, John Bokma wrote:


[...]

> Right. And if 'small values of n' include all possible values, then
> rejecting a particular O(log n) algorithm as 'unacceptable' relative
> to all O(1) algorithms is pretty absurd.


I have little time, but want to reply to this one: anyone using big-Oh
and claiming that an O(log n) algorithm is 'unacceptable' relative to
all O(1) algorithms has no clue what he/she is talking about.

big-Oh says something about the order of /growth/ of the running time of
an algorithm. And since 1 is a constant, O(1) means that the order of
/growth/ of the running time is constant (independent of the input
size). Since "the growth of the running time is constant" is quite a
mouthful, it's often shortened to 'constant time', since from the
context it's clear what is meant. But this doesn't mean that an
algorithm given an input of size 100000 takes the same number of
seconds to process it as an input of size 1.
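
A quick sketch of what I mean (the set sizes are arbitrary and the exact
numbers are machine-dependent; what matters is that the times stay
roughly flat as the set grows, not that they come out identical):

# Membership tests on a tiny set and on a much larger one. O(1)
# average-case lookup means the cost does not grow with the size of the
# set, not that the two measurements agree to the microsecond.
import timeit

small = {0}
large = set(range(100000))

print("small set:", timeit.timeit(lambda: 0 in small, number=1000000))
print("large set:", timeit.timeit(lambda: 99999 in large, number=1000000))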

--
John Bokma j3b

Blog: http://johnbokma.com/ Facebook: http://www.facebook.com/j.j.j.bokma
Freelance Perl & Python Development: http://castleamber.com/
 
 
Robert Kern
 
      09-03-2010
On 9/3/10 12:08 PM, John Bokma wrote:
> Terry Reedy<(E-Mail Removed)> writes:
>
>> On 9/1/2010 8:11 PM, John Bokma wrote:

>
> [...]
>
>> Right. And if 'small values of n' include all possible values, then
>> rejecting a particular O(log n) algorithm as 'unacceptable' relative
>> to all O(1) algorithms is pretty absurd.

>
> I have little time, but want to reply to this one: anyone using big-Oh
> and claiming that an O(log n) algorithm is 'unacceptable' relative to
> all O(1) algorithms has no clue what he/she is talking about.
>
> big-Oh says something about the order of /growth/ of the running time of
> an algorithm. And since 1 is a constant O(1) means that the order of
> /growth/ of the running time is constant (independend of the input
> size.


That's an ambiguous wording that is likely to confuse people. It seems like you
are saying that the O() behavior is the order of the first derivative of the
running time as a function of some interesting parameter of the problem, which
it is not. O() notation *describes* the growth, but it *is not* the order of the
growth itself.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

 