Py2.3: Feedback on Sets

 
 
Michael Hudson
 
      08-14-2003
"Raymond Hettinger" <(E-Mail Removed)> writes:

> I've gotten lots of feedback on the itertools module
> but have not heard a peep about the new sets module.
>
> * Are you overjoyed/outraged by the choice of | and &
> as set operators (instead of + and *)?


I'd actually rather sets didn't overload any operators at all, but
appreciate that this may be a minority position.

| and & is the only sane choice, however.
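
A minimal sketch of the operators under discussion. It assumes the built-in `set` type of later Python versions rather than the 2.3 `sets.Set` class, and the variable names are only illustrative.

```python
# Assumes the built-in set type of later Python versions, not sets.Set from 2.3.
evens = {0, 2, 4, 6}
small = {0, 1, 2, 3}

union = evens | small          # elements in either set
intersection = evens & small   # elements in both sets

# + is not defined for sets, so it fails rather than acting as union.
try:
    evens + small
    plus_supported = True
except TypeError:
    plus_supported = False
```

Keeping `+` and `*` undefined avoids the sequence connotations (concatenation, repetition) that those operators carry for lists and strings.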

> * Is the support for sets of sets necessary for your work
> and, if so, then is the implementation sufficiently
> powerful?


I don't use them as much as I should, I suspect.

> * Is there a compelling need for additional set methods like
> Set.powerset() and Set.isdisjoint(s) or are the current
> offerings sufficient?


I've not reached for something and not found it there yet.
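
For reference, a rough sketch of what the two proposed methods might compute, written as free functions. Both `powerset` and `isdisjoint` here are hypothetical helpers for illustration, not part of the 2.3 `sets` module, and the code assumes the built-in `set` and `frozenset` types of later Python versions.

```python
from itertools import chain, combinations

def powerset(items):
    """Return every subset of items as a set of frozensets (hypothetical helper)."""
    elems = list(items)
    subsets = chain.from_iterable(
        combinations(elems, size) for size in range(len(elems) + 1)
    )
    return {frozenset(subset) for subset in subsets}

def isdisjoint(a, b):
    """True when a and b share no elements (hypothetical helper)."""
    return not (set(a) & set(b))

subsets_of_ab = powerset("ab")   # the empty set, {'a'}, {'b'}, {'a', 'b'}
```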

> * Does the performance meet your expectations?


My uses so far have not had even the faintest of performance demands,
so, yes.

> * Do you care that sets can only contain hashable elements?


Not yet.
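
A short sketch of what the hashability restriction means in practice, assuming the built-in `set` and `frozenset` types of later Python versions (in the 2.3 module the immutable counterpart is `ImmutableSet`).

```python
# Assumes the built-in set and frozenset types of later Python versions.
try:
    {[1, 2], [3, 4]}          # lists are mutable and unhashable
    lists_allowed = True
except TypeError:
    lists_allowed = False

# Tuples and frozensets are hashable, so they can be set elements.
pairs = {(1, 2), (3, 4)}
set_of_sets = {frozenset({1, 2}), frozenset({3, 4})}
contains_inner = frozenset({2, 1}) in set_of_sets
```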

Cheers,
mwh

--
The website looks like the Hi-Score sheet from a Bullshit Bingo
tournament. -- Dan Holdsworth, asr
 
 
Bob Gailer
 
      08-14-2003
After giving blanket approval to the docs I now add:

I have a mission to set some new guidelines for Python documentation.
Perhaps this is a good place to start.
Example - currently we have:

class Set( [iterable])
Constructs a new empty Set object. If the optional iterable parameter is
supplied, updates the set with elements obtained from iteration. All of the
elements in iterable should be immutable or be transformable to an
immutable using the protocol described in section
<http://www.python.org/doc/current/lib/immutable-transforms.html#immutable-transforms>5.12.3.


Problems:
The result of Set appears to be an empty Set object. The fact that it might
be filled is hidden in the parameter description.
The parameter description itself is hidden in the paragraph, making it
harder to find, especially when the reader is in a hurry.

Some suggested guidelines to improve readability and understandability:
1 - label each paragraph so we know what it is about
2 - have a function paragraph that briefly but completely describes the
function
3 - have labeled sections for things that can be so grouped (e.g. parameters)
4 - start the description of each thing in a new paragraph.

Example:

class Set( [iterable])
function: Constructs a new empty Set object and optionally fills it.
parameters:
    iterable [optional] if supplied, updates the set with elements obtained
    from iteration. All of the elements in iterable should be immutable or
    be transformable to an immutable using the protocol described in section
    <http://www.python.org/doc/current/lib/immutable-transforms.html#immutable-transforms>5.12.3.


What do you think? If this layout is appealing, let's use the set docs as a
starting point to model this approach. I for one am willing to apply this
model to the rest of the set docs, and help update other docs, but not all
of them.

BTW I also have a problem with the term "Common uses". "Common" suggests
that these are better, or more frequent. I suggest "Some examples of
application of sets".

I also agree with the suggestion that operations that are synonymous be so
indicated in the table.

Bob Gailer
(E-Mail Removed)
303 442 2625




 
 
Russell E. Owen
 
      08-14-2003
In article <(E-Mail Removed)>,
Skip Montanaro <(E-Mail Removed)> wrote:

> Russell> I suspect the upgrade issue will significantly slow the
> Russell> incorporation of sets and the other new modules, but that over
> Russell> time they're likely to become quite popular. I am certainly
> Russell> looking forward to using sets and csv.
>
>The csv module (and the _csv module which underpins it) should work with
>2.2.3. If they don't, please file a bug report.


That's excellent news. It might be worth adding it to the documentation,
e.g. "new in version 2.3 but compatible with version 2.2.x" (surely x is
1 (with True/False) or 0 (without), or was there really some needed
feature change in 2.2.3?).

>That was the intention with the csv module. I wonder if some limitations to
>use of sets with 2.2.x could be gotten around by adding a __future__ import?
>Maybe itertools is also needed.


That is an interesting question. Mind you, I have no idea if sets is
compatible with 2.2.x or not; I didn't try since it wasn't documented
and I didn't want to risk missing some obscure bug.

-- Russell
 
 
John Baxter
 
      08-15-2003
In article <bhe3cr$8n0$(E-Mail Removed)>,
"Andrew Dalke" <(E-Mail Removed)> wrote:

> I read some mention of using "|" instead of "+", so I knew
> to use it. I would have liked +, but not *. I know the logic
> for thinking * but & doesn't have the other connotations
> * has (like [1] * 2, "a"*9)
>
> > * Is the support for sets of sets necessary for your work
> > and, if so, then is the implementation sufficiently
> > powerful?


After years of using Python without sets, I hand built a specialized
intersection a couple of months ago. Knowing the Sets module was
coming, I did only what I needed at that moment, and didn't bother
optimizing it (it takes a few seconds to do what I need...removing a
second or two isn't useful). (I worked around a "need" for difference
by changing the input generation in the overall problem.)

So..."necessary" is too strong here, but "a good thing" is certainly
apt. If I only get to choose yes or no for "necessary" the answer is
"yes".

--John

--
Email to above address discarded by provider's server. Don't bother sending.
 
 
Raymond Hettinger
 
      08-15-2003
"Istvan Albert"
> One pattern that I constantly need is to remove duplicates from
> a sequence. I don't know if this is an often enough used pattern to
> warrant an API change, for me it would be most useful if I could
> get the contents of a set as a sequence right away, without having to
> explicitly code it.



>>> from sets import Set
>>> list(Set('abracadbra'))
['a', 'r', 'b', 'c', 'd']
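
The same duplicate-removal pattern as a small sketch, assuming the built-in `set` type of later Python versions. Set iteration order is arbitrary, so the second form is one possible alternative when first-seen order matters; it relies on dicts preserving insertion order (Python 3.7 and later).

```python
# Assumes the built-in set type of later Python versions.
letters = list(set("abracadabra"))      # duplicates removed, order arbitrary

# If first-seen order matters, dict.fromkeys is one alternative
# (relies on insertion-ordered dicts, Python 3.7+).
ordered = list(dict.fromkeys("abracadabra"))
```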



> > * Are the docs clear? Can you suggest improvements?
>
> I wondered whether it would be better to specify the immutability
> of the class at the constructor level.


ImmutableSet is available as a constructor.


> Then there is the update method. It feels a little bit redundant
> since there is an add() method that seems to be doing the same thing
> only that add() adds only one element at a time.
> Would it be possible to have add() handle all additions, iterable or
> not, then scrap update() altogether.


Not really.
Set.update() is for vectorizing high volume additions.
There is some analogy to list.append() vs. list.extend().
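
A sketch of the analogy, assuming the built-in `set` type of later Python versions. It also hints at why a single add() could not easily cover both cases: a string is at once a hashable element and an iterable of characters.

```python
s = set()
s.add("abc")            # one element: the string itself
s.update("xyz")         # three elements: 'x', 'y', 'z'

items = []
items.append("abc")     # one element
items.extend("xyz")     # three elements
```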


> Then just by looking at the docs, it feels a little bit confusing to
> have discard() and remove() do essentially the same thing but only one
> of them raising an exception. Which one? I already forgot. I don't know
> which one I would prefer though.


Will clarify the docs.
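
In the meantime, a small sketch of the difference, assuming the built-in `set` type of later Python versions: remove() raises KeyError for a missing element, while discard() does nothing.

```python
s = {1, 2, 3}
s.discard(99)           # missing element: silently does nothing

try:
    s.remove(99)        # missing element: raises KeyError
    remove_raised = False
except KeyError:
    remove_raised = True

s.remove(1)             # present element: removed
s.discard(2)            # present element: removed
```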


> Another aspect that I did not understand, what is difference between
> update() and union_update().


update() works with any iterable and union_update() only with another Set.
If the API is liberalized to allow any iterable for most operations, then
the distinction will vanish.
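
A sketch of a similar split as it appears in the built-in `set` type of later Python versions, which is an assumption here (that type has no union_update() method): update() takes any iterable, while the in-place union operator insists on a set operand.

```python
s = {1, 2}
s.update([2, 3], (4,))     # any iterables are accepted
s |= {5}                   # in-place union operator needs a set operand

try:
    s |= [6]               # a list operand is rejected
    list_operand_ok = True
except TypeError:
    list_operand_ok = False
```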



> The long winded method names, such as difference_update() also feel
> redundant when one can achieve the same thing with the -= operator. I
> would drop these and instead show in the docs how to accomplish these
> with the operators. Would considerably cut down on the documentation,
> and apparent complexity.


That is a good thought; however,
some find a.union(b) to be more readable than a|b
and some find that a.symmetric_difference is more memorable than a^b.
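
A sketch of the equivalence between the named methods and the operators, assuming the built-in `set` type of later Python versions.

```python
a = {1, 2, 3}
b = {3, 4}

same_union = a.union(b) == (a | b)
same_symdiff = a.symmetric_difference(b) == (a ^ b)

# In-place forms: difference_update() and -= both mutate the left operand.
c = set(a)
c.difference_update(b)
d = set(a)
d -= b
```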



> For example methods like x.issubset(y) is the same as bool(x-y) so may
> not be all that necessary, just a thought.


Granted. However:

* issubset has an early-out algorithm and consumes constant memory.
In contrast, bool(x-y) builds a whole new set and then throws it away.
* issubset and issuperset are somewhat basic set operations
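
A sketch of the two approaches, assuming the built-in `set` type of later Python versions. Note that a subset test corresponds to an empty difference, i.e. `not (x - y)`. The `issubset_early_out` function is a hypothetical illustration of the early-out idea, not the actual sets.py implementation.

```python
x = {1, 2}
y = {1, 2, 3}

via_method = x.issubset(y)
via_difference = not (x - y)    # builds a temporary set first

def issubset_early_out(small, big):
    """Hypothetical sketch of an early-out subset test."""
    for elem in small:
        if elem not in big:
            return False        # stop at the first missing element
    return True
```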



> > * Are sets helpful in your daily work or does the need arise
> > only rarely?
>
> I use them very often and they are extremely useful.


Me too.


Raymond Hettinger


 
 
Terry Reedy
 
      08-15-2003

"Raymond Hettinger" <(E-Mail Removed)> wrote in message
news:3b__a.9694$u%(E-Mail Removed)...
> "Istvan Albert"
> > Then just by looking at the docs, it feels a little bit confusing to
> > have discard() and remove() do essentially the same thing but only one
> > of them raising an exception. Which one? I already forgot. I don't know
> > which one I would prefer though.


I agree that this is confusing -- like having both str.find and
str.index. I would prefer one delete function with an optional param
'silent' to switch its 'not there' response from the default (either
True or False, according to what seems to be the more common usage) to
the other choice. (I know, I should have read draft more carefully
and commented last fall -- but this seems like the sort of redundancy
that Guido wants to remove in 3.0.)
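
A sketch of what such a single delete function might look like. The name `delete`, the `silent` parameter, and its default are all hypothetical, taken only from the suggestion above; nothing like this exists in the sets module.

```python
def delete(s, elem, silent=False):
    """Hypothetical single delete function; not part of any real set API."""
    if elem in s:
        s.remove(elem)
    elif not silent:
        raise KeyError(elem)

s = {1, 2}
delete(s, 1)                 # present element: removed
delete(s, 99, silent=True)   # missing element: ignored
```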

Terry J. Reedy


 
 
Gerrit Holl
 
      08-15-2003
Raymond Hettinger wrote:
> Subject: Py2.3: Feedback on Sets


> * Do you care that sets can only contain hashable elements?


This is the only disadvantage for me.

For the rest, I am happy about it. I am already using it a lot
in places where I used lists before, but where a Set is much
better (no order, no duplicates, it really *is* a set).

> User feedback is essential to determining the future direction
> of sets (whether it will be implemented in C, change API,
> and/or be given supporting language syntax).


I really like them. I would also like to be able to do
{elem for elem in set if foo(elem)} to construct a subset.
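
This syntax was later added to the language as set comprehensions (Python 2.7 and 3.0). A minimal sketch, where `foo` is a hypothetical predicate standing in for the one in the post:

```python
def foo(elem):
    """Hypothetical predicate standing in for the foo() in the post."""
    return elem % 2 == 0

numbers = {1, 2, 3, 4, 5, 6}
subset = {elem for elem in numbers if foo(elem)}
```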

Gerrit.

--
255. If he sublet the man's yoke of oxen or steal the seed-corn,
planting nothing in the field, he shall be convicted, and for each one
hundred gan he shall pay sixty gur of corn.
-- 1780 BC, Hammurabi, Code of Law
--
Asperger's Syndrome - a personal approach:
http://people.nl.linux.org/~gerrit/
These are times to get involved in politics yourself:
http://www.sp.nl/

 
 
Raymond Hettinger
 
      08-15-2003
"Russell E. Owen"
> I don't rely on sets heavily (I do have a few implemented as
> dictionaries with value=None) and am not yet ready to make my users
> upgrade to Python 2.3.
>
> I suspect the upgrade issue will significantly slow the incorporation of
> sets and the other new modules, but that over time they're likely to
> become quite popular. I am certainly looking forward to using sets and
> csv.
>
> I think it'd speed the adoption of new modules if they were explicitly
> written to be compatible with one previous generation of Python (and
> documented as such) so users could manually include them with their code
> until the current generation of Python had a bit more time to be adopted.


Wish granted!

The sets module now will run under Py2.2.
It should be available for download from CVS after 24 hours:
http://cvs.sourceforge.net/cgi-bin/v...src/Lib/sets.py


Raymond Hettinger


 
 
Raymond Hettinger
 
      08-16-2003
"Gary Feldman"
> >* Are the docs clear? Can you suggest improvements?
>
> I haven't used them yet, but since I'm working my way through
> the docs in general, I thought I'd check them out and comment.


All of the issues you found have been fixed (except for the discussion of
what an iterable parameter means -- that will be addressed elsewhere).


Raymond Hettinger


 
 
Raymond Hettinger
 
      08-17-2003
"John Smith"
> Suggestion: How about adding Set.isProperSubset() and
> Set.isProperSuperset()?


We have them in operator form: a < b and a > b.
Spelling them out did not seem to add much value.
This is doubly true because some people read it
as s.isProperSubsetOf(t) and others read it as
s.hasTheProperSubset(t).
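
A sketch of the operator forms, assuming the built-in `set` type of later Python versions.

```python
a = {1, 2}
b = {1, 2, 3}

proper_subset = a < b         # a is a subset of b and a != b
proper_superset = b > a       # b is a superset of a and b != a
not_proper = a < a            # a set is never a proper subset of itself
subset_or_equal = a <= a      # the non-strict form allows equality
```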


Raymond Hettinger

> Thanks for this wonderful module. I've been working on data mining and
> machine learning area using Python. Set operations are very important to me.


Great. You'll love it even more when I implement it in C.



Raymond Hettinger


 
 
 
 