David Eppstein <(E-Mail Removed)> writes:

> In article <(E-Mail Removed) .com>,

> "Xah Lee" <(E-Mail Removed)> wrote:

>

>> parti(aList, equalFunc)

>>

>> given a list aList of n elements, we want to return a list that is a

>> range of numbers from 1 to n, partition by the predicate function of

>> equivalence equalFunc. (a predicate function is a function that

>> takes two arguments, and returns either True or False.)

>

> In Python it is much more natural to use ranges from 0 to n-1.

> In the worst case, this is going to have to take quadratic time

> (consider an equalFunc that always returns false) so we might as well do

> something really simple rather than trying to be clever.
As you say, with the spec as it stands, you can't do better than

quadratic time (although it's O(n*m) where m is the number of

partitions, rather than O(n^2)).

You can do a lot better if you can use a "key" function, rather than

an "equivalence" function, much as list.sort has a "key" argument, and

itertools.groupby (which is pretty close in function to this

partitioning problem) uses a key argument.

In fact, I'd have difficulty thinking of an example where I'd want a

partition function as specified, in Python. In Perl, it makes a lot of

sense, as Perl's array indexing operations lend themselves to slinging

round lists of indices like this. But in Python, I'd be far more

likely to use list.sort followed by itertools.groupby - sort is stable

(so doesn't alter the relative order within equivalence classes), and

groupby then picks out the equivalence classes:

>>> elements = [['x', 'x', 'x', '1'],
.... ['x', 'x', 'x', '2'],

.... ['x', 'x', 'x', '2'],

.... ['x', 'x', 'x', '2'],

.... ['x', 'x', 'x', '3'],

.... ['x', 'x', 'x', '4'],

.... ['x', 'x', 'x', '5'],

.... ['x', 'x', 'x', '5']]

>>> # No need to sort here, as the elements are already sorted!

>>> from pprint import pprint

>>> pprint([(k, list(v)) for k, v in groupby(elements, itemgetter(3))])
[('1', [['x', 'x', 'x', '1']]),

('2', [['x', 'x', 'x', '2'], ['x', 'x', 'x', '2'], ['x', 'x', 'x', '2']]),

('3', [['x', 'x', 'x', '3']]),

('4', [['x', 'x', 'x', '4']]),

('5', [['x', 'x', 'x', '5'], ['x', 'x', 'x', '5']])]

If you avoid the sort, the whole thing is highly memory efficient, as

well, because by using iterators, we don't ever take a copy of the

original list.

Having cleverly redefined the question so that it fits the answer I

wanted to give, I'll shut up now

Paul.

--

To attain knowledge, add things every day; to attain wisdom, remove

things every day. -- Lao-Tse