Velocity Reviews > Some thougts on cartesian products

# Some thougts on cartesian products

Christoph Zwerschke
Guest
Posts: n/a

 01-22-2006
In Python, it is possible to multiply a string with a number:

>>> "hello"*3

'hellohellohello'

However, you can't multiply a string with another string:

>>> 'hello'*'world'

Traceback (most recent call last):
File "<interactive input>", line 1, in ?
TypeError: can't multiply sequence by non-int

Sometimes I was missing such a feature.
What I expect as the result is the "cartesian product" of the strings.

Here is a simple implementation of new list and string objects that
explains what I mean. It also implements powers of lists and strings:

class plist(list):
"""list with cartesian product list"""

def __mul__(self, other):
if isinstance(other, pstr):
return plist([s+o for s in self for o in other])
if hasattr(other, '__getitem__'):
return plist([[s, o] for s in self for o in other])
else:
return list(self)*other

def __pow__(self, other):
if isinstance(other, int) and other > 0:
if other == 1:
return self
return self * self**(other-1)

class pstr(str):
"""str with cartesian product list"""

def __mul__(self, other):
if hasattr(other, '__getitem__'):
return plist([s+o for s in self for o in other])
else:
return str(self)*other

def __pow__(self, other):
if isinstance(other, int) and other > 0:
if other == 1:
return self
return self * self**(other-1)

With these new strings you can do the following:

>>> pstr("ab")*pstr("cd")*pstr("ef")

>>> print pstr("abcdefgh")*pstr("12345678")

['a1', 'a2', ..., 'a8', 'b1', 'b2', ..., 'b8',
...., ..., ..., 'h1', 'h2', ..., 'h8']

>>> print pstr("ACGU")**3

['AAA', 'AAC', 'AAG', 'AAU', 'ACA', 'ACC', 'ACG', ...,
...., 'UGC', 'UGG', 'UGU', 'UUA', 'UUC', 'UUG', 'UUU']

I think this can be quite handy at times and save some typing.

If Python strings had this ability, you could even outdo the
117 byte solution in the recent shortest Python coding contest
(http://www.pycontest.net), as follows:

j=''.join;seven_seg=lambda x:j(j(\
(' |'*'_ '*' |')[ord('|B¬@z”(ÀD°'[int(d)])%e]\
for d in x)+'\n'for e in(4,9,7))

This has only 110 bytes.

Or you could write a simple password cracker like that:

def crack(crypted, alphabet):
for passwd in alphabet**4:
if crypt(passwd, crypted[:2]) == crypted:
return passwd

And call it with alphabet = string.lowercase, for example.

Cartesian products may be generally interesting for iterables:

def imul(iterable1, iterable2):
"""cartesian product of two iterables"""
for object1 in iterable1:
for object2 in iterable2:
if isinstance(object1, basestring) and \
isinstance(object2, basestring):
yield object1 + object2
else:
yield (object1, object2)

def ipow(iterable, number):
"""cartesian product power of an iterable"""
if number == 1:
for object in iterable:
yield object
elif number > 1:
for object1 in iterable:
for object2 in ipow(iterable, number-1):
yield object1 + object2

class istr(str):
"""str with iterable cartesian product"""

def __mul__(self, other):
if isinstance(other, str):
return imul(self, other)
else:
return str(self)*other

def __pow__(self, other):
return ipow(self, other)

I was wondering if similar functionality could be added in some way to
Python. I noticed that Python has a lot of aggregate functions that can
"reduce" given collection objects (like reduce, filter, min, max, sum,
hash) and functions that keep the same size (like map, sort or zip), but
few functions that can "inflate" the given objects (like range and
enumerate). I know, such functions are dangerous because they also
inflate time and memory consumed by the program. Still, sometimes they
can make sense, whenever you for some reason simply *have* to walk
through all the combinations.

Giovanni Bajo
Guest
Posts: n/a

 01-22-2006
Christoph Zwerschke wrote:

> Sometimes I was missing such a feature.
> What I expect as the result is the "cartesian product" of the strings.

I've been thinking of it as well. I'd like it for lists too:

>> range(3)**2

[(0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), (2,2)]

--
Giovanni Bajo

Kay Schluehr
Guest
Posts: n/a

 01-22-2006

Giovanni Bajo wrote:
> Christoph Zwerschke wrote:
>
> > Sometimes I was missing such a feature.
> > What I expect as the result is the "cartesian product" of the strings.

>
> I've been thinking of it as well. I'd like it for lists too:
>
> >> range(3)**2

> [(0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), (2,2)]
>
> --
> Giovanni Bajo

But why isn't this interpreted as [0, 1, 4] like it is in Mathematica?

I would prefer a polymorphic distribute(*args) function ( or generator
) that acts on tuples of listlike objects of equal size. It could be
extended to distribute(*args[,default]) by a single default argument
that is inserted in the distribution table when two listlike objects
have not the same size.

Kay

Alex Martelli
Guest
Posts: n/a

 01-22-2006
Kay Schluehr <(E-Mail Removed)> wrote:
...
> > >> range(3)**2

> > [(0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), (2,2)]

...
> But why isn't this interpreted as [0, 1, 4] like it is in Mathematica?

Since range(3)*2 is [0, 1, 2, 0, 1, 2], it would be horribly, painfully
inconsistent if **2 was interpreted as "square each item".

Alex

Christoph Zwerschke
Guest
Posts: n/a

 01-22-2006
Kay Schluehr schrieb:
> But why isn't this interpreted as [0, 1, 4] like it is in Mathematica?

Because we are thinking of a cartesian product. If you have lists of
numbers, then there are mane more ways to define a product: tensor
product, vector product, scalar product, componentwise product...

The cartesian product is a much more generic concept, you can have it

class sset(set):

def __mul__(self, other):
return sset((a,b) for a in self for b in other)

def __pow__(self, other):
if isinstance(other, int) and other > 0:
if other == 1:
return self
elif other == 2:
return self*self
else:
return sset(a + (b,) \
for a in self**(other-1) for b in self)

Example:

for x in sorted(sset("ACGU")**3):
print ''.join(x)

AAA
AAC
AAG
AAU
ACA
ACC
..
..
..
UGU
UUA
UUC
UUG
UUU

Now as I'm thinking about it, wouldn't it be nice to have the cartesian
products on Python sets? Maybe also a method that returns the power set
of a set (the set of all subsets), or the set of all subsets with a
given length. You could get a 6/49 lotto tip with something like:

choice(set(range(49)).powerset(6))

-- Christoph

Christoph Zwerschke
Guest
Posts: n/a

 01-22-2006
Alex Martelli wrote:
> Kay Schluehr <(E-Mail Removed)> wrote:
>> range(3)**2
>> But why isn't this interpreted as [0, 1, 4] like it is in Mathematica?

>
> Since range(3)*2 is [0, 1, 2, 0, 1, 2], it would be horribly, painfully
> inconsistent if **2 was interpreted as "square each item".

Yes. Python does not interpreate the product of a list with a number as
a scalar product. Otherwise range(3)*2 should be [0, 1, 4] as well.

For doing such things I would use a vector subtype of list.

-- Christoph

Alex Martelli
Guest
Posts: n/a

 01-22-2006
Christoph Zwerschke <(E-Mail Removed)> wrote:
...
> given length. You could get a 6/49 lotto tip with something like:
>
> choice(set(range(49)).powerset(6))

And that would be better than the current random.sample(range(49),6) in
WHAT ways, exactly...?

Alex

Paul Rubin
Guest
Posts: n/a

 01-22-2006
http://www.velocityreviews.com/forums/(E-Mail Removed) (Alex Martelli) writes:
> > given length. You could get a 6/49 lotto tip with something like:
> > choice(set(range(49)).powerset(6))

>
> And that would be better than the current random.sample(range(49),6) in
> WHAT ways, exactly...?

I think the first one would be incorrect since it samples with
replacement. At least, it looks like it does.

Steven D'Aprano
Guest
Posts: n/a

 01-22-2006
On Sun, 22 Jan 2006 18:29:45 +0100, Christoph Zwerschke wrote:

> Alex Martelli wrote:
>> Kay Schluehr <(E-Mail Removed)> wrote:
>>> range(3)**2
>>> But why isn't this interpreted as [0, 1, 4] like it is in Mathematica?

>>
>> Since range(3)*2 is [0, 1, 2, 0, 1, 2], it would be horribly, painfully
>> inconsistent if **2 was interpreted as "square each item".

>
> Yes. Python does not interpreate the product of a list with a number as
> a scalar product. Otherwise range(3)*2 should be [0, 1, 4] as well.
>
> For doing such things I would use a vector subtype of list.

Not everything needs to be a separate class! Why create a magic class for
every piece of functionality you want? Just create functions that operate
on existing classes!

Instead of a class that supports cartesian products, make a function that
takes two sequences and returns the cartesian product of them. (This will
likely be best implemented as a generator.) If you write it properly,
which is to say if you don't go out of your way to break it, this function
will *automatically* work on any sequence type, lists, tuples, strings,
and things you and I haven't even thought of.

What advantage is there to creating a "list with cartesian product"
subclass of list?

--
Steven.

Christoph Zwerschke
Guest
Posts: n/a

 01-22-2006
Alex Martelli schrieb:
> Christoph Zwerschke <(E-Mail Removed)> wrote:
> ...
>> given length. You could get a 6/49 lotto tip with something like:
>>
>> choice(set(range(49)).powerset(6))

> And that would be better than the current random.sample(range(49),6) in
> WHAT ways, exactly...?

You're right, random.sample(range(49),6) does the same and much faster.
I just didn't think of it (it is new since Python 2.3).

What if you need 12 different tips for your lotto ticket?

s = set(range(49)).powerset(6)
for x in range(10):
print s.pop()

But the real disadvantage of this idea is the memory consumed and the
time to set up that memory: set(range(49)).powerset(6) has a cardinality
of about 13 million entries! You PC would start to swap just for getting
a lotto tip...

-- Christoph