Some thougts on cartesian products

  • Thread starter Christoph Zwerschke
  • Start date
C

Christoph Zwerschke

In Python, it is possible to multiply a string with a number:
'hellohellohello'

However, you can't multiply a string with another string:
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
TypeError: can't multiply sequence by non-int

Sometimes I was missing such a feature.
What I expect as the result is the "cartesian product" of the strings.

Here is a simple implementation of new list and string objects that
explains what I mean. It also implements powers of lists and strings:

class plist(list):
"""list with cartesian product list"""

def __mul__(self, other):
if isinstance(other, pstr):
return plist([s+o for s in self for o in other])
if hasattr(other, '__getitem__'):
return plist([[s, o] for s in self for o in other])
else:
return list(self)*other

def __pow__(self, other):
if isinstance(other, int) and other > 0:
if other == 1:
return self
return self * self**(other-1)

class pstr(str):
"""str with cartesian product list"""

def __mul__(self, other):
if hasattr(other, '__getitem__'):
return plist([s+o for s in self for o in other])
else:
return str(self)*other

def __pow__(self, other):
if isinstance(other, int) and other > 0:
if other == 1:
return self
return self * self**(other-1)

With these new strings you can do the following:
['ace', 'acf', 'ade', 'adf', 'bce', 'bcf', 'bde', 'bdf']
['a1', 'a2', ..., 'a8', 'b1', 'b2', ..., 'b8',
...., ..., ..., 'h1', 'h2', ..., 'h8']
['AAA', 'AAC', 'AAG', 'AAU', 'ACA', 'ACC', 'ACG', ...,
...., 'UGC', 'UGG', 'UGU', 'UUA', 'UUC', 'UUG', 'UUU']

I think this can be quite handy at times and save some typing.

If Python strings had this ability, you could even outdo the
117 byte solution in the recent shortest Python coding contest
(http://www.pycontest.net), as follows:

j=''.join;seven_seg=lambda x:j(j(\
(' |'*'_ '*' |')[ord('|B¬@z”(ÀD°'[int(d)])%e]\
for d in x)+'\n'for e in(4,9,7))

This has only 110 bytes.

Or you could write a simple password cracker like that:

def crack(crypted, alphabet):
for passwd in alphabet**4:
if crypt(passwd, crypted[:2]) == crypted:
return passwd

And call it with alphabet = string.lowercase, for example.

Cartesian products may be generally interesting for iterables:

def imul(iterable1, iterable2):
"""cartesian product of two iterables"""
for object1 in iterable1:
for object2 in iterable2:
if isinstance(object1, basestring) and \
isinstance(object2, basestring):
yield object1 + object2
else:
yield (object1, object2)

def ipow(iterable, number):
"""cartesian product power of an iterable"""
if number == 1:
for object in iterable:
yield object
elif number > 1:
for object1 in iterable:
for object2 in ipow(iterable, number-1):
yield object1 + object2

class istr(str):
"""str with iterable cartesian product"""

def __mul__(self, other):
if isinstance(other, str):
return imul(self, other)
else:
return str(self)*other

def __pow__(self, other):
return ipow(self, other)

I was wondering if similar functionality could be added in some way to
Python. I noticed that Python has a lot of aggregate functions that can
"reduce" given collection objects (like reduce, filter, min, max, sum,
hash) and functions that keep the same size (like map, sort or zip), but
few functions that can "inflate" the given objects (like range and
enumerate). I know, such functions are dangerous because they also
inflate time and memory consumed by the program. Still, sometimes they
can make sense, whenever you for some reason simply *have* to walk
through all the combinations.
 
G

Giovanni Bajo

Christoph said:
Sometimes I was missing such a feature.
What I expect as the result is the "cartesian product" of the strings.

I've been thinking of it as well. I'd like it for lists too:
[(0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), (2,2)]
 
K

Kay Schluehr

Giovanni said:
Christoph said:
Sometimes I was missing such a feature.
What I expect as the result is the "cartesian product" of the strings.

I've been thinking of it as well. I'd like it for lists too:
[(0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), (2,2)]

But why isn't this interpreted as [0, 1, 4] like it is in Mathematica?

I would prefer a polymorphic distribute(*args) function ( or generator
) that acts on tuples of listlike objects of equal size. It could be
extended to distribute(*args[,default]) by a single default argument
that is inserted in the distribution table when two listlike objects
have not the same size.

Kay
 
A

Alex Martelli

Kay Schluehr said:
range(3)**2
[(0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), (2,2)]
...
But why isn't this interpreted as [0, 1, 4] like it is in Mathematica?

Since range(3)*2 is [0, 1, 2, 0, 1, 2], it would be horribly, painfully
inconsistent if **2 was interpreted as "square each item".


Alex
 
C

Christoph Zwerschke

Kay said:
But why isn't this interpreted as [0, 1, 4] like it is in Mathematica?

Because we are thinking of a cartesian product. If you have lists of
numbers, then there are mane more ways to define a product: tensor
product, vector product, scalar product, componentwise product...

The cartesian product is a much more generic concept, you can have it
already for sets:

class sset(set):

def __mul__(self, other):
return sset((a,b) for a in self for b in other)

def __pow__(self, other):
if isinstance(other, int) and other > 0:
if other == 1:
return self
elif other == 2:
return self*self
else:
return sset(a + (b,) \
for a in self**(other-1) for b in self)

Example:

for x in sorted(sset("ACGU")**3):
print ''.join(x)

AAA
AAC
AAG
AAU
ACA
ACC
..
..
..
UGU
UUA
UUC
UUG
UUU

Now as I'm thinking about it, wouldn't it be nice to have the cartesian
products on Python sets? Maybe also a method that returns the power set
of a set (the set of all subsets), or the set of all subsets with a
given length. You could get a 6/49 lotto tip with something like:

choice(set(range(49)).powerset(6))

-- Christoph
 
C

Christoph Zwerschke

Alex said:
Kay Schluehr said:
range(3)**2
But why isn't this interpreted as [0, 1, 4] like it is in Mathematica?

Since range(3)*2 is [0, 1, 2, 0, 1, 2], it would be horribly, painfully
inconsistent if **2 was interpreted as "square each item".

Yes. Python does not interpreate the product of a list with a number as
a scalar product. Otherwise range(3)*2 should be [0, 1, 4] as well.

For doing such things I would use a vector subtype of list.

-- Christoph
 
A

Alex Martelli

Christoph Zwerschke said:
given length. You could get a 6/49 lotto tip with something like:

choice(set(range(49)).powerset(6))

And that would be better than the current random.sample(range(49),6) in
WHAT ways, exactly...?


Alex
 
P

Paul Rubin

And that would be better than the current random.sample(range(49),6) in
WHAT ways, exactly...?

I think the first one would be incorrect since it samples with
replacement. At least, it looks like it does.
 
S

Steven D'Aprano

Alex said:
Kay Schluehr said:
range(3)**2
But why isn't this interpreted as [0, 1, 4] like it is in Mathematica?

Since range(3)*2 is [0, 1, 2, 0, 1, 2], it would be horribly, painfully
inconsistent if **2 was interpreted as "square each item".

Yes. Python does not interpreate the product of a list with a number as
a scalar product. Otherwise range(3)*2 should be [0, 1, 4] as well.

For doing such things I would use a vector subtype of list.

Not everything needs to be a separate class! Why create a magic class for
every piece of functionality you want? Just create functions that operate
on existing classes!

Instead of a class that supports cartesian products, make a function that
takes two sequences and returns the cartesian product of them. (This will
likely be best implemented as a generator.) If you write it properly,
which is to say if you don't go out of your way to break it, this function
will *automatically* work on any sequence type, lists, tuples, strings,
and things you and I haven't even thought of.

What advantage is there to creating a "list with cartesian product"
subclass of list?
 
C

Christoph Zwerschke

And that would be better than the current random.sample(range(49),6) in
WHAT ways, exactly...?

You're right, random.sample(range(49),6) does the same and much faster.
I just didn't think of it (it is new since Python 2.3).

What if you need 12 different tips for your lotto ticket?

s = set(range(49)).powerset(6)
for x in range(10):
print s.pop()

But the real disadvantage of this idea is the memory consumed and the
time to set up that memory: set(range(49)).powerset(6) has a cardinality
of about 13 million entries! You PC would start to swap just for getting
a lotto tip...

-- Christoph
 
C

Christoph Zwerschke

Steven said:
Not everything needs to be a separate class! Why create a magic class for
every piece of functionality you want? Just create functions that operate
on existing classes!
>
> What advantage is there to creating a "list with cartesian product"
> subclass of list?

Principally, you're right (see also my example with iterators).

But I can still see two reasons for classes:

1) That function would have to make a lot of case distinctions (check
the types of operands). If you have a class, you already know the type
of the operands (at least one).

2) It allows you to write a*b instead of mul(a,b) which looks nicer.

-- Christoph
 
C

Christoph Zwerschke

Paul said:
I think the first one would be incorrect since it samples with
replacement. At least, it looks like it does.

No, the elements of the powerset would be sets with 6 elements each, not
tuples. So technically, it would be correct. Just horribly inefficient.

-- Christoph
 
P

Paul Rubin

Christoph Zwerschke said:
No, the elements of the powerset would be sets with 6 elements each,
not tuples. So technically, it would be correct. Just horribly
inefficient.

Oh I see, not the Cartesian product. Yeah, it would be silly in
practice.
 
A

Alex Martelli

Steven D'Aprano said:
What advantage is there to creating a "list with cartesian product"
subclass of list?

Essentially, syntax sugar -- for some people, being able to code a*b
rather than product(a,b) takes on a huge significance; Python chooses to
support this syntax variation by special methods in classes, and thus
encourages people to create classes if they're keen on the syntax.


Alex
 
A

Alex Martelli

Christoph Zwerschke said:
You're right, random.sample(range(49),6) does the same and much faster.
I just didn't think of it (it is new since Python 2.3).

Yep, it's one of 2.3's many little gems. Still, builtin set is new in
2.4, so it's hardly a "more classic" approach;-).

What if you need 12 different tips for your lotto ticket?

You probably want random ones, then...
s = set(range(49)).powerset(6)
for x in range(10):
print s.pop()

This is very systematic, not random;-). Still, using random.sample on s
would indeed produce 12 random different tips!-)
But the real disadvantage of this idea is the memory consumed and the
time to set up that memory: set(range(49)).powerset(6) has a cardinality
of about 13 million entries! You PC would start to swap just for getting
a lotto tip...

Well then, you need a new PC, 64-bit and with as many GB of RAM as
required, no?-)

Until you get such a PC, something like:

tips = set()
while len(tips) < 10:
tip = frozenzet(random.sample(range(49), 6))
tips.add(tip)

will have to suffice, I guess;-).


Alex
 
C

Christoph Zwerschke

Generally, if you could multiply strings in the above fashion, you could
spare one more more sub loops, as in this example:

for f in ('index', 'default')*('.html', '.htm', '.shtml'):
if exists(f):
break

In this case, it would be not really be better than that:

for f in 'index', 'default':
for e in '.html', '.htm', '.shtml':
if exists(f+e):
break

The password cracking was already a better example.

But this would be only efficient if the product of two strings is
returned as a generator, not a list. And I don't know if you really
would expect that, since multiplying a string or a list with a number
does not change its type.

-- Christoph
 
S

Steven D'Aprano

Essentially, syntax sugar -- for some people, being able to code a*b
rather than product(a,b) takes on a huge significance; Python chooses to
support this syntax variation by special methods in classes, and thus
encourages people to create classes if they're keen on the syntax.

I beg to differ: Python *allows* people to create classes if they're keen
on the syntax (and I can sympathise with that like), but Python
*encourages* by example generic tools that operate on as many different
types as makes sense.
 
C

Christoph Zwerschke

Alex said:
This is very systematic, not random;-). Still, using random.sample on s
would indeed produce 12 random different tips!-)

Right, that would be systematic. What I wanted to write was:

s = set(range(49)).powerset(6)
for x in range(10):
c = choice(s)
print c
s.remove(c)

Of course you could also use random.sample again:
random.sample(set(range(49)).powerset(6), 10)
But this would be just as inefficient.
tips = set()
while len(tips) < 10:
tip = frozenzet(random.sample(range(49), 6))
tips.add(tip)

Yep, that's better. The amount of hand coding is even the same as above.

-- Christoph
 
C

Christoph Zwerschke

Steven said:
I beg to differ: Python *allows* people to create classes if they're keen
on the syntax (and I can sympathise with that like), but Python
*encourages* by example generic tools that operate on as many different
types as makes sense.

But these generic tools (say the "pow" function) in turn simply use the
methods defined by the classes.

-- Christoph
 
S

Steven D'Aprano

Principally, you're right (see also my example with iterators).

But I can still see two reasons for classes:

1) That function would have to make a lot of case distinctions (check
the types of operands). If you have a class, you already know the type
of the operands (at least one).

If you are happy to always return a list of tuples regardless of what the
two operands are, generators make it so easy it is shameful. Even if you
want a special case of two string arguments returning a string, it is
hardly any more difficult:

def cartprod(A, B):
if type(A) == type(B) == str:
convert = lambda obj: "".join(list(obj))
else:
convert = lambda obj: obj # do nothing
for a in A:
for b in B:
yield convert((a, b))

Notice that the *only* reason we look at the type of the arguments is
because we want two strings to return a string. If we don't care about
that, the generator is less than half the size:

def cartprod(A, B):
for a in A:
for b in B:
yield (a, b)



That's a generator giving you the products one at a time. For the
benefit of anyone out there who doesn't know about generators, you use it
like this:
cprods = cartprod([1,2,3], "abc")
for t in cprods:
.... print t
....
(1, 'a')
(1, 'b')
(2, 'a')
(2, 'b')
(3, 'a')
(3, 'b')

If you want them all at once, memory allowing, you do this:
cprods = cartprod([1,2,3], "abc")
list(cprods)
[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')]



2) It allows you to write a*b instead of mul(a,b) which looks nicer.

But not as clear as cartprod(a,b) or even cartesian_product(a,b).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,968
Messages
2,570,150
Members
46,696
Latest member
BarbraOLog

Latest Threads

Top