Some thougts on cartesian products

  • Thread starter Christoph Zwerschke
  • Start date
C

Christoph Zwerschke

Steven said:
If you are happy to always return a list of tuples regardless of what the
two operands are, generators make it so easy it is shameful. Even if you
want a special case of two string arguments returning a string, it is
hardly any more difficult:

def cartprod(A, B):
if type(A) == type(B) == str:
convert = lambda obj: "".join(list(obj))
else:
convert = lambda obj: obj # do nothing
for a in A:
for b in B:
yield convert((a, b))

I didn't deny that it's handy; my "imul" example was pretty much the
same. If you don't want to use the a*b syntax, it's a good solution.

-- Christoph
 
B

Bryan Olson

Christoph said:
In Python, it is possible to multiply a string with a number:

'hellohellohello'

Which is really useful.
However, you can't multiply a string with another string:

Traceback (most recent call last):
File "<interactive input>", line 1, in ?
TypeError: can't multiply sequence by non-int

Sometimes I was missing such a feature.
What I expect as the result is the "cartesian product" of the strings.

There's no such thing; you'd have to define it first. Are duplicates
significant? Order?

What you seem to want is easy enough:

[a + b for a in 'hello' for b in 'world']

[...]
Cartesian products may be generally interesting for iterables:

And maybe you want the result to be a generator:

(a + b for a in 'hello' for b in 'world')


New language features should be widely useful, and difficult or
awkward to code in Python as it is. All-combinations-of-sequences
is trivial to code and rarely needed.
 
C

Christoph Zwerschke

Bryan said:
What you seem to want is easy enough:
[a + b for a in 'hello' for b in 'world']

Yes, but 'hello'*'world' is easier.

And how would you write '01'**8 ? This should give you all bytes in
binary code.
And maybe you want the result to be a generator:
(a + b for a in 'hello' for b in 'world')

That may be the main problem to decide whether the cartesian product
should return a generator or a list. A list would be more intuitive and
can be accessed directly. For instance,

('01'**8)[42]

would give you the binary code of 42 (though in an inefficient way).
But because the inflationary character of cartesian products, a
generator would be also desirable (though you wouldn't expect '01'**2 to
be a generator, since '01'*2 isn't a generator either).
New language features should be widely useful, and difficult or
awkward to code in Python as it is. All-combinations-of-sequences
is trivial to code and rarely needed.

That's the other problem. The uses cases (like the password cracker
example) are very limited and in these cases you can either write nested
loops or write your own cartesian product.

-- Christoph

BTW: What is the shortest way to get the binary representation of a
number in Python? Is there really not something like itoa() anywhere in
the standard libs?
 
B

Bryan Olson

Christoph Zwerschke wrote:
[...]
That may be the main problem to decide whether the cartesian product
should return a generator or a list.

The Cartesion product is a set.

[...]
That's the other problem. The uses cases (like the password cracker
example) are very limited and in these cases you can either write nested
loops or write your own cartesian product.

Cartesian product is one of the essential operations of
relational algebra; in that context it's widely useful.
By itself, it's usually not what one wants.
 
S

Steven D'Aprano

Christoph Zwerschke wrote:
[...]
That may be the main problem to decide whether the cartesian product
should return a generator or a list.

The Cartesion product is a set.

And the generalization of mathematical sets in Python can be built-in
sets, lists or tuples, depending on what you need.

Given that cartesian products tend to be *extremely* large, some sort of
iterator is the only practical solution -- even if that's not
mathematically pure.

[...]
That's the other problem. The uses cases (like the password cracker
example) are very limited and in these cases you can either write nested
loops or write your own cartesian product.

Cartesian product is one of the essential operations of
relational algebra; in that context it's widely useful.
By itself, it's usually not what one wants.

Google on "Cartesian product python" and you will find thousands of hits.
This is something that keeps coming up over and over again.

Personally, I think cartesian products, together with permutations and
combinations, belong in a module, not built-ins.
 
E

Erik Max Francis

Bryan said:
Christoph Zwerschke wrote:
[...]
That may be the main problem to decide whether the cartesian product
should return a generator or a list.

The Cartesion product is a set.

Sets are iterable, so that isn't an answer. If it's a set, then it
should either be a generator or a set.
 
C

Christoph Zwerschke

Bryan said:
Christoph Zwerschke wrote:
[...]
That may be the main problem to decide whether the cartesian product
should return a generator or a list.

The Cartesion product is a set.

Of course it is a set. But if the factors of the product have a total
order (as in the case of strings, tuples or lists), this carries over to
the cartesian product (lexicographical order). The implementation should
reflect that by giving you a list or a generator, not a set.

Only if I build the cartesian product of sets I would expect a set.

"ab"*"cd" should be "ac", "ad", "bc", "bd", in this order.
Cartesian product is one of the essential operations of
relational algebra; in that context it's widely useful.
By itself, it's usually not what one wants.

Usually not. But sometimes you may want it. That's why SQL defines has
the "cross join" clause. Usually it is not needed and should be avoided,
but sometimes it may make sense.

-- Christoph
 
B

Bryan Olson

Steven said:
Bryan said:
Christoph Zwerschke wrote:
[...]
That may be the main problem to decide whether the cartesian product
should return a generator or a list.

The Cartesion product is a set.

And the generalization of mathematical sets in Python can be built-in
sets, lists or tuples, depending on what you need.

Given that cartesian products tend to be *extremely* large, some sort of
iterator is the only practical solution -- even if that's not
mathematically pure.

Query languages have included Cartesian product for decades. Their
solution is to optimize expressions to avoid building, or even
iterating over, big Cartesian products.

[...]
That's the other problem. The uses cases (like the password cracker
example) are very limited and in these cases you can either write nested
loops or write your own cartesian product.

Cartesian product is one of the essential operations of
relational algebra; in that context it's widely useful.
By itself, it's usually not what one wants.

Google on "Cartesian product python" and you will find thousands of hits.
This is something that keeps coming up over and over again.

It keeps coming up in exercises and examples. How many of your
Google results note an application that actually calls for a
Cartesian product enumerator? I wrote one an old thread, and so
far I've written it one more time than I've used it.
 
B

Bryan Olson

Steven said:
Bryan Olson wrote:
[Christoph Zwerschke had written:]
Google "cartesian product" and hit "I'm feeling lucky".

Or go here: http://mathworld.wolfram.com/CartesianProduct.html

Still think there is no such thing?

Uh, yes.

The Cartesian product of two sets A and B (also called the
product set, set direct product, or cross product) is defined to
be the set of [...]

All sets, no strings. What were you looking at?

(I'm also happy to see someone agrees with my capitalization of
"Cartesian".)
 
K

Kay Schluehr

Bryan said:
There's no such thing; you'd have to define it first. Are duplicates
significant? Order?

That's all trivial isn't it? A string is a set of pairs (i,c) where i
is an integer number, the index, with 0<=i<j<=card(string)-1, for
(i,c), (j,d) in string and c is a character. Since there is a natural
order on integer numbers we can induce this order on the string, so
that it is an ordered set. For representation purposes we can omit i
since i is simply the position of c in the sequence. Same holds for
lists, tuples or other sequences.

Kay
 
S

Steven D'Aprano

Steven said:
Bryan Olson wrote:
[Christoph Zwerschke had written:]
Google "cartesian product" and hit "I'm feeling lucky".

Or go here: http://mathworld.wolfram.com/CartesianProduct.html

Still think there is no such thing?

Uh, yes.

The Cartesian product of two sets A and B (also called the
product set, set direct product, or cross product) is defined to
be the set of [...]

All sets, no strings. What were you looking at?

You've just quoted a definition of Cartesian product [yes, you are right
about capitalisation, my bad]. How can you say with a straight face that
there is no such thing as a Cartesian product?

The question of sets versus strings is a red herring. Addition, in
mathematics, is defined on reals. No computer yet made can do addition on
reals. Should you declare that there is no such thing as addition because
reals and floats are different?

If people wish to extend Cartesian products to work on sequences, not just
sets, then to my mind that's a pretty obvious and sensible generalisation.
 
M

mensanator

Steven said:
Steven said:
Bryan Olson wrote:
[Christoph Zwerschke had written:]
What I expect as the result is the "cartesian product" of the strings.

There's no such thing; you'd have to define it first. Are duplicates
significant? Order?


Google "cartesian product" and hit "I'm feeling lucky".

Or go here: http://mathworld.wolfram.com/CartesianProduct.html

Still think there is no such thing?

Uh, yes.

The Cartesian product of two sets A and B (also called the
product set, set direct product, or cross product) is defined to
be the set of [...]

All sets, no strings. What were you looking at?

You've just quoted a definition of Cartesian product [yes, you are right
about capitalisation, my bad]. How can you say with a straight face that
there is no such thing as a Cartesian product?

The question of sets versus strings is a red herring. Addition, in
mathematics, is defined on reals. No computer yet made can do addition on
reals. Should you declare that there is no such thing as addition because
reals and floats are different?

If people wish to extend Cartesian products to work on sequences, not just
sets, then to my mind that's a pretty obvious and sensible generalisation.

I use Cartesian Products all the time in MS-Access, typically the
product of a set (of records) with itself. I really like your generator
because I can now do this stuff directly in Python. And I usually
want just a subset of the Cartesian Product, so I modified your
generator to produce what I want:

def cartprod(A, B, perm=True, repl=True):
if type(A) == type(B) == str:
convert = lambda obj: "".join(list(obj))
else:
convert = lambda obj: obj # do nothing
for a in A:
for b in B:
if perm and repl: # permutation with replacement
yield convert((a, b))
if (not perm) and repl: # combination with replacement
if b>=a:
yield convert((a,b))
if perm and (not repl): # permutation w/o replacement
if b!=a:
yield convert((a,b))
if (not perm) and (not repl): # combination w/o replacement
if b>a:
yield convert((a,b))


print
print 'For "abc"x"abc":'

print
cprods = cartprod("abc", "abc", True, True)
print 'permutation with replacement', list(cprods)

print
cprods = cartprod("abc", "abc", False, True)
print 'combination with replacement', list(cprods)

print
cprods = cartprod("abc", "abc", True, False)
print 'permutation w/o replacement', list(cprods)

print
cprods = cartprod("abc", "abc", False, False)
print 'combination w/o replacement', list(cprods)

"""

For "abc"x"abc":

permutation with replacement ['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca',
'cb', 'cc']

combination with replacement ['aa', 'ab', 'ac', 'bb', 'bc', 'cc']

permutation w/o replacement ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']

combination w/o replacement ['ab', 'ac', 'bc']

"""

Thanks for showing me an easy way to do this.
 
C

Christoph Zwerschke

Bryan said:
Still think there is no such thing?

Uh, yes.

The Cartesian product of two sets A and B (also called the
product set, set direct product, or cross product) is defined to
be the set of [...]

All sets, no strings. What were you looking at?

Not only sets. This goes on (anyway "everything is a set"). You can also
have the Cartesian product of functions. And you can think of a string
as a function from a countable index set I to the set of all characters
C. So the Cartesian product of two strings will become a function from
IxI to CxC. Since IxX is countable again, this is equivalent to a tuple
of 2-tuples of characters which you can also interpret as a tuple of
strings with 2 chars:

"ab" x "cd" = ("ac", "ad", "bc", "bd")

Do I have eliminated all remaining clarities now? :)

-- Christoph
 
B

Bryan Olson

Steven said:
Steven said:
Bryan Olson wrote:
[Christoph Zwerschke had written:]
What I expect as the result is the "cartesian product" of the strings.

There's no such thing; you'd have to define it first. Are duplicates
significant? Order?


Google "cartesian product" and hit "I'm feeling lucky".

Or go here: http://mathworld.wolfram.com/CartesianProduct.html

Still think there is no such thing?

Uh, yes.

The Cartesian product of two sets A and B (also called the
product set, set direct product, or cross product) is defined to
be the set of [...]

All sets, no strings. What were you looking at?

You've just quoted a definition of Cartesian product [yes, you are right
about capitalisation, my bad]. How can you say with a straight face that
there is no such thing as a Cartesian product?

Ah, I think I see. I said there's no such thing as the Cartesian
product of stings. You thought I claimed that there's no such thing
as Cartesian product at all. Try re-reading what I wrote and what
your chosen reference says.

The question of sets versus strings is a red herring. Addition, in
mathematics, is defined on reals. No computer yet made can do addition on
reals. Should you declare that there is no such thing as addition because
reals and floats are different?

If people wish to extend Cartesian products to work on sequences, not just
sets, then to my mind that's a pretty obvious and sensible generalisation.

You lost me. Are you advocating implementing it without defining
it? Or are you saying the the significance of duplicates and order
are so obvious that those were silly questions?
 
B

Bryan Olson

Kay said:
Bryan Olson wrote:




That's all trivial isn't it? A string is a set of pairs (i,c) where i
is an integer number, the index, with 0<=i<j<=card(string)-1, for
(i,c), (j,d) in string and c is a character. Since there is a natural
order on integer numbers we can induce this order on the string, so
that it is an ordered set. For representation purposes we can omit i
since i is simply the position of c in the sequence. Same holds for
lists, tuples or other sequences.

Any of the combinations of duplicates and order being significant
or insignificant are easy to define, and easy to specify when
significant. Anticipating the implications of the choices is the
tricky part.
 
V

val bykoski

Christoph said:
Bryan said:
Still think there is no such thing?


Uh, yes.

The Cartesian product of two sets A and B (also called the
product set, set direct product, or cross product) is defined to
be the set of [...]

All sets, no strings. What were you looking at?


Not only sets. This goes on (anyway "everything is a set"). You can also
have the Cartesian product of functions. And you can think of a string
as a function from a countable index set I to the set of all characters
C. So the Cartesian product of two strings will become a function from
IxI to CxC. Since IxX is countable again, this is equivalent to a tuple
of 2-tuples of characters which you can also interpret as a tuple of
strings with 2 chars:

"ab" x "cd" = ("ac", "ad", "bc", "bd")

Do I have eliminated all remaining clarities now? :)

-- Christoph

Christoph,
i think you raised a great issue: a lack of efficient support for
"combining" objects. Any language, if has smth to do with reality,
needs that kind of functionality. The combination dynamics, or growth
(multiplication) dynamics is a critically important functionality in
chemistry, physics, biology. It probably may be emulated by standard
means such as lists and dictionaries. If such support is available
though, this is a sign of mature language designed to cover the
realistic processes with rich growth/combination dynamics.
For instance, the dynamics of aperiodic growth that generates a 3D
aperiodic arrays/structures with the controllable "bits" in each unit to
be configured by dynamic masks to match the environmental ("boundary")
conditions would be a significant step in building next-generation
languages/silicon to support synthesis of realistic 3D structures (and
functions). Accordingly, the command "line" may need to be 2D and the
interpreter be designed to handle/understand not only a (command) text.
Just reflecting aloud..
val
 
K

Kay Schluehr

Christoph said:
Now as I'm thinking about it, wouldn't it be nice to have the cartesian
products on Python sets? Maybe also a method that returns the power set
of a set (the set of all subsets), or the set of all subsets with a
given length.

For defining powersets it might suffice to include sets as exponents.
If A is a set then
Set([0,1])**A is the set of maps from A into {0,1}. The kernels of
those maps ( preimages of 0 in A ) is a subset classifier of A.

Kay
 
B

Bryan Olson

Christoph said:
Bryan said:
Still think there is no such thing?


Uh, yes.

The Cartesian product of two sets A and B (also called the
product set, set direct product, or cross product) is defined to
be the set of [...]

All sets, no strings. What were you looking at?

Not only sets.

Snipping is not going to make the facts go away. I did not
choose the reference at issue in this strand:

http://mathworld.wolfram.com/CartesianProduct.html
> This goes on (anyway "everything is a set").

The claim "everything is a set" falls into the category of
'not even wrong'. Whatever semantics Python adopts, it must
be well-defined.

Watch things not be sets:

x = [1, 1, 2]
y = [1, 2]
print x == y
print set(x) == set(y)

> You can also
have the Cartesian product of functions. And you can think of a string
as a function from a countable index set I to the set of all characters
C. So the Cartesian product of two strings will become a function from
IxI to CxC. Since IxX is countable again, this is equivalent to a tuple
of 2-tuples of characters which you can also interpret as a tuple of
strings with 2 chars:

"ab" x "cd" = ("ac", "ad", "bc", "bd")

I really did try to raise the real issues. I cannot make you answer,
but the question remains: are duplicate and order significant in
what you call "Cartesian product" or they not? Can you show that
your proposed language extensions are useful and consistent in
some reasonable sense?
Do I have eliminated all remaining clarities now? :)

Yes. Good one. Sure.
 
C

Christoph Zwerschke

Bryan said:
The claim "everything is a set" falls into the category of
'not even wrong'.

No, it falls into the category of the most fundamental Mathematical
concepts. You actually *define* tuples as sets, or functions as sets or
relations as sets, or even all kinds of numbers and other things which
exist in the heads of Mathematicians as sets.
Watch things not be sets:

x = [1, 1, 2]
y = [1, 2]
> print x == y
> print set(x) == set(y)

Python tuples and lists are of course not the same as Python sets.
But mathematically, you can understand them as sets anyway and associate
every Python tuple with a Python set. The naive approach to understand a
tuple as the set of its values which is done by casting to set() does
not work, as you rightly noticed. The associated set to a Python tuple
or list x would be set(enumerate(x)), not set(x).

Generally, two approaches are common for constructing tuples as sets:

(A) Think of an n-tuple as a function on the index set, range(n). Then
remember a function is a special relation is a set.

(1, 2, 2) would correspond to the set {(0, 1), (1, 1), (1, 2)}
(1, 2) would correspond to the set {(0, 1), (1, 2)}

In Python, the tuple or list x would correspond to set(enumerate(x)).

As a sidemark, another common approach is this:

(B) Define the set corresponding to (1, 2) as {1, 2}. Define the set
corresponding to (1, 2, 2) as {{1, 2}, 2}, the set corresponding to (1,
2, 2, 4) as {{{1, 2}, 2}, 4} and so on.
I really did try to raise the real issues. I cannot make you answer,
but the question remains: are duplicate and order significant in
what you call "Cartesian product" or they not? Can you show that
your proposed language extensions are useful and consistent in
some reasonable sense?

I already tried to answer. It is not what "I call" Cartesian product.
Since functions are sets in Mathematics, once you have a Cartesian
product on sets, there is a natural (canonical) way to define a
Cartesian product on functions as well. So there is also a canonical way
to define a Cartesian product on tuples, if you interpret tuples as
functions via (A). And there is a canonical way to understand the
resulting sets as tuples again (by the lexicographical order of the
index set).

So the cartesian product of a string, tuple, or list is well-defined
including its order.

The only ambiguity is whether the result should be a generator or a
tuple, and in the case of strings whether the elements in the result
should be returned as tuples,
"ab"*"cd" = ("a", c"), ("a", "d"), ("b", "c"), ("b", "d")
or concatenated as strings:
"ab"*"cd" = "ac", "ad", "bc", "bd"

In any way, there is no dispute about duplicates or ordering. This is
all canonical and well-defined.

Concerning the use, I admit there is no really frequent use, but in some
occasions it may be useful and I already gave some examples.

-- Christoph
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,282
Messages
2,571,404
Members
48,096
Latest member
Kenkian2628

Latest Threads

Top