python bijection

R

Raymond Hettinger

[Joshua Bronson]
Raymond, do you think there might be any future in including a built-
in bidict data structure in Python?

I don't think so. There are several forces working against it:

* the recipe is new, so it hasn't had a chance to mature
or to gain a fan club.

* there are many approaches to the solving the problem and
there is no reason to assume this one is the best.

* it extends the language with arcane syntax tricks instead of
using the language as designed by Guido. That makes it harder
to learn and remember.

* we've already got one (actually two). The two dictionary approach
uses plain python, requires no new learning, and is more flexible.
Also, sqlite3 provides another way to use multiple lookups to a
single record. The database approach is much more general
(extending to trijections, allowing multiple sort orders,
providing persistence, etc).

* the semantics of a bijection aren't obvious:

b['x'] = 'ex' # first record: ('x', 'ex')
b['y'] = 'why' # second record: ('y', 'why')
b[:'why'] = 'x' # do two records collapse into one? is there
an error?

* the proposed syntax doesn't address the issue covered in my previous
post.
Since bijections are symmetrical, they do not have an obvious
direction
(which is the primary key, the husband or the wife?). The syntax
needs to
allow user names to make it clear which is being accessed:

marriages.h2w['john'] = 'amy'
marriages.w2h['amy'] = 'john'

Contrast this with:

marriages['jordan'] = 'taylor' # are you sure you got the
order correct?
marriages[:'taylor'] = 'jordan' # this is easy to get backwards


Raymond
 
J

Joshua Bronson

[Joshua Bronson]
Raymond, do you think there might be any future in including a built-
in bidict data structure in Python?

I don't think so.  There are several forces working against it:

* the recipe is new, so it hasn't had a chance to mature
  or to gain a fan club.

* there are many approaches to the solving the problem and
  there is no reason to assume this one is the best.

* it extends the language with arcane syntax tricks instead of
  using the language as designed by Guido.  That makes it harder
  to learn and remember.

* we've already got one (actually two).  The two dictionary approach
  uses plain python, requires no new learning, and is more flexible.
  Also, sqlite3 provides another way to use multiple lookups to a
  single record.  The database approach is much more general
  (extending to trijections, allowing multiple sort orders,
  providing persistence, etc).

all good points.
* the semantics of a bijection aren't obvious:

     b['x'] = 'ex'      # first record:  ('x', 'ex')
     b['y'] = 'why'     # second record: ('y', 'why')
     b[:'why'] = 'x'    # do two records collapse into one? is there
an error?

In my implementation the two records collapse into one, on the theory
that if you say it you mean it, but you're right that the semantics
aren't obvious, especially since in sql this would be an error. Thank
you for pointing this out, it totally slipped my mind to document it!
(Noted now.) If my bidict package ever has >1 user, and they prefer
this to be an error, I'd totally change it.
* the proposed syntax doesn't address the issue covered in my previous
post.
  Since bijections are symmetrical, they do not have an obvious
direction
  (which is the primary key, the husband or the wife?).  The syntax
needs to
  allow user names to make it clear which is being accessed:

     marriages.h2w['john'] = 'amy'
     marriages.w2h['amy'] = 'john'

  Contrast this with:

     marriages['jordan'] = 'taylor'    # are you sure you got the
order correct?
     marriages[:'taylor'] = 'jordan'   # this is easy to get backwards

The "namedbidict" class factory I wrote on your recommendation allows
for the former. But it's still up to the user to choose names which
indicate the direction of the mapping, whether she uses namedbidict or
not:

marriages.husbands['john'] = 'amy' # namedbidict, direction
unclear
h2w['john'] = 'amy' # regular bidict, but not unclear b/c name is
'h2w'


(you can use
False

to tell that husbands is the forward mapping, but that sucks -- better
to have called it h2w or some such in the first place.)


Thanks for the thoughtful reply.

Josh
 
A

Aahz

I noticed the phonebook example in your ActiveState recipe and thought
you might consider changing it to something like husbands to wives,
since the names-to-phone-numbers relation is many-to-many.

What makes you think husbands to wives is one-to-one? ;-) (Even
assuming monogamy, you have husbands-to-husbands and wives-to-wives.)
 
C

Chris Rebert

What makes you think husbands to wives is one-to-one?  ;-)  (Even
assuming monogamy, you have husbands-to-husbands and wives-to-wives.)

Reminds me of this quite funny blog post:
"Gay marriage: the database engineering perspective"
http://qntm.org/?gay

Cheers,
Chris
 
J

Joshua Bronson

What makes you think husbands to wives is one-to-one?  ;-)  (Even
assuming monogamy, you have husbands-to-husbands and wives-to-wives.)

Hah! I knew this was coming and even put "assuming monogamy" in the
source!
http://bitbucket.org/jab/bidict/src/712da6e2dd26/bidict.py#cl-65 ;P

As for husbands-to-husbands and wives-to-wives, those are just
separate one-to-one mappings! Doesn't mean husbands-to-wives ain't one-
to-one!

At any rate, apologies to the community for my heteronormative
example. It was merely pedagogical and reflects nothing about my
personal views! If you have any further concerns, please send them to
my lawyer, /dev/null.
 
A

Aahz

At any rate, apologies to the community for my heteronormative
example. It was merely pedagogical and reflects nothing about my
personal views! If you have any further concerns, please send them to
my lawyer, /dev/null.

Apology accepted. ;-)
 
G

geremy condra

Raymond said:
[Joshua Bronson]
Raymond, do you think there might be any future in including a built-
in bidict data structure in Python?

I don't think so.  There are several forces working against it:

* the recipe is new, so it hasn't had a chance to mature
  or to gain a fan club.

* there are many approaches to the solving the problem and
  there is no reason to assume this one is the best.

* it extends the language with arcane syntax tricks instead of
  using the language as designed by Guido.  That makes it harder
  to learn and remember.

* we've already got one (actually two).  The two dictionary approach
  uses plain python, requires no new learning, and is more flexible.
  Also, sqlite3 provides another way to use multiple lookups to a
  single record.  The database approach is much more general
  (extending to trijections, allowing multiple sort orders,
  providing persistence, etc).

* the semantics of a bijection aren't obvious:

     b['x'] = 'ex'      # first record:  ('x', 'ex')
     b['y'] = 'why'     # second record: ('y', 'why')
     b[:'why'] = 'x'    # do two records collapse into one? is there
an error?

* the proposed syntax doesn't address the issue covered in my previous
post.
  Since bijections are symmetrical, they do not have an obvious
direction
  (which is the primary key, the husband or the wife?).  The syntax
needs to
  allow user names to make it clear which is being accessed:

     marriages.h2w['john'] = 'amy'
     marriages.w2h['amy'] = 'john'

  Contrast this with:

     marriages['jordan'] = 'taylor'    # are you sure you got the
order correct?
     marriages[:'taylor'] = 'jordan'   # this is easy to get backwards

I think the only major CS data type missing from Python is some
form of (fast) directed graph implementation à la kjGraph:

   http://gadfly.sourceforge.net/kjbuckets.html

With these, you can easily build all sorts of relations between
objects and apply fast operations on them. In fact, it should then
be possible to build a complete relational database in Python
(along the lines of Gadfly).

If you're in the market for a Python graph library, you may want
to check out Graphine- I'm obviously biased (I wrote most of it)
but it has a few more bells and whistles than kjbuckets, and is
pretty darned easy to use. It also supports undirected and
bridge graphs.

Geremy Condra
 
F

Francis Carr

[In re R. Hettinger's critiques]
* it extends the language with arcane syntax tricks...
I think most of these in the current version of J. Bronson's "bidict"
can be left unused, or removed altogether. In almost all cases, a
bidict should be accessed as an ordinary python dict.

* we've already got one (actually two).
The two dictionary approach...
Solutions such as bidict just automate the two-dict approach.
  ...sqlite3 provides another way...
In many many cases, using a dB (even a lightweight such as sqlite3) is
swatting the fly with a sledgehammer :)

Since bijections are symmetrical, they do not have an obvious
direction (which is the primary key, the husband or the wife?).
I think this is easy to solve with a classmethod constructor that
produces a pair of "linked" dicts. For example,
husband2wife, wife2husband = bidict.makepair(('Fred', 'John'),
('Mary', 'Ruth'))
Obviously from the code this pair of bidicts are "linked", and the
direction of each mapping is obvious from its name. Metaprogramming
idioms like namedtuple are not required.

* the semantics of a bijection aren't obvious:
     b['x'] = 'ex'      # first record:  ('x', 'ex')
     b['y'] = 'why'     # second record: ('y', 'why')
     b[:'why'] = 'x'    # do two records collapse into one?
# is there an error?
Among the various critiques, I think this is the most significant.

When I was fiddling with my implementation, I was disturbed that the
code
bidict[newKey] = oldValue
should have the subtle side-effect
del bidict.inv[oldValue]

And there is a much stranger case. Suppose bidict[key1]=val1 and
bidict[key2]=val2. Then the code
bidict[key1] = val2
should have the extremely subtle side-effects
del bidict[key2] # because val2 has been re-mapped
del bidict.inv[val1] # because key1 has been re-mapped
Blech!

I think there must be something better. It would be interesting to
hear more opinions on the matter.

I propose raising ValueError when operating on one key would also
silently re-map or delete a different (key,value) pair. This would
disallow both of the above cases. To get the effect of the first
case, one would simply operate on the inverse mapping:
bidict.inv[oldValue] = newKey
This should not be confusing: it's exactly how a python dict would
operate, except the "linked" mapping is altered to match, which is the
bookkeeping we want to automate in the first place. To get the effect
of the second case, one would have to explicitly demand the side-
effects:
del bidict[key2]
del bidict.inv[val1]
bidict[key1] = val2
Also not confusing.


-- FC
 
G

geremy condra

Thanks for the hint :)

The lib looks nice and would probably serve as a good prototype
for writing a new built-in type for Python.

I suspect that it would have a better chance at getting into
collections than becoming a builtin, but who knows. I'd just
like to have something like it in the standard library.
This would have to be written in C, though,

That's currently in the works, along with database backing.
We'd welcome any help though... hint, hint...
and come under a Python compatible license.

I'm willing to dual license under the Python license if
there were a substantial interest in doing so, and I'm
confident that the other authors and maintainers
would feel the same way. The question in my mind is
whether such an interest exists.
With the built-in feature moratorium
currently in place, there's about 1.5-2 years time to get this
done; perhaps a good GSoC project for next year :)

I'd love to have Graphine be a GSoC project, although
if the target were to get it into collections the
moratorium wouldn't change the timeline AFAICS.

Geremy Condra
 
M

MRAB

M.-A. Lemburg said:
Integrating an easy-to-use graph library into the collections
module (and it's C companion) is good idea.


Great !


Since Python is being used more and more in CS classes,
such an addition would complete the tool-set and make Python
even more attractive for undergrad CS courses.

Finding out how much interest exists in advance is always
a bit difficult with new data-structures. People have to
get a feeling of how they can be put to good use first,
so it's a chicken-and-egg problem.

We've seen the same thing happen with sets. They were first
made available via a separate module and then became built-ins
after people realized how useful they are in practice.
I'd like to add that people (myself included) were already using dicts
for sets before the module was written, so there was already a clear
demand for them.
 
G

geremy condra

To be fair, I don't think you'd have to look very far to find places
where a graph representation is approximated using some
combination of dicts, sets, and lists. ElementTree comes to mind
immediately, and the dict-of-dicts idea for logging recently
discussed on python-dev explicitly states that it uses that
structure to represent object graphs. To me that says that there
is at least some demand.

Geremy Condra
 
L

Lie Ryan

To be fair, I don't think you'd have to look very far to find places
where a graph representation is approximated using some
combination of dicts, sets, and lists. ElementTree comes to mind
immediately, and the dict-of-dicts idea for logging recently
discussed on python-dev explicitly states that it uses that
structure to represent object graphs. To me that says that there
is at least some demand.

Though I've never used ElementTree extensively before, I thought it was
supposed to use a Tree structure (though it is a subset of graph, the
need for tree is much more common than full-blown graph package).
 
G

geremy condra

Though I've never used ElementTree extensively before, I thought it was
supposed to use a Tree structure (though it is a subset of graph, the need
for tree is much more common than full-blown graph package).

Sure, its a tree, which is also a graph. In this case it looks to
me more like a directed acyclic graph than anything, but its
pretty much just semantics since the interface is functionally
equivalent.

Geremy Condra
 
T

Terry Reedy

M.-A. Lemburg said:
Integrating an easy-to-use graph library into the collections
module (and it's C companion) is good idea.

The current thinking among deveopers is that new modules should be
written and maintained in Python, which all implementations can use,
with speed-critical parts written in C for speed and imported by the
Python code.

You can write a PEP offering Graphine. To be accepted, there would need
to be some evidence that is the best Python graph module avaible and has
some community support.

I would probably use it more that 90% of the stdlib.

Terry Jan Reedy
 
C

Carl Banks

Sure, its a tree, which is also a graph. In this case it looks to
me more like a directed acyclic graph than anything, but its
pretty much just semantics since the interface is functionally
equivalent.

I'd have to agree with Lie, yes a tree is a graph, but it's simply not
an argument that Python community is grasping for graph structures.
It's like arguing that the Python community could benefit from a
quaternion type, because quaternions are actually heavily used in
Python, because a scalar number is a quarternion.

Carl Banks

(Would be +1 on a good graph implementation... just not because of
ElementTree.)
 
L

Lie Ryan

I'd have to agree with Lie, yes a tree is a graph, but it's simply not
an argument that Python community is grasping for graph structures.
It's like arguing that the Python community could benefit from a
quaternion type, because quaternions are actually heavily used in
Python, because a scalar number is a quarternion.
>
Carl Banks

(Would be +1 on a good graph implementation... just not because of
ElementTree.)

I think this could be an interpretation of the Zen:

Simple is better than complex.
Complex is better than complicated.

can be read as:
List is better than Tree
Tree is better than Graph

not having Tree and Graph package in the standard library force most
people to find List-based solution. And people that know they need
graphs will find them in 3rd party modules. I have needed Trees a few
times in python, but very rarely a Graph (except for playing around).
YMDWV (your mileage definitely will vary).
 
G

geremy condra

I'd have to agree with Lie, yes a tree is a graph, but it's simply not
an argument that Python community is grasping for graph structures.
It's like arguing that the Python community could benefit from a
quaternion type, because quaternions are actually heavily used in
Python, because a scalar number is a quarternion.

Fair enough. I suspect that other examples could be provided
easily enough that I'm not going to fight over that one.
Carl Banks

(Would be +1 on a good graph implementation... just not because of
ElementTree.)

I'd love it if you'd take a look at Graphine and see whether
it would meet the standard for a good graph implementation.

Geremy Condra
 
C

Carl Banks

I think this could be an interpretation of the Zen:

Simple is better than complex.
Complex is better than complicated.

can be read as:
List is better than Tree
Tree is better than Graph

not having Tree and Graph package in the standard library force most
people to find List-based solution. And people that know they need
graphs will find them in 3rd party modules. I have needed Trees a few
times in python, but very rarely a Graph (except for playing around).
YMDWV (your mileage definitely will vary).

If you want a better example, consider various database schemas that
have one-to-one, one-to-many, many-to-one, and many-to-many
relationships. I daresay these are very common. All of these can be
represented by a non-directed graph.

Another common use of directed graphs is for dependency
relationships. In practice, a lot of times running things in order of
dependency is done by assigning everything a scalar priotity, and
executing in order of priority. This can work ok, but it's fragile.
If there's a graph type in Python maybe people will be encouraged to
handle dependencies explicitly.


Carl Banks
 
G

geremy condra

I think this could be an interpretation of the Zen:

Simple is better than complex.
Complex is better than complicated.

can be read as:
List is better than Tree
Tree is better than Graph

not having Tree and Graph package in the standard library force most people
to find List-based solution. And people that know they need graphs will find
them in 3rd party modules. I have needed Trees a few times in python, but
very rarely a Graph (except for playing around). YMDWV (your mileage
definitely will vary).

Where a list will do, use a list- duh. But when you need a graph, you
shouldn't have to homebrew an implementation any more than you
should have to homebrew an odict or named tuple, both of which
are substantially easier to get right than a graph is.

Geremy Condra
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,183
Messages
2,570,967
Members
47,518
Latest member
RomanGratt

Latest Threads

Top