Standard graph API?

Andrew Dalke

I've been off-line for a couple months so a bit late
following up to this thread...

I too would like some standard collection of graph
algorithms, but not necessarily a standard API. I
work with a lot of molecular graphs which means I
would prefer using names like "atoms" and "bonds"
instead of "nodes" and "edges".

David said:
I would strongly prefer not to have weights or similar attributes as
part of a graph API. I would rather have the weights be a separate dict
or function or whatever passed to the graph algorithm. The main reason
is that I might want the same algorithm to be applied to the same graph
with a different set of weights.
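
A minimal sketch of the separation David describes, written for Python 3. It assumes the graph is a plain dict of neighbor lists and that weights are non-negative; the function name `shortest_path_lengths` and both weight sets are made up for illustration. The same graph is run twice, once with a weight dict and once with a weight function.

```python
import heapq

def shortest_path_lengths(graph, source, weight):
    """Dijkstra's algorithm; `weight` is a dict keyed by (u, v) or a callable."""
    get_weight = weight if callable(weight) else (lambda u, v: weight[(u, v)])
    dist = {source: 0}
    heap = [(0, 0, source)]      # (distance, tie-breaker, node)
    counter = 1                  # tie-breaker so nodes are never compared
    done = set()
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v in graph[u]:
            nd = d + get_weight(u, v)
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, counter, v))
                counter += 1
    return dist

# The graph itself carries no weights (hypothetical example data).
graph = {"a": ["b", "c"], "b": ["c"], "c": []}
lengths_km = {("a", "b"): 1, ("a", "c"): 5, ("b", "c"): 1}

by_km = shortest_path_lengths(graph, "a", lengths_km)          # weights from a dict
by_hops = shortest_path_lengths(graph, "a", lambda u, v: 1)    # weights from a function
```

Nothing about the graph changes between the two calls; only the third argument does.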

An alternative, which is both intriguing and sends alarm
bells ringing in my head is to have the algorithm
collection instead generate code as needed, so that
I can ask for, say, "depth first search using '.atoms'
to get a list of neighboring nodes." The result could be
exec'ed to generate usable Python dynamically, or
written to a file to be used as a normal Python module.

In this case if the DFS generates callback events then
it could include options to only create the code for
events that are needed.

There would need to be some standards on how the
graph is used, like "two nodes are the same iff
'node1 is node2'" or "the result of getting the list
of edges is iterable."

I believe this design philosophy is similar to Boost's.

Andrew
(e-mail address removed)
 
Magnus Lie Hetland

> I've been off-line for a couple months so a bit late
> following up to this thread...

Well, I'm still looking for replies, so... ;)

> I too would like some standard collection of graph
> algorithms, but not necessarily a standard API.

Hm. How would the algorithms work without a standard API?

To clarify, by API I here mean a protocol, i.e. a definition of how a
graph is supposed to behave -- *not* a standard implementation.

> I work with a lot of molecular graphs which means I would prefer
> using names like "atoms" and "bonds" instead of "nodes" and "edges".

Sure -- but unless we have some standard protocol/API, it would be
hard to get the algorithms to work with your graphs...

I've been thinking a bit about the use of object adaptation here; I
think it would be quite perfect. One possibility would be to use the
functionality of PyProtocols, but that's hardly standard... But it
would allow you to do stuff like

graph = AdjacencyMap(someStrangeMolecularGraph)
# or graph = adapt(someStrange..., AdjacencyMap)
graphAlgorithm(graph)

or the like...

We could then have a few different perspectives on graphs, such as
adjmap, incmap, adjarray and edgelist, for example.

Just loose thoughts.

An adjacency map would work just like a dict of lists (or something
equivalent) and a dict of list could be used as one -- except,
perhaps, for some "advanced" features which would be optional.
(Modifying the graph is a bit awkward with this syntax, especially as
you have to know whether the adjacency collection is a list or a set,
for example.)
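
A minimal sketch of an algorithm written against that "dict of lists" view, in Python 3. The only assumptions are that nodes are hashable and that `adjacency[node]` is iterable; the function name `dfs_preorder` and the sample graph are hypothetical.

```python
def dfs_preorder(adjacency, start, seen=None):
    """Yield the nodes reachable from `start` in depth-first preorder."""
    if seen is None:
        seen = set()
    seen.add(start)
    yield start
    for neighbor in adjacency[start]:
        if neighbor not in seen:
            # Recursive generator: fine for a sketch, but deep graphs
            # would run into Python's recursion limit.
            yield from dfs_preorder(adjacency, neighbor, seen)

# A plain dict of lists already behaves as an adjacency map.
adjacency = {"a": ["b", "c", "d"], "b": ["d"], "c": [], "d": []}
order = list(dfs_preorder(adjacency, "a"))
```

Reading works the same whether the values are lists, tuples or sets. Modification is where the awkwardness shows: adding an edge is `adjacency["c"].append("a")` for a list but `adjacency["c"].add("a")` for a set.
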

> An alternative, which is both intriguing and sends alarm bells
> ringing

Sounds like a fun combination ;)

> in my head is to have the algorithm
> collection instead generate code as needed, so that
> I can ask for, say, "depth first search using '.atoms'
> to get a list of neighboring nodes."

Wouldn't it be *much* better to use the established (but not standard)
mechanism of object adaptation, as championed in PEP 246 and as
implemented and extended upon in PyProtocols?

Note that we could certainly do with *only* the PEP 246 mechanism
(perhaps with some minor updates to the PEP) *without* PyProtocols, if
we wanted a more minimalist solution.

(If only adapt() could become a standard library function... <sigh> ;)

> The result could be exec'ed to generate usable Python dynamically, or
> written to a file to be used as a normal Python module.

See above -- adapt() would be much better IMO, and would fill the same
need (as I see it).

[snip]
> There would need to be some standards on how the graph is used, like
> "two nodes are the same iff 'node1 is node2'"

Yeah. Or one might want to allow equality to determine this, so that
the implementer could decide how to do it (nodes might be dynamically
created front-end objects after all).

> or "the result of getting the list of edges is iterable."

Right -- that's typically the kind of API definition I'm after.

Basically what I was proposing is a sort of "Python Graph API" in the
same vein as the "Python DB API" (PEP 249), that is, simply an
informative PEP about how graphs should behave to ensure
interoperability, possibly with some standard wrapper functions (or
maybe not).

[snip]

It seems there are at least a few people who are interested in the
general concept, though. In case there is any merit in the idea, I've
set up a temporary wiki at

http://www.idi.ntnu.no/~mlh/python-graph-wiki.cgi

I'll post a separate announcement about it.
 
Andrew Dalke

Magnus said:
Hm. How would the algorithms work without a standard API?

There are certain things the different graphs have in
common. For example,
1. "test if two nodes/edges are the same"
2. "test if two nodes/edges are identically colored"
3. "list of all nodes adjacent to a node"
4. "list of all edges from a node" (either directed or
undirected)
5. "get the appropriate weight"

Different graphs have different ways to do this.
My molecular graphs do this with

1. atom1 is atom2
2. atom_compare(atom1, atom2)
3. atom.xatoms
4. atom.bonds
5. atom.weight # though that's the atomic weight and
# nothing to do with flow :)

An adjacency graph might do this with

1. node1 is node2
2. node1 == node2
3. edge_table[node]
4. -- not defined --
5. weights[node]

The ways to get the properties differ but the things
you do with them do not change.

I can then conceive of some way to generate code
based on a template, like this

dfs = make_code(dfs_template,
                args = "node, handler",
                bond_neighbors = "node.xatoms",
                on_enter = "handler.enter(node)")

# ... make the graph ...
class Handler:
    def enter(self, node):
        print "Hi", node

dfs(graph, Handler())

or for an adjacency graph, something like

dfs = make_code(dfs_template,
                args = "bond_table, node, handler",
                get_neighbors = "bond_table[node]",
                on_enter = "handler.enter(node)")
...
dfs(bond_table, start_node, handler)

Obviously it would need some sort of templating language.
Wonder if anyone's ever made one of those before. ;)
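
To make the idea concrete, here is a rough Python 3 sketch of what a `make_code` along these lines could look like. It is not an existing library: the template text, the `$placeholder` names, the `func_name` parameter and the use of `string.Template` plus `exec` are all assumptions made for illustration. Events that are not requested are replaced by `pass`, so no callback is ever called for them.

```python
from string import Template

# Hypothetical template: $placeholders are filled in by make_code().
# It assumes one of the generated function's arguments is named "node".
DFS_TEMPLATE = Template('''\
def dfs($args):
    seen = set()
    def visit(node):
        seen.add(id(node))          # "same node" means "node1 is node2"
        $on_enter
        for neighbor in $get_neighbors:
            if id(neighbor) not in seen:
                visit(neighbor)
        $on_exit
    visit(node)
''')

def make_code(template, func_name="dfs", **parts):
    """Fill in the template and return the generated function."""
    parts.setdefault("on_enter", "pass")   # unused events cost nothing
    parts.setdefault("on_exit", "pass")
    source = template.substitute(parts)
    namespace = {}
    exec(source, namespace)                # only run templates you trust
    return namespace[func_name]

class Atom:
    def __init__(self, name):
        self.name = name
        self.xatoms = []                   # neighboring atoms

class Handler:
    def __init__(self):
        self.entered = []
    def enter(self, node):
        self.entered.append(node.name)

dfs = make_code(DFS_TEMPLATE,
                args="node, handler",
                get_neighbors="node.xatoms",
                on_enter="handler.enter(node)")

c, o, h = Atom("C"), Atom("O"), Atom("H")
c.xatoms = [o, h]
o.xatoms = [c]
h.xatoms = [c]
handler = Handler()
dfs(c, handler)
```

The adjacency-table variant only changes the strings: `args="bond_table, node, handler"` and `get_neighbors="bond_table[node]"`.
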

> To clarify, by API I here mean a protocol, i.e. a definition of how a
> graph is supposed to behave -- *not* a standard implementation.

I think we're talking about the same thing -- the sorts
of things you can do with nodes and edges, and not a
statement that a node or edge has a given property or method.

> I've been thinking a bit about the use of object adaptation here; I
> think it would be quite perfect. One possibility would be to use the
> functionality of PyProtocols, but that's hardly standard... But it
> would allow you to do stuff like
>
> graph = AdjacencyMap(someStrangeMolecularGraph)
> # or graph = adapt(someStrange..., AdjacencyMap)
> graphAlgorithm(graph)

The problem is all the conversion from/to my graph
form to the standard graph form. Either the adapters
have to be live (so that modifications to the standard
form get propagated back to my graph) or there need
to be conversions between the two. Both sound slow.

Me:
> An alternative, which is both intriguing and sends alarm bells ringing

Magnus:
> Sounds like a fun combination ;)

There the idea for me would be

find_flow = make_code(flow_template,
                      args = "source_nodes, sink_nodes, weight_table",
                      edges = "node.out_edges",
                      weight = "weight_table[edge]")

or to use a weight function

find_flow = make_code(flow_template,
                      args = "source_nodes, sink_nodes, weight_func",
                      edges = "node.out_edges",
                      weight = "weight_func(edge)")


> Wouldn't it be *much* better to use the established (but not standard)
> mechanism of object adaptation, as championed in PEP 246 and as
> implemented and extended upon in PyProtocols?

Not really. Consider the events that could occur in a
DFS. There's
- on enter
- on exit
- on backtrack
and probably a few more that could be used in a general
purpose DFS. But I might need only one of them. With
a PEP 246 adapter I can adapt my graph to fit the algorithm,
but if I don't need all of those events there's no way
to adapt the algorithm to fit my needs.

(Yes, even though I'm using Python I'm still worried
about efficiency.)

> (If only adapt() could become a standard library function... <sigh> ;)

Perhaps someday, when people get more experience with it.
I've not yet used it.

> Yeah. Or one might want to allow equality to determine this, so that
> the implementer could decide how to do it (nodes might be dynamically
> created front-end objects after all).

I found that 'is' testing for my graphs is much better.
At the very least, it's a lot faster (no method call overhead).

> It seems there are at least a few people who are interested in the
> general concept, though. In case there is any merit in the idea, I've
> set up a temporary wiki at
>
> http://www.idi.ntnu.no/~mlh/python-graph-wiki.cgi
>
> I'll post a separate announcement about it.

It said "warning, in use, you might want to wait about 10
minutes before editing."

I think it's been about 10 minutes now. :)

Andrew
(e-mail address removed)
 
Magnus Lie Hetland

> There are certain things the different graphs have in
> common. For example,
> 1. "test if two nodes/edges are the same"
> 2. "test if two nodes/edges are identically colored"
> 3. "list of all nodes adjacent to a node"
> 4. "list of all edges from a node" (either directed or
> undirected)
> 5. "get the appropriate weight"

Right. My idea was that we could define a standard API for this
functionality, as in (e.g.) the DB API. Your code generating idea is
an alternative.

> Different graphs have different ways to do this.

Yes.

[snip]

> The ways to get the properties differ but the things
> you do with them do not change.

Indeed.

> I can then conceive of some way to generate code
> based on a template, like this

[code generating template example snipped]

Hm. What you're basically proposing is (as you said, basically ;) a
C++-like approach (as used in Boost)?

Or -- this goes beyond the standard C++ approach, of course, as that
would basically be a way of getting "standard Python functionality" in
a static language.

Your idea is interesting -- but quite a way from what I had
envisioned... Would there be a way to combine the ideas? I.e. define
an (abstract) interface to standard graph functionality and in
addition have this sort of template stuff? It might be a bit far
fetched, though, as it seems your template idea obviates the need for
a standard interface.

I need the interface for when I write my own algorithms -- I don't
have as much need for templated stock algorithms; so it seems our
needs may be somewhat complementary, perhaps?

[snip]
> Obviously it would need some sort of templating language.
> Wonder if anyone's ever made one of those before. ;)

Hehe.

> I think we're talking about the same thing -- the sorts
> of things you can do with nodes and edges, and not a
> statement that a node or edge has a given property or method.

Well -- I was rather aiming for a definition of a graph protocol
(similar to e.g. the sequence protocol or the mapping protocol), and
that would entail fixing the methods and attributes involved.

[snip]
> The problem is all the conversion from/to my graph
> form to the standard graph form. Either the adapters
> have to be live

Sure -- that wouldn't be a problem, would it?

> (so the modification to the standard
> form get propagated back to my graph) or there needs
> to be conversions between the two. Both sound slow.

Maybe. I guess it depends on how you implement things. For example, if
it's only a matter of renaming methods, that wouldn't entail any
performance loss at all (you could just have another attribute bound
to the same method).

[snip]
> There the idea for me would be
>
> find_flow = make_code(flow_template,
>                       args = "source_nodes, sink_nodes, weight_table",
>                       edges = "node.out_edges",
>                       weight = "weight_table[edge]")

But coding this doesn't seem like much fun.

My hope was that I could write new graph algorithms so they looked
somewhat pretty and readable. Having to write them in some new
template language doesn't seem very tempting.

(As I said, stock algorithms aren't of that much interest to me.)

> Not really. Consider the events that could occur in a
> DFS. There's
> - on enter
> - on exit
> - on backtrack
> and probably a few more that could be used in a general
> purpose DFS. But I might need only one of them. With
> a PEP 246 adapter I can adapt my graph to fit the algorithm,
> but if I don't need all of those events there's no way
> to adapt the algorithm to fit my needs.

I don't see how this has anything to do with the issue at hand,
really.

The person who implemented the DFS could deal with this issue (by
parametrizing it) or something. (*Or* using adaptation, for that
matter.) I'm sure there are many other ways of dealing with this...
IMO this is orthogonal. Whether or not you can tell the DFS how you
access the neighbors of a node isn't directly related to what you want
the DFS to do...

> (Yes, even though I'm using Python I'm still worried
> about efficiency.)

Well -- you could always make a C library for your graph structures,
with an optional interface (without any performance loss) that
conformed to the Hypothetical Graph API(tm), somewhat reminiscent of
how the DB API works...

> Perhaps someday, when people get more experience with it.
> I've not yet used it.

I agree with you -- one ought to get more experience with it. I have
not used it seriously myself. I think it might have quite a bit of
potential in this sort of situation, though.

> I found that 'is' testing for my graphs is much better.

The problem is that it is impossible to override.

If I were to use some special hardware as the source of my graph
(which is a realistic example for me) I might have to create node
wrappers on the fly. Unless I use the Singleton pattern (the Borg
pattern wouldn't work) I could easily end up with equal but
non-identical nodes, and no way of overriding this.
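
The situation Magnus describes can be sketched in a few lines of Python 3. The `NodeHandle` class and its `raw_id` field are hypothetical stand-ins for wrappers created on the fly around identifiers coming from some back end.

```python
class NodeHandle:
    """Wrapper created on the fly around a raw identifier."""
    __slots__ = ("raw_id",)

    def __init__(self, raw_id):
        self.raw_id = raw_id

    def __eq__(self, other):
        return isinstance(other, NodeHandle) and self.raw_id == other.raw_id

    def __hash__(self):
        # Must agree with __eq__ so handles work in sets and dicts.
        return hash(self.raw_id)

first = NodeHandle(7)
second = NodeHandle(7)                 # same underlying node, fresh wrapper

same_by_identity = first is second     # identity cannot be overridden
same_by_equality = first == second     # equality can
visited = {first}
found = second in visited              # set lookup uses __hash__ and __eq__
```

An algorithm that tests "seen before?" with `is` would treat `first` and `second` as two different nodes here; one that uses `==` (or a set) would not.
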

> At the very least, it's a lot faster (no method call overhead).

Right.

[snip wiki stuff]

> It said "warning, in use, you might want to wait about 10
> minutes before editing."

Right -- I was making some changes ;)

> I think it's been about 10 minutes now. :)

Yup. Just go ahead and add stuff.
 
Andrew Dalke

Magnus said:
> Yup. Just go ahead and add stuff.

I tried. I can view the page and get to the edit page
but I can neither save it nor preview it. My browser
times out. Here are my two sets of comments


Brian Kelley's "frowns" package
(http://staffa.wi.mit.edu/people/kelley/) has graph code appropriate
for a chemical informatics library, including a wrapper to the VFlib
graph isomorphism library. Andrew Dalke has some clique finding
code at http://starship.python.net/crew/dalke/clique/ .


and

(Andrew Dalke speaking here.)

It's pretty uniformly agreed that there is no standard graph
representation, if for no other reason than that my nodes are called
"atoms" and my edges are called "bonds" and I care about neither
multigraphs nor directed edges. ;)

Magnus suggests that adapters are the right approach. My concern with
that is two-fold. First, the adapter layer will cause a non-trivial
overhead -- at least an extra function call for every graph-specific
action. Second, the algorithm can't be specialized for the graph at
hand. (Eg, in a DFS the algorithm may generate events for node enter,
exit, and backtrack, while I may only need one of those.)

My thought is to use some sort of templating system. If done well the
actual code might even be Python, with a naming scheme to allow
replacements appropriate to the data structure (eg, to use
"weights[edge]" or "edge.size" for a flow algorithm) and to remove code
blocks not needed for a given algorithm (eg, don't include
callback code if it isn't used).

This would depend on having standard ideas of what you can do with a
graph. Some include "iterate over all nodes", "compare two nodes for
equivalence", "get all edges for a given node."




Now responding to your post:

> Your idea is interesting -- but quite a way from what I had
> envisioned... Would there be a way to combine the ideas? I.e. define
> an (abstract) interface to standard graph functionality and in
> addition have this sort of template stuff? It might be a bit far
> fetched, though, as it seems your template idea obviates the need for
> a standard interface.

As I outlined above, I believe it should be possible to build a
templating system on top of working Python code. The code
uses the 'standard' API and the templating system gives a way
to conform the algorithm to the graph at hand.

I think many people will just use the standard scheme,
but that enough people have different needs (like chemistry)
that it's worth the effort to be malleable that way.

> Well -- I was rather aiming for a definition of a graph protocol
> (similar to e.g. the sequence protocol or the mapping protocol), and
> that would entail fixing the methods and attributes involved.

Ahh, then we are talking about different but related things.
I don't think it's possible to have a graph protocol like the one you
mention which is good enough for all graph systems. Eg,
does the weight for a max flow calculation come from a
property of the edge, from a table passed into the function,
or from a callable? If a table, is it indexed by id(node),
by the node itself (is the node hashable?) or by some unique
id given to all nodes?

I know there are lessons to learn from both LEDA and
the Boost Graph Library. Sadly, I don't know them.

> Maybe. I guess it depends on how you implement things. For example, if
> it's only a matter of renaming methods, that wouldn't entail any
> performance loss at all (you could just have another attribute bound
> to the same method).

I do not want that. Which will users of my library use,
"edges" or "bonds"? This sort of choice is bad.

> But coding this doesn't seem like much fun.
>
> My hope was that I could write new graph algorithms so they looked
> somewhat pretty and readable. Having to write them in some new
> template language doesn't seem very tempting.
>
> (As I said, stock algorithms aren't of that much interest to me.)

What I need to do is take a look at, say, David Eppstein's
set of algorithms and see if my templating idea can work
without being too onerous.

> I don't see how this has anything to do with the issue at hand,
> really.
>
> The person who implemented the DFS could deal with this issue (by
> parametrizing it) or something. (*Or* using adaptation, for that
> matter.) I'm sure there are many other ways of dealing with this...
> IMO this is orthogonal. Whether or not you can tell the DFS how you
> access the neighbors of a node isn't directly related to what you want
> the DFS to do...

Mmm, in some way you are right. You've been talking
about adapting the data structure to work with the
algorithm. I've been talking about adapting the algorithm
to deal with the data structure.

> The problem is that it is impossible to override.

That's why it's faster. :)

> If I were to use some special hardware as the source of my graph
> (which is a realistic example for me) I might have to create node
> wrappers on the fly. Unless I use the Singleton pattern (the Borg
> pattern wouldn't work) I could easily end up with equal but
> non-identical nodes, and no way of overriding this.

I had experience using a C library for the underlying
graph data. It returned opaque integers for object
references. Every time I got one I wrapped it inside
a new Python object. So I could have different Python
objects for the same underlying object.

It turned out that some algorithms used so many "have
I seen this atom/bond before?" tests that the switch
from == to 'is' made a big difference.

On the other hand, it did mean I needed to convert from
my library's natural style of graph into the form that
allowed 'is' tests.
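
The thread does not say how that conversion was done. One possible way to get wrappers that support 'is' tests, sketched here in Python 3 under the assumption that the C library hands out stable integer handles, is to cache one wrapper per handle. The `Atom` class and `from_handle` are hypothetical names.

```python
import weakref

class Atom:
    """Python wrapper around an opaque integer handle (hypothetical)."""

    # One live wrapper per handle; entries vanish when the wrapper is
    # no longer referenced anywhere else.
    _cache = weakref.WeakValueDictionary()

    def __init__(self, handle):
        self.handle = handle

    @classmethod
    def from_handle(cls, handle):
        atom = cls._cache.get(handle)
        if atom is None:
            atom = cls(handle)
            cls._cache[handle] = atom
        return atom

a1 = Atom.from_handle(42)
a2 = Atom.from_handle(42)    # same handle, so the same wrapper comes back
a3 = Atom.from_handle(43)
```

The price is a dictionary lookup every time a handle is wrapped, in exchange for cheap identity tests inside the algorithms.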

Andrew
(e-mail address removed)
 
Paul Moore

Andrew Dalke said:
> Magnus said:
> > Hm. How would the algorithms work without a standard API?
>
> There are certain things the different graphs have in
> common. For example,
> 1. "test if two nodes/edges are the same"
> 2. "test if two nodes/edges are identically colored"
> 3. "list of all nodes adjacent to a node"
> 4. "list of all edges from a node" (either directed or
> undirected)
> 5. "get the appropriate weight"
>
> Different graphs have different ways to do this.
> [...]
>
> The ways to get the properties differ but the things
> you do with them do not change.
>
> I can then conceive of some way to generate code
> based on a template, like this
>
> dfs = make_code(dfs_template,
>                 args = "node, handler",
>                 bond_neighbors = "node.xatoms",
>                 on_enter = "handler.enter(node)")
>
> # ... make the graph ...
> class Handler:
>     def enter(self, node):
>         print "Hi", node
>
> dfs(graph, Handler())

Yuk.

This is *exactly* the type of thing that adaptation (PEP 246,
PyProtocols) is designed to support:

# pseudo-code here, I don't know the exact PEP 246 code form off
# the top of my head.
class IGraph(Protocol):
    def nodes_same(n1, n2):
        "No implementation here - this is just a protocol"
    # etc

class IDFSVisitor(Protocol):
    # more protocol defns...

def dfs(g, h):
    g = adapt(g, IGraph)
    h = adapt(h, IDFSVisitor)
    # code, using standard method names

Now, for your molecular graphs, you just write an adapter to make the
graph conform to the IGraph protocol. In theory, the adaptation
should be maximally efficient (so that you don't have to worry about
overheads of unnecessary wrappers), in practice I don't know how
close PyProtocols comes (although I believe it's good).
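
For readers who want something runnable, here is a Python 3 sketch of the adapter idea that does not use PyProtocols. The `adapt()` below is a much-simplified stand-in written for this example, not the PEP 246 reference implementation, and `Molecule`, `MoleculeAsGraph`, `neighbors` and the `on_enter` callback are hypothetical names.

```python
class IGraph:
    """Protocol sketch: what a graph algorithm may ask of a graph."""
    def neighbors(self, node):
        raise NotImplementedError
    def nodes_same(self, n1, n2):
        raise NotImplementedError

class Atom:
    def __init__(self, symbol):
        self.symbol = symbol
        self.xatoms = []                    # neighboring atoms

class Molecule:
    def __init__(self, atoms):
        self.atoms = atoms
    def __conform__(self, protocol):
        # Hook in the spirit of PEP 246: the object offers its own adapter.
        if protocol is IGraph:
            return MoleculeAsGraph(self)
        return None

class MoleculeAsGraph(IGraph):
    """Live adapter: reads through to the molecule, copies nothing."""
    def __init__(self, molecule):
        self.molecule = molecule
    def neighbors(self, atom):
        return atom.xatoms
    def nodes_same(self, a1, a2):
        return a1 is a2

def adapt(obj, protocol):
    """Simplified stand-in for adapt(); assumption, not the PEP's code."""
    if isinstance(obj, protocol):
        return obj
    conform = getattr(obj, "__conform__", None)
    if conform is not None:
        adapted = conform(protocol)
        if adapted is not None:
            return adapted
    raise TypeError("cannot adapt %r to %s" % (obj, protocol.__name__))

def dfs(g, start, on_enter):
    g = adapt(g, IGraph)
    seen = []   # linear scan keeps the sketch short; it is not efficient
    def visit(node):
        seen.append(node)
        on_enter(node)
        for other in g.neighbors(node):
            if not any(g.nodes_same(other, s) for s in seen):
                visit(other)
    visit(start)

c, o = Atom("C"), Atom("O")
c.xatoms.append(o)
o.xatoms.append(c)
visited = []
dfs(Molecule([c, o]), c, lambda atom: visited.append(atom.symbol))
```

Every `neighbors` and `nodes_same` call goes through the adapter, which is exactly the per-operation overhead being debated in this thread.
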

> I think we're talking about the same thing -- the sorts
> of things you can do with nodes and edges, and not a
> statement that a node or edge has a given property or method.

Yes, this (to me) would be ideal. The issue of not being standard is
a bit circular - it's not standard because there aren't enough
examples of where it is needed, but people don't use it because it's
not standard.

I believe Guido also has some concerns over how interfaces "should"
work - but he's unlikely to do anything about that unless someone
forces the issue by championing PEP 246. It might be worth asking
Philip Eby if he would be happy to see PyProtocols added to the
standard library.

> The problem is all the conversion from/to my graph
> form to the standard graph form. Either the adapters
> have to be live (so the modification to the standard
> form get propagated back to my graph) or there needs
> to be conversions between the two. Both sound slow.

My understanding is that a well-written adapter can be very
efficient. But I don't know in practice how that is achieved. And
obviously, a particular operation will never be efficient if the
underlying representation can't support it efficiently.

> Not really. Consider the events that could occur in a
> DFS. There's
> - on enter
> - on exit
> - on backtrack
> and probably a few more that could be used in a general
> purpose DFS. But I might need only one of them. With
> a PEP 246 adapter I can adapt my graph to fit the algorithm,
> but if I don't need all of those events there's no way
> to adapt the algorithm to fit my needs.

You could adapt a visitor class to fit a standard DFS-visitor
protocol.

> (Yes, even though I'm using Python I'm still worried
> about efficiency.)

I worry about efficiency, as well. But my experiments showed call
overhead as the killer - adding an "on backtrack" callback to the
algorithm, and calling it with a visitor which had a null
implementation of the callback still added a chunk of overhead. Calling
a "do-nothing" callback for each node in a 10,000 node graph isn't
free. Of course, this argues for the template "build code on the fly"
approach, which I don't relish. Anyone know another good way of
avoiding this overhead?
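
Paul's experiment is not shown in the thread. The following Python 3 sketch is only a guess at how such a measurement might be set up: the two traversal functions, the 10,000-node chain and the `do_nothing` callback are all made up for illustration, and no particular timings are claimed.

```python
import timeit

def walk_plain(adjacency, start):
    """Visit every reachable node; no callbacks at all."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen)

def walk_with_callback(adjacency, start, on_node):
    """Same traversal, plus one callback call per visited node."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        on_node(node)
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen)

def do_nothing(node):
    pass

# A 10,000-node chain: 0 -> 1 -> ... -> 9999
adjacency = {i: [i + 1] for i in range(9999)}
adjacency[9999] = []

plain = timeit.timeit(lambda: walk_plain(adjacency, 0), number=20)
with_cb = timeit.timeit(
    lambda: walk_with_callback(adjacency, 0, do_nothing), number=20)
print("no callback: %.4f s, do-nothing callback: %.4f s" % (plain, with_cb))
```

The difference between the two numbers is the cost of the empty calls alone, since the traversal work is identical.
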

> Perhaps someday, when people get more experience with it.
> I've not yet used it.

It needs a champion. Someone to do the work to get it into the
library. We have a PEP, and an implementation (PyProtocols) so I
suspect that it's a job that wouldn't require huge technical skills.

> I found that 'is' testing for my graphs is much better.
> At the very least, it's a lot faster (no method call overhead).

You're never going to avoid a method call in any standard - there's
going to be *someone* for whom "is" is inappropriate. So the standard
will have to cater for that.

Paul.
 
