Using XSLT and XPath for graph data structure processing?

R

Ramon M. Felciano

Helo all --

I'm trying to gain a deeper understand for what type of
semi-declarative programming can be done through XML and XPath/XSLT.
I'm looking at graph processing problems as a testbed for this, and
came across a problem that I haven't been able to solve elegantly. The
problem is to find "linker" vertexes that a pair of verteces from a
pre-defined set. For example, if the graph verteces represent cities
and edges represent flights between them, then given a list of cities,
find all intermediate cities that you would stop in via a "connecting
flight".

For example, given the following simple graph:

V1 -> V2 -> V3 -> V4
\<- V5 ->/

(V5 points to both V2 and V4), and its XML serialization:

<graph>
<vertex id="V1"/>
<vertex id="V2" type="anchor"/>
<vertex id="V3"/>
<vertex id="V4" type="anchor"/>
<vertex id="V5"/>
<edge source="V1" target="V2"/>
<edge source="V2" target="V3"/>
<edge source="V3" target="V4"/>
<edge source="V5" target="V2"/>
<edge source="V5" target="V4"/>
</graph>

I would like to transform this into a second graph where all vertexes
that "link" two anchor distinct vertexes are flagged as link nodes. In
this case, there are two anchor vertexes V2 and V4, and I can link
them through V3 (V2 -> V3 -> V4) and V5 (V2 <- V5 -> V4). Note that
linked verteces must be distinct, so traversing the V2 <- V1 -> V2
path should not yield V1 as a link node. So I'd like to see something
like this:

<graph>
<vertex id="V1"/>
<vertex id="V2" type="anchor"/>
<vertex id="V3" linker="true"/>
<vertex id="V4" type="anchor"/>
<vertex id="V5" linker="true"/>
<edge source="V1" target="V2"/>
<edge source="V2" target="V3"/>
<edge source="V3" target="V4"/>
<edge source="V5" target="V2"/>
<edge source="V5" target="V4"/>
</graph>

It would be ideal to come up with a generalized solution that would
let you use 1, 2, .. N intermediate linking nodes. I've been able to
get this working with nested loops, but it isn't particularly
declarative or speedy, and is certainly more verbose than I'd like, so
I'm wondering if anyone here has insights into how to do this
elegantly and in XSLT/XPath style. For example, is it possible to
write a single XPath expression that will select <vertex> elements
that obey the above criteria? If not, does anyone have any suggestions
for how to code this effectively and efficiently with XSLT? Or is
XPath/XSLT not the right paradigm for this type of information
processing and transformation?

Thanks!

Ramon
 
A

Andy Dingley

I'm trying to gain a deeper understand for what type of
semi-declarative programming can be done through XML and XPath/XSLT.
I'm looking at graph processing problems as a testbed for this,

Don't do graphs in XML, use RDF instead.

Jave & Jena are the tools to use.
 
A

Andy Dingley

????? You are aware of course of RDF over XML?

Sure, but you don't process it _as_ XML.

XML does tree structure, not graphs. ID & IDREF just don't do anything
useful for you and there's a whole issue about the difficulty of
working with XML which contains a graph that isn't entirely
represented by that document (ie. external resources)

If you want to work with "a graph" in something that's transported "in
XML", then you need to build a layer above XML. Or you can use one,
like RDF, that already exists.
 
P

Patrick TJ McPhee

% I'm trying to gain a deeper understand for what type of
% semi-declarative programming can be done through XML and XPath/XSLT.

[...]

[given]

% <graph>
% <vertex id="V1"/>
% <vertex id="V2" type="anchor"/>
% <vertex id="V3"/>
% <vertex id="V4" type="anchor"/>
% <vertex id="V5"/>
% <edge source="V1" target="V2"/>
% <edge source="V2" target="V3"/>
% <edge source="V3" target="V4"/>
% <edge source="V5" target="V2"/>
% <edge source="V5" target="V4"/>
% </graph>

% this case, there are two anchor vertexes V2 and V4, and I can link
% them through V3 (V2 -> V3 -> V4) and V5 (V2 <- V5 -> V4). Note that

I don't really understand this last one. It seems like this is a directed
graph, so to have a link, you need to start at one anchor and
go to the other. V5 would be a linker if there were some way to
get to it from an anchor.

[we want to get]

% <graph>
% <vertex id="V1"/>
% <vertex id="V2" type="anchor"/>
% <vertex id="V3" linker="true"/>
% <vertex id="V4" type="anchor"/>
% <vertex id="V5" linker="true"/>
% <edge source="V1" target="V2"/>
% <edge source="V2" target="V3"/>
% <edge source="V3" target="V4"/>
% <edge source="V5" target="V2"/>
% <edge source="V5" target="V4"/>
% </graph>

I start by defining keys which can be used to find the anchors, the
sources for each edge, and the targets for each edge:

<xsl:key name='anchors' use='@id' match='vertex[@type = "anchor"]'/>
<xsl:key name='sources' use='@source' match='edge'/>
<xsl:key name='targets' use='@target' match='edge'/>

If we pay attention to the directions of the edges, we want an
expression which is true when the current node is the target of an edge
from one or more anchors and the source of an edge towards one or more
anchors. We can use the `targets' key and the current node's id attribute
to find all the edges for which it's a target

key('targets', @id)

That's a node set made up of edge elements. We can turn it into
a node set made up of the corresponding source attributes

key('targets', @id)/@source

then use that as an argument to key, to find the set anchor vertices
which are sources of the current node:

key('anchors', key('targets', @id)/@source)

If the current node is the target of an edge from an anchor, the
resulting node set will be non-empty. Similarly, here's the set
of anchor nodes which are targets of edges from the current node

key('anchors', key('sources', @id)/@target)

we can use those node sets as boolean values, and so create a match
expression which matches all vertex elements which are targets
of an edge from an anchor and sources of an edge to an anchor


match="vertex[key('anchors', key('targets', @id)/@source) and
key('anchors', key('sources', @id)/@target)]"

to ignore the direction of the edges, we can create a union of
those two node sets and count the elements in the union

match="vertex[count(key('anchors', key('targets', @id)/@source)|
key('anchors', key('sources', @id)/@target)) > 1]"


Note that if we were to make an edge from V2 to V1, the `directed'
match expression would call V1 a link, but the undirected one would
not.

The full style sheet is

<xsl:stylesheet
xmlns:xsl = 'http://www.w3.org/1999/XSL/Transform' version='1.0'>

<xsl:key name='anchors' use='@id' match='vertex[@type = "anchor"]'/>
<xsl:key name='sources' use='@source' match='edge'/>
<xsl:key name='targets' use='@target' match='edge'/>

<xsl:template match='node()|@*'>
<xsl:copy>
<xsl:apply-templates select='node()|@*'/>
</xsl:copy>
</xsl:template>

<xsl:template match='vertex[count(key("anchors", key("targets", @id)/@source)
|key("anchors", key("sources", @id)/@target)) > 1]'>
<xsl:copy>
<xsl:apply-templates select='node()|@*'/>
<xsl:attribute name='linker'>true</xsl:attribute>
</xsl:copy>
</xsl:template>

</xsl:stylesheet>

% It would be ideal to come up with a generalized solution that would
% let you use 1, 2, .. N intermediate linking nodes.

I'm not sure off the top of my head how you would generalise this.
 
D

Dimitre Novatchev

Helo all --

I'm trying to gain a deeper understand for what type of
semi-declarative programming can be done through XML and XPath/XSLT.
I'm looking at graph processing problems as a testbed for this, and
came across a problem that I haven't been able to solve elegantly. The
problem is to find "linker" vertexes that a pair of verteces from a
pre-defined set. For example, if the graph verteces represent cities
and edges represent flights between them, then given a list of cities,
find all intermediate cities that you would stop in via a "connecting
flight".

For example, given the following simple graph:

V1 -> V2 -> V3 -> V4
\<- V5 ->/

(V5 points to both V2 and V4), and its XML serialization:

<graph>
<vertex id="V1"/>
<vertex id="V2" type="anchor"/>
<vertex id="V3"/>
<vertex id="V4" type="anchor"/>
<vertex id="V5"/>
<edge source="V1" target="V2"/>
<edge source="V2" target="V3"/>
<edge source="V3" target="V4"/>
<edge source="V5" target="V2"/>
<edge source="V5" target="V4"/>
</graph>

I would like to transform this into a second graph where all vertexes
that "link" two anchor distinct vertexes are flagged as link nodes. In
this case, there are two anchor vertexes V2 and V4, and I can link
them through V3 (V2 -> V3 -> V4) and V5 (V2 <- V5 -> V4). Note that
linked verteces must be distinct, so traversing the V2 <- V1 -> V2
path should not yield V1 as a link node. So I'd like to see something
like this:

<graph>
<vertex id="V1"/>
<vertex id="V2" type="anchor"/>
<vertex id="V3" linker="true"/>
<vertex id="V4" type="anchor"/>
<vertex id="V5" linker="true"/>
<edge source="V1" target="V2"/>
<edge source="V2" target="V3"/>
<edge source="V3" target="V4"/>
<edge source="V5" target="V2"/>
<edge source="V5" target="V4"/>
</graph>

It would be ideal to come up with a generalized solution that would
let you use 1, 2, .. N intermediate linking nodes. I've been able to
get this working with nested loops, but it isn't particularly
declarative or speedy, and is certainly more verbose than I'd like, so
I'm wondering if anyone here has insights into how to do this
elegantly and in XSLT/XPath style. For example, is it possible to
write a single XPath expression that will select <vertex> elements
that obey the above criteria? If not, does anyone have any suggestions
for how to code this effectively and efficiently with XSLT? Or is
XPath/XSLT not the right paradigm for this type of information
processing and transformation?

Thanks!

Ramon


Ramon,

Here's the general solution and it is quite simple and
straightforward. I use a recursive template based on the following
rule:

Paths(/.., X, Z) = Union( {X -> Xi} . Paths(X, Xi, Z) )

Where the function Paths(Eset, X, Z) returns all paths from X to Z,
that do not include any vertices belonging to the exclusion-set Eset.

Here {X -> Xi} are all edges from X.

As you can see the code of the transformation is 71 lines long, but it
should be noted that for the purpose of readability I write some XPath
expressions on several lines and also half of these 71 lines are just
closing tags.

Here's the code. This transformation:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
exclude-result-prefixes="exsl"<xsl:eek:utput omit-xml-declaration="yes" indent="yes"/>

<xsl:key name="kNeighbors" match="vertex"
use="../edge[@target = current()/@id]/@source"/>

<xsl:key name="kNeighbors" match="vertex"
use="../edge[@source = current()/@id]/@target"/>

<xsl:template match="/">
<xsl:call-template name="getPaths">
<xsl:with-param name="pNode1"
select="/*/vertex[@type='anchor'][1]"/>
<xsl:with-param name="pNode2"
select="/*/vertex[@type='anchor'][2]"/>
</xsl:call-template>
</xsl:template>

<xsl:template name="getPaths">
<xsl:param name="pNode1" select="/.."/>
<xsl:param name="pNode2" select="/.."/>
<xsl:param name="pExcluded" select="/.."/>

<xsl:for-each select=
"key('kNeighbors', $pNode1/@id)
[not(@id = $pExcluded/@id)]">
<xsl:choose>
<xsl:when test="@id = $pNode2/@id">
<path>
<xsl:copy-of
select="/*/edge[$pNode1/@id = @*
and
$pNode2/@id = @*
]"/>
</path>
</xsl:when>
<xsl:eek:therwise>
<xsl:variable name="vrtfTail">
<xsl:call-template name="getPaths">
<xsl:with-param name="pNode1"
select="."/>
<xsl:with-param name="pNode2"
select="$pNode2"/>
<xsl:with-param name="pExcluded"
select="$pExcluded | $pNode1"/>
</xsl:call-template>
</xsl:variable>

<xsl:variable name="vTail"
select="exsl:node-set($vrtfTail)/*"/>

<xsl:if test="$vTail">
<path>
<xsl:copy-of
select="/*/edge[$pNode1/@id = @*
and
current()/@id = @*
]"/>

<xsl:copy-of select="$vTail/*"/>
</path>
</xsl:if>
</xsl:eek:therwise>
</xsl:choose>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

when applied on this source.xml:

<graph>
<vertex id="V1"/>
<vertex id="V2" type="anchor"/>
<vertex id="V3"/>
<vertex id="V4" type="anchor"/>
<vertex id="V5"/>
<edge source="V1" target="V2"/>
<edge source="V1" target="V3"/>
<edge source="V2" target="V3"/>
<edge source="V3" target="V4"/>
<edge source="V5" target="V2"/>
<edge source="V5" target="V4"/>
</graph>

produces thewanted result:

<path>
<edge source="V1" target="V2"/>
<edge source="V1" target="V3"/>
<edge source="V3" target="V4"/>
</path>
<path>
<edge source="V2" target="V3"/>
<edge source="V3" target="V4"/>
</path>
<path>
<edge source="V5" target="V2"/>
<edge source="V5" target="V4"/>
</path>

These are all three different paths (with all nodes in the path only
once) from V2 to V4 in the graph described by the xml above.

The first solution:

V2 -> V1 -> V3 ->V4

is 3 edges long.

The other two are two edges long each.


I hope this helped in this specific problem and also to answer
positively your questions about the expressiveness of XPath/XSLT and
the appropriateness of XSLT as a tool for solving this type of
problems.


Dimitre Novatchev.
FXSL developer, XML Insider,

http://fxsl.sourceforge.net/ -- the home of FXSL
Resume: http://fxsl.sf.net/DNovatchev/Resume/Res.html
 
E

Ed Beroset

Dimitre said:
I hope this helped in this specific problem and also to answer
positively your questions about the expressiveness of XPath/XSLT and
the appropriateness of XSLT as a tool for solving this type of
problems.

What, no SVG output? :) I'm kidding. Seriously, I wasn't the
original poster on that one, but I enjoyed looking over your solution.
Nice work!

Ed
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,962
Messages
2,570,134
Members
46,690
Latest member
MacGyver

Latest Threads

Top