Weaver/Yarn Pattern in Python

C

Christian Stork

Hello everybody,

I am using Python to prototype a compression algorithm for tree-shaped
data, e.g., XML data or abstract syntax trees. I use a variant of the
visitor design pattern--called weaver/yarn pattern--in order to
traverse the tree that is being compressed or decompressed. I do not
know of any other interesting application of the weaver/yarn pattern,
but it seems like a very suitable subject of study for different
implementation strategies. Essentially, I am asking for input on my
design decisions.

First, let me give you a simple example of the traditional visitor
pattern. The following diagram illustrates a depth-first traversal of
a mini tree by a visitor. The basic idea is that the tree accepts a
visitor and helps it to do a kind of type-dispatch based on the kind
of nodes.


Tree data structure (nodes a,b,c of types A,B,C)

a:A
/ \
/ \
b:B c:C

Nested calls
------------
Call location Call stack

main: a.accept(v)
A: v.visitA(a)
V: b.accept(v)
B: v.visitB(b)
V: c.accept(v)
C: v.visitC(c)


class A(Node):
def accept(self, visitor):
visitor.visitA(self)
class B(Node):
def accept(self, visitor):
visitor.visitB(self)
class C(Node):
def accept(self, visitor):
visitor.visitC(self)


class Visitor:
"Perform specific tasks while visiting certain nodes"
def visitA(self, a):
for k in a.kids:
k.accept(self)
def visitB(self, b):
pass
def visitC(self, c):
pass

The advantage of this ping-pong between the tree and visitor v is that
v encapsulates related processing instructions. Several different
visitors can be maintained independently of each other and without
forcing changes to the tree node classes. The tree nodes only need to
provide a node.accept(visitor) method. Type-checking can ensure
the match between the visitor and the tree data structure.

Normally, visitors encapsulate different processing passes, which are
run one after the other, each time traversing the whole tree. I have
implemented the compression of trees as several (sub)visitors c1..cN
even though they could have been implemented as one big visitor.
Besides the easy recombinability of visitors this has the added
advantage that I can use the same visitors for compression and
decompression where this is appropriate.

But now I have a problem when decompressing. In order to run one
visitor after another the first one expects to traverse the whole
tree. But this is impossible in case of a decompressor. It lies in
the very nature of the application that the tree is being constructed
while the visitors work on it. Conceptually the solution is easy.
The decompression subvisitors d1..dM have to process the partially
available tree upto the point of traversal where it is available. At
each node execution has to iterate over the applicable code of d1..dM
in the given order. This necessitates a decomposition of visitors
into something that we call yarns and these yarns are weaved by one
visitor, which we call the weaver. Thus the name "Weaver/Yarn
Pattern" for this variation of the visitor pattern.

The following exemplifies my current implementation of the w/y pattern
for a recursive descent (ie depth-first traversal) visitor. For each
(sub)visitor d1..dM the former d<i>.visitX(x) method is divided into
several y.weaveX_...() methods. At entry and exit the weaver invokes
y.weaveX_First() and y.weaveX_Last(). Each descent into a child is
surrounded by y.weaveX_Before(kidno) and y.weaveX_After(kidno) method
calls.

class Yarn:
# methods replacing visitA(..)
def weaveA_First(self, a):
pass
def weaveA_Before(self, a, kidno):
pass
def weaveA_After(self, a, kidno):
pass
def weaveA_Last(self, a):
pass
# methods replacing visitB(..)
def weaveB_First(self, a):
pass
def weaveB_Last(self, a):
pass
# methods replacing visitC(..)
...


class Weaver:
"A special visitor which weaves yarns"
def __init__(self, yarns):
self.yarns = yarns
def visitA(self, a):
for y in self.yarns:
y.weaveA_First(a)
for i, k in enumerate(a.kids):
for y in self.yarns:
y.weaveA_Before(a, i)
k.accept(self)
for y in self.yarns:
y.weaveA_After(a, i)
for y in self.yarns:
y.weaveA_First(a)
def visitB(self, b):
for y in self.yarns:
y.weaveA_First(b)
for y in self.yarns:
y.weaveA_First(b)
def visitC(self, b):
...

By now it should be obvious that the boilerplate for this approach
becomes quite extensive and it would be desirable to reduce it. To
mitigate the problem I did three things:

- Automatically generate templates for yarn classes. The actual code
can be filled in. Disadvantage: No convenient way to deal with
changes to the tree data structure.

- The node.accept(weaver) methods are changed to call a "generic"
weaver.weave(node) method (instead of weaver.visitX(node)), which
hackishly constructs all the calls to the yarns by assembling the
method names from the __class__.__name__ and one of "First", "Last",
"Before", and "After".

This solution does pretty much what I want:

- Writing selected yarn methods allows me to express with little
overhead /what/ code to execute /when/ in the weaving process.

- Once the weaver/yarn interaction machinery is set up correctly, the
debugger points me to real code in case of bugs. This is an
advantage over approaches that create classes at runtime, e.g., use
of metaclasses or the "new" module.

OTOH, I'm not perfectly happy with this solution since it's totally
"hackish". For example, I have to use a function, hand-written
specifially for this purpose, in order to "type-check" the
correspondence of the tree types and the yarns.

Maybe Python is not the right language for an elegant solution and I
should look at more rigorously or differently typed languages, but I
thought it's worth presenting my case here and asking what other
python hackers think. Maybe there is another more pythonic way to do
this?

I looked into several of python's features, but to no avail.
Iterators don't seem to play well with yarns (especially if you try to
thread objects thru the yarns to support synthesized/inherited
attributes a known from attribute grammars). I also looked at
metaclasses in order to reduce the boilerplate but that did pay off
either.

There are certainly other languages and language concepts, which offer
interesting implementation alternatives:

- Dylan's generic functions seems to be a a good match for my problem,
which allows me to get around the method name mangling in
weaver.weave(node). Anybody an idea for a nice emulation of generic
functions in python? :)

- Lazy evaluation as provided by Haskell (or even Ocaml) seems to make
the traveral by several visitors unnecessary, but I'm not yet
convinced enough of this approach to start a full reimplementation.

- Systems for attribute grammar evaluation seem to address some of my
general concerns, but I am afraid that they might constrain me too
much.

If you're still reading this I'm impressed ;-) and I would be very
happy to hear what you think.

-Chris
 
Y

Ype Kingma

Christian said:
Hello everybody,

I am using Python to prototype a compression algorithm for tree-shaped
data, e.g., XML data or abstract syntax trees. I use a variant of the
visitor design pattern--called weaver/yarn pattern--in order to
traverse the tree that is being compressed or decompressed. I do not
know of any other interesting application of the weaver/yarn pattern,
but it seems like a very suitable subject of study for different
implementation strategies. Essentially, I am asking for input on my
design decisions.

First, let me give you a simple example of the traditional visitor
pattern. The following diagram illustrates a depth-first traversal of
a mini tree by a visitor. The basic idea is that the tree accepts a
visitor and helps it to do a kind of type-dispatch based on the kind
of nodes.
....

The advantage of this ping-pong between the tree and visitor v is that
v encapsulates related processing instructions. Several different
visitors can be maintained independently of each other and without
forcing changes to the tree node classes. The tree nodes only need to
provide a node.accept(visitor) method. Type-checking can ensure
the match between the visitor and the tree data structure.

Normally, visitors encapsulate different processing passes, which are
run one after the other, each time traversing the whole tree. I have
implemented the compression of trees as several (sub)visitors c1..cN
even though they could have been implemented as one big visitor.
Besides the easy recombinability of visitors this has the added
advantage that I can use the same visitors for compression and
decompression where this is appropriate.

But now I have a problem when decompressing. In order to run one
visitor after another the first one expects to traverse the whole
tree. But this is impossible in case of a decompressor. It lies in
the very nature of the application that the tree is being constructed
while the visitors work on it. Conceptually the solution is easy.
The decompression subvisitors d1..dM have to process the partially
available tree upto the point of traversal where it is available. At
each node execution has to iterate over the applicable code of d1..dM
in the given order. This necessitates a decomposition of visitors
into something that we call yarns and these yarns are weaved by one
visitor, which we call the weaver. Thus the name "Weaver/Yarn
Pattern" for this variation of the visitor pattern.

The following exemplifies my current implementation of the w/y pattern
for a recursive descent (ie depth-first traversal) visitor. For each
(sub)visitor d1..dM the former d<i>.visitX(x) method is divided into
several y.weaveX_...() methods. At entry and exit the weaver invokes
y.weaveX_First() and y.weaveX_Last(). Each descent into a child is
surrounded by y.weaveX_Before(kidno) and y.weaveX_After(kidno) method
calls.
....

By now it should be obvious that the boilerplate for this approach
becomes quite extensive and it would be desirable to reduce it. To
mitigate the problem I did three things:

- Automatically generate templates for yarn classes. The actual code
can be filled in. Disadvantage: No convenient way to deal with
changes to the tree data structure.

- The node.accept(weaver) methods are changed to call a "generic"
weaver.weave(node) method (instead of weaver.visitX(node)), which
hackishly constructs all the calls to the yarns by assembling the
method names from the __class__.__name__ and one of "First", "Last",
"Before", and "After".

This solution does pretty much what I want:

- Writing selected yarn methods allows me to express with little
overhead /what/ code to execute /when/ in the weaving process.

- Once the weaver/yarn interaction machinery is set up correctly, the
debugger points me to real code in case of bugs. This is an
advantage over approaches that create classes at runtime, e.g., use
of metaclasses or the "new" module.

OTOH, I'm not perfectly happy with this solution since it's totally
"hackish". For example, I have to use a function, hand-written
specifially for this purpose, in order to "type-check" the
correspondence of the tree types and the yarns.
....

Have a look at aspect oriented programming:

http://www.ccs.neu.edu/home/lieber/AOP.html

In theory it sounds like a good match for what you need.

I don't know how well Python supports this, perhaps you
can use a metaclass for this, but I'm not sure.


Have fun,
Ype
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,968
Messages
2,570,154
Members
46,702
Latest member
LukasConde

Latest Threads

Top