parallel class structures for AST-based objects

S

Steve Howell

I have been writing some code that parses a mini-language, and I am
running into what I know is a pretty common design pattern problem,
but I am wondering the most Pythonic way to solve it.

Basically, I have a bunch of really simple classes that work together
to define an expression--in my oversimplified example code below,
those are Integer, Sum, and Product.

Then I want to write different modules that work with those expression
objects. In my example, I have a parallel set of classes that enable
you to evaluate an expression, and another set of objects that enable
you to pretty-print the expression.

The below code works as intended so far (tested under 2.6), but before
I go too much further with this design, I want to get a sanity check
and some ideas on other ways to represent the interrelationships
within the code. Basically, the issue here is that you have varying
behavior in two dimensions--a node right now is only a Product/Integer/
Sum so far, but I might want to introduce new concepts like
Difference, Quotient, etc. And right now the only things that you can
do to expressions is eval() them and pprint() them, but you eventually
might want to operate on the expressions in new ways, including fairly
abstract operations that go beyond a simple walking of the tree.

Here is the code:

#######
# base classes just represents the expression itself, which
# get created by a parser or unit tests
# example usage is
# expression = Product(Sum(Integer(5),Integer(2)), Integer(6))
class Integer:
def __init__(self, val):
self.val = val

class BinaryOp:
def __init__(self, a,b):
self.a = a
self.b = b

class Sum(BinaryOp):
pass

class Product(BinaryOp):
pass

########

class EvalNode:
def __init__(self, node):
self.node = node

def evaluatechild(self, child):
return EvalNode.factory(child).eval()

@staticmethod
def factory(child):
mapper = {
'Sum': SumEvalNode,
'Product': ProductEvalNode,
'Integer': IntegerEvalNode
}
return abstract_factory(child, mapper)

class SumEvalNode(EvalNode):
def eval(self):
a = self.evaluatechild(self.node.a)
b = self.evaluatechild(self.node.b)
return a + b

class ProductEvalNode(EvalNode):
def eval(self):
a = self.evaluatechild(self.node.a)
b = self.evaluatechild(self.node.b)
return a * b

class IntegerEvalNode(EvalNode):
def eval(self): return self.node.val

#######

class PrettyPrintNode:
def __init__(self, node):
self.node = node

def pprint_child(self, child):
return PrettyPrintNode.factory(child).pprint()

@staticmethod
def factory(child):
mapper = {
'Sum': SumPrettyPrintNode,
'Product': ProductPrettyPrintNode,
'Integer': IntegerPrettyPrintNode
}
return abstract_factory(child, mapper)

class SumPrettyPrintNode(PrettyPrintNode):
def pprint(self):
a = self.pprint_child(self.node.a)
b = self.pprint_child(self.node.b)
return '(the sum of %s and %s)' % (a, b)

class ProductPrettyPrintNode(PrettyPrintNode):
def pprint(self):
a = self.pprint_child(self.node.a)
b = self.pprint_child(self.node.b)
return '(the product of %s and %s)' % (a, b)

class IntegerPrettyPrintNode(PrettyPrintNode):
def pprint(self): return self.node.val

##############
# Not sure where this method really "wants to be" structurally,
# or what it should be named, but it reduces some duplication

def abstract_factory(node, node_class_mapper):
return node_class_mapper[node.__class__.__name__](node)


expression = Product(Sum(Integer(5),Integer(2)), Integer(6))

evaluator = EvalNode.factory(expression)
print evaluator.eval()

pprinter = PrettyPrintNode.factory(expression)
print pprinter.pprint()
 
M

MRAB

Steve said:
I have been writing some code that parses a mini-language, and I am
running into what I know is a pretty common design pattern problem,
but I am wondering the most Pythonic way to solve it.

Basically, I have a bunch of really simple classes that work together
to define an expression--in my oversimplified example code below,
those are Integer, Sum, and Product.

Then I want to write different modules that work with those expression
objects. In my example, I have a parallel set of classes that enable
you to evaluate an expression, and another set of objects that enable
you to pretty-print the expression.

The below code works as intended so far (tested under 2.6), but before
I go too much further with this design, I want to get a sanity check
and some ideas on other ways to represent the interrelationships
within the code. Basically, the issue here is that you have varying
behavior in two dimensions--a node right now is only a Product/Integer/
Sum so far, but I might want to introduce new concepts like
Difference, Quotient, etc. And right now the only things that you can
do to expressions is eval() them and pprint() them, but you eventually
might want to operate on the expressions in new ways, including fairly
abstract operations that go beyond a simple walking of the tree.

Here is the code:

#######
# base classes just represents the expression itself, which
# get created by a parser or unit tests
# example usage is
# expression = Product(Sum(Integer(5),Integer(2)), Integer(6))
class Integer:
def __init__(self, val):
self.val = val

class BinaryOp:
def __init__(self, a,b):
self.a = a
self.b = b

class Sum(BinaryOp):
pass

class Product(BinaryOp):
pass

########

class EvalNode:
def __init__(self, node):
self.node = node

def evaluatechild(self, child):
return EvalNode.factory(child).eval()

@staticmethod
def factory(child):
mapper = {
'Sum': SumEvalNode,
'Product': ProductEvalNode,
'Integer': IntegerEvalNode
}
return abstract_factory(child, mapper)

class SumEvalNode(EvalNode):
def eval(self):
a = self.evaluatechild(self.node.a)
b = self.evaluatechild(self.node.b)
return a + b

class ProductEvalNode(EvalNode):
def eval(self):
a = self.evaluatechild(self.node.a)
b = self.evaluatechild(self.node.b)
return a * b

class IntegerEvalNode(EvalNode):
def eval(self): return self.node.val

#######

class PrettyPrintNode:
def __init__(self, node):
self.node = node

def pprint_child(self, child):
return PrettyPrintNode.factory(child).pprint()

@staticmethod
def factory(child):
mapper = {
'Sum': SumPrettyPrintNode,
'Product': ProductPrettyPrintNode,
'Integer': IntegerPrettyPrintNode
}
return abstract_factory(child, mapper)

class SumPrettyPrintNode(PrettyPrintNode):
def pprint(self):
a = self.pprint_child(self.node.a)
b = self.pprint_child(self.node.b)
return '(the sum of %s and %s)' % (a, b)

class ProductPrettyPrintNode(PrettyPrintNode):
def pprint(self):
a = self.pprint_child(self.node.a)
b = self.pprint_child(self.node.b)
return '(the product of %s and %s)' % (a, b)

class IntegerPrettyPrintNode(PrettyPrintNode):
def pprint(self): return self.node.val

##############
# Not sure where this method really "wants to be" structurally,
# or what it should be named, but it reduces some duplication

def abstract_factory(node, node_class_mapper):
return node_class_mapper[node.__class__.__name__](node)


expression = Product(Sum(Integer(5),Integer(2)), Integer(6))

evaluator = EvalNode.factory(expression)
print evaluator.eval()

pprinter = PrettyPrintNode.factory(expression)
print pprinter.pprint()

I don't see the point of EvalNode and PrettyPrintNode. Why don't you
just give Integer, Sum and Product 'eval' and 'pprint' methods?
 
S

Steve Howell

I don't see the point of EvalNode and PrettyPrintNode. Why don't you
just give Integer, Sum and Product 'eval' and 'pprint' methods?

That's a good question, and it's the crux of my design dilemma. If
ALL I ever wanted to to with Integer/Sum/Product was to eval() and
pprint(), then I would just add those methods to Integer, Sum, and
Product, and be done with it, as you suggest. But what happens when
somebody wants to extend capability? Should every future software
developer that wants to use Integer/Sum/Product extend those classes
to get work done? What if they want to store additional state for
nodes?
 
R

Richard Thomas

Steve said:
I have been writing some code that parses a mini-language, and I am
running into what I know is a pretty common design pattern problem,
but I am wondering the most Pythonic way to solve it.
Basically, I have a bunch of really simple classes that work together
to define an expression--in my oversimplified example code below,
those are Integer, Sum, and Product.
Then I want to write different modules that work with those expression
objects.  In my example, I have a parallel set of classes that enable
you to evaluate an expression, and another set of objects that enable
you to pretty-print the expression.
The below code works as intended so far (tested under 2.6), but before
I go too much further with this design, I want to get a sanity check
and some ideas on other ways to represent the interrelationships
within the code.  Basically, the issue here is that you have varying
behavior in two dimensions--a node right now is only a Product/Integer/
Sum so far, but I might want to introduce new concepts like
Difference, Quotient, etc.  And right now the only things that you can
do to expressions is eval() them and pprint() them, but you eventually
might want to operate on the expressions in new ways, including fairly
abstract operations that go beyond a simple walking of the tree.
Here is the code:
    #######
    # base classes just represents the expression itself, which
    # get created by a parser or unit tests
    # example usage is
    # expression = Product(Sum(Integer(5),Integer(2)), Integer(6))
    class Integer:
        def __init__(self, val):
            self.val = val
    class BinaryOp:
        def __init__(self, a,b):
            self.a = a
            self.b = b
    class Sum(BinaryOp):
        pass
    class Product(BinaryOp):
        pass
    ########
    class EvalNode:
        def __init__(self, node):
            self.node = node
        def evaluatechild(self, child):
            return EvalNode.factory(child).eval()
        @staticmethod
        def factory(child):
            mapper = {
                'Sum': SumEvalNode,
                'Product': ProductEvalNode,
                'Integer': IntegerEvalNode
                }
            return abstract_factory(child, mapper)
    class SumEvalNode(EvalNode):
        def eval(self):
            a = self.evaluatechild(self.node.a)
            b = self.evaluatechild(self.node.b)
            return a + b
    class ProductEvalNode(EvalNode):
        def eval(self):
            a = self.evaluatechild(self.node.a)
            b = self.evaluatechild(self.node.b)
            return a * b
    class IntegerEvalNode(EvalNode):
        def eval(self): return self.node.val
    #######
    class PrettyPrintNode:
        def __init__(self, node):
            self.node = node
        def pprint_child(self, child):
            return PrettyPrintNode.factory(child).pprint()
        @staticmethod
        def factory(child):
            mapper = {
                'Sum': SumPrettyPrintNode,
                'Product': ProductPrettyPrintNode,
                'Integer': IntegerPrettyPrintNode
                }
            return abstract_factory(child, mapper)
    class SumPrettyPrintNode(PrettyPrintNode):
        def pprint(self):
            a = self.pprint_child(self.node.a)
            b = self.pprint_child(self.node.b)
            return '(the sum of %s and %s)' % (a, b)
    class ProductPrettyPrintNode(PrettyPrintNode):
        def pprint(self):
            a = self.pprint_child(self.node.a)
            b = self.pprint_child(self.node.b)
            return '(the product of %s and %s)' % (a, b)
    class IntegerPrettyPrintNode(PrettyPrintNode):
        def pprint(self): return self.node.val
    ##############
    # Not sure where this method really "wants to be" structurally,
    # or what it should be named, but it reduces some duplication
    def abstract_factory(node, node_class_mapper):
        return node_class_mapper[node.__class__.__name__](node)
    expression = Product(Sum(Integer(5),Integer(2)), Integer(6))
    evaluator = EvalNode.factory(expression)
    print evaluator.eval()
    pprinter = PrettyPrintNode.factory(expression)
    print pprinter.pprint()

I don't see the point of EvalNode and PrettyPrintNode. Why don't you
just give Integer, Sum and Product 'eval' and 'pprint' methods?

This looks more structurally sound:

class Node(object):
def eval(self):
raise NotImplementedError
def pprint(self):
raise NotImplementedError

class BinaryOperatorNode(Node):
operator = None
def __init__(self, first, second):
self.first = first
self.second = second
def eval(self):
return self.operator(self.first.eval(), self.second.eval())
def pprint(self):
"%s(%s, %s)" % (type(self).__name__, self.first.pprint(),
self.second.pprint())

class Sum(BinaryOperatorNode):
operator = lambda x, y: x + y

class Product(BinaryOperatorNode):
operator = lambda x, y: x * y

I don't know what you're doing exactly but if all you need is to be
able to parse and evaluate expressions then you can get very decent
mileage out of overriding operators, to the extent that the whole
thing you are trying to do could be a single class:

class Expression(object):
def __init__(self, func):
self.func = func
def __call__(self, **context):
while isinstance(self, Expression):
self = self.func(context)
return self
def __add__(self, other):
return Expression(lambda context: self.func(context) + other)
def __mul__(self, other):
return Expression(lambda context: self.func(context) * other)
def __radd__(self, other):
return Expression(lambda context: other + self.func(context))
def __rmul__(self, other):
return Expression(lambda context: other * self.func(context))
# ... and so forth ...

def integer(value):
return Expression(lambda context: value)

def variable(name, default):
return Expression(lambda context: context.get(name, default))

X = Expression("X", 0)
expr = 2 * X + 1
assert expr(X=3) == 7

But maybe that's not what you need. No need to overengineer if it is
though, keep it simple, simple is better than complex.
 
S

Steve Howell

This looks more structurally sound:

class Node(object):
   def eval(self):
      raise NotImplementedError
   def pprint(self):
      raise NotImplementedError

My objection to the interface you describe is that Node defines the
type of operations that can be done to it by third-party code, which
is something that I cannot predict, and which (pscouples every
subclass of Node to eval() and pprint(), even though those subclasses
might not care about either operation. They might be reimplementing
only one of those operations, or some completely different operation.
class Sum(BinaryOperatorNode):
   operator = lambda x, y: x + y

class Product(BinaryOperatorNode):
   operator = lambda x, y: x * y

I do like the notion of driving up Sum/Product abstractions like
"operator" into BinaryOperatorNode, but that is sort of an artifact of
my toy example, not my main design dilemma.
I don't know what you're doing exactly but if all you need is to be
able to parse and evaluate expressions then you can get very decent
mileage out of overriding operators, to the extent that the whole
thing you are trying to do could be a single class...

class Expression(object):
def __init__(self, func):
self.func = func
def __call__(self, **context):
while isinstance(self, Expression):
self = self.func(context)
return self
def __add__(self, other):
return Expression(lambda context: self.func(context) + other)
def __mul__(self, other):
[snipped]

It is my fault for muddying the conversation with a toy example that
evaluates arithmetic expressions. My particular problem involves a
mini-language that describes how you want to descend Python data
structures.

But maybe that's not what you need. No need to overengineer if it is
though, keep it simple, simple is better than complex.


Yep! I am trying to keep things simple, but my wish to extend is not
speculative at this point; it is real. I have an expression syntax,
which I call pyDTL, that I want these operations on:

* generate two different versions of Python code that expresses the
operation
* eagerly evaluate the expression on some object
* pretty-print the expression itself
* use the expression as a prototype for stub objects to determine
what operations they allow
* etc....

All of those requirements create the need to somehow create new
objects or functions that correspond to the same AST.

I describe pyDTL here:

http://showellonprogramming.blogspot.com/2009/11/mini-ddl-for-python.html

Here is a simple example:

echo '{.foo, .bar(){.spam, .eggs} }' | python dtl/parse.py
dict(
foo = obj.foo,
bar = (lambda bar:
dict(
spam = bar.spam,
eggs = bar.eggs,
))(obj.bar()),
)


###
 
G

Gregory Ewing

Steve said:
My objection to the interface you describe is that Node defines the
type of operations that can be done to it by third-party code, which
is something that I cannot predict

I think you have the right idea with a mapping from node
classes to implementations of operations, but your
intermediate classes SumPrettyPrintNode etc. are not
necessary. Just put functions implementing the operations
directly into the mapping.

Another thing is that the mappings should be somewhere
global that can be extended, perhaps through a registration
interface. Putting the mapping dicts inside the node classes
doesn't gain you anything over simply using methods. All
you've done is reimplement method dispatch.

A third thing is that instead of just looking up the node
class in the mapping, you may want to walk up the MRO and
try each of the base classes until you find a match. That
way, nodes will effectively inherit implementations from
their base classes for any operations that they don't
explicitly implement.

This inheritance behaviour becomes important if you have
two developers A and B, where A adds new classes and B
adds new operations. If A and B don't talk to each other,
you necessarily end up with gaps in the matrix: A's classes
won't have implementations for B's operations.

The best you can hope for is that the missing operations
will somehow fall back gracefully on default implementations.
Inheritance allows this to be achieved: if A derives all
his classes from existing node classes, and B provides
default implementations of all his operations for the root
Node class, then some implementation will always be found
for every class/operation combination.

Now, there's actually a very easy way of implementing all
this. Your registration interface could simply be

def register_operation(klass, operation_name, function):
setattr(klass, operation_name, function)

In other words, monkey-patch the classes to add the new
functions directly as methods. Since it's so simple, you
could even do away with the registration function altogether
and just have developers insert methods directly into the
classes.

Some people might consider this an ugly hack, but the
end result is almost exactly the same, it's very simple,
and it's very efficient, making use of Python's existing
method dispatch machinery instead of reinventing it
yourself.

One reason you might want to keep the registration function,
even if all it's doing is modifying the classes, is to
make it easier to find where operations are being defined,
by searching for calls to register_operation().
 
D

Diez B. Roggisch

Steve said:
That's a good question, and it's the crux of my design dilemma. If
ALL I ever wanted to to with Integer/Sum/Product was to eval() and
pprint(), then I would just add those methods to Integer, Sum, and
Product, and be done with it, as you suggest. But what happens when
somebody wants to extend capability? Should every future software
developer that wants to use Integer/Sum/Product extend those classes
to get work done? What if they want to store additional state for
nodes?

What's usually done is to create visitors/matchers. Those traverse the
AST, and either only visit, or return transformed versions of it (think
e.g. algebraic optimization)

A visitor will roughly look like this:


class ASTVisitor(object):



def visit(self, node):
name = node.__class__.__name__.lower()
if hasattr(self, "visit_%s" % name):
getattr(self, "visit_%s" % name)(node)
for child in node:
self.visit(child)



You can of course chose another type of dispatch, using e.g. a generic
method.

Then you create Visitors for specific tasks - pretty-printing,
evaluation, rewriting. Those don't have the overhead of your current
design with all those factory-mapping stuff, and yet you aren't forced
to put logic into AST you don't want there.


Diez
 
S

Simon Forman

What's usually done is to create visitors/matchers. Those traverse the AST,
and either only visit, or return transformed versions of it (think e.g.
algebraic optimization)

A visitor will roughly look like this:


class ASTVisitor(object):



  def visit(self, node):
      name = node.__class__.__name__.lower()
      if hasattr(self, "visit_%s" % name):
          getattr(self, "visit_%s" % name)(node)
      for child in node:
          self.visit(child)



You can of course chose another type of dispatch, using e.g. a generic
method.

Then you create Visitors for specific tasks - pretty-printing, evaluation,
rewriting. Those don't have the overhead of your current design with all
those factory-mapping stuff, and yet you aren't forced to put logic into AST
you don't want there.


FWIW I often use John Aycock's SPARK (Scanning, Parsing, and Rewriting
Kit) for this sort of thing. It has a GenericASTTraversal which "is a
Visitor pattern according to Design Patterns."

http://pages.cpsc.ucalgary.ca/~aycock/spark/

It's apparently distributed with the python source, but it's not in
the standard library, more's the pity IMO.

There's a bit of a learning curve but it's well worth it.

~Simon
 
S

Steve Howell

FWIW I often use John Aycock's SPARK (Scanning, Parsing, and Rewriting
Kit) for this sort of thing.  It has a GenericASTTraversal which "is a
Visitor pattern according to Design Patterns."

http://pages.cpsc.ucalgary.ca/~aycock/spark/

It's apparently distributed with the python source, but it's not in
the standard library, more's the pity IMO.

There's a bit of a learning curve but it's well worth it.

Thanks, Simon, I think something like GenericASTTraversal should work
for me in most cases.

A couple of folks have suggested an idea that is in his paper, which
is to use a single class for something PrettyPrinter, and then use
reflection to find methods on PrettyPrinter to pretty-print sums,
products, integers, etc.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,189
Members
46,734
Latest member
manin

Latest Threads

Top