Painful?: Using the ast module for metaprogramming

Joseph Garvin

I decided to try using the ast module to see how difficult or not it
was to use for metaprogramming. So I tried writing a decorator that
would perform a simple transformation of a function's code. It was
certainly not as easy as I had guessed, but I did succeed so it's not
impossible. The issues I encountered might suggest changes to the ast
module, but since this is my first time messing with it I think it's
equally likely I'm just ignorant of the best way to handle them.
Questions:

-If I have the source to a single function definition and I pass it to
ast.parse, I get back an ast.Module. Why not an ast.FunctionDef?

-An ast.Name always has an ast.Load or ast.Store "context" associated
with it. This is a bit odd, since based on the actual context (what
the parent of the ast.Name node is) it is always clear whether the
variable needs to be loaded or stored -- loaded when the node is the
child of an expression and stored when the node is the target of an
assignment. Is there some weird corner case that necessitates this
redundancy? It adds a lot of noise when you're trying to interpret
trees, and the ast.Load and ast.Store objects themselves don't seem to
contain any data.

-Why can't orelse for ast.If and ast.While default to empty []? If you
make an ast.While object by hand but don't specify that orelse is
empty, you get an error when later trying to compile it. This seems
silly since by default not specifying an else in code results in the
ast module generating a node with orelse=[].

-Why can't keywords and args for ast.Call default to empty []? Same
problem as with orelse.

-Why can't I eval a functiondef to get back a function object? As it
stands, I have to work around this by giving the functiondef a unique
name, exec'ing the AST, and then doing a lookup in locals() to get the
resulting function object. This is a nasty hack.

-The provided NodeTransformer class is useful, but provides no way
(that I can see) to replace a single statement with multiple
statements, because multiple statements would constitute multiple
nodes. To work around this, I take the code I want to swap in, and
wrap it in a "while 1:" block with a break at the end to ensure it
only runs once and doesn't loop. Again, a kludge.

-The module provides no means to convert an AST back into source code,
which would be nice for debugging.

-It would be nice if decorators were passed a function's AST instead
of a function object. As it is I have to use inspect.getsource to
retrieve the source for the function in question, and then use
ast.parse, which is a bit inefficient because the CPython parser has
already done this once before. Again, it feels like a hack.
Making decorators take ASTs would obviously be a compatibility
breaking change, since you'd then have to compile them before
returning. Alternatively, you could have "ast decorators" that would
use a different prefix symbol in place of @. Or you could change it so
that decorators got passed the AST, but the AST had a __call__ method
that would cause the AST to compile itself into the resulting
function object in place and then execute itself. Until the first
time it was called it would be an AST, but after that it would be a
function object (it would 'lazily' become a function). I think that
wouldn't break most code.
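
On the NodeTransformer point in the list above: the ast documentation states that, for statement nodes, a visitor method may return a list of nodes instead of a single node, which would avoid the "while 1:" wrapper. A minimal sketch follows, assuming a current Python 3; the class name ExpandPass and the toy replacement statements are made up for illustration.

```python
import ast


class ExpandPass(ast.NodeTransformer):
    """Hypothetical transformer: swap each `pass` for two assignments."""

    def visit_Pass(self, node):
        # Parsing a snippet yields a Module; its body is a list of two
        # statement nodes that already carry line/column information.
        replacement = ast.parse("a = 1\nb = 2").body
        # Returning a list splices every node in it in place of `node`.
        return replacement


tree = ast.parse("def f():\n    pass\n    return a + b\n")
tree = ExpandPass().visit(tree)

# Compile and run the rewritten module, then call the rewritten function.
namespace = {}
exec(compile(tree, "<ast>", "exec"), namespace)
result = namespace["f"]()
```

After the transformation the body of f holds three statements (two assignments and the return), so no loop-and-break wrapper is needed.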

I have some ideas for what an easier API would look like but I want to
be sure these are real issues first and not just me doing it wrong ;)

Regards,

Joe
 
Paul Rubin

Joseph Garvin said:
I decided to try using the ast module to see how difficult or not it
was to use for metaprogramming.

Maybe you really want Lisp? ;-)
 
Martin v. Löwis

-If I have the source to a single function definition and I pass it to
ast.parse, I get back an ast.Module. Why not an ast.FunctionDef?

Because it is easier for processing if you always get the same type of
result. Typically, you don't know what's in the source code, so you
need to parse, then inspect.
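
When the source is known to contain exactly one function definition, pulling the FunctionDef out of the Module is a single indexing step. A small sketch, with a made-up function named double:

```python
import ast

source = "def double(x):\n    return x * 2\n"

# ast.parse always wraps the result in a Module, whatever the source holds.
module = ast.parse(source)

# With exactly one definition in the source, the FunctionDef is simply
# the first (and only) statement in the module body.
func_def = module.body[0]

module_type = type(module).__name__    # name of the wrapper node class
func_type = type(func_def).__name__    # name of the inner node class
func_name = func_def.name              # identifier used in the def
```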

Joseph Garvin said:
-An ast.Name always has an ast.Load or ast.Store "context" associated
with it. This is a bit odd, since based on the actual context (what
the parent of the ast.Name node is) it is always clear whether the
variable needs to be loaded or stored

It simplifies the interpretation/compilation of the code, by removing
the need to go up in the tree. Notice that just looking at the parent
node would not be sufficient: for

foo[foo] = foo = foo

the various occurrences of foo take all kinds of contexts. Figuring
out whether a specific occurrence is a load or store would be tricky.
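
The contexts in that statement can be inspected directly. The sketch below only looks at the parts of the tree whose layout is the same across Python 3 versions (the node holding the subscript index changed shape in 3.9, so it is left out); ctx_name is a helper invented for this example.

```python
import ast

tree = ast.parse("foo[foo] = foo = foo")
assign = tree.body[0]          # one ast.Assign node with two targets


def ctx_name(node):
    """Return the class name of a node's ctx flag, e.g. 'Load' or 'Store'."""
    return type(node.ctx).__name__


subscript_target, name_target = assign.targets

contexts = {
    "subscript target foo[...]": ctx_name(subscript_target),
    "foo being subscripted": ctx_name(subscript_target.value),
    "second target foo": ctx_name(name_target),
    "right-hand side foo": ctx_name(assign.value),
}
```

The subscript as a whole and the bare second target are stores, while the foo being indexed and the right-hand side are loads, even though all of them sit under the same Assign node.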

Joseph Garvin said:
It adds a lot of noise when you're trying to interpret
trees, and the ast.Load and ast.Store objects themselves don't seem to
contain any data.

Yes, they are flags only. operator works the same way (taking values
such as Add, Sub, ...); likewise unaryop and cmpop.

Joseph Garvin said:
-Why can't orelse for ast.If and ast.While default to empty []?

You want to store None? That would be a type error; orelse is
specified as "stmt*". So it must be a list.

Joseph Garvin said:
-Why can't keywords and args for ast.Call default to empty []? Same
problem as with orelse.

Same answer.

Joseph Garvin said:
-Why can't I eval a functiondef to get back a function object?

Because a definition is not an expression. You can only eval
expressions.
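
A sketch of the statement/expression distinction in practice: eval rejects a def, while compiling the tree in "exec" mode and running it in a private dictionary yields the function object without relying on locals(). The function greet is invented for the example.

```python
import ast

source = "def greet(name):\n    return 'hello ' + name\n"

# eval only accepts a single expression, so a def is a syntax error there.
try:
    eval(source)
    eval_failed = False
except SyntaxError:
    eval_failed = True

# A def is a statement, so the tree has to be compiled in "exec" mode.
tree = ast.parse(source)
code = compile(tree, filename="<ast>", mode="exec")

# Executing in a private dict keeps the new name out of the caller's scope.
namespace = {}
exec(code, namespace)
greet = namespace[tree.body[0].name]
```

Looking the function up by tree.body[0].name means the definition does not need an artificially unique name.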

Joseph Garvin said:
-The module provides no means to convert an AST back into source code,
which would be nice for debugging.

See Demo/parser/unparse.py
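
On Python 3.9 and later the standard library also offers ast.unparse, which does the same job as that demo script. A sketch assuming such a version is available; the regenerated text may differ from the input in spacing, so the comparison is made on the parsed trees rather than on the strings.

```python
import ast

source = "def area(w, h):\n    return (w   *   h)\n"
tree = ast.parse(source)

# ast.unparse (Python 3.9+) turns a tree back into source text.
# Formatting details such as extra spaces or redundant parentheses
# are not preserved.
regenerated = ast.unparse(tree)

# Re-parsing the regenerated text should give a structurally equal tree.
round_trip_ok = ast.dump(ast.parse(regenerated)) == ast.dump(tree)
```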

Joseph Garvin said:
-It would be nice if decorators were passed a function's AST instead
of a function object.

How could this possibly work? If you run from a pyc file, there will
be no AST available to pass.

Regards,
Martin
 
Joseph Garvin

Because it is easier for processing if you always get the same type of
result. Typically, you don't know what's in the source code, so you
need to parse, then inspect.

I see. I'm guessing that's true for the applications for which the
module was originally intended. In a metaprogramming context, though,
you usually do know.

It simplifies the interpretation/compilation of the code, by removing
the need to go up in the tree. Notice that just looking at the parent
node would not be sufficient...

I see how it avoids needing to look at the parent node in general, but
if we were compiling by recursively descending through the AST, then
we would know whether Names would be loads or stores by the time we
got to them (we would already have had to visit an encompassing
assignment or expression) -- except in the case of your a = b = c
expression, which I'm curious how Python handles. The natural answer
is for assignment to be an expression (so b = c returns the new value
of b). But Python doesn't do that, so then I'd expect we'd have some
third ast.LoadAndStore() option for b, but examining ast.parse's
behavior it looks like it chooses Store...

Joseph Garvin said:
-Why can't orelse for ast.If and ast.While default to empty []?

Martin v. Löwis said:
You want to store None? That would be a type error; orelse is
specified as "stmt*". So it must be a list.

A list is actually what I want, an empty one. The problem is that
ast.While and ast.If's constructors default to the opposite,
orelse=None. Same with keywords and args for ast.Call. Admittedly,
adding orelse=[] to the constructor calls isn't terribly burdensome,
but it does make already obfuscated looking AST mangling code even
worse.

Martin v. Löwis said:
Because a definition is not an expression. You can only eval
expressions.

I understand that if function definitions were expressions, because of
the whitespace syntax there wouldn't be a way to express an assignment
to such an expression. But, why would it be problematic to let them be
expressions anyway?

Martin v. Löwis said:
See Demo/parser/unparse.py

Thanks :)

Martin v. Löwis said:
How could this possibly work? If you run from a pyc file, there will
be no AST available to pass.

I hadn't thought about bytecode compilation. On top of the other
suggestions, bytecode compilation would have to be changed so that the
AST is preserved in the pyc file.

Regards,

Joe
 
Kay Schluehr

-It would be nice if decorators were passed a function's AST instead
of a function object. As it is I have to use inspect.getsource to
retrieve the source for the function in question, and then use
ast.parse, which is a bit inefficient because the cpython parser has
to already have done this once before.

It doesn't matter that much though because the Python parser is very
efficient and the decorator is applied only once.

The PyGPU project used this approach, if I remember correctly:

http://www.cs.lth.se/home/Calle_Lejdfors/pygpu/
 
Martin v. Löwis

I see how it avoids needing to look at the parent node in general, but
if we were compiling by recursively descending through the AST, then
we would know whether Names would be loads or stores by the time we
got to them (we would already have had to visit an encompassing
assignment or expression)

Python's compiler does recursively descend. However, it is still
more convenient to allow uniform processing of expressions, rather
than having parameters passed down into the visitor functions.

Joseph Garvin said:
except in the case of your a = b = c
expression, which I'm curious how Python handles. The natural answer
is for assignment to be an expression (so b = c returns the new value
of b). But Python doesn't do that, so then I'd expect we'd have some
third ast.LoadAndStore() option for b, but examining ast.parse's
behavior it looks like it chooses Store...

No, the value of b is never read in this assignment. I'm not quite
sure whether you are aware what the *actual* semantics of this
assignment is; it is equivalent to

tmp = c
a = tmp
b = tmp

(so the assignment to a happens first), and neither a nor b is
being read. So it's simple stores into both a and b.
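
Both halves of that description can be checked from Python itself. The sketch below inspects the AST for the contexts and then runs the statement with a recording mapping as its local namespace to observe the order of the stores; the Recorder class is a hypothetical helper written for this example.

```python
import ast

# Part 1: the AST has one Assign with two Store targets and one Load value.
assign = ast.parse("a = b = c").body[0]
target_info = [(t.id, type(t.ctx).__name__) for t in assign.targets]
value_info = (assign.value.id, type(assign.value.ctx).__name__)

# Part 2: observe the order in which the names are actually bound.
store_order = []


class Recorder(dict):
    """Dict subclass that logs every key stored through item assignment."""

    def __setitem__(self, key, value):
        store_order.append(key)
        super().__setitem__(key, value)


# dict's constructor fills in c without going through __setitem__,
# so only the stores performed by the executed statement are logged.
local_names = Recorder(c=42)
exec("a = b = c", {}, local_names)
```

The log shows a being bound before b, and neither name is read at any point.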

Joseph Garvin said:
-Why can't orelse for ast.If and ast.While default to empty []?

Martin v. Löwis said:
You want to store None? That would be a type error; orelse is
specified as "stmt*". So it must be a list.

Joseph Garvin said:
A list is actually what I want, an empty one. The problem is that
ast.While and ast.If's constructors default to the opposite,
orelse=None. Same with keywords and args for ast.Call. Admittedly,
adding orelse=[] to the constructor calls isn't terribly burdensome,
but it does make already obfuscated looking AST mangling code even
worse.

Ah, that could be fixed. Please contribute a patch.
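
For reference, a hand-built loop with the empty list spelled out looks roughly like the sketch below. It assumes Python 3.8 or later (ast.Constant, and the type_ignores field on ast.Module); recent CPython releases (3.13, as far as I know) fill in omitted list fields with an empty list, but passing orelse=[] explicitly works on older versions too.

```python
import ast

# Hand-built equivalent of:
#     while n > 0:
#         n = n - 1
loop = ast.While(
    test=ast.Compare(
        left=ast.Name(id="n", ctx=ast.Load()),
        ops=[ast.Gt()],
        comparators=[ast.Constant(value=0)],
    ),
    body=[
        ast.Assign(
            targets=[ast.Name(id="n", ctx=ast.Store())],
            value=ast.BinOp(
                left=ast.Name(id="n", ctx=ast.Load()),
                op=ast.Sub(),
                right=ast.Constant(value=1),
            ),
        )
    ],
    orelse=[],  # no else clause: an empty statement list, not None
)

module = ast.Module(body=[loop], type_ignores=[])

# Hand-built nodes have no line numbers; compile() needs them filled in.
ast.fix_missing_locations(module)

namespace = {"n": 3}
exec(compile(module, "<ast>", "exec"), namespace)
final_n = namespace["n"]
```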

Joseph Garvin said:
I understand that if function definitions were expressions, because of
the whitespace syntax there wouldn't be a way to express an assignment
to such an expression. But, why would it be problematic to let them be
expressions anyway?

This would be a significant change to the compiler, which would have
to learn what bytecodes to compile it to. Feel free to discuss this
on python-ideas.

Regards,
Martin
 
