compiler package vs parser

Robin Becker

Is the compiler package actually supposed to be equivalent to the parser module?

I ask because the following code

#### start p.py
def func(D):
    for k in D:
        exec '%s=D[%r]' % (k,k)
    print i, j, k
    print locals()
    print i, j, k

if __name__=='__main__':
    func(dict(i=1,j=33))
#### end p.py

when run through the compiler package to produce a module, yields different
bytecode from that produced by the standard compilation (as shown by dis.dis).
In particular, the variables i and j in func above are treated differently: the
standard compiler uses LOAD_NAME, whereas the code from the package uses
LOAD_GLOBAL.

The code used to create the synthetic module is

#### start tp.py
from compiler import parse, pycodegen, misc, syntax
import time, struct, marshal
txt=open('p.py','r').read()

tree=parse(txt)
print 'tree\n',tree

def _get_tree(tree,filename):
    # attach the filename to every node, then run the syntax checks
    misc.set_filename(filename, tree)
    syntax.check(tree)
    return tree

def getPycHeader():
    # .pyc header: magic number followed by a 4-byte little-endian mtime
    mtime = int(time.time())  # struct's 'i' format expects an integer
    mtime = struct.pack('<i', mtime)
    return pycodegen.Module.MAGIC + mtime

def dump(fn,code):
    f=open(fn,'wb')
    f.write(getPycHeader())
    marshal.dump(code,f)
    f.close()

gen = pycodegen.ModuleCodeGenerator(_get_tree(tree,'synp.py'))
code = gen.getCode()

dump('synp.pyc',code)
#### end tp.py

The module synp.pyc fails with a traceback (as expected, because there are no
global i,j), but p.py runs OK.

I assume that my attempt to compile the tree is broken (perhaps it is missing
some special traversal), since otherwise the code would end up the same (except
for line numbering, which I have ignored).
 
Kay Schluehr

Robin said:
Is the compiler package actually supposed to be equivalent to the parser module?
.......

No. The parser module creates a concrete parse tree (CST), whereas
the compiler package transforms this CST into an AST for subsequent
computations. In more recent versions those CST -> AST transformations
are performed by the runtime, and the Python compiler uses those
internally produced ASTs. The Python 2.6 API to ASTs is the ast module.
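
As a rough illustration of the ast route mentioned above, here is a minimal sketch. It is written in Python 3 syntax, which is an assumption on my part (the thread itself targets Python 2.x, and the compiler and parser modules no longer exist in current Python 3 releases). The source string and the name `namespace` are made up for the example.

```python
import ast

source = "x = 1 + 2\n"

# Parse the source text into an abstract syntax tree.
tree = ast.parse(source, filename="<example>", mode="exec")

# The top-level node is a Module whose body holds a single Assign statement.
first_stmt = tree.body[0]
node_names = [type(node).__name__ for node in ast.walk(tree)]

# The built-in compile() accepts the tree directly and returns a code object,
# so no separate code generator is needed.
code = compile(tree, filename="<example>", mode="exec")
namespace = {}
exec(code, namespace)
```

Because the tree goes through the interpreter's own compiler, the resulting bytecode is whatever the built-in compiler would have produced for the same source.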
 
Robin Becker

Kay said:
No. The parser module creates a concrete parse tree ( CST ) whereas
the compiler package transforms this CST into an AST for subsequent
computations. In more recent versions those CST -> AST transformations
are performed by the runtime and the Python compiler uses those
internally produced ASTs. The Python 2.6 API to ASTs is the ast module.
.......

OK having digested most of what Kay said (my simple engineer's brain being
rather stretched at the best of times) I would like to rephrase my question.

Is it intended that the compiler package be able to create the same code as the
normal parser chain? I think I understand now that the compiler used internally
is now exposed in various ways as built-in modules.

The example I gave earlier seems to indicate that the compiler package may not
be creating the same final bytecode, at least using the compilation approach that
I used (based on the compiler package itself).

If I have messed that up then there should be some easy fix; otherwise, if
pycodegen is somehow not getting the semantics of the variables i,j correct,
is there some way I can fix that?

I realize that I probably ought to be trying this out with the newer ast stuff,
but currently I am supporting code back to 2.3 and there's not much hope of
doing it right there without using the compiler package.

My analysis of the problem is that in

#### start p.py
def func(D):
    for k in D:
        exec '%s=D[%r]' % (k,k)
    print i, j, k
    print locals()
    print i, j, k

if __name__=='__main__':
    func(dict(i=1,j=33))
#### end p.py

the compiler package ends up treating i & j as global, whereas the modern
analysis doesn't (and doesn't say they're definitely local either). Looking at
the code in Python/compile.c the compile_nameop code seems to check for scopes
FREE, CELL, LOCAL, GLOBAL_IMPLICIT & GLOBAL_EXPLICIT whereas
pycodegen.CodeGenerator._nameOp seems not to know about GLOBAL_IMPLICIT/EXPLICIT
but has only a single GLOBAL scope.
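
The scope classification described above can be inspected from Python with the standard symtable module, which exposes the interpreter's own symbol-table pass. The following is only a sketch in Python 3 syntax (an assumption, since the exec statement from p.py is not valid there); the sample source and the helper `classify` are hypothetical and exist only for this example.

```python
import symtable

source = '''
g = 0
def func(D):
    global g
    k = len(D)
    return i, g, k
'''

module_table = symtable.symtable(source, "<example>", "exec")
func_table = module_table.get_children()[0]  # the only nested scope is func

def classify(name):
    """Report how the symbol-table pass scoped `name` inside func."""
    sym = func_table.lookup(name)
    if sym.is_declared_global():
        return "global (explicit)"   # named in a global statement
    if sym.is_global():
        return "global (implicit)"   # never bound in func, so assumed global
    if sym.is_local():
        return "local"
    return "other"

scopes = {name: classify(name) for name in ("D", "k", "i", "g")}
```

Here `i` is never bound inside func, so it is an implicit global, while `g` is an explicit one because of the global statement; these are the two cases that a single GLOBAL scope cannot tell apart.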
 
Aahz

Robin said:
My analysis of the problem is that in

#### start p.py
def func(D):
    for k in D:
        exec '%s=D[%r]' % (k,k)
    print i, j, k
    print locals()
    print i, j, k

if __name__=='__main__':
    func(dict(i=1,j=33))
#### end p.py

the compiler package ends up treating i & j as global, whereas the
modern analysis doesn't (and doesn't say they're definitely local
either). Looking at the code in Python/compile.c the compile_nameop
code seems to check for scopes FREE, CELL, LOCAL, GLOBAL_IMPLICIT &
GLOBAL_EXPLICIT whereas pycodegen.CodeGenerator._nameOp seems not to
know about GLOBAL_IMPLICIT/EXPLICIT but has only a single GLOBAL scope.
.......

At this point, given the lack of response from people who actually know
the compiler internals, I think it's fair of you to try going to
python-dev -- normally questions like this are off-topic, and if someone
complains, I'll take the blame. ;-)
 
Gabriel Genellina

On Fri, 17 Apr 2009 10:55:46 -0300, Scott David Daniels wrote:
Robin said:
def func(D):
    for k in D:
        exec '%s=D[%r]' % (k,k)
    print i, j, k
    print locals()
    print i, j, k
if __name__=='__main__':
    func(dict(i=1,j=33))
#### end p.py
the compiler package ends up treating i & j as global, whereas the
modern analysis doesn't (and doesn't say they're definitely local
either).

If they are not definitely local, they are non-local. Locals are
determined at function definition time, not function execution time.
So, your expectations about what the exec statement can do above are
mistaken. You may try to work your way around it, but, IMHO, you
will not succeed. If the code above were to work as you wish, every
access to every non-local in code that contains an "exec" would have
to check a "new locals" dictionary just in case the exec added a local.

And that's what happens. In the absence of an exec statement, locals are
optimized: the compiler knows exactly how many of them exist, and their
names, just by static code analysis. But when the function contains an
exec statement (or an "import *" statement) this is not possible anymore,
and the compiler has to switch to another strategy and generate code using
a less-optimized approach. Unknown variables are accessed using
LOAD_NAME/STORE_NAME (which require a name lookup) instead of the normal
LOAD_FAST/STORE_FAST for local variables and LOAD_GLOBAL/STORE_GLOBAL for
global ones.
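
The three opcode families can be seen with dis. This is a sketch in Python 3 syntax, which is an assumption: Python 3 has no exec statement, so function bodies there are always optimized, and the *_NAME opcodes show up only in unoptimized scopes such as module-level code. Exact opcode names also vary between CPython versions (newer releases add variants of LOAD_FAST), so the sketch collects opcode names rather than relying on a fixed listing. The sample source and the helper `opnames` are made up for the example.

```python
import dis
import types

source = '''
g = 1
h = g
def func(a):
    b = a + 1
    return b + g
'''

# Compile the sample as a module, then pull out the code object for func.
module_code = compile(source, "<example>", "exec")
func_code = next(c for c in module_code.co_consts
                 if isinstance(c, types.CodeType))

def opnames(code):
    """Return the set of opcode names used by a code object."""
    return {ins.opname for ins in dis.get_instructions(code)}

module_ops = opnames(module_code)  # unoptimized scope: *_NAME opcodes
func_ops = opnames(func_code)      # optimized scope: *_FAST and *_GLOBAL
```

Calling `dis.dis(module_code)` prints the full listing, including the nested function, if the individual instructions are of interest.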

Think about what this code would have to do:
i = j = 42
def func(D):
    print i, j, k
    for k in D:
        exec '%s=D[%r]' % (k,k)
    print i, j, k

I don't completely understand what you wanted to show, but try this:

py> i = j = 42
py> def f():
...     print i, j
...     exec "k=1"
...     print i, j, k
...     exec "i=5"
...     print i, j, k
...
py> f()
42 42
42 42 1
5 42 1
py> i
42
py> j
42
py> k
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'k' is not defined

Note that i starts as a global name and after the exec "i=5" it becomes a
local variable; and k is always a local variable even if there is no
explicit assignment to it (except in the exec).
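
The session above depends on the Python 2 exec statement. In Python 3, exec() is an ordinary function and cannot add new local variables to the enclosing function, so a comparable effect needs an explicit namespace dictionary. A minimal sketch, assuming Python 3; the name `namespace` is hypothetical and this is not what the thread's code does:

```python
def func(D):
    namespace = {}
    for k in D:
        # The executed code sees D through the globals dict; any names it
        # assigns are stored in `namespace`, not in func's own locals.
        exec('%s = D[%r]' % (k, k), {'D': D}, namespace)
    return namespace

result = func(dict(i=1, j=33))
```

The dynamically created names are then read back from the dictionary (`result['i']`, `result['j']`) rather than as bare variables, which keeps the function body statically analyzable.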
 
Kay Schluehr

Robin said:
I realize that I probably ought to be trying this out with the newer ast stuff,
but currently I am supporting code back to 2.3 and there's not much hope of
doing it right there without using the compiler package.
.......

You might consider using the *builtin* parser module and forget about
the compiler package if it is broken ( I take your word that it is )
or modern ast representations which aren't really necessary for Python
anyway.
42

This is also not 100% reliable ( at least not for all statements in
all Python versions ) but it uses the internal parser/compiler and not
a standard library compiler package that might not be that well
maintained.
 
Robin Becker

Kay said:
You might consider using the *builtin* parser module and forget about
the compiler package if it is broken ( I take your word that it is )
or modern ast representations which aren't really necessary for Python
anyway.

42

This is also not 100% reliable ( at least not for all statements in
all Python versions ) but it uses the internal parser/compiler and not
a standard library compiler package that might not be that well
maintained.
........
Thinking about it, I just made the wrong decision back in 2004: we observed a
semantic change caused by the new scoping rules and tried to fix it using the
wrong model. Back then we were probably supporting 2.0 as well, so the parser
module probably wasn't available everywhere anyway; even today the ast stuff
isn't available in 2.4. I prefer the ast approach, as preppy is effectively
indentation free, which makes a tree for the parser module harder to synthesize.
 
