How does python build its AST

M

MonkeeSage

A quick question about how python parses a file into compiled
bytecode. Does it parse the whole file into AST first and then compile
the AST, or does it build and compile the AST on the fly as it reads
expressions? (If the former case, why can't functions be called before
their definitions?)

Thanks,
Jordan
 
K

Kay Schluehr

A quick question about how python parses a file into compiled
bytecode. Does it parse the whole file into AST first and then compile
the AST, or does it build and compile the AST on the fly as it reads
expressions? (If the former case, why can't functions be called before
their definitions?)

Thanks,
Jordan

Python uses a highly optimized table based LL(1) parser to create a
syntax tree. In Python 2.5 it transforms the concrete syntax tree
( CST ) into an AST before compilation. Before that it compiled the
CST directly. I'm not sure what you are asking for ( in parentheses )?
Parser actions or preprocessing the tree? The latter is definitely
possible and you can build your own compilation machinery using the
parser module and the compile function.

Kay
 
M

MonkeeSage

Python uses a highly optimized table based LL(1) parser to create a
syntax tree. In Python 2.5 it transforms the concrete syntax tree
( CST ) into an AST before compilation. Before that it compiled the
CST directly. I'm not sure what you are asking for ( in parentheses )?
Parser actions or preprocessing the tree? The latter is definitely
possible and you can build your own compilation machinery using the
parser module and the compile function.

Kay

Thanks for your reply. You answered my main question. The secondary
question is why is it a NameError to try to use a variable/function
prior to the declaration in a source file, since python has already
seen the declaration on the first pass building the CST/AST? At
compile time, shouldn't it already know about it? (Forgive my
ignorance.)

Regards,
Jordan
 
J

Jason

Thanks for your reply. You answered my main question. The secondary
question is why is it a NameError to try to use a variable/function
prior to the declaration in a source file, since python has already
seen the declaration on the first pass building the CST/AST? At
compile time, shouldn't it already know about it? (Forgive my
ignorance.)

Regards,
Jordan

Remember that Python is a highly dynamic language. You can't
guarantee that a name will be accessible until the actual execution
point that you try to access it. For example:
Traceback (most recent call last):
File said:
# The standard hello function
def Hello(): print 'Hello!' ....
Hello() Hello!

# Modification through the globals dictionary
def Vikings(): print 'Spam, spam, spam! Spam!' ....
globals()[ 'olleH'[::-1] ] = Vikings
Hello() Spam, spam, spam! Spam!

# Another dictionary-based modification
globals()['Hello'] = lambda: 'Go away!'
Hello() 'Go away!'

# Remove the syntactic sugar and make a function object directly
import new
import compiler
new_code = compiler.compile( 'print "Die in a fire!"', 'Hateful', 'single')
Hello = new.function( new_code, {}, 'Angry')
Hello() Die in a fire!

# A callable object (not a function!)
class AnnoyingNeighbor(object):
.... def __call__(self):
.... print 'Hi-diddly-ho, neighbor!'
....
If this was in a file, which version of Hello should the first call go
to? Should the 'Hello' name be bound to the function, the callable
class instance, the result of new.function? What if another
statement

The problem is that, in Python, functions are first-class objects,
just like class definitions and other variables. The 'def' statement
is something that gets executed. It compiles the code block for the
function into a function object, then assigns it to the name of the
function.

When Python hits a function call, it must look up the name in the
current name dictionary. In any but that most simple cases, the name
may not refer to its previous values: any intervening calls or
statements could have rebound that name to another object, either
directly, through side-effects, or even through the C interface. You
can't call a function before its def statement is parsed any more than
you can print the value of a variable before you assign anything to
it.

Unlike C, functions are not special under Python. Unlike C++, classes
aren't terribly special either. They have syntactic sugar to help the
coding go down, but they are only Python objects, and are bound to the
same rules as all other Python objects.

--Jason
 
M

Michael Spencer

MonkeeSage said:
A quick question about how python parses a file into compiled
bytecode. Does it parse the whole file into AST first and then compile
the AST, or does it build and compile the AST on the fly as it reads
expressions? (If the former case, why can't functions be called before
their definitions?)

Thanks,
Jordan
Python (2.5+) does parse the whole file top-to-bottom into an AST, and then
compile the entire AST, resulting in a module code object that may contain other
compiled code objects for the function bodies.

However, this isn't relevant to your parenthesized question, since compiling
functions does not make them callable. A function becomes available only once
its def statement has been executed. Since execution (generally) proceeds top
to bottom, this is why functions can't be called before their definitions.

HTH
Michael
 
K

Kay Schluehr

Thanks for your reply. You answered my main question. The secondary
question is why is it a NameError to try to use a variable/function
prior to the declaration in a source file, since python has already
seen the declaration on the first pass building the CST/AST? At
compile time, shouldn't it already know about it? (Forgive my
ignorance.)

Regards,
Jordan

When the compiler finds a name not being bound in the function scope
it supposes the name is global and a LOAD_GLOBAL opcode is created
instead of LOAD_FAST ( accessing locals ). Other than the local/
function scope the global ( module level ) scope is dynamic and the
compiler can't know whether a name will be present at runtime.

Kay
 
T

Terry Reedy

|A quick question about how python parses a file into compiled
| bytecode. Does it parse the whole file into AST first and then compile
| the AST, or does it build and compile the AST on the fly as it reads
| expressions? (If the former case, why can't functions be called before
| their definitions?)

The direct answer is that names cannot be entered into namespaces and bound
to objects to be looked up until the corresponding object is created by
executing the corresponding code. Compiling Python code creates the
internal code needed to create Python objects, but only exceptionally
creates Python objects in the process. In particular, compiling a function
may create code objects (since the code is a constant) referenced by the
function creation code, but not function objects themselves.

A less direct answer is the Python is designed to by usable interactively.
In CPython interactive mode, you enter and the interpreter compiles and
executes one top(module)-level statement at a time. Calling a function you
have not yet entered would be magical.

Terry Jan Reedy
 
M

MonkeeSage

|A quick question about how python parses a file into compiled
| bytecode. Does it parse the whole file into AST first and then compile
| the AST, or does it build and compile the AST on the fly as it reads
| expressions? (If the former case, why can't functions be called before
| their definitions?)

The direct answer is that names cannot be entered into namespaces and bound
to objects to be looked up until the corresponding object is created by
executing the corresponding code. Compiling Python code creates the
internal code needed to create Python objects, but only exceptionally
creates Python objects in the process. In particular, compiling a function
may create code objects (since the code is a constant) referenced by the
function creation code, but not function objects themselves.

A less direct answer is the Python is designed to by usable interactively.
In CPython interactive mode, you enter and the interpreter compiles and
executes one top(module)-level statement at a time. Calling a function you
have not yet entered would be magical.

Terry Jan Reedy

Thanks for your replies Kay, Michael and Terry. To summarize my
understanding of your answers:

- Python (meaning CPython here) first does a parsing pass over the
entire file, with 2.5+ building a syntax tree and prior versions
building a parse tree.

- It then compiles the tree into bytecode, by walking down the nodes
recursively from the top down.

- When in interactive mode (e.g., python prompt), it builds the tree
and compiles it on the fly, as individual expressions are parsed.

Is this correct? If so, may I pick your brain on two more points?

1.) What is the benefit of doing a two phase compilation (parsing/
compiling), rather than a single, joint parse + compile phase (as in
interactive mode)?

2.) Wouldn't it be possible on the parsing phase to "tag" names as
valid, even if they occur prior to the assignment of the name, if on a
later branch that assignment is found (and have the compiler be aware
of such tags)?

The reason I'm wondering about these things is that on a different
group, it came up that perl allows referencing before assignment,
which seems to require a two-phase compilation, which made me wonder
how python does things. And since (if I haven't misunderstood), python
does use two-phase compilation, I just wondered if it would be
possible to do what perl does. I'm not advocating it as a feature for
python (it seems bass-ackwards to me to reference a name before it's
assigned to in the script), this is just curiosity.

Thanks,
Jordan
 
T

Terry Reedy

| 1.) What is the benefit of doing a two phase compilation (parsing/
| compiling), rather than a single, joint parse + compile phase (as in
| interactive mode)?

As far as I know (without looking at the code), there is no difference
between interactive and batch mode except that the unit processed is a
statement versus file.

| 2.) Wouldn't it be possible on the parsing phase to "tag" names as
| valid, even if they occur prior to the assignment of the name, if on a
| later branch that assignment is found (and have the compiler be aware
| of such tags)?

What would be the point? The semantics of Python code is essentially
independent of whether it is executed in interactive or batch mode. (The
exceptions are not relevant to your question.) So there is no point I can
see to doing something in file mode that could not be done in statement
mode.

tjr
 
M

MonkeeSage

| 1.) What is the benefit of doing a two phase compilation (parsing/
| compiling), rather than a single, joint parse + compile phase (as in
| interactive mode)?

As far as I know (without looking at the code), there is no difference
between interactive and batch mode except that the unit processed is a
statement versus file.

I see.
| 2.) Wouldn't it be possible on the parsing phase to "tag" names as
| valid, even if they occur prior to the assignment of the name, if on a
| later branch that assignment is found (and have the compiler be aware
| of such tags)?

What would be the point? The semantics of Python code is essentially
independent of whether it is executed in interactive or batch mode. (The
exceptions are not relevant to your question.) So there is no point I can
see to doing something in file mode that could not be done in statement
mode.

It would pretty much be pointless. Referencing a name before it's
assigned seems confusing and strange to me. I suppose one sane use
would be in implementing something like Haskell's "where" clause, but
I don't think that would work well (or even be very useful) in python.
Anyhow, this was more of an academic curiosity on how CPython does
things, and I appreciate your answers.

Regards,
Jordan
 
C

Carl Banks

A quick question about how python parses a file into compiled
bytecode. Does it parse the whole file into AST first and then compile
the AST, or does it build and compile the AST on the fly as it reads
expressions? (If the former case, why can't functions be called before
their definitions?)

Python creates certain objects at compile time but doesn't bind them
to names in the modulespace until run time.

Python could--and many other languages do--automatically bind these
objects to names upon import. Python doesn't do it because it sees a
module as "code to be executed" rather than a "list of global object
definitions".

Something like this would be awkward if Python bound the names at
import time:

if X:
def a(): do_this()
else:
def a(): do_that()

Which one gets bound to a?

To do something similar in C would require preprocessor macros (ick).


Carl Banks
 
M

MonkeeSage

Python creates certain objects at compile time but doesn't bind them
to names in the modulespace until run time.

Python could--and many other languages do--automatically bind these
objects to names upon import. Python doesn't do it because it sees a
module as "code to be executed" rather than a "list of global object
definitions".

Something like this would be awkward if Python bound the names at
import time:

if X:
def a(): do_this()
else:
def a(): do_that()

Which one gets bound to a?

To do something similar in C would require preprocessor macros (ick).

Carl Banks

Gotcha. Thanks again everyone for your answers.

Regards,
Jordan
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,997
Messages
2,570,240
Members
46,828
Latest member
LauraCastr

Latest Threads

Top