a basic bytecode to machine code compiler

Rouslan Korneychuk

I was looking at the list of bytecode instructions that Python uses and
I noticed how much it looked like assembly. So I figured it can't be too
hard to convert this to actual machine code, to get at least a small
boost in speed.
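The resemblance is easy to see for yourself with the standard dis module, which prints a function's bytecode in an assembly-like listing (a quick illustration, not part of the original post):

```python
# Python's own dis module prints bytecode that reads much like assembly.
import dis

def add(a, b):
    return a + b

dis.dis(add)  # prints a listing of LOAD/BINARY/RETURN-style instructions
opnames = [i.opname for i in dis.get_instructions(add)]
```

Exact instruction names vary between Python versions, but the load/operate/return shape is the same.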

And so I whipped up a proof of concept, available at
https://github.com/Rouslan/nativecompile

I'm aware that PyPy already has a working JIT compiler, but I figure it
will be a long time before they have a version of Python that is ready
for everybody to use, so this could be useful in the meantime.

I chose to create this for the latest stable version of Python and I
happen to use some functionality that is only available since Python 3.2.

The basic usage is:

>>> import nativecompile
>>> bcode = compile('print("Hello World!")','<string>','exec')
>>> mcode = nativecompile.compile(bcode)
>>> mcode()
Hello World!


This compiler does absolutely nothing clever. The only difference
between the bytecode version and the compiled version is there is no
interpreter loop and the real stack is used instead of an array.

Most of it is written in Python itself. There is one module written in C
that does the things that cannot easily be done in pure Python, such as
get the addresses of API functions and to execute the newly created code.

So far I have only implemented a few bytecode instructions and only have
32-bit x86-compatible support. I have only tested this on Linux. It
might work on Windows but only if you can run programs without any sort
of data execution prevention (I can fix that if anyone wants). And I'm
sure more optimized machine code can be generated (such as rearranging
the error checking code to work better with the CPU's branch predictor).

Since so few instructions are implemented I haven't done any benchmarks.


What do people think? Would I be wasting my time going further with this?
 

Stefan Behnel

Rouslan Korneychuk, 01.04.2011 00:33:
I was looking at the list of bytecode instructions that Python uses and I
noticed how much it looked like assembly. So I figured it can't be too hard
to convert this to actual machine code, to get at least a small boost in
speed.

I think I recall having read about something like this before, but I can't
quite find it. In any case, you may want to take a look at both Cython and
Psyco.

Stefan
 

Terry Reedy

I was looking at the list of bytecode instructions that Python uses and
I noticed how much it looked like assembly. So I figured it can't be too
hard to convert this to actual machine code, to get at least a small
boost in speed.

And so I whipped up a proof of concept, available at
https://github.com/Rouslan/nativecompile

I'm aware that PyPy already has a working JIT compiler, but I figure it
will be a long time before they have a version of Python that is ready
for everybody to use, so this could be useful in the meantime.

I believe PyPy folk think it ready now, at least for some uses.
Speedwise, it is now more or less comparable to CPython, winning some
benchmarks and losing others.
....
What do people think? Would I be wasting my time going further with this?

Depends on whether your goal is personal (learning, fun, use) or
usefulness to others. For the latter, you *might* do better to help with
an existing project, such as Cython or Dufour's ShedSkin.
 

Steven D'Aprano

I'm aware that PyPy already has a working JIT compiler, but I figure it
will be a long time before they have a version of Python that is ready
for everybody to use, so this could be useful in the meantime.

PyPy is ready to use *now*, if you are happy writing code that targets
Python 2.5 and don't need C extensions.


[...]
What do people think? Would I be wasting my time going further with
this?

Depends on what your ultimate aim is. If it is to learn things yourself,
then it is never a waste of time to learn new things.

If your aim is to get a good working project that you can be proud to put
on your CV, then go right ahead.

If your aim is to contribute to a Python compiler that will actually be
used by people other than yourself, I'm not so sure... personally, I
expect that PyPy is the future of Python optimizing compilers, but I
could be wrong.

I suggest you check out the competitors:

Shedskin is a Python to C++ compiler;
Psyco is a JIT specialising compiler;
Nuitka claims to be a C++ implementation that compiles to machine code;
Berp claims to be a Haskell implementation that does the same;
Compyler claims to be a native x86 assembly compiler;
UnPython claims to be an experimental Python to C compiler.


Of the six, as far as I know only Shedskin and Psyco are widely used.



Good luck, and remember:

Release early, release often, and let the community know when you do!
 

Stefan Behnel

Steven D'Aprano, 01.04.2011 14:57:
I suggest you check out the competitors:

Shedskin is a Python to C++ compiler;
Psyco is a JIT specialising compiler;
Nuitka claims to be a C++ implementation that compiles to machine code;
Berp claims to be a Haskell implementation that does the same;
Compyler claims to be a native x86 assembly compiler;
UnPython claims to be an experimental Python to C compiler.


Of the six, as far as I know only Shedskin and Psyco are widely used.

Erm, yes, right. If you want to exclude Cython, which arguably is the only
static Python compiler that actually has a large user base, then those may
really be the only two that are widely used. Except that Psyco is certainly
being used a lot more often than Shedskin, mainly because it actually
allows you to execute Python code.

Stefan
 

Rouslan Korneychuk

Thanks for all the replies. I wasn't aware of some of these
alternatives. Most of these seem to transform Python code/bytecode into
another language. I was already well aware of Cython. On the Nuitka
blog, I notice it says "Compiling takes a lot [sic] time, ...". Compyler
seems to generate assembly and then parse the assembly to generate a
Windows exe. Berp turns Python into Haskell, not directly into machine
code.

The closest thing to mine seems to be Psyco. It tries to do something
more ambitious. It analyzes the program while it's running to create
specialized versions of certain functions. High memory usage seems to be
an issue with Psyco.

My approach is to simply translate the bytecode into raw machine code as
directly as possible, quickly and without using much memory. Basically I
was going for a solution with no significant drawbacks. It was also
meant to be very easy to maintain. The machine code is generated with a
series of functions that very closely mirrors AT&T syntax (same as the
default syntax for the GNU assembler) with some convenience functions
that make it look like some kind of high-level assembly. For example,
here is the implementation for LOAD_GLOBAL:

@hasname
def _op_LOAD_GLOBAL(f,name):
    return (
        f.stack.push_tos(True) + [
            ops.push(address_of(name)),
            ops.push(GLOBALS)
        ] + call('PyDict_GetItem') +
        if_eax_is_zero([
            discard_stack_items(1),
            ops.push(BUILTINS)
        ] + call('PyDict_GetItem') +
            if_eax_is_zero([
                discard_stack_items(1),
                ops.push(pyinternals.raw_addresses[
                    'GLOBAL_NAME_ERROR_MSG']),
                ops.push(pyinternals.raw_addresses['PyExc_NameError'])
            ] + call('format_exc_check_arg') + [
                goto(f.end)
            ])
        ) + [
            discard_stack_items(2)
        ])

To make sense of it, you just need to ignore the square brackets and
plus signs (they are there to create a list that gets joined into one
byte string at the very end) and imagine it's assembly code (I should
probably just write a variadic function or use operator overloading to
make this syntactically clearer). Any time a new version of Python is
released, you would just run diff on Python-X.X/Python/ceval.c and see
which op codes need updating. You wouldn't need to make a broad series
of changes because of a minor change in Python's syntax or bytecode.
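The list-and-join scheme can be illustrated with a toy version (the helper names here are hypothetical stand-ins, not the actual nativecompile functions): each instruction function returns the raw machine-code bytes it encodes, lists of them are built up with +, and everything is joined into one byte string at the end.

```python
import struct

# Toy stand-ins (not the real nativecompile API) for instruction functions.
def push_imm32(value):
    # x86 PUSH imm32: opcode 0x68 followed by a little-endian 32-bit value
    return b'\x68' + struct.pack('<I', value & 0xFFFFFFFF)

def push_eax():
    # x86 PUSH EAX: single-byte opcode 0x50
    return b'\x50'

# Lists of byte strings are concatenated with + ...
code = [push_imm32(0x1000), push_eax()]
# ... and joined into one byte string at the very end
machine_code = b''.join(code)
```

The square brackets and plus signs in the opcode implementations above play exactly this role.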

And that's just one of the larger op code implementations. Here's the
one for LOAD_CONST:

@hasconst
def _op_LOAD_CONST(f,const):
    return f.stack.push_tos() + [f.stack.push(address_of(const))]


Anyway, it was certainly interesting working on this. I'll probably at
least implement looping and arithmetic so I can have something
meaningful to benchmark.
 

Steven D'Aprano

Stefan Behnel wrote:

Erm, yes, right. If you want to exclude Cython, which arguably is the
only static Python compiler that actually has a large user base, then
those may really be the only two that are widely used. Except that Psyco
is certainly being used a lot more often than Shedskin, mainly because
it actually allows you to execute Python code.

My apologies, I thought about including Cython in the list, but my
understanding of it is that it is a derivative of Pyrex, and used for
writing C extensions in a Python-like language (Python + type
annotations). We were talking about taking ordinary, unmodified Python
code and compiling it to machine code, and I didn't think either Pyrex or
Cython do that.
 

Stefan Behnel

Steven D'Aprano, 02.04.2011 12:04:
My apologies, I thought about including Cython in the list, but my
understanding of it is that it is a derivative of Pyrex, and used for
writing C extensions in a Python-like language (Python + type
annotations). We were talking about taking ordinary, unmodified Python
code and compiling it to machine code, and I didn't think either Pyrex or
Cython do that.

Ok, no problem. Pyrex certainly doesn't play in the same league.

Cython actually supports most Python language features now (including
generators in the development branch), both from Python 2 and Python 3.
Chances are that the next release will actually compile most of your Python
code unchanged, or only with minor adaptations.

Stefan
 

John Nagle

Cython actually supports most Python language features now (including
generators in the development branch), both from Python 2 and Python 3.
Chances are that the next release will actually compile most of your
Python code unchanged, or only with minor adaptations.

Cython requires the user to insert type declarations, though, to
get a speed improvement.

Shed Skin has a good type inference system, but it insists on
a unique type for each object (which includes "null"; a type of
"str or null" is not acceptable). The rest of Shed Skin, outside
the type inference system, is not very well developed.
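A hedged illustration of the restriction described above: this entirely ordinary idiom gives one name a result type of "str or None", which an inferencer that insists on a single unique type per object must reject.

```python
def find_name(names, wanted):
    for n in names:
        if n == wanted:
            return n   # type here: str
    return None        # type here: NoneType -- "str or null" in one place

# Perfectly normal Python, but untypable under a unique-type scheme
result = find_name(["alice", "bob"], "bob")
```

Real Shed Skin behavior may differ in detail; the point is only the shape of the restriction.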

There's a path there to a fast Python with some restrictions.
The Shed Skin inference engine with the Cython engine might have
potential.

PyPy gets some speedups, mostly by recognizing when numbers can be
unboxed. (CPython carries around a heap-allocated object for every
integer; that's what's meant by "boxing".) But outside of
number-crunching, PyPy doesn't show big gains. And the whole PyPy JIT
system is far too complex. They try to retain all of Python's dynamism,
and as a result, they need to be able to bail from compiled code to one
of two different interpreters, then recompile and go back to compiled
mode.

Unladen Swallow failed to produce much of a speed improvement
and Google pulled the plug. It's time to look at alternatives.

There's no easy way to speed up Python; that's been tried.
It needs either a very, very elaborate JIT system, more complex
than the ones for Java or Self, or some language restrictions.
The main restriction I would impose is to provide a call that says:
"OK, we're done with loading, initialization, and configuration.
Now freeze the code." At that moment, all the global
analysis and compiling takes place. This allows getting rid
of the GIL and getting real performance out of multithread
CPUs.

John Nagle
 

Paul Rubin

John Nagle said:
There's no easy way to speed up Python; that's been tried.
It needs either a very, very elaborate JIT system, more complex
than the ones for Java or Self, or some language restrictions.

Is it worse than Javascript? Tracemonkey and its descendants produce
pretty fast code, I think.
The main restriction I would impose is to provide a call that says:
"OK, we're done with loading, initialization, and configuration.
Now freeze the code." At that moment, all the global
analysis and compiling takes place. This allows getting rid
of the GIL and getting real performance out of multithread CPUs.

That's quite an interesting idea. I do think a lot of production Python
code implicitly depends on the GIL and would need rework for multicore.
For example, code that expects "n += 1" to be atomic, because the
CPython bytecode interpreter won't switch threads in the middle of it.
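The atomicity assumption is easy to inspect: dis shows that "n += 1" compiles to several bytecode instructions (a load, an in-place add, and a store), so a thread switch between any two of them can lose an update (a quick illustration, not part of the original post).

```python
import dis

n = 0

def bump():
    global n
    n += 1   # looks atomic, but is several separate bytecode instructions

opnames = [i.opname for i in dis.get_instructions(bump)]
# The load, add, and store appear as distinct instructions in opnames
```

The CPython interpreter happens not to switch threads mid-statement here, which is exactly the implicit dependence described above.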
 

Robert Kern

There's no easy way to speed up Python; that's been tried.
It needs either a very, very elaborate JIT system, more complex
than the ones for Java or Self, or some language restrictions.
The main restriction I would impose is to provide a call that says:
"OK, we're done with loading, initialization, and configuration.
Now freeze the code." At that moment, all the global
analysis and compiling takes place. This allows getting rid
of the GIL and getting real performance out of multithread
CPUs.

That is not the reason the GIL exists, and being able to "freeze" the code will
not let you get rid of the GIL alone. CPython's GIL exists to protect the data
structures internal to its implementation and provide implicit protection to C
extension modules (making them relatively easy to write threadsafe). Reference
counts are the primary data structures being protected. IronPython and Jython
allow just as much dynamism as CPython, and they have no mechanism for
"freezing" the code, but they do not have a GIL. I believe this has been pointed
out to you before, but I don't think I've seen you acknowledge it.
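A small demonstration of the reference counts in question: sys.getrefcount reports the counter that the GIL keeps consistent when multiple threads touch the same object.

```python
import sys

x = object()
before = sys.getrefcount(x)   # counts x itself plus the temporary argument
y = x                         # create one more reference
after = sys.getrefcount(x)
# after == before + 1: without a GIL (or atomic refcounting), concurrent
# increments and decrements of this counter could race and corrupt it
```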

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
