First release of Shed Skin, a Python-to-C++ compiler.

Mark Dufour

After nine months of hard work, I am proud to introduce my baby to the
world: an experimental Python-to-C++ compiler. It can convert many
Python programs into optimized C++ code, without any user intervention
such as adding type declarations. It uses rather advanced static type
inference techniques to deduce type information by itself. In
addition, it determines whether deduced types may be parameterized,
and if so, it generates corresponding C++ generics. Based on deduced
type information, it also attempts to convert heap allocation into
stack and static preallocation (falling back to libgc in case this
fails).

The compiler was motivated by the belief that in many cases it should
be possible to automatically deduce C++ versions of Python programs,
enabling users to enjoy both the productivity of Python and the
efficiency of C++. It works best for Python programs written in a
relatively static C++-style, in essence enabling users to specify C++
programs at a higher level.
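
As a rough illustration of what "a relatively static C++-style" might mean, the sketch below keeps every variable at a single type and every list homogeneous, so each name could in principle map to one C++ type. The function name `dot` and the sample data are made up for this example, and whether this exact program is accepted by the compiler has not been checked.

```python
def dot(xs, ys):
    # xs and ys are always lists of floats; total is always a float
    total = 0.0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total

# Homogeneous lists: every element is a float
weights = [0.5, 1.5, 2.0]
inputs = [2.0, 4.0, 1.0]
result = dot(weights, inputs)
```

Nothing here needs a type declaration, yet a reader (or an inference algorithm) can assign one type to each name.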

At the moment the compiler correctly handles 124 unit tests, six of
which are serious programs of between 100 and 200 lines:

-an Othello player
-two satisfiability solvers
-a Japanese puzzle solver
-a sudoku solver
-a neural network simulator

Unfortunately I am just a single person, and much work remains to be
done. At the moment, there are several limitations to the type of
Python programs that the compiler accepts. Even so, there is enough of
Python left to be able to remain highly productive in many cases.
However, for most larger programs, there are probably some minor
problems that need to be fixed first, and some external dependencies
to be implemented/bridged in C++.

With this initial release, I hope to attract other people to help me
locate remaining problems, help implement external dependencies, and
in the end hopefully even to contribute to the compiler itself. I
would be very happy to receive small programs that the compiler does
or should be able to handle. If you are a C++ template wizard, and you
would be interested in working on the C++ implementation of builtin
types, I would also love to get in contact with you. Actually, I'd
like to talk to anyone even slightly interested in the compiler, as
this would be highly motivating to me.

The source code is available at the following site. Please check the
README for simple installation/usage instructions. Let me know if you
would like to create ebuild/debian packages.

Sourceforge site: http://shedskin.sourceforge.net
Shed Skin blog: http://shed-skin.blogspot.com

Should you reply to this mail, please also reply to me directly. Thanks!


Credits

Parts of the compiler have been sponsored by Google, via its Summer of
Code program. I am very grateful to them for keeping me motivated
during a difficult period. I am also grateful to the Python Software
Foundation for choosing my project for the Summer of Code. Finally, I
would like to thank my university advisor Koen Langendoen for guiding
this project.


Details

The following describes in a bit more detail various aspects of the
compiler. Before seriously using the compiler, please make sure to
understand especially its limitations.

Main Features

-very precise, efficient static type inference (iterative object
contour splitting, where each iteration performs the cartesian product
algorithm)
-stack and static pre-allocation (libgc is used as a fall-back)
-support for list comprehensions, tuple assignments, anonymous funcs
-generation of arbitrarily complex class and function templates
(even member templates, or generic, nested list comprehensions)
-binary tuples are internally analyzed
-some understanding of inheritance (e.g. list(dict/list) becomes
list<pyiter<A>>)
-hierarchical project support: generation of corresponding C++
hierarchy, including (nested) Makefiles; C++ namespaces
-annotation of source code with deduced types
-builtin classes, functions (enumerate, sum, min, max, range, zip..)
-polymorphic inline caches or virtual vars/calls (not well tested)
-always unbox scalars (compiler bails out with error if scalars are
mixed with pointer types)
-full source code available under the MIT license
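
The cartesian product algorithm mentioned in the first feature can be sketched in a few lines. This is a toy model, not the compiler's implementation: for one call site it enumerates every combination of concrete argument types and analyzes a separate monomorphic "template" per combination. The helper names `result_type_of_add` and `analyze_call`, and the typing rule for `+`, are assumptions made for this illustration.

```python
from itertools import product

def result_type_of_add(left, right):
    """Toy typing rule for `a + b` on concrete types (an assumption of this sketch)."""
    if left is right:
        return left
    if {left, right} == {int, float}:
        return float
    return None  # no valid result: this combination would be a type error

def analyze_call(arg_type_sets):
    """Create one monomorphic template per combination of concrete argument types."""
    templates = {}
    for combo in product(*arg_type_sets):
        templates[combo] = result_type_of_add(*combo)
    return templates

# A call add(x, y) where x may be an int or a float, and y is always an int
templates = analyze_call([[int, float], [int]])
# The type of the call expression is the union of the template results
result_types = set(templates.values())
```

Each template sees only one concrete type per argument, which is what keeps the per-template analysis precise.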

Main Limitations/TODO's

-Windows support (I don't have Windows, sorry)
-reflection (getattr, hasattr), dynamic inheritance, eval, ..
-mixing scalars with pointer types (e.g. int and None in a single variable)
-mixing unrelated types in single container instance variable other
than tuple-2
-holding different types of objects in tuples with length >2;
builtin 'zip' can only take 2 arguments.
-exceptions, generators, nested functions, operator overloading
-recursive types (e.g. a = []; a.append(a))
-expect some problems when mixing floats and ints together
-varargs (*x) are not very well supported; keyword args are not supported yet
-arbitrary-size arithmetic
-possible non-termination ('recursive customization', have not
encountered it yet)
-profiling will be required for scaling to very large programs
-combining binary-type tuples with single-type tuples (e.g. (1,1.0)+(2,))
-unboxing of small tuples (should form a nice speedup)
-foreign code has to be modeled and implemented/bridged in C++
-some builtins are not implemented yet, e.g. 'reduce' and 'map'
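
To make one of these limitations concrete, here is a hedged sketch of the "int and None in a single variable" case. Both functions are hypothetical examples written for this note. Going by the list above, the first style would be rejected, while the second keeps the return value a plain int by using a sentinel.

```python
def find_dynamic(items, target):
    # Returns an int or None: the result mixes a scalar with a pointer type
    for i in range(len(items)):
        if items[i] == target:
            return i
    return None

def find_static(items, target):
    # Always returns an int; -1 is a sentinel meaning "not found"
    for i in range(len(items)):
        if items[i] == target:
            return i
    return -1
```

Callers of `find_static` test for `-1` instead of `None`, so every variable holding the result stays an unboxed integer.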
 
Paul Rubin

Mark Dufour said:
After nine months of hard work, I am proud to introduce my baby to the
world: an experimental Python-to-C++ compiler.

Wow, looks really cool. But why that instead of PyPy?
 
Carl Friedrich Bolz

Hi!

adDoc's networker Phil said:
. pypy compiles to llvm (low-level virtual machine) bytecode
which is obviously not as fast as the native code coming from c++ compilers;

PyPy can currently compile Python code to C code and to LLVM bytecode.
Note that even for LLVM bytecode the argument is void since LLVM
(despite its name, which might lead one to think that it is Java-like)
compiles its bytecode to native assembler.

but the primary mission of pypy
is just having a python system that is
written in something like python rather than c or c++

it's really just plain python (it completely runs on top of CPython
after all) together with some restrictions -- which seem similar to the
restrictions that shedskin imposes btw.

. there is no reason why the pypy project can't have a .NET architecture
instead of the java-like arrangement I assume it has now

Sorry, I can't really follow you here. In what way does PyPy have a
Java-like arrangement?

. without such a pypy.NET system,
shedskin is offering a service that pypy can't yet provide:
a ( python -> c++ )-conversion allows me to
smoothly integrate python contributions
with my already-staggering c++ library
. I'm not suggesting that pypy should be another
Mono rewritten in python,
because the essential mission of the .NET architecture
is being able to compile
any language of the user`s choice,
to some intermediate language designed to be
far more efficiently compiled to
any machine language of the user`s choice
than any human-readable language such as c++
. perhaps llvm bytecode can serve as such an intermediate language?
then llvm could be the new c++ (our defacto IL (intermediate language))
and shedskin (python -> IL=c++) could then be replaced by
the combination of pypy (python -> IL=llvm)
and some incentive for all target platforms
to develop a highly optimized
( llvm -> native code)-compiler
-- assuming also, that there is available
a highly optimized ( c++ -> llvm bytecode )-compiler .

there is. look at the LLVM page for details: www.llvm.org


Cheers,

Carl Friedrich Bolz
 
Paul Boddie

Carl said:
Sorry, I can't really follow you here. In what way does PyPy have a
Java-like arrangement?

I imagine that this remark was made in reference to the just-in-time
compilation techniques that PyPy may end up using, although I was under
the impression that most CLR implementations also use such techniques
(and it is possible to compile Java to native code as gcj proves).

But on the subject of LLVM: although it seems like a very interesting
and versatile piece of software, it also seems to be fairly difficult
to build; my last attempt made the old-style gcc bootstrapping process
seem like double-clicking on setup.exe. Does this not worry the PyPy
team, or did I overlook some easier approach? (Noting that a Debian
package exists for LLVM 1.4 but not 1.5.)

Paul
 
Michael Sparks

Mark said:
With this initial release, I hope to attract other people to help me
locate remaining problems,

Well, you did say you want help with locating problems. One problem with
this is it doesn't build...


If I try and build (following your instructions), I get presented with a
whole slew of build errors - knock-on errors from the first few:

In file included from builtin_.cpp:1:
builtin_.hpp:4:29: gc/gc_allocator.h: No such file or directory
builtin_.hpp:5:23: gc/gc_cpp.h: No such file or directory
In file included from builtin_.cpp:1:
builtin_.hpp:89: error: syntax error before `{' token
builtin_.hpp:93: error: virtual outside class declaration

Which C++ libraries are you dependent on? (Stating this would be really
useful, along with specific versions and if possible where you got them :)

For reference, I'm building this on SuSE 9.3, under which I also have
boehm-gc-3.3.5-5 installed. I suspect you're using the same gc library
(having downloaded libgc from sourceforge and finding the includes don't
match the above include names) but a different version. For reference this
version/distribution of boehm-gc has the following file structure:

/usr/include/gc.h
/usr/include/gc_backptr.h
/usr/include/gc_config_macros.h
/usr/include/gc_cpp.h
/usr/include/gc_local_alloc.h
/usr/include/gc_pthread_redirects.h
/usr/lib/libgc.a
/usr/lib/libgc.la
/usr/lib/libgc.so
/usr/lib/libgc.so.1
/usr/lib/libgc.so.1.0.1

It's specifically the gc_cpp.h file that makes me suspect it's the same gc.

Regards,


Michael.
 
Carl Friedrich Bolz

Hi Paul!

Paul said:
I imagine that this remark was made in reference to the just-in-time
compilation techniques that PyPy may end up using, although I was under
the impression that most CLR implementations also use such techniques
(and it is possible to compile Java to native code as gcj proves).

Well, PyPy is still quite far from having a JIT built in. Plus the
JIT-techniques will probably differ quite a bit from Java _and_ the CLR :).

But on the subject of LLVM: although it seems like a very interesting
and versatile piece of software, it also seems to be fairly difficult
to build; my last attempt made the old-style gcc bootstrapping process
seem like double-clicking on setup.exe. Does this not worry the PyPy
team, or did I overlook some easier approach? (Noting that a Debian
package exists for LLVM 1.4 but not 1.5.)

We are not that worried about this since

a) building LLVM is not _that_ bad (you don't need to build the
C-frontend, which is the really messy part) and

b) the LLVM-backend is one of the more experimental backends we have
anyway (in fact, we have discovered some bugs in LLVM with PyPy
already). Since the C backend is quite stable we are not dependent
solely on LLVM so this is not too big a problem. Note that this doesn't
mean that the LLVM backend is not important: it's the only other backend
(apart from the C one) that can successfully translate the whole PyPy
interpreter.

Cheers,

Carl Friedrich
 
Paul Boddie

Michael said:
Well, you did say you want help with locating problems. One problem with
this is it doesn't build...

I found that I needed both the libgc and libgc-dev packages for my
Kubuntu distribution - installing them fixed the include issues that
you observed - and it does appear to be the Boehm-Demers-Weiser GC
library, yes. The only other issue I observed was the importing of the
profile and pstats modules which don't exist on my system, but those
imports seemed to be redundant and could be commented out anyway.

Paul
 
A.B., Khalid

Mark said:
After nine months of hard work, I am proud to introduce my baby to the
world: an experimental Python-to-C++ compiler.

Good work.

I have good news and bad news.

First the good: ShedSkin (SS) more or less works on Windows. After
patching gc6.5 for MinGW, building it, and testing it on WinXP with
some success, and after patching my local copy of SS, I can get
test.py to compile from Python to C++, and it seems that I can get
almost all the unit tests in unit.py to pass.

Here is what I used:


1. shedskin-0.0.1

2. pyMinGW patched and MinGW compiled Python 2.4.1 from CVS:
Python 2.4.1+ (#65, Aug 31 2005, 22:34:14)
[GCC 3.4.4 (mingw special)] on win32
Type "help", "copyright", "credits" or "license" for more information.

3. MinGW 3.4.4:
g++ -v
Reading specs from
e:/UTILIT~1/PROGRA~1/MINGW/BIN/../lib/gcc/mingw32/3.4.4/specs
Configured with: ../gcc/configure --with-gcc --with-gnu-ld
--with-gnu-as --host=mingw32 --target=mingw32 --prefix=/mingw
--enable-threads --disable-nls
--enable-languages=c,c++,f77,ada,objc,java --disable-win32-registry
--disable-shared --enable-sjlj-exceptions --enable-libgcj
--disable-java-awt --without-x --enable-java-gc=boehm
--disable-libgcj-debug --enable-interpreter
--enable-hash-synchronization --enable-libstdcxx-debug
Thread model: win32
gcc version 3.4.4 (mingw special)


4. Also using:
- mingw-runtime 3.8
- w32api-3.3
- binutils-2.16.91-20050827-1
- gc6.5 (Boehm GC) locally patched



Now the bad news. Four tests in unit.py fail; brief output is as
follows[1].

[SKIP 19532 lines]
*** tests failed: 4
[(60, '__class__ and __name__ attributes'), (85, 'ifa: mixing strings
and lists of strings in the same list'), (122, 'neural network
simulator XXX later: recursive customization, plus some small fixes'),
(124, 'small factorization program by Rohit Krishna Kumar')]


Moreover, since the GC system you used only works in "recent
versions of Windows", this solution will not work in all versions. I
tested it on Win98, and both the GC tests and SS's unit.py tests crash,
although SS still seems able to compile the tests to C++.

At any rate, if anyone is interested in the patches they can be
downloaded from [2].


Regards,
Khalid


[1] The entire output of unit.py can also be found at [2]
[2] http://jove.prohosting.com/iwave/ipython/Patches.html
 
Michael Sparks

Paul said:
I found that I needed both the libgc and libgc-dev packages for my
Kubuntu distribution - installing them fixed the include issues that
you observed - and it does appear to be the Boehm-Demers-Weiser GC
library, yes. The only other issue I observed was the importing of the
profile and pstats modules which don't exist on my system, but those
imports seemed to be redundant and could be commented out anyway.

Mark has also let me know this. Part of the problem is that the version of
the GC shipped with SuSE 9.3 is ancient; it should be version 6.5 or later.
Also, people compiling the GC from source should (at minimum) use a
configure line along the lines of:

./configure --enable-cplusplus

If you don't, you get build problems because one of the needed libraries
isn't built by default.

I started off with the obvious "hello world" type program:

----------------------------------------
print "GAME OVER"
----------------------------------------

Which compiled cleanly and worked as expected. I then read Mark's short
paper linked from his blog "Efficient Implementation of Modern Imperative
Languages; Application to Python", and got concerned by the comments:

"""We have decided not to investigate two types of features: [...snip...];
and those features that may be turned off without affecting correct
programs, e.g. array bounds checking, and exceptions"""

That set some alarm bells ringing, largely because LBYL is being deprecated
by many people in favour of exception-based code (which, more to the point,
is widely used as a result).

As a result, I tried a trivial, but obvious program that should have clear
behaviour:

----------------------------------------
x = []
print "GAME OVER"

x.append(5)
print x[0]
try:
    print x[1]
    print "This should never be seen..."
except IndexError:
    print "It's OK, we caught it..."
----------------------------------------

This compiles, but unfortunately has the following behaviour:

GAME OVER
5
0
This should never be seen...
It's OK, we caught it...

Obviously, neither the 0 nor the message following it should have been
displayed. It's a pity that this assumption was made, but given the short
time the project's been going I can understand it. Hopefully Mark will
continue towards greater Python compliance :)
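
For readers unfamiliar with the LBYL/exceptions distinction, a minimal sketch of the two styles follows. The helper names `get_lbyl` and `get_eafp` are made up for this illustration. In CPython both return the default for an out-of-range index; going by the output above, only the first style would be expected to behave the same under a compiler that omits bounds checking, though that has not been verified here.

```python
def get_lbyl(items, index, default):
    # Look before you leap: test the bounds explicitly, no exception needed
    if 0 <= index < len(items):
        return items[index]
    return default

def get_eafp(items, index, default):
    # Exception-based style: rely on IndexError being raised
    try:
        return items[index]
    except IndexError:
        return default

x = [5]
in_range = get_lbyl(x, 0, -1)
out_of_range = get_lbyl(x, 1, -1)
```

Note that the two helpers are not fully equivalent: `get_eafp` accepts negative indices as Python normally does, while `get_lbyl` treats them as out of range.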



Michael.
 
Paul Boddie

Carl said:
a) building LLVM is not _that_ bad (you don't need to build the
C-frontend, which is the really messy part)

That piece of wisdom must have passed me by last time, when I probably
heeded the scary warning from the configure script and made the mistake
of getting the C front end. This time, the build process was virtually
effortless, and I'll now have to investigate LLVM further.

Thanks for the tip!

Paul
 
beliavsky

Good work.

I have good news and bad news.

First the good: ShedSkin (SS) more or less works on Windows. After
patching gc6.5 for MinGW, building it, and testing it on WinXP with
some success, and after patching my local copy of SS, I can get
test.py to compile from Python to C++, and it seems that I can get
almost all the unit tests in unit.py to pass.

I am reluctant to attempt an arduous installation on Windows, but if
Mr. Dufour or someone else could create a web site that would let you
paste in Python code and see a C++ translation, I think this would
expand the user base. Alternatively, a Windows executable would be
nice.
 
Luis M. Gonzalez

This is great news. Congratulations!

By the way, I read in your blog that you would be releasing a Windows
installer soon.
Have you, or anyone else, managed to do it?

Cheers,
Luis
 
Mark Dufour

By the way, I read in your blog that you would be releasing a Windows
installer soon.
Have you, or anyone else, managed to do it?

I just finished making a 20 MB (!) package for Windows XP (I'm not
sure which older versions of Windows it will run on.) It includes the
Boehm garbage collector and a C++ compiler (MinGW), which hopefully
will make it really easy to create executables. However, I'm not
releasing it until somebody with XP can test it for me :) If you'd
like to try what I have so far, please download
http://kascade.org/shedskin-0.0.2.zip, unzip it and follow some simple
steps in the README file. I would really like to know about anything
that doesn't work, or is unclear!

BTW, I also fixed all OSX problems, but I'm waiting for a friend to
give it a final test.

What kind of program would you like to compile?


thanks!
mark.
 
A.B., Khalid

Mark said:
I just finished making a 20 MB (!) package for Windows XP (I'm not
sure which older versions of Windows it will run on.) It includes the
Boehm garbage collector and a C++ compiler (MinGW), which hopefully
will make it really easy to create executables. However, I'm not
releasing it until somebody with XP can test it for me :) If you'd
like to try what I have so far, please download
http://kascade.org/shedskin-0.0.2.zip, unzip it and follow some simple
steps in the README file. I would really like to know about anything
that doesn't work, or is unclear!

BTW, I also fixed all OSX problems, but I'm waiting for a friend to
give it a final test.

What kind of program would you like to compile?


thanks!
mark.


Here is the very end of a very long output of unit.py run in Python
2.4.1 on WinXP Pro SP2:

[generating c++ code..]
*** compiling & running..
rm test.o test.exe
g++ -O3 -IG:/Downloads/Temp/ss2/shedskin -c test.cpp
g++ -O3 -IG:/Downloads/Temp/ss2/shedskin test.o
G:/Downloads/Temp/ss2/shedskin/libss.a -lgc -o test
output:
[3, 3, 3, 1097, 70201]

*** success: small factorization program by Rohit Krishna Kumar 124
*** no failures, yay!


:)

Well done. So what was causing that crash in test '__class__ and
__name__ attributes' after all?

I'll also try to test it on Win98.

Regards,
Khalid
 
Mark Dufour

*** success: small factorization program by Rohit Krishna Kumar 124
*** no failures, yay!


:)

Well done. So what was causing that crash in test '__class__ and
__name__ attributes' after all?

Well, I did something like this:

class_ c(..);
class_ *cp = &c;

class list {
    list() {
        this->class = cp;
    }
};

constant_list = new list(..);

Now, depending on the order of things, I think this->class became
somewhat undefined. In any case, putting all initializations in the
right order in an initialization function called from main() fixed the
problem.

The problem with test 85 was that it should not actually be passed to
g++, since it is indeed incorrect code :) However, because of some
bug in unit.py, on sys.platform 'win32' g++ would always be called.

Thanks again for your help. Your Makefile made it very easy for me to
create a Windows package. I'm glad to learn about MinGW too.. very
nice.

I'll also try to test it on Win98.

I think you said the GC wouldn't work in this case..? Anyway, I've had
my share of Windows for a while.. I would be really glad if somebody
else could look into this.. :)

Have you tried compiling any code of your own yet..?


thanks!
mark.
 
