Shed Skin Python-to-C++ Compiler 0.0.21, Help needed

Mark Dufour

Hi all,

I have recently released versions 0.0.20 and 0.0.21 of Shed Skin, an
optimizing Python-to-C++ compiler. Shed Skin allows for translation of
pure (unmodified), implicitly statically typed Python programs into
optimized C++, and hence, highly optimized machine language. Besides
many bug fixes and optimizations, these releases add the following
changes:

-support for 'bisect', 'collections.deque' and 'string.maketrans'
-improved 'copy' support
-support for 'try, else' construction
-improved error checking for dynamic types
-printing of floats is now much closer to CPython
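
To give a rough idea of what "implicitly statically typed" means here, the following is a minimal sketch and is not taken from the Shed Skin test set. It uses 'bisect', 'collections.deque' and a 'try, else' construction, and every variable keeps a single type for its whole lifetime. The function names are made up for illustration, and whether a particular Shed Skin release compiles this exact code has not been checked.

```python
from bisect import insort
from collections import deque


def running_medians(values, window):
    """Return the median of each full sliding window over `values`."""
    recent = deque()   # only ever holds ints, in arrival order
    ordered = []       # the same ints, kept sorted
    medians = []       # only ever holds floats
    for v in values:
        recent.append(v)
        insort(ordered, v)
        if len(recent) > window:
            # drop the oldest value from both containers
            ordered.remove(recent.popleft())
        if len(recent) == window:
            mid = window // 2
            if window % 2:
                medians.append(float(ordered[mid]))
            else:
                medians.append((ordered[mid - 1] + ordered[mid]) / 2.0)
    return medians


def parse_int(text, default):
    """The else branch runs only when int() raised no exception."""
    try:
        n = int(text)
    except ValueError:
        result = default
    else:
        result = n
    return result
```

Nothing is declared, but no name is ever rebound to a value of a different type, which is the property a static compiler needs.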

For more details about Shed Skin and a collection of 27 programs, at a
total of about 7,000 lines, that it can compile (resulting in an
average speedup of about 39 times over CPython and 11 times over Psyco
on my computer), please visit the homepage at:

http://mark.dufour.googlepages.com

I could really use more help in pushing Shed Skin further. Simple ways
to help out, but that can save me lots of time, are to find smallish
code fragments that Shed Skin currently breaks on, and to help
improve/optimize the (C++) builtins and core libraries. I'm also
hoping someone else would like to deal with integration with CPython
(so Shed Skin can generate extension modules, and it becomes easier to
use 'arbitrary' external CPython modules such as 're' and 'pygame'.)
Finally, there may be some interesting Master's thesis subjects in
improving Shed Skin, such as transforming heap allocation into stack-
and static preallocation, where possible, to bring performance even
closer to manual C++. Please let me know if you are interested in
helping out, and/or join the Shed Skin mailing list.


Thanks!
Mark Dufour.
 
Bjoern Schliessmann

Mark said:
Shed Skin allows for translation of pure (unmodified), implicitly
statically typed Python programs into optimized C++, and hence,
^^^^^
highly optimized machine language.
^^^^^^^^^^^^^^^^

Wow, I bet all C++ compiler manufacturers would want you to work for
them.

Regards,


Björn
 
skip

Bjoern> ^^^^^^^^^^^^^^^^

Bjoern> Wow, I bet all C++ compiler manufacturers would want you to work
Bjoern> for them.

Why are you taking potshots at Mark? He's maybe onto something and he's
asking for help. If he can generate efficient C++ code from implicitly
statically typed Python it stands to reason that he can take advantage of the
compiler's optimization facilities.

Skip
 
Bjoern Schliessmann

Why are you taking potshots at Mark?

What suggests that I'm "taking potshots" at Mark?

He's maybe onto something and he's asking for help. If he can
generate efficient C++ code from implicitly statically typed Python
it stands to reason that he can take advantage of the compiler's
optimization facilities.

Yes, compilers do output optimized machine code. But generally
calling that code "highly optimized" is, IMHO, exaggeration.

Regards,


Björn
 
Luis M. González

highly optimized machine language.
^^^^^^^^^^^^^^^^

Wow, I bet all C++ compiler manufacturers would want you to work for
them.

Regards,

Björn


Mark has been doing an heroic job so far.
Shedskin is an impressive piece of software and, if pypy hadn't been
started some time ago, it should have gotten more attention from the
community.
I think he should be taken very seriously.

He is the first programmer I know who actually released working
code (and a lot of it) of a project that actually manages to speed up
python by a large margin, by means of advanced type inference
techniques.
Other people, in the past, have attended conferences and made
spectacular announcements of projects that could speed up python by
60x or more, but never ever released any code.

Mark has been working quietly for a long time, and his work deserves
a lot of credit (and hopefully, some help).
 
Alexander Schmolck

Luis M. González said:
Mark has been doing an heroic job so far.
Shedskin is an impressive piece of software and, if pypy hadn't been
started some time ago, it should have gotten more attention from the
community.

Regardless of its merits, it's GPL'ed, which I assume is an immediate turn-off
for many in the community.

'as
 
Paul Boddie

Alexander said:
Regardless of its merits, it's GPL'ed, which I assume is an immediate turn-off
for many in the community.

In the way that tools such as gcc are GPL-licensed, or do you have
something else in mind?

Paul
 
Paul McGuire

Regardless of its merits, it's GPL'ed, which I assume is an immediate turn-off
for many in the community.

Why would that be? GPL'ed code libraries can be a turn-off for those
who want to release commercial products using them, but a GPL'ed
utility such as a compiler bears no relationship or encumbrance on the
compiled object code it generates.

-- Paul
 
Paul Rubin

Paul McGuire said:
Why would that be? GPL'ed code libraries can be a turn-off for those
who want to release commercial products using them, but a GPL'ed
utility such as a compiler bears no relationship or encumbrance on the
compiled object code it generates.

For some of us, doing volunteer work on non-GPL projects is a
turn-off. I don't mind writing code that goes into proprietary
products, but I expect to get paid for it just like the vendors of the
products expect to get paid. If I'm working for free I expect the
code to stay free. This is why I don't contribute code to Python on
any scale.
 
Bjoern Schliessmann

Luis said:
I think he should be taken very seriously.

Agreed.

Okay, it seems focusing a discussion on one single point is
difficult for many people. Next time I'll be mind-bogglingly clear
that even the last one understands after reading it one time ...

Regards,


Björn

Fup2 p
 
Luis M. González

Agreed.

Okay, it seems focusing a discussion on one single point is
difficult for many people. Next time I'll be mind-bogglingly clear
that even the last one understands after reading it one time ...

Regards,

Björn

Fup2 p


Bjoern,

I understood what you said. It's just that it seemed that you were
mocking the poster's message.
I apologize if that wasn't your intention.

Luis
 
Dennis Lee Bieber

Why would that be? GPL'ed code libraries can be a turn-off for those
who want to release commercial products using them, but a GPL'ed
utility such as a compiler bears no relationship or encumbrance on the
compiled object code it generates.

Take that up with ACT... GNAT 3.15p was explicitly unencumbered, but
the current version of GNAT, in the GPL (no-service contract) form has
gone the other direction, claiming that executables must be released
GPL.
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
Michael Torrie

Take that up with ACT... GNAT 3.15p was explicitly unencumbered, but
the current version of GNAT, in the GPL (no-service contract) form has
gone the other direction, claiming that executables must be released
GPL.

The no-service contract version of the GPL is not the same as the
standard GPLv2. Ordinarily the GPLv2 does not apply to the output of
the program unless the license specifies that it does (a modification or
addendum). Thus the output of a program is not usually covered by the
GPL unless that is specified. MySQL's take on the GPLv2 without an
addendum is mistaken, in my opinion. However, copyright law probably
still applies to the program's output regardless of license, but in what
way I don't think the courts have ever specified, given that the output
depends largely on the input. GCC, Bison, and Flex all explicitly state
that the output of the program is not under any license, and is your own
property. Perhaps the author of Shed Skin could make a note in the
license file to clarify the state of the output of his program.

There should be no problem with this Shed Skin program being under the
GPL and using it with python scripts that are not under the GPL. But if
you have any concern with a copyright license at all, you should consult
your lawyer. Too many companies see GPL'd programs as a free ride, not
willing to accept that they need a copyright license to use the code
just as they would with any code from any source. It's sad to see
because free software gets an unfair bad rap because of the greed of
others. On the other hand, others take an overly paranoid view of the
GPL and pretend it is viral and somehow magically infects your code with
the GPL license, which is false--if you use GPL'd code in your non GPL'd
application then you are in a copyright violation situation and your
only options are to either GPL your code or remove the offending GPL'd
source from your code and write your own dang code, thank you very much.
 
Paul Rubin

Michael Torrie said:
The no-service contract version of the GPL is not the same as the
standard GPLv2.

I don't see how that can be--we're talking about a GCC-based compiler,
right?
 
Michael Torrie

I don't see how that can be--we're talking about a GCC-based compiler,
right?

Well, that's beside the point anyway. The output of a program is beyond
the scope of the source code license for the program. However, the
default is for the output to be copyrighted by the author. Thus the author
of a program is free to say (give license, in other words) that the
output of a program can be distributed. The real point is that the Shed Skin
author can both license the program under the GPLv2 and also say that
the output from his program is not bound by any license. There's no
conflict unless the author of Shed Skin wants there to be. Worst case,
if indeed the GPLv2 says it covers the output of the program (which I
don't believe it does), copyright law still trumps everything and the
author is free to add an exemption to the license if he chooses, which
is what I've seen done with Bison. Bison is also a special case because
the output of bison contains code fragments that are part of the bison
source code itself, which is under the GPL. Thus a special exception
had to be made in this case.

Anyway, the only real point is that if there is a concern about the
copyright and licensing of the output of Shed Skin, then we merely need
to ask the author of it to clarify matters and move on with life. With
the exception of GNAT, to date no GPL'd compiler has ever placed a GPL
restriction on its output. Whether this is explicit or implicit doesn't
matter, so long as it's there.
 
Michael Torrie

I don't see how that can be--we're talking about a GCC-based compiler,
right?

I found the real reason why the GPL'd GNAT compiler's produced
executables are required to be GPL'd, and it has nothing to do with the
license of the compiler:

"What is the license of the GNAT GPL Edition?
Everything (tools, runtime, libraries) in the GNAT GPL Edition is
licensed under the General Public License (GPL). This ensures that
executables generated by the GNAT GPL Edition are Free Software and that
source code is made available with the executables, giving the freedom
to recipients to run, study, modify, adapt, and redistribute sources and
executables under the terms of the GPL."[1]

Note that it says the runtime *and* the libraries are GPL. Thus the
linking clause in the GPL requires that programs that link against them
(the executable in other words) must be GPL'd. Note that glibc, by
contrast, is under the LGPL rather than the GPL, which allows linking to
it by code of any license.

Hence it's a red herring as far as the discussion and Shed Skin are
concerned, although the licensing of any Shed Skin runtime libraries
should be a concern to folks.

[1] https://libre.adacore.com/
 
John Nagle

Mark said:
Hi all,

I have recently released version 0.0.20 and 0.0.21 of Shed Skin, an
optimizing Python-to-C++ compiler. Shed Skin allows for translation of
pure (unmodified), implicitly statically typed Python programs into
optimized C++, and hence, highly optimized machine language. Besides
many bug fixes and optimizations, these releases add the following
changes:
[...]
I'm also
hoping someone else would like to deal with integration with CPython
(so Shed Skin can generate extension modules, and it becomes easier to
use 'arbitrary' external CPython modules such as 're' and 'pygame'.)

Reusing precompiled external modules will be tough. Even
CPython has trouble with that. But that's just a conversion
problem. Maybe SWIG (yuck, but it exists) could be persuaded
to cooperate.

For regular expressions, here's an implementation, in C++,
of Python-like regular expressions.

http://linuxgazette.net/issue27/mueller.html

That might be a way to get a regular expression capability into
Shed Skin quickly.

Finally, there may be some interesting Master's thesis subjects in
improving Shed Skin, such as transforming heap allocation into stack-
and static preallocation, where possible, to bring performance even
closer to manual C++. Please let me know if you are interested in
helping out, and/or join the Shed Skin mailing list.

Find out where the time is going before spending it on that.
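
One way to find out is to profile first. The following is only a sketch using the standard `cProfile` and `pstats` modules on the Python side; `slow_part`, `fast_part`, `workload` and `profile_report` are made-up names for illustration, and a real measurement would target a program that Shed Skin actually compiles, or its C++ output under a native profiler.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Deliberately loop-heavy, so it dominates the profile.
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_part(n):
    return n * 2


def workload():
    return slow_part(200000) + fast_part(10)


def profile_report(func, limit=5):
    """Run func under the profiler; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_report(workload)
    print(report)
```

If allocation does not show up near the top of a report like this, changing how objects are allocated is unlikely to help much.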

A good test: BeautifulSoup. Many people use it for parsing
web pages, and it's seriously compute-bound.

John Nagle
 
Kay Schluehr

Mark has been doing an heroic job so far.
Shedskin is an impressive piece of software and, if pypy hadn't been
started some time ago, it should have gotten more attention from the
community.
I think he should be taken very seriously.

Indeed. The only serious problem from an acceptance point of view is
that Mark tried to solve the more difficult problem first and got stuck
on it. He could have integrated a translator/compiler with CPython
early on, factored Python module code into compilable and
interpretable functions (which can be quite rudimentary at first),
added some automatically generated glue code, and *always had a
running system* with monotonic benefit for all Python code. Instead he
seems to have taken on an impossible task, namely translating the whole
of Python to C++, and thereby created a "lesser Python". I do think this
is now a well-identified anti-pattern, but nothing that can't be
repaired in this case, from what I understand. However, speaking for
myself, I won't get my hands dirty with C++ code unless I get paid
*well* for it. This is like duplicating my job in my spare time. No go.
Otherwise it wouldn't be a big deal to do what is necessary here, and
even to extend the system with a view to Py3K annotations or other
means of shipping typed Python code into the compiler.
 
mark.dufour

Anyway, the only real point is that if there is a concern about the
copyright and licensing of the output of ShedSkin, then we merely need
to ask the author of it to clarify matters and move on with life. With
the exception of GNAT, to date no GPL'd compiler has ever placed a GPL
restriction on its output. Whether this is explicit or implicit doesn't
matter, so long as it's there.

it's fine if people want to create non-GPL software with Shed Skin. it
is at least my intention to only have the compiler proper be GPL
(LICENSE states that the run-time libraries are BSD..)


mark dufour (Shed Skin author).
 
John Nagle

Kay said:
Indeed. The only serious problem from an acceptance point of view is
that Mark tried to solve the more difficult problem first and hung on
it. Instead of integrating a translator/compiler early with CPython,
doing some factorization of Python module code into compilable and
interpretable functions ( which can be quite rudimentary at first )
together with some automatically generated glue code and *always have
a running system* with monotone benefit for all Python code he seemed
to stem an impossible task, namely translating the whole Python to C++
and created therefore a "lesser Python".

Trying to incrementally convert an old interpreter into a compiler
is probably not going to work.

Otherwise it
wouldn't be a big deal to do what is necessary here and even extend
the system with perspective on Py3K annotations or other means to ship
typed Python code into the compiler.

Shed Skin may be demonstrating that "annotations" are unnecessary
cruft and need not be added to Python. Automatic type inference
may be sufficient to get good performance.

The Py3K annotation model is to some extent a repeat of the old
Visual Basic model. Visual Basic started as an interpreter with one
default type, which is now called Variant, and later added the usual types,
Integer, String, Boolean, etc., which were then manually declared.
That's where Py3K is going. Shed Skin may be able to do that job
automatically, which is a step forward and more compatible with
existing code. Doing more at compile time means doing less work
at run time, where it matters. This looks promising.
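
To make the contrast concrete, here is a small sketch. `dot` and `dot_annotated` are made-up names, the annotation syntax is the Py3K function-annotation form, and nothing here claims to show what Shed Skin actually infers for this code.

```python
def dot(xs, ys):
    # No declarations: the types of xs, ys and total follow from the call below.
    total = 0.0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


def dot_annotated(xs: list, ys: list) -> float:
    # Same body, with the types spelled out by hand as annotations.
    total = 0.0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


# The only call sites pass lists of floats, so every name has one type.
print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
print(dot_annotated([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Both versions carry the same type information. In the first it has to be recovered from the call sites, and in the second the programmer writes it down.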

John Nagle
 
