Hello,
I've been using Python for a while now and I love it. There is just one thing I cannot understand: there are compilers for languages like C and C++, so why is it impossible to create a compiler that can compile Python code to machine code?
My reasoning is as follows:
When GCC compiles a program written in C++, it simply takes that code and decides what instructions it means for the computer's hardware: what the CPU needs to do, what the memory needs to remember, and so on. If you can create this machine code from C++, then I would suspect it should also be possible to do this (without a C step in between) for programs written in Python.
Where is my reasoning wrong here? Is it because Python is dynamically typed? Does machine code always need to know whether a variable is an int or a float? And if so, couldn't you build a compiler that creates machine code which handles both ints and floats in case of doubt? Or is it actually possible, but so much work that nobody does it?
I googled around, and I *think* it is because of the dynamic typing, but I really don't understand why this would be an issue.
Any insights on this would be highly appreciated!
Sure, Python could be compiled into machine code. But what machine? Do
you mean the hardware inside one of the Pentium chips? Sorry, but
Intel doesn't expose those instructions to the public. Instead, they
wrote a microcode interpreter, embedded it inside the processor,
and the "machine language" documented as the Pentium
instruction set is what that interpreter handles. Good thing, too, as
the microcode machine language has changed radically over time; I'd
guess there have been at least a dozen major variants and a hundred
different sets of details.
So if we agree to ignore that interpreter, and consider the externally
exposed machine language, we can pick a subset of the various such
instruction sets, and make that our target.
Can Python be compiled directly into that instruction set? Sure, it
could. But would it be practical to write a compiler that went directly
to it, or is it simpler to target C, and use gcc?
Let's look at gcc. When you run it, does it look like it compiles C
directly to machine language? Nope. It has three phases (last I looked,
which was admittedly over 20 years ago). The final phase translates an
internal form of the program description into a particular machine
language. Even the mighty gcc doesn't do it in one step. Guess what:
that means other languages can use the same back end, and a given
language can use different back ends for different target machine
languages. (Incidentally, Microsoft's C compiler does the exact same
thing, and a few of my patents involve injecting code between the front
end and the back end.)
So now we have three choices: target the C language and use
all of gcc; target gcc's intermediate language and use only
its back end; or go straight to machine code (more on that
below). Unfortunately, that intermediate language isn't
portable between compilers, so you'd either have to write totally
separate Python compilers for each back end, skip that approach, or
abandon total portability.
Well, we could write a Python compiler that targets an "abstract
intermediate language," which in turn gets translated into each
supported compiler's intermediate language. But that gets remarkably
close to just targeting C in the first place.
So how hard would it be just to directly target one machine language?
Not too bad, if you didn't try to do any optimizations or adapt to the
quirks and missing features of the different implementations
of that machine language. But I expect what you got would be neither
smaller nor noticeably faster than the present system. Writing simple
optimizations that improve some things is easy; writing great
optimizers that are also reliable and correct is incredibly hard. I'd
expect that gcc has hundreds of man-years of effort in it.
Now, no matter which of these approaches you take, there are some
issues. The tricky part is not being flexible between int and float
(and long, which is not part of the Intel instruction set), but
between an unlimited set of possible meanings for each operation. Just
picking on a + b: each class that a and b might be instances of can
provide its own __add__ and/or __radd__ methods. All of those have to
be searched, one has to be picked, and the code has to branch there.
And that decision, in general, has to be made at runtime, not by the
compiler.
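To make that concrete, here's a small sketch (the Meters class is made up for illustration) of why the meaning of a + b can't be pinned down at compile time:

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Called for self + other; defer if we don't recognize other.
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        return NotImplemented

    def __radd__(self, other):
        # Called when the left operand's __add__ returned NotImplemented.
        if isinstance(other, (int, float)):
            return Meters(other + self.value)
        return NotImplemented

def add(a, b):
    # The compiler sees only "a + b"; what actually runs depends on the
    # runtime types, discovered by the interpreter's dispatch rules.
    return a + b

print(add(1, 2))                    # 3 (int addition)
print(add(Meters(1), Meters(2)).value)  # 3 (Meters.__add__)
print(add(3, Meters(4)).value)          # 7 (int gives up, Meters.__radd__ runs)
```

The same three characters, "a + b", resolve to three different code paths, and nothing in the source text tells the compiler which one it will be.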
So by default, the code ends up being a twisted set of four-way
indirections, calls to dict lookups, and finally a call to a function
that actually does an instruction or two of real work. Guess what: an
interpreter can store those details much more succinctly (code size),
and can run through those choices nearly as quickly. So we're back to
CPython.
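You can see how compactly the interpreter encodes all of that dispatch with the standard dis module: the whole runtime search for __add__/__radd__ hides behind a single bytecode instruction (the exact opcode name varies between CPython versions, e.g. BINARY_ADD before 3.11 and BINARY_OP after):

```python
import dis

def add(a, b):
    return a + b

# Disassemble the function: all the __add__/__radd__ searching
# is packed into one "binary add" instruction for the interpreter.
dis.dis(add)

opnames = [ins.opname for ins in dis.get_instructions(add)]
# Exactly one BINARY_* opcode does the add; the rest just move operands.
print(opnames)
```

A native-code compiler would have to expand that one compact instruction into all the type tests, dict lookups, and branches spelled out above, at every addition site.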
Could it be improved? Sure; that's why there are multiple projects
trying to improve the performance of the reference implementation. But
each project seems to get to the point where the early promise of a
dozen-fold improvement dwindles to a few times as fast, and not for
everything. There are lots of things that can be improved with static
analysis (so we're sure of the types of certain things) or with a
restricted language (so the developer gives us extra clues). But that
work is nothing compared to what it would take to re-implement the
equivalent of gcc's back ends.
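As a sketch of what those "extra clues" look like: if the developer promises the types up front, a compiler can in principle emit plain floating-point instructions instead of the full dispatch machinery. Ordinary CPython ignores these annotations at runtime, but type-aware compilers in the Cython/Numba family exploit exactly this kind of information:

```python
def dot(xs: list[float], ys: list[float]) -> float:
    # The annotations promise every element is a float, so a
    # type-aware compiler could use raw floating-point multiplies
    # and adds, with no per-element __mul__/__add__ lookup.
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

The restriction is the point: the moment xs might hold arbitrary objects, the compiler is back to emitting the general dispatch code.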
Java works roughly the same way as Python: it compiles to byte code
files, then interprets them. The interpreter is given the fancy name
"virtual machine" because it really is an instruction set, one that
could have been interpreted by Intel in their internal microcode. But
they have their own history to stay compatible with. Look at the Merced
and how it's taken the world by storm (NOT).
But Java is much stricter about its byte code files, so each function is
much closer to machine level. Nearly all of those Python indirections
are eliminated by the compiler (because Java is not as dynamic a
language), and Java VMs do JIT compiling. The latter is why they're
quick.