> Multithreaded compute-bound programs on multiple CPUs are
> how you get heavy number-crunching work done on multiprocessors.
In the scientific community, heavy CPU-bound tasks are parallelized
using MPI and/or written in Fortran 90/95 and parallelized by an
expensive vectorizing compiler.
> Of course, that's not something you use Python for, at least not
> until it gets a real compiler.
That is also not correct:
1. Using Python does not change the complexity of the algorithm. Big-O
is still the same, and Big-O is still the main determinant of
performance.
2. I value my own time more than extra CPU cycles (and so do those
who pay my salary). If "Python is too slow", it is less expensive to
compensate by using more CPUs than by using a less productive language
like Java or C++.
3. Only isolated bottlenecks really gain from being statically
compiled. These are usually very small parts of the program. They can
be identified with a profiler (intuition usually does not work very
well here) and rewritten in Pyrex, Fortran 95, C or assembly.
4. There are NumPy and SciPy, which can make Python fast enough for
most CPU-bound tasks.
http://www.scipy.org/PerformancePython
5. "Premature optimization is the root of all evil." (Donald Knuth)
6. Pyrex (the compiler you asked for) does actually exist.
C and Fortran compilers can produce efficient code because they know
the type of each variable. We do have a Python compiler that can do
the same thing. It is called 'Pyrex' and extends Python with static
types. Pyrex can therefore produce code that is just as efficient as
hand-tuned C (see the link above). One can take the poorly performing
Python code, add type declarations to the variables that Pyrex needs
to generate efficient code (not all variables need to be declared),
and leave the rest to the compiler. But this is only required for very
small portions of the code. Transforming time-critical Python code to
Pyrex is child's play. "First make it work, then make it fast."
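To make point 3 above concrete, here is a minimal sketch of finding the bottleneck with the profiler from the standard library (cProfile) before touching Pyrex or C. The functions slow_part, fast_part and main are made up for illustration; substitute your own entry point.

```python
import cProfile
import io
import pstats


def slow_part(n):
    """Hypothetical hot spot: a pure-Python double loop."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


def fast_part(n):
    """Hypothetical cheap step that intuition might wrongly suspect."""
    return sum(range(n))


def main():
    return slow_part(200) + fast_part(200)


def profile_report(func):
    """Run func under cProfile and return the report as text,
    sorted by cumulative time, limited to the top 10 entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(main))
```

Whatever ends up at the top of that report is the candidate for a rewrite; everything else can stay plain Python.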
At the University of Oslo, the HPC centre has been running Python
courses for its clients. Python does not perform any worse than C or
Fortran; one just has to know (1) how to use it, (2) when to use it,
and (3) when not to use it.
99% of benchmarks showing bad performance with Python are due to
programmers not understanding which operations are expensive in
interpreted languages, and trying to use Python as if it were C++. The
typical example would be code that uses a loop instead of the
built-in function 'map' or a vectorized array expression with NumPy.
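As an illustration of that typical example, here is the same computation (a sum of square roots, picked arbitrarily) written three ways: as an explicit loop, with the built-in 'map', and as a vectorized NumPy expression. The function names are my own, and the NumPy variant assumes NumPy is installed; it is simply skipped otherwise.

```python
import math

try:
    import numpy as np  # optional: only the vectorized variant needs it
except ImportError:
    np = None


def sum_sqrt_loop(values):
    """C++-style loop: every iteration goes through the interpreter."""
    total = 0.0
    for v in values:
        total += math.sqrt(v)
    return total


def sum_sqrt_map(values):
    """Same result, but the iteration happens inside the built-ins
    'map' and 'sum', calling math.sqrt directly."""
    return sum(map(math.sqrt, values))


def sum_sqrt_numpy(values):
    """Vectorized array expression: one call operates on the whole array."""
    if np is None:
        raise RuntimeError("NumPy is not installed")
    return float(np.sqrt(np.asarray(values, dtype=float)).sum())


if __name__ == "__main__":
    data = [float(i) for i in range(1, 100001)]
    print(sum_sqrt_loop(data))
    print(sum_sqrt_map(data))
    if np is not None:
        print(sum_sqrt_numpy(data))
```

I have deliberately left timings out; run the three variants through timeit on your own machine and data before drawing conclusions.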
> It's also the direction games are going.
I believe that is due to ignorance. Threads are designed to sit in an
idle, blocked state 99% of the time.
> The XBox 360 forced
> game developers to go that way, since it's a 3-CPU shared memory
> multiprocessor. That translates directly to multicore desktops
> and laptops.
MPI works on SMPs.
MPI does not use threads on SMPs because threads perform worse than
multiple processes.