Lua is faster than Fortran???

sturlamolden

A number like "1.5 times faster" is meaningless without a specific
application and/or code section in mind. I'm pretty sure there are cases
where they are much faster than that, and there are cases where the net
gain is zero (or -0.x or whatever).

Here is what they say:

Benchmark       CPython    Unladen    Change
2to3            25.13 s    24.87 s    1.01x faster
django           1.08 s     0.68 s    1.59x faster
html5lib        14.29 s    13.20 s    1.08x faster
nbody            0.51 s     0.28 s    1.84x faster
rietveld         0.75 s     0.55 s    1.37x faster
slowpickle       0.75 s     0.55 s    1.37x faster
slowspitfire     0.83 s     0.61 s    1.36x faster
slowunpickle     0.33 s     0.26 s    1.26x faster
spambayes        0.31 s     0.34 s    1.10x slower
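
As a rough check on how the "Change" column relates to the two timings, here is a minimal sketch. The helper name `change` is made up for illustration. Because it works from the rounded times printed in the table, the last digit can differ from the published figure (nbody comes out as 1.82x here rather than 1.84x).

```python
def change(cpython_s: float, unladen_s: float) -> str:
    """Format a timing pair the way the table's Change column does."""
    if unladen_s <= cpython_s:
        return f"{cpython_s / unladen_s:.2f}x faster"
    return f"{unladen_s / cpython_s:.2f}x slower"


# A few rows copied from the table: (CPython seconds, Unladen seconds).
timings = {
    "2to3": (25.13, 24.87),
    "django": (1.08, 0.68),
    "spambayes": (0.31, 0.34),
}

for name, (cpython_s, unladen_s) in timings.items():
    print(f"{name:10s} {change(cpython_s, unladen_s)}")
```

The ratio alone says nothing about how much wall-clock time is saved: 2to3 barely changes but dominates the total running time of the suite.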
 
Stefan Behnel

sturlamolden, 04.07.2010 19:10:
Here is what they say:

Benchmark       CPython    Unladen    Change
2to3            25.13 s    24.87 s    1.01x faster
django           1.08 s     0.68 s    1.59x faster
html5lib        14.29 s    13.20 s    1.08x faster
nbody            0.51 s     0.28 s    1.84x faster
rietveld         0.75 s     0.55 s    1.37x faster
slowpickle       0.75 s     0.55 s    1.37x faster
slowspitfire     0.83 s     0.61 s    1.36x faster
slowunpickle     0.33 s     0.26 s    1.26x faster
spambayes        0.31 s     0.34 s    1.10x slower

Ok, so, which of those do you care about?

Stefan
 
Paul Rubin

D'Arcy J.M. Cain said:
Is that really true about LUA? I haven't looked that closely at it but
that paragraph probably turned off most people on this list to LUA.

I would say Lua focuses on implementation compactness; it's intended as
an embedded scripting interpreter. It's easy to sandbox and uses just
50k or so of memory. It's running in a lot of mobile phones, cameras,
etc. The language itself is nowhere near as featureful as Python and I
wouldn't want to use it for large scale development, but it appears
pretty good for what it was intended for.

Interestingly, it doesn't have lists or arrays. Its only container
structure is comparable to a Python dictionary. Arrays are just the
special case of dictionaries indexed by numbers. There is a little bit
of syntax sugar to help with that, but it's just the dict structure
underneath.
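
To make the "arrays are just tables indexed by numbers" idea concrete, here is a small Python sketch that models it with a plain dict. It only mimics the semantics described above and says nothing about how the Lua implementation stores the data internally. The `ipairs` helper is a stand-in written for this example, named after Lua's function, which stops at the first missing index.

```python
# A Lua-style "array" modelled as a dict keyed by 1-based integers.
table = {}
for i, name in enumerate(["gcc", "LuaJIT", "OCaml"], start=1):
    table[i] = name

# The same container also accepts string keys, as a Lua table would.
table["source"] = "benchmarks"


def ipairs(t):
    """Yield (index, value) for keys 1, 2, 3, ... until the first missing key."""
    i = 1
    while i in t:
        yield i, t[i]
        i += 1


array_part = list(ipairs(table))
print(array_part)
```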

I wouldn't say it was all done by one person though, and in particular
I think LuaJIT was done by a different group than the main Lua developers.
 
Luis M. González

I was just looking at Debian's benchmarks. It seems LuaJIT is now (on
median) beating Intel Fortran!

C (gcc) is running the benchmarks faster by less than a factor of two.
Consider that Lua is a dynamically typed scripting language very
similar to Python.

LuaJIT also runs the benchmarks faster than Java 6 server, OCaml, and
SBCL.

I know it's "just a benchmark" but this has to count as insanely
impressive. Beating Intel Fortran with a dynamic scripting language,
how is that even possible? And what about all those arguments that
dynamic languages "have to be slow"?

If this keeps up we'll need a Python to Lua bytecode compiler very
soon. And LuaJIT 2 is rumoured to be much faster than the current...

Looking at median runtimes, here is what I got:

   gcc               1.10

   LuaJIT            1.96

   Java 6 -server    2.13
   Intel Fortran     2.18
   OCaml             3.41
   SBCL              3.66

   JavaScript V8     7.57

   PyPy             31.5
   CPython          64.6
   Perl             67.2
   Ruby 1.9         71.1

The only comfort for CPython is that Ruby and Perl did even worse.
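
For anyone who wants to put the medians above on a common scale, a quick sketch follows. The numbers are copied from the list; the helper `relative_to` is just a name picked for this example.

```python
medians = {
    "gcc": 1.10, "LuaJIT": 1.96, "Java 6 -server": 2.13,
    "Intel Fortran": 2.18, "OCaml": 3.41, "SBCL": 3.66,
    "JavaScript V8": 7.57, "PyPy": 31.5, "CPython": 64.6,
    "Perl": 67.2, "Ruby 1.9": 71.1,
}


def relative_to(baseline, data):
    """Express every median as a multiple of the baseline's median."""
    base = data[baseline]
    return {name: value / base for name, value in data.items()}


vs_gcc = relative_to("gcc", medians)
vs_luajit = relative_to("LuaJIT", medians)

for name in ("LuaJIT", "Intel Fortran", "JavaScript V8", "CPython"):
    print(f"{name:15s} {vs_gcc[name]:6.2f}x gcc   {vs_luajit[name]:6.2f}x LuaJIT")
```

On these figures LuaJIT is about 1.8 times slower than gcc, and CPython is roughly 33 times slower than LuaJIT.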

You should read this thread: http://lambda-the-ultimate.org/node/3851
There, you'll see this subject discussed and explained at length.
Pay special attention to Mike Pall's comments (he is the creator of
LuaJIT) and his opinion about Python and PyPy.
You will also read about other projects, especially new JavaScript
engines such as Mozilla's TraceMonkey (the authors participate in this
thread) and the PyPy folks.
It is a very good read for anyone interested in the subject. Highly
recommended!
Good luck!

Luis
 
Stefan Behnel

sturlamolden, 04.07.2010 21:44:
I have already said I don't care about unladen swallow.

What I meant, was: which of these benchmarks would have to be better to
make you care? Because your decision not to care seems to be based on
exactly these benchmarks.

Stefan
 
Luis M. González

You should read this thread: http://lambda-the-ultimate.org/node/3851
There, you'll see this subject discussed and explained at length.
Pay special attention to Mike Pall's comments (he is the creator of
LuaJIT) and his opinion about Python and PyPy.
You will also read about other projects, especially new JavaScript
engines such as Mozilla's TraceMonkey (the authors participate in this
thread) and the PyPy folks.
It is a very good read for anyone interested in the subject. Highly
recommended!
Good luck!

Luis

To be more specific, check these comments in the thread suggested
above:
http://lambda-the-ultimate.org/node/3851#comment-57804
http://lambda-the-ultimate.org/node/3851#comment-57700

Luis
 
John Nagle

It's embarrassing that Javascript is now 9x faster than Python.
Javascript has almost all the dynamic problems that make CPython
slow, but they've been overcome in the Javascript JIT compiler.

Here's how the Javascript V8 system does it:

http://code.google.com/apis/v8/design.html

They get rid of unnecessary dictionary lookups for attributes by
automatically creating "hidden classes" which, in Python terms, use
"slots". If an attribute is added to an object, another hidden class
is created with that attribute. Each such class is hard-compiled
to machine code while the program is running. So attribute access
never requires a dictionary lookup. Adding a new, not-previously-used
attribute is a relatively expensive operation, but with most programs,
after a while all the new attributes have been added and the program
settles down to efficient operation.
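
The hidden-class idea can be sketched in a few lines of Python. This is a toy model of the description above, not V8's actual implementation: the names `Shape` and `Obj` are invented here, and the slot index is still looked up in a dict, whereas a real JIT would bake it into the generated machine code.

```python
class Shape:
    """One hidden class: a fixed attribute -> slot-index layout."""

    def __init__(self, layout=()):
        self.layout = layout                      # tuple of attribute names
        self.index = {name: i for i, name in enumerate(layout)}
        self.transitions = {}                     # attribute name -> next Shape

    def with_attr(self, name):
        # Reuse the transition if another object already added this attribute.
        if name not in self.transitions:
            self.transitions[name] = Shape(self.layout + (name,))
        return self.transitions[name]


EMPTY = Shape()


class Obj:
    """An object whose attributes live in a flat slot list described by its Shape."""

    def __init__(self):
        self.shape = EMPTY
        self.slots = []

    def set(self, name, value):
        i = self.shape.index.get(name)
        if i is None:
            # New attribute: the relatively expensive path, switch hidden class.
            self.shape = self.shape.with_attr(name)
            self.slots.append(value)
        else:
            self.slots[i] = value

    def get(self, name):
        return self.slots[self.shape.index[name]]


p, q = Obj(), Obj()
for o in (p, q):
    o.set("x", 1)
    o.set("y", 2)

print(p.shape is q.shape, p.shape.layout)
```

Objects built the same way end up sharing one `Shape`, which is what lets the "program settles down" effect happen: once no new attributes appear, no new hidden classes are created.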

That's in Google's Chrome browser right now.

The Unladen Swallow people should in theory be able to reach
that level of performance. (Both groups are employed at Google.
So their effectiveness will be compared.)

John Nagle
 
Stephen Hansen

They have managed to combine list and dict into one type (table) that
does the job of both.

You say "managed" as if it were some technical accomplishment, or that
the result is able to actually do the job of both: neither of these
assertions is true.

Have you actually *used* Lua? I quite like the language in certain
contexts where it's appropriate, but if you've actually used it in real
code and found tables to be at all a substitute for *either*
dictionaries *or* lists, then I think you've somehow managed to
miss using either data structure in Python to any real extent.

Lua's tables are at best very weak "dictionary-like" and "list-like" objects,
which indeed have been folded into one. To the detriment of both, at
least as far as they are an actual useful data structure.

You can't even get the equivalent of len(dict) in it: you have to
actually brute-force iterate the table and count manually. Even for a
purely array-like table with only numbered indexes, #table can be
unreliable: it's quite possible that, through normal list-like operations
you perform on it, it ends up with holes where #table will fail.
That's because it is *not* a list internally at *all*, but simply an
associative array with numbered indexes.
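
The problem with holes can be illustrated with a small Python model. It follows the Lua 5.1 reference manual's definition of the length operator: any n such that t[n] is not nil and t[n+1] is nil is an acceptable answer, and 0 is acceptable when t[1] is nil. The function name `borders` and the use of a dict with absent keys standing in for nil are assumptions of this sketch.

```python
def borders(t):
    """Return every n that Lua 5.1 would accept as the length of table t.

    n qualifies if t[n] is present and t[n + 1] is absent; n == 0 also
    qualifies when t[1] is absent. Absent keys stand in for nil.
    """
    result = [0] if 1 not in t else []
    result += [
        k for k in t
        if isinstance(k, int) and k >= 1 and (k + 1) not in t
    ]
    return sorted(result)


dense = {1: "a", 2: "b", 3: "c"}
holey = {1: "a", 2: "b", 4: "d"}   # as if index 3 had been set to nil

print(borders(dense))
print(borders(holey))
print(sum(1 for _ in holey))       # counting entries means iterating them all
```

With a hole there is more than one valid answer, so code that relies on #table giving "the" length stops being predictable.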

I could go on, and on, and on: but the fundamental *weakness* of Lua's
data types as data-structures is irrefutable, from its tables to strings
to numbers: they are incredibly weak on capabilities. You end up writing
all kinds of "library" functions to just do the normal things that
should be really easy to do.

Now, of course, there's really good reason why Lua is so simple in these
ways. It's entirely suitable for Lua as an embedded scripting language to
keep things very light, so it can be simple and fast. Good for Lua to
fill this niche superbly. But you can't start saying its simple
alternatives are at all comparable to Python's extremely rich and
capable data types.

And yes there are tuples.

No, there aren't. There are ways to create a tuple-like thing which kind of
behaves like a braindead tuple, and functions have a positively bizarre
capability of returning more than one value (and accepting variable
values), so there are these points in the language where you have this
sort of Immutable Sequence, but it's opaque until you unwrap it -- and
then it ceases to be.

That's not the same thing as having an immutable sequence that you can
store data in at your discretion, with a rich series of capabilities
that you can leverage.

There are no classes, but there are closures and other building blocks
that can be used to create any object-oriented type system

Not really.

Above, I spoke of tables as data structures, things just storing data.
But you're right, they are capable of more than that -- but you're
over-selling just how far that capability goes, by a long shot (and
underselling just how much work it takes to get it there).

Yes, tables have a limited series of 'hooks' that you can tie into to
alter their behavior, and through this you can simulate certain higher
order type systems as defined in other languages. In fact, lots of
people have done this: there are multiple "classy" libraries out there to
bring some kind of Class-or-Object-Oriented-Programming to Lua.

They work to varying degrees: but the hooks that Lua provides to tables
are still significantly lacking to really replace Python's
comprehensively dynamic object model, without a LOT of wasted cycles.

For example, while you can use __newindex to 'catch' someone setting a
new 'key' on a table, and __index to replace or forward the actual
lookup, you can't actually capture someone trying to set a value to a
key which already exists. So, you end up having to do a full proxy
approach, where your table never actually stores anything directly
(except a reference to another hidden table), and so when someone goes
to set something you have to set it on the proxied object instead.
Because you can't let there ever be any real keys on the proxying /
external table.

So you're able to work around this lack a bit, to simulate something
/like/ what it means to have __setattr__.
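
Here is a toy Python model of that workaround, for readers who have not seen it. `LuaTable` is an invented class that only imitates the two hooks discussed above: both fire only when the key is absent from the table itself. The hidden storage is kept in a closure rather than under a key; everything here is an illustrative assumption, not Lua code.

```python
class LuaTable:
    """Toy model of a Lua table with __index / __newindex style hooks."""

    def __init__(self):
        self.raw = {}          # keys actually stored on this table
        self.newindex = None   # called on writes only if the key is absent
        self.index = None      # called on reads only if the key is absent

    def set(self, key, value):
        if key not in self.raw and self.newindex is not None:
            self.newindex(self, key, value)
        else:
            self.raw[key] = value

    def get(self, key):
        if key in self.raw:
            return self.raw[key]
        return self.index(self, key) if self.index is not None else None


def make_logged_proxy(log):
    """Keep the outer table empty so that every write reaches the hook."""
    hidden = {}                # the real storage, never stored on the proxy
    proxy = LuaTable()

    def on_write(table, key, value):
        log.append((key, value))
        hidden[key] = value

    proxy.newindex = on_write
    proxy.index = lambda table, key: hidden.get(key)
    return proxy


# Hook alone: the second write to "x" goes unnoticed.
seen = []
plain = LuaTable()


def store_and_log(table, key, value):
    seen.append((key, value))
    table.raw[key] = value     # a raw store, bypassing the hook


plain.newindex = store_and_log
plain.set("x", 1)
plain.set("x", 2)

# Proxy: both writes are seen, but the proxy itself holds no keys.
proxied_log = []
proxy = make_logged_proxy(proxied_log)
proxy.set("x", 1)
proxy.set("x", 2)

print(seen, proxied_log, proxy.raw)
```

The empty `proxy.raw` at the end is the price of the trick: anything that walks the table's own keys finds nothing.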

But then you run into problems. There's now no way to iterate over the
table, because pairs() will only return your internal hidden keys
that you're using to point to the proxied table (although you can get
around this with some more complexity by hiding said key in a
closure -- but even then, you still can't iterate over the proxied
table's keys instead).

So what do you do? Well you go replace the global "pairs" function,
that's what you do! So it learns your particular style of Classness, and
interacts well. Hope you never use any third-party code which has even a
vaguely different kind of Classness. Alternately, you can just decide to
never use the standard idiom of iteration for your classes, and instead
one must always "for x in my_table:iter()" -- so now you have one kind
of code operating on 'real' tables, and a totally different kind
operating on your 'classy tables'.

And on and on. The point is, sure. Lua can sort of simulate different
OOP approaches, and Lua-folk are very adept at tweaking, twisting,
turning and ripping tables into all kinds of pseudo-data types that
match other paradigms, but it's all hacks on top of hacks on top of
hacks. You end up with some *really* weird and/or messy code if you try.

Better to do Lua in Lua, instead of Python in Lua, or Java in Lua.

(just like
CLOS is defined by Lisp, not a part of the basic Lisp syntax). So I
imagine it would be possible to define an equivalent to the Python
type system in Lua, and compile Python to Lua. Lua can be compiled to
Lua byte code. Factoring Lua out, that means we should be able to
compile Python to Lua byte code.


--

Stephen Hansen
... Also: Ixokai
... Mail: me+list/python (AT) ixokai (DOT) io
... Blog: http://meh.ixokai.io/


 
Luis M. González

    The Unladen Swallow people should in theory be able to reach
that level of performance.  (Both groups are employed at Google.
So their effectiveness will be compared.)

                                John Nagle

No. Collin Winter said that they will never be as fast as Chrome's V8
or similar JS engines,
since they were created from scratch to be super fast above all else.
On the other hand, US is a project to enhance an existing interpreter,
carrying a lot of the burden of early design decisions.
 
sturlamolden

What I meant, was: which of these benchmarks would have to be better to
make you care? Because your decision not to care seems to be based on
exactly these benchmarks.

Those are the only ones I've seen.
 
Felix

Well, I wish I did not have to use C, then :) For example, as a
contributor to numpy, it bothers me at a fundamental level that so
much of numpy is in C.

This is something that I have been thinking about recently. Python has
won quite a following in the scientific computing area, probably
especially because of great libraries such as numpy, scipy, pytables
etc. But it also seems python itself is falling further and further
behind in terms of performance and parallel processing abilities. Of
course all that can be fixed by writing C modules (e.g. with the help
of cython), but that weakens the case for using python in the first
place.
For an outsider it does not look like a solution to the GIL mess or a
true breakthrough for performance is around the corner (even though
there seem to be many different attempts at working around these
problems or helping with parts). Am I wrong? If not, what is the
perspective? Do we need to move on to the next language and lose all
the great libraries that have been built around python?

Felix
 
Stefan Behnel

Felix, 09.07.2010 05:39:
This is something that I have been thinking about recently. Python has
won quite a following in the scientific computing area, probably
especially because of great libraries such as numpy, scipy, pytables
etc. But it also seems python itself is falling further and further
behind in terms of performance and parallel processing abilities.

Well, at least its "parallel processing abilities" are quite good actually.
If you have really large computations, they usually run on more than one
computer (not just more than one processor). So you can't really get around
using something like MPI, in which case an additional threading layer is
basically worthless, regardless of the language you use. For computations,
threading keeps being highly overrated.

WRT a single machine, you should note that GPGPUs are a lot faster these
days than even multi-core CPUs. And Python has pretty good support for
GPUs, too.

Of course all that can be fixed by writing C modules (e.g. with the help
of cython), but that weakens the case for using python in the first
place.

Not at all. Look at Sage, for example. It's attractive because it provides
tons of functionality, all nicely glued together through a simple language
that even non-programmers can use efficiently and effectively. And its use
of Cython makes all of this easily extensible without crossing the gap of a
language border.

Stefan
 
sturlamolden

This is something that I have been thinking about recently. Python has
won quite a following in the scientific computing area, probably
especially because of great libraries such as numpy, scipy, pytables
etc.

Python is much more friendly to memory than Matlab, and a much nicer
language to work in. It can also be used to program more than just
linear algebra. If you have to read data from a socket, Matlab is not
so fun anymore.

But it also seems python itself is falling further and further
behind in terms of performance and parallel processing abilities.

First, fine-grained parallelism really belongs in libraries like MKL,
GotoBLAS and FFTW. Python can manage the high-level routines just like
Matlab. You can call a NumPy routine like np.dot, and the BLAS library
(e.g. Intel MKL) will do the multi-threading for you. We almost always
use Python to orchestrate C and Fortran. We can use OpenMP in C or
Fortran, or we can just release the GIL and use Python threads.
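
The "release the GIL and use Python threads" option can be tried with nothing but the standard library. In this sketch `hashlib` stands in for a numerical C or Fortran routine, purely as an assumption for illustration: its documentation states that the GIL is released while hashing buffers larger than 2047 bytes, so the threads below can run on separate cores.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(block: bytes) -> str:
    # The C code behind sha256 releases the GIL for large buffers,
    # so several of these calls can overlap on a multi-core machine.
    return hashlib.sha256(block).hexdigest()


# Eight buffers of one million bytes each.
blocks = [bytes([i]) * 1_000_000 for i in range(8)]

with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, blocks))

serial = [digest(block) for block in blocks]
print(threaded == serial)
```

The Python code only hands out work and collects results; the heavy lifting happens in compiled code, which is the division of labour described above.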

Second, the GIL does not matter for MPI, as it works with
processes. Nor does it matter for os.fork or multiprocessing. On
clusters, which are as common in high-performance computing as SMP
systems, one has to use processes (usually MPI) rather than threads,
as there is no shared memory between processors. On SMP systems, MPI
can use shared-memory and be just as efficient as threads (OpenMP).
(MPI is usually faster due to cache problems with threads.)

Consider that Matlab does not even have threads (or did not last time
I checked). Yet it takes advantage of multi-core CPUs for numerical
computing. It's not the high-level interface that matters, it's the
low-level libraries. And Python is just that: a high-level "glue"
language.

For an outsider it does not look like a solution to the GIL mess or a
true breakthrough for performance is around the corner (even though
there seem to be many different attempts at working around these
problems or helping with parts). Am I wrong?

Yes you are.

We don't do CPU intensive work in "pure Python". We use Python to
control C and Fortran libraries. That gives us the opportunity to
multi-thread in C, release the GIL and multi-thread in Python, or
both.
 
sturlamolden

WRT a single machine, you should note that GPGPUs are a lot faster these
days than even multi-core CPUs. And Python has pretty good support for
GPUs, too.

With OpenCL, Python is better than C for heavy computing. The Python
or C/C++ program has to supply OpenCL code (structured text) to the
OpenCL driver, which does the real work on GPU or CPU. Python is much
better than C or C++ at processing text. There will soon be OpenCL
drivers for most processors on the market.
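
Since the kernel is handed to the driver as text, generating it is ordinary string processing. A minimal sketch follows, using only `string.Template`. The function `elementwise_kernel` and the kernel layout are made up for this example, and actually compiling and running the source through an OpenCL binding is not shown.

```python
from string import Template

KERNEL = Template("""\
__kernel void ${name}(__global const ${ctype} *a,
                      __global const ${ctype} *b,
                      __global ${ctype} *out)
{
    size_t i = get_global_id(0);
    out[i] = a[i] ${op} b[i];
}
""")


def elementwise_kernel(name: str, ctype: str, op: str) -> str:
    """Return OpenCL C source for an element-wise binary operation."""
    return KERNEL.substitute(name=name, ctype=ctype, op=op)


source = elementwise_kernel("vec_add", "float", "+")
print(source)
```

Specialising the same kernel for another type or operator is one more call, for example `elementwise_kernel("vec_mul", "double", "*")`.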

But OpenCL drivers will not be pre-installed on Windows, as Microsoft
has a competing COM-based technology (DirectX Compute, with an
atrocious API and syntax).
 
Felix

Yes you are.

We don't do CPU intensive work in "pure Python". We use Python to
control C and Fortran libraries. That gives us the opportunity to
multi-thread in C, release the GIL and multi-thread in Python, or
both.

Yes, this setup works very well and is (as I said) probably the reason
python is so widely used in scientific computing these days.
However I find that I can almost never do everything with vector
operations, but have to iterate over data structures at some point.
And here the combination of CPython slowness and the GIL means either
bad performance or having to write this in C (with which cython helps
fortunately). If it were possible to write simple, parallel,
reasonably fast loops in (some subset of) python directly that would
certainly be a great advantage. Given the performance of other JITs it
sounds like it should be possible, but maybe python is too complex to
make this realistic.

Felix

PS: No need to convince me that MATLAB is not the solution.
 
Felix

Well, at least its "parallel processing abilities" are quite good actually.
If you have really large computations, they usually run on more than one
computer (not just more than one processor). So you can't really get around
using something like MPI, in which case an additional threading layer is
basically worthless, regardless of the language you use. For computations,
threading keeps being highly overrated.

That is certainly true for large computations. But many smaller tasks
are run on single machines and it does make a difference if they take
1 minute per run or 10. The average number of cores per computer has
been increasing for quite a while now. It seems unfortunate to be
restricted to using only one of them at a time (for regular loops, not
mathematical vector operations). Python has made so many complicated
things easy, but I have not seen an easy way to parallelize a simple
loop on a multicore CPU without having to set up infrastructure and/or
incurring large overhead from many interpreters and marshalling data.
Just the fact that there is such a large number of attempts out there
to fix this suggests that something important is missing.
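
For reference, this is roughly what parallelizing a simple loop looks like with the standard library's process pool. It is a sketch under assumptions: the function names are invented, and the loop body (a sum of squares) is a placeholder. It also shows the costs mentioned above, since each worker is a separate interpreter and arguments and results are pickled between processes.

```python
from concurrent.futures import ProcessPoolExecutor


def chunk_ranges(n: int, chunks: int):
    """Split range(n) into at most `chunks` contiguous (lo, hi) pairs."""
    step = max(1, -(-n // chunks))            # ceiling division, at least 1
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]


def sum_squares(bounds):
    """The 'simple loop': pure-Python work on one chunk."""
    lo, hi = bounds
    total = 0
    for i in range(lo, hi):
        total += i * i
    return total


def parallel_sum_squares(n: int, workers: int = 4) -> int:
    # Each chunk runs in its own interpreter process, so no GIL is shared;
    # only a (lo, hi) pair and one integer result cross the process boundary.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, chunk_ranges(n, workers)))
```

In a script, call `parallel_sum_squares(1_000_000)` from under an `if __name__ == "__main__":` guard, which platforms that spawn fresh interpreters require. For small loops the process start-up cost can easily outweigh the gain.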
 
sturlamolden

PS: No need to convince me that MATLAB is not the solution.

What I mean is that Matlab and Mathematica are inherently "single
threaded" interpreters. Yet they are still used for serious parallel
computing. Python has multiple threads but a GIL; allowing only one
thread in the interpreter at all, as they do, is even more restrictive.
 
Terry Reedy

With OpenCL, Python is better than C for heavy computing. The Python
or C/C++ program has to supply OpenCL code (structured text) to the
OpenCL driver, which does the real work on GPU or CPU. Python is much
better than C or C++ at processing text. There will soon be OpenCL
drivers for most processors on the market.

For those as ignorant as me, OpenCL = Open Computing Language (for
parallel computing). Apple proposed, Khronos Group maintains spec (along
with OpenGL), AMD, NVIDIA, Intel support. Send C-like text to device,
as with OpenGL; device compiles and runs; mainly for number crunching
with all resources a machine has. OpenCL and OpenGL can work together.
There is already a Python binding:
http://sourceforge.net/projects/pyopencl/
 
geremy condra

For those as ignorant as me, OpenCL = Open Computing Language (for parallel
computing). Apple proposed, Khronos Group maintains spec (along with
OpenGL), AMD, NVIDIA, Intel support.  Send C-like text to device, as with
OpenGL; device compiles and runs; mainly for number crunching with all
resources a machine has. OpenCL and OpenGL can work together. There is
already a Python binding:
http://sourceforge.net/projects/pyopencl/

It's worth pointing out that right now you're generally better off with CUDA
than OpenCL, and that the PyCUDA bindings are usable, if not what I would
call easy-to-use.

Geremy Condra
 
