Wow, Python much faster than MatLab

Stef Mientki

hi All,

instead of questions,
my first success story:

I converted my first MatLab algorithm into Python (using SciPy),
and it not only works perfectly,
but also runs much faster:

MatLab: 14 msec
Python: 2 msec

After taking the first difficult steps into Python,
with all kinds of small problems, as you already know,
it now seems a piece of cake to convert from MatLab to Python.
(The final MatLab and Python programs can almost only be
distinguished by the comment character ;-)

I especially like:
- the more relaxed behavior when exceeding the upper limit of a (1-dimensional)
array
- many more functions being available, like a simple "mean"
- reduced datatypes where allowed (booleans of 1 byte)

thanks for all your help,
probably need some more in the future,
cheers,
Stef Mientki
 
Beliavsky

Stef said:
hi All,

instead of questions,
my first success story:

I converted my first MatLab algorithm into Python (using SciPy),
and it not only works perfectly,
but also runs much faster:

MatLab: 14 msec
Python: 2 msec

For times this small, I wonder if timing comparisons are valid. I do
NOT think SciPy is in general an order of magnitude faster than Matlab
for the tasks typically performed with Matlab.
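
A rough sketch of how a millisecond-scale timing could be made more trustworthy on the Python side: run the call many times, repeat the whole measurement, and keep the best run. The `algorithm` function below is a hypothetical stand-in (a moving-average filter), not Stef's actual routine, and the input size is an arbitrary assumption.

```python
import timeit

import numpy as np


def algorithm(x):
    # Hypothetical stand-in for the converted routine: a 5-point moving average.
    kernel = np.ones(5) / 5.0
    return np.convolve(x, kernel, mode="same")


x = np.random.default_rng(0).standard_normal(10_000)

calls_per_run = 100
# Five independent runs of 100 calls each; the minimum is the run least
# disturbed by other activity on the machine.
timings = timeit.repeat(lambda: algorithm(x), number=calls_per_run, repeat=5)
best_ms = min(timings) / calls_per_run * 1e3
print(f"best of 5 runs: {best_ms:.3f} ms per call")
```

A single 2 msec versus 14 msec reading says little; a comparison of best-of-several figures measured the same way on both sides would be more convincing.
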
Stef said:
After taking the first difficult steps into Python,
with all kinds of small problems, as you already know,
it now seems a piece of cake to convert from MatLab to Python.
(The final MatLab and Python programs can almost only be
distinguished by the comment character ;-)

I especially like:
- the more relaxed behavior when exceeding the upper limit of a (1-dimensional)
array

Could you explain what this means? In general, I don't want a
programming language to be "relaxed" about exceeding array bounds.
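
One possible reading, assuming the remark is about slicing (the thread does not say so explicitly): a slice that runs past the end of a 1-dimensional array is clipped, while a single out-of-range index is still an error. A small sketch with made-up values:

```python
import numpy as np

a = np.arange(5)  # five elements, valid indices 0..4

# A slice whose upper limit is past the end is clipped to the array length.
tail = a[3:100]
print(tail)

# A single index past the end is not tolerated.
try:
    a[100]
except IndexError as exc:
    caught = True
    print("IndexError:", exc)
else:
    caught = False
```

So the bounds are still checked for element access; only the slice limits are forgiving.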
 
Stef Mientki

Beliavsky said:
For times this small, I wonder if timing comparisons are valid. I do
NOT think SciPy is in general an order of magnitude faster than Matlab
for the tasks typically performed with Matlab.

The algorithm is meant for real-time analysis,
where these kinds of differences count a lot.
I'm also a typical "surface programmer"
(don't need/want to know what's going on inside),
just want to get my analysis done,
and the fact that Python has many more functions available
means I have to write far fewer explicit or implicit for loops,
and thus I expect it to always "look" faster for me.
Beliavsky said:
Could you explain what this means? In general, I don't want a
programming language to be "relaxed" about exceeding array bounds.

Well, I have to admit, that wasn't a very tactful remark; "noise" is still
an unwanted issue in software.
But in the meantime I've been reading further, and I should replace that with
some other great things:
- the very efficient way a comment is turned into help information
- the (at first sight) very easy, yet quite powerful OOP implementation.
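
Assuming the first point refers to docstrings, here is a minimal sketch of the mechanism. The function name and body are invented for illustration.

```python
def moving_average(samples, width):
    """Return the running mean of `samples` over a window of `width` points."""
    return [
        sum(samples[i:i + width]) / width
        for i in range(len(samples) - width + 1)
    ]


# The string literal directly under the `def` line is stored on the function
# as its docstring; help(moving_average) in an interactive session shows it.
print(moving_average.__doc__)
```

Strictly speaking it is a string rather than a `#` comment, but it plays the same role as the leading comment block of a MatLab function file.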

cheers,
Stef Mientki
 
Mathias Panzenboeck

Another great thing: with rpy you have R bindings for Python.
So you have the power of R and the easy syntax and big standard lib of Python! :)
 
Stef Mientki

Mathias said:
Another great thing: with rpy you have R bindings for Python.

Forgive my ignorance, what are R and rpy?
Or are they only relevant for Linux users?

cheers
Stef
 
sturlamolden

Stef said:
MatLab: 14 msec
Python: 2 msec

I have the same experience. NumPy is usually faster than Matlab. But it
very much depends on how the code is structured.

I wonder if it is possible to improve the performance of NumPy by
having its fundamental types in the language, instead of depending on
operator overloading. For example, in NumPy, a statement like

array3[:] = array1[:] + array2[:]

allocates an intermediate array that is not needed. This is because the
operator overloading cannot know if it's evaluating a part of a larger
statement like

array1[:] = (array1[:] + array2[:]) * (array3[:] + array4[:])

If arrays had been a part of the language, as they are in Matlab and
Fortran 95, the compiler could see this and avoid intermediate storage,
as well as looping over the data only once. This is one of the main
reasons why Fortran is better than C++ for scientific computing. I.e.
with an overloaded array class in C++, instead of

for (i=0; i<n; i++)
    array1[i] = (array1[i] + array2[i]) * (array3[i] + array4[i]);

one actually gets something like three intermediates and four loops:

tmp1 = malloc(n*sizeof(whatever));
for (i=0; i<n; i++)
    tmp1[i] = array1[i] + array2[i];
tmp2 = malloc(n*sizeof(whatever));
for (i=0; i<n; i++)
    tmp2[i] = array3[i] + array4[i];
tmp3 = malloc(n*sizeof(whatever));
for (i=0; i<n; i++)
    tmp3[i] = tmp1[i] * tmp2[i];
free(tmp1);
free(tmp2);
for (i=0; i<n; i++)
    array1[i] = tmp3[i];
free(tmp3);

In C++ this is actually further bloated by constructor, destructor and
copy constructor calls.
Why one should use Fortran over C++ is obvious. But it also applies to
NumPy, and also to the issue of NumPy vs. Matlab, as Matlab knows about
arrays and has a compiler that can deal with this, whilst NumPy depends
on bloated operator overloading. On the other hand, Matlab is
fundamentally impaired on function calls and array slicing compared
with NumPy (basically copies are created instead of views). Thus, which
is faster - Matlab or NumPy - very much depends on how the code is
written.

Now for my question: operator overloading is (as shown) not the
solution to efficient scientific computing. It creates serious bloat
where it is undesired. Can NumPy's performance be improved by adding
the array types to the Python language itself? Or is the dynamic
nature of Python preventing this?

Sturla Molden
 
Robert Kern

sturlamolden said:
array3[:] = array1[:] + array2[:]

OT, but why are you slicing array1 and array2? All that does is create new array
objects pointing to the same data.
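
A small sketch of that point, with made-up values: a full slice on the right-hand side yields a new array object over the same buffer, whereas the slice on the left-hand side does change what the assignment means.

```python
import numpy as np

array1 = np.arange(4.0)
array2 = np.ones(4)
array3 = np.zeros(4)

view = array1[:]
print(view is array1)                  # a distinct Python object ...
print(np.shares_memory(view, array1))  # ... over the same data

# On the left-hand side the slice matters: array3[:] = ... writes into the
# existing buffer, while array3 = ... would rebind the name to a new array.
alias = array3
array3[:] = array1 + array2
print(alias is array3, alias)
```

So `array3[:] = array1 + array2` has the same in-place effect as the original statement, without the two redundant slices on the right.
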
sturlamolden said:
Now for my question: operator overloading is (as shown) not the
solution to efficient scientific computing. It creates serious bloat
where it is undesired. Can NumPy's performance be improved by adding
the array types to the Python language itself? Or is the dynamic
nature of Python preventing this?

Pretty much. Making the array types builtin rather than from a third party
module doesn't really change anything. However, if type inferencing tools like
psyco are taught about numpy arrays like they are already taught about ints,
then one could make them avoid temporaries.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Klaas

sturlamolden said:
as well as looping over the data only once. This is one of the main
reasons why Fortran is better than C++ for scientific computing. I.e.
instead of

for (i=0; i<n; i++)
    array1[i] = (array1[i] + array2[i]) * (array3[i] + array4[i]);

one actually gets something like three intermediates and four loops:

tmp1 = malloc(n*sizeof(whatever));
for (i=0; i<n; i++)
    tmp1[i] = array1[i] + array2[i];
tmp2 = malloc(n*sizeof(whatever));
for (i=0; i<n; i++)
    tmp2[i] = array3[i] + array4[i];
tmp3 = malloc(n*sizeof(whatever));
for (i=0; i<n; i++)
    tmp3[i] = tmp1[i] * tmp2[i];
free(tmp1);
free(tmp2);
for (i=0; i<n; i++)
    array1[i] = tmp3[i];
free(tmp3);


C/C++ do not allocate extra arrays. What you posted _might_ bear a
small resemblance to what numpy might produce (if using vectorized
code, not explicit loop code). This is entirely unrelated to the
reasons why fortran can be faster than c.

-Mike
 
sturlamolden

Klaas said:
C/C++ do not allocate extra arrays. What you posted _might_ bear a
small resemblance to what numpy might produce (if using vectorized
code, not explicit loop code). This is entirely unrelated to the
reasons why fortran can be faster than c.

Array libraries in C++ that use operator overloading produce
intermediate arrays for the same reason as NumPy. There is a C++
library that is sometimes able to avoid intermediates (Blitz++), but
it can only do so for small arrays for which bounds are known at
compile time.

Operator overloading is sometimes portrayed as required for scientific
computing (e.g. in Java vs. C# flame wars), but the cure can be worse
than the disease.

C does not have operator overloading and is an entirely different case.
You can of course avoid intermediates in C++ if you use C++ as C. You
can do that in Python as well.
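
One way this could look in NumPy, as a sketch rather than the only approach: call the ufuncs directly and pass `out=` so that results land in buffers that already exist. The array size and random inputs are arbitrary assumptions.

```python
import numpy as np

n = 1000
rng = np.random.default_rng(0)
array1, array2, array3, array4 = (rng.standard_normal(n) for _ in range(4))

# Reference result using overloaded operators, which allocate temporaries.
expected = (array1 + array2) * (array3 + array4)

scratch = np.empty(n)                     # one reusable work buffer
np.add(array3, array4, out=scratch)       # scratch = array3 + array4
np.add(array1, array2, out=array1)        # array1 = array1 + array2, in place
np.multiply(array1, scratch, out=array1)  # array1 = array1 * scratch, in place

print(np.allclose(array1, expected))
```

This still makes three passes over the data and needs one scratch buffer, so it does not give the single fused loop of the C version, but no new array is allocated per operator.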
 
