Python and STL efficiency

R

Ray

That's to say,
python is still much faster?

Not really, see my test, in my other post in the same thread. I'm using
VC++ Express 2005. If we're comparing with Python 2.5 I think it's just
fair that for C++ we're using the latest as well.
I am a c++ newbie but I think c++ should be faster here.

Same here, although that said Python's implementation of those data
structure must already be as optimal as mortals can do it.

<snip>
 
M

Mc Osten

Ray said:
I'm using VC++ Express, I didn't care to tweak the optimizations, I
merely chose the "Release" configuration for the executable. It's
blazing fast, taking only 30+ ms each run.

Of course it is faster. We are looping 1000000 times, you just 10000.
 
M

Mc Osten

Ray said:
Not really, see my test, in my other post in the same thread. I'm using
VC++ Express 2005. If we're comparing with Python 2.5 I think it's just
fair that for C++ we're using the latest as well.

In your test, you are looping 10000 times, we looped 1000000.
In Python tests with 10000 elements, it was about 10 ms.

Moreover, we tried various Python and C++ configurations. Most of the
tests are done with Python 2.4, not 2.5.
And I used gcc4, that is to say the latest on my platform.
Same here, although that said Python's implementation of those data
structure must already be as optimal as mortals can do it.

I think this is the rationale behind it.
 
M

Mc Osten

That's to say,
python is still much faster?

Yes it is. But of course you can't sat that "Python is faster than C++".
We found that the code to do this, written in the most natural way, is a
lot faster in Python. However, if you optimze the code, C++ gets almost
as fast.

In other benchmarks C++ outperforms Python and is 10 or 100 times
faster.

Maybe someone can post this to the c++ maillist and they will tell how
to accelerate it.

There are enough C++ experts here to do it. The point is another.
 
R

Ray

Mc said:
Of course it is faster. We are looping 1000000 times, you just 10000.

Certainly--I was not comparing 1000000 against 10000. Referring to the
OP's statement: "However, while the python code gave the result almost
instantly, the C++ code took several seconds to run!" 30ms sounds like
a definite improvement over several seconds!

I'll try to tweak it later at home and report here. I'll try out the
1000000 too.

Cheers
Ray
 
R

Ray

Mc said:
In your test, you are looping 10000 times, we looped 1000000.
In Python tests with 10000 elements, it was about 10 ms.

Moreover, we tried various Python and C++ configurations. Most of the
tests are done with Python 2.4, not 2.5.
And I used gcc4, that is to say the latest on my platform.

Mine's VC 2005 Express--let me put the optimization parameters later
and measure again when I get home.

<snip>
 
G

GHUM

Mc said:
Yes it is. But of course you can't sat that "Python is faster than C++".

Of course not. Python is faster then assembler. Proofed @ EuroPython
2006 in CERN, near the LHC Beta, in the same room many Nobel laurates
gave their presentations before.

Harald
 
M

Mc Osten

Ray said:
Certainly--I was not comparing 1000000 against 10000. Referring to the
OP's statement: "However, while the python code gave the result almost
instantly, the C++ code took several seconds to run!" 30ms sounds like
a definite improvement over several seconds!

Of course. I suppose there's something broken in OP's C++ setup (in fact
the version I compiled with VCPP 2005 also takes a lot of seconds...
something like 20-30 seconds, but of course this makes me think I
haven't understood how it is supposed to work, since my gcc gives
results comparable to yours).
 
M

Mc Osten

GHUM said:
Proofed @ EuroPython
2006 in CERN, near the LHC Beta, in the same room many Nobel laurates
gave their presentations before.

Have you some link? I suppose it's kind of a joke they did or something
like that...
 
R

Ray

Mc said:
Of course. I suppose there's something broken in OP's C++ setup (in fact
the version I compiled with VCPP 2005 also takes a lot of seconds...
something like 20-30 seconds, but of course this makes me think I
haven't understood how it is supposed to work, since my gcc gives
results comparable to yours).

Yeah, my guess would be either he used the Debug configuration or he
actually created a Managed executable instead of a pure Win32
application. Sigh, now I can't wait to get home and try it out :)
 
M

Maric Michaud

Le mardi 22 août 2006 23:15, Fredrik Lundh a écrit :

Ah ! wrong, thanks, but not in our case :

In [80]: for i in (10**e for e in range(5)) :
....: res1 = timeit.Timer('s == t', 's, t = "e"*%i, "e"*%i'%(i,
i)).timeit()
....: res2 = timeit.Timer('s is t', 's, t = "e"*%i, "e"*%i'%(i,
i)).timeit()
....: print res1/res2
....:
....:
1.10532866525
1.27507328965
1.90244004672
8.33974283485
89.5215441627

Indeed, it's wrong for two instances of str, but not if we compare an instance
to itself :

In [79]: for i in (10**e for e in range(9)) :
....: r1=timeit.Timer('s is p', "s, p = ('s'*%s,)*2" % i).timeit()
....: r2=timeit.Timer('s == p', "s, p = ('s'*%s,)*2" % i).timeit()
....: print r1/r2
....:
....:
0.854690643008
0.872682262181
0.851785060822
0.871193603744
0.890304121256
0.86925960859
0.846364097331
0.91614070798
0.825424114324


So my lastest c++ algorithm is the good one still, as the python code is
like :

In [29]: timeit.Timer('set(t)', 't=("e"*10,)*10').timeit()
Out[29]: 1.883868932723999

In [30]: timeit.Timer('set(t)', 't=("e"*100,)*10').timeit()
Out[30]: 1.8789899349212646

and not :

In [27]: timeit.Timer('set(t)', 't=["e"*10 for i in range(10)]').timeit()
Out[27]: 2.6000990867614746

In [28]: timeit.Timer('set(t)', 't=["e"*100 for i in range(10)]').timeit()
Out[28]: 4.1173238754272461


1000000 loops, best of 3: 1.9 usec per loop


100000 loops, best of 3: 4.5 usec per loop

</F>

--
_____________

Maric Michaud
_____________

Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097
 
M

Maric Michaud

Tim, sorry for I send it to you personally, I was abused by thunderbird.

Tim N. van der Leeuw a écrit :
Your C++ version got me the following timings (using gcc 3.4.5 as the
compiler, MinGW version, with -O6):

...


Hmmm... Can we conclude now that carefully crafted C++ code is about
twice as fast as casually and intuitively written Python code? ;) (Just
kidding here of course)

Every C++ code must be carefully crafted :).

But your benchmark surprise me, here is what I get on my windows box :

Python2.4 :
===========

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ which python2.4
/usr/bin/python2.4

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ which python
/cygdrive/c/Python24/python

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ python2.4 testpython.py
so long...
What do you know
fool
chicken crosses road
Elapsed: 3.655000 seconds

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ python testpython.py
so long...
What do you know
fool
chicken crosses road
Elapsed: 3.077764 seconds

Cygwin version is slower, but not too much.

C++, compiled with VS2005, release, no other configuration :
============================================================

print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...
strings : 16.718 # memory allocation is quite bad
unique strings : 1.188
compared by address : 0.453

C++, with gcc 3.3/Cygwin
========================

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ g++ -O3 -o testcpp testcpp.cpp

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ ./testcpp
print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...
strings : 17.266 # still bad
unique strings : 1.547
compared by address : 0.375

Hum, with my old gcc I get equal or better performances than with VS2005.

Finally, the equivalent code is still about 10x faster in c++ than in
Python, as it was on my Linux box.



Mc Osten a écrit :
...
> However, I would have written the code using a proper compare function
> rather than using two sets. In this particular case the number of
> elements of the first set is negligible in respect of the initial vector
> size, thus copying it again does not take a lot of time.
> But such code is optimized for the problem itself: in the real world I

Of course, it's a quick hack to get the same behavior.
> suppose we would have passed set a proper comparison function that
> checks address and then string equality.
>

Yes, furthermore, that is *exactly* what the python code is doing (see
my other post to Fredrik).


--
_____________

Maric Michaud
_____________

Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097
 
B

bearophileHUGS

This thread can be useful for ShedSkin (the Python => C++ translator),
because often it manages strings slower than CPython still, some
suggestions from a C++ expert can surely improve things a lot. C++ is
fast, but you have to use and know it well, otherwise you don't obtain
much speed.

Maybe this can be useful to speed up C++ hashing (now used in ShedSkin
too):
http://www.azillionmonkeys.com/qed/hash.html

I suggest to test this code with the D language too, it has built-in
dicts too (associative arrays, impleented with trees and not hashes),
so the source code needed is pretty short.

Bye,
bearophile
 
M

Mc Osten

Ray said:
Yeah, my guess would be either he used the Debug configuration or he
actually created a Managed executable instead of a pure Win32
application. Sigh, now I can't wait to get home and try it out :)

Can be. But I suppose a Managed should not get *that* slow.
IronPython on Tim's machine is still faster than C++ (even though not as
fast as CPython).
 
T

Tim N. van der Leeuw

Mc said:
Can be. But I suppose a Managed should not get *that* slow.
IronPython on Tim's machine is still faster than C++ (even though not as
fast as CPython).

I have to admit to a stupid mistake, for which I feel quite ashamed - I
got the loop-size wrong in the Python code. So all Python results
posted by me were off by a factor of 10 :-(
I feel quite bad about that!

With the nr of loops corrected, Python on my laptop performs worse than
C++ under all circumstances, by a factor of about 2:

============ Python 2.4 =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ /cygdrive/c/Python24/python.exe SpeedTest.py
Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 4000000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 4.239721 seconds
Slow - Elapsed: 11.883234 seconds

============ Python 2.5 =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ /cygdrive/c/Python25/python.exe SpeedTest.py
Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 4000000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 4.031873 seconds
Slow - Elapsed: 11.314742 seconds


============ GCC 3.4.5, MinGW, -O6 =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest.exe
Begin Test
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Fast - Elapsed: 2.088 seconds
Slow - Elapsed: 7.033 seconds

============ VC++ 6, 'release' build =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest_VC.exe
Begin Test
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Fast - Elapsed: 4.585 seconds
Slow - Elapsed: 5.024 seconds

========== GCC 3.4.5, MinGW, -O6, with most optimized C++ code
==========
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./testcpp.exe
print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...
strings : 2.338
unique strings : 1.109
compared by address : 0.23

LeeuwT@nlshl-leeuwt ~/My Documents/Python

============ IronPython 1.0rc2 =============

IronPython had a hard time coping with it; creating 4 million string
objects is a bit too much and the CLR was eating nearly a gigabyte of
memory near the end.
Here are the numbers:

IronPython 1.0.60816 on .NET 2.0.50727.42
Copyright (c) Microsoft Corporation. All rights reserved.Begin Test
Number of unique string objects: 4
What do you know
so long...
chicken crosses road
fool
Number of unique string objects: 4000000
What do you know
so long...
chicken crosses road
fool
Fast - Elapsed: 10.501273 seconds
Slow - Elapsed: 371.047343 seconds
============ Java 1.6.0 b2 =============
Set size: 4
chicken crosses road
What do you know
fool
so long...
Set size: 4
chicken crosses road
What do you know
fool
so long...
Fast - Elapsed 1.003 seconds
Slow - Elapsed 3.96 seconds

============ Java 1.5.0 =============
Set size: 4
fool
What do you know
so long...
chicken crosses road
Set size: 4
fool
What do you know
so long...
chicken crosses road
Fast - Elapsed 1.754 seconds
Slow - Elapsed 5.044 seconds
=========================

Note that the Python code creates a set of all unique id's of all
objects in list a, and prints the length of this set, to verify that
all strings are really unique instances or duplicate instances. The C++
versions don't do that (at least not for 4 million strings); so Python
is at a slight disadvantage here. Printing the number of strings still
didn't help me catch the off-by-ten errors though.

I included a Java version of the program, and it looks like it performs
quite well compared to C++ both with jdk1.5 and jdk1.6.


I humbly apologize for my misinformation yesterday.

Regards,

--Tim
 
R

Ray

Tim said:
With the nr of loops corrected, Python on my laptop performs worse than
C++ under all circumstances, by a factor of about 2:

*Phew*

Great to know that my model of how the world works is still correct!
(at least in relation to Python and C++!) :)

Thanks,
Ray
 
S

skip

Ray> Same here, although that said Python's implementation of those data
Ray> structure must already be as optimal as mortals can do it.

Perhaps more optimal. We've had (tim)bots working on the problem for years.

Skip
 
S

skip

Harald> Of course not. Python is faster then assembler. Proofed @
Harald> EuroPython 2006 in CERN, near the LHC Beta, in the same room
Harald> many Nobel laurates gave their presentations before.

Harald, do you have a reference to a talk abstract or a paper?

Thx,

Skip
 
P

Paul Boddie

Harald> Of course not. Python is faster then assembler. Proofed @
Harald> EuroPython 2006 in CERN, near the LHC Beta, in the same room
Harald> many Nobel laurates gave their presentations before.

Harald, do you have a reference to a talk abstract or a paper?

Via the lightning talks page:

http://wiki.python.org/moin/EuroPy2006LightningTalks

Here's a direct link:

http://wiki.python.org/moin/EuroPy2...=get&target=:FasterThenAssemblerLightning.odp

Of course, Harald's interpretation of the "faster than" qualification
is broader than a pure assessment of raw performance, so I wouldn't
expect too many technical revelations. And while footage exists of at
least one talk in the CERN Auditorium which led to a Nobel prize, I
don't think that any footage of the EuroPython lightning talks (or of
any of the other talks) has been released just yet.

Paul
 
A

Amanjit Gill

I was using VC++.net and IDLE, respectively. I had expected C++ to be
way faster. However, while the python code gave the result almost

- This code runs amortized 100ms on my machien (vc.net 2003 pro,
dinkumware stl, p4m 2.2GHz thinkpad, windows 2003 server), (10 loops in
1000ms)
- with STLPort 5.2, this code runs in 39-43ms (10 loops in 390ms ->
430ms).

a. did you compile in release mode? and if yes, vc.net _standard_
edition has no optimizing compiler in release mode, you need vc.net pro
or enterprise. or did you use vc.net 2005 ?

you should also include <algorithm> for ostream_operator.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,994
Messages
2,570,223
Members
46,813
Latest member
lawrwtwinkle111

Latest Threads

Top