x64 speed

Robin Becker

Whilst doing some portability testing with reportlab I noticed a strange speedup
for our unittest suite with python2.5

host win32 xp3 unittest time=42.2 seconds
vmware RHEL x64 unittest time=30.9 seconds

so it looks like the vmware emulated system is much faster. Is it the x64
working faster at its design sizes or perhaps the compiler or could it be the
vmware system caching all writes etc etc? For the red hat x64 build the only
special configuration was to use ucs2

I know that the VT bit stuff has made virtualization much better, but this seems
a bit weird.
 
Tim Daneliuk

Robin said:
Whilst doing some portability testing with reportlab I noticed a strange
speedup for our unittest suite with python2.5

host win32 xp3 unittest time=42.2 seconds
vmware RHEL x64 unittest time=30.9 seconds

so it looks like the vmware emulated system is much faster. Is it the
x64 working faster at its design sizes or perhaps the compiler or could
it be the vmware system caching all writes etc etc? For the red hat x64
build the only special configuration was to use ucs2

I know that the VT bit stuff has made virtualization much better, but
this seems a bit weird.

Which vmware product?
 
Martin v. Löwis

Robin said:
Whilst doing some portability testing with reportlab I noticed a strange
speedup for our unittest suite with python2.5

host win32 xp3 unittest time=42.2 seconds
vmware RHEL x64 unittest time=30.9 seconds

so it looks like the vmware emulated system is much faster. Is it the
x64 working faster at its design sizes or perhaps the compiler or could
it be the vmware system caching all writes etc etc?

I follow David's guess that Linux does better IO than Windows (not
knowing anything about the benchmark, of course)

Regards,
Martin
 
Diez B. Roggisch

Robin said:
Whilst doing some portability testing with reportlab I noticed a strange
speedup for our unittest suite with python2.5

host win32 xp3 unittest time=42.2 seconds
vmware RHEL x64 unittest time=30.9 seconds

so it looks like the vmware emulated system is much faster. Is it the
x64 working faster at its design sizes or perhaps the compiler or could
it be the vmware system caching all writes etc etc? For the red hat x64
build the only special configuration was to use ucs2

I know that the VT bit stuff has made virtualization much better, but
this seems a bit weird.


AFAIK some VMs have difficulties with timers. For example, my
virtualized KDE has that jumping icon when starting a program - and
that's *much* faster jumping inside VBox :)

So - are you sure it *is* faster?

Diez
 
Paul Rubin

Robin Becker said:
so it looks like the vmware emulated system is much faster. Is it the
x64 working faster at its design sizes or perhaps the compiler or
could it be the vmware system caching all writes etc etc? For the red
hat x64 build the only special configuration was to use ucs2

You have to control all these variables separately in order to know.
But, 64 bit code is in general faster than 32 bit code when properly
compiled: more cpu registers, wider moves when copying large blocks of
data, floating point registers instead of the legacy stack-oriented
FPU, etc.
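
One way to start separating the variables is to time a CPU-only workload and an IO-heavy workload independently on each system. The sketch below is only an illustration: `cpu_workload`, `io_workload` and `best_of` are hypothetical names, and the workloads are stand-ins, not anything from the reportlab test suite.

```python
import tempfile
import time


def cpu_workload(n=200_000):
    """Pure computation: integer arithmetic and masking, no file access."""
    total = 0
    for i in range(n):
        total = (total + i * i) & 0xFFFFFFFF
    return total


def io_workload(n_writes=200, chunk=b"x" * 4096):
    """Mostly file system work: write a temp file, then read it back."""
    with tempfile.TemporaryFile() as fh:
        for _ in range(n_writes):
            fh.write(chunk)
        fh.flush()
        fh.seek(0)
        return len(fh.read())


def best_of(func, repeats=3):
    """Return the fastest wall-clock time over several runs, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    print("cpu-bound: %.4fs" % best_of(cpu_workload))
    print("io-bound:  %.4fs" % best_of(io_workload))
```

If only one of the two numbers differs much between host and guest, that narrows down where the speedup comes from. It still would not distinguish compiler effects from 64-bit effects.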
 
Martin v. Löwis

Martin said:
I follow David's guess that Linux does better IO than Windows (not
knowing anything about the benchmark, of course)

Robin said:
I originally thought it must be the vmware host stuff offloading IO to
the second core, but watching with sysinternals didn't show a lot of
extra stuff going on with the vm compared to just running on the host.

I'm not talking about vmware. I'm suggesting that Linux ext3, and the
Linux buffer handling, is just more efficient than NTFS, and the Windows
buffer handling.

If you split the total runtime into system time and user time, how do
the 30s split up?

Regards,
Martin
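
For a platform where time(1) is not at hand (the Windows host, say), a rough equivalent of the user/system split can be taken from inside Python with `os.times()`. This is a minimal sketch; `timed` is a hypothetical helper, and it only accounts for the current process, not child processes.

```python
import os
import time


def timed(func, *args):
    """Run func(*args); return (result, wall, user, system) in seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    result = func(*args)
    wall = time.perf_counter() - wall_before
    cpu_after = os.times()
    user = cpu_after.user - cpu_before.user
    system = cpu_after.system - cpu_before.system
    return result, wall, user, system


if __name__ == "__main__":
    _, wall, user, system = timed(sum, range(5_000_000))
    print("wall %.3fs  user %.3fs  sys %.3fs" % (wall, user, system))
```

A large system share would point towards IO or other kernel work; a large user share points towards the interpreter itself.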
 
Robin Becker

Martin said:
I'm not talking about vmware. I'm suggesting that Linux ext3, and the
Linux buffer handling, is just more efficient than NTFS, and the Windows
buffer handling.

If you split the total runtime into system time and user time, how do
the 30s split up?
........
so here is one for the vm clock is bad theorists :)

[rptlab@localhost tests]$ time python25 runAll.py
.............................................................
..........................
----------------------------------------------------------------------
Ran 193 tests in 27.841s

OK

real 0m28.150s
user 0m26.606s
sys 0m0.917s
[rptlab@localhost tests]$

magical how the total python time is less than the real time.
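
The figures in that output can be broken down with a little arithmetic. The sketch below just does the sums on the numbers printed above; `summarize` is a hypothetical helper name.

```python
def summarize(real, user, system, reported):
    """Derive a few figures from a time(1) result (all values in seconds)."""
    cpu = user + system
    return {
        "cpu": cpu,                           # total CPU time charged
        "cpu_share_of_real": cpu / real,      # how busy the process was
        "sys_share_of_cpu": system / cpu,     # kernel share of the CPU time
        "outside_test_timer": real - reported,  # not covered by unittest
    }


if __name__ == "__main__":
    figures = summarize(real=28.150, user=26.606, system=0.917,
                        reported=27.841)
    for name, value in figures.items():
        print("%-20s %.3f" % (name, value))
```

On these numbers almost all of the wall time is user CPU time and the system share is small, which suggests (but does not prove) that the run is not dominated by IO.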
 
Floris Bruynooghe

Robin said:
[rptlab@localhost tests]$ time python25 runAll.py
.............................................................
.........................

----------------------------------------------------------------------
Ran 193 tests in 27.841s

real    0m28.150s
user    0m26.606s
sys     0m0.917s
[rptlab@localhost tests]$

magical how the total python time is less than the real time.

Not really. Python was still running at the time that it prints the
time of the tests. So it's only natural that the wall time Python
prints on just the tests is going to be smaller than the wall time
time(1) prints for the entire python process. Same for when it starts,
some stuff is done in Python before it starts its timer.

Regards
Floris
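
The effect can be reproduced with a small sketch: a child interpreter times only its own inner section, while the parent times the whole child process, startup and shutdown included. The child script and the name `inner_and_outer` are made up for illustration and have nothing to do with runAll.py.

```python
import subprocess
import sys
import time

# Child script: times only the work between its two clock reads.
CHILD = (
    "import time\n"
    "start = time.perf_counter()\n"
    "sum(i * i for i in range(200000))\n"
    "print(time.perf_counter() - start)\n"
)


def inner_and_outer():
    """Return (inner, outer): time seen inside the child vs. around it."""
    start = time.perf_counter()
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True, text=True, check=True,
    )
    outer = time.perf_counter() - start
    inner = float(proc.stdout.strip())
    return inner, outer


if __name__ == "__main__":
    inner, outer = inner_and_outer()
    print("inner %.4fs, outer %.4fs, overhead %.4fs"
          % (inner, outer, outer - inner))
```

The outer figure plays the role of the "real" line from time(1), the inner one the "Ran 193 tests in ..." line.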
 
Martin v. Löwis

Robin said:
Is it the
x64 working faster at its design sizes

Another guess (still from the darkness of not having received the
slightest clue what the test actually does): if it creates integers
in range(2**32, 2**64), then they fit into a Python int on AMD64-Linux,
but require a Python long on 32-bit Windows; long operations are much
slower than int operations.

Regards,
Martin
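
To make the size argument concrete: in Python 2 a plain int is a C long, which is 32 bits on 32-bit Windows and 64 bits on AMD64 Linux, and anything outside the signed range of that long becomes a Python long. The sketch below only checks the ranges; the helper names are hypothetical, and on Python 3 every integer is arbitrary precision, so it illustrates the boundary rather than measuring the slowdown.

```python
import struct


def fits_in_signed(value, bits):
    """True if value is representable as a signed integer of that width."""
    limit = 1 << (bits - 1)
    return -limit <= value < limit


def c_long_bits():
    """Width in bits of a C long on the platform running this script."""
    return struct.calcsize("l") * 8


if __name__ == "__main__":
    value = 2 ** 40
    print("C long here is %d bits" % c_long_bits())
    print("2**40 fits in 32 bits:", fits_in_signed(value, 32))
    print("2**40 fits in 64 bits:", fits_in_signed(value, 64))
```

Note that the signed limits matter: with a 32-bit long the switch to a Python long already happens at 2**31, and with a 64-bit long at 2**63.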
 
M.-A. Lemburg

Martin said:
I'm not talking about vmware. I'm suggesting that Linux ext3, and the
Linux buffer handling, is just more efficient than NTFS, and the Windows
buffer handling.

If you split the total runtime into system time and user time, how do
the 30s split up?
.......
so here is one for the vm clock is bad theorists :)

[rptlab@localhost tests]$ time python25 runAll.py
.............................................................
.........................
----------------------------------------------------------------------
Ran 193 tests in 27.841s

OK

real 0m28.150s
user 0m26.606s
sys 0m0.917s
[rptlab@localhost tests]$

magical how the total python time is less than the real time.

time(1) also measures the Python startup and shutdown time, so
I don't quite see the magic :-(

FWIW: VMware VMs need the VMware tools installed to make their
clocks work more or less. With Linux, you need some extra tweaks
as well, otherwise the clocks are just completely unreliable.

See these notes:

http://kb.vmware.com/selfservice/viewContent.do?language=en_US&externalId=1420
http://communities.vmware.com/message/782173

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Feb 04 2009)

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
 
Robin Becker

........
----------------------------------------------------------------------
Ran 193 tests in 27.841s

OK

real 0m28.150s
user 0m26.606s
sys 0m0.917s
[rptlab@localhost tests]$
magical how the total python time is less than the real time.

time(1) also measures the Python startup and shutdown time, so
I don't quite see the magic :-(

yes stupid me :(

M.-A. Lemburg said:
FWIW: VMware VMs need the VMware tools installed to make their
clocks work more or less. With Linux, you need some extra tweaks
as well, otherwise the clocks are just completely unreliable.

I do have the tools installed and from what I can see the clock isn't so far
off. At least when I run the two tests side by side the vm run always finishes
first. Of course that could be because vmware is stealing cpu somehow.
 
Robin Becker

Martin said:
Another guess (still from the darkness of not having received the
slightest clue what the test actually does): if it creates integers
in range(2**32, 2**64), then they fit into a Python int on AMD64-Linux,
but require a Python long on 32-bit Windows; long operations are much
slower than int operations.
.......
I don't think we're doing a lot of bignum arithmetic, some masking operations
etc etc.
 
