is there better win32 clock() timing?

Ray Schumacher

I have a need for a time.clock() with >0.000016 second (16us) accuracy.
The sleep() (on Python 2.3, Win32, at least) has a .001s limit.

Are they lower/better on others' platforms?

Test code, 2.4GHz P4
Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32

import time
t0 = time.clock()
t1 = time.clock()
t2 = time.clock()
t3 = time.clock()
t4 = time.clock()
t5 = time.clock()
print (-t0+t5)/5.
print t1-t0
print t2-t1
print t3-t2
print t4-t3
print t5-t4
ave 0.000342754564927
0.000321028401686
0.00030348379596
0.000297101358228
0.000295895991258

I had also considered forking a thread that would spin a loop checking time.clock() and firing the TTL pulse after the appropriate interval, but the real, ultimate resolution of time.clock() appears to be ~.00035s. If I increase process priority to real-time, it is ~.00028s.
The alternative appears to be more C code...

Ray
BCI/Cognitive Vision
 
Michael Hoffman

Ray said:
I have a need for a time.clock() with >0.000016 second (16us) accuracy.
The sleep() (on Python 2.3, Win32, at least) has a .001s limit.

Are they lower/better on others' platforms?

The meaning of time.clock() is entirely different on other platforms.
See the documentation. You could probably get a slight speedup by using
"from time import clock" and then just clock().
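[A minimal sketch of that suggestion, in the Python 2 syntax used throughout the thread; the saving is only the time.clock attribute lookup on each call, so don't expect it to change the ~300 us figures above by much:]

from time import clock   # bind once; each call below skips the module attribute lookup

t0 = clock()
t1 = clock()
print t1 - t0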
 
wittempj

Alternatively you could use the now() method of the datetime module; it
has a resolution of 1 microsecond.
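[A small sketch of that approach, again Python 2 syntax; note that 1 microsecond is the resolution of the value datetime stores, while the precision you actually observe depends on the underlying OS clock:]

from datetime import datetime

a = datetime.now()
b = datetime.now()
delta = b - a    # a timedelta object
# elapsed microseconds (ignores delta.days, fine for sub-second intervals)
print delta.seconds * 1000000 + delta.microseconds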
 
Paul Rubin

Ray Schumacher said:
I have a need for a time.clock() with >0.000016 second (16us) accuracy.
The sleep() (on Python 2.3, Win32, at least) has a .001s limit.

Are they lower/better on others' platforms?

The alternative appears to be more C code...

C code is your best bet. The highest resolution timer on x86's these
days is the Pentium RDTSC instruction which counts the number of cpu
cycles since power-on. There are various C routines floating around
that let you access that instruction.
 
Claudio Grondi

On my 2.8GHz P4, Windows 2000 SP4 with Python 2.3.4 I am getting
totally different results compared to Ray. Does Python 2.3.4 already
use the Pentium RDTSC instruction for clock()?

Claudio
# >>> Claudio Grondi, 2.8GHz P4 Python 2.3.4 (2005-01-24 14:32)
# time of taking time:
# 0.000001396825574200073100
# 0.000001676190689040086400
# 0.000001396825574200074000
# 0.000001676190689040088100
# 0.000001955555803880100500
# 0.000001620317666072084300 (average)
# statistics of 1.000.000 times of taking time in a while loop:
# 0.000001396825573429794100 (min)
# 0.002370692364532356300000 (max)
# 0.000001598858514140937100 (avg)
# >>> Ray Schumacher, 2.4GHz P4 Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
# 0.000321028401686
# 0.00030348379596
# 0.000297101358228
# 0.000295895991258
# 0.000342754564927 (average)

Here is my code:
from time import clock

# Tests show that the first call takes longer than subsequent calls,
# so it makes sense to run t = clock() just one time before the next calls
# are used:
t = clock()
t0 = clock()
t1 = clock()
t2 = clock()
t3 = clock()
t4 = clock()
t5 = clock()
print 'time of taking time: '
print ' %25.24f'%((t1-t0),)
print ' %25.24f'%((t2-t1),)
print ' %25.24f'%((t3-t2),)
print ' %25.24f'%((t4-t3),)
print ' %25.24f'%((t5-t4),)
print ' %25.24f (average)'%( ((-t0+t5)/5.),)
intCounter = 1000000
fltTotTimeOfTakingTime = 0.0
fltMaxTimeOfTakingTime = 0.0
fltMinTimeOfTakingTime = 1.0
while(intCounter > 0):
    t1 = clock()
    t2 = clock()
    timeDiff = t2 - t1
    if(timeDiff < fltMinTimeOfTakingTime): fltMinTimeOfTakingTime = timeDiff
    if(timeDiff > fltMaxTimeOfTakingTime): fltMaxTimeOfTakingTime = timeDiff
    fltTotTimeOfTakingTime += timeDiff
    intCounter -= 1
#:while
fltAvgTimeOfTakingTime = fltTotTimeOfTakingTime / 1000000.0
print 'statistics of 1.000.000 times of taking time in a while loop:'
print ' %25.24f (min)'%(fltMinTimeOfTakingTime,)
print ' %25.24f (max)'%(fltMaxTimeOfTakingTime,)
print ' %25.24f (avg)'%(fltAvgTimeOfTakingTime,)

Ray Schumacher said:
I have a need for a time.clock() with >0.000016 second (16us) accuracy.
The sleep() (on Python 2.3, Win32, at least) has a .001s limit.

Are they lower/better on others' platforms?

Test code, 2.4GHz P4
Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32

import time
t0 = time.clock()
t1 = time.clock()
t2 = time.clock()
t3 = time.clock()
t4 = time.clock()
t5 = time.clock()
print (-t0+t5)/5.
print t1-t0
print t2-t1
print t3-t2
print t4-t3
print t5-t4
ave 0.000342754564927
0.000321028401686
0.00030348379596
0.000297101358228
0.000295895991258

I had also considered forking a thread that would spin a loop checking
time.clock() and firing the TTL pulse after the appropriate interval, but
the real, ultimate resolution of time.clock() appears to be ~.00035s. If I
increase process priority to real-time, it is ~.00028s
 
Tim Roberts

Ray Schumacher said:
I have a need for a time.clock() with >0.000016 second (16us) accuracy.
The sleep() (on Python 2.3, Win32, at least) has a .001s limit.

Are they lower/better on others' platforms?

You need to be careful about describing what you're seeing here. It is not
that time.clock() is inaccurate. The problem is that the "time.clock()"
statement takes several hundred microseconds to execute.
I had also considered forking a thread that would spin a loop checking
time.clock() and firing the TTL pulse after the appropriate interval,
but the real, ultimate resolution of time.clock() appears to be
~.00035s. If I increase process priority to real-time, it is ~.00028s
The alternative appears to be more C code...

Are you seriously considering writing a real-time application in Python on
Windows? The ONLY way to get small-integer microsecond responses in
Windows is to write a kernel driver, and even then there are no guarantees.
Windows is NOT a real-time system. If you have an environment where an
unexpected delay of a millisecond or more is going to cause damage, then
you need to redesign your application.
 
Bengt Richter

You need to be careful about describing what you're seeing here. It is not
that time.clock() is inaccurate. The problem is that the "time.clock()"
statement takes several hundred microseconds to execute.
What system are you on?
This is 300 mhz P2 and py2.4b1 gcc/mingw generated:
7.5428559824786134e-006

Even with the attribute lookup overhead, it's not several hundred microseconds
as a *minimum*. But on e.g. win32 you can get preempted for a number of milliseconds.
E.g., turn that to a max instead of a min:

I see a couple 20-30 ms ones ;-/
0.0070844179680875641
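[The interactive snippets behind those two numbers aren't shown above; a sketch of that kind of measurement, assuming it is simply the min and max of a large number of back-to-back clock() deltas (generator expressions need Python 2.4):]

from time import clock

# smallest observable gap between two successive clock() reads
print min(abs(clock() - clock()) for i in xrange(1000000))
# largest gap: shows how far preemption by other processes stretches it
print max(abs(clock() - clock()) for i in xrange(1000000))
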
Are you seriously considering writing a real-time application in Python on
Windows? The ONLY way to get small-integer microsecond responses in
Windows is to write a kernel driver, and even then there are no guarantees.
Windows is NOT a real-time system. If you have an environment where an
unexpected delay of a millisecond or more is going to cause damage, then
you need to redesign your application.
For sure. The big requirements picture is missing (not uncommon ;-)

Regards,
Bengt Richter
 
Stephen Kellett

It is not that time.clock() is inaccurate. The problem is that the
"time.clock()" statement takes several hundred microseconds to execute.
The statement is incorrect. clock() itself isn't slow, but it is
accessing a resource, the accuracy of which is no better than 1ms.

There are various timers available, documented and undocumented, all of
which end up at 1ms or 1.1ms, give or take. For anything shorter you
need QueryPerformanceCounter() (but that *is* a slow call), or use the
RDTSC instruction which is fast but gives a count of instruction cycles
executed and is thus not totally accurate (multiple execution pipelines,
plus multithreading considerations).
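[For reference, a minimal ctypes sketch of calling QueryPerformanceCounter/QueryPerformanceFrequency directly on Windows; ctypes was a separate add-on package before Python 2.5, so this is an illustration rather than anything posted in the thread:]

import ctypes

kernel32 = ctypes.windll.kernel32
_freq = ctypes.c_longlong(0)
kernel32.QueryPerformanceFrequency(ctypes.byref(_freq))   # counts per second, fixed

def qpc_seconds():
    # Read the raw performance counter and scale it to seconds.
    count = ctypes.c_longlong(0)
    kernel32.QueryPerformanceCounter(ctypes.byref(count))
    return count.value / float(_freq.value)

t0 = qpc_seconds()
t1 = qpc_seconds()
print t1 - t0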

You have to choose the system that works best for you. In many cases
RDTSC works OK.

Stephen
 
Bengt Richter

The statement is incorrect. clock() itself isn't slow, but it is
accessing a resource, the accuracy of which is no better than 1ms.
I believe that is quite wrong as a general statement. It may be right
on some benighted system, but not on win32/NT python 2.4.
How do you account for getting deltas of 6-7 microseconds
in abs(clock()-clock()) ? If the "resource" only had ~1ms granularity,
the minimum would be zero, as it is if you call time.time() in a tight loop,
since it doesn't tick over often enough. time.clock does tick over fast
enough that you can't snag the same reading on two successive clock() calls
on a 300mhz P2.
There are various timers available, documented and undocumented, all of
which end up at 1ms or 1.1ms, give or take. For anything shorter you
need QueryPerformanceCounter() (but that *is* a slow call), or use the
Have you timed it, to make that claim? What do you mean by "slow"?
RDTSC instruction which is fast but gives a count of instruction cycles
executed and is thus not totally accurate (multiple execution pipelines,
plus multithreading considerations).
Accurate for what? A single clock AFAIK drives RDTSC.
You have to choose the system that works best for you. In many cases
RDTSC works OK.
I wrote an extension to access it directly, and was able to get down
to 23 cycles IIRC for a C call pair like above on a 300 mhz P2. 23/300 us I guess,
less than 100 ns between the clock reads of two calls.

The main problem with a CPU clock based reading is a variable clock rate due to
power management; otherwise it's very stable.

Why am I doing this? ;-)

Regards,
Bengt Richter
 
Stephen Kellett

Bengt Richter said:
I believe that is quite wrong as a general statement.

Actually my initial statement should have been written
"accessing a resource, the accuracy of which is no better than 10ms.". I
was thinking of the 1ms multimedia timer but wrote about clock()
instead.

10ms, coincidentally is the approx minimum scheduling granularity for
threads unless you are in a multimedia thread (or real time thread - not
sure about real time threads in NT).
If the "resource" only had ~1ms granularity,
the minimum would be zero, as it is if you call time.time() in a tight loop,

Correct. Write your app in C and call clock(). That's what you get. You
can call clock 20000 times and still get a delta of zero. The next delta
(on my system) is 10ms at about 22000 calls.
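[A sketch of observing that CRT granularity from Python via ctypes, assuming Windows and that ctypes is installed; cdll.msvcrt is the Microsoft C runtime, whose CLOCKS_PER_SEC is 1000:]

import ctypes

clock_crt = ctypes.cdll.msvcrt.clock   # CRT clock(), in ticks of 1/CLOCKS_PER_SEC (1 ms)

# Spin until the CRT clock ticks over, counting how many calls that takes.
start = clock_crt()
calls = 0
while clock_crt() == start:
    calls += 1
print calls, 'calls before clock() changed'
print (clock_crt() - start) / 1000.0, 'seconds per tick (approx)'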

Whoops here we go, same typo - should have been 10ms or 11ms. There is a
1ms timer in the multimedia timing group.
Have you timed it, to make that claim?
Yes.

What do you mean by "slow"?

Slower than any other Win32, CRT or Undocumented NT function you can use
to get timer information. Yes, I have timed them all, a few years ago.

QueryPerformanceCounter is 47 times slower to call than clock() on my
1Ghz Athlon.

QueryPerformanceCounter may have finer granularity, but called in a
tight loop it'll crush your program.
Accurate for what?

See below - you haven't taken things into account, despite my comment in
brackets above which gives a big hint.
A single clock AFAIK drives RDTSC
Correct.

The main problem with a CPU clock based reading is a variable clock rate due to
power management; otherwise it's very stable.

Try running multiple apps at the same time you are doing your
measurement, each of which has a variable loading. Each of these apps is
contributing to the count returned by RDTSC. That is what I was
referring to.

Stephen
 
Bengt Richter

Actually my initial statement should have been written
"accessing a resource, the accuracy of which is no better than 10ms.". I
was thinking of the 1ms multimedia timer but wrote about clock()
instead.

10ms, coincidentally is the approx minimum scheduling granularity for
threads unless you are in a multimedia thread (or real time thread - not
sure about real time threads in NT).


Correct. Write your app in C and call clock(). Thats what you get. You
can call clock 20000 times and still get a delta of zero. The next delta
(on my system) is 10ms at about 22000 calls.


Whoops here we go, same typo - should have been 10ms or 11ms. There is a
1ms timer in the multimedia timing group.


Slower than any other Win32, CRT or Undocumented NT function you can use
to get timer information. Yes, I have timed them all, a few years ago.

QueryPerformanceCounter is 47 times slower to call than clock() on my
1Ghz Athlon.
That really makes me wonder. Perhaps the Athlon handles RDTSC by way of
an illegal instruction trap and faking the pentium instruction? That might
explain the terrible timing. Try it on a pentium that supports RDTSC.
The clock() usually gets high resolution bits from a low order 16 bits
of the timer chip that drives the old 55ms clock that came from IBM
using cheap TV crystal based oscillators instead of defining an
OS-implementer-friendly time base, I think. The frequency was nominally
1193182 hz I believe. Obviously the OS didn't get interrupted that often,
but if you divide by 2**16, you get the traditional OS tick of ~55ms (2**16/1193182. in the interpreter):
0.054925401154224583

So that's a clue. By grabbing tick count and bits read from the fast-counting
hardware clock register, you can compute time fairly accurately for the moment
you are sampling that register. IIRC, you couldn't get the whole 16 bits because
it was a toggling trigger or some such.
QueryPerformanceCounter may have finer granularity, but called in a
tight loop it'll crush your program.
Maybe on your Athlon, but my experience is different ;-)
See below - you haven't taken things into account, despite my comment in
brackets above which gives a big hint.
I've absorbed a lot of hints since around '59 when I began to work with
computers and timing issues ;-)
Try running multiple apps at the same time you are doing your
measurement, each of which has a variable loading. Each of these apps is
contributing to the count returned by RDTSC. That is what I was
referring to.

Ok, but that's another issue, which I also attempted to draw attention to ;-)

Quoting myself:
"""
Even with the attribute lookup overhead, it's not several hundred microseconds
as a *minimum*. But on e.g. win32 you can get preempted for a number of milliseconds.
E.g., turn that to a max instead of a min:

I see a couple 20-30 ms ones ;-/

0.0070844179680875641
"""

Depending on what your timing requirements are, you may be able to run
a zillion trials and throw out the bad data (or plot it and figure out
some interesting things about timing behavior of your system due to various
effects). E.g., timeit presumably tries to get a minimum and eliminate as
many glitches as possible.
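[The stdlib timeit module works along those lines: run many iterations, repeat the run several times, and keep the minimum. A small sketch:]

import timeit

# Each repeat() entry is the total for 100000 clock()-pairs; the minimum is
# the run least disturbed by preemption and other glitches.
t = timeit.Timer('clock(); clock()', 'from time import clock')
print min(t.repeat(repeat=5, number=100000)) / 100000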

But as mentioned, the big picture of requirements was not clear. Certainly
you can't expect to control ignition of a racing engine reliably with
an ordinary windows based program ;-)

Regards,
Bengt Richter
 
Stephen Kellett

Bengt Richter said:
That really makes me wonder. Perhaps the Athlon handles RDTSC by way of
an illegal instruction trap and faking the pentium instruction?

No. Athlon implements it correctly - if it didn't you'd end up in the
debugger with an illegal instruction trap - you don't. Also my stats
below show that Athlon's RDTSC is faster than Pentium's which I doubt
you'd get if you were faking things. Taking it further - the test to see
if a processor supports RDTSC is to wrap it in an exception handler and
execute the instruction - if it doesn't you end up in the __except part
of the SEH handler.

QueryPerformanceCounter and RDTSC are not the same thing.
QueryPerformanceCounter talks to hardware to get its results. I imagine
that differences in performance for QueryPerformanceCounter are down to
how the HAL talks to the hardware and can't be blamed on the processor
or manufacturer.

clock() gets its info from the OS at (I imagine) the same granularity as
the NT scheduler. Some systems schedule at 10ms/11ms others at about 6ms
or 7ms. I think this is to do with single/dual processors - unsure as I
don't have a dual processor box. If you call GetThreadTimes() you will
find values returned that match the approx clock() values - which is why
I think they are related.
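[A ctypes sketch of calling GetThreadTimes() for the current thread, for anyone wanting to compare its granularity with clock(); FILETIME values are 100 ns units. This is an illustration, not code from the thread, and assumes the 32-bit Windows of that era with ctypes installed:]

import ctypes

kernel32 = ctypes.windll.kernel32

class FILETIME(ctypes.Structure):
    _fields_ = [('dwLowDateTime', ctypes.c_uint),
                ('dwHighDateTime', ctypes.c_uint)]

def thread_times():
    # Kernel and user CPU time of the current thread, in seconds.
    creation, exit_, kernel, user = FILETIME(), FILETIME(), FILETIME(), FILETIME()
    kernel32.GetThreadTimes(kernel32.GetCurrentThread(),
                            ctypes.byref(creation), ctypes.byref(exit_),
                            ctypes.byref(kernel), ctypes.byref(user))
    def seconds(ft):
        return ((ft.dwHighDateTime << 32) | ft.dwLowDateTime) * 100e-9
    return seconds(kernel), seconds(user)

print thread_times()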

I've just run some tests using the same old program. QPC is
QueryPerformanceCounter. QPF is QueryPerformanceFrequency. I've included
the QPC/QPF column to show the timings in seconds.

1Ghz Athlon, Windows XP, SP2, 1,000,000 iterations
                             QPC    QPC/QPF (seconds)
QueryPerformanceCounter  7156984    5.998233
GetThreadTimes            503277    0.421794
RDTSC                     103430    0.086684
clock()                   148909    0.124800

850Mhz Pentium III, W2K, 1,000,000 iterations
                             QPC    QPC/QPF (seconds)
QueryPerformanceCounter  5652161    1.579017
GetThreadTimes           3608976    1.008222
RDTSC                     842950    0.235491
clock()                   699840    0.195511

The results surprise me - Pentium III clock() takes less time to execute
than Pentium III RDTSC!

It surprises me that the 850Mhz Pentium III QPC is faster than the 1Ghz
Athlon QPC, but whichever way you slice it, QPC is massively slower than
RDTSC or clock(). Also surprising is that the W2K GetThreadTimes() is so slow
compared to the Athlon GetThreadTimes().
of the timer chip that drives the old 55ms clock that came from IBM
using cheap TV crystal based oscillators instead of defining an
OS-implementer-friendly time base, I think. The frequency was nominally
1193182 hz I believe. Obviously the OS didn't get interrupted that often,
but if you divide by 2**16, you get the traditional OS tick of ~55ms:

I thought that was the Windows 9x way of doing things. You get the 49 day
wrap around with this one I think.
you can't expect to control ignition of a racing engine reliably with
an ordinary windows based program ;-)

....and Schumacher is in the lead, oh look! The Ferrari has blue
screened. The new regulations to reduce speeds in F1 are working, that
has really slowed him down...

Stephen
 
Peter Hansen

Stephen said:
The statement is incorrect. clock() itself isn't slow, but it is
accessing a resource, the accuracy of which is no better than 1ms.

There are various timers available, documented and undocumented, all of
which end up at 1ms or 1.1ms, give or take. For anything shorter you
need QueryPerformanceCounter() (but that *is* a slow call), or use the
RDTSC instruction which is fast but gives a count of instruction cycles
executed and is thus not totally accurate (multiple execution pipelines,
plus multithreading considerations).

(I've read the five or so following messages you and Bengt
have posted, but not in detail so I'm not sure where you're
going with all this, but... )

According to the docs for time.clock(), "On Windows, this function returns
wall-clock seconds elapsed since the first call to this function, as a floating
point number, based on the Win32 function QueryPerformanceCounter(). The
resolution is typically better than one microsecond."

Depending on whether you really meant "accuracy" above, and
on other things, this is either irrelevant, or contradicts
your first statement...

-Peter
 
Stephen Kellett

Peter Hansen said:
(I've read the five or so following messages you and Bengt
have posted, but not in detail so I'm not sure where you're
going with all this, but... )

We've gone off at a tangent about Windows timing etc. Pretty much over
now.
According to the docs for time.clock(), "On Windows, this function returns
wall-clock seconds elapsed since the first call to this function, as a floating
point number, based on the Win32 function QueryPerformanceCounter(). The
resolution is typically better than one microsecond."

Depending on whether you really meant "accuracy" above, and
on other things, this is either irrelevant, or contradicts
your first statement...

No contradiction. Python (from what you write above) implements
time.clock() differently from the CRT clock() (which I had assumed
Python would call for simplicity). Hence the differing results.

Stephen
 
