Strange Execution Times

curi42 · May 26, 2005

I am running two functions in a row that do the same thing. One runs
in .14 seconds, the other 56. I'm confused. I wrote another version
of the program and couldn't get the slow behavior again, only the fast.
I'm not sure what is causing it. Can anyone figure it out?

Here is my code (sorry it's a bit of a mess, but my cleaned up version
isn't slow!). Just skim to the bottom where the timing is. The first
time printed out is .14, the second is 56.56.

f = open("/Users/curi/data.xml")

o = open("/Users/curi/out2.xml", "w")

import md5
import array

p1 = "<Password>"
p2 = "</Password>"

cnt = 0

m = md5.new
jo = "".join

adjust = len(p1) - 1

i = 1
s = f.read()
a = array.array('c', s).tolist()
spot = 0
k = 0
find = s.find

starts = []
ends = []

while k != -1:
    #print len(s)
    k = find(p2, spot)
    if k != -1:
        starts.append(find(p1, spot) + adjust)
        ends.append(k)
        spot = k + 1

#s = "".join([s[:j+1], md5.new(s[j+1:k-1]).hexdigest(), s[k:]])

#if k != -1: a[j+1:k-1] = m(jo(a[j+1:k-1])).hexdigest()

r = range(len(starts))
#r = range(20)
r.reverse()
import time

data = a[:]

md5 = m
join = jo

t1 = time.clock()
for j in r:
    #print jo(s[starts[j]+1:ends[j]])
    digest = m(jo(s[starts[j]+1:ends[j]])).hexdigest()

    a[starts[j]+1:ends[j]] = digest
    #cnt += 1
    #if cnt % 100 == 0: print cnt

t2 = time.clock()
print "time is", round(t2-t1, 5)

t1 = time.clock()
for i in r:
    data[starts[i]:ends[i]] = \
        md5(join(s[starts[i]:ends[i]])).hexdigest()
t2 = time.clock()
print "second time is", round(t2-t1, 5)

o.write(jo(a))

Fredrik Lundh · May 26, 2005

I am running two functions in a row that do the same thing. One runs
in .14 seconds, the other 56. I'm confused. I wrote another version
of the program and couldn't get the slow behavior again, only the fast.
I'm not sure what is causing it. Can anyone figure it out?

it would be a lot easier to help if you posted a self-contained example.

</F>

John Machin · May 26, 2005

I am running two functions in a row that do the same thing.

1. I see no functions here.

You should set out a script like this:

def main():
    your_code_goes_here()

if __name__ == '__main__':
    main()

for two reasons (a) your code will be referring to locals instead of
globals; this is faster, which might appeal to you (b) if somebody
accidentally imports the script, nothing happens.

2. The two loops to which you refer do *not* do the same thing; see later.
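
Point 1(a) can be checked with a small timing sketch. This is only an illustration, written for Python 3 rather than the Python 2 used in the thread; the function names and the loop size N are made up for the example, and the actual timings will vary by machine and interpreter version.

```python
import timeit

N = 200000
total = 0  # module-level name, looked up as a global inside loop_with_globals


def loop_with_globals():
    # Every read and write of `total` goes through the module's global namespace.
    global total
    total = 0
    for i in range(N):
        total += i
    return total


def loop_with_locals():
    # Here `total` is a local variable, which the interpreter can access more cheaply.
    total = 0
    for i in range(N):
        total += i
    return total


def main():
    print("globals: %.4f s" % timeit.timeit(loop_with_globals, number=5))
    print("locals:  %.4f s" % timeit.timeit(loop_with_locals, number=5))


if __name__ == "__main__":
    main()
```

Both functions compute the same sum; only the way the accumulator is looked up differs.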

One runs
in .14 seconds, the other 56. I'm confused. I wrote another version
of the program and couldn't get the slow behavior again, only the fast.
I'm not sure what is causing it. Can anyone figure it out?

Here is my code (sorry it's a bit of a mess, but my cleaned up version
isn't slow!). Just skim to the bottom where the timing is. The first
time printed out is .14, the second is 56.56.

[snip]

[following has extraneous blank lines and comments removed]

t1 = time.clock()
for j in r:
    digest = m(jo(s[starts[j]+1:ends[j]])).hexdigest()
    a[starts[j]+1:ends[j]] = digest
t2 = time.clock()
print "time is", round(t2-t1, 5)

t1 = time.clock()
for i in r:
    data[starts[i]:ends[i]] = \
        md5(join(s[starts[i]:ends[i]])).hexdigest()
t2 = time.clock()
print "second time is", round(t2-t1, 5)

General questions: what platform? what version of Python? how large is
the file? how much free memory do you have? how many passwords are
there? what is the average length of a password?

Ignoring the superficial-but-meaningless differences (i vs j, md5
[aarrgghh!!] vs m, jo vs join), these two loops differ in the following
respects:

(1) 'data' is a copy of 'a'
(2) the first loop's body is effectively: digest = RHS; LHS = digest
whereas the 2nd loop's body is: LHS = RHS
(3) the first loop uses starts[j]+1 whereas the second loop uses starts[j]

Item (1) may affect the timing if the file is large compared with available
memory -- could be 'a' has to be swapped out, and 'data' swapped in.

Item (2) should make the 2nd loop very slightly faster, so we'll ignore
that.

Item (3) means you are not comparing like with like. It means that the
1st loop has less work to do. So this could make an observable
difference for very short passwords -- but still nothing like 0.14
compared with 56.
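
Item (3) can be seen on a one-password sample. This is a sketch only, written for Python 3; the sample string is invented, while p1, p2 and adjust follow the original post.

```python
p1 = "<Password>"
p2 = "</Password>"
s = "<Password>abc</Password>"  # invented sample, not the poster's data

adjust = len(p1) - 1
start = s.find(p1) + adjust  # index of the '>' that ends the opening tag
end = s.find(p2)             # index of the '<' that begins the closing tag

first_loop_slice = s[start + 1:end]  # what the first loop hashes and replaces
second_loop_slice = s[start:end]     # what the second loop hashes and replaces

print(repr(first_loop_slice))
print(repr(second_loop_slice))
```

With these definitions the second slice is one character longer, because it also takes in the '>' of the opening tag.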

So, some more questions:

The 56.56 is suspiciously precise -- you ran it a few times and it
printed exactly 56.56 each time?

Did you try putting the 2nd loop first [refer to Item (1) above]?
Did you try putting in a switch so that your script runs either 1st loop
or 2nd loop but not both? Note that each loop is making its target list
expand in situ; this may after a while (like inside loop 2) cause the
memory arena to become so fragmented that swapping will occur. This of
course can vary wildly depending on the platform; Win95 used to be the
most usual suspect but you're obviously not running on that.

Some observations:

(1) 's' is already a string, so ''.join(s[x:y]) is a slow way of doing
s[x:y]

(2) 'a' ends up as a list of one-byte strings, via a very circuitous
process: a = array.array('c', s).tolist()

A shorter route would be: a = list(s)

However what's wrong with what you presumably tried out first i.e. a =
array.array('c', s) ?? It doesn't need the final ''.join() before
writing to disk, and it takes up less memory. NOTE: the array variety
takes up 1 byte per character. The list variety takes up at least 4
bytes per character (on a machine where sizeof(PyObject *) == 4); to the
extent that the file contains characters that are not interned (i.e. not
[A-Za-z_] AFAIK), much more memory is required as a separate object
will be created for each such character. Was it consistently slower?

(3) If memory is your problem, you could rewrite the whole thing to
simply do one write per password; that way you only need 1.x copy of the
file contents in memory, not 2.x.
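
Observations (1) and (3) together suggest something like the following. It is a minimal sketch, not the poster's program: it is written for Python 3, uses hashlib in place of the old md5 module, and the function name hash_passwords and the sample text are made up for illustration. It slices the string directly and writes the output piece by piece instead of editing a list in place.

```python
import hashlib
import io

P1 = "<Password>"
P2 = "</Password>"


def hash_passwords(s, out):
    """Copy s to out, replacing the text of each password with its MD5 hex digest."""
    pos = 0
    while True:
        start = s.find(P1, pos)
        if start == -1:
            break
        start += len(P1)              # first character of the password
        end = s.find(P2, start)
        if end == -1:
            break
        out.write(s[pos:start])       # s is a string, so a plain slice is enough
        out.write(hashlib.md5(s[start:end].encode("utf-8")).hexdigest())
        pos = end
    out.write(s[pos:])                # whatever follows the last password


sample = "<a><Password>secret</Password><Password>x</Password></a>"
buf = io.StringIO()
hash_passwords(sample, buf)
result = buf.getvalue()
print(result)
```

In real use `out` would be the output file object; io.StringIO stands in for it here so the sketch runs without touching the disk.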

Hoping some of this helps,
John

Elliot Temple · May 26, 2005

hey FYI i found the problem: i accidentally copied an output file for
my test data. so all the passwords were exactly 32 chars long. so
when replacing them with new 32 char passwords, it went much much
faster, I guess because the list kept the same number of chars in it
and didn't have to copy lots of data around.
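
The effect described above can be reproduced with synthetic data. The sketch below is written for Python 3 and is only an illustration: the helper names, sizes and fake "passwords" are all invented. It relies on the assumption that CPython overwrites a slice in place when the replacement has the same length, and otherwise has to shift every element after the slice.

```python
import time

# 32 one-character strings, the same length as an MD5 hex digest
DIGEST = list("0123456789abcdef" * 2)


def build(n_spans, gap, password_len):
    """Return a list of characters plus the (start, end) span of each fake password."""
    block = gap + password_len
    items = list("x" * (n_spans * block))
    spans = [(k * block + gap, (k + 1) * block) for k in range(n_spans)]
    return items, spans


def replace_all(items, spans):
    """Overwrite every span with DIGEST, last span first; return elapsed seconds."""
    t1 = time.perf_counter()
    for start, end in reversed(spans):
        items[start:end] = DIGEST
    return time.perf_counter() - t1


same_items, same_spans = build(1000, 300, 32)   # passwords already 32 chars long
short_items, short_spans = build(1000, 300, 8)  # passwords shorter than the digest
same_before, short_before = len(same_items), len(short_items)

print("same length:     ", round(replace_all(same_items, same_spans), 5))
print("different length:", round(replace_all(short_items, short_spans), 5))
```

The first list keeps its length, while the second grows by 24 elements per replacement, so each assignment there has to move the tail of the list.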
