J
Joachim Worringen
I need to process large lists (in my real application, this is to parse the
content of a file). I noticed that the performance to access the individual list
elements degrades over runtime.
This can be reproduced easily using this code:
import time
N=100000
p=10000
A=[]
for i in range(N):
A.append(str(i))
j = 0
t = time.clock()
for i in range(len(A)):
j += int(A)
if i % p == 0:
t = time.clock() - t
print t
(the string conversion only servers to increase the duration of each iteration;
you can observer the same effect with ints, too).
When running this, I get output like this:
0.0
0.37
0.03
0.4
0.06
0.43
0.09
0.46
0.13
0.49
I use Python 2.3.4 (#1, Sep 3 2004, 12:08:45)
[GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-110)] on linux2
I wonder why
1. The execution times alternate between "fast" and "slow" (I observe the same
effect in my much more complex application)
2. The execution times increase steadily, from "0" to 0.13 for the "fast"
phases, and from 0.37 to 0.49 for the slow phases.
Within my application, the effect is more drastical as the following numbers
show (the processing time per line is roughly the same for each line!):
at line 10000 (32258 lines/s)
at line 20000 (478 lines/s)
at line 30000 (21276 lines/s)
at line 40000 (475 lines/s)
at line 50000 (15873 lines/s)
at line 60000 (471 lines/s)
at line 70000 (12658 lines/s)
at line 80000 (468 lines/s)
at line 90000 (10638 lines/s)
at line 100000 (464 lines/s)
at line 110000 (9090 lines/s)
at line 120000 (461 lines/s)
at line 130000 (7936 lines/s)
at line 140000 (457 lines/s)
at line 150000 (7042 lines/s)
at line 160000 (454 lines/s)
at line 170000 (6369 lines/s)
at line 180000 (451 lines/s)
at line 190000 (5780 lines/s)
at line 200000 (448 lines/s)
at line 210000 (4854 lines/s)
at line 220000 (444 lines/s)
at line 230000 (4504 lines/s)
at line 240000 (441 lines/s)
at line 250000 (4201 lines/s)
at line 260000 (438 lines/s)
at line 270000 (3952 lines/s)
at line 280000 (435 lines/s)
at line 290000 (3717 lines/s)
at line 300000 (432 lines/s)
at line 310000 (3508 lines/s)
at line 320000 (429 lines/s)
at line 330000 (3322 lines/s)
at line 340000 (426 lines/s)
at line 350000 (3154 lines/s)
at line 360000 (423 lines/s)
at line 370000 (3003 lines/s)
at line 380000 (421 lines/s)
at line 390000 (2873 lines/s)
Any ideas why this is like this, and what I could do about it? It really makes
may application non-scalable as the lines/s go down even further.
content of a file). I noticed that the performance to access the individual list
elements degrades over runtime.
This can be reproduced easily using this code:
import time
N=100000
p=10000
A=[]
for i in range(N):
A.append(str(i))
j = 0
t = time.clock()
for i in range(len(A)):
j += int(A)
if i % p == 0:
t = time.clock() - t
print t
(the string conversion only servers to increase the duration of each iteration;
you can observer the same effect with ints, too).
When running this, I get output like this:
0.0
0.37
0.03
0.4
0.06
0.43
0.09
0.46
0.13
0.49
I use Python 2.3.4 (#1, Sep 3 2004, 12:08:45)
[GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-110)] on linux2
I wonder why
1. The execution times alternate between "fast" and "slow" (I observe the same
effect in my much more complex application)
2. The execution times increase steadily, from "0" to 0.13 for the "fast"
phases, and from 0.37 to 0.49 for the slow phases.
Within my application, the effect is more drastical as the following numbers
show (the processing time per line is roughly the same for each line!):
at line 10000 (32258 lines/s)
at line 20000 (478 lines/s)
at line 30000 (21276 lines/s)
at line 40000 (475 lines/s)
at line 50000 (15873 lines/s)
at line 60000 (471 lines/s)
at line 70000 (12658 lines/s)
at line 80000 (468 lines/s)
at line 90000 (10638 lines/s)
at line 100000 (464 lines/s)
at line 110000 (9090 lines/s)
at line 120000 (461 lines/s)
at line 130000 (7936 lines/s)
at line 140000 (457 lines/s)
at line 150000 (7042 lines/s)
at line 160000 (454 lines/s)
at line 170000 (6369 lines/s)
at line 180000 (451 lines/s)
at line 190000 (5780 lines/s)
at line 200000 (448 lines/s)
at line 210000 (4854 lines/s)
at line 220000 (444 lines/s)
at line 230000 (4504 lines/s)
at line 240000 (441 lines/s)
at line 250000 (4201 lines/s)
at line 260000 (438 lines/s)
at line 270000 (3952 lines/s)
at line 280000 (435 lines/s)
at line 290000 (3717 lines/s)
at line 300000 (432 lines/s)
at line 310000 (3508 lines/s)
at line 320000 (429 lines/s)
at line 330000 (3322 lines/s)
at line 340000 (426 lines/s)
at line 350000 (3154 lines/s)
at line 360000 (423 lines/s)
at line 370000 (3003 lines/s)
at line 380000 (421 lines/s)
at line 390000 (2873 lines/s)
Any ideas why this is like this, and what I could do about it? It really makes
may application non-scalable as the lines/s go down even further.