C
Chris Angelico
The standard library function random.randint() seems to be quite slow
compared to random.random(), and worse in Python 3 than Python 2
(specifically that's 3.3a0 latest from Mercurial, and 2.6.6 that came
default on my Ubuntu install).
My test involves building a list of one million random integers
between 0 and ten million (for tinkering with sorting algorithms),
using a list comprehension:
import random
import time
sz=1000000
start=time.time()
a=[random.randint(0,sz*10-1) for i in range(sz)]
print("Time taken: ",time.time()-start)
The code works fine in either version of Python (although the display
looks a bit odd in Py2). But on my test system, it takes about 5
seconds to run in Py2, and about 10 seconds for Py3. (The obvious
optimization of breaking sz*10-1 out and storing it in a variable
improves both times, but leaves the dramatic difference.)
Replacing randint with random():
a=[int(random.random()*top) for i in range(sz)]
cuts the times down to about 1.5 secs for Py2, and 1.8 secs for Py3.
I suspect that the version difference is (at least in part) due to the
merging of the 'int' and 'long' types in Py3. This is supported
experimentally by rerunning the second list comp but using int() in
place of long() - the time increases to about 1.7-1.8 secs, matching
Py3.
But this still doesn't explain why randint() is so much slower. In
theory, randint() should be doing basically the same thing that I've
done here (multiply by the top value, truncate the decimal), only it's
in C instead of Python - if anything, it should be faster than doing
it manually, not slower.
A minor point of curiosity, nothing more... but, I think, a fascinating one.
ChrisA
compared to random.random(), and worse in Python 3 than Python 2
(specifically that's 3.3a0 latest from Mercurial, and 2.6.6 that came
default on my Ubuntu install).
My test involves building a list of one million random integers
between 0 and ten million (for tinkering with sorting algorithms),
using a list comprehension:
import random
import time
sz=1000000
start=time.time()
a=[random.randint(0,sz*10-1) for i in range(sz)]
print("Time taken: ",time.time()-start)
The code works fine in either version of Python (although the display
looks a bit odd in Py2). But on my test system, it takes about 5
seconds to run in Py2, and about 10 seconds for Py3. (The obvious
optimization of breaking sz*10-1 out and storing it in a variable
improves both times, but leaves the dramatic difference.)
Replacing randint with random():
a=[int(random.random()*top) for i in range(sz)]
cuts the times down to about 1.5 secs for Py2, and 1.8 secs for Py3.
I suspect that the version difference is (at least in part) due to the
merging of the 'int' and 'long' types in Py3. This is supported
experimentally by rerunning the second list comp but using int() in
place of long() - the time increases to about 1.7-1.8 secs, matching
Py3.
But this still doesn't explain why randint() is so much slower. In
theory, randint() should be doing basically the same thing that I've
done here (multiply by the top value, truncate the decimal), only it's
in C instead of Python - if anything, it should be faster than doing
it manually, not slower.
A minor point of curiosity, nothing more... but, I think, a fascinating one.
ChrisA