Maxim Khitrov
Hello all,
I'm currently writing a Python <-> MATLAB interface with ctypes and the
array.array class, which I'll use to push large amounts of data to
MATLAB. Everything is working well, but I ran into one strange
performance issue that I wanted to ask about.
Here's some example code to illustrate the point (I'm developing on
Windows, hence the use of clock()):
---
from array import array
from time import clock
input = array('B', range(256) * 10000)
# Case 1
start = clock()
data1 = array('B', input)
print format(clock() - start, '.10f')
# Case 2
start = clock()
data2 = array('B')
data2[:] = input
print format(clock() - start, '.10f')
# Case 3
start = clock()
data3 = array('B')
data3.extend(input)
print format(clock() - start, '.10f')
print input == data1 == data2 == data3
---
The output from this on my machine is as follows:
0.7080547730
0.0029827034
0.0028685943
True
That seems very wrong. All three arrays end up with the same data, but
passing the initializer to the constructor makes creation roughly 240x
slower than either of the other two methods. Is this a bug, or is there
some valid reason for it?
If there is a valid reason, it would be worth mentioning in the
documentation, since avoiding the constructor can be a significant
performance improvement in some applications. Currently the documentation
states "Otherwise, the iterable initializer is passed to the extend()
method," which doesn't seem to be the case: if it were, case 1 should
run about as fast as case 3.
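For anyone who wants to repeat this comparison on Python 3 (where time.clock() has been removed and range() no longer returns a list), here is a rough sketch using timeit. The function names are made up for illustration, and the timings depend on the interpreter version and machine, so no particular ratio should be expected. The last few lines check the quoted documentation sentence against one case where the constructor and extend() visibly differ: an initializer array with a different typecode.

```python
import timeit
from array import array

# Same data as the original example: 256 * 10000 = 2,560,000 unsigned bytes.
# In Python 3, range() is lazy, so it is turned into a list first.
source = array('B', list(range(256)) * 10000)

# Illustrative names for the three cases in the original post.
def via_constructor():
    return array('B', source)

def via_slice_assignment():
    dst = array('B')
    dst[:] = source
    return dst

def via_extend():
    dst = array('B')
    dst.extend(source)
    return dst

cases = (via_constructor, via_slice_assignment, via_extend)

# All three approaches should produce identical contents.
assert all(case() == source for case in cases)

for case in cases:
    # Best of 3 runs of 10 copies each; results vary by machine and version.
    best = min(timeit.repeat(case, number=10, repeat=3)) / 10
    print(f"{case.__name__:<22} {best:.6f} s per copy")

# The constructor and extend() are not interchangeable in one respect:
# the constructor accepts an array with a different typecode (it iterates
# over the items), while extend() raises TypeError for such an array.
print(array('i', array('B', [1, 2, 3])))  # array('i', [1, 2, 3])
try:
    array('i').extend(array('B', [1, 2, 3]))
except TypeError as exc:
    print("extend() raised TypeError:", exc)
```

The best-of-several measurement from timeit.repeat is used instead of a single clock() difference so that one-off effects (memory allocation, other processes) are less likely to distort the comparison.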
- Max