B
Brian Blais
Hello,
I am trying to translate some Matlab/mex code to Python, for doing neural
simulations. This application is definitely computing-time limited, and I need to
optimize at least one inner loop of the code, or perhaps even rethink the algorithm.
The procedure is very simple, after initializing any variables:
1) select a random input vector, which I will call "x". right now I have it as an
array, and I choose columns from that array randomly. in other cases, I may need to
take an image, select a patch, and then make that a column vector.
2) calculate an output value, which is the dot product of the "x" and a weight
vector, "w", so
y=dot(x,w)
3) modify the weight vector based on a matrix equation, like:
w=w+ eta * (y*x - y**2*w)
^
|
+---- learning rate constant
4) repeat steps 1-3 many times
I've organized it like:
for e in 100: # outer loop
for i in 1000: # inner loop
(steps 1-3)
display things.
so that the bulk of the computation is in the inner loop, and is amenable to
converting to a faster language. This is my issue:
straight python, in the example posted below for 250000 inner-loop steps, takes 20
seconds for each outer-loop step. I tried Pyrex, which should work very fast on such
a problem, takes about 8.5 seconds per outer-loop step. The same code as a C-mex
file in matlab takes 1.5 seconds per outer-loop step.
Given the huge difference between the Pyrex and the Mex, I feel that there is
something I am doing wrong, because the C-code for both should run comparably.
Perhaps the approach is wrong? I'm willing to take any suggestions! I don't mind
coding some in C, but the Python API seemed a bit challenging to me.
One note: I am using the Numeric package, not numpy, only because I want to be able
to use the Enthought version for Windows. I develop on Linux, and haven't had a
chance to see if I can compile numpy using the Enthought Python for Windows.
If there is anything else anyone needs to know, I'll post it. I put the main script,
and a dohebb.pyx code below.
thanks!
Brian Blais
--
-----------------
(e-mail address removed)
http://web.bryant.edu/~bblais
# Main script:
from dohebb import *
import pylab as p
from Numeric import *
from RandomArray import *
import time
x=random((100,1000)) # 1000 input vectors
numpats=x.shape[0]
w=random((numpats,1));
th=random((1,1))
params={}
params['eta']=0.001;
params['tau']=100.0;
old_mx=0;
for e in range(100):
rnd=randint(0,numpats,250000)
t1=time.time()
if 0: # straight python
for i in range(len(rnd)):
pat=rnd
xx=reshape(x[:,pat],(1,-1))
y=matrixmultiply(xx,w)
w=w+params['eta']*(y*transpose(xx)-y**2*w);
th=th+(1.0/params['tau'])*(y**2-th);
else: # pyrex
dohebb(params,w,th,x,rnd)
print time.time()-t1
p.plot(w,'o-')
p.xlabel('weights')
p.show()
#=============================================
# dohebb.pyx
cdef extern from "Numeric/arrayobject.h":
struct PyArray_Descr:
int type_num, elsize
char type
ctypedef class Numeric.ArrayType [object PyArrayObject]:
cdef char *data
cdef int nd
cdef int *dimensions, *strides
cdef object base
cdef PyArray_Descr *descr
cdef int flags
def dohebb(params,ArrayType w,ArrayType th,ArrayType X,ArrayType rnd):
cdef int num_iterations
cdef int num_inputs
cdef int offset
cdef double *wp,*xp,*thp
cdef int *rndp
cdef double eta,tau
eta=params['eta'] # learning rate
tau=params['tau'] # used for variance estimate
cdef double y
num_iterations=rnd.dimensions[0]
num_inputs=w.dimensions[0]
# get the pointers
wp=<double *>w.data
xp=<double *>X.data
rndp=<int *>rnd.data
thp=<double *>th.data
for it from 0 <= it < num_iterations:
offset=rndp[it]*num_inputs
# calculate the output
y=0.0
for i from 0 <= i < num_inputs:
y=y+wp*xp[i+offset]
# change in the weights
for i from 0 <= i < num_inputs:
wp=wp+eta*(y*xp[i+offset] - y*y*wp)
# estimate the variance
thp[0]=thp[0]+(1.0/tau)*(y**2-thp[0])
I am trying to translate some Matlab/mex code to Python, for doing neural
simulations. This application is definitely computing-time limited, and I need to
optimize at least one inner loop of the code, or perhaps even rethink the algorithm.
The procedure is very simple, after initializing any variables:
1) select a random input vector, which I will call "x". right now I have it as an
array, and I choose columns from that array randomly. in other cases, I may need to
take an image, select a patch, and then make that a column vector.
2) calculate an output value, which is the dot product of the "x" and a weight
vector, "w", so
y=dot(x,w)
3) modify the weight vector based on a matrix equation, like:
w=w+ eta * (y*x - y**2*w)
^
|
+---- learning rate constant
4) repeat steps 1-3 many times
I've organized it like:
for e in 100: # outer loop
for i in 1000: # inner loop
(steps 1-3)
display things.
so that the bulk of the computation is in the inner loop, and is amenable to
converting to a faster language. This is my issue:
straight python, in the example posted below for 250000 inner-loop steps, takes 20
seconds for each outer-loop step. I tried Pyrex, which should work very fast on such
a problem, takes about 8.5 seconds per outer-loop step. The same code as a C-mex
file in matlab takes 1.5 seconds per outer-loop step.
Given the huge difference between the Pyrex and the Mex, I feel that there is
something I am doing wrong, because the C-code for both should run comparably.
Perhaps the approach is wrong? I'm willing to take any suggestions! I don't mind
coding some in C, but the Python API seemed a bit challenging to me.
One note: I am using the Numeric package, not numpy, only because I want to be able
to use the Enthought version for Windows. I develop on Linux, and haven't had a
chance to see if I can compile numpy using the Enthought Python for Windows.
If there is anything else anyone needs to know, I'll post it. I put the main script,
and a dohebb.pyx code below.
thanks!
Brian Blais
--
-----------------
(e-mail address removed)
http://web.bryant.edu/~bblais
# Main script:
from dohebb import *
import pylab as p
from Numeric import *
from RandomArray import *
import time
x=random((100,1000)) # 1000 input vectors
numpats=x.shape[0]
w=random((numpats,1));
th=random((1,1))
params={}
params['eta']=0.001;
params['tau']=100.0;
old_mx=0;
for e in range(100):
rnd=randint(0,numpats,250000)
t1=time.time()
if 0: # straight python
for i in range(len(rnd)):
pat=rnd
xx=reshape(x[:,pat],(1,-1))
y=matrixmultiply(xx,w)
w=w+params['eta']*(y*transpose(xx)-y**2*w);
th=th+(1.0/params['tau'])*(y**2-th);
else: # pyrex
dohebb(params,w,th,x,rnd)
print time.time()-t1
p.plot(w,'o-')
p.xlabel('weights')
p.show()
#=============================================
# dohebb.pyx
cdef extern from "Numeric/arrayobject.h":
struct PyArray_Descr:
int type_num, elsize
char type
ctypedef class Numeric.ArrayType [object PyArrayObject]:
cdef char *data
cdef int nd
cdef int *dimensions, *strides
cdef object base
cdef PyArray_Descr *descr
cdef int flags
def dohebb(params,ArrayType w,ArrayType th,ArrayType X,ArrayType rnd):
cdef int num_iterations
cdef int num_inputs
cdef int offset
cdef double *wp,*xp,*thp
cdef int *rndp
cdef double eta,tau
eta=params['eta'] # learning rate
tau=params['tau'] # used for variance estimate
cdef double y
num_iterations=rnd.dimensions[0]
num_inputs=w.dimensions[0]
# get the pointers
wp=<double *>w.data
xp=<double *>X.data
rndp=<int *>rnd.data
thp=<double *>th.data
for it from 0 <= it < num_iterations:
offset=rndp[it]*num_inputs
# calculate the output
y=0.0
for i from 0 <= i < num_inputs:
y=y+wp*xp[i+offset]
# change in the weights
for i from 0 <= i < num_inputs:
wp=wp+eta*(y*xp[i+offset] - y*y*wp)
# estimate the variance
thp[0]=thp[0]+(1.0/tau)*(y**2-thp[0])