I would be very surprised if you achieved faster results by threading
this problem. Much has been written about the benefits, or lack
thereof, of threading in Python (and in general).
Threading is best used in situations where you are doing different
kinds of tasks at once. For example, you might want your matrix
multiplication to run as a background process, chugging away WHILE
you do other things on your computer and using the time when you are
not taxing the machine with other work.
Threading adds efficiency when you have lots of "blocking"
operations, like reading lots of big files from a comparatively slow
hard drive (compared to how fast a CPU processes data) or waiting on
network data (super slow compared to CPU processing).
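
To illustrate the blocking case, here is a minimal sketch. It assumes Python 3 and uses time.sleep as a stand-in for a slow disk or network read; fake_download and fetch_all are hypothetical names made up for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name, delay=0.2):
    """Hypothetical stand-in for a blocking read; sleeping releases the GIL."""
    time.sleep(delay)
    return name.upper()


def fetch_all(names, workers=4):
    """Run the blocking calls in a thread pool and keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_download, names))


if __name__ == "__main__":
    names = ["a.dat", "b.dat", "c.dat", "d.dat"]
    start = time.perf_counter()
    results = fetch_all(names)
    # The four waits overlap, so the total should be well under 4 * delay.
    print(results, "in %.2f s" % (time.perf_counter() - start))
```

The threads spend nearly all their time waiting, which is the situation where threading tends to pay off.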
When you are doing mass numeric processing, you want to minimize
jumping from one function to another, since every call adds overhead.
Recursion adds a small amount of unnecessary overhead too. You also
want to minimize the need for the CPU to switch between threads or
processes.
If you still feel the need to use threads for some reason, for numeric
processing I'd recommend using a "lighter" thread object, like a tiny
thread or green thread or a threadlet or whatever they are calling
them now.
Another thing to note: it seems you might be expecting threads to
run on different CPU cores and give you an improvement. Be careful
with this assumption. It is not always true. How threads are scheduled
is up to the OS, and in Python the GIL also limits what can run at
once. Be aware that Python has a GIL (Global Interpreter Lock) in some
implementations, notably CPython, which lets only one thread execute
Python bytecode at a time. Google it if you don't know of it.
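
A rough way to see this for yourself is to time the same CPU-bound work run sequentially and in threads. This is only a sketch, assuming a standard CPython build with the GIL; count_down, run_sequential and run_threaded are hypothetical names, and the timings will vary by machine.

```python
import threading
import time


def count_down(n):
    """Pure-Python busy loop: CPU-bound, never blocks on I/O."""
    while n > 0:
        n -= 1


def run_sequential(n, jobs):
    """Run the loop `jobs` times back to back and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, jobs):
    """Run the loop in `jobs` threads and return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1_000_000
    print("sequential: %.2f s" % run_sequential(n, 2))
    print("threaded:   %.2f s" % run_threaded(n, 2))
```

On a GIL build the threaded time is usually about the same as the sequential one, sometimes worse, because only one thread runs Python bytecode at any moment.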
To make better use of multi-core CPUs you might consider the
multiprocessing library, included in Python 2.6 and above.
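
As a sketch of what that could look like for matrix multiplication, the example below hands one row of the left matrix to each worker process. It assumes Python 3 and plain lists of lists; multiply_row and parallel_matmul are hypothetical helper names, not anything from the question.

```python
from multiprocessing import Pool


def multiply_row(args):
    """Multiply one row of A by the whole matrix B (both plain lists)."""
    row, b = args
    return [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]


def parallel_matmul(a, b, workers=2):
    """Compute A * B by farming out one row of A per task."""
    with Pool(processes=workers) as pool:
        return pool.map(multiply_row, [(row, b) for row in a])


if __name__ == "__main__":
    # The guard matters: child processes may re-import this module.
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(parallel_matmul(a, b))
```

Each task has to pickle its row and a copy of B, so for small matrices the overhead will likely outweigh any gain. It only starts to make sense when each task does a lot of work.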
I'm assuming that speed is an issue because you were timing your
code. If you are doing actual serious number crunching there's lots of
advice on this. The Python NumPy package as well as Stackless Python
(for microthreads or whatever they're called) come to mind.
Another thought. Ask yourself if you need a large in-memory or live
set of processed numbers, in your case a fully multiplied matrix.
Usually a large set of in-memory numbers is something you're going to
use to simulate a model or to process and crunch further.
Or is your actual usage going to be picking out a processed number
here or there from the matrix? If so, look at iterators or
generators, which compute each value only when you ask for it, like a
snapshot in time of your matrix multiplication. I like to think of
Python generators like integral calculus (definition at:
http://en.wikipedia.org/wiki/Integral_calculus)
where the specific integral of a generator is often just 1.
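
A minimal sketch of that idea, assuming plain lists of lists: a generator that produces rows of the product only when asked. The name matmul_rows is hypothetical.

```python
def matmul_rows(a, b):
    """Yield the rows of A * B one at a time, computed on demand."""
    cols = list(zip(*b))  # columns of B, built once
    for row in a:
        yield [sum(x * y for x, y in zip(row, col)) for col in cols]


a = [[1, 2], [3, 4], [5, 6]]
b = [[1, 0], [0, 1]]

rows = matmul_rows(a, b)
first = next(rows)  # only the first row has been computed at this point
```

If you only ever look at a few rows, the rest of the product is never computed or stored.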
I'm loving generators a lot. For example there are generator
accelerators, which if you think it through means you can make
generator decelerators, useful for interpolating between the elements
of your matrix, for example. I always forget if generators are thread
safe though (as far as I know they are not: calling next() on the
same generator from two threads at once raises ValueError).
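
One possible reading of a "decelerator" is a generator that yields extra, linearly interpolated values between the elements it is given. The sketch below is just that interpretation; interpolate and its steps parameter are hypothetical.

```python
def interpolate(values, steps):
    """Yield each value followed by steps - 1 evenly spaced points
    leading up to the next value; the last value is yielded as-is."""
    it = iter(values)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for cur in it:
        for i in range(steps):
            yield prev + (cur - prev) * i / steps
        prev = cur
    yield prev


halfway = list(interpolate([0, 10, 20], 2))
```

Here halfway holds the original three values with the midpoints 5 and 15 slotted in between them.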
Some indicators that generators could help: You're doing lots of for
loops with range().
Also it's been measured that list comprehensions are usually slightly
faster than equivalent for loops, which in turn are usually faster
than while loops with a manual counter. You can measure it yourself
with timeit, or Google something like "python fast iteration".
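
Rather than take anyone's word for it, you can time the three styles yourself. This is a small sketch using the standard timeit module; the function names are hypothetical and the numbers will depend on your machine and Python version.

```python
import timeit


def with_comprehension(n):
    return [i * i for i in range(n)]


def with_for(n):
    out = []
    for i in range(n):
        out.append(i * i)
    return out


def with_while(n):
    out = []
    i = 0
    while i < n:
        out.append(i * i)
        i += 1
    return out


if __name__ == "__main__":
    for fn in (with_comprehension, with_for, with_while):
        seconds = timeit.timeit(lambda: fn(10_000), number=200)
        print("%-20s %.3f s" % (fn.__name__, seconds))
```

All three build the same list, so any difference in the printed times comes from the looping style alone.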
Also, if the numbers in your matrix are not really plain numbers but
objects holding numbers, __slots__ can be used for large sets of
objects (tens of millions at the very least) to minimize memory usage
and perhaps help with speed, if used properly. Just mentioning it. I'd
stay away from this though.
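
For reference, this is roughly what __slots__ looks like. The Cell class and its fields are hypothetical, just to show the mechanism.

```python
class Cell:
    """A matrix element that carries a number plus some extra data."""

    # Declaring __slots__ means instances get no per-object __dict__,
    # which is where the memory saving comes from.
    __slots__ = ("value", "weight")

    def __init__(self, value, weight=1.0):
        self.value = value
        self.weight = weight


cell = Cell(3.5)

try:
    cell.label = "oops"  # not listed in __slots__
    rejected = False
except AttributeError:
    rejected = True
```

The trade-off is that you can no longer attach arbitrary attributes to instances, which is part of why it is easy to misuse.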
Some of my information may be inaccurate (or even completely wrong;
for instance I always get mixed up about when a thread is best
switched, during a blocking versus a non-blocking operation) but these
are some things to consider.