Christopher T King
In a quest to speed up numarray computations, I tried writing a 'threaded
array' class for use on SMP systems that would distribute its workload
across the processors. I hit a snag when I found out that the Python
interpreter is not reentrant, which effectively disables parallel
processing in Python. I've come up with two solutions to this problem,
both involving numarray's C functions that perform the actual vector
operations:
1) Surround the C vector operations with Py_BEGIN_ALLOW_THREADS and
Py_END_ALLOW_THREADS, thus allowing the vector operations (which don't
access Python structures) to run in parallel with the interpreter.
Python glue code would take care of threading and locking (a rough
sketch of this pattern follows below).
2) Move the parallelization into the C vector functions themselves. This
would likely get poorer performance (a chain of vector operations
couldn't be combined into one threaded operation); a second sketch
below illustrates this approach.
I'd much rather do #1, but will playing around with the interpreter state
like that cause any problems?
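
For concreteness, here is a minimal sketch of approach #1, written
against the modern CPython C API for brevity. The module and function
names (vecops, vec_add, py_vec_add) are made up for illustration and
are not numarray's actual entry points. The raw buffers are extracted
while still holding the GIL; only the pure-C loop, which touches no
Python structures, runs with the GIL released:

#include <Python.h>

/* Plain C loop over raw doubles: it touches no Python objects, so it
 * is safe to run with the GIL released. */
static void
vec_add(const double *a, const double *b, double *out, Py_ssize_t n)
{
    Py_ssize_t i;
    for (i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Wrapper: extract the raw buffers while holding the GIL, then release
 * it around the numeric loop only. */
static PyObject *
py_vec_add(PyObject *self, PyObject *args)
{
    Py_buffer a, b, out;
    Py_ssize_t n;

    if (!PyArg_ParseTuple(args, "y*y*w*", &a, &b, &out))
        return NULL;
    if (a.len != out.len || b.len != out.len) {
        PyErr_SetString(PyExc_ValueError, "buffer sizes must match");
        goto fail;
    }
    n = out.len / (Py_ssize_t)sizeof(double);

    Py_BEGIN_ALLOW_THREADS        /* other Python threads may now run */
    vec_add((const double *)a.buf, (const double *)b.buf,
            (double *)out.buf, n);
    Py_END_ALLOW_THREADS          /* re-acquire the GIL */

    PyBuffer_Release(&a);
    PyBuffer_Release(&b);
    PyBuffer_Release(&out);
    Py_RETURN_NONE;

fail:
    PyBuffer_Release(&a);
    PyBuffer_Release(&b);
    PyBuffer_Release(&out);
    return NULL;
}

static PyMethodDef methods[] = {
    {"vec_add", py_vec_add, METH_VARARGS, "out = a + b, GIL released"},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef vecops_module = {
    PyModuleDef_HEAD_INIT, "vecops", NULL, -1, methods
};

PyMODINIT_FUNC
PyInit_vecops(void)
{
    return PyModule_Create(&vecops_module);
}

Python glue code could then dispatch several such calls from worker
threads; since each call drops the GIL for the duration of its loop,
the calls genuinely overlap on an SMP box.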
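
And here is a corresponding sketch of approach #2, pushing the
parallelism down into the C vector function itself with POSIX threads.
NTHREADS, slice_t, add_slice and vec_add_threaded are hypothetical
names; real code would size the pool to the machine and reuse worker
threads instead of spawning them on every call:

#include <Python.h>
#include <pthread.h>

#define NTHREADS 2   /* assumed SMP width; purely illustrative */

/* Work description for one slice of the arrays. */
typedef struct {
    const double *a, *b;
    double *out;
    Py_ssize_t start, stop;
} slice_t;

/* Worker: adds its slice; touches no Python objects. */
static void *
add_slice(void *arg)
{
    slice_t *s = (slice_t *)arg;
    Py_ssize_t i;
    for (i = s->start; i < s->stop; i++)
        s->out[i] = s->a[i] + s->b[i];
    return NULL;
}

/* Hypothetical vector add that splits the loop across NTHREADS POSIX
 * threads inside the C function itself. */
static void
vec_add_threaded(const double *a, const double *b, double *out,
                 Py_ssize_t n)
{
    pthread_t tid[NTHREADS];
    slice_t work[NTHREADS];
    Py_ssize_t chunk = n / NTHREADS;
    int t;

    for (t = 0; t < NTHREADS; t++) {
        work[t].a = a;  work[t].b = b;  work[t].out = out;
        work[t].start = t * chunk;
        work[t].stop  = (t == NTHREADS - 1) ? n : (t + 1) * chunk;
        pthread_create(&tid[t], NULL, add_slice, &work[t]);
    }
    for (t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
}

The drawback of #2 shows up right in the sketch: every individual
vector operation pays its own thread create/join cost, and a chain of
operations can't be fused into a single threaded pass.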