Wenning Qiu
I am researching issues related to embedding Python in C++ for a
project.
My project will be running on an SMP box and requires scalability.
However, my test shows that Python threading scales very poorly.
In fact, it does not scale at all.
I wrote a simple test program that completes a given number of iterations
of a simple loop, with the total number of iterations divided evenly
among a number of threads. My test shows that as the number of threads
grows, the CPU usage grows and the response time gets longer. For
example, to complete the same amount of work, one thread takes 10
seconds, two threads take 20 seconds, and three threads take 30 seconds.
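
For reference, my test was roughly along the lines of the sketch below.
This is an illustration rather than my exact program (the iteration count
and build details are placeholders), and it assumes a CPython that provides
the PyGILState API. The total work is split evenly across native threads,
and each thread must take the global lock before it can run its share of
the loop.

/* Sketch only -- not my actual test program.  Build against the Python
 * headers and link against the Python library.
 * NOTE: on CPython releases older than 3.7 you must also call
 * PyEval_InitThreads() right after Py_Initialize(). */
#include <Python.h>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <thread>
#include <vector>

static void run_loop(long iterations) {
    /* Every thread must hold the global interpreter lock before it may
     * execute Python code or touch Python objects. */
    PyGILState_STATE gstate = PyGILState_Ensure();
    char code[128];
    std::snprintf(code, sizeof(code),
                  "i = 0\nwhile i < %ld:\n    i += 1\n", iterations);
    PyRun_SimpleString(code);
    PyGILState_Release(gstate);
}

int main(int argc, char* argv[]) {
    const long total = 30000000;                 /* fixed total amount of work */
    const int nthreads = (argc > 1) ? std::atoi(argv[1]) : 1;

    Py_Initialize();
    /* Release the GIL in the main thread so the workers can take it. */
    PyThreadState* main_state = PyEval_SaveThread();

    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int i = 0; i < nthreads; ++i)
        workers.emplace_back(run_loop, total / nthreads);
    for (auto& t : workers)
        t.join();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    PyEval_RestoreThread(main_state);
    Py_Finalize();

    std::cout << nthreads << " thread(s): " << elapsed.count() << " s\n";
    return 0;
}

Since only one thread can hold the lock at a time, adding threads adds
nothing but lock contention, which is consistent with the 10/20/30 second
figures above.
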
The fundamental reason for this lack of scalability is that Python uses a
global interpreter lock for thread safety. That global lock must be
held by a thread before it can safely access Python objects.
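
Concretely, every C++ thread that wants to touch the interpreter has to
bracket its work with that one lock, along these lines (a sketch only;
some_callable is a placeholder name, not part of any real API):

#include <Python.h>

/* Sketch of the locking protocol around any access to Python objects. */
void call_from_cpp_thread(PyObject* some_callable) {
    PyGILState_STATE gstate = PyGILState_Ensure();  /* block until we own the GIL */
    PyObject* result = PyObject_CallObject(some_callable, NULL);
    if (result == NULL)
        PyErr_Print();                /* report and clear any Python exception */
    Py_XDECREF(result);
    PyGILState_Release(gstate);       /* no Python API calls allowed after this */
}

No matter how many threads follow this protocol, only one of them is ever
executing inside the interpreter at any given moment.
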
I thought I might be able to make embedded Python scalable by
embedding multiple interpreters and having them run independently in
different threads. However, chapter 8 of the "Python/C API Reference Manual"
says that "The global interpreter lock is also shared by all threads,
regardless of to which interpreter they belong". Therefore, with the
current implementation, even multiple interpreters do not provide
scalability.
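
For concreteness, the layout I had in mind would look roughly like the
sketch below: the main thread creates one sub-interpreter per worker with
Py_NewInterpreter(), and each worker then attaches its own thread state to
its private interpreter. The names and counts are illustrative, and because
the lock is shared the threads still run Python code strictly one at a time.

/* Sketch: one sub-interpreter per native thread.  Even though each thread
 * has a private interpreter, they all contend for the single shared GIL. */
#include <Python.h>
#include <thread>
#include <vector>

static void worker(PyInterpreterState* interp) {
    PyThreadState* ts = PyThreadState_New(interp); /* our own state for our interpreter */
    PyEval_AcquireThread(ts);                      /* still the one shared lock */
    PyRun_SimpleString("i = 0\nwhile i < 10000000:\n    i += 1\n");
    PyThreadState_Clear(ts);                       /* must be done with the GIL held */
    PyThreadState_DeleteCurrent();                 /* frees ts and releases the GIL */
}

int main() {
    const int nthreads = 4;

    Py_Initialize();
    PyThreadState* main_state = PyThreadState_Get();

    /* Create one sub-interpreter per worker, then switch back to the main
     * interpreter and drop the GIL so the workers can run. */
    std::vector<PyThreadState*> subs;
    for (int i = 0; i < nthreads; ++i)
        subs.push_back(Py_NewInterpreter());
    PyThreadState_Swap(main_state);
    PyEval_SaveThread();

    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back(worker, subs[i]->interp);
    for (auto& t : threads)
        t.join();

    /* Tear down the sub-interpreters, then the main interpreter. */
    PyEval_RestoreThread(main_state);
    for (PyThreadState* sub : subs) {
        PyThreadState_Swap(sub);
        Py_EndInterpreter(sub);
    }
    PyThreadState_Swap(main_state);
    Py_Finalize();
    return 0;
}

Even with this layout, only one of the worker threads can be executing
Python code at any moment, so the wall-clock time is no better than with a
single interpreter.
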
Has anyone on this list run into the same problem that I have, or does
anyone know of any plan for completely isolating multiple embedded
Python interpreters?
Thanks,
Wenning Qiu