Correct way to handle independent interpreters when embedding in a single-threaded C++ app


Craig Ringer

Hi folks

I'm a bit of a newbie here, though I've tried to appropriately research
this issue before posting. I've found a lot of questions, a few answers
that don't really answer quite what I'm looking for, but nothing that
really solves or explains all this. I'll admit to being stumped, hence
my question here.

I'm also trying to make this post as clear and detailed as possible.
Unfortunately, that means it's come out like a book. I hope a few kind
souls will be game to read it, on the theory that I'm a user who's
putting in the time to actually provide enough information for once.

I have a Python interpreter embedded in a C++/Qt application (Scribus -
http://www.scribus.net). Scribus, while using multi-threading enabled
libraries, runs in a single 'main' thread. The Python interpreter is
implemented as a plug-in that's used to run user scripts. Overall it's
working very well.

I've run into two problems that are proving very difficult to solve,
however, and I thought I'd ask here for some words of wisdom. I'm only
tackling the first one right now. First I'll provide some background on
how I'm doing things, and what I'm trying to achieve. If anything below
comes out as a request for Python functionality it's not intended to be
- it's just a description of what /I'm/ trying to do.

The Scribus Python plugin is pretty standard - it both embeds the Python
interpreter and provides an extension module to expose
application-specific functionality. It is used to permit users to
execute Python scripts to automate tasks within the application. I also
hope to make it possible to extend the application using Python, but
that's not the issue right now. I need to isolate individual script
executions as much as possible, so that to the greatest extent we can
manage each script runs in a new interpreter. In other words, I need to
minimise the chances of scripts treading on each other's toes or leaking
too much with each execution.

Specifically, as much as possible I need to:

- Ensure that memory allocated by Python during a script run is all
freed, including any objects created and modules loaded. An
exception can be made for C extension modules, so long as they
don't leak every time a script is run.
- Ensure that no global state (e.g. loaded modules, globals namespace,
etc) persists across script executions.

I have no need to be able to run Python scripts in parallel with the
application, nor with each other. If a script goes into an endless loop,
that's a bug with the script, and not the application's problem. I'd
like to reduce the chances of scripts conflicting or messing up the app
state, but don't intend to even try to make it possible to safely run
untrusted scripts or to completely isolate scripts. If the odd C
extension module doesn't like it, I can deal with that too.

Also, some of the extension module functions make Qt gui calls (for
example, create and display a file chooser dialog) or access internal
application state in QObject derived classes. According to the Qt
documentation, this should only be done from the main thread. This is
another reason why I'm making no attempt to make it possible to run
normal Python scripts without blocking the application, or run scripts
in parallel. It also means that all my Python sub-interpreters need to
share the main (and in fact only) application thread.


I've hit two issues with this. The first is that executing a script
crashes the application with SIGABRT if a Python debug build is being
used. Python crashes the app with the error "Invalid thread state for
this thread". I'm working with Python 2.3.4 . The crash is triggered by
a check in pystate.c line 276 - in the PyThreadState_Swap() function:

The code in question is:

    /* It should not be possible for more than one thread state
       to be used for a thread.  Check this the best we can in debug
       builds.
    */
#if defined(Py_DEBUG) && defined(WITH_THREAD)
    if (new) {
        PyThreadState *check = PyGILState_GetThisThreadState();
        if (check && check != new)
            /* Py_FatalError("Invalid thread state for this thread"); */
            printf("We would've died here\n");
    }
#endif

A trimmed-down and simplified version (e.g. no error checking) of the
code I'm using in the plugin that hits this check is:

PyThreadState *stateo = PyEval_SaveThread();
PyThreadState *state = Py_NewInterpreter();
initscribus(Carrier);                /* init the extension module */
PySys_SetArgv(1, &scriptfilename);   /* PySys_SetArgv() takes a char** */
PyObject *m = PyImport_AddModule("__main__");
PyObject *globals = PyModule_GetDict(m);
char *script_string = ...;           /* build script that calls execfile() */
PyObject *result = PyRun_String(script_string, Py_file_input,
                                globals, globals);
...                                  /* handle possible failure and capture exception */
Py_EndInterpreter(state);
PyEval_RestoreThread(stateo);

(The full version can be found in
scribus/plugins/scriptplugin/scriptplugin.cpp line 225-279 of Scribus
CVS, http://www.scribus.net/)

The script text isn't really important. It just execfile()s the user's
script within a try/except block to ignore SystemExit and to catch and
capture any other fatal exceptions.
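
For the curious, the wrapper is roughly along these lines - a sketch
from memory, so treat it as illustrative. The real code interpolates
the user's script filename where the placeholder path appears, and
captures the traceback rather than just printing it:

/* Rough shape of the wrapper script the plugin builds and hands
   to PyRun_String(). The path below is only a placeholder. */
const char *script_string =
    "try:\n"
    "    execfile(\"/path/to/userscript.py\")\n"
    "except SystemExit:\n"
    "    pass\n"
    "except:\n"
    "    import traceback\n"
    "    traceback.print_exc()\n";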

The crash occurs at Py_NewInterpreter, when it calls PyThreadState_Swap.
It's pretty clear _what_ is happening - Python is aborting on a sanity
check because I'm trying to use multiple thread states in one thread -
what I'm looking for help with is _why_. When run with a non-debug
build, scripts run just fine. It also runs fine when I use a debug build
of Python without thread support (as is obvious from the code snippet
above). I'm sure there are cases where things can / do go wrong, but for
general use it appears to be just peachy.

So ... my question is, what are the issues behind this check? Does it
indicate that there will be a problem with this condition in all cases?
My understanding is that it's to do with the way Python doesn't use the
full capabilities of platform threading libraries, and has some shared
globals that could cause issues. Correct? If so, is there a way around
this?

All I'm looking to do is to create a clean sub-interpreter state, run a
script in it (in the main thread, with nothing else running) then
dispose of the interpreter at script exit. It's desirable to keep the
main interpreter usable as well, but there will never be more than one
sub-interpreter, and there will never be Python code running in the main
and sub interpreters at the same time. Does the existence of this check
mean that what I'm trying to do is incorrect or unsafe? If not, might it
be possible to provide apps with a way to disable this check (think an
"I know what I'm doing" flag)? Is there another, saner way to do what I
want?

This post describes a similar issue to mine, though their goals are
different, and I don't think the solution will work for me:
http://groups.google.com.au/[email protected]&rnum=7

This message describes the issue I'm seeing:
http://groups.google.com.au/[email protected]&rnum=5

Another related message:
http://groups.google.com.au/[email protected]&rnum=1

Someone says it's just broken:
http://groups.google.com.au/[email protected]&rnum=17


I've tried one other approach that doesn't involve
Py_NewInterpreter/Py_EndInterpreter, but didn't have much success. What
I tried to do was run each script with a new global dict, so that they
at least had separate global namespaces (though they'd still be able to
influence the next script's interpreter state / module state). If I
recall correctly I ended up with code like this:

execfile(filename, {'__builtins__': __builtins__,
                    '__name__': '__main__',
                    '__file__': filename})

being called from PyRun_String.
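
From the C side it looked roughly like this - reconstructed from
memory, so treat it as a sketch only, with error checking omitted
(scriptfilename is the same char* as in the earlier snippet):

/* Sketch: set up __main__'s dict, expose the script's filename to
   the execfile() snippet above, then run it. No error checking. */
PyObject *m = PyImport_AddModule("__main__");
PyObject *maindict = PyModule_GetDict(m);      /* borrowed reference */

PyObject *py_fn = PyString_FromString(scriptfilename);
PyDict_SetItemString(maindict, "filename", py_fn);
Py_DECREF(py_fn);

const char *run =
    "execfile(filename, {'__builtins__': __builtins__,\n"
    "                    '__name__': '__main__',\n"
    "                    '__file__': filename})\n";
PyObject *result = PyRun_String(run, Py_file_input, maindict, maindict);
Py_XDECREF(result);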

This appeared to work fine, but turned out to leak memory like a sieve.
Objects in the script's global namespace weren't being disposed of when
the script terminated. Consequently, if I had a script with one line:

x = ' '*200000000

then each time I ran the script the app would gobble a large chunk more
memory and not release it. If I wrote a script that very carefully
deleted everything it put in the top-level namespace before it exited,
such as all variables, imports, classes, and functions, I still leaked a
little memory and a few references, but nothing much. Unfortunately,
doing that is also rather painful at best and seems _really_ clumsy.

It looked to me after some testing with a debug build like the global
dictionaries that were being created for each execfile() call were not
being disposed of after the call terminated, even though no code I was
aware of continued to hold references to them. Circular references? Do I
have to manually invoke the cyclic reference cleanup code in Python when
embedding?
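
If manual collection turns out to be the answer, I'm guessing the
cleanup would look something like this, with PyGC_Collect() being the
C-level entry point to the collector (guesswork on my part):

/* After the script finishes: drop whatever it left in its globals,
   then force a cyclic GC pass. PyGC_Collect() returns the number of
   unreachable objects found. Note that PyDict_Clear() here also
   drops __builtins__ and friends, so the dict can't be reused
   as-is for the next run. */
PyDict_Clear(globals);
PyGC_Collect();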

I'm sorry for the lack of detail provided in the discussion of this
approach. It was a while ago. If folks here think it's viable I can go
back and get some more hard data.

With the 'new globals dict' approach, it was also possible for people to
mangle modules and for the next script to see the changes. If there's a
way to re-init modules between runs (at least the built-in ones like
sys, __builtins__, etc, plus the app's extension module and any modules
written in Python), that'd be fantastic.
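
Something like the following is what I have in mind, though I have no
idea whether yanking entries out of sys.modules like this is actually
safe for built-in and extension modules. It's hypothetical, untested
code, and initial_modules is a snapshot I'd have to take myself right
after startup:

/* Hypothetical and untested: drop any module loaded since startup.
   initial_modules is assumed to be a snapshot made once, just after
   Py_Initialize():
       PyObject *initial_modules = PyDict_Copy(PyImport_GetModuleDict());
*/
PyObject *modules = PyImport_GetModuleDict();  /* borrowed: sys.modules */
PyObject *names = PyDict_Keys(modules);        /* new reference */
int i;
for (i = 0; i < PyList_Size(names); i++) {
    PyObject *name = PyList_GetItem(names, i); /* borrowed */
    if (PyDict_GetItem(initial_modules, name) == NULL)
        PyDict_DelItem(modules, name);
}
Py_DECREF(names);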


If there's some way to achieve what I want to do - get scripts to
execute in private or mostly-private environments in the main thread of
an application - I'd be overjoyed to hear it. I'm very sorry for the
mammoth message, and hope I've made some sense and provided enough
information without boring you all to tears. It's clear that there's
been quite a bit of interest in this topic from my digging through the
list archives, but I just wasn't able to find a clear, definitive
answer.

Phew. To anybody who got this far, thank you very much for your time and
patience.
 

Mustafa Demirhan

If you are always running the Python scripts within the main thread of
the application, why are you creating a new thread state and running
the script in that state? Why not just do this:

Py_Initialize();
PyRun_SimpleString(...);
Py_Finalize();

(Instead of PyRun_SimpleString, do whatever you want to do there)

Since you are not running any Python scripts or calling any
Python-related stuff from other threads, this is the best approach in
my opinion. It also ensures that the execution of one script won't
affect the execution of another, because you call Py_Finalize after
the script and thus shut down the interpreter.
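
Roughly like this, reusing the initscribus() call from your earlier
snippet - a sketch only, with no error handling (and be aware that
some C extension modules are known not to cope well with repeated
initialize/finalize cycles):

/* One full interpreter lifecycle per script run: nothing can
   persist between runs because the interpreter itself is torn
   down each time. */
void run_script(const char *path)
{
    FILE *fp;

    Py_Initialize();
    initscribus(Carrier);        /* re-register the extension module */
    fp = fopen(path, "r");
    if (fp != NULL) {
        PyRun_SimpleFile(fp, path);
        fclose(fp);
    }
    Py_Finalize();
}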

Mustafa Demirhan
 
