Multiple modules with database access + general app design?

Robin Haswell

Hey people

I'm an experienced PHP programmer who's been writing Python for a couple of
weeks now. I'm writing quite a large application which I've decided to
break down into lots of modules (a replacement for PHP's include()
statement).

My problem is, in PHP if you open a database connection it's always in
scope for the duration of the script. Even if you use an abstraction layer
($db = DB::connect(...)) you can `global $db` and bring it into scope,
but in Python I'm having trouble keeping the database in scope. At the
moment I'm having to "push" the database into the module, but I'd prefer
the module to bring the database connection in ("pull") from its parent.

Eg:
import modules
modules.foo.c = db.cursor()
modules.foo.Bar()

Can anyone recommend any "cleaner" solutions to all of this? As far as I
can see, Python doesn't have much support for breaking down large
programs into organisable files that reference each other.

Another problem is I keep having to import modules all over the place. A
real example is, I have a module "webhosting", a module "users", and a
module "common". These are all submodules of the module "modules" (bad
naming I know). The database connection is instantiated on the "db"
variable of my main module, which is "yellowfish" (a global module), so
I get the situation where:

(yellowfish.py)
import modules
modules.webhosting.c = db.cursor()
modules.webhosting.Something()

webhosting needs methods in common and users:

from modules import common, users

However users also needs common:

from modules import common

And they all need access to the database

(users and common)
from yellowfish import db
c = db.cursor()

Can anyone give me advice on making this all a bit more transparent? I
guess I really would like a method to bring all these files into the same
scope to make everything seem to be all one application, even though
everything is broken up into different files.

One added complication in this particular application:

I used modules because I'm calling arbitrary methods defined in some XML
format. Obviously I wanted to keep security in mind, so my application
goes something like this:

import modules
module, method, args = getXmlAction()
m = getattr(modules, module)
m.c = db.cursor()
f = getattr(m, method)
f(args)

In PHP this method is excellent, because I can include all the files I
need, each containing a class, and I can use variable variables:

<?php
$class = new $module; // can't remember if this works, there are
// alternatives though
$class->$method($args);
?>

And $class->$method() just does "global $db; $db->query(...);".

Any advice would be greatly appreciated!

Cheers

-Robin Haswell
 
Paul McGuire

Robin Haswell said:
Hey people

I'm an experienced PHP programmer who's been writing Python for a couple of
weeks now. I'm writing quite a large application which I've decided to
break down into lots of modules (a replacement for PHP's include()
statement).

My problem is, in PHP if you open a database connection it's always in
scope for the duration of the script. Even if you use an abstraction layer
($db = DB::connect(...)) you can `global $db` and bring it into scope,
but in Python I'm having trouble keeping the database in scope. At the
moment I'm having to "push" the database into the module, but I'd prefer
the module to bring the database connection in ("pull") from its parent.

Eg:
import modules
modules.foo.c = db.cursor()
modules.foo.Bar()

Can anyone recommend any "cleaner" solutions to all of this?

Um, I think your Python solution *is* moving in a cleaner direction than
simple sharing of a global $db variable. Why make the Bar class have to
know where to get a db cursor from? What do you do if your program extends
to having multiple Bar() objects working with different cursors into the db?

The unnatural part of this (and hopefully, the part that you feel is
"unclean") is that you're trading one global for another. By just setting
modules.foo.c to the db cursor, you force all Bar() instances to use that
same cursor.

Instead, make the database cursor part of Bar's constructor. Now you can
externally create multiple db cursors, a Bar for each, and they all merrily
do their own separate, isolated processing, in blissful ignorance of each
other's db cursors (vs. colliding on the shared $db variable).
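
A minimal sketch of passing the cursor through the constructor. It assumes sqlite3 as a stand-in for MySQLdb so that it runs anywhere, and the users table and the count_users method are hypothetical names used only for illustration:

```python
import sqlite3  # stand-in for MySQLdb so the sketch runs anywhere


class Bar:
    """Hypothetical worker class that receives its cursor instead of looking it up."""

    def __init__(self, cursor):
        self.cursor = cursor

    def count_users(self):
        self.cursor.execute("SELECT COUNT(*) FROM users")
        return self.cursor.fetchone()[0]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT)")
db.execute("INSERT INTO users VALUES ('robin')")

# Two Bar instances, each with its own cursor on the same connection.
first = Bar(db.cursor())
second = Bar(db.cursor())
print(first.count_users(), second.count_users())
```

Neither instance needs to know where the cursor came from, so the caller stays free to decide how many cursors or connections exist.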

-- Paul
 
Robin Haswell

Um, I think your Python solution *is* moving in a cleaner direction than
simple sharing of a global $db variable. Why make the Bar class have to
know where to get a db cursor from? What do you do if your program extends
to having multiple Bar() objects working with different cursors into the db?

The unnatural part of this (and hopefully, the part that you feel is
"unclean") is that you're trading one global for another. By just setting
modules.foo.c to the db cursor, you force all Bar() instances to use that
same cursor.

Instead, make the database cursor part of Bar's constructor. Now you can
externally create multiple db cursors, a Bar for each, and they all merrily
do their own separate, isolated processing, in blissful ignorance of each
other's db cursors (vs. colliding on the shared $db variable).

Hm, if truth be told, I'm not totally interested in keeping a separate
cursor for every class instance. This application runs in a very simple
threaded socket server: every time a new thread is created, we create a
new db.cursor (m = getattr(modules, module) followed by m.c = db.cursor()
is the first part of the thread), and when the thread finishes all its
actions (of which there are many, but all sequential), the thread exits. I
don't see any situations where lots of methods will tread on another
method's cursor. My main focus really is minimising the number of
connections. Using MySQLdb, I'm not sure if every MySQLdb.connect or
db.cursor is a separate connection, but I get the feeling that a lot of
cursors = a lot of connections. I'd much prefer each method call within a
thread to reuse that thread's connection, as creating a connection incurs
significant overhead on the MySQL server and DNS server.

-Rob
 
Daniel Dittmar

Robin said:
cursor for every class instance. This application runs in a very simple
threaded socket server - every time a new thread is created, we create a
new db.cursor (m = getattr(modules, module)\n m.c = db.cursor() is the
first part of the thread), and when the thread finishes all its actions
(of which there are many, but all sequential), the thread exits. I don't

If you use a threading server, you can't put the connection object into
the module. Modules and hence module variables are shared across
threads. You could use thread local storage, but I think it's better to
pass the connection explicitly as a parameter.

Robin said:
separate connection, but I get the feeling that a lot of cursors = a lot
of connections. I'd much prefer each method call within a thread to reuse
that thread's connection, as creating a connection incurs significant
overhead on the MySQL server and DNS server.

You can create several cursor objects from one connection. There should
be no problems if you finish processing one cursor before you open
the next one. In earlier (current?) versions of MySQL, only one result
set could be open at a time, so using cursors in parallel presents some
problems to the driver implementor.
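
As a rough illustration of one connection serving several cursors in sequence, here is a sketch that assumes sqlite3 in place of MySQLdb (both follow the DB-API shape) and a made-up hosts table:

```python
import sqlite3  # stand-in driver; MySQLdb follows the same DB-API shape

conn = sqlite3.connect(":memory:")   # one connection...
setup = conn.cursor()                # ...first cursor
setup.execute("CREATE TABLE hosts (name TEXT)")
setup.executemany("INSERT INTO hosts VALUES (?)", [("alpha",), ("beta",)])
setup.close()                        # finish with it before opening the next

reader = conn.cursor()               # second cursor, same connection
reader.execute("SELECT name FROM hosts ORDER BY name")
names = [row[0] for row in reader.fetchall()]
reader.close()
conn.close()
print(names)
```

Creating a cursor here does not open a new connection; only the connect() call does.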

Daniel
 
Robin Haswell

If you use a threading server, you can't put the connection object into
the module. Modules and hence module variables are shared across
threads. You could use thread local storage, but I think it's better to
pass the connection explicitly as a parameter.

Would you say it would be better if in every thread I did:

m = getattr(modules, module)
m.db = db

...

def Foo():
    c = db.cursor()

?
 
Frank Millman

Robin said:
Hey people

I'm an experienced PHP programmer who's been writing Python for a couple of
weeks now. I'm writing quite a large application which I've decided to
break down into lots of modules (a replacement for PHP's include()
statement).

My problem is, in PHP if you open a database connection it's always in
scope for the duration of the script. Even if you use an abstraction layer
($db = DB::connect(...)) you can `global $db` and bring it into scope,
but in Python I'm having trouble keeping the database in scope. At the
moment I'm having to "push" the database into the module, but I'd prefer
the module to bring the database connection in ("pull") from its parent.

This is what I do.

Create a separate module to contain your global variables - mine is
called 'common'.

In common, create a class, with attributes, but with no methods. Each
attribute becomes a global variable. My class is called 'c'.

At the top of every other module, put 'from common import c'.

Within each module, you can now refer to any global variable as
c.whatever.

You can create class attributes on the fly. You can therefore have
something like -

c.db = MySQLdb.connect(...)

All modules will be able to access c.db

As Daniel has indicated, it may not be safe to share one connection
across multiple threads, unless you can guarantee that one thread
completes its processing before another one attempts to access the
database. You can use threading locks to assist with this.
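
A single-file sketch of this pattern, with the lock added. It assumes sqlite3 as a stand-in for the MySQL driver; the class c plays the role of the one in 'common', and db_lock, record and the log table are hypothetical names:

```python
import sqlite3
import threading


class c:
    """Plays the role of the attribute-only class in the 'common' module."""


# Done once at startup, e.g. in the main module.
c.db = sqlite3.connect(":memory:", check_same_thread=False)
c.db_lock = threading.Lock()   # hypothetical name; guards the shared connection
c.db.execute("CREATE TABLE log (msg TEXT)")


def record(msg):
    # Any module that did 'from common import c' could do this.
    with c.db_lock:
        c.db.execute("INSERT INTO log VALUES (?)", (msg,))
        c.db.commit()


threads = [threading.Thread(target=record, args=("thread %d" % i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

count = c.db.execute("SELECT COUNT(*) FROM log").fetchone()[0]
print(count)
```

The lock serialises every use of the shared connection, which is safe but means the threads queue up behind each other for database work.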

HTH

Frank Millman
 
Daniel Dittmar

Robin said:
Would you say it would be better if in every thread I did:

m = getattr(modules, module)
m.db = db

...

def Foo():
    c = db.cursor()

I was thinking (example from original post):

import modules
modules.foo.Bar(db.cursor())

# file modules.foo
def Bar(cursor):
    cursor.execute(...)

The same is true for other objects like the HTTP request: always pass
them as parameters because module variables are shared between threads.

If you have an HTTP request object, then you could attach the database
connection to that object, that way you have to pass only one object.

Or you create a new class that encompasses everything useful for this
request: the HTTP request, the database connection, possibly an object
containing authorization infos etc.
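
One way such a per-request object might look, as a sketch only. RequestContext, add_site, HANDLERS and the sites table are hypothetical, sqlite3 stands in for MySQLdb, and the dictionary lookup stands in for the getattr-based dispatch from the original post:

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class RequestContext:
    """Hypothetical bundle of everything one request needs."""
    db: sqlite3.Connection
    user: str


def add_site(ctx, domain):
    # Handlers take the context explicitly; nothing is read from module globals.
    cur = ctx.db.cursor()
    cur.execute("INSERT INTO sites VALUES (?, ?)", (ctx.user, domain))
    cur.close()


HANDLERS = {"add_site": add_site}  # stand-in for the getattr-based lookup


def dispatch(ctx, method, args):
    return HANDLERS[method](ctx, *args)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (owner TEXT, domain TEXT)")
ctx = RequestContext(db=conn, user="robin")
dispatch(ctx, "add_site", ["example.org"])
rows = conn.execute("SELECT owner, domain FROM sites").fetchall()
print(rows)
```

Each thread would build its own context, so only one extra parameter travels through the call chain.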

I assume that in PHP, global still means 'local to this request', as PHP
probably runs in threads under Windows IIS (and Apache 2.0?). In Python,
you have to be more explicit about the scope.

Daniel
 
Robin Haswell

I was thinking (example from original post):

import modules
modules.foo.Bar(db.cursor())

# file modules.foo
def Bar(cursor):
    cursor.execute(...)

Ah I see.. sounds interesting. Is it possible to make any module variable
local to a thread, if set within the current thread? Your method, although
good, would mean revising all my functions in order to make it work?

Thanks
 
Robin Haswell

This is what I do.

Create a separate module to contain your global variables - mine is
called 'common'.

In common, create a class, with attributes, but with no methods. Each
attribute becomes a global variable. My class is called 'c'.

At the top of every other module, put 'from common import c'.

Within each module, you can now refer to any global variable as
c.whatever.

You can create class attributes on the fly. You can therefore have
something like -

c.db = MySQLdb.connect(...)

All modules will be able to access c.db

As Daniel has indicated, it may not be safe to share one connection
across multiple threads, unless you can guarantee that one thread
completes its processing before another one attempts to access the
database. You can use threading locks to assist with this.

HTH

Frank Millman


Thanks, that sounds like an excellent idea. While I don't think it applies
to the database (threading seems to be becoming a bit of an issue at the
moment), I know I can use that in other areas :)

Cheers

-Rob
 
Magnus Lycka

Robin said:
Can anyone give me advice on making this all a bit more transparent? I
guess I really would like a method to bring all these files into the same
scope to make everything seem to be all one application, even though
everything is broken up into different files.

This is very much a deliberate design decision in Python.
I haven't used PHP, but in e.g. C, the #include directive
means that you pollute your namespace with all sorts of
strange names from all the third party libraries you are
using, and this doesn't scale well. As your application
grows, you'll get mysterious bugs due to strange name clashes.
Removing some module you no longer need can mean that your app
won't build, because the include file you dropped in turn
included another file that you should have included yourself
but didn't, and so on. In Python, explicit is better than implicit
(type "import this" at the Python prompt), and while this causes
some extra typing it helps with code maintenance. You can always
see where a name in your current namespace comes from (unless
you use "from xxx import *"). No magic!


Concerning your database operations, it seems they are distributed
over a lot of different modules, and that might also cause problems,
whatever programming language we use. In typical database
applications, you need to keep track of transactions properly.

For each opened connection, you can perform a number of transactions
after each other. A transaction starts with the first database
operation after a connect, commit or rollback. A cursor should only
live within a transaction. In other words, you should close all
cursors before you perform a commit or rollback.

I find it very difficult to manage transactions properly if the
commits are spread out in the code. Usually I want one module to
contain some kind of transaction management logic, where I determine
the transaction boundaries. This logic will hand out cursor objects
to various pieces of code, and determine when to close the cursors
and commit the transaction.
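
A possible shape for that transaction logic, sketched as a context manager. The transaction helper and the accounts table are hypothetical, and sqlite3 stands in for whichever DB-API driver is in use:

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Hand out a cursor, then commit on success or roll back on error."""
    cur = conn.cursor()
    try:
        yield cur
    except Exception:
        cur.close()
        conn.rollback()
        raise
    else:
        cur.close()      # close the cursor before ending the transaction
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT)")

with transaction(conn) as cur:
    cur.execute("INSERT INTO accounts VALUES ('kept')")

try:
    with transaction(conn) as cur:
        cur.execute("INSERT INTO accounts VALUES ('discarded')")
        raise ValueError("simulated failure")
except ValueError:
    pass

names = [r[0] for r in conn.execute("SELECT name FROM accounts")]
print(names)
```

Code that receives the cursor never calls commit or rollback itself, so the boundaries stay in one place.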

I haven't really written multithreaded applications, so I don't
have any experience of the problems that might cause. I know that
it's a fairly common pattern to have all database transactions in
one thread though, and to use Queue.Queue instances to pass data
to and from the thread that handles the DB.
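
The dedicated-DB-thread pattern could be sketched roughly as follows. It assumes Python 3, where the Queue module is named queue, and sqlite3 as a stand-in driver; db_worker, run_query and the events table are hypothetical names, and error handling is left out for brevity:

```python
import queue      # named Queue in Python 2
import sqlite3
import threading

requests = queue.Queue()


def db_worker():
    # The connection is created and used only inside this thread.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (msg TEXT)")
    while True:
        item = requests.get()
        if item is None:          # sentinel: shut down
            break
        sql, params, reply = item
        rows = conn.execute(sql, params).fetchall()
        conn.commit()
        reply.put(rows)
    conn.close()


def run_query(sql, params=()):
    reply = queue.Queue(maxsize=1)
    requests.put((sql, params, reply))
    return reply.get()            # block until the DB thread answers


worker = threading.Thread(target=db_worker)
worker.start()
run_query("INSERT INTO events VALUES (?)", ("started",))
result = run_query("SELECT msg FROM events")
requests.put(None)
worker.join()
print(result)
```

Because this sketch commits after every request, each request is its own transaction; grouping several statements into one transaction would need a richer message format.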

Anyway, you can only have one transaction going on at a time for
a connection, so if you share connections between threads (or use
a separate DB thread and queues) a rollback or commit in one thread
will affect the other threads as well...

Each DB-API 2.0 compliant library should be able to declare how it
can be used in a threaded application. See the DB-API 2.0 spec:
http://python.org/peps/pep-0249.html Look for "threadsafety".
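
For example, the declared level can be read straight off the driver module. This sketch uses sqlite3 because it ships with Python; the same attribute should exist on any DB-API 2.0 module, and the descriptions paraphrase the spec:

```python
import sqlite3  # any DB-API 2.0 module exposes the same module-level attributes

LEVELS = {
    0: "threads may not share the module",
    1: "threads may share the module, but not connections",
    2: "threads may share the module and connections",
    3: "threads may share the module, connections and cursors",
}
print(sqlite3.apilevel, sqlite3.threadsafety, LEVELS[sqlite3.threadsafety])
```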
 
Daniel Dittmar

Robin said:
Ah I see.. sounds interesting. Is it possible to make any module variable
local to a thread, if set within the current thread?

Not directly. The following class tries to simulate it (needs Python 2.4 or
later, where threading.local was introduced):

import threading

class ThreadLocalObject(threading.local):
    def setObject(self, object):
        setattr(self, 'object', object)

    def clearObject(self):
        setattr(self, 'object', None)

    def __getattr__(self, name):
        # Only called when normal lookup fails: forward to the wrapped object.
        object = threading.local.__getattribute__(self, 'object')
        return getattr(object, name)

You use it as:

in some module x:

db = ThreadLocalObject()

in some module that creates the database connection:

import x

def createConnection():
    localdb = ...connect(...)
    x.db.setObject(localdb)

in some module that uses the database connection:

import x

def bar():
    cursor = x.db.cursor()

The trick is:
- every attribute of a threading.local is thread local (see doc of
module threading)
- when accessing an attribute of object x.db, the method __getattr__
will first retrieve the thread local database connection and then access
the specific attribute of the database connection. Thus it looks as if
x.db is itself a database connection object.

That way, only the setting of the db variable would have to be changed.

I'm not exactly recommending this, as it seems very error-prone to me.
It's easy to overwrite the variable holding the cursors with an actual
cursor object.
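
To see the underlying threading.local behaviour on its own, without the forwarding class, here is a small sketch. It assumes sqlite3 as a stand-in driver, and the names local, worker and seen are made up for the demonstration:

```python
import sqlite3
import threading

local = threading.local()
seen = {}


def worker(name):
    local.db = sqlite3.connect(":memory:")   # visible only to this thread
    local.db.execute("CREATE TABLE t (who TEXT)")
    local.db.execute("INSERT INTO t VALUES (?)", (name,))
    seen[name] = local.db.execute("SELECT who FROM t").fetchall()
    local.db.close()


threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen, hasattr(local, "db"))
```

Each thread sees only the connection it stored itself, and the main thread, which never assigned local.db, sees no such attribute at all.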

Daniel
 
Frank Millman

Daniel said:
Not directly. The following class tries to simulate it (only in Python 2.4):

import threading

class ThreadLocalObject (threading.local):

Daniel, perhaps you can help me here.

I have subclassed threading.Thread, and I store a number of attributes
within the subclass that are local to the thread. It seems to work
fine, but according to what you say (and according to the Python docs,
otherwise why would there be a 'Local' class) there must be some reason
why it is not a good idea. Please can you explain the problem with this
approach.

Briefly, this is what I am doing.

import select
import socket
import threading

class Link(threading.Thread):  # each link runs in its own thread
    """Run a loop listening for messages from client."""

    def __init__(self, args):
        threading.Thread.__init__(self)
        print 'link connected', self.getName()
        self.ctrl, self.conn = args
        self._db = {}  # to store db connections for this client connection
        # [create various other local attributes]

    def run(self):
        readable = [self.conn.fileno()]
        error = []
        self.sendData = []  # 'stack' of replies to be sent

        self.running = True
        while self.running:
            if self.sendData:
                writable = [self.conn.fileno()]
            else:
                writable = []
            r, w, e = select.select(readable, writable, error, 0.1)  # 0.1 timeout
            # [continue to handle connection]

class Controller(object):
    """Run a main loop listening for client connections."""

    def __init__(self):
        self.s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.s.bind((HOST, PORT))
        self.s.listen(5)
        self.running = True

    def mainloop(self):
        while self.running:
            try:
                conn, addr = self.s.accept()
                Link(args=(self, conn)).start()  # create thread to handle connection
            except KeyboardInterrupt:
                self.shutdown()

Controller().mainloop()

TIA

Frank Millman
 
Daniel Dittmar

Frank said:
I have subclassed threading.Thread, and I store a number of attributes
within the subclass that are local to the thread. It seems to work
fine, but according to what you say (and according to the Python docs,
otherwise why would there be a 'Local' class) there must be some reason
why it is not a good idea. Please can you explain the problem with this
approach.

Your design is just fine. If you follow the thread upwards, you'll
notice that I encouraged the OP to pass everything by parameter.

Using thread local storage in this case was meant to be a kludge so that
not every def and every call has to be changed. There are other cases
where you don't control how threads are created (say, a plugin for a web
framework) and thread local storage is useful.

threading.local is new in Python 2.4, so it doesn't seem to be that
essential to Python thread programming.

Daniel
 
Frank Millman

Daniel said:
Your design is just fine. If you follow the thread upwards, you'll
notice that I encouraged the OP to pass everything by parameter.

Many thanks, Daniel

Frank
 
